Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16031 )

Change subject: KUDU-1802: Avoid call to master when deserializing scan tokens
......................................................................


Patch Set 3:

> Are we going to hit RPC size limit or task description limit issues?

Today in Spark we use one scan token per task, so this shouldn't result in
multiple copies of the schema being sent to a task. It moves any potential
large-schema throughput "problem" from `master -> task` to
`driver/coordinator -> task`. Do you know if Impala behaves differently than
Spark?

> An alternate to consider is just the ability to ask the client to serialize a 
> "table metadata" token, and broadcast that across the tasks (eg in a spark 
> job) separately from the per-task tokens.

I had considered this, and may use something like it for jobs that write data
to Kudu (KUDU-3135). The benefit of the approach in this patch is that it
works today with no changes to applications/frameworks already using scan
tokens. Given there is a 1:1 relationship between tasks and scan tokens in
Spark, it felt more natural to include the schema in the token.
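
To make the trade-off concrete, here is a rough toy model in Python. It is
not the Kudu client API: `FakeMaster`, `serialize_token`, `deserialize_token`,
and `run_tasks` are hypothetical names, and JSON stands in for the real token
encoding. It only illustrates that embedding the schema removes the per-task
master lookup at the cost of larger tokens.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeMaster:
    """Hypothetical stand-in for the master; counts metadata lookups."""
    schemas: dict
    lookups: int = 0

    def get_table_schema(self, table):
        self.lookups += 1
        return self.schemas[table]


def serialize_token(table, tablet_id, schema=None):
    """Build a token; the schema is embedded only when one is supplied."""
    payload = {"table": table, "tablet_id": tablet_id}
    if schema is not None:
        payload["schema"] = schema
    return json.dumps(payload).encode("utf-8")


def deserialize_token(raw, master):
    """Rebuild scan state, calling the master only if no schema is embedded."""
    payload = json.loads(raw.decode("utf-8"))
    schema = payload.get("schema")
    if schema is None:
        schema = master.get_table_schema(payload["table"])
    return payload["tablet_id"], schema


def run_tasks(master, embed_schema, num_tasks=4):
    """Simulate one token per task; return (master lookups, total token bytes)."""
    schema = master.schemas["t"]
    tokens = [
        serialize_token("t", f"tablet-{i}", schema if embed_schema else None)
        for i in range(num_tasks)
    ]
    before = master.lookups
    for raw in tokens:
        deserialize_token(raw, master)
    return master.lookups - before, sum(len(t) for t in tokens)


master = FakeMaster({"t": ["key:int64", "val:string"]})
lookups_plain, size_plain = run_tasks(master, embed_schema=False)
lookups_embedded, size_embedded = run_tasks(master, embed_schema=True)
```

With one token per task, the embedded variant makes no master calls during
deserialization, while the total bytes shipped from the driver grow with the
schema size.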


--
To view, visit http://gerrit.cloudera.org:8080/16031
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c1b8392de37dd5e8b7bd8b78a21603ff8b1d1b
Gerrit-Change-Number: 16031
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Fri, 05 Jun 2020 23:16:05 +0000
Gerrit-HasComments: No
