Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16031 )
Change subject: KUDU-1802: Avoid call to master when deserializing scan tokens
......................................................................


Patch Set 3:

> Are we going to hit RPC size limit or task description limit issues?

Today in Spark we use one scan token per task, so this shouldn't result in multiples being sent. It moves any potential large-schema throughput "problem" from the `master -> task` path to the `driver/coordinator -> task` path. Do you know if Impala behaves differently than Spark?

> An alternate to consider is just the ability to ask the client to serialize a
> "table metadata" token, and broadcast that across the tasks (eg in a spark
> job) separately from the per-task tokens.

I had considered this, and plan to potentially leverage something like it for jobs that write data to Kudu (KUDU-3135). The benefit of the approach in this patch is that it works today with no changes to applications or frameworks already using scan tokens. Given that there is a 1-1 relationship between tasks and scan tokens in Spark, it felt more natural to include the schema in the token.


--
To view, visit http://gerrit.cloudera.org:8080/16031
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c1b8392de37dd5e8b7bd8b78a21603ff8b1d1b
Gerrit-Change-Number: 16031
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Fri, 05 Jun 2020 23:16:05 +0000
Gerrit-HasComments: No
