Hi Todd,

Thank you for pointing these issues out; we will modify the patch according to your design. To confirm our understanding of the intended flow, a short sketch follows below, ahead of your quoted comments.
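Here is a minimal sketch (in Java, since the planner/task side would live in a client such as the Spark integration) of how a task could call the proposed SplitKeyRange RPC and turn the returned split points into scannable sub-ranges. The TabletServerClient binding and every other name in the sketch are hypothetical stand-ins; only the request/response shape comes from your proposal quoted below.

  import java.util.ArrayList;
  import java.util.List;

  public class SplitKeyRangeSketch {
    // Hypothetical stub for the proposed tserver RPC; it stands in for
    // whatever binding the client would eventually expose.
    interface TabletServerClient {
      // Returns the split primary keys inside [startPk, endPk), aiming for
      // roughly targetChunkSizeBytes of data per chunk (a hint only).
      List<byte[]> splitKeyRange(String tabletId, byte[] startPk,
                                 byte[] endPk, long targetChunkSizeBytes);
    }

    // A [startPk, endPk) sub-range that one task can scan independently.
    static final class KeyRange {
      final byte[] startPk;  // inclusive
      final byte[] endPk;    // exclusive
      KeyRange(byte[] startPk, byte[] endPk) {
        this.startPk = startPk;
        this.endPk = endPk;
      }
    }

    // Turns the returned split points into scannable sub-ranges. A task
    // assigned a tablet's key range would call this against its local
    // tserver and then run one scan per returned range.
    static List<KeyRange> subdivide(TabletServerClient tserver, String tabletId,
                                    byte[] startPk, byte[] endPk,
                                    long targetChunkSizeBytes) {
      List<byte[]> splitPks =
          tserver.splitKeyRange(tabletId, startPk, endPk, targetChunkSizeBytes);
      List<KeyRange> ranges = new ArrayList<>();
      byte[] lower = startPk;
      for (byte[] split : splitPks) {
        ranges.add(new KeyRange(lower, split));
        lower = split;
      }
      ranges.add(new KeyRange(lower, endPk));  // tail chunk up to endPk
      return ranges;
    }
  }

The loop reflects that N split points define N+1 half-open ranges covering [start_pk, end_pk), so each task can subdivide and scan its chunks against the local tserver without any master involvement.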
Yao

2018-05-18 1:13 GMT+08:00 Todd Lipcon (Code Review) <ger...@cloudera.org>:

> Todd Lipcon *posted comments* on this change.
>
> View Change <http://gerrit.cloudera.org:8080/10406>
>
> Patch set 4:
>
> Hi Xu. Thanks for working on this! It will be really great to decouple
> the available parallelism from the number of partitions.
>
> I have a couple of concerns about the design in this patch, though.
>
> 1) Periodic reporting of the bounds to the master: for our eventual
> scalability goals of thousands of nodes, each with thousands of tablets,
> I think this may be too much information to keep in memory on the master
> and to report on a periodic basis. Especially considering that many
> tablets may be cold (bounds not changing), it's a lot of redundant
> processing. Additionally, because the bounds are kept only in memory
> (not persisted), shortly after a restart or failover the master will not
> have the chunk bounds populated, and the query plans submitted by users
> may change dramatically.
>
> 2) Including the bounds in every GetTabletLocations call is also somewhat
> expensive from a size/performance perspective. In the case of writes or
> scans, the bounds aren't needed. It would be better to avoid the expense
> for these common cases and only compute the bounds for the specific use
> case you are targeting (generating splits for tasks).
>
> 3) From a security perspective, I'm not sure it's safe to include the
> bounds information in the GetTabletLocations RPC response. When we start
> adding more fine-grained authorization, it's likely that we will want to
> provide location information to administrators/tools (e.g. to facilitate
> balancing or other ops workflows), but those administrators may not have
> access to read the underlying data. The bounds themselves can expose
> user data (e.g. imagine that the primary key is an email address or
> government ID number that should not be exposed).
>
> I'd like to consider an alternate design in which we add a new RPC to
> the tablet server itself, something like:
>
>   message SplitKeyRangeRequest {
>     optional bytes tablet_id = 1;
>     optional bytes start_pk = 2;
>     optional bytes end_pk = 3;
>
>     // Number of bytes to try to return in each chunk. This is a hint.
>     // The tablet server may return chunks larger or smaller than this
>     // value.
>     optional int64 target_chunk_size_bytes = 4;
>
>     // The columns to consider when chunking.
>     // If specified, then the size estimate used for
>     // 'target_chunk_size_bytes' should only include these columns. This
>     // can be used if a query will only scan a certain subset of the
>     // columns.
>     repeated int32 col_ids = 5;
>   }
>
> (perhaps the 'col_ids' feature could be left out initially)
>
>   message SplitKeyRangeResponse {
>     repeated bytes split_pks = 1;
>     repeated int64 size_estimates = 2;
>   }
>
> The advantages I see for this design are:
>
> - No need to centralize the split metadata, so we can be more
>   fine-grained without worrying about master load or memory.
> - For the purposes of locality scheduling, we only need the partition
>   locations, so the "planner" (Spark driver or Impala front-end) doesn't
>   care about the sub-partition splits. The individual tasks can then
>   contact their local tserver to further subdivide the range.
> - On-demand computation of the splits means we will be able to do a
>   better job of lazily opening cold tablets later.
>
> What do you think?