Hi Todd,

Thank you for pointing out these issues; we will modify the patch according
to your design.

Yao

2018-05-18 1:13 GMT+08:00 Todd Lipcon (Code Review) <ger...@cloudera.org>:

> Todd Lipcon *posted comments* on this change.
>
> View Change <http://gerrit.cloudera.org:8080/10406>
>
> Patch set 4:
>
> Hi Xu. Thanks for working on this! It will be really great to decouple the
> available parallelism from the number of partitions.
>
> I have a couple concerns about the design in this patch, though.
>
> 1) periodic reporting of the bounds to the master: for our eventual
> scalability goals of thousands of nodes, each with thousands of tablets, I
> think this may be too much information to try to keep in memory on the
> master and to report on a periodic basis. Especially considering that many
> tablets may be cold (bounds not changing), it's a lot of redundant
> processing. Additionally, because the bounds are kept only in memory (not
> persisted), shortly following a restart or failover the master will not
> have the chunk bounds populated, and the query plans submitted by users may
> change dramatically.
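>
> (Some rough numbers to illustrate the scale: with ~5,000 nodes hosting
> ~2,000 tablets each and, say, ~100 bytes of bound data per tablet, that is
> on the order of 10 million entries and roughly 1 GB of bound data for the
> master to hold in memory and re-process every reporting interval.)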
>
> 2) including the bounds in every GetTabletLocations call is also somewhat
> expensive from a size/performance perspective. In the case of writes or
> scans, they aren't needed. It would be better to avoid the expense for
> these common cases and only worry about the bounds for the specific use
> case you are targeting (generating splits for tasks).
>
>
> 3) from a security perspective, I'm not sure it's safe to include the
> bounds information in the 'getTabletLocations' RPC response. When we start
> adding more fine-grained authorization, it's likely that we will want to
> provide location information to administrators/tools (e.g. to facilitate
> balancing or other ops workflows), but those administrators may not have
> access to read the underlying data. The bounds themselves can likely expose
> user data (e.g. imagine that the primary key is an email address or
> government ID number that should not be exposed).
>
>
> I'd like to consider an alternate design in which we add a new RPC to the
> tablet server itself, something like:
>
> message SplitKeyRangeRequest {
>   bytes tablet_id = 1;
>   bytes start_pk = 2;
>   bytes end_pk = 3;
>
>   // Number of bytes to try to return in each chunk. This is a hint.
>   // The tablet server may return chunks larger or smaller than this value.
>   int64 target_chunk_size_bytes = 4;
>
>   // The columns to consider when chunking.
>   // If specified, then the size estimate used for 'target_chunk_size_bytes'
>   // should only include these columns. This can be used if a query will
>   // only scan a certain subset of the columns.
>   repeated int32 col_ids = 5;
> }
>
> (perhaps the 'col_ids' feature could be left out initially)
>
> message SplitKeyRangeResponse {
>   repeated bytes split_pks = 1;
>   repeated int64 size_estimates = 2;
> }
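>
> To sketch how a task might consume that response (purely illustrative; the
> KeyRange struct and helper below are not existing Kudu code):
>
>   #include <string>
>   #include <vector>
>
>   struct KeyRange {
>     std::string start_pk;  // inclusive; empty means unbounded below
>     std::string end_pk;    // exclusive; empty means unbounded above
>   };
>
>   // Turn the returned split_pks back into concrete [start, end) sub-ranges
>   // covering the originally requested range, one per chunk.
>   std::vector<KeyRange> SubRangesFromSplits(
>       const std::string& start_pk, const std::string& end_pk,
>       const std::vector<std::string>& split_pks) {
>     std::vector<KeyRange> ranges;
>     std::string prev = start_pk;
>     for (const std::string& split : split_pks) {
>       ranges.push_back({prev, split});
>       prev = split;
>     }
>     ranges.push_back({prev, end_pk});  // final chunk runs to the requested end
>     return ranges;
>   }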
>
> The advantages I see for this design are:
>
>    - no need to centralize the split metadata (so we can be more
>    fine-grained without worrying about master load or memory)
>    - for the purposes of locality scheduling, we only need the partition
>    locations, so the "planner" (Spark driver or Impala front-end) doesn't care
>    about the sub-partition splits. The individual tasks can then contact their
>    local tserver to further sub-divide the range
>    - on-demand computation of the splits means we will be able to do a
>    better job of lazily opening cold tablets later.
>
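> One way a task could then use the returned size estimates to fan its
> assigned range out across local workers (again, just a sketch; none of
> these names exist in Kudu today):
>
>   #include <algorithm>
>   #include <cstdint>
>   #include <vector>
>
>   // Greedily assign consecutive chunks to workers so that each worker gets
>   // roughly total_size / num_workers bytes. Returns chunk indices per worker.
>   std::vector<std::vector<int>> GroupChunksForWorkers(
>       const std::vector<int64_t>& size_estimates, int num_workers) {
>     int64_t total = 0;
>     for (int64_t s : size_estimates) total += s;
>     const int64_t per_worker =
>         std::max<int64_t>(1, total / std::max(num_workers, 1));
>
>     std::vector<std::vector<int>> groups(1);
>     int64_t current = 0;
>     for (int i = 0; i < static_cast<int>(size_estimates.size()); ++i) {
>       if (current >= per_worker &&
>           static_cast<int>(groups.size()) < num_workers) {
>         groups.emplace_back();
>         current = 0;
>       }
>       groups.back().push_back(i);
>       current += size_estimates[i];
>     }
>     return groups;
>   }
>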
> What do you think?
>
>
> To view, visit change 10406 <http://gerrit.cloudera.org:8080/10406>. To
> unsubscribe, visit settings <http://gerrit.cloudera.org:8080/settings>.
> Gerrit-Project: kudu
> Gerrit-Branch: master
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I9ec4395919f4b54102e458ef5154334c08412e8a
> Gerrit-Change-Number: 10406
> Gerrit-PatchSet: 4
> Gerrit-Owner: Xu Yao <ocla...@gmail.com>
> Gerrit-Reviewer: Kudu Jenkins
> Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
> Gerrit-Comment-Date: Thu, 17 May 2018 17:13:07 +0000
> Gerrit-HasComments: No
>
