Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17779 )

Change subject: KUDU-2671 update range partitioning with custom hash schema
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17779/4/src/kudu/client/client.cc
File src/kudu/client/client.cc:

http://gerrit.cloudera.org:8080/#/c/17779/4/src/kudu/client/client.cc@1020
PS4, Line 1020: range->has_table_wide_hash_schema_
> Just curious, maybe i'm missing something but is there a difference between
Yep, there is a difference: an empty hash_schema_ means no hash bucketing for 
this particular range, even though table-wide there might be some hash 
bucketing (i.e. non-zero hash dimensions for all other ranges of the table).

It seems I forgot to mention a semantically important point of this patch: now 
it's possible to create a table where a range might have no hash bucketing, 
even if table-wide there is hash bucketing.

I think I need to add a blurb about that in the commit description and add a 
test scenario for that as well (TODO).
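
To illustrate the distinction, here is a rough sketch.  The names HashDimension, 
RangeSketch and EffectiveHashSchema are hypothetical stand-ins made up for this 
illustration, not the actual types in client.cc; the only assumption is that a 
range carries a flag next to its own (possibly empty) hash schema.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one hash dimension (not a Kudu type).
struct HashDimension {
  std::vector<std::string> columns;
  int32_t num_buckets;
};

// Hypothetical stand-in for a range partition as seen by the client.
struct RangeSketch {
  // True: the range simply uses the table-wide hash schema.
  bool has_table_wide_hash_schema = true;
  // Custom hash schema; consulted only when the flag above is false.
  // Empty here means "no hash bucketing for this range".
  std::vector<HashDimension> hash_schema;
};

// Returns the hash schema that is in effect for the given range.
const std::vector<HashDimension>& EffectiveHashSchema(
    const RangeSketch& range,
    const std::vector<HashDimension>& table_wide) {
  return range.has_table_wide_hash_schema ? table_wide : range.hash_schema;
}
```

With the flag cleared and hash_schema left empty, the range ends up with no 
hash bucketing even if the table-wide schema has hash dimensions; without the 
flag, those two cases could not be told apart.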


http://gerrit.cloudera.org:8080/#/c/17779/4/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/17779/4/src/kudu/common/common.proto@353
PS4, Line 353:   // This data structure represents a range partition with a custom hash schema.
             :   message RangeWithHashSchemaPB {
             :     // Row operations containing the lower and upper range bound for the range.
             :     optional RowOperationsPB range_bounds = 1;
             :     // Hash schema for the range.
             :     repeated HashBucketSchemaPB hash_schema = 2;
             :   }
> I wonder if it makes sense to decouple the idea of ranges and tablets a bit
Tablets and ranges are already decoupled de facto -- a single range might have 
hash bucketing, so a range turns into a number of tablets (the number of 
tablets corresponding to a range is the number of buckets in the range's hash 
schema).  This PartitionSchemaPB data structure provides information on a 
so-called schema of a table: previously, the hashing rules were the same for 
all ranges, and now the requirement is to have a custom hash schema per range.
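
As a small worked example of the range-to-tablets relation: with several hash 
dimensions, the number of buckets for a range is the product of the 
per-dimension bucket counts.  The helper below is a hypothetical sketch under 
that assumption, not code from the patch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: takes the number of buckets in each hash dimension of
// a range's hash schema and returns the number of tablets backing the range.
// An empty schema (no hash bucketing) yields a single tablet.
int64_t TabletsForRange(const std::vector<int32_t>& buckets_per_dimension) {
  int64_t tablets = 1;
  for (int32_t num_buckets : buckets_per_dimension) {
    tablets *= num_buckets;
  }
  return tablets;
}
```

So a range with hash dimensions of 4 and 3 buckets maps to 12 tablets, while a 
range with no hash bucketing maps to exactly one.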

I'm planning a follow-up change that would add a registry of different hash 
schemas, and here instead of hash_schema there will be hash_schema_index 
pointing to an element in the registry.  That will help to reduce the size of 
the PB structure, so there will be less network traffic between masters and 
clients when fetching tables' schemas.  The same savings apply to storing the 
information in the system catalog.
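
A minimal sketch of the registry idea, assuming identical hash schemas get 
stored once and ranges refer to them by index.  HashSchema, HashSchemaRegistry 
and Intern are hypothetical names for illustration only; the follow-up change 
might look quite different.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation: a hash schema is a list of
// (hashed columns, number of buckets) pairs, one per hash dimension.
using HashSchema =
    std::vector<std::pair<std::vector<std::string>, int32_t>>;

// Hypothetical registry that stores each distinct hash schema once.
class HashSchemaRegistry {
 public:
  // Returns the index of 'schema', adding it if it is not registered yet.
  size_t Intern(const HashSchema& schema) {
    auto it = index_by_schema_.find(schema);
    if (it != index_by_schema_.end()) {
      return it->second;
    }
    schemas_.push_back(schema);
    const size_t index = schemas_.size() - 1;
    index_by_schema_.emplace(schema, index);
    return index;
  }

  // Returns the schema stored at 'index'; throws std::out_of_range if the
  // index is not valid.
  const HashSchema& Get(size_t index) const { return schemas_.at(index); }

  // Number of distinct schemas stored in the registry.
  size_t size() const { return schemas_.size(); }

 private:
  std::vector<HashSchema> schemas_;
  std::map<HashSchema, size_t> index_by_schema_;
};
```

With many ranges sharing a few hash schemas, each range would then carry only 
a small index instead of a repeated list of hash dimensions.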

I couldn't parse the last sentence of your comment, though.  Maybe you could 
rephrase it a bit?



--
To view, visit http://gerrit.cloudera.org:8080/17779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37aae56a33170894f30d6cd73a5698d6cbb7a697
Gerrit-Change-Number: 17779
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Comment-Date: Sat, 28 Aug 2021 22:16:36 +0000
Gerrit-HasComments: Yes
