Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17779 )
Change subject: KUDU-2671 update range partitioning with custom hash schema
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17779/4/src/kudu/client/client.cc
File src/kudu/client/client.cc:

http://gerrit.cloudera.org:8080/#/c/17779/4/src/kudu/client/client.cc@1020
PS4, Line 1020: range->has_table_wide_hash_schema_
> Just curious, maybe i'm missing something but is there a difference between
Yep, there is a difference: an empty hash_schema_ means no hash bucketing for this particular range, while table-wide there might still be hash bucketing (i.e. non-zero hash dimensions for all other ranges of the table).

It seems I forgot to mention a semantically important point of this patch: it's now possible to create a table where a particular range has no hash bucketing even if there is hash bucketing table-wide. I need to add a blurb about that to the commit description and add a test scenario for it as well (TODO).


http://gerrit.cloudera.org:8080/#/c/17779/4/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/17779/4/src/kudu/common/common.proto@353
PS4, Line 353: // This data structure represents a range partition with a custom hash schema.
             : message RangeWithHashSchemaPB {
             :   // Row operations containing the lower and upper range bound for the range.
             :   optional RowOperationsPB range_bounds = 1;
             :   // Hash schema for the range.
             :   repeated HashBucketSchemaPB hash_schema = 2;
             : }
> I wonder if it makes sense to decouple the idea of ranges and tablets a bit
Tablets and ranges are already decoupled de facto: a single range might have hash bucketing, in which case the range turns into a number of tablets (the number of tablets corresponding to a range is the number of buckets in the range's hash schema).
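
To illustrate the range-to-tablets relationship, here is a minimal sketch; it is not code from this patch. HashDimension, HashSchema, and NumTabletsForRange are hypothetical names introduced only for illustration. The sketch assumes that with several hash dimensions the tablet count is the product of the per-dimension bucket counts, and that an empty hash schema yields a single tablet for the range.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one hash dimension (cf. HashBucketSchemaPB);
// this is not the actual Kudu type.
struct HashDimension {
  std::vector<std::string> columns;  // columns hashed in this dimension
  int32_t num_buckets;               // number of buckets in this dimension
};

// A range's hash schema: zero or more hash dimensions.
using HashSchema = std::vector<HashDimension>;

// Number of tablets backing a single range. An empty schema means no hash
// bucketing for the range, so the range maps to exactly one tablet.
// Assumption: multiple dimensions multiply.
int64_t NumTabletsForRange(const HashSchema& hash_schema) {
  int64_t num_tablets = 1;
  for (const auto& dim : hash_schema) {
    assert(dim.num_buckets > 0);
    num_tablets *= dim.num_buckets;
  }
  return num_tablets;
}
```

Under these assumptions, a range whose hash schema has dimensions of 4 and 3 buckets maps to 12 tablets, while a range with an empty hash schema maps to one.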

This PartitionSchemaPB data structure describes the so-called partition schema of a table. Previously, the hashing rules were the same for every range; now the requirement is to have a custom hash schema per range.

I'm planning a follow-up change that would add a registry of different hash schemas, and here, instead of hash_schema, there will be a hash_schema_index pointing to an element in the registry. That will reduce the size of the PB structure, so there will be less network traffic between masters and clients when fetching tables' schemas. The same savings apply to storing the information in the system catalog.

I couldn't parse the last sentence of your comment, though. Maybe you could rephrase it a bit?


-- 
To view, visit http://gerrit.cloudera.org:8080/17779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37aae56a33170894f30d6cd73a5698d6cbb7a697
Gerrit-Change-Number: 17779
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Comment-Date: Sat, 28 Aug 2021 22:16:36 +0000
Gerrit-HasComments: Yes
