Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16859 )

Change subject: [master] Range specific hashing at table creation time.
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/integration-tests/table_locations-itest.cc
File src/kudu/integration-tests/table_locations-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/integration-tests/table_locations-itest.cc@442
PS2, Line 442: TEST_F(TableLocationsTest, TestRangeSpecificHashing) {
We chatted about this over a call, but for posterity, it'd be nice to test the 
table-wide default case as well.


http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/catalog_manager.cc@1714
PS2, Line 1714:         PartitionSchema::HashBucketSchemas hash_bucket_schemas;
              :         if (j < req.range_hash_schemas_size()) {
              :           const auto& hash_schemas_pb = 
req.range_hash_schemas(j++);
              :           if (!hash_schemas_pb.hash_schemas().empty()) {
              :             RETURN_NOT_OK(
              :                 
PartitionSchema::ExtractHashBucketSchemasFromPB(schema,
              :                                                                 
hash_schemas_pb.hash_schemas(),
              :                                                                 
&hash_bucket_schemas));
              :           }
              :         }
IMO the intermixed parsing of 'range_hash_schemas' and 'ops' makes this pretty 
confusing, since it opens the door to questions like: did we actually get all 
of the requested 'range_hash_schemas'? What if 'ops.size()' and 
'range_hash_schemas_size()' don't match?

I think it would be less confusing and still correct to do:

for (int i = 0; i < ops.size(); i++) {
  switch (op.type) {
    ... don't change the existing parsing
  }
}
// Pull out all of the requested hash bucket schemas.
for (int i = 0; i < req.range_hash_schema_size(); i++) {
  HashBucketSchemas hash_bucket_schemas;
  RETURN_NOT_OK(Extract(&hash_bucket_schemas)); // hash_bucket_schemas may be 
empty.
  range_hash_schema.emplace_back(std::move(hash_bucket_schemas));
}
if (!range_hash_schema.empty()) {
  if (!split_rows.empty()) {
    return an error;
  }
  if (range_bounds.size() != range_hash_schema.size()) {
    return an error;
  }
}
CreatePartitions(...)


http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/master.proto@504
PS2, Line 504:   optional PartitionSchemaPB partition_schema = 7;
nit: can you also comment how this is used if 'range_hash_schemas' is specified?


http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/master.proto@505
PS2, Line 505:   repeated PerRangeHashBucketSchemasPB range_hash_schemas = 12;
nit: add docs to how this should be populated, its ordering constraints with 
'split_rows_rand_bounds', invariants w.r.t split_rows, etc.



--
To view, visit http://gerrit.cloudera.org:8080/16859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f0dcbc3324f8f2d6e99b4d169fdf5c7f7dff95d
Gerrit-Change-Number: 16859
Gerrit-PatchSet: 2
Gerrit-Owner: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Comment-Date: Sat, 12 Dec 2020 00:43:46 +0000
Gerrit-HasComments: Yes

Reply via email to