Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16859 )
Change subject: [master] Range specific hashing at table creation time. ...................................................................... Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/integration-tests/table_locations-itest.cc File src/kudu/integration-tests/table_locations-itest.cc: http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/integration-tests/table_locations-itest.cc@442 PS2, Line 442: TEST_F(TableLocationsTest, TestRangeSpecificHashing) { We chatted about this over a call, but for posterity, it'd be nice to test the table-wide default case as well. http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/catalog_manager.cc@1714 PS2, Line 1714: PartitionSchema::HashBucketSchemas hash_bucket_schemas; : if (j < req.range_hash_schemas_size()) { : const auto& hash_schemas_pb = req.range_hash_schemas(j++); : if (!hash_schemas_pb.hash_schemas().empty()) { : RETURN_NOT_OK( : PartitionSchema::ExtractHashBucketSchemasFromPB(schema, : hash_schemas_pb.hash_schemas(), : &hash_bucket_schemas)); : } : } IMO the intermixed parsing of 'range_hash_schemas' and 'ops' makes this pretty confusing, since it opens the door to questions like: did we actually get all of the requested 'range_hash_schemas'? What if 'ops.size()' and 'range_hash_schemas_size()' don't match? I think it would be less confusing and still correct to do: for (int i = 0; i < ops.size(); i++) { switch (op.type) { ... don't change the existing parsing } } // Pull out all of the requested hash bucket schemas. for (int i = 0; i < req.range_hash_schema_size(); i++) { HashBucketSchemas hash_bucket_schemas; RETURN_NOT_OK(Extract(&hash_bucket_schemas)); // hash_bucket_schemas may be empty. range_hash_schema.emplace_back(std::move(hash_bucket_schemas)); } if (!range_hash_schema.empty()) { if (!split_rows.empty()) { return an error; } if (range_bounds.size() != range_hash_schema.size()) { return an error; } } CreatePartitions(...) http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/master.proto@504 PS2, Line 504: optional PartitionSchemaPB partition_schema = 7; nit: can you also comment how this is used if 'range_hash_schemas' is specified? http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/master.proto@505 PS2, Line 505: repeated PerRangeHashBucketSchemasPB range_hash_schemas = 12; nit: add docs to how this should be populated, its ordering constraints with 'split_rows_rand_bounds', invariants w.r.t split_rows, etc. -- To view, visit http://gerrit.cloudera.org:8080/16859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8f0dcbc3324f8f2d6e99b4d169fdf5c7f7dff95d Gerrit-Change-Number: 16859 Gerrit-PatchSet: 2 Gerrit-Owner: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Comment-Date: Sat, 12 Dec 2020 00:43:46 +0000 Gerrit-HasComments: Yes
