Mahesh Reddy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16859 )
Change subject: [master] Range specific hashing at table creation time. ...................................................................... Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/16859/1/src/kudu/common/partition.cc File src/kudu/common/partition.cc: http://gerrit.cloudera.org:8080/#/c/16859/1/src/kudu/common/partition.cc@451 PS1, Line 451: // The number of ranges before being split should match the size of range_hash_schemas. > I guess I need to understand what are split rows to better understand what' added a comment in the second patch, lmk if you think it needs further clarification. http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/common/partition.cc File src/kudu/common/partition.cc: http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/common/partition.cc@418 PS2, Line 418: if (!split_rows.empty()) { : for (const auto& hash_schemas : range_hash_schemas) { : if (!hash_schemas.empty()) { : return Status::InvalidArgument("Both 'split_rows' and 'range_hash_schemas' cannot be " : "populated at the same time."); : } : } > Should we be checking here that ranger_hash_schemas is empty if split_rows Ultimately, if 'range_hash_schema' is a vector filled with empty vectors it essentially serves the same purpose as 'range_hash_schema' being empty. At that point, the table wide hash schema will be applied to each new range. 'range_hash_schema' doesn't add any new functionality unless one of its vectors is defined and that is where the incompatibility with 'split_rows' comes in. Now, the reason why I need 'range_hash_schema' to remain the same way is due to the ordering of the bounds and their schemas being passed in. If I omit an entry in 'range_hash_schema' due to its respective bound not having a user defined schema while other bounds have their respective schemas defined, then I won't have a way to match the schemas to the bound. A way around this is to modify the current 'range_bounds' parameter to 'vector<tuple<KuduPartialRow, KuduPartialRow,vector<HashBucketSchema>>>'. This way, the bounds are matched to their respective hash schemas. If you think this is more clear (or using a struct instead of a tuple), I can consider making that change. http://gerrit.cloudera.org:8080/#/c/16859/1/src/kudu/integration-tests/table_locations-itest.cc File src/kudu/integration-tests/table_locations-itest.cc: http://gerrit.cloudera.org:8080/#/c/16859/1/src/kudu/integration-tests/table_locations-itest.cc@474 PS1, Line 474: EQ(14, > nit: can just use ASSERT_FALSE Done http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/integration-tests/table_locations-itest.cc File src/kudu/integration-tests/table_locations-itest.cc: http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/integration-tests/table_locations-itest.cc@442 PS2, Line 442: TEST_F(TableLocationsTest, TestRangeSpecificHashing) { > We chatted about this over a call, but for posterity, it'd be nice to test will add this in next iteration. http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/catalog_manager.cc@1714 PS2, Line 1714: PartitionSchema::HashBucketSchemas hash_bucket_schemas; : if (j < req.range_hash_schemas_size()) { : const auto& hash_schemas_pb = req.range_hash_schemas(j++); : if (!hash_schemas_pb.hash_schemas().empty()) { : RETURN_NOT_OK( : PartitionSchema::ExtractHashBucketSchemasFromPB(schema, : hash_schemas_pb.hash_schemas(), : &hash_bucket_schemas)); : } : } > IMO the intermixed parsing of 'range_hash_schemas' and 'ops' makes this pre good catch, will move out 'range_hash_schema' logic out of for loop for 'ops'. http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/master.proto@504 PS2, Line 504: optional PartitionSchemaPB partition_schema = 7; > nit: can you also comment how this is used if 'range_hash_schemas' is speci Done http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/master/master.proto@505 PS2, Line 505: repeated PerRangeHashBucketSchemasPB range_hash_schemas = 12; > nit: add docs to how this should be populated, its ordering constraints wit Done -- To view, visit http://gerrit.cloudera.org:8080/16859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8f0dcbc3324f8f2d6e99b4d169fdf5c7f7dff95d Gerrit-Change-Number: 16859 Gerrit-PatchSet: 2 Gerrit-Owner: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Comment-Date: Tue, 15 Dec 2020 00:11:54 +0000 Gerrit-HasComments: Yes
