Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16859 )
Change subject: [master] Range specific hashing at table creation time. ...................................................................... Patch Set 3: (10 comments) Overall looks good. Mostly nits and some test suggestions. http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/common/partition.cc File src/kudu/common/partition.cc: http://gerrit.cloudera.org:8080/#/c/16859/2/src/kudu/common/partition.cc@418 PS2, Line 418: if (!split_rows.empty()) { : for (const auto& hash_schemas : range_hash_schemas) { : if (!hash_schemas.empty()) { : return Status::InvalidArgument("Both 'split_rows' and 'range_hash_schemas' cannot be " : "populated at the same time."); : } : } > I meant to say if the user was using the new client API, the size of 'range Ack http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/common/partition.cc File src/kudu/common/partition.cc: http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/common/partition.cc@417 PS3, Line 417: : if (!split_rows.empty()) { : for (const auto& hash_schemas : range_hash_schemas) { : if (!hash_schemas.empty()) { : return Status::InvalidArgument("Both 'split_rows' and 'range_hash_schemas' cannot be " : "populated at the same time."); : } : } : } : : if (!range_hash_schemas.empty() && range_bounds.size() != range_hash_schemas.size()) { : return Status::InvalidArgument("The number of range bounds does not match the number of user" : "supplied hash schemas."); : } Do we have a partition-test case in which split rows is non-empty and range_hash_schemas is also non-empty but contains empty schemas? If not, can you add one? http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/integration-tests/table_locations-itest.cc File src/kudu/integration-tests/table_locations-itest.cc: http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/integration-tests/table_locations-itest.cc@216 PS3, Line 216: auto* partition_schema_pb = req.mutable_partition_schema(); Just FYI, this renders req.has_partition_schema() = true See https://developers.google.com/protocol-buffers/docs/reference/cpp-generated and search for "mutable_" http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/integration-tests/table_locations-itest.cc@459 PS3, Line 459: w} nit: add a space http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/integration-tests/table_locations-itest.cc@473 PS3, Line 473: user nit: these are all "user" schemas. "per-range" would be more accurate. http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/integration-tests/table_locations-itest.cc@473 PS3, Line 473: table wide nit: capitalize "Table-wide" http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/integration-tests/table_locations-itest.cc@491 PS3, Line 491: // Tablets returned in sorted order by partition key range. : EXPECT_EQ(string("\0\0\0\0" "\0\0\0\0" "a" "\0\0", 11), : resp.tablet_locations(0).partition().partition_key_start()); : EXPECT_EQ(string("\0\0\0\0" "\0\0\0\1" "a" "\0\0", 11), : resp.tablet_locations(1).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\0" "c" "\0\0", 7), : resp.tablet_locations(2).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\0" "e" "\0\0", 7), : resp.tablet_locations(3).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\1" "\0\0\0\0" "a" "\0\0", 11), : resp.tablet_locations(4).partition().partition_key_start()); : EXPECT_EQ(string("\0\0\0\1" "\0\0\0\1" "a" "\0\0", 11), : resp.tablet_locations(5).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\1" "c" "\0\0", 7), : resp.tablet_locations(6).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\1" "e" "\0\0", 7), : resp.tablet_locations(7).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\2" "\0\0\0\0" "a" "\0\0", 11), : resp.tablet_locations(8).partition().partition_key_start()); : EXPECT_EQ(string("\0\0\0\2" "\0\0\0\1" "a" "\0\0", 11), : resp.tablet_locations(9).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\2" "c" "\0\0", 7), : resp.tablet_locations(10).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\2" "e" "\0\0", 7), : resp.tablet_locations(11).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\3" "\0\0\0\0" "a" "\0\0", 11), : resp.tablet_locations(12).partition().partition_key_start()); : EXPECT_EQ(string("\0\0\0\3" "\0\0\0\1" "a" "\0\0", 11), : resp.tablet_locations(13).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\3" "c" "\0\0", 7), : resp.tablet_locations(14).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\3" "e" "\0\0", 7), : resp.tablet_locations(15).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\4" "c" "\0\0", 7), : resp.tablet_locations(16).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\4" "e" "\0\0", 7), : resp.tablet_locations(17).partition().partition_key_start()); : : EXPECT_EQ(string("\0\0\0\5" "c" "\0\0", 7), : resp.tablet_locations(18).partition().partition_key_start()); nit: it would be easier to read and extend as: set<string> key_starts({ string(...partitions for a), ... string(...partitions for c), ... string(...partitions for e), ... }); vector<string> sorted_key_starts(key_starts.begin(), key_starts.end()); ASSERT_EQ(key_starts.size(), resp.tablet_locations_size()); for (int i = 0; i < sorted_key_starts.size(); i++) { EXPECT_EQ(sorted_key_starts[i], resp.tablet_locations(i).partition().partition_key_start()); } That way it's really obvious that we have the right number of hashes per range, and it's also clear what "correct" ordering is. http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/master/master.proto@507 PS3, Line 507: per user's request nit: remove? http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/master/master.proto@507 PS3, Line 507: per nit: for http://gerrit.cloudera.org:8080/#/c/16859/3/src/kudu/master/master.proto@508 PS3, Line 508: Only populated when user creates range bounds and not split rows, populated in the same order. nit: we can be more specific and refer to other fields. Also mind filling this out with proper grammar? How about: "Only populated when 'split_rows_range_bounds' specifies range bounds, and must be empty if any split rows are specified. If set, must have the same number of elements as there are range bounds, and must be in the same order. If empty, 'partition_schema' is assumed for every range bound." -- To view, visit http://gerrit.cloudera.org:8080/16859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8f0dcbc3324f8f2d6e99b4d169fdf5c7f7dff95d Gerrit-Change-Number: 16859 Gerrit-PatchSet: 3 Gerrit-Owner: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Comment-Date: Tue, 15 Dec 2020 08:42:39 +0000 Gerrit-HasComments: Yes
