Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17879 )
Change subject: WIP KUDU-2671: Follow up pruning patch. ...................................................................... Patch Set 5: (11 comments) First quick review. http://gerrit.cloudera.org:8080/#/c/17879/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17879/5//COMMIT_MSG@9 PS5, Line 9: This patch modifies how partition key ranges are removed in : PartitionPruner::RemovePartitionKeyRange(). The format of the : field 'range_bounds_to_partition_key_ranges_' is also modified : so the entire set will be reverse sorted as well. Could you add the rationale explaining why it makes sense doing so? Were there any issues (incorrect scan results, etc.) before this patch? http://gerrit.cloudera.org:8080/#/c/17879/5//COMMIT_MSG@16 PS5, Line 16: returned from the : meta cache are incorrect Could you clarify in the commit message what you meant by 'incorrect'? Does it break the functionality of the partition pruner somehow? The current ordering has been there from the very beginning of the Kudu client. Those entries are encoded partition boundaries for tablets (where the hash-related part comes first) and those strings are naturally ordered correspondingly. I recall we discussed that a few times: in the alternate approach which allowed to have any number of hash dimensions per range the ordering was different. However, since we decided to keep the current approach of the fixed number of hash dimensions across all the ranges in the table, the ordering is kept as is as well. http://gerrit.cloudera.org:8080/#/c/17879/5//COMMIT_MSG@25 PS5, Line 25: dummy_value IIRC, INT8_MIN has always meant 'no boundary', no? If so, then simply rename the variable accordingly. If not, you could use optional<> instead. I guess the former approach most likely is fine here. http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scan_token-test.cc File src/kudu/client/scan_token-test.cc: http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scan_token-test.cc@143 PS5, Line 143: uint32_t token_count = 0; It seems 'token_count' isn't used anywhere but in the debug output which is commented out. http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scan_token-test.cc@151 PS5, Line 151: //std::cout << "Token " << token_count << " =\n"; nit: remove this http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scan_token-test.cc@156 PS5, Line 156: //std::cout << batch.NumRows() << "\n"; ditto http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scan_token-test.cc@225 PS5, Line 225: void CheckLiveRowCount(const char* table_name, : int64_t expected_count) { Maybe, simply return the number of rows via an output parameter? The function itself can return Status. That way failure domains would be clearly separated: some failures in GetTableInfo() and LookupTablet() and the comparison between the expected number of rows. http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scanner-internal.cc File src/kudu/client/scanner-internal.cc: http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scanner-internal.cc@518 PS5, Line 518: //std::cout << "Partition key from pruner is " << partition_key.DebugString() << "\n"; nit: remove this http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/common/partition_pruner-test.cc File src/kudu/common/partition_pruner-test.cc: http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/common/partition_pruner-test.cc@1174 PS5, Line 1174: nit: not sure these extra lines add extra readability :) http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/common/partition_pruner.h File src/kudu/common/partition_pruner.h: http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/common/partition_pruner.h@109 PS5, Line 109: bool custom_hash_schemas_per_range_ = false; : bool scan_contains_predicates_ = false; Even if the names of these member fields are self-explanatory, it would be great to document when these fields become valid. http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/common/partition_pruner.cc File src/kudu/common/partition_pruner.cc: http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/common/partition_pruner.cc@429 PS5, Line 429: scan_contains_predicates_ Is it possible to rely on scan_spec.predicates() emptiness instead? If not, please add a comment explaining why. -- To view, visit http://gerrit.cloudera.org:8080/17879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a1bf5344c0ef856072d3ed102714dce5ba21060 Gerrit-Change-Number: 17879 Gerrit-PatchSet: 5 Gerrit-Owner: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Mon, 16 May 2022 19:35:53 +0000 Gerrit-HasComments: Yes
