Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17879 )

Change subject: WIP KUDU-2671: Follow up pruning patch.
......................................................................


Patch Set 5:

(11 comments)

First quick review.

http://gerrit.cloudera.org:8080/#/c/17879/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17879/5//COMMIT_MSG@9
PS5, Line 9: This patch modifies how partition key ranges are removed in
           : PartitionPruner::RemovePartitionKeyRange(). The format of the
           : field 'range_bounds_to_partition_key_ranges_' is also modified
           : so the entire set will be reverse sorted as well.
Could you add the rationale explaining why it makes sense doing so?  Were there 
any issues (incorrect scan results, etc.) before this patch?


http://gerrit.cloudera.org:8080/#/c/17879/5//COMMIT_MSG@16
PS5, Line 16: returned from the
            : meta cache are incorrect
Could you clarify in the commit message what you meant by 'incorrect'?  Does it 
break the functionality of the partition pruner somehow?


The current ordering has been there from the very beginning of the Kudu client. 
 Those entries are encoded partition boundaries for tablets (where the 
hash-related part comes first) and those strings are naturally ordered 
correspondingly.

I recall we discussed that a few times: in the alternate approach which allowed 
to have any number of hash dimensions per range the ordering was different.  
However, since we decided to keep the current approach of the fixed number of 
hash dimensions across all the ranges in the table, the ordering is kept as is 
as well.


http://gerrit.cloudera.org:8080/#/c/17879/5//COMMIT_MSG@25
PS5, Line 25: dummy_value
IIRC, INT8_MIN has always meant 'no boundary', no?  If so, then simply rename 
the variable accordingly.  If not, you could use optional<> instead.  I guess 
the former approach most likely is fine here.


http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scan_token-test.cc
File src/kudu/client/scan_token-test.cc:

http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scan_token-test.cc@143
PS5, Line 143:     uint32_t token_count = 0;
It seems 'token_count' isn't used anywhere but in the debug output which is 
commented out.


http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scan_token-test.cc@151
PS5, Line 151:       //std::cout << "Token " << token_count << " =\n";
nit: remove this


http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scan_token-test.cc@156
PS5, Line 156:         //std::cout << batch.NumRows() << "\n";
ditto


http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scan_token-test.cc@225
PS5, Line 225:   void CheckLiveRowCount(const char* table_name,
             :                          int64_t expected_count) {
Maybe, simply return the number of rows via an output parameter?  The function 
itself can return Status.

That way failure domains would be clearly separated: some failures in 
GetTableInfo() and LookupTablet() and the comparison between the expected 
number of rows.


http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/client/scanner-internal.cc@518
PS5, Line 518:     //std::cout << "Partition key from pruner is " << 
partition_key.DebugString() << "\n";
nit: remove this


http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/common/partition_pruner-test.cc@1174
PS5, Line 1174:
nit: not sure these extra lines add extra readability :)


http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/common/partition_pruner.h
File src/kudu/common/partition_pruner.h:

http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/common/partition_pruner.h@109
PS5, Line 109:   bool custom_hash_schemas_per_range_ = false;
             :   bool scan_contains_predicates_ = false;
Even if the names of these member fields are self-explanatory, it would be 
great to document when these fields become valid.


http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

http://gerrit.cloudera.org:8080/#/c/17879/5/src/kudu/common/partition_pruner.cc@429
PS5, Line 429: scan_contains_predicates_
Is it possible to rely on scan_spec.predicates() emptiness instead?  If not, 
please add a comment explaining why.



--
To view, visit http://gerrit.cloudera.org:8080/17879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a1bf5344c0ef856072d3ed102714dce5ba21060
Gerrit-Change-Number: 17879
Gerrit-PatchSet: 5
Gerrit-Owner: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 16 May 2022 19:35:53 +0000
Gerrit-HasComments: Yes

Reply via email to