Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
......................................................................


Patch Set 20:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.h@337
PS20, Line 337: store the value obtained
              :   // after seeking by validx_iter_
store the value for later use where?


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.h@338
PS20, Line 338: this value
this is ambiguous -- "this value" meaning "the value obtained" or the value of 
'cache_seeked_value'


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.h@274
PS20, Line 274:   // Details
would be great to have another section here with an example like a 
year/month/day with a predicate on month or day


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@430
PS20, Line 430:       
spec->lower_bound_key()->encoded_key().compare(base_data_->min_encoded_key_) > 
0) {
we should make sure that if there is a lower bound key that the skip scan knows 
to start at that point instead of before it


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@501
PS20, Line 501: min_col_id
this should be an index, right? same below in a few places


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@506
PS20, Line 506: predicates
this predicate list is post-optimization, right? In that case, if the user has 
specified a predicate on a leading prefix, then that prefix would have been 
converted into a range scan at an upper level and removed from the predicate 
list before we get here.


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@514
PS20, Line 514: or
||


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@518
PS20, Line 518:         LOG(WARNING) << col_name << " column is not found in 
schema";
DFATAL? should we ever expect to get this?


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@535
PS20, Line 535:     use_skip_scan_ = true;
do we have any way to expose this back to the scan in terms of a metric? how 
many cfile blocks skip scan was attempted, and how many times it was 
auto-disabled, how many times it skipped (how many rows it skipped), etc


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@863
PS20, Line 863:   if (use_skip_scan_) {
if we are using skip scan, should we also be able to _not_ scan the column that 
has the range predicate? it seems that would be a key advantage of this 
approach. For example if your PK is (entity, timestamp) and you set a range on 
timestamp, you'd want to be able to avoid reading the timestamp column at all 
because the skipping using the PK index already tells you which rows match the 
rpedicate.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 20
Gerrit-Owner: Anupama Gupta <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Anupama Gupta <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Tue, 28 Aug 2018 22:45:08 +0000
Gerrit-HasComments: Yes

Reply via email to