Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 )
Change subject: KUDU-1291: efficiently support predicates on non-prefix key components ...................................................................... Patch Set 20: (10 comments) http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.h@337 PS20, Line 337: store the value obtained : // after seeking by validx_iter_ store the value for later use where? http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.h@338 PS20, Line 338: this value this is ambiguous -- "this value" meaning "the value obtained" or the value of 'cache_seeked_value' http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.h@274 PS20, Line 274: // Details would be great to have another section here with an example like a year/month/day with a predicate on month or day http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@430 PS20, Line 430: spec->lower_bound_key()->encoded_key().compare(base_data_->min_encoded_key_) > 0) { we should make sure that if there is a lower bound key that the skip scan knows to start at that point instead of before it http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@501 PS20, Line 501: min_col_id this should be an index, right? same below in a few places http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@506 PS20, Line 506: predicates this predicate list is post-optimization, right? In that case, if the user has specified a predicate on a leading prefix, then that prefix would have been converted into a range scan at an upper level and removed from the predicate list before we get here. http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@514 PS20, Line 514: or || http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@518 PS20, Line 518: LOG(WARNING) << col_name << " column is not found in schema"; DFATAL? should we ever expect to get this? http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@535 PS20, Line 535: use_skip_scan_ = true; do we have any way to expose this back to the scan in terms of a metric? how many cfile blocks skip scan was attempted, and how many times it was auto-disabled, how many times it skipped (how many rows it skipped), etc http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@863 PS20, Line 863: if (use_skip_scan_) { if we are using skip scan, should we also be able to _not_ scan the column that has the range predicate? it seems that would be a key advantage of this approach. For example if your PK is (entity, timestamp) and you set a range on timestamp, you'd want to be able to avoid reading the timestamp column at all because the skipping using the PK index already tells you which rows match the rpedicate. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 20 Gerrit-Owner: Anupama Gupta <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Anupama Gupta <[email protected]> Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Tue, 28 Aug 2018 22:45:08 +0000 Gerrit-HasComments: Yes
