Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 )
Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components ...................................................................... Patch Set 9: (8 comments) http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@260 PS9, Line 260: cur_idx_ >= skip_scan_upper_bound_idx_. this is no longer true (checked inside the method) http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@264 PS9, Line 264: // 1. key_iter_->cur_val_ stores the first entry containing both the next unique : // prefix key and the predicate value, remaining to be scanned in the current batch. Is this an implementation detail or part of the interface of this method? If it's the former then we should remove it from the header file. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@266 PS9, Line 266: // 2. skip_scan_lower_bound_idx will store the row id of the entry found in 1. : // 3. skip_scan_upper_bound_idx is the row id which is an exclusive upper bound : // on our current scan range. These variables are local to this method, so I don't see why a caller would care about this. If we want to document this, we should document it in the implementation file. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@269 PS9, Line 269: // 4. cur_idx_ is either same as skip_scan_lower_bound_idx or remains unchanged (this occurs : // when cur_idx_ lies somewhere in between the current scan range). This seems to be one of the key postconditions from my perspective, since it affects class-level state that is accessed by other methods in this class. Can this be phrased in a way that doesn't reference local implementation variables but rather provides an explanation about what this does? http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@271 PS9, Line 271: remaining nit: 'remaining' http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@639 PS9, Line 639: // 1. Place the validx_iter_ at the entry containing the next : // unique "prefix_key" value. : // 2. Place the iterator at or after the entry formed by the "prefix_key" : // found in 1. and the predicate value. : // 3. If the seek in 2. results in exact match with the predicate value, : // store the row id that represents the last relevant entry (upper bound) wrt the : // current "prefix_key"(found in 1.) How about using something a bit higher level to describe this algorithm: 1. Search within the ad-hoc index for the next unique prefix (set of keys prior to the first predicate column). Searching is done using validx_iter_. 2. Read that unique prefix from the ad-hoc index, append the lower bound of the first predicate column, and seek to that. If this matches, this is the lower bound of our desired scan. 3. If we found our desired lower bound, find an upper bound for the scan by searching for the next row key matching one value higher than the highest value that will match our first predicate. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@671 PS9, Line 671: skip_scan_searched_cur_prefix_ The values need to be inverted due to the variable name change; it doesn't make sense to seek to the next prefix key if we didn't search the current prefix. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@820 PS9, Line 820: nit: unexpected indentation -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 9 Gerrit-Owner: Anupama Gupta <anupama.gu...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Anupama Gupta <anupama.gu...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 06 Aug 2018 21:04:26 +0000 Gerrit-HasComments: Yes