Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 )
Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components ...................................................................... Patch Set 5: (10 comments) http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.h@217 PS5, Line 217: // This is a three seek approach for index skip scan implementation: : // 1. Place the validx_iter_ at the entry containing the next : // unique "prefix_key" value. : // 2. Place the iterator at or after the entry formed by the "prefix_key" : // found in 1. and the predicate value. : // 3. If the seek in 2. results in exact match with the predicate value, : // store the row id that represents the last relevant entry (upper bound) wrt the : // current "prefix_key"(found in 1.) I think this comment should be moved to the .cc file. Instead of this comment, how about something describing the API we are providing here, perhaps something along these lines: // This method implements a "skip-scan" optimization, allowing a scan to use // the primary key index to efficiently seek to matching rows where there are // predicates on compound key columns that do not necessarily include the // leading column of the primary key. At the time of writing, only a single // equality predicate is supported, although the algorithm can support ranges // of values. // // This method should be invoked during the PrepareBatch() phase of the row // iterator lifecycle. // // The in-out parameter 'remaining' refers to the number of rows remaining to // scan. When this method is invoked, 'remaining' should contain the maximum // number of remaining rows available to scan. Once this method returns, // 'remaining' will contain the number of rows to scan to consume the // available matching rows according to the equality predicate. Note: // 'remaining' will always be at least 1, although it is a TODO to allow it // to be 0 (0 violates CHECK conditions elsewhere in the scan code). // Preconditions upon entering this method: // - ... // // Postconditions upon entering this method: // - The iterator will be reset... (add details about what member variables might be changed) // // See the .cc file for details on the approach and the implementation. http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@565 PS5, Line 565: Get s/Get/Return/ http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@568 PS5, Line 568: LOG(INFO) << " encoded key in build new encoded key = " << new_enc_key->Stringify(base_data_->tablet_schema()); remove or VLOG(2) http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@581 PS5, Line 581: column_value_1 nit: use more descriptive variable names here http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@588 PS5, Line 588: UseSkipScan how about rename this to: SkipToNextScan() http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@602 PS5, Line 602: // TODO: If we loop > 100 times, break out of the loop. I'm not comfortable merging this patch without this feature http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@662 PS5, Line 662: exclusive upper bound http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@668 PS5, Line 668: // We hit the end of the file, so we will simply scan to the end of the : // file. what guarantees the skip_scan_upper_bound_idx_ has been set to upper_bound_idx_? Should we explicitly set that? http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@675 PS5, Line 675: We wording nit: Check to see whether we have seeked backwards. If so, we need to keep looking... http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@684 PS5, Line 684: } while (!predicate_match); can we convert this to a while-loop instead of a do-while-loop, now that the loop condition is so simple? We can initialize the loop with bool predicate_match = false -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 5 Gerrit-Owner: Anupama Gupta <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Anupama Gupta <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 02 Aug 2018 18:30:00 +0000 Gerrit-HasComments: Yes
