[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Mike Percy has uploaded a new patch set (#22) to the change originally created by Anupama Gupta. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. KUDU-1291: efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, until all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,333 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/22 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 22 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Mike Percy has uploaded a new patch set (#21) to the change originally created by Anupama Gupta. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. KUDU-1291: efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, until all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,331 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/21 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 21 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. Patch Set 20: (10 comments) http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.h@337 PS20, Line 337: store the value obtained : // after seeking by validx_iter_ store the value for later use where? http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.h@338 PS20, Line 338: this value this is ambiguous -- "this value" meaning "the value obtained" or the value of 'cache_seeked_value' http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.h@274 PS20, Line 274: // Details would be great to have another section here with an example like a year/month/day with a predicate on month or day http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@430 PS20, Line 430: spec->lower_bound_key()->encoded_key().compare(base_data_->min_encoded_key_) > 0) { we should make sure that if there is a lower bound key that the skip scan knows to start at that point instead of before it http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@501 PS20, Line 501: min_col_id this should be an index, right? same below in a few places http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@506 PS20, Line 506: predicates this predicate list is post-optimization, right? In that case, if the user has specified a predicate on a leading prefix, then that prefix would have been converted into a range scan at an upper level and removed from the predicate list before we get here. http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@514 PS20, Line 514: or || http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@518 PS20, Line 518: LOG(WARNING) << col_name << " column is not found in schema"; DFATAL? should we ever expect to get this? http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@535 PS20, Line 535: use_skip_scan_ = true; do we have any way to expose this back to the scan in terms of a metric? how many cfile blocks skip scan was attempted, and how many times it was auto-disabled, how many times it skipped (how many rows it skipped), etc http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@863 PS20, Line 863: if (use_skip_scan_) { if we are using skip scan, should we also be able to _not_ scan the column that has the range predicate? it seems that would be a key advantage of this approach. For example if your PK is (entity, timestamp) and you set a range on timestamp, you'd want to be able to avoid reading the timestamp column at all because the skipping using the PK index already tells you which rows match the rpedicate. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 20 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 28 Aug 2018 22:45:08 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. Patch Set 20: (1 comment) http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.cc@777 PS20, Line 777: if (PREDICT_TRUE(validx_iter_ != nullptr)) { : RETURN_NOT_OK(StoreCurrentValue()); : } > To make sure this patch has no side effects when disabled, this should prob Yep, I think that's a good catch. If we want to have a better assurance that the feature is truly isolated, it's better to gate it on something specific for 'skip index scan'. However, I don't think we have that gate readily available in current implementation. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 20 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 20 Aug 2018 21:38:06 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. Patch Set 20: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.cc@777 PS20, Line 777: if (PREDICT_TRUE(validx_iter_ != nullptr)) { : RETURN_NOT_OK(StoreCurrentValue()); : } To make sure this patch has no side effects when disabled, this should probably be gated by some `cache_seeked_value`. AFAICT, besides the skip scan logic, this method is only used in tool_action_fs.cc anyway. WDYT? -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 20 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 20 Aug 2018 18:24:35 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Alexey Serbin has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. Removed reviewer Kudu Jenkins with the following votes: * Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 20 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. Patch Set 20: (5 comments) http://gerrit.cloudera.org:8080/#/c/10983/19//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10983/19//COMMIT_MSG@19 PS19, Line 19: until > nit: only one 'l' Done http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/cfile/cfile_reader.cc@794 PS19, Line 794: 8192 > nit: should probably be using FLAGS_max_encoded_key_size_bytes here FLAGS_max_encoded_key_size_bytes is not available here and I'm not sure I want to move the definition of that flag in here or re-shuffle libraries only to remove it in the follow-up patch: as we discussed offline, it's better to use arena passed as a for these methods from upper level (i.e. from cfile_set). http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/cfile_set.h@303 PS19, Line 303: & cur_enc_key, : KuduPartialRow* p_row, > nit: reverse sigil-space order Done http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/cfile_set.h@354 PS19, Line 354: il > nit: here too Done http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/index_skipscan-test.cc@168 PS19, Line 168: : Status s = writer.Insert(row); : // As keys are inserted randomly, retry row insertion in case : // the current row insertion failed due to duplicate value. : if (s.IsAlreadyPresent()) { : continue; : } else { : CHECK_OK(s); > Does it matter what these values are? It seems like the predicate column wi No, it's not that important. Yes, there will be only three different values and that's what was desired, yep. The random approach was exercised by the other generator, but if you like randomness here as well, I'll add that. The only important point here is to have not less than num_rows/3 values matching the predicate. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 20 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 20 Aug 2018 17:50:01 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. Patch Set 20: Verified+1 Unrelated failure (due to concurrent builds?): 08:44:42 CMakeFiles/file_cache-stress-test.dir/file_cache-stress-test.cc.o: file not recognized: File truncated 08:44:42 collect2: error: ld returned 1 exit status 08:44:42 make[2]: *** [bin/file_cache-stress-test] Error 1 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 20 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 20 Aug 2018 17:50:24 + Gerrit-HasComments: No
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Alexey Serbin has uploaded a new patch set (#20) to the change originally created by Anupama Gupta. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. KUDU-1291: efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, until all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,327 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/20 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 20 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. Patch Set 19: (1 comment) http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/index_skipscan-test.cc@104 PS16, Line 104: : IndexSkipScanTest() : : KuduTabletTest(CreateSchema(std::get<0>(GetParam( { : const auto& param = GetParam(); : schema_type = std::get<0>(param); : FLAGS_enable_skip_scan = std::get<1>(param); : } : : void SetUp() override { : KuduTabletTest::SetUp(); : ASSERT_OK(tablet()->metadata()->CreateRowSet(_meta_)); : FillTestTablet(); : } : : // Generates and inserts given number of rows using the given PRNG, : // returns # rows generated that match the predicate_val on predicate_col. : int GenerateData(Random random, int num_rows, int predicate_col_id, int64_t predicate_value) { : : LocalTabletWriter writer(tablet().get(), _schema_); : KuduPartialRow row(_schema_); : : size_t num_key_cols = client_schema_.num_key_columns(); : int num_matching = 0; : : while (num_rows > 0) { : bool predicate_value_matched = false; : for (int col_idx = 0; col_idx < num_key_cols; col_idx++) { : int64_t value = random.Uniform(1000); : CHECK_OK(row.SetInt64(col_idx, value)); : if (col_idx == predicate_col_id && value == predicate_value) { : predicate_value_matched = true; : } : } : > This is no longer used Bah, yes it is. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 19 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 20 Aug 2018 00:25:17 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. Patch Set 19: (8 comments) http://gerrit.cloudera.org:8080/#/c/10983/19//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10983/19//COMMIT_MSG@19 PS19, Line 19: untill nit: only one 'l' http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/cfile/cfile_reader.cc@794 PS19, Line 794: 8192 nit: should probably be using FLAGS_max_encoded_key_size_bytes here http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/cfile_set.h@303 PS19, Line 303: _enc_key, : KuduPartialRow *p_row, gscoped_ptr * nit: reverse sigil-space order http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/cfile_set.h@354 PS19, Line 354: * nit: here too http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/index_skipscan-test.cc@104 PS16, Line 104: : IndexSkipScanTest() : : KuduTabletTest(CreateSchema(std::get<0>(GetParam( { : const auto& param = GetParam(); : schema_type = std::get<0>(param); : FLAGS_enable_skip_scan = std::get<1>(param); : } : : void SetUp() override { : KuduTabletTest::SetUp(); : ASSERT_OK(tablet()->metadata()->CreateRowSet(_meta_)); : FillTestTablet(); : } : : // Generates and inserts given number of rows using the given PRNG, : // returns # rows generated that match the predicate_val on predicate_col. : int GenerateData(Random random, int num_rows, int predicate_col_id, int64_t predicate_value) { : : LocalTabletWriter writer(tablet().get(), _schema_); : KuduPartialRow row(_schema_); : : size_t num_key_cols = client_schema_.num_key_columns(); : int num_matching = 0; : : while (num_rows > 0) { : bool predicate_value_matched = false; : for (int col_idx = 0; col_idx < num_key_cols; col_idx++) { : int64_t value = random.Uniform(1000); : CHECK_OK(row.SetInt64(col_idx, value)); : if (col_idx == predicate_col_id && value == predicate_value) { : predicate_value_matched = true; : } : } : This is no longer used http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc@589 PS17, Line 589: int num_matching = GenerateData(random, num_rows, predicate_col_id, predicate_val); : > Right, the original idea was to generate almost the same order of matching Seems like GenerateDataEx() accomplishes the "every Nth row" case with N fixed at 3, but not the case where all or almost all rows match. Is that still a goal? http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/index_skipscan-test.cc@153 PS19, Line 153: int GenerateDataEx nit: rename this to GenerateData and add docs http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/index_skipscan-test.cc@168 PS19, Line 168: case 1: : CHECK_OK(row.SetInt64(col_idx, predicate_value / 2 + 1)); : match = false; : break; : case 2: : CHECK_OK(row.SetInt64(col_idx, predicate_value / 2 - 1)); : match = false; : break; Does it matter what these values are? It seems like the predicate column will only have three values in it. Maybe instead we can instead inject random data (and check them against the predicate) instead? -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 19
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Alexey Serbin has uploaded a new patch set (#19) to the change originally created by Anupama Gupta. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. KUDU-1291: efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, untill all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,344 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/19 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 19 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. Patch Set 18: (15 comments) http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG@19 PS16, Line 19: unti > nit: until Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@238 PS17, Line 238: > nit: prefer using // instead of /**/ Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@244 PS17, Line 244: k to the appropriate range of rows and > nit: how about "leftmost, non-leading primary key column with a predicate" Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@245 PS17, Line 245: . The only exception is when a scan is pr > nit: how about something like "Value that the predicate column is predicate Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@246 PS17, Line 246: columns of a composite N-column primary key, where K < N, > nit: how about "The collective columns to the left of the predicate column" Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@247 PS17, Line 247: r al > nit: extra space, also maybe use : to separate these? I switched to new lines for the 'definition' parts. http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@270 PS17, Line 270: : // "prefix key" : // Any value i > nit: now that we have some definitions above, we don't need to redefine the Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@276 PS17, Line 276: n order to keep track of the number of rows that satisfy the : // "predicate value" wrt a distinct prefix key we first seek to the first : // such row in the index column and this corresponding value is called : // "lower bound" key. Next, we seek to the row that is right after the last : // row that satisfies the "predicate value" wrt to the distinct prefix key, : // and this corresponding value is calle > I think this is getting a bit too deep into the implementation details (in Yep, I think that's a very good suggestion. I think I'll just tuck that into the .cc file. We can update the signature of those functions and re-structure those to have better semantics is a separate patch. http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@282 PS17, Line 282: // > nit: extra line Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@285 PS17, Line 285: // By the definition (see above), the skip scan optimization i > nit: "Seek to the next row that matches the predicate and has the same pref Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@333 PS17, Line 333: 'remaining' will always be at least 1, although it is a TODO to allow it : // to be 0 (0 violates CHECK conditions elsewhere in the scan code). : // : // Currently, skip scan will be dynamically disabled when the number of seeks : // for distinct prefix keys exceeds sqrt(#total rows). We use sqrt(#total rows) : // as a cutoff because based on performance tests on upto 10M rows per tablet, > nit: standardize the bullet types Done http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h@270 PS16, Line 270: : // "prefix key" : // Any value i > nit: these are no longer needed now that we have the definitions up front Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@675 PS17, Line 675: 2. Read that distinct prefix from the ad-hoc index, append the predicate value : //and minimum possible value for all other columns, and seek to that. : //If this matches, this is the lower bound of our desired scan. : // 3. If we found our desired lower bound, find an upper bound fo > nit: use " - " to bullet these two points Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@803 PS17, Line 803: continue; : } : > Hmm, not only are we not caching the seeked value, but we're searching for Done http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc File
[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components
Alexey Serbin has uploaded a new patch set (#18) to the change originally created by Anupama Gupta. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291: efficiently support predicates on non-prefix key components .. KUDU-1291: efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, untill all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,343 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/18 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 18 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 17: (10 comments) Mostly a nit pass http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@238 PS17, Line 238: / nit: prefer using // instead of /**/ http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@244 PS17, Line 244: leftmost (non-first) primary key column nit: how about "leftmost, non-leading primary key column with a predicate" http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@245 PS17, Line 245: equality value on the "predicate column". nit: how about something like "Value that the predicate column is predicated on. Since only equality predicates are supported, there is only a single value of interest." http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@246 PS17, Line 246: all columns (together) to the left of the "predicate column". nit: how about "The collective columns to the left of the predicate column" http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@247 PS17, Line 247: - nit: extra space, also maybe use : to separate these? http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@270 PS17, Line 270: prefix key refers to the first "num_prefix_cols" columns of the current key. : // current key is the key currently pointed to by validx_iter_ (prior to calling : // this function). nit: now that we have some definitions above, we don't need to redefine these there. http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@276 PS17, Line 276: In this case, prior to calling key_iter->SeekAtOrAfter(), the search key is : //modified such that the predicate column is set to the predicate value and the : //succeeding columns are set to their minimum possible values. This is done to : //make sure that after calling key_iter->SeekAtOrAfter() the key_iter is correctly : //placed at the first occurrence of the row (if it exists) that matches the next : //prefix key with the predicate value. I think this is getting a bit too deep into the implementation details (in that it's referencing function names in the CFileReader class), and it's not immediately clear why setting `cache_seeked_value=true` should seek to a different value. Maybe there's a better parameter name that is more telling. Also now that the term "prefix key" is defined to be the columns before the predicate, the name "SeekToNextPrefixKey" may be confusing when run with `skip_scan_predicate_column_id_ + 1`. Maybe we could change this to be a bit clearer. enum { UPPER, LOWER } BoundType; // If seeking for the lower bound, seeks to ... // If seeking for the upper bound, seeks to ... SeekToNextBound(BoundType bound_type); Maybe it'd be fine with just a name change. Or comments that explain why we're seeking to what we're seeking. http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@282 PS17, Line 282: // nit: extra line http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@285 PS17, Line 285: // Seek to the next predicate match within the current prefix. nit: "Seek to the next row that matches the predicate and has the same prefix as that of `enc_key`" http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@333 PS17, Line 333: * key_iter_ is not NULL. : // : // Postconditions upon exiting this method: : // 1. cur_idx_ is updated to the row_id of the next row(containing the next : //higher distinct prefix) to scan. : // 2. 'remaining' stores the number the entries to be scanned in the current scan range. nit: standardize the bullet types -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 17 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Fri, 17 Aug 2018 23:08:45 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 17: (5 comments) http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG@19 PS16, Line 19: till nit: until http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h@270 PS16, Line 270: prefix key refers to the first "num_prefix_cols" columns of the current key. : // current key is the key currently pointed to by validx_iter_ (prior to calling : // this function). nit: these are no longer needed now that we have the definitions up front http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@675 PS17, Line 675: the primary key (PK) value index (ad-hoc index) search pointer, which gives us : // a forward index of value to position, and : // the scan pointer, which is cur_idx_, representing an offset one more than : // the last row that we actually scanned (in the previous batch). nit: use " - " to bullet these two points http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@803 PS17, Line 803: // Note: To handle a similar situation (as illustrated with Tables above) when finding the : // upper bound key offset, we follow a different approach. We simply do not cache : // the seeked value for the upper bound key, hence cache_seeked_value = false below. Hmm, not only are we not caching the seeked value, but we're searching for a different value than if `cache_seeked_value` were true, right? I know it's documented in SeekToNextPrefixKey(), but I think it's worth explaining here, since AFAICT this is the only place we're using `cache_seeked_value=false`. The comment in SeekToNextPrefixKey() explains this is "required for correctness," but here is an appropriate place to explain why. http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc@589 PS17, Line 589: const int num_matching = GenerateData(random, num_rows, predicate_col_id, predicate_val); : Since this is generating random data, and running a scan for equality on a single value, wouldn't it be pretty unlikely that `num_matching` is non-zero? Maybe we could inflate the probability that `predicate_val` shows up by adding a `inject_predicate_val_probability` parameter to GenerateData() that will inject `predicate_val` to the `predicate_col_id` column with some probability. Looking at GenerateData(), it seems like generated data is capped between [0, 1000), so I don't think this will ever return rows? Same kMiddleColumnMinKeyValue. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 17 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Fri, 17 Aug 2018 22:13:33 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#17). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,261 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/17 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 17 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 15: (39 comments) http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@792 PS14, Line 792: // Currently this block holds encoded composite key. : Arena arena(16*1024); : : DCHECK(!prepared_blocks_.empty()); : > nit: As a general rule of thumb, if something is indicative of a programmer Thanks for this pointer. I have removed the DCHECK() in this case. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@797 PS14, Line 797: us > nit: I know some of the older code has this reversed, but for new code, we Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871 PS14, Line 871: last_prepare_idx_ = b->first_row_idx() + b->dblk_->GetCurrentIndex(); : last_prepare_count_ = 0; : : p > Not sure if this is necessary to avoid breaking encapsulation, as I think w ACK. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871 PS14, Line 871: last_prepare_idx_ = b->first_row_idx() + b->dblk_->GetCurrentIndex(); : last_prepare_count_ = 0; : : p > It's worth documenting in the header the side effect of `cache_seeked_value If `cache_seeked_value` is true, then a call to StoreCurrentValue() will not seek an extra row. Instead, due to the side effect of using CopyNextValues(size_t *n, ColumnDataView *dst) the first value from the last prepared PK column block will be fetched into 'dst'. I have now documented this in the header file. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@790 PS15, Line 790: Todo > nit: s/Todo/TODO/ Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@793 PS15, Line 793: 16*1024 > Why did we go from using a constant back to using a magic number? http://wi Made this a member variable of CFileIterator as a followup of the review comment in cfile_set.cc (L553). After discussing with Alexey, we came to the conclusion that it is safe to initialize Arena with an initial buffer size of 1M. If required, the Arena size grows wrt the data being allocated to it. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@795 PS15, Line 795: DCHECK(!prepared_blocks_.empty()); : if (prepared_blocks_.empty()) { : return Status::NotFound("blocks not found"); : } > This has a code smell. https://martinfowler.com/bliki/CodeSmell.html In deb Yes it is possible. I have removed DCHECK and handled this condition by returning Status::NotFound. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc File src/kudu/common/encoded_key.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc@92 PS14, Line 92:gscoped_ptr *key, :Arena* arena, int32_t num_columns > Per the GSG (https://google.github.io/styleguide/cppguide.html): Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@175 PS15, Line 175: // Due to the caveat(see below) listed for SkipToNextScan(size_t *remaining), : // this class should not reference a separate "client" class that uses key_iter->cur_val. > I find this part of the comment confusing and I think we should just remove Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@248 PS15, Line 248: Caveat: > I don't think we need to specifically call this out as a caveat Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@249 PS15, Line 249: key_iter->cur_val_ > How about just key_iter_? Is that sufficient? Talking about a private membe Ok. Makes sense. Changed this to just key->iter_. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@362 PS15, Line 362: skip_scan_num_seeks_cutoff > missing underscore suffix on member variable Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: PS14: > nit: s/unique prefix/distinct prefix Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@174 PS14, Line 174: // : // Due to the caveat(see below) listed for SkipToNextScan(size_t
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#16). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,261 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/16 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 16 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 14: (3 comments) http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@551 PS12, Line 551: atus CFileSet::Iterator::SeekToNextPrefixKey(size_t num_prefix_cols, bool cache_seeked_value) { : gscoped_ptr enc_key; : Arena arena(kEncodedCompositeKeyMaxSize); > Might be helpful: from util/memory/arena.h and the definition of Reset(): +1 Using dedicated Arena per iterator instance sounds good to me. It seems we can easily reset it in the long loop of the SkipToNextScan() method. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@770 PS14, Line 770: Check to see whether we have effectively seeked backwards > We effectively have two pointers we are tracking with skip-scan: the primar Adding a comment would definitely help. That might be a comment for the method, as well as an extra for already existing high-level description of the algorithm at line 656. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@557 PS15, Line 557: // columns of the cached value obtained after > I think it's possible to pass an instance of arena from the upper level to And the best approach I think is suggested by Andrew -- have an instance of Arena as a member of CFileSet::Iterator. Probably, it's necessary to reset it after use in every cycle of the loop in SkipToNextScan() method. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 14 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 16 Aug 2018 07:24:09 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 15: (9 comments) http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@557 PS15, Line 557: Arena arena(FLAGS_max_encoded_key_size_bytes); I think it's possible to pass an instance of arena from the upper level to use it here, like it's done in DecodeCurrentKey(). BTW, that parameter for the Arena constructor -- it's just initial size for the arena, it can grow bigger up to the allowed max size (by default 127 * 8KB, i.e. almost 1 MB). However, setting it up to FLAGS_max_encoded_key_size_bytes is a good enough (arena grows 2x times when needed). http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@571 PS15, Line 571: ( nit: this and the pairing brace are not needed. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@597 PS15, Line 597: KuduPartialRow *p_row, : const gscoped_ptr& cur_enc_key style nit: usually we have input parameter specified first, then go output parameters. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@603 PS15, Line 603: auto const style nit: const auto& value http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@606 PS15, Line 606: p_row->Set(col_id, data); What if this returns non-OK? http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@611 PS15, Line 611: p_row->Set(skip_scan_predicate_column_id_, suffix_col_value); ditto http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@619 PS15, Line 619: ( nit: this and the closing parenthesis are not needed http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@742 PS15, Line 742: CheckPredicateMatch(lower_bound_key); I'm not sure this is what is needed: CheckPredicateMatch() verifies that only the first column of the predicate matches, but we want all columns of the predicate to match, no? If it's what we actually need, maybe add a comment that the lower bound key is just about the match of the first column of the predicate. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@811 PS15, Line 811: ERROR I think this should be WARNING, because there are some legit cases where SkipToNextScan() can return non-OK. E.g.: E0815 21:19:24.748093 1681 cfile_set.cc:811] Skip scan failed: Illegal state: No lexicographically greater key exists -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 15 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 16 Aug 2018 06:49:03 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 15: (4 comments) A bit more feedback for the test. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: PS15: We discussed that offline, but just to re-iterate: It would be nice to add a few scenario where cardinality of predicate-matching rows is close to the number of all rows inserted, where number of inserted rows would be in order of 10^3. The idea is to exercise SkipToNextScan() method's logic more often. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc@18 PS15, Line 18: #include nit: please use c++ style include: #include http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc@106 PS15, Line 106: s.IsAlreadyPresent What if num_matching was incremented, but the IsAlreadyPresent() was reported on an attempt to insert a row? Maybe, num_matching should be incremented only in case if row was inserted? http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc@129 PS15, Line 129: { nit: here and elsewhere under this switch -- remove those braces, they are not needed. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 15 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 16 Aug 2018 00:03:33 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 15: (1 comment) This is a drive-by review; I probably won't check in for another several days http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871 PS14, Line 871: last_prepare_idx_ = b->first_row_idx() + b->dblk_->GetCurrentIndex(); : last_prepare_count_ = 0; : : p > It's worth documenting in the header the side effect of `cache_seeked_value Not sure if this is necessary to avoid breaking encapsulation, as I think we can call the whole key_iter_ reserved by the Iterator class and ignore the private member; see my comment re. key_iter_ on Rev 15 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 15 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Wed, 15 Aug 2018 17:31:00 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 15: (8 comments) Did a cursory review pass, nice to see the improvements, looking good to me http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@790 PS15, Line 790: Todo nit: s/Todo/TODO/ http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@793 PS15, Line 793: 16*1024 Why did we go from using a constant back to using a magic number? http://wiki.c2.com/?MagicNumber We should use the gflag we are using in the other file here as well. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@795 PS15, Line 795: DCHECK(!prepared_blocks_.empty()); : if (prepared_blocks_.empty()) { : return Status::NotFound("blocks not found"); : } This has a code smell. https://martinfowler.com/bliki/CodeSmell.html In debug mode, we make it impossible to hit this condition, yet in release mode, we handle it. That means it is impossible to have test coverage of this situation, which is bad. We should pick one or the other: either this is impossible, i.e. a code error, in which case just use a CHECK, or it's possible, and just handle it by returning Status::NotFound. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@175 PS15, Line 175: // Due to the caveat(see below) listed for SkipToNextScan(size_t *remaining), : // this class should not reference a separate "client" class that uses key_iter->cur_val. I find this part of the comment confusing and I think we should just remove it http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@248 PS15, Line 248: Caveat: I don't think we need to specifically call this out as a caveat http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@249 PS15, Line 249: key_iter->cur_val_ How about just key_iter_? Is that sufficient? Talking about a private member of a different class breaks encapsulation for that class. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@362 PS15, Line 362: skip_scan_num_seeks_cutoff missing underscore suffix on member variable http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@770 PS14, Line 770: reak; > How does this happen? Could you elaborate a bit? What does seeking backward We effectively have two pointers we are tracking with skip-scan: the primary key (PK) value index (ad-hoc index) search pointer, which gives us a forward index of value to position, and the scan pointer, which is cur_idx_, representing an offset one more than the last row that we actually scanned. In this case, we searched the PK and came up with a position lower than our scan offset, which means even though we tried to search forward relative to our previous PK lookups, we ended up landing at an offset that we'd already scanned. The awkward reason we have to deal with this possibility is because we don't have a reverse PK index from offset (position) to PK (value), otherwise we'd . Maybe we should try to explain the above somewhere in a comment, although I'm not sure of the best place to discuss it. Maybe near the top of this method in the .cc file around line 665? -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 15 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Wed, 15 Aug 2018 17:27:12 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 15: (16 comments) http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@792 PS14, Line 792: // Currently this block holds encoded composite key. : Arena arena(16*1024); : : DCHECK(!prepared_blocks_.empty()); : nit: As a general rule of thumb, if something is indicative of a programmer error, we'll add a CHECK or DCHECK, whereas if there's an issue with the underlying data, we'll return an error. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@797 PS14, Line 797: us nit: I know some of the older code has this reversed, but for new code, we favor placing the sigil immediately after the var name (PreparedBlock* pblk), here and elsewhere. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871 PS14, Line 871: last_prepare_idx_ = b->first_row_idx() + b->dblk_->GetCurrentIndex(); : last_prepare_count_ = 0; : : p It's worth documenting in the header the side effect of `cache_seeked_value`, i.e. if it is set to true, this call to StoreCurrentValue() will seek an extra row, right? Another approach I chatted with you about before was that maybe we can avoid pushing this bookkeeping so low by slightly adjusting the APIs: Status SeekAtOrAfter(const EncodedKey& encoded_key, bool* exact_match); // Same behavior as before this patch. Status ConsumeNextValue(string* val); // Combines StoreCurrentValue() and GetCurrentValue() Then, callers (i.e. CFileSet::Iterator) could use key_iter->ConsumeNextValue() immediately after SeekAtOrAfter() instead of doing SeekAtOrAfter(/* cache_seeked_values=*/true), the CFileIterator::cur_val_ member could be moved to the CFileSet::Iterator with a more telling name, like 'skip_scan_cur_key_' or somesuch, and we wouldn't have to have caveats and comments explaining the restricted access to cur_val_. Might be a pretty substantial change, but it might with the broken encapsulation. WDYT? http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc File src/kudu/common/encoded_key.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc@92 PS14, Line 92:gscoped_ptr *key, :Arena* arena, int32_t num_columns Per the GSG (https://google.github.io/styleguide/cppguide.html): nit: spacing nit: output parameters should come after input parameters http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: PS14: nit: s/unique prefix/distinct prefix http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@174 PS14, Line 174: // : // Due to the caveat(see below) listed for SkipToNextScan(size_t *remaining), : // this class should not reference a separate "client" class that uses key_iter->cur_val. Hmm, I'm not sure what this means. See my comment in cfile_reader.cc re: an alternate approach. Also nit: add a space in "caveat (see" http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@208 PS14, Line 208: : // Decode the currently-seeked key into 'enc_key'. : Status Before getting into the nitty gritty of each method, I think it'd be helpful to start out by adding a block comment with: - a brief, high-level overview of what a skip scan is - definitions of "predicate column", "prefix columns", "prefix keys", in the context of a skip scan - definitions of a lower and upper bound in the context of a skip scan, and how they're used when seeking With the definitions in one place, it'd be easier to read about, and you wouldn't have to sprinkle definitions throughout the below comments. You've already got a good base at L238. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@209 PS14, Line 209: // Decode the currently-seeked key into 'enc_key'. : Status DecodeCurrentKey(Arena* arena, gscoped_ptr* enc_key); : : // This function is used to place the validx_iter_ at the next greater prefix key. : // prefix key refers to the first "num_prefix_cols" columns of the current key. : // current key is the key currently pointed to by validx_iter_ (prior to calling : // this function). : // If 'cache_seeked_value' is true, the validx_iter_ will store the value seeked to. : Status SeekToNextPrefixKey(size_t
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 15: (7 comments) I have addressed all the open comments. PTAL. http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59 PS10, Line 59: > Yep, it would be nice to have a test with super-wide primary key (multiple Added a test case where the encoded primary key size is equal to the max allowable encoded PK size (16MB), stored as max_encoded_key_size_bytes. Also, verified that any insertions that violate this size constraint results in failure. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@444 PS12, Line 444: col_id = > If that's something we expect to happen, I would simply disable the skip sc Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@462 PS12, Line 462: } : : if (nonfirst_key_pred_exists) { : use_skip_scan_ = true; : : // Store the predicate column id. : skip_scan_predicate_column_id_ = min_col_id; : : > I think it's better to keep it as Andrew requested. After some considerati Sure. Done. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@414 PS14, Line 414: id CFileSet::Ite > nit: since this is not to change in the code below, add 'const' Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@576 PS14, Line 576: > style nit: keep the brace with the name of the method Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@699 PS14, Line 699: : skip_scan_num_seeks_++ > I think it makes sense to introduce a metric to capture cases when skip sca Added this info in VLOG(1). http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/index_skipscan-test.cc@189 PS14, Line 189: break; : } : case kMaxSizedEncodedKey: { : CHECK_OK(builder.AddKeyColumn("P1", STRING)); : CHECK_OK(builder.AddKeyColumn > nit: all there might be constexpr: Done -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 15 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Wed, 15 Aug 2018 04:55:52 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#15). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,069 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/15 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 15 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 14: (2 comments) http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59 PS10, Line 59: static const int kEncodedCompositeKeyMaxSize = > Can we test what happens when someone violates this? Yep, it would be nice to have a test with super-wide primary key (multiple columns, big cells) and try to run scans with skip scan enabled. As for the desired behavior, at least tservers should not crash, and the logic of the skip scan optimizations should behave predictably, not outright failing the whole scan operation. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@699 PS14, Line 699: use_skip_scan_ = false; : return Status::OK(); I think it makes sense to introduce a metric to capture cases when skip scan decided to bail due to the seek cutoff condition. For examples, please looks for examples in the code: check for METRIC_DEFINE_gauge_int64 and INSTANTIATE_METRIC patterns. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 14 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 14 Aug 2018 20:18:45 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59 PS10, Line 59: static const int kEncodedCompositeKeyMaxSize = > Refactored this name to: kEncodedCompositeKeyMaxSize as it makes more sense Can we test what happens when someone violates this? -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 14 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 14 Aug 2018 14:43:04 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@444 PS12, Line 444: col_id == > Yes, if the column is not found in the schema. Not quite sure whether to co If that's something we expect to happen, I would simply disable the skip scan and break out of the cycle, adding a warning log message about wrong column name in the predicate. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 14 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 14 Aug 2018 05:52:36 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 14: (6 comments) http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@462 PS12, Line 462: : // Store the predicate column id. : skip_scan_predicate_column_id_ = min_col_id; : : // Store the predicate value. : skip_scan_predicate_value_ = pred_value; : : // Store the cutoff on the number of skip scan seeks. : > Alexey, please let me know if you feel that this change should be reverted. I think it's better to keep it as Andrew requested. After some consideration I realized that's clearer because it's moved out of the cycle at least. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@414 PS14, Line 414: int num_key_cols nit: since this is not to change in the code below, add 'const' http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@576 PS14, Line 576: ( style nit: keep the brace with the name of the method http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@321 PS12, Line 321: auto pred_p1 = ColumnPredicate::Equality(schema_.column(0), _p1); : auto pred_p2 = ColumnPredicate::Equality(schema_.column(1), _p2); : spec.AddPredicate(pred_p1); : spec.AddPredicate(pred_p2); : vector results; : NO_FATALS(ScanTablet(, , "Exact match on P1 and P2")); : EXPECT_EQ(1, results.size()); : } : break; : : c > Changed to a more uniform layout. Please let me know if this looks good. thanks, lgtm http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: PS14: This seems to be a good set of unit tests regarding testing scan using different combinations primary key columns. How about adding a simple unit test to verify that skip scan is enabled/disabled in cfileset iterator? I.e., I was thinking to verify functionality of CFileSet::Iterator::TryEnableSkipScan(). What do you think? http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/index_skipscan-test.cc@189 PS14, Line 189: const int32_t kNumDistinctP1 = 2; : const int16_t kNumDistinctP2 = 10; : const int kNumDistinctP3 = 5; : const int8_t kNumDistinctP4 = 1; : const int8_t kNumDistinctP5 = 20; nit: all there might be constexpr: constexpr int32_t kNumDistinctP1 = 2; ... -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 14 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 14 Aug 2018 05:47:54 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Anupama Gupta has abandoned this change. ( http://gerrit.cloudera.org:8080/11067 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Abandoned This is an obsolete branch. -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 28 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 14: (25 comments) Please review the changes. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@345 PS12, Line 345: bool cache_seeked_value > nit: Even though this is calling StoreCurrentValue(), I think calling this Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@348 PS12, Line 348: const std::string& GetCurrentValue() const; > Add 'const' modifier for this method. Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@472 PS12, Line 472: // Value currently pointed to by validx_iter_. : std::string cur_val_; : > +1 Andrew, in case we have a single public function GetCurrentValue(), it would actually lead to different results because it uses CopyNextValues(size_t *n, ColumnDataView *dst) . CopyNextValues(..) fetches the next 'n' values from the block into 'dst' . Hence, the need to store this fetched value in 'cur_val_' . http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@473 PS10, Line 473: std::string cur_val_; : > This breaks encapsulation. We should document something along these lines i Listed this caveat in CFileSet::Iterator . Please let me know if it fine. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.cc@792 PS12, Line 792: DCHECK(!prepared_blocks_.empty()); > What happens if this does not hold in release build? Makes sense. Added return non-OK status (Status::NotFound) in this case. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h@213 PS12, Line 213: // prefix key refers to the first "num_prefix_cols" columns of the current key. : // current key is > nit: Reword as "If `cache_seeked_value` is true, the predicate column itera Rephrased as: "If 'cache_seeked_value' is true, the validx_iter_ will store the value seeked to." , since the iterator is a validx_iter_ . http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h@220 PS12, Line 220: Status SeekToRowWithCurPrefixMatchingPred(const gscoped_ptr& enc_key); : > nit: reword as "Build the key with the same prefix as `cur_enc_key`, that h Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@19 PS12, Line 19: #include nit: please use '#include ' instead and place that among other C++-s Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@76 PS12, Line 76: > Per our discussion with Mike today, I thought the idea was to have this dis You are right. I assumed this will be disabled right before merging (in the final patch). Set this to 'false' now, as now we are testing both the scenarios in index_skipscan_test.cc http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@435 PS12, Line 435: > nit: you could use 'auto' here. Removed predicates var as per comment on L436. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@436 PS12, Line 436: spec.predi > nit: this can just be `spec.predicates()`, then there would be no need for Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@440 PS12, Line 440: // Get the column id from the predicate > nit: move this out of the loop, maybe up by L413 and then use it in definin Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@444 PS12, Line 444: col_id == > Could find_column() return -1 here? Yes, if the column is not found in the schema. Not quite sure whether to continue the loop in this case or just return with skip scan disabled. Any suggestion ? http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@462 PS12, Line 462: : // Store the predicate column id. : skip_scan_predicate_column_id_ = min_col_id; : : // Store the predicate value. : skip_scan_predicate_value_ = pred_value; : : // Store the cutoff on the number of skip scan seeks. : > It was originally written as described, but I asked her to make the change, Alexey, please let
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#14). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,023 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/14 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 14 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Anupama Gupta has removed Alexey Serbin from this change. ( http://gerrit.cloudera.org:8080/11067 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Removed reviewer Alexey Serbin. -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 28 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11067 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 28: > Removed reviewer Mike Percy. I'm not sure I understand why this changelist is still around. Isn't it obsoleted by https://gerrit.cloudera.org/#/c/10983/ ? Do you expect any reviews to be done here? Or the 'primary one' is https://gerrit.cloudera.org/#/c/10983/ ? -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 28 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Sat, 11 Aug 2018 04:07:54 + Gerrit-HasComments: No
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#28). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,023 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/28 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 28 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Anupama Gupta has removed Mike Percy from this change. ( http://gerrit.cloudera.org:8080/11067 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Removed reviewer Mike Percy. -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 27 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#27). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,023 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/27 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 27 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11067 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 26: This should be abandoned in favor of https://gerrit.cloudera.org/c/10983/ since that is the change request that has all the review comment history. In order to post to the other one, you must put the following in the commit message: Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Then click the "Abandon" button on this CR. -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 26 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Wed, 08 Aug 2018 11:44:31 + Gerrit-HasComments: No
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 13: (13 comments) Went through a few more files; leaving what I have so far. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@345 PS12, Line 345: bool set_current_value nit: Even though this is calling StoreCurrentValue(), I think calling this `cache_seeked_value` makes it this a bit clearer. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@472 PS12, Line 472: // Value currently pointed to by validx_iter_. : // Currently, it is assumed that CFileSet::Iterator::SkipToNextScan(size_t *remaining) : // has exclusive access to use this variable. > Any chance to make update this to keep better level of encapsulation? +1 I might be missing something, but would we be able to only have a single public function GetCurrentValue() that does what StoreCurrentValue() does, but has a string as an outparameter? Not sure if this would yield the different results than if we were to continue storing via 'set_current_value'. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h@213 PS12, Line 213: // 'cache_seeked_value' controls whether we will remember the next key seeked to in : // this function. nit: Reword as "If `cache_seeked_value` is true, the predicate column iterator will store the value seeked to. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h@220 PS12, Line 220: // Builds a key containing the current prefix key, predicate column value : // and the minimum value for rest of the key columns. nit: reword as "Build the key with the same prefix as `cur_enc_key`, that has `skip_scan_predicate_value_` in its predicate column, and the minimum possible value for all other columns. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@436 PS12, Line 436: nit: this can just be `spec.predicates()`, then there would be no need for a `predicates` var http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@440 PS12, Line 440: const string& col_name = col_and_pred.first; nit: move this out of the loop, maybe up by L413 and then use it in defining num_key_cols. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@462 PS12, Line 462: } : : if (nonfirst_key_pred_exists) { : use_skip_scan_ = true; : : // Store the predicate column id. : skip_scan_predicate_column_id_ = min_col_id; : : > nit: why not to move this under the same scope where the nonfirst_key_pred_ It was originally written as described, but I asked her to make the change, reason being it seems a bit off to update these variables if we end up disabling the skip scan. Not sure my concern is important, since these variables (I think) are unused if we're not skip-scanning. I'd be ok reverting if you prefer, but I agree the semantics should be updated. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@551 PS12, Line 551: : Status CFileSet::Iterator::SeekToNextPrefixKey(size_t num_prefix_cols, bool cache_seeked_value) { : gscoped_ptr enc_key; Hmm.. This is doing a lot of heap-allocating and gets called _very often_. I wonder if we could keep a single Arena or unique_ptr as a member of the iterator, though I'm not sure our Arena implementation is reusable in that way. Might be worth looking into if you notice high memory usage in your cluster testing http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@561 PS12, Line 561: _key, , num_prefix_cols)); : : if (!cache_seeked_value) { : return key_iter_->SeekAtOrAfter(*enc_key, : /* set_current_value= */ cache_seeked_value, : /* exact_match= */ nullptr); : } : : // Set the predicate column to the predicate value in case we can find a : // predicate match in one search. As a side effect, GetKeyWithPredicateVal() : // sets minimum values on the columns after the predicate value, which is : // required for correctness here. : KuduPartialRow partial_row(&(base_data_->tablet_schema()));
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 12: (13 comments) http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@348 PS12, Line 348: const std::string& GetCurrentValue(); Add 'const' modifier for this method. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@472 PS12, Line 472: // Value currently pointed to by validx_iter_. : // Currently, it is assumed that CFileSet::Iterator::SkipToNextScan(size_t *remaining) : // has exclusive access to use this variable. Any chance to make update this to keep better level of encapsulation? If not, maybe add TODO here to be more aware of this caveat. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.cc@792 PS12, Line 792: DCHECK(!prepared_blocks_.empty()); What happens if this does not hold in release build? I'm curious why that would not be an option to return non-OK status in that case in addition to DCHECK() ? http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@19 PS12, Line 19: #include nit: please use '#include ' instead and place that among other C++-style include directives below, keeping the list sorted alphabetically. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@76 PS12, Line 76: true Per our discussion with Mike today, I thought the idea was to have this disabled in first commit before more-like-real-world testing is done. Am I missing something? http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@435 PS12, Line 435: std::unordered_map nit: you could use 'auto' here. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@444 PS12, Line 444: col_id = Could find_column() return -1 here? http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@462 PS12, Line 462: if (nonfirst_key_pred_exists) { : use_skip_scan_ = true; : : // Store the predicate column id. : skip_scan_predicate_column_id_ = min_col_id; : : // Store the predicate value. : skip_scan_predicate_value_ = pred_value; : } nit: why not to move this under the same scope where the nonfirst_key_pred_exists set to 'true' but after the 'if (col_id < min_col_id)' closure? Doing so, you could get rid of the nonfirst_key_pred_exists, but don't forget to add a comment about the semantics of those updates. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@542 PS12, Line 542: * nit: stick the asterisk to the type of the parameter. http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@92 PS12, Line 92: s what it was any other error but IsAlreadyPresent? http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@270 PS12, Line 270: int schema_num = static_cast(GetParam()); Does it make sense to add additional dimension to the test: whether the skip-scan feature enabled? In that way we can make sure that the predicates return the same result set in both cases, and that seems to be a good sanity check. For an example of combining multiple parameters in a Cartesian product see various tests containing text 'testing::Combine' http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@280 PS12, Line 280: ASSERT_NO_FATAL_FAILURE nit: you could use NO_FATALS() shortcut instead for better readability http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@321 PS12, Line 321: case kThreePK: { : // Test predicate on the third PK column. : ScanSpec spec; : Slice value_p3("2_p3"); : auto pred_p3 = ColumnPredicate::Equality(schema_.column(2), _p3); : spec.AddPredicate(pred_p3); : vector results; : ASSERT_NO_FATAL_FAILURE(ScanTablet(, , "Exact match on P3")); : EXPECT_EQ(20, results.size()); : break; : } nit: use the same layout of scopes and braces as in other cases (that's for better readability). -- To view, visit
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#13). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,018 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/13 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 13 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#26). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,018 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/26 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 26 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#25). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,016 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/25 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 25 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#12). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,017 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/12 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 12 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#24). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,017 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/24 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 24 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 11: (2 comments) http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@470 PS10, Line 470: Status StoreCurrentValue(); : > super-nit: in reading the name SetX(), based on other usages in the codebas Done http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@353 PS10, Line 353: // Whether the skip scan optimization has searched the current prefix for a predicate match : // or whether the prefix has changed since its l > How about change this to: Done -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 11 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 07 Aug 2018 23:25:16 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 11: (41 comments) http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59 PS10, Line 59: static const int kEncodedCompositeKeyMaxSize = > Great question. This is from http://kudu.apache.org/docs/known_issues.html# Refactored this name to: kEncodedCompositeKeyMaxSize as it makes more sense wrt our skip scan use case. http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@339 PS10, Line 339: to store the value obtaine > nit: note that the "current value" refers to the value after seeking. Done http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/encoded_key.h File src/kudu/common/encoded_key.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/encoded_key.h@63 PS10, Line 63: ; > nit: what does this do? Created another function IncrementEncodedKeyColumns(..) for better separation of responsibility and documented this parameter. http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/partial_row.h File src/kudu/common/partial_row.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/partial_row.h@52 PS10, Line 52: mplate struct SliceTypeRowOps; // IWYU pragma: keep : template struct NumTypeRowOps; // IWYU pragma: keep > nit: not your fault, but mind removing the spaces here? Done http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@207 PS10, Line 207: gscoped_ptr > nit: if the change would be contained within cfile_set.cc, mind updating th Noted. Sticking with gscoped_ptr for now since it's used outside cfile_set.cc (in encode_key.cc). http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@209 PS10, Line 209: refix key. > nit: Since this isn't a variable name, maybe take out the underscore. Alright. Changed here and elsewhere. http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@210 PS10, Line 210: um_prefix_ > nit: 'num_prefix_cols' Done http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@210 PS10, Line 210: // prefix key refers to the first "num_prefix_cols" columns of the current key. : // current key is the key currently pointed to by validx_iter_ (prior to calling SeekToNextPrefixKey()). : // 'cache_seeked_value' controls whether we will remember the next key seeked to in : // this function. > nit: can you reword this to clarify whether "current" refers to before or a Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@76 PS9, Line 76: true > Let's default this to false for the initial merge and do some testing to en Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@80 PS9, Line 80: "Max number of skip attempts the skip scan optimization should make before " > Add: Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@405 PS9, Line 405: e > nit: extra space Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@424 PS9, Line 424: > nit: add spaces around your operators on this line, both the assignment = a Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@430 PS9, Line 430: key_cols; > nit: move this comment inside the 'if' to right below this line Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@436 PS9, Line 436: nst auto& co > nit: unnecessary parentheses Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@437 PS9, Line 437: > nit: put this comment on its own line Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@444 PS9, Line 444: p); > nit: unnecessary parentheses Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@522 PS9, Line 522: << KUD > nit: remove so much extra horizontal white space on this line Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@531 PS9, Line 531: " > s/or/which we consider to be/ Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@532 PS9, Line 532: > currently-seeked key in the primary key index For better clarity, changed this to - "cached value obtained after the previous seek in the primary key index". http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@576 PS9, Line 576: } > nit: just
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#11). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 10 files changed, 1,016 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/11 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 11 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 10: (3 comments) http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59 PS10, Line 59: static const int kPrimaryKeyMaxSize = 16*1024; > nit: is this a hard limit or just a target? How is this enforced? Great question. This is from http://kudu.apache.org/docs/known_issues.html#_primary_keys and we haven't investigated how it is enforced yet or whether there is another constant available for this, although I grepped around and didn't find one. http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@473 PS10, Line 473: // Currently, it is assumed that CFileSet::Iterator::SkipToNextScan(size_t *remaining) : // has exclusive access to use this variable. This breaks encapsulation. We should document something along these lines in CFileSet::Iterator itself with respect to the validx_iter_ variable; this class should not reference a separate "client" class that uses it, which would be especially surprising considering this is a private variable. http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@353 PS10, Line 353: // To track whether the lower bound prefix key rolled over between the first and : // second seek to determine the lower bound key. How about change this to: // Whether the skip scan optimization has searched the current prefix for a predicate match or whether the prefix has changed since its last check. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 10 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 07 Aug 2018 19:43:24 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 10: (14 comments) I've only been through a few files, but here are some nits so far. Seems there are a few open points from Mike too. http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59 PS10, Line 59: static const int kPrimaryKeyMaxSize = 16*1024; nit: is this a hard limit or just a target? How is this enforced? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@339 PS10, Line 339: to store the current value nit: note that the "current value" refers to the value after seeking. http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@470 PS10, Line 470: Status SetCurrentValue(); : super-nit: in reading the name SetX(), based on other usages in the codebase (grep -r "Set\w" in src/kudu), I kind of expect the contract of some SetX(x) to set some internal variable to x. Mind changing this to something like StoreCurrentValue or CacheCurrentValue? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/encoded_key.h File src/kudu/common/encoded_key.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/encoded_key.h@63 PS10, Line 63: int num_columns = 0 nit: what does this do? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/partial_row.h File src/kudu/common/partial_row.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/partial_row.h@52 PS10, Line 52: template struct SliceTypeRowOps; // IWYU pragma: keep : template struct NumTypeRowOps; // IWYU pragma: keep nit: not your fault, but mind removing the spaces here? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@207 PS10, Line 207: Arena* arena nit: note that (I think) arena stores the buffer containing with the encoded key http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@207 PS10, Line 207: gscoped_ptr nit: if the change would be contained within cfile_set.cc, mind updating the newly added occurrences of gscoped_ptr to unique_ptr? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@209 PS10, Line 209: prefix_key nit: Since this isn't a variable name, maybe take out the underscore. http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@210 PS10, Line 210: "num_cols" nit: 'num_prefix_cols' http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@210 PS10, Line 210: // "prefix_key" refers to the first "num_cols" columns of the current key : // (current key is the key currently pointed to by validx_iter_). : // 'cache_seeked_value' controls whether we will remember the key seeked to : // in the "current value" of the iterator. nit: can you reword this to clarify whether "current" refers to before or after the seek? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc@413 PS10, Line 413: !(lower_bound_idx_ == 0 && upper_bound_idx_ == row_count_) nit: IMO slightly easier to read as (lower_bound_idx_ != 0 || upper_bound_idx != row_count_) http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc@423 PS10, Line 423: const std::unordered_map = spec.predicates(); : for (auto it=predicates.begin(); it!=predicates.end(); ++it) { Merge these into something like: for (const auto& col_and_pred: predicates) { const string& col_name = col_and_pred.first; const ColumnPredicate& pred = col_and_pred.second; const Schema& schema = base_data_->tablet_schema(); // ... use `col_name`, `pred`, and `schema` } http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc@441 PS10, Line 441: skip_scan_predicate_column_id_ = col_id; : : // Store the predicate value. : skip_scan_predicate_value_ = (it->second).raw_lower(); I think it's generally good hygiene that these don't get modified until after we know we're enable skip-scanning, like we do for `use_skip_scan_`. Though maybe it doesn't matter if these aren't going to be used by the iterator anyway? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc@509 PS10, Line 509: TryEnableSkipScan(*spec); nit: I know it doesn't make a difference execution-wise, but I think this makes more sense up by Init() at L394 -- To view, visit http://gerrit.cloudera.org:8080/10983
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 10: (31 comments) Just did a nit pass. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/cfile/cfile_reader.h@344 PS9, Line 344: Status SeekAtOrAfter(const EncodedKey _key, :bool *exact_match, bool set_current_value = false); style nit: since 'exact_match' is an out-parameter, it should go last, so let's make the signature of this: Status SeekAtOrAfter(const EncodedKey _key, bool set_current_value = false, bool *exact_match = nullptr); Per https://google.github.io/styleguide/cppguide.html#Output_Parameters http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/common/encoded_key.h File src/kudu/common/encoded_key.h: http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/common/encoded_key.h@63 PS9, Line 63: num_columns document this parameter http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@76 PS9, Line 76: true Let's default this to false for the initial merge and do some testing to enable it. Add: TAG_FLAG(enable_skip_scan, experimental); http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@80 PS9, Line 80: Add: TAG_FLAG(skip_scan_short_circuit_loops, hidden); TAG_FLAG(skip_scan_short_circuit_loops, advanced); http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@405 PS9, Line 405: nit: extra space http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@424 PS9, Line 424: = nit: add spaces around your operators on this line, both the assignment = and the comparison != http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@430 PS9, Line 430: // Predicate exists on the first PK column. nit: move this comment inside the 'if' to right below this line http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@436 PS9, Line 436: (it->second) nit: unnecessary parentheses http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@437 PS9, Line 437: // Track the minimum column id. nit: put this comment on its own line http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@444 PS9, Line 444: (it->second) nit: unnecessary parentheses http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@522 PS9, Line 522: nit: remove so much extra horizontal white space on this line http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@531 PS9, Line 531: or s/or/which we consider to be/ http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@532 PS9, Line 532: current key currently-seeked key in the primary key index http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@576 PS9, Line 576: nit: just indent 4 spaces here per https://google.github.io/styleguide/cppguide.html#Function_Declarations_and_Definitions The preferred alternative is to align the parameters vertically but in this case I think that would make the line too long and it would get flagged by the lint checker. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@581 PS9, Line 581: : nit: put a space on both sides of the colon in this case to be consistent with the rest of the Kudu code base http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@598 PS9, Line 598: + nit: space around the + for consistency with the rest of the Kudu code base please http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@600 PS9, Line 600: & nit: put the sigil next to the type, not the variable, for consistency with the rest of the Kudu code base, i.e. const ColumnSchema& col = ... Here and elsewhere http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@695 PS9, Line 695: gscoped_ nit: is there something else we should do in this scenario (current key decoding failure) besides crash? Maybe we should RETURN_NOT_OK and have this function return a Status? That way we can detect that skip scan failed, log it, and then disable the optimization? In fact I think that would be much better for the rest of the places we CHECK_OK here... I think RETURN_NOT_OK would be better. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/index_skipscan-test.cc@50 PS9, Line 50: nit: multiple consecutive blank lines not needed here, one is enough
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 10: (8 comments) Please review the changes. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@260 PS9, Line 260: key_iter_ is not NULL. > this is no longer true (checked inside the method) Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@264 PS9, Line 264: //higher unique prefix) to scan. : // 2. 'remaining' stores the number the entries to be scanned in the current scan range > Is this an implementation detail or part of the interface of this method? I It is an implementation detail based on the assumption that currently after initializing CFileSet::Iterator, CFileSet::Iterator::SkipToNextScan(..) has exclusive access to this variable. Have added a line about this where this variable is defined. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@266 PS9, Line 266: // : // See the .cc file for details on the approach and the implementation. : void SkipToNextScan(size_t *rema > These variables are local to this method, so I don't see why a caller would Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@269 PS9, Line 269: : // Collect the IO statistics for each of the underlying columns. > This seems to be one of the key postconditions from my perspective, since i Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@271 PS9, Line 271: l void Ge > nit: 'remaining' Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@639 PS9, Line 639: // 1. Search within the ad-hoc index for the next unique prefix : //(set of keys prior to the first predicate column). : //Searching is done using validx_iter_. : // 2. Read that unique prefix from the ad-hoc index, append the lower bound : //of the first predicate column, and seek to that. : //If this matches, this is the lower bound of our desired scan. : // 3. If we found our desired lower bou > How about using something a bit higher level to describe this algorithm: Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@671 PS9, Line 671: ek to the next prefix if our p > The values need to be inverted due to the variable name change; it doesn't Done http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@820 PS9, Line 820: tion( > nit: unexpected indentation Done -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 10 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 07 Aug 2018 00:45:14 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#10). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 977 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/10 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 10 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#23). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 977 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/23 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 23 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#22). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 978 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/22 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 22 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#21). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 979 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/21 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 21 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#20). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 979 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/20 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 20 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 9: (8 comments) http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@260 PS9, Line 260: cur_idx_ >= skip_scan_upper_bound_idx_. this is no longer true (checked inside the method) http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@264 PS9, Line 264: // 1. key_iter_->cur_val_ stores the first entry containing both the next unique : //prefix key and the predicate value, remaining to be scanned in the current batch. Is this an implementation detail or part of the interface of this method? If it's the former then we should remove it from the header file. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@266 PS9, Line 266: // 2. skip_scan_lower_bound_idx will store the row id of the entry found in 1. : // 3. skip_scan_upper_bound_idx is the row id which is an exclusive upper bound : //on our current scan range. These variables are local to this method, so I don't see why a caller would care about this. If we want to document this, we should document it in the implementation file. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@269 PS9, Line 269: // 4. cur_idx_ is either same as skip_scan_lower_bound_idx or remains unchanged (this occurs : //when cur_idx_ lies somewhere in between the current scan range). This seems to be one of the key postconditions from my perspective, since it affects class-level state that is accessed by other methods in this class. Can this be phrased in a way that doesn't reference local implementation variables but rather provides an explanation about what this does? http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@271 PS9, Line 271: remaining nit: 'remaining' http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@639 PS9, Line 639: // 1. Place the validx_iter_ at the entry containing the next : //unique "prefix_key" value. : // 2. Place the iterator at or after the entry formed by the "prefix_key" : //found in 1. and the predicate value. : // 3. If the seek in 2. results in exact match with the predicate value, : //store the row id that represents the last relevant entry (upper bound) wrt the : //current "prefix_key"(found in 1.) How about using something a bit higher level to describe this algorithm: 1. Search within the ad-hoc index for the next unique prefix (set of keys prior to the first predicate column). Searching is done using validx_iter_. 2. Read that unique prefix from the ad-hoc index, append the lower bound of the first predicate column, and seek to that. If this matches, this is the lower bound of our desired scan. 3. If we found our desired lower bound, find an upper bound for the scan by searching for the next row key matching one value higher than the highest value that will match our first predicate. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@671 PS9, Line 671: skip_scan_searched_cur_prefix_ The values need to be inverted due to the variable name change; it doesn't make sense to seek to the next prefix key if we didn't search the current prefix. http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@820 PS9, Line 820: nit: unexpected indentation -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 9 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 06 Aug 2018 21:04:26 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#9). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 979 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/9 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 9 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 7: (18 comments) Thanks for the comments. Also, added the condition for disabling skip scan on the fly. Please review the changes. http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h@44 PS7, Line 44: class EncodedKey; > move this down to line 55 Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@75 PS7, Line 75: return_from_skip_scan > instead of whether to enable this, this flag should specify how many loops Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@514 PS7, Line 514: seek_to_upper_bound_key > How about: Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@516 PS7, Line 516: 1024 > according to https://kudu.apache.org/docs/known_issues.html#_primary_keys I Added this in cfile_reader.h (this being a common file for both cfile_set.cc and cfile_reader.cc where this variable will be accessed). http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@518 PS7, Line 518: // Convert current key slice to encoded key. : RETURN_NOT_OK_PREPEND(EncodedKey::DecodeEncodedString( : base_data_->tablet_schema(), , : Slice(key_iter_->GetCurrentValue()), _key), "Invalid scan prefix key"); > This seems to be repeated in a couple of places. How about we factor this o Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@533 PS7, Line 533: // Set the predicate column with the predicate value. : // This is done to avoid overshooting the lower bound on the predicate value : // in the current scan range. : KuduPartialRow partial_row(&(base_data_->tablet_schema())); : gscoped_ptr key_with_pred_value = : GetKeyWithPredicateVal(_row, enc_key.Pass()); : : return key_iter_->SeekAtOrAfter(*key_with_pred_value, _match, /* set_current_value= */ true); > Like I noted above, I think this part should be factored into its own funct Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@579 PS7, Line 579: bool CFileSet::Iterator::CheckKeyMatch(const std::vector _keys, int start_col_id, int end_col_id) { : // Encode the key currently pointed to by validx_iter_. : Arena arena2(1024); : gscoped_ptr cur_enc_key; : EncodedKey::DecodeEncodedString( : base_data_->tablet_schema(), , : Slice(key_iter_->GetCurrentValue()), _enc_key); : : for (int col_id = start_col_id; col_id <= end_col_id; col_id++) { : if (base_data_->tablet_schema().column(col_id).Stringify(raw_keys[col_id]) != : base_data_->tablet_schema().column(col_id).Stringify(cur_enc_key->raw_keys()[col_id])) { : return false; : } : } : return true; : } > Instead of checking that the key matches, why not just check that the predi Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@596 PS7, Line 596: // This is a three seek approach for index skip scan implementation: : // 1. Place the validx_iter_ at the entry containing the next : //unique "prefix_key" value. : // 2. Place the iterator at or after the entry formed by the "prefix_key" : //found in 1. and the predicate value. : // 3. If the seek in 2. results in exact match with the predicate value, : //store the row id that represents the last relevant entry (upper bound) wrt the : //current "prefix_key"(found in 1.) > move this inside the function to line 605, since it describes what we are d Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@609 PS7, Line 609: bool lower_bound_key_found; : bool predicate_match = false; > can we merge these two variables into the single variable lower_bound_key_f Done http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@614 PS7, Line 614: // Whether to seek for the next unique prefix, this flag is false : // when the prefix key gets incremented while seeking for the lower bound : // entry in the loop. : bool continue_seeking_next_prefix = true; > I think it would be more descriptive of the
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#8). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 979 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/8 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 8 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#19). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 979 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/19 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 19 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#18). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 983 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/18 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 18 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#17). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 987 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/17 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 17 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#16). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 990 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/16 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 16 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 7: (19 comments) http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h@44 PS7, Line 44: class EncodedKey; move this down to line 55 http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@527 PS5, Line 527: > We cannot make it nullptr , because exact match is being de-referenced in f Then within SeekAtOrAfter() we can pass a temporary variable to those functions and only pass exact_match back if it's not a nullptr. http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@75 PS7, Line 75: return_from_skip_scan instead of whether to enable this, this flag should specify how many loops before returning, perhaps with -1 meaning unlimited. Something like: DEFINE_int32(skip_scan_short_circuit_loops, 100, "Max number of skip attempts the skip scan optimization should make before " "returning control to the main loop. -1 means there is no maximum."); http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@514 PS7, Line 514: seek_to_upper_bound_key How about: s/nul_cols/num_prefix_cols/ s/seek_to_upper_bound_key/cache_seeked_value/ And then break this into two separate functions: Status CFileSet::Iterator::SeekToNextPrefixKey(size_t num_prefix_cols, bool cache_seeked_value) and Status CFileSet::Iterator::SeekToNextPredicateMatchCurPrefix(size_t num_prefix_cols) because it seems like a better separation of responsibilities to have a method that seeks to the next prefix key and a different method that seeks for a match on the predicate, where the former can be used for both the upper and lower bound search, and the latter is only used for the lower bound search. http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@516 PS7, Line 516: 1024 according to https://kudu.apache.org/docs/known_issues.html#_primary_keys I think this should be 16KB, so 16 * 1024, and we should use a constant for that value at the top of this file (inside the namespace), perhaps: static const int kPrimaryKeyMaxSize = 16 * 1024; http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@518 PS7, Line 518: // Convert current key slice to encoded key. : RETURN_NOT_OK_PREPEND(EncodedKey::DecodeEncodedString( : base_data_->tablet_schema(), , : Slice(key_iter_->GetCurrentValue()), _key), "Invalid scan prefix key"); This seems to be repeated in a couple of places. How about we factor this out into its own function, like: Status CFileSet::Iterator::DecodeCurrentKey(Arena* arena, gscoped_ptr* enc_key) { RETURN_NOT_OK_PREPEND(EncodedKey::DecodeEncodedString( base_data_->tablet_schema(), arena, key_iter_->GetCurrentValue(), enc_key), "Failed to decode current value from primary key index"); return Status::OK(); } So that we can call it here and elsewhere like: Arena arena(...); gscoped_ptr enc_key; RETURN_NOT_OK(DecodeCurrentKey(, _key)); http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@533 PS7, Line 533: // Set the predicate column with the predicate value. : // This is done to avoid overshooting the lower bound on the predicate value : // in the current scan range. : KuduPartialRow partial_row(&(base_data_->tablet_schema())); : gscoped_ptr key_with_pred_value = : GetKeyWithPredicateVal(_row, enc_key.Pass()); : : return key_iter_->SeekAtOrAfter(*key_with_pred_value, _match, /* set_current_value= */ true); Like I noted above, I think this part should be factored into its own function, like: // Seek to the next predicate match within the current prefix. Status CFileSet::Iterator::SeekToNextPredicateMatchCurPrefix(size_t num_prefix_cols) { gscoped_ptr enc_key; Arena arena(1024); RETURN_NOT_OK(DecodeCurrentKey(, _key)); // Check to see if the current key matches the predicate value. If so, then // there is no need to seek forward. if (CheckPredicateMatch(enc_key)) { return Status::OK(); } // If we got this far, the current key doesn't match the predicate, so search // for the next key that matches the current prefix and predicate. KuduPartialRow partial_row(&(base_data_->tablet_schema())); gscoped_ptr key_with_pred_value =
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. Patch Set 7: (21 comments) Thanks for the comments. Please review the changes. http://gerrit.cloudera.org:8080/#/c/10983/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10983/5//COMMIT_MSG@7 PS5, Line 7: KUDU-1291 > nit: s/Kudu-1291/KUDU-1291/ Done http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/cfile/cfile_reader.h@340 PS5, Line 340: o value index, > document this parameter in the header file comment Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.h@343 PS4, Line 343:bool *exact_match, bool set_current_value = false); > nit: add const specifier for the method. Also, consider returning const re Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc@788 PS4, Line 788: > what if prepared_blocks_ is empty? If that the thing-that-cannot-be, add D Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@248 PS4, Line 248: r_bound_idx wi > style nit: here and elsewhere -- we tend to stick the asterisk and ampersan Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@399 PS4, Line 399: > If 'spec' instance is not modified here, why not to pass it as a const refe Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@414 PS4, Line 414: bool nonfirst_key_pred_exists = false; > Is copying necessary here? Why not to use const reference? Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@418 PS4, Line 418: > nit: I don't think the parentheses are necessary here Done http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@413 PS5, Line 413: > nit: add punctuation to all your comments (add a period at the end of the s Done http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@511 PS5, Line 511: > this kind of api doc should go into the header file, not the implementation Done http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@527 PS5, Line 527: > since we are not using this, can we just make this /* exact_match= */ nullp We cannot make it nullptr , because exact match is being de-referenced in functions called by SeekAtOrAfter. I have added a comment that exact_match value is not used after the current call. http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@565 PS5, Line 565: tig > s/Get/Return/ Done http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@568 PS5, Line 568: const ColumnSchema = cont_row.schema()->column(i); > remove or VLOG(2) Done http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@581 PS5, Line 581: rena2(1024); > nit: use more descriptive variable names here Done http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@588 PS5, Line 588: let_schema( > how about rename this to: SkipToNextScan() Done http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@602 PS5, Line 602: //store the row id that represents the last relevant entry (upper bound) wrt the > I'm not comfortable merging this patch without this feature Done http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@662 PS5, Line 662: > exclusive upper bound Done http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@668 PS5, Line 668: : bool exact > what guarantees the skip_scan_upper_bound_idx_ has been set to upper_bound_ This is a valid point. I have updated the skip_scan_upper_bound_idx_ value here now. http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@675 PS5, Line 675: ow > wording nit: Check to see whether we have seeked backwards. If so, we need Done http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@684 PS5, Line 684: // We didn't find an exact match for our lower bound, so re-seek. > can we convert this to a while-loop instead of a do-while-loop, now that th Done
[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#7). Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components .. KUDU-1291. Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 920 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/7 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 7 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#6). Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Kudu-1291 Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 920 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/6 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 6 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#15). Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Kudu-1291 Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 920 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/15 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 15 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#14). Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Kudu-1291 Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 925 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/14 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 14 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/10983/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10983/5//COMMIT_MSG@7 PS5, Line 7: Kudu-1291 nit: s/Kudu-1291/KUDU-1291/ Also preferably add a period or a colon after the JIRA number, like KUDU-1291. Efficiently blah blah -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 5 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 02 Aug 2018 18:33:23 + Gerrit-HasComments: Yes
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Patch Set 5: > Ah, I see -- thank you for letting me know. I thought since the WIP tag was > removed it's more or less ready. No worries, it's actually pretty close, would love to get your feedback on the algorithm -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 5 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 02 Aug 2018 18:32:02 + Gerrit-HasComments: No
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Patch Set 5: > > I took a quick look: my comments are nits regarding the style and > other minor stuff; I didn't look the semantics or the patch yet. > > The patch is still at a point that I think we should mostly focus > on substance rather than nits, although it will need a detailed nit > pass later. Ah, I see -- thank you for letting me know. I thought since the WIP tag was removed it's more or less ready. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 5 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 02 Aug 2018 00:46:06 + Gerrit-HasComments: No
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Patch Set 5: > I took a quick look: my comments are nits regarding the style and other minor > stuff; I didn't look the semantics or the patch yet. The patch is still at a point that I think we should mostly focus on substance rather than nits, although it will need a detailed nit pass later. -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 5 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 02 Aug 2018 00:42:45 + Gerrit-HasComments: No
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Patch Set 5: (5 comments) still reviewing, here are some initial comments http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/cfile/cfile_reader.h@340 PS5, Line 340: set_current_value document this parameter in the header file comment http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@413 PS5, Line 413: // Initialize predicate column id to an upperbound value nit: add punctuation to all your comments (add a period at the end of the sentence) http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@511 PS5, Line 511: // Increment "num_cols" no of columns this kind of api doc should go into the header file, not the implementation http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@527 PS5, Line 527: _match since we are not using this, can we just make this /* exact_match= */ nullptr ? http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/index_skipscan-test.cc@400 PS5, Line 400: int predicate_col_id = 2; please add a random test case that does not put the predicate on the last column -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 5 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 02 Aug 2018 00:41:02 + Gerrit-HasComments: Yes
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Patch Set 4: (7 comments) I took a quick look: my comments are nits regarding the style and other minor stuff; I didn't look the semantics or the patch yet. http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.h@343 PS4, Line 343: std::string GetCurrentValue(); nit: add const specifier for the method. Also, consider returning const reference from this method, if possible. http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc@788 PS4, Line 788: prepared_blocks_ what if prepared_blocks_ is empty? If that the thing-that-cannot-be, add DCHECK() for the non-emptiness condition. http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@248 PS4, Line 248: ScanSpec *spec style nit: here and elsewhere -- we tend to stick the asterisk and ampersand to the type, not to the variable/parameter name. http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@399 PS4, Line 399: ScanSpec If 'spec' instance is not modified here, why not to pass it as a const reference? http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@414 PS4, Line 414: std::unordered_map predicates Is copying necessary here? Why not to use const reference? http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@418 PS4, Line 418: (it->second) nit: I don't think the parentheses are necessary here http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@500 PS4, Line 500: skip_scan_enabled_ = CanUseSkipScan(spec); Why not to assign the 'skip_scan_enabled_' field in the CanUseSkipScan() method itself since other aux members (like 'suffix_key_column_id_') are assigned in that method already? And rename CanUseSkipScan() into TryEnableSkipScan() or alike? -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 4 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 02 Aug 2018 00:33:01 + Gerrit-HasComments: Yes
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Patch Set 4: (22 comments) Please review the changes. http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc@785 PS4, Line 785: Slice ret; > add: DCHECK_EQ(nullptr, validx_iter_); Added a check for this. Actually no, I think there is no guarantee that value index will always be string. I have added a flag to invoke this function (set_current_value) now . I wonder if there is an alternative to deal with this. http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@204 PS4, Line 204: // This function is used to place the validx_iter_ at the next greater "prefix_key". > nit: you should try to maintain the same order in the header file as the .c Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@283 PS4, Line 283: skip_scan_enabled_ > rename to use_skip_scan_ because this is confusingly similar to the gflag n Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@286 PS4, Line 286: suffix_key_pred_value_ > rename to skip_scan_predicate_value_ Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@289 PS4, Line 289: suffix_key_column_id_ > rename to skip_scan_predicate_column_id_ Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@291 PS4, Line 291: // Row id of the last entry of "suffix_key" predicate value wrt to the : // "prefix_key" currently pointed to by validx_iter_ > use something like this as the description: Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@293 PS4, Line 293: int > make this an int64_t Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@538 PS4, Line 538: if (skip_scan_enabled_ && (static_cast(cur_idx_) > suffix_key_upper_bound_idx_)) { > Can you break this out into its own function? Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@540 PS4, Line 540: bool suffix_key_match, continue_seeking_next_prefix = false; > nit: declare these variables separately Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@542 PS4, Line 542: Status next_entry_match, suffix_key_upper_bound_match; > Can we avoid maintaining so much state outside the loop? Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@602 PS4, Line 602: continue_seeking_next_prefix = true; > not indented Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@56 PS4, Line 56: kRandom > the random tests here are not very random. Can you add an actually-random t Thanks for this suggestion. I have added the random tests now. http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@145 PS4, Line 145: int32_t kNumDistinctP1 = 2; : int16_t kNumDistinctP2 = 10; : int kNumDistinctP3 = 5; : int8_t kNumDistinctP4 = 1; : int8_t kNumDistinctP5 = 20; > declare all of these variables const Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@163 PS4, Line 163: int32_t > int16_t for p2 here and in the below loops Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@226 PS4, Line 226: //Only 1 row inserted > nit: formatting, should be: Done. http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@253 PS4, Line 253: for (int i = 1; i <= 1; i++) { > remove this Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@255 PS4, Line 255: i+1 > style nit: please put spaces around infix operators Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@275 PS4, Line 275: p1 ++ > style nit: please no spaces before postfix operators Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@297 PS4, Line 297: > style nit: collapse >1 consecutive empty lines into a single empty line so Done http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@306 PS4, Line 306: void ScanTablet(ScanSpec *spec, vector *results, const char *descr) { > warning: parameter 'descr' is unused
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#5). Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Kudu-1291 Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 788 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/5 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 5 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#13). Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Kudu-1291 Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 788 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/13 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 13 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#12). Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Kudu-1291 Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 784 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/12 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 12 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#11). Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Kudu-1291 Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 798 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/11 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 11 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Patch Set 4: (11 comments) http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@56 PS4, Line 56: kRandom the random tests here are not very random. Can you add an actually-random test case that generates random values? You could do something like this pseudocode: Random random(SeedRandom()); int predicate_val = random.NextInt(); num_matching = GenerateData(random, num_rows, predicate_col_id, predicate_val); // Generates and inserts given number of rows using the given PRNG, returns # rows generated that match the predicate_val on predicate_col. Spec spec = GeneratePredicate(predicate_col_id, predicate_val); ASSERT_EQ(num_matching, ScanWithSpec(spec)); // Scan with given spec and return the number of matching results. The cool thing about SeedRandom() is that if you see a failure you can reproduce the same pseudorandom sequence by passing --test_random_seed to your test when running it (it logs the random seed generated each time it's called) http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@145 PS4, Line 145: int32_t kNumDistinctP1 = 2; : int16_t kNumDistinctP2 = 10; : int kNumDistinctP3 = 5; : int8_t kNumDistinctP4 = 1; : int8_t kNumDistinctP5 = 20; declare all of these variables const http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@163 PS4, Line 163: int32_t int16_t for p2 here and in the below loops http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@226 PS4, Line 226: //Only 1 row inserted nit: formatting, should be: // Only 1 row inserted. http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@253 PS4, Line 253: for (int i = 1; i <= 1; i++) { remove this http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@255 PS4, Line 255: i+1 style nit: please put spaces around infix operators http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@275 PS4, Line 275: p1 ++ style nit: please no spaces before postfix operators http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@297 PS4, Line 297: style nit: collapse >1 consecutive empty lines into a single empty line so more code fits on the screen, here and elsewhere http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@306 PS4, Line 306: void ScanTablet(ScanSpec *spec, vector *results, const char *descr) { > warning: parameter 'descr' is unused [misc-unused-parameters] see tidy bot's suggestion, or remove descr http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@312 PS4, Line 312: /*for (const string : *results) { : LOG(INFO) << str; : }*/ remove this http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@551 PS4, Line 551: default: { why do we have a default case? let's just also give this one a name -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 4 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Wed, 01 Aug 2018 00:49:13 + Gerrit-HasComments: Yes
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Patch Set 4: (14 comments) http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.cc@793 PS3, Line 793: > Yes, this works for string and binary values stored as uint8_t byte array. What happens if the cfile is storing something other than the ad-hoc index? This is not in general safe, we should put in additional safeguards if this is how we want to do this, but I am a little skeptical about the approach, need to think about it a bit more. http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc@785 PS4, Line 785: Slice ret; add: DCHECK_EQ(nullptr, validx_iter_); Also, are we sure that the value index can only ever be a string? http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.h File src/kudu/common/encoded_key.h: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.h@63 PS3, Line 63: um_columns = 0) > Done I don't see num_columns documented in the header file comment in rev 4 http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@204 PS4, Line 204: // This function is used to place the validx_iter_ at the next greater "prefix_key". nit: you should try to maintain the same order in the header file as the .cc file per the style guide http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@283 PS4, Line 283: skip_scan_enabled_ rename to use_skip_scan_ because this is confusingly similar to the gflag name FLAGS_enable_skip_scan http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@286 PS4, Line 286: suffix_key_pred_value_ rename to skip_scan_predicate_value_ http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@289 PS4, Line 289: suffix_key_column_id_ rename to skip_scan_predicate_column_id_ http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@291 PS4, Line 291: // Row id of the last entry of "suffix_key" predicate value wrt to the : // "prefix_key" currently pointed to by validx_iter_ use something like this as the description: // Row id of the next row that does not match the predicate. // This is an exclusive upper bound on our scan range. // A value of -1 indicates that the upper bound is not known. http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@293 PS4, Line 293: suffix_key_upper_bound_idx_ rename to skip_scan_upper_bound_idx_ http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@293 PS4, Line 293: int make this an int64_t http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@538 PS4, Line 538: if (skip_scan_enabled_ && (static_cast(cur_idx_) > suffix_key_upper_bound_idx_)) { Can you break this out into its own function? http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@540 PS4, Line 540: bool suffix_key_match, continue_seeking_next_prefix = false; nit: declare these variables separately http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@542 PS4, Line 542: Status next_entry_match, suffix_key_upper_bound_match; Can we avoid maintaining so much state outside the loop? http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@602 PS4, Line 602: continue_seeking_next_prefix = true; not indented -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 4 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 31 Jul 2018 23:17:11 + Gerrit-HasComments: Yes
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 ) Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Patch Set 4: (41 comments) Thanks for the helpful comments Mike. Please review the changes. http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.cc@793 PS3, Line 793: > This only works if the cell is a pointer, like in the case of a string valu Yes, this works for string and binary values stored as uint8_t byte array. The ad-hoc index stores the keys that are either encoded as binary prefix blocks or binary dict blocks. http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.h File src/kudu/common/encoded_key.h: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.h@63 PS3, Line 63: um_columns = 0) > document this parameter in the comment Done http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.cc File src/kudu/common/encoded_key.cc: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.cc@111 PS3, Line 111: num_columns, arena)) { > I thought we were going to generalize this to any column in the compound PK Right, I have updated this now. http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/partial_row.h File src/kudu/common/partial_row.h: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/partial_row.h@494 PS3, Line 494: rivate: > This class is a part of the public C++ client API (see KUDU_EXPORT) so we w Thanks, it makes sense. I have now added the calling class (CfileSet) as a friend class of KuduPartialRow. http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/partial_row.h@498 PS3, Line 498: friend class RowOpe > Why is this private field now public? Should not be necessary. This field is used in the function - CFileSet::Iterator::PrepareBatch , to set the minimum values for some columns. I am not quite sure if there is a better alternative to this. http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/wire_protocol-test-util.h File src/kudu/common/wire_protocol-test-util.h: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/wire_protocol-test-util.h@38 PS3, Line 38: inline void AddTestRowWithNullableStringToPB(RowOperationsPB::Type op_type, > move this to the test where it's being used I have removed this now. http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h@203 PS3, Line 203: : // This function is used to place the va > These methods need doc comments Done http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h@230 PS3, Line 230: initted_(false), > add documentation for this method Done http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h@264 PS3, Line 264: size_t cur_idx_; : size_t prepared_count_; > these variables should be documented with a comment Done http://gerrit.cloudera.org:8080/#/c/10983/1/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/1/src/kudu/tablet/cfile_set.cc@669 PS1, Line 669: > instead of this loop, I wonder if we can instead "peek" at the last value s Makes sense. But I am not quite sure if we can "peek" at the last seeked ordinal in the ad-hoc index column here. http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@395 PS3, Line 395: tate. > let's rename this CanUseSkipScan(), have it return a boolean, and set Done http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@397 PS3, Line 397: } > Add a flag DEFINE_bool(enable_skip_scan, true, "...") that allows us to tur Done http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@400 PS3, Line 400: int num_key_cols = base_data_->tablet_schema().num_key_columns(); > looks like a compile error? probably some unstaged changes Done http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@415 PS3, Line 415: for (auto it=predicates.begin(); it!=predicates.end(); ++it) { > error: use of undeclared identifier 'suffix_key_id_' [clang-diagnostic-erro Done http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@418 PS3, Line 418: string col_name = (it->second).column().name(); > error: use of undeclared identifier 'suffix_key_id_' [clang-diagnostic-erro Done http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@429 PS3, Line 429: nonfirst_key_pred_exists = true; > error: use of undeclared identifier 'suffix_key_id_'; did
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Mike Percy, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10983 to look at the new patch set (#4). Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Kudu-1291 Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 889 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/4 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 4 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11067 to look at the new patch set (#9). Change subject: Kudu-1291 Efficiently support predicates on non-prefix key components .. Kudu-1291 Efficiently support predicates on non-prefix key components This patch implements index skip scan approach in the case when there exists an equality predicate on any of the non-first primary key columns. This feature uses the following approach: 1. Seek at the first unique prefix key in the ad-hoc index tree. 2. Seek to the relevant entry corresponding to the unique prefix key found in 1. and the predicate value on the suffix key column. 3. To handle the case where multiple entries exist with respect to a unique prefix key, store the upper bound index of these entries. 4. Repeat steps 1-3, till all the unique prefix keys are scanned. Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d --- M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/common/encoded_key.cc M src/kudu/common/encoded_key.h M src/kudu/common/partial_row.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h A src/kudu/tablet/index_skipscan-test.cc 9 files changed, 890 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/9 -- To view, visit http://gerrit.cloudera.org:8080/11067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d Gerrit-Change-Number: 11067 Gerrit-PatchSet: 9 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot