[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2019-03-06 Thread Mike Percy (Code Review)
Mike Percy has uploaded a new patch set (#22) to the change originally created 
by Anupama Gupta. ( http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..

KUDU-1291: efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, until all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,333 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/22
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 22
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2019-03-06 Thread Mike Percy (Code Review)
Mike Percy has uploaded a new patch set (#21) to the change originally created 
by Anupama Gupta. ( http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..

KUDU-1291: efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, until all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,331 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/21
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 21
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2018-08-28 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..


Patch Set 20:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.h@337
PS20, Line 337: store the value obtained
  :   // after seeking by validx_iter_
store the value for later use where?


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.h@338
PS20, Line 338: this value
this is ambiguous -- "this value" meaning "the value obtained" or the value of 
'cache_seeked_value'


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.h@274
PS20, Line 274:   // Details
would be great to have another section here with an example like a 
year/month/day with a predicate on month or day


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@430
PS20, Line 430:   
spec->lower_bound_key()->encoded_key().compare(base_data_->min_encoded_key_) > 
0) {
we should make sure that if there is a lower bound key that the skip scan knows 
to start at that point instead of before it


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@501
PS20, Line 501: min_col_id
this should be an index, right? same below in a few places


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@506
PS20, Line 506: predicates
this predicate list is post-optimization, right? In that case, if the user has 
specified a predicate on a leading prefix, then that prefix would have been 
converted into a range scan at an upper level and removed from the predicate 
list before we get here.


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@514
PS20, Line 514: or
||


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@518
PS20, Line 518: LOG(WARNING) << col_name << " column is not found in 
schema";
DFATAL? should we ever expect to get this?


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@535
PS20, Line 535: use_skip_scan_ = true;
do we have any way to expose this back to the scan in terms of a metric? how 
many cfile blocks skip scan was attempted, and how many times it was 
auto-disabled, how many times it skipped (how many rows it skipped), etc


http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/tablet/cfile_set.cc@863
PS20, Line 863:   if (use_skip_scan_) {
if we are using skip scan, should we also be able to _not_ scan the column that 
has the range predicate? it seems that would be a key advantage of this 
approach. For example if your PK is (entity, timestamp) and you set a range on 
timestamp, you'd want to be able to avoid reading the timestamp column at all 
because the skipping using the PK index already tells you which rows match the 
rpedicate.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 20
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 28 Aug 2018 22:45:08 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2018-08-20 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..


Patch Set 20:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.cc@777
PS20, Line 777:   if (PREDICT_TRUE(validx_iter_ != nullptr)) {
  : RETURN_NOT_OK(StoreCurrentValue());
  :   }
> To make sure this patch has no side effects when disabled, this should prob
Yep, I think that's a good catch.  If we want to have a better assurance that 
the feature is truly isolated, it's better to gate it on something specific for 
'skip index scan'.  However, I don't think we have that gate readily available 
in current implementation.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 20
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 20 Aug 2018 21:38:06 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2018-08-20 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..


Patch Set 20: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/20/src/kudu/cfile/cfile_reader.cc@777
PS20, Line 777:   if (PREDICT_TRUE(validx_iter_ != nullptr)) {
  : RETURN_NOT_OK(StoreCurrentValue());
  :   }
To make sure this patch has no side effects when disabled, this should probably 
be gated by some `cache_seeked_value`. AFAICT, besides the skip scan logic, 
this method is only used in tool_action_fs.cc anyway. WDYT?



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 20
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 20 Aug 2018 18:24:35 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2018-08-20 Thread Alexey Serbin (Code Review)
Alexey Serbin has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 20
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2018-08-20 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..


Patch Set 20:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10983/19//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10983/19//COMMIT_MSG@19
PS19, Line 19: until
> nit: only one 'l'
Done


http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/cfile/cfile_reader.cc@794
PS19, Line 794: 8192
> nit: should probably be using FLAGS_max_encoded_key_size_bytes here
FLAGS_max_encoded_key_size_bytes is not available here and I'm not sure I want 
to move the definition of that flag in here or re-shuffle libraries only to 
remove it in the follow-up patch: as we discussed offline, it's better to use 
arena passed as a for these methods from upper level (i.e. from cfile_set).


http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/cfile_set.h@303
PS19, Line 303: & cur_enc_key,
  :   KuduPartialRow* p_row,
> nit: reverse sigil-space order
Done


http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/cfile_set.h@354
PS19, Line 354: il
> nit: here too
Done


http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/index_skipscan-test.cc@168
PS19, Line 168:
  :   Status s =  writer.Insert(row);
  :   // As keys are inserted randomly, retry row insertion in 
case
  :   // the current row insertion failed due to duplicate 
value.
  :   if (s.IsAlreadyPresent()) {
  : continue;
  :   } else {
  : CHECK_OK(s);
> Does it matter what these values are? It seems like the predicate column wi
No, it's not that important.  Yes, there will be only three different values 
and that's what was desired, yep.  The random approach was exercised by the 
other generator, but if you like randomness here as well, I'll add that.  The 
only important point here is to have not less than num_rows/3 values matching 
the predicate.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 20
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 20 Aug 2018 17:50:01 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2018-08-20 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..


Patch Set 20: Verified+1

Unrelated failure (due to concurrent builds?):

08:44:42 CMakeFiles/file_cache-stress-test.dir/file_cache-stress-test.cc.o: 
file not recognized: File truncated
08:44:42 collect2: error: ld returned 1 exit status
08:44:42 make[2]: *** [bin/file_cache-stress-test] Error 1


--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 20
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 20 Aug 2018 17:50:24 +
Gerrit-HasComments: No


[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2018-08-20 Thread Alexey Serbin (Code Review)
Alexey Serbin has uploaded a new patch set (#20) to the change originally 
created by Anupama Gupta. ( http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..

KUDU-1291: efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, until all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,327 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/20
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 20
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2018-08-19 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/index_skipscan-test.cc@104
PS16, Line 104: 
  :   IndexSkipScanTest()
  :   : KuduTabletTest(CreateSchema(std::get<0>(GetParam( {
  : const auto& param = GetParam();
  : schema_type = std::get<0>(param);
  : FLAGS_enable_skip_scan = std::get<1>(param);
  :   }
  :
  :   void SetUp() override {
  : KuduTabletTest::SetUp();
  : 
ASSERT_OK(tablet()->metadata()->CreateRowSet(_meta_));
  : FillTestTablet();
  :   }
  :
  :   // Generates and inserts given number of rows using the given 
PRNG,
  :   // returns # rows generated that match the predicate_val on 
predicate_col.
  :   int GenerateData(Random random, int num_rows, int 
predicate_col_id, int64_t predicate_value) {
  :
  : LocalTabletWriter writer(tablet().get(), _schema_);
  : KuduPartialRow row(_schema_);
  :
  : size_t num_key_cols = client_schema_.num_key_columns();
  : int num_matching = 0;
  :
  : while (num_rows > 0) {
  :   bool predicate_value_matched = false;
  :   for (int col_idx = 0; col_idx < num_key_cols; col_idx++) {
  : int64_t value = random.Uniform(1000);
  : CHECK_OK(row.SetInt64(col_idx, value));
  : if (col_idx == predicate_col_id && value == 
predicate_value) {
  :   predicate_value_matched = true;
  : }
  :   }
  :
> This is no longer used
Bah, yes it is.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 19
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 20 Aug 2018 00:25:17 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2018-08-19 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..


Patch Set 19:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10983/19//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10983/19//COMMIT_MSG@19
PS19, Line 19: untill
nit: only one 'l'


http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/cfile/cfile_reader.cc@794
PS19, Line 794: 8192
nit: should probably be using FLAGS_max_encoded_key_size_bytes here


http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/cfile_set.h@303
PS19, Line 303:  _enc_key,
  :   KuduPartialRow *p_row, 
gscoped_ptr *
nit: reverse sigil-space order


http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/cfile_set.h@354
PS19, Line 354:  *
nit: here too


http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/index_skipscan-test.cc@104
PS16, Line 104:
  :   IndexSkipScanTest()
  :   : KuduTabletTest(CreateSchema(std::get<0>(GetParam( {
  : const auto& param = GetParam();
  : schema_type = std::get<0>(param);
  : FLAGS_enable_skip_scan = std::get<1>(param);
  :   }
  :
  :   void SetUp() override {
  : KuduTabletTest::SetUp();
  : 
ASSERT_OK(tablet()->metadata()->CreateRowSet(_meta_));
  : FillTestTablet();
  :   }
  :
  :   // Generates and inserts given number of rows using the given 
PRNG,
  :   // returns # rows generated that match the predicate_val on 
predicate_col.
  :   int GenerateData(Random random, int num_rows, int 
predicate_col_id, int64_t predicate_value) {
  :
  : LocalTabletWriter writer(tablet().get(), _schema_);
  : KuduPartialRow row(_schema_);
  :
  : size_t num_key_cols = client_schema_.num_key_columns();
  : int num_matching = 0;
  :
  : while (num_rows > 0) {
  :   bool predicate_value_matched = false;
  :   for (int col_idx = 0; col_idx < num_key_cols; col_idx++) {
  : int64_t value = random.Uniform(1000);
  : CHECK_OK(row.SetInt64(col_idx, value));
  : if (col_idx == predicate_col_id && value == 
predicate_value) {
  :   predicate_value_matched = true;
  : }
  :   }
  :
This is no longer used


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc@589
PS17, Line 589: int num_matching = GenerateData(random, num_rows, 
predicate_col_id, predicate_val);
  :
> Right, the original idea was to generate almost the same order of matching
Seems like GenerateDataEx() accomplishes the "every Nth row" case with N fixed 
at 3, but not the case where all or almost all rows match. Is that still a goal?


http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/index_skipscan-test.cc@153
PS19, Line 153: int GenerateDataEx
nit: rename this to GenerateData and add docs


http://gerrit.cloudera.org:8080/#/c/10983/19/src/kudu/tablet/index_skipscan-test.cc@168
PS19, Line 168: case 1:
  :   CHECK_OK(row.SetInt64(col_idx, predicate_value / 
2 + 1));
  :   match = false;
  :   break;
  : case 2:
  :   CHECK_OK(row.SetInt64(col_idx, predicate_value / 
2 - 1));
  :   match = false;
  :   break;
Does it matter what these values are? It seems like the predicate column will 
only have three values in it. Maybe instead we can instead inject random data 
(and check them against the predicate) instead?



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 19

[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2018-08-18 Thread Alexey Serbin (Code Review)
Alexey Serbin has uploaded a new patch set (#19) to the change originally 
created by Anupama Gupta. ( http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..

KUDU-1291: efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, untill all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,344 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/19
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 19
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2018-08-17 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..


Patch Set 18:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG@19
PS16, Line 19: unti
> nit: until
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@238
PS17, Line 238:
> nit: prefer using // instead of /**/
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@244
PS17, Line 244: k to the appropriate range of rows and
> nit: how about "leftmost, non-leading primary key column with a predicate"
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@245
PS17, Line 245: . The only exception is when a scan is pr
> nit: how about something like "Value that the predicate column is predicate
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@246
PS17, Line 246: columns of a composite N-column primary key, where K < N,
> nit: how about "The collective columns to the left of the predicate column"
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@247
PS17, Line 247: r al
> nit: extra space, also maybe use : to separate these?
I switched to new lines for the 'definition' parts.


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@270
PS17, Line 270:
  :   //   "prefix key"
  :   // Any value i
> nit: now that we have some definitions above, we don't need to redefine the
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@276
PS17, Line 276: n order to keep track of the number of rows that satisfy the
  :   //   "predicate value" wrt a distinct prefix key we first 
seek to the first
  :   //   such row in the index column and this corresponding 
value is called
  :   //   "lower bound" key. Next, we seek to the row that is 
right after the last
  :   //   row that satisfies the "predicate value" wrt to the 
distinct prefix key,
  :   //   and this corresponding value is calle
> I think this is getting a bit too deep into the implementation details (in
Yep, I think that's a very good suggestion.

I think I'll just tuck that into the .cc file.  We can update the signature of 
those functions and re-structure those to have better semantics is a separate 
patch.


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@282
PS17, Line 282:   //
> nit: extra line
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@285
PS17, Line 285:   // By the definition (see above), the skip scan optimization i
> nit: "Seek to the next row that matches the predicate and has the same pref
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@333
PS17, Line 333: 'remaining' will always be at least 1, although it is a TODO to 
allow it
  :   // to be 0 (0 violates CHECK conditions elsewhere in the scan 
code).
  :   //
  :   // Currently, skip scan will be dynamically disabled when the 
number of seeks
  :   // for distinct prefix keys exceeds sqrt(#total rows). We use 
sqrt(#total rows)
  :   // as a cutoff because based on performance tests on upto 10M 
rows per tablet,
> nit: standardize the bullet types
Done


http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h@270
PS16, Line 270:
  :   //   "prefix key"
  :   // Any value i
> nit: these are no longer needed now that we have the definitions up front
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@675
PS17, Line 675: 2. Read that distinct prefix from the ad-hoc index, append the 
predicate value
  :   //and minimum possible value for all other columns, and 
seek to that.
  :   //If this matches, this is the lower bound of our desired 
scan.
  :   // 3. If we found our desired lower bound, find an upper 
bound fo
> nit: use " - " to bullet these two points
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@803
PS17, Line 803:   continue;
  : }
  :
> Hmm, not only are we not caching the seeked value, but we're searching for
Done


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc
File 

[kudu-CR] KUDU-1291: efficiently support predicates on non-prefix key components

2018-08-17 Thread Alexey Serbin (Code Review)
Alexey Serbin has uploaded a new patch set (#18) to the change originally 
created by Anupama Gupta. ( http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291: efficiently support predicates on non-prefix key 
components
..

KUDU-1291: efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, untill all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,343 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/18
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 18
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-17 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 17:

(10 comments)

Mostly a nit pass

http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@238
PS17, Line 238: /
nit: prefer using // instead of /**/


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@244
PS17, Line 244: leftmost (non-first) primary key column
nit: how about "leftmost, non-leading primary key column with a predicate"


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@245
PS17, Line 245: equality value on the "predicate column".
nit: how about something like "Value that the predicate column is predicated 
on. Since only equality predicates are supported, there is only a single value 
of interest."


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@246
PS17, Line 246:  all columns (together) to the left of the "predicate column".
nit: how about "The collective columns to the left of the predicate column"


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@247
PS17, Line 247:  -
nit: extra space, also maybe use : to separate these?


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@270
PS17, Line 270: prefix key refers to the first "num_prefix_cols" columns of the 
current key.
  :   // current key is the key currently pointed to by 
validx_iter_ (prior to calling
  :   // this function).
nit: now that we have some definitions above, we don't need to redefine these 
there.


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@276
PS17, Line 276: In this case, prior to calling key_iter->SeekAtOrAfter(), the 
search key is
  :   //modified such that the predicate column is set to the 
predicate value and the
  :   //succeeding columns are set to their minimum possible 
values. This is done to
  :   //make sure that after calling key_iter->SeekAtOrAfter() 
the key_iter is correctly
  :   //placed at the first occurrence of the row (if it 
exists) that matches the next
  :   //prefix key with the predicate value.
I think this is getting a bit too deep into the implementation details (in that 
it's referencing function names in the CFileReader class), and it's not 
immediately clear why setting `cache_seeked_value=true` should seek to a 
different value. Maybe there's a better parameter name that is more telling.

Also now that the term "prefix key" is defined to be the columns before the 
predicate, the name "SeekToNextPrefixKey" may be confusing when run with 
`skip_scan_predicate_column_id_ + 1`. Maybe we could change this to be a bit 
clearer.

 enum {
   UPPER,
   LOWER
 } BoundType;

 // If seeking for the lower bound, seeks to ...
 // If seeking for the upper bound, seeks to ...
 SeekToNextBound(BoundType bound_type);

Maybe it'd be fine with just a name change. Or comments that explain why we're 
seeking to what we're seeking.


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@282
PS17, Line 282:   //
nit: extra line


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@285
PS17, Line 285:   // Seek to the next predicate match within the current prefix.
nit: "Seek to the next row that matches the predicate and has the same prefix 
as that of `enc_key`"


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.h@333
PS17, Line 333: * key_iter_ is not NULL.
  :   //
  :   // Postconditions upon exiting this method:
  :   // 1. cur_idx_ is updated to the row_id of the next 
row(containing the next
  :   //higher distinct prefix) to scan.
  :   // 2. 'remaining' stores the number the entries to be scanned 
in the current scan range.
nit: standardize the bullet types



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 17
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 17 Aug 2018 23:08:45 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-17 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 17:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10983/16//COMMIT_MSG@19
PS16, Line 19: till
nit: until


http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/16/src/kudu/tablet/cfile_set.h@270
PS16, Line 270: prefix key refers to the first "num_prefix_cols" columns of the 
current key.
  :   // current key is the key currently pointed to by 
validx_iter_ (prior to calling
  :   // this function).
nit: these are no longer needed now that we have the definitions up front


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@675
PS17, Line 675: the primary key (PK) value index (ad-hoc index) search pointer, 
which gives us
  :   // a forward index of value to position, and
  :   // the scan pointer, which is cur_idx_, representing an 
offset one more than
  :   // the last row that we actually scanned (in the previous 
batch).
nit: use " - " to bullet these two points


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/cfile_set.cc@803
PS17, Line 803: // Note: To handle a similar situation (as illustrated with 
Tables above) when finding the
  : // upper bound key offset, we follow a different approach. 
We simply do not cache
  : // the seeked value for the upper bound key, hence 
cache_seeked_value = false below.
Hmm, not only are we not caching the seeked value, but we're searching for a 
different value than if `cache_seeked_value` were true, right? I know it's 
documented in SeekToNextPrefixKey(), but I think it's worth explaining here, 
since AFAICT this is the only place we're using `cache_seeked_value=false`. The 
comment in SeekToNextPrefixKey() explains this is "required for correctness," 
but here is an appropriate place to explain why.


http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/17/src/kudu/tablet/index_skipscan-test.cc@589
PS17, Line 589: const int num_matching = GenerateData(random, num_rows, 
predicate_col_id, predicate_val);
  :
Since this is generating random data, and running a scan for equality on a 
single value, wouldn't it be pretty unlikely that `num_matching` is non-zero? 
Maybe we could inflate the probability that `predicate_val` shows up by adding 
a `inject_predicate_val_probability` parameter to GenerateData() that will 
inject `predicate_val` to the `predicate_col_id` column with some probability.

Looking at GenerateData(), it seems like generated data is capped between [0, 
1000), so I don't think this will ever return rows? Same 
kMiddleColumnMinKeyValue.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 17
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 17 Aug 2018 22:13:33 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-17 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#17).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,261 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/17
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 17
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-16 Thread Anupama Gupta (Code Review)
Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 15:

(39 comments)

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@792
PS14, Line 792: // Currently this block holds encoded composite key.
  :   Arena arena(16*1024);
  :
  :   DCHECK(!prepared_blocks_.empty());
  :
> nit: As a general rule of thumb, if something is indicative of a programmer
Thanks for this pointer. I have removed the DCHECK() in this case.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@797
PS14, Line 797: us
> nit: I know some of the older code has this reversed, but for new code, we
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871
PS14, Line 871:   last_prepare_idx_ = b->first_row_idx() + 
b->dblk_->GetCurrentIndex();
  :   last_prepare_count_ = 0;
  :
  :   p
> Not sure if this is necessary to avoid breaking encapsulation, as I think w
ACK.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871
PS14, Line 871:   last_prepare_idx_ = b->first_row_idx() + 
b->dblk_->GetCurrentIndex();
  :   last_prepare_count_ = 0;
  :
  :   p
> It's worth documenting in the header the side effect of `cache_seeked_value
If `cache_seeked_value` is true, then a call to StoreCurrentValue() will not 
seek an extra row. Instead, due to the side effect of using 
CopyNextValues(size_t *n, ColumnDataView *dst) the first value from the last 
prepared PK column block will be fetched into 'dst'.

I have now documented this in the header file.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@790
PS15, Line 790: Todo
> nit: s/Todo/TODO/
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@793
PS15, Line 793: 16*1024
> Why did we go from using a constant back to using a magic number? http://wi
Made this a member variable of CFileIterator as a followup of the review 
comment in cfile_set.cc (L553). After discussing with Alexey, we came to the 
conclusion that it is safe to initialize Arena with an initial buffer size of 
1M. If required, the Arena size grows wrt the data being allocated to it.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@795
PS15, Line 795:   DCHECK(!prepared_blocks_.empty());
  :   if (prepared_blocks_.empty()) {
  : return Status::NotFound("blocks not found");
  :   }
> This has a code smell. https://martinfowler.com/bliki/CodeSmell.html In deb
Yes it is possible. I have removed DCHECK and handled this condition by 
returning Status::NotFound.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc
File src/kudu/common/encoded_key.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc@92
PS14, Line 92:gscoped_ptr 
*key,
 :Arena* arena, int32_t 
num_columns
> Per the GSG (https://google.github.io/styleguide/cppguide.html):
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@175
PS15, Line 175: // Due to the caveat(see below) listed for 
SkipToNextScan(size_t *remaining),
  : // this class should not reference a separate "client" class 
that uses key_iter->cur_val.
> I find this part of the comment confusing and I think we should just remove
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@248
PS15, Line 248: Caveat:
> I don't think we need to specifically call this out as a caveat
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@249
PS15, Line 249: key_iter->cur_val_
> How about just key_iter_? Is that sufficient? Talking about a private membe
Ok. Makes sense. Changed this to just key->iter_.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@362
PS15, Line 362: skip_scan_num_seeks_cutoff
> missing underscore suffix on member variable
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

PS14:
> nit: s/unique prefix/distinct prefix
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@174
PS14, Line 174: //
  : // Due to the caveat(see below) listed for 
SkipToNextScan(size_t 

[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-16 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#16).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,261 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/16
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 16
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@551
PS12, Line 551: atus CFileSet::Iterator::SeekToNextPrefixKey(size_t 
num_prefix_cols, bool cache_seeked_value) {
  :   gscoped_ptr enc_key;
  :   Arena arena(kEncodedCompositeKeyMaxSize);
> Might be helpful: from util/memory/arena.h and the definition of Reset():
+1

Using dedicated Arena per iterator instance sounds good to me.  It seems we can 
easily reset it in the long loop of the SkipToNextScan() method.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@770
PS14, Line 770: Check to see whether we have effectively seeked backwards
> We effectively have two pointers we are tracking with skip-scan: the primar
Adding a comment would definitely help.  That might be a comment for the 
method, as well as an extra for already existing high-level description of the 
algorithm at line 656.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@557
PS15, Line 557: // columns of the cached value obtained after
> I think it's possible to pass an instance of arena from the upper level to
And the best approach I think is suggested by Andrew -- have an instance of 
Arena as a member of CFileSet::Iterator.  Probably, it's necessary to reset it 
after use in every cycle of the loop in SkipToNextScan() method.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 14
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 16 Aug 2018 07:24:09 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 15:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@557
PS15, Line 557: Arena arena(FLAGS_max_encoded_key_size_bytes);
I think it's possible to pass an instance of arena from the upper level to use 
it here, like it's done in DecodeCurrentKey().

BTW, that parameter for the Arena constructor -- it's just initial size for the 
arena, it can grow bigger up to the allowed max size (by default 127 * 8KB, 
i.e. almost 1 MB).  However, setting it up to FLAGS_max_encoded_key_size_bytes 
is a good enough (arena grows 2x times when needed).


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@571
PS15, Line 571: (
nit: this and the pairing brace are not needed.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@597
PS15, Line 597: KuduPartialRow *p_row,
  : const gscoped_ptr& cur_enc_key
style nit: usually we have input parameter specified first, then go output 
parameters.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@603
PS15, Line 603: auto const 
style nit: const auto& value


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@606
PS15, Line 606: p_row->Set(col_id, data);
What if this returns non-OK?


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@611
PS15, Line 611: p_row->Set(skip_scan_predicate_column_id_, suffix_col_value);
ditto


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@619
PS15, Line 619: (
nit: this and the closing parenthesis are not needed


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@742
PS15, Line 742: CheckPredicateMatch(lower_bound_key);
I'm not sure this is what is needed: CheckPredicateMatch() verifies that only 
the first column of the predicate matches, but we want all columns of the 
predicate to match, no?  If it's what we actually need, maybe add a comment 
that the lower  bound key is just about the match of the first column of the 
predicate.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@811
PS15, Line 811: ERROR
I think this should be WARNING, because there are some legit cases where 
SkipToNextScan() can return non-OK.  E.g.:

E0815 21:19:24.748093  1681 cfile_set.cc:811] Skip scan failed: Illegal state: 
No lexicographically greater key exists



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 15
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 16 Aug 2018 06:49:03 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-15 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 15:

(4 comments)

A bit more feedback for the test.

http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

PS15:
We discussed that offline, but just to re-iterate:

It would be nice to add a few scenario where cardinality of predicate-matching 
rows is close to the number of all rows inserted, where number of inserted rows 
would be in order of 10^3.  The idea is to exercise SkipToNextScan() method's 
logic more often.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc@18
PS15, Line 18: #include 
nit: please use c++ style include:

#include 


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc@106
PS15, Line 106: s.IsAlreadyPresent
What if num_matching was incremented, but the IsAlreadyPresent() was reported 
on an attempt to insert a row?  Maybe, num_matching should be incremented only 
in case if row was inserted?


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc@129
PS15, Line 129: {
nit: here and elsewhere under this switch -- remove those braces, they are not 
needed.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 15
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 16 Aug 2018 00:03:33 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-15 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 15:

(1 comment)

This is a drive-by review; I probably won't check in for another several days

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871
PS14, Line 871:   last_prepare_idx_ = b->first_row_idx() + 
b->dblk_->GetCurrentIndex();
  :   last_prepare_count_ = 0;
  :
  :   p
> It's worth documenting in the header the side effect of `cache_seeked_value
Not sure if this is necessary to avoid breaking encapsulation, as I think we 
can call the whole key_iter_ reserved by the Iterator class and ignore the 
private member; see my comment re. key_iter_ on Rev 15



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 15
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 15 Aug 2018 17:31:00 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-15 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 15:

(8 comments)

Did a cursory review pass, nice to see the improvements, looking good to me

http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@790
PS15, Line 790: Todo
nit: s/Todo/TODO/


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@793
PS15, Line 793: 16*1024
Why did we go from using a constant back to using a magic number? 
http://wiki.c2.com/?MagicNumber

We should use the gflag we are using in the other file here as well.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@795
PS15, Line 795:   DCHECK(!prepared_blocks_.empty());
  :   if (prepared_blocks_.empty()) {
  : return Status::NotFound("blocks not found");
  :   }
This has a code smell. https://martinfowler.com/bliki/CodeSmell.html In debug 
mode, we make it impossible to hit this condition, yet in release mode, we 
handle it. That means it is impossible to have test coverage of this situation, 
which is bad. We should pick one or the other: either this is impossible, i.e. 
a code error, in which case just use a CHECK, or it's possible, and just handle 
it by returning Status::NotFound.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@175
PS15, Line 175: // Due to the caveat(see below) listed for 
SkipToNextScan(size_t *remaining),
  : // this class should not reference a separate "client" class 
that uses key_iter->cur_val.
I find this part of the comment confusing and I think we should just remove it


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@248
PS15, Line 248: Caveat:
I don't think we need to specifically call this out as a caveat


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@249
PS15, Line 249: key_iter->cur_val_
How about just key_iter_? Is that sufficient? Talking about a private member of 
a different class breaks encapsulation for that class.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@362
PS15, Line 362: skip_scan_num_seeks_cutoff
missing underscore suffix on member variable


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@770
PS14, Line 770: reak;
> How does this happen? Could you elaborate a bit? What does seeking backward
We effectively have two pointers we are tracking with skip-scan: the primary 
key (PK) value index (ad-hoc index) search pointer, which gives us a forward 
index of value to position, and the scan pointer, which is cur_idx_, 
representing an offset one more than the last row that we actually scanned.

In this case, we searched the PK and came up with a position lower than our 
scan offset, which means even though we tried to search forward relative to our 
previous PK lookups, we ended up landing at an offset that we'd already 
scanned. The awkward reason we have to deal with this possibility is because we 
don't have a reverse PK index from offset (position) to PK (value), otherwise 
we'd .

Maybe we should try to explain the above somewhere in a comment, although I'm 
not sure of the best place to discuss it. Maybe near the top of this method in 
the .cc file around line 665?



-- 
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 15
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 15 Aug 2018 17:27:12 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-15 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 15:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@792
PS14, Line 792: // Currently this block holds encoded composite key.
  :   Arena arena(16*1024);
  :
  :   DCHECK(!prepared_blocks_.empty());
  :
nit: As a general rule of thumb, if something is indicative of a programmer 
error, we'll add a CHECK or DCHECK, whereas if there's an issue with the 
underlying data, we'll return an error.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@797
PS14, Line 797: us
nit: I know some of the older code has this reversed, but for new code, we 
favor placing the sigil immediately after the var name (PreparedBlock* pblk), 
here and elsewhere.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871
PS14, Line 871:   last_prepare_idx_ = b->first_row_idx() + 
b->dblk_->GetCurrentIndex();
  :   last_prepare_count_ = 0;
  :
  :   p
It's worth documenting in the header the side effect of `cache_seeked_value`, 
i.e. if it is set to true, this call to StoreCurrentValue() will seek an extra 
row, right?

Another approach I chatted with you about before was that maybe we can avoid 
pushing this bookkeeping so low by slightly adjusting the APIs:

 Status SeekAtOrAfter(const EncodedKey& encoded_key, bool* exact_match);  // 
Same behavior as before this patch.
 Status ConsumeNextValue(string* val);  // Combines StoreCurrentValue() and 
GetCurrentValue()

Then, callers (i.e. CFileSet::Iterator) could use key_iter->ConsumeNextValue() 
immediately after SeekAtOrAfter() instead of doing SeekAtOrAfter(/* 
cache_seeked_values=*/true), the CFileIterator::cur_val_ member could be moved 
to the CFileSet::Iterator with a more telling name, like 'skip_scan_cur_key_' 
or somesuch, and we wouldn't have to have caveats and comments explaining the 
restricted access to cur_val_.

Might be a pretty substantial change, but it might with the broken 
encapsulation. WDYT?


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc
File src/kudu/common/encoded_key.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc@92
PS14, Line 92:gscoped_ptr 
*key,
 :Arena* arena, int32_t 
num_columns
Per the GSG (https://google.github.io/styleguide/cppguide.html):
nit: spacing
nit: output parameters should come after input parameters


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

PS14:
nit: s/unique prefix/distinct prefix


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@174
PS14, Line 174: //
  : // Due to the caveat(see below) listed for 
SkipToNextScan(size_t *remaining),
  : // this class should not reference a separate "client" class 
that uses key_iter->cur_val.
Hmm, I'm not sure what this means. See my comment in cfile_reader.cc re: an 
alternate approach.

Also nit: add a space in "caveat (see"


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@208
PS14, Line 208:
  :   // Decode the currently-seeked key into 'enc_key'.
  :   Status
Before getting into the nitty gritty of each method, I think it'd be helpful to 
start out by adding a block comment with:
- a brief, high-level overview of what a skip scan is
- definitions of "predicate column", "prefix columns", "prefix keys", in the 
context of a skip scan
- definitions of a lower and upper bound in the context of a skip scan, and how 
they're used when seeking

With the definitions in one place, it'd be easier to read about, and you 
wouldn't have to sprinkle definitions throughout the below comments. You've 
already got a good base at L238.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@209
PS14, Line 209: // Decode the currently-seeked key into 'enc_key'.
  :   Status DecodeCurrentKey(Arena* arena, 
gscoped_ptr* enc_key);
  :
  :   // This function is used to place the validx_iter_ at the 
next greater prefix key.
  :   // prefix key refers to the first "num_prefix_cols" columns 
of the current key.
  :   // current key is the key currently pointed to by 
validx_iter_ (prior to calling
  :   // this function).
  :   // If 'cache_seeked_value' is true, the validx_iter_ will 
store the value seeked to.
  :   Status SeekToNextPrefixKey(size_t 

[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-14 Thread Anupama Gupta (Code Review)
Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 15:

(7 comments)

I have addressed all the open comments. PTAL.

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59
PS10, Line 59:
> Yep, it would be nice to have a test with super-wide primary key (multiple
Added a test case where the encoded primary key size is equal to the max 
allowable encoded PK size (16MB), stored as max_encoded_key_size_bytes.

Also, verified that any insertions that violate this size constraint results in 
failure.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@444
PS12, Line 444: col_id =
> If that's something we expect to happen, I would simply disable the skip sc
Done


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@462
PS12, Line 462:   }
  :
  :   if (nonfirst_key_pred_exists) {
  : use_skip_scan_ = true;
  :
  : // Store the predicate column id.
  : skip_scan_predicate_column_id_ = min_col_id;
  :
  :
> I think it's better to keep it as Andrew requested.  After some considerati
Sure. Done.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@414
PS14, Line 414: id CFileSet::Ite
> nit: since this is not to change in the code below, add 'const'
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@576
PS14, Line 576:
> style nit: keep the brace with the name of the method
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@699
PS14, Line 699:
  :   skip_scan_num_seeks_++
> I think it makes sense to introduce a metric to capture cases when skip sca
Added this info in VLOG(1).


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/index_skipscan-test.cc@189
PS14, Line 189: break;
  :   }
  :   case kMaxSizedEncodedKey: {
  : CHECK_OK(builder.AddKeyColumn("P1", STRING));
  : CHECK_OK(builder.AddKeyColumn
> nit: all there might be constexpr:
Done



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 15
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 15 Aug 2018 04:55:52 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-14 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#15).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,069 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/15
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 15
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-14 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59
PS10, Line 59: static const int kEncodedCompositeKeyMaxSize =
> Can we test what happens when someone violates this?
Yep, it would be nice to have a test with super-wide primary key (multiple 
columns, big cells) and try to run scans with skip scan enabled.  As for the 
desired behavior, at least tservers should not crash, and the logic of the skip 
scan optimizations should behave predictably, not outright failing the whole 
scan operation.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@699
PS14, Line 699: use_skip_scan_ = false;
  : return Status::OK();
I think it makes sense to introduce a metric to capture cases when skip scan 
decided to bail due to the seek cutoff condition.

For examples, please looks for examples in the code: check for  
METRIC_DEFINE_gauge_int64 and INSTANTIATE_METRIC patterns.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 14
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 14 Aug 2018 20:18:45 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-14 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59
PS10, Line 59: static const int kEncodedCompositeKeyMaxSize =
> Refactored this name to: kEncodedCompositeKeyMaxSize as it makes more sense
Can we test what happens when someone violates this?



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 14
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 14 Aug 2018 14:43:04 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-13 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@444
PS12, Line 444: col_id ==
> Yes, if the column is not found in the schema. Not quite sure whether to co
If that's something we expect to happen, I would simply disable the skip scan 
and break out of the cycle, adding a warning log message about wrong column 
name in the predicate.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 14
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 14 Aug 2018 05:52:36 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-13 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 14:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@462
PS12, Line 462:
  : // Store the predicate column id.
  : skip_scan_predicate_column_id_ = min_col_id;
  :
  : // Store the predicate value.
  : skip_scan_predicate_value_ = pred_value;
  :
  : // Store the cutoff on the number of skip scan seeks.
  :
> Alexey, please let me know if you feel that this change should be reverted.
I think it's better to keep it as Andrew requested.  After some consideration I 
realized that's clearer because it's moved out of the cycle at least.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@414
PS14, Line 414: int num_key_cols
nit: since this is not to change in the code below, add 'const'


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@576
PS14, Line 576: (
style nit: keep the brace with the name of the method


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@321
PS12, Line 321: auto pred_p1 = 
ColumnPredicate::Equality(schema_.column(0), _p1);
  : auto pred_p2 = 
ColumnPredicate::Equality(schema_.column(1), _p2);
  : spec.AddPredicate(pred_p1);
  : spec.AddPredicate(pred_p2);
  : vector results;
  : NO_FATALS(ScanTablet(, , "Exact match on 
P1 and P2"));
  : EXPECT_EQ(1, results.size());
  :   }
  :   break;
  :
  : c
> Changed to a more uniform layout. Please let me know if this looks good.
thanks, lgtm


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

PS14:
This seems to be a good set of unit tests regarding testing scan using 
different combinations primary key columns.

How about adding a simple unit test to verify that skip scan is 
enabled/disabled in cfileset iterator?  I.e., I was thinking to verify 
functionality of  CFileSet::Iterator::TryEnableSkipScan().

What do you think?


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/index_skipscan-test.cc@189
PS14, Line 189: const int32_t kNumDistinctP1 = 2;
  : const int16_t kNumDistinctP2 = 10;
  : const int kNumDistinctP3 = 5;
  : const int8_t kNumDistinctP4 = 1;
  : const int8_t kNumDistinctP5 = 20;
nit: all there might be constexpr:

constexpr int32_t kNumDistinctP1 = 2;
...



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 14
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 14 Aug 2018 05:47:54 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-11 Thread Anupama Gupta (Code Review)
Anupama Gupta has abandoned this change. ( 
http://gerrit.cloudera.org:8080/11067 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Abandoned

This is an obsolete branch.
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 28
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-11 Thread Anupama Gupta (Code Review)
Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 14:

(25 comments)

Please review the changes.

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@345
PS12, Line 345: bool cache_seeked_value
> nit: Even though this is calling StoreCurrentValue(), I think calling this
Done


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@348
PS12, Line 348:   const std::string& GetCurrentValue() const;
> Add 'const' modifier for this method.
Done


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@472
PS12, Line 472:   // Value currently pointed to by validx_iter_.
  :   std::string cur_val_;
  :
> +1
Andrew, in case we have a single public function GetCurrentValue(), it would 
actually lead to different results because it uses CopyNextValues(size_t *n, 
ColumnDataView *dst) . CopyNextValues(..) fetches the next 'n' values from the 
block into 'dst' . Hence, the need to store this fetched value in 'cur_val_' .


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@473
PS10, Line 473:   std::string cur_val_;
  :
> This breaks encapsulation. We should document something along these lines i
Listed this caveat in CFileSet::Iterator . Please let me know if it fine.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.cc@792
PS12, Line 792: DCHECK(!prepared_blocks_.empty());
> What happens if this does not hold in release build?
Makes sense. Added return non-OK status (Status::NotFound) in this case.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h@213
PS12, Line 213: // prefix key refers to the first "num_prefix_cols" columns of 
the current key.
  :   // current key is
> nit: Reword as "If `cache_seeked_value` is true, the predicate column itera
Rephrased as: "If 'cache_seeked_value' is true, the validx_iter_ will store the 
value seeked to." , since the iterator is a validx_iter_ .


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h@220
PS12, Line 220: Status SeekToRowWithCurPrefixMatchingPred(const 
gscoped_ptr& enc_key);
  :
> nit: reword as "Build the key with the same prefix as `cur_enc_key`, that h
Done


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@19
PS12, Line 19: #include  nit: please use '#include ' instead and place that among other C++-s
Done


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@76
PS12, Line 76:
> Per our discussion with Mike today, I thought the idea was to have this dis
You are right. I assumed this will be disabled right before merging (in the 
final patch). Set this to 'false' now, as now we are testing both the scenarios 
in index_skipscan_test.cc


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@435
PS12, Line 435:
> nit: you could use 'auto' here.
Removed predicates var as per comment on L436.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@436
PS12, Line 436: spec.predi
> nit: this can just be `spec.predicates()`, then there would be no need for
Done


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@440
PS12, Line 440: // Get the column id from the predicate
> nit: move this out of the loop, maybe up by L413 and then use it in definin
Done


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@444
PS12, Line 444: col_id ==
> Could find_column() return -1 here?
Yes, if the column is not found in the schema. Not quite sure whether to 
continue the loop in this case or just return with skip scan disabled. Any 
suggestion ?


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@462
PS12, Line 462:
  : // Store the predicate column id.
  : skip_scan_predicate_column_id_ = min_col_id;
  :
  : // Store the predicate value.
  : skip_scan_predicate_value_ = pred_value;
  :
  : // Store the cutoff on the number of skip scan seeks.
  :
> It was originally written as described, but I asked her to make the change,
Alexey, please let 

[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-11 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#14).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,023 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/14
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 14
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-11 Thread Anupama Gupta (Code Review)
Anupama Gupta has removed Alexey Serbin from this change.  ( 
http://gerrit.cloudera.org:8080/11067 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Removed reviewer Alexey Serbin.
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 28
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-10 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11067 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 28:

> Removed reviewer Mike Percy.

I'm not sure I understand why this changelist is still around.  Isn't it 
obsoleted by https://gerrit.cloudera.org/#/c/10983/  ?

Do you expect any reviews to be done here?  Or the 'primary one' is 
https://gerrit.cloudera.org/#/c/10983/  ?


--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 28
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 11 Aug 2018 04:07:54 +
Gerrit-HasComments: No


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-10 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#28).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,023 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/28
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 28
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-10 Thread Anupama Gupta (Code Review)
Anupama Gupta has removed Mike Percy from this change.  ( 
http://gerrit.cloudera.org:8080/11067 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Removed reviewer Mike Percy.
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 27
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-10 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#27).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,023 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/27
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 27
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-08 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11067 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 26:

This should be abandoned in favor of https://gerrit.cloudera.org/c/10983/ since 
that is the change request that has all the review comment history.

In order to post to the other one, you must put the following in the commit 
message:

  Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f

Then click the "Abandon" button on this CR.


--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 26
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 08 Aug 2018 11:44:31 +
Gerrit-HasComments: No


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-08 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 13:

(13 comments)

Went through a few more files; leaving what I have so far.

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@345
PS12, Line 345: bool set_current_value
nit: Even though this is calling StoreCurrentValue(), I think calling this 
`cache_seeked_value` makes it this a bit clearer.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@472
PS12, Line 472:   // Value currently pointed to by validx_iter_.
  :   // Currently, it is assumed that 
CFileSet::Iterator::SkipToNextScan(size_t *remaining)
  :   // has exclusive access to use this variable.
> Any chance to make update this to keep better level of encapsulation?
+1

I might be missing something, but would we be able to only have a single public 
function GetCurrentValue() that does what StoreCurrentValue() does, but has a 
string as an outparameter? Not sure if this would yield the different results 
than if we were to continue storing via 'set_current_value'.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h@213
PS12, Line 213: // 'cache_seeked_value' controls whether we will remember the 
next key seeked to in
  :   // this function.
nit: Reword as "If `cache_seeked_value` is true, the predicate column iterator 
will store the value seeked to.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.h@220
PS12, Line 220: // Builds a key containing the current prefix key, predicate 
column value
  :   // and the minimum value for rest of the key columns.
nit: reword as "Build the key with the same prefix as `cur_enc_key`, that has 
`skip_scan_predicate_value_` in its predicate column, and the minimum possible 
value for all other columns.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@436
PS12, Line 436:
nit: this can just be `spec.predicates()`, then there would be no need for a 
`predicates` var


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@440
PS12, Line 440: const string& col_name = col_and_pred.first;
nit: move this out of the loop, maybe up by L413 and then use it in defining 
num_key_cols.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@462
PS12, Line 462:   }
  :
  :   if (nonfirst_key_pred_exists) {
  : use_skip_scan_ = true;
  :
  : // Store the predicate column id.
  : skip_scan_predicate_column_id_ = min_col_id;
  :
  :
> nit: why not to move this under the same scope where the nonfirst_key_pred_
It was originally written as described, but I asked her to make the change, 
reason being it seems a bit off to update these variables if we end up 
disabling the skip scan. Not sure my concern is important, since these 
variables (I think) are unused if we're not skip-scanning. I'd be ok reverting 
if you prefer, but I agree the semantics should be updated.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@551
PS12, Line 551:
  : Status CFileSet::Iterator::SeekToNextPrefixKey(size_t 
num_prefix_cols, bool cache_seeked_value) {
  :   gscoped_ptr enc_key;
Hmm.. This is doing a lot of heap-allocating and gets called _very often_. I 
wonder if we could keep a single Arena or unique_ptr as a member of the 
iterator,  though I'm not sure our Arena implementation is reusable in that 
way. Might be worth looking into if you notice high memory usage in your 
cluster testing


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@561
PS12, Line 561: _key, , num_prefix_cols));
  :
  :   if (!cache_seeked_value) {
  : return key_iter_->SeekAtOrAfter(*enc_key,
  : /* set_current_value= */ cache_seeked_value,
  : /* exact_match= */ nullptr);
  :   }
  :
  :   // Set the predicate column to the predicate value in case we 
can find a
  :   // predicate match in one search. As a side effect, 
GetKeyWithPredicateVal()
  :   // sets minimum values on the columns after the predicate 
value, which is
  :   // required for correctness here.
  :   KuduPartialRow partial_row(&(base_data_->tablet_schema()));
  

[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-07 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 12:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@348
PS12, Line 348:   const std::string& GetCurrentValue();
Add 'const' modifier for this method.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.h@472
PS12, Line 472:   // Value currently pointed to by validx_iter_.
  :   // Currently, it is assumed that 
CFileSet::Iterator::SkipToNextScan(size_t *remaining)
  :   // has exclusive access to use this variable.
Any chance to make update this to keep better level of encapsulation?

If not, maybe add TODO here to be more aware of this caveat.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/cfile/cfile_reader.cc@792
PS12, Line 792: DCHECK(!prepared_blocks_.empty());
What happens if this does not hold in release build?

I'm curious why that would not be an option to return non-OK status in that 
case in addition to DCHECK() ?


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@19
PS12, Line 19: #include 
nit: please use '#include ' instead and place that among other C++-style 
include directives below, keeping the list sorted alphabetically.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@76
PS12, Line 76: true
Per our discussion with Mike today, I thought the idea was to have this 
disabled in first commit before more-like-real-world testing is done.

Am I missing something?


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@435
PS12, Line 435: std::unordered_map
nit: you could use 'auto' here.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@444
PS12, Line 444: col_id =
Could find_column() return -1 here?


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@462
PS12, Line 462:   if (nonfirst_key_pred_exists) {
  : use_skip_scan_ = true;
  :
  : // Store the predicate column id.
  : skip_scan_predicate_column_id_ = min_col_id;
  :
  : // Store the predicate value.
  : skip_scan_predicate_value_ = pred_value;
  :   }
nit: why not to move this under the same scope where the 
nonfirst_key_pred_exists set to 'true' but after the 'if (col_id < min_col_id)' 
closure?  Doing so, you could get rid of the nonfirst_key_pred_exists, but 
don't forget to add a comment about the semantics of those updates.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@542
PS12, Line 542:  *
nit: stick the asterisk to the type of the parameter.


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@92
PS12, Line 92: s
what it was any other error but IsAlreadyPresent?


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@270
PS12, Line 270: int schema_num = static_cast(GetParam());
Does it make sense to add additional dimension to the test: whether the 
skip-scan feature enabled?  In that way we can make sure that the predicates 
return the same result set in both cases, and that seems to be a good sanity 
check.

For an example of combining multiple parameters in a Cartesian product see 
various tests containing text 'testing::Combine'


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@280
PS12, Line 280: ASSERT_NO_FATAL_FAILURE
nit: you could use NO_FATALS() shortcut instead for better readability


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/index_skipscan-test.cc@321
PS12, Line 321: case kThreePK: {
  : // Test predicate on the third PK column.
  : ScanSpec spec;
  : Slice value_p3("2_p3");
  : auto pred_p3 = 
ColumnPredicate::Equality(schema_.column(2), _p3);
  : spec.AddPredicate(pred_p3);
  : vector results;
  : ASSERT_NO_FATAL_FAILURE(ScanTablet(, , 
"Exact match on P3"));
  : EXPECT_EQ(20, results.size());
  :   break;
  : }
nit: use the same layout of scopes and braces as in other cases (that's for 
better readability).



--
To view, visit 

[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-07 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#13).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,018 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/13
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 13
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-07 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#26).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,018 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/26
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 26
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-07 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#25).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,016 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/25
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 25
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-07 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#12).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,017 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/12
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 12
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-07 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#24).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,017 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/24
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 24
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-07 Thread Anupama Gupta (Code Review)
Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@470
PS10, Line 470: Status StoreCurrentValue();
  :
> super-nit: in reading the name SetX(), based on other usages in the codebas
Done


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@353
PS10, Line 353:   // Whether the skip scan optimization has searched the 
current prefix for a predicate match
  :   // or whether the prefix has changed since its l
> How about change this to:
Done



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 11
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 07 Aug 2018 23:25:16 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-07 Thread Anupama Gupta (Code Review)
Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 11:

(41 comments)

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59
PS10, Line 59: static const int kEncodedCompositeKeyMaxSize =
> Great question. This is from http://kudu.apache.org/docs/known_issues.html#
Refactored this name to: kEncodedCompositeKeyMaxSize as it makes more sense wrt 
our skip scan use case.


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@339
PS10, Line 339: to store the value obtaine
> nit: note that the "current value" refers to the value after seeking.
Done


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/encoded_key.h
File src/kudu/common/encoded_key.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/encoded_key.h@63
PS10, Line 63: ;
> nit: what does this do?
Created another function IncrementEncodedKeyColumns(..) for better separation 
of responsibility and documented this parameter.


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/partial_row.h@52
PS10, Line 52: mplate struct SliceTypeRowOps; // IWYU 
pragma: keep
 : template struct NumTypeRowOps;   // 
IWYU pragma: keep
> nit: not your fault, but mind removing the spaces here?
Done


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@207
PS10, Line 207: gscoped_ptr
> nit: if the change would be contained within cfile_set.cc, mind updating th
Noted. Sticking with gscoped_ptr for now since it's used outside cfile_set.cc 
(in encode_key.cc).


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@209
PS10, Line 209: refix key.
> nit: Since this isn't a variable name, maybe take out the underscore.
Alright. Changed here and elsewhere.


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@210
PS10, Line 210: um_prefix_
> nit: 'num_prefix_cols'
Done


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@210
PS10, Line 210:   // prefix key refers to the first "num_prefix_cols" columns 
of the current key.
  :   // current key is the key currently pointed to by 
validx_iter_ (prior to calling SeekToNextPrefixKey()).
  :   // 'cache_seeked_value' controls whether we will remember the 
next key seeked to in
  :   // this function.
> nit: can you reword this to clarify whether "current" refers to before or a
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@76
PS9, Line 76: true
> Let's default this to false for the initial merge and do some testing to en
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@80
PS9, Line 80:  "Max number of skip attempts the skip scan 
optimization should make before "
> Add:
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@405
PS9, Line 405: e
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@424
PS9, Line 424:
> nit: add spaces around your operators on this line, both the assignment = a
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@430
PS9, Line 430: key_cols;
> nit: move this comment inside the 'if' to right below this line
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@436
PS9, Line 436: nst auto& co
> nit: unnecessary parentheses
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@437
PS9, Line 437:
> nit: put this comment on its own line
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@444
PS9, Line 444: p);
> nit: unnecessary parentheses
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@522
PS9, Line 522: << KUD
> nit: remove so much extra horizontal white space on this line
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@531
PS9, Line 531: "
> s/or/which we consider to be/
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@532
PS9, Line 532:
> currently-seeked key in the primary key index
For better clarity, changed this to -  "cached value obtained after the 
previous seek in the primary key index".


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@576
PS9, Line 576: }
> nit: just 

[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-07 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#11).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
10 files changed, 1,016 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/11
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 11
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-07 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59
PS10, Line 59: static const int kPrimaryKeyMaxSize = 16*1024;
> nit: is this a hard limit or just a target? How is this enforced?
Great question. This is from 
http://kudu.apache.org/docs/known_issues.html#_primary_keys and we haven't 
investigated how it is enforced yet or whether there is another constant 
available for this, although I grepped around and didn't find one.


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@473
PS10, Line 473:   // Currently, it is assumed that 
CFileSet::Iterator::SkipToNextScan(size_t *remaining)
  :   // has exclusive access to use this variable.
This breaks encapsulation. We should document something along these lines in 
CFileSet::Iterator itself with respect to the validx_iter_ variable; this class 
should not reference a separate "client" class that uses it, which would be 
especially surprising considering this is a private variable.


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@353
PS10, Line 353:   // To track whether the lower bound prefix key rolled over 
between the first and
  :   // second seek to determine the lower bound key.
How about change this to:

  // Whether the skip scan optimization has searched the current prefix for a 
predicate match or whether the prefix has changed since its last check.



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 10
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 07 Aug 2018 19:43:24 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-06 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 10:

(14 comments)

I've only been through a few files, but here are some nits so far. Seems there 
are a few open points from Mike too.

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59
PS10, Line 59: static const int kPrimaryKeyMaxSize = 16*1024;
nit: is this a hard limit or just a target? How is this enforced?


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@339
PS10, Line 339: to store the current value
nit: note that the "current value" refers to the value after seeking.


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@470
PS10, Line 470: Status SetCurrentValue();
  :
super-nit: in reading the name SetX(), based on other usages in the codebase 
(grep -r "Set\w" in src/kudu), I kind of expect the contract of some SetX(x) to 
set some internal variable to x. Mind changing this to something like 
StoreCurrentValue or CacheCurrentValue?


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/encoded_key.h
File src/kudu/common/encoded_key.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/encoded_key.h@63
PS10, Line 63:  int num_columns = 0
nit: what does this do?


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/partial_row.h@52
PS10, Line 52: template struct SliceTypeRowOps; // 
IWYU pragma: keep
 :   template struct NumTypeRowOps;   // 
IWYU pragma: keep
nit: not your fault, but mind removing the spaces here?


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@207
PS10, Line 207: Arena* arena
nit: note that (I think) arena stores the buffer containing with the encoded key


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@207
PS10, Line 207: gscoped_ptr
nit: if the change would be contained within cfile_set.cc, mind updating the 
newly added occurrences of gscoped_ptr to unique_ptr?


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@209
PS10, Line 209: prefix_key
nit: Since this isn't a variable name, maybe take out the underscore.


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@210
PS10, Line 210: "num_cols"
nit: 'num_prefix_cols'


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@210
PS10, Line 210:   // "prefix_key" refers to the first "num_cols" columns of the 
current key
  :   // (current key is the key currently pointed to by 
validx_iter_).
  :   // 'cache_seeked_value' controls whether we will remember the 
key seeked to
  :   // in the "current value" of the iterator.
nit: can you reword this to clarify whether "current" refers to before or after 
the seek?


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc@413
PS10, Line 413: !(lower_bound_idx_ == 0 && upper_bound_idx_ == row_count_)
nit: IMO slightly easier to read as (lower_bound_idx_ != 0 || upper_bound_idx 
!= row_count_)


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc@423
PS10, Line 423: const std::unordered_map 
 = spec.predicates();
  :   for (auto it=predicates.begin(); it!=predicates.end(); ++it) {
Merge these into something like:

 for (const auto& col_and_pred: predicates) {
   const string& col_name = col_and_pred.first;
   const ColumnPredicate& pred = col_and_pred.second;
   const Schema& schema = base_data_->tablet_schema();
   // ... use `col_name`, `pred`, and `schema`
 }


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc@441
PS10, Line 441: skip_scan_predicate_column_id_ = col_id;
  :
  : // Store the predicate value.
  : skip_scan_predicate_value_ = (it->second).raw_lower();
I think it's generally good hygiene that these don't get modified until after 
we know we're enable skip-scanning, like we do for `use_skip_scan_`. Though 
maybe it doesn't matter if these aren't going to be used by the iterator anyway?


http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc@509
PS10, Line 509:   TryEnableSkipScan(*spec);
nit: I know it doesn't make a difference execution-wise, but I think this makes 
more sense up by Init() at L394



--
To view, visit http://gerrit.cloudera.org:8080/10983

[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-06 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 10:

(31 comments)

Just did a nit pass.

http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/cfile/cfile_reader.h@344
PS9, Line 344:   Status SeekAtOrAfter(const EncodedKey _key,
 :bool *exact_match, bool set_current_value 
= false);
style nit: since 'exact_match' is an out-parameter, it should go last, so let's 
make the signature of this:

  Status SeekAtOrAfter(const EncodedKey _key,
   bool set_current_value = false, bool *exact_match = 
nullptr);

Per https://google.github.io/styleguide/cppguide.html#Output_Parameters


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/common/encoded_key.h
File src/kudu/common/encoded_key.h:

http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/common/encoded_key.h@63
PS9, Line 63: num_columns
document this parameter


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@76
PS9, Line 76: true
Let's default this to false for the initial merge and do some testing to enable 
it.

Add:

  TAG_FLAG(enable_skip_scan, experimental);


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@80
PS9, Line 80:
Add:

  TAG_FLAG(skip_scan_short_circuit_loops, hidden);
  TAG_FLAG(skip_scan_short_circuit_loops, advanced);


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@405
PS9, Line 405:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@424
PS9, Line 424: =
nit: add spaces around your operators on this line, both the assignment = and 
the comparison !=


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@430
PS9, Line 430: // Predicate exists on the first PK column.
nit: move this comment inside the 'if' to right below this line


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@436
PS9, Line 436: (it->second)
nit: unnecessary parentheses


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@437
PS9, Line 437: // Track the minimum column id.
nit: put this comment on its own line


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@444
PS9, Line 444: (it->second)
nit: unnecessary parentheses


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@522
PS9, Line 522:  
nit: remove so much extra horizontal white space on this line


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@531
PS9, Line 531: or
s/or/which we consider to be/


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@532
PS9, Line 532: current key
currently-seeked key in the primary key index


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@576
PS9, Line 576:
nit: just indent 4 spaces here per 
https://google.github.io/styleguide/cppguide.html#Function_Declarations_and_Definitions

The preferred alternative is to align the parameters vertically but in this 
case I think that would make the line too long and it would get flagged by the 
lint checker.


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@581
PS9, Line 581: :
nit: put a space on both sides of the colon in this case to be consistent with 
the rest of the Kudu code base


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@598
PS9, Line 598: +
nit: space around the + for consistency with the rest of the Kudu code base 
please


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@600
PS9, Line 600: &
nit: put the sigil next to the type, not the variable, for consistency with the 
rest of the Kudu code base, i.e.

  const ColumnSchema& col = ...

Here and elsewhere


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@695
PS9, Line 695: gscoped_
nit: is there something else we should do in this scenario (current key 
decoding failure) besides crash? Maybe we should RETURN_NOT_OK and have this 
function return a Status? That way we can detect that skip scan failed, log it, 
and then disable the optimization?

In fact I think that would be much better for the rest of the places we 
CHECK_OK here... I think RETURN_NOT_OK would be better.


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/index_skipscan-test.cc@50
PS9, Line 50:
nit: multiple consecutive blank lines not needed here, one is enough



[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-06 Thread Anupama Gupta (Code Review)
Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 10:

(8 comments)

Please review the changes.

http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@260
PS9, Line 260: key_iter_ is not NULL.
> this is no longer true (checked inside the method)
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@264
PS9, Line 264:   //higher unique prefix) to scan.
 :   // 2. 'remaining' stores the number the entries to be scanned 
in the current scan range
> Is this an implementation detail or part of the interface of this method? I
It is an implementation detail based on the assumption that currently after 
initializing CFileSet::Iterator, CFileSet::Iterator::SkipToNextScan(..) has 
exclusive access to this variable. Have added a line about this where this 
variable is defined.


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@266
PS9, Line 266:   //
 :   // See the .cc file for details on the approach and the 
implementation.
 :   void SkipToNextScan(size_t *rema
> These variables are local to this method, so I don't see why a caller would
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@269
PS9, Line 269:
 :   // Collect the IO statistics for each of the underlying 
columns.
> This seems to be one of the key postconditions from my perspective, since i
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@271
PS9, Line 271: l void Ge
> nit: 'remaining'
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@639
PS9, Line 639:   // 1. Search within the ad-hoc index for the next unique prefix
 :   //(set of keys prior to the first predicate column).
 :   //Searching is done using validx_iter_.
 :   // 2. Read that unique prefix from the ad-hoc index, append 
the lower bound
 :   //of the first predicate column, and seek to that.
 :   //If this matches, this is the lower bound of our desired 
scan.
 :   // 3. If we found our desired lower bou
> How about using something a bit higher level to describe this algorithm:
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@671
PS9, Line 671: ek to the next prefix if our p
> The values need to be inverted due to the variable name change; it doesn't
Done


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@820
PS9, Line 820: tion(
> nit: unexpected indentation
Done



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 10
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 07 Aug 2018 00:45:14 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-06 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#10).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 977 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/10
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 10
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-06 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#23).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 977 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/23
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 23
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-06 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#22).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 978 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/22
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 22
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-06 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#21).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 979 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/21
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 21
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-06 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#20).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 979 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/20
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 20
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-06 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 9:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@260
PS9, Line 260: cur_idx_ >= skip_scan_upper_bound_idx_.
this is no longer true (checked inside the method)


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@264
PS9, Line 264:   // 1. key_iter_->cur_val_ stores the first entry containing 
both the next unique
 :   //prefix key and the predicate value, remaining to be 
scanned in the current batch.
Is this an implementation detail or part of the interface of this method? If 
it's the former then we should remove it from the header file.


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@266
PS9, Line 266:   // 2. skip_scan_lower_bound_idx will store the row id of the 
entry found in 1.
 :   // 3. skip_scan_upper_bound_idx is the row id which is an 
exclusive upper bound
 :   //on our current scan range.
These variables are local to this method, so I don't see why a caller would 
care about this. If we want to document this, we should document it in the 
implementation file.


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@269
PS9, Line 269:   // 4. cur_idx_ is either same as skip_scan_lower_bound_idx or 
remains unchanged (this occurs
 :   //when cur_idx_ lies somewhere in between the current scan 
range).
This seems to be one of the key postconditions from my perspective, since it 
affects class-level state that is accessed by other methods in this class. Can 
this be phrased in a way that doesn't reference local implementation variables 
but rather provides an explanation about what this does?


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.h@271
PS9, Line 271: remaining
nit: 'remaining'


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@639
PS9, Line 639:   // 1. Place the validx_iter_ at the entry containing the next
 :   //unique "prefix_key" value.
 :   // 2. Place the iterator at or after the entry formed by the 
"prefix_key"
 :   //found in 1. and the predicate value.
 :   // 3. If the seek in 2. results in exact match with the 
predicate value,
 :   //store the row id that represents the last relevant entry 
(upper bound) wrt the
 :   //current "prefix_key"(found in 1.)
How about using something a bit higher level to describe this algorithm:

1. Search within the ad-hoc index for the next unique prefix (set of keys prior 
to the first predicate column). Searching is done using validx_iter_.
2. Read that unique prefix from the ad-hoc index, append the lower bound of the 
first predicate column, and seek to that. If this matches, this is the lower 
bound of our desired scan.
3. If we found our desired lower bound, find an upper bound for the scan by 
searching for the next row key matching one value higher than the highest value 
that will match our first predicate.


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@671
PS9, Line 671: skip_scan_searched_cur_prefix_
The values need to be inverted due to the variable name change; it doesn't make 
sense to seek to the next prefix key if we didn't search the current prefix.


http://gerrit.cloudera.org:8080/#/c/10983/9/src/kudu/tablet/cfile_set.cc@820
PS9, Line 820:
nit: unexpected indentation



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 9
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 06 Aug 2018 21:04:26 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-06 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#9).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 979 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/9
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 9
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-05 Thread Anupama Gupta (Code Review)
Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 7:

(18 comments)

Thanks for the comments. Also, added the condition for disabling skip scan on 
the fly. Please review the changes.

http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h@44
PS7, Line 44: class EncodedKey;
> move this down to line 55
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@75
PS7, Line 75: return_from_skip_scan
> instead of whether to enable this, this flag should specify how many loops
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@514
PS7, Line 514: seek_to_upper_bound_key
> How about:
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@516
PS7, Line 516: 1024
> according to https://kudu.apache.org/docs/known_issues.html#_primary_keys I
Added this in cfile_reader.h (this being a common file for both cfile_set.cc 
and cfile_reader.cc where this variable will be accessed).


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@518
PS7, Line 518:   // Convert current key slice to encoded key.
 :   RETURN_NOT_OK_PREPEND(EncodedKey::DecodeEncodedString(
 :   base_data_->tablet_schema(), ,
 :   Slice(key_iter_->GetCurrentValue()), _key), "Invalid 
scan prefix key");
> This seems to be repeated in a couple of places. How about we factor this o
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@533
PS7, Line 533:   // Set the predicate column with the predicate value.
 :   // This is done to avoid overshooting the lower bound on the 
predicate value
 :   // in the current scan range.
 :   KuduPartialRow partial_row(&(base_data_->tablet_schema()));
 :   gscoped_ptr key_with_pred_value =
 :   GetKeyWithPredicateVal(_row, enc_key.Pass());
 :
 :   return key_iter_->SeekAtOrAfter(*key_with_pred_value, 
_match, /* set_current_value= */ true);
> Like I noted above, I think this part should be factored into its own funct
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@579
PS7, Line 579: bool CFileSet::Iterator::CheckKeyMatch(const std::vector _keys, int start_col_id, int end_col_id) {
 :   // Encode the key currently pointed to by validx_iter_.
 :   Arena arena2(1024);
 :   gscoped_ptr cur_enc_key;
 :   EncodedKey::DecodeEncodedString(
 :   base_data_->tablet_schema(), ,
 :   Slice(key_iter_->GetCurrentValue()), _enc_key);
 :
 :   for (int col_id = start_col_id; col_id <= end_col_id; 
col_id++) {
 :   if 
(base_data_->tablet_schema().column(col_id).Stringify(raw_keys[col_id]) !=
 :  
base_data_->tablet_schema().column(col_id).Stringify(cur_enc_key->raw_keys()[col_id]))
 {
 : return false;
 :   }
 :   }
 :   return true;
 : }
> Instead of checking that the key matches, why not just check that the predi
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@596
PS7, Line 596: // This is a three seek approach for index skip scan 
implementation:
 : // 1. Place the validx_iter_ at the entry containing the next
 : //unique "prefix_key" value.
 : // 2. Place the iterator at or after the entry formed by the 
"prefix_key"
 : //found in 1. and the predicate value.
 : // 3. If the seek in 2. results in exact match with the 
predicate value,
 : //store the row id that represents the last relevant entry 
(upper bound) wrt the
 : //current "prefix_key"(found in 1.)
> move this inside the function to line 605, since it describes what we are d
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@609
PS7, Line 609:   bool lower_bound_key_found;
 :   bool predicate_match = false;
> can we merge these two variables into the single variable lower_bound_key_f
Done


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@614
PS7, Line 614:   // Whether to seek for the next unique prefix, this flag is 
false
 :   // when the prefix key gets incremented while seeking for the 
lower bound
 :   // entry in the loop.
 :   bool continue_seeking_next_prefix = true;
> I think it would be more descriptive of the 

[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-05 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#8).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 979 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/8
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 8
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-05 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#19).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 979 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/19
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 19
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-05 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#18).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 983 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/18
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 18
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-05 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#17).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 987 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/17
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 17
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-05 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#16).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 990 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/16
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 16
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-05 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 7:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h@44
PS7, Line 44: class EncodedKey;
move this down to line 55


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@527
PS5, Line 527:
> We cannot make it nullptr , because exact match is being de-referenced in f
Then within SeekAtOrAfter() we can pass a temporary variable to those functions 
and only pass exact_match back if it's not a nullptr.


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@75
PS7, Line 75: return_from_skip_scan
instead of whether to enable this, this flag should specify how many loops 
before returning, perhaps with -1 meaning unlimited. Something like:

  DEFINE_int32(skip_scan_short_circuit_loops, 100,
  "Max number of skip attempts the skip scan optimization should make 
before "
  "returning control to the main loop. -1 means there is no maximum.");


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@514
PS7, Line 514: seek_to_upper_bound_key
How about:

  s/nul_cols/num_prefix_cols/
  s/seek_to_upper_bound_key/cache_seeked_value/

And then break this into two separate functions:

  Status CFileSet::Iterator::SeekToNextPrefixKey(size_t num_prefix_cols, bool 
cache_seeked_value)

and

  Status CFileSet::Iterator::SeekToNextPredicateMatchCurPrefix(size_t 
num_prefix_cols)

because it seems like a better separation of responsibilities to have a method 
that seeks to the next prefix key and a different method that seeks for a match 
on the predicate, where the former can be used for both the upper and lower 
bound search, and the latter is only used for the lower bound search.


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@516
PS7, Line 516: 1024
according to https://kudu.apache.org/docs/known_issues.html#_primary_keys I 
think this should be 16KB, so 16 * 1024, and we should use a constant for that 
value at the top of this file (inside the namespace), perhaps: static const int 
kPrimaryKeyMaxSize = 16 * 1024;


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@518
PS7, Line 518:   // Convert current key slice to encoded key.
 :   RETURN_NOT_OK_PREPEND(EncodedKey::DecodeEncodedString(
 :   base_data_->tablet_schema(), ,
 :   Slice(key_iter_->GetCurrentValue()), _key), "Invalid 
scan prefix key");
This seems to be repeated in a couple of places. How about we factor this out 
into its own function, like:

  Status CFileSet::Iterator::DecodeCurrentKey(Arena* arena, 
gscoped_ptr* enc_key) {
RETURN_NOT_OK_PREPEND(EncodedKey::DecodeEncodedString(
base_data_->tablet_schema(), arena,
key_iter_->GetCurrentValue(), enc_key), "Failed to decode current value 
from primary key index");
return Status::OK();
 }

So that we can call it here and elsewhere like:

  Arena arena(...);
  gscoped_ptr enc_key;
  RETURN_NOT_OK(DecodeCurrentKey(, _key));


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@533
PS7, Line 533:   // Set the predicate column with the predicate value.
 :   // This is done to avoid overshooting the lower bound on the 
predicate value
 :   // in the current scan range.
 :   KuduPartialRow partial_row(&(base_data_->tablet_schema()));
 :   gscoped_ptr key_with_pred_value =
 :   GetKeyWithPredicateVal(_row, enc_key.Pass());
 :
 :   return key_iter_->SeekAtOrAfter(*key_with_pred_value, 
_match, /* set_current_value= */ true);
Like I noted above, I think this part should be factored into its own function, 
like:

  // Seek to the next predicate match within the current prefix.
  Status CFileSet::Iterator::SeekToNextPredicateMatchCurPrefix(size_t 
num_prefix_cols) {
gscoped_ptr enc_key;
Arena arena(1024);
RETURN_NOT_OK(DecodeCurrentKey(, _key));

// Check to see if the current key matches the predicate value. If so, then
// there is no need to seek forward.
if (CheckPredicateMatch(enc_key)) {
  return Status::OK();
}

// If we got this far, the current key doesn't match the predicate, so 
search
// for the next key that matches the current prefix and predicate.
KuduPartialRow partial_row(&(base_data_->tablet_schema()));
gscoped_ptr key_with_pred_value =

[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-03 Thread Anupama Gupta (Code Review)
Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..


Patch Set 7:

(21 comments)

Thanks for the comments. Please review the changes.

http://gerrit.cloudera.org:8080/#/c/10983/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10983/5//COMMIT_MSG@7
PS5, Line 7: KUDU-1291
> nit: s/Kudu-1291/KUDU-1291/
Done


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/cfile/cfile_reader.h@340
PS5, Line 340: o value index,
> document this parameter in the header file comment
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.h@343
PS4, Line 343:bool *exact_match, bool set_current_value 
= false);
> nit: add const specifier for the method.  Also, consider returning const re
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc@788
PS4, Line 788:
> what if prepared_blocks_ is empty?  If that the thing-that-cannot-be, add D
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@248
PS4, Line 248: r_bound_idx wi
> style nit: here and elsewhere -- we tend to stick the asterisk and ampersan
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@399
PS4, Line 399:
> If 'spec' instance is not modified here, why not to pass it as a const refe
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@414
PS4, Line 414: bool nonfirst_key_pred_exists = false;
> Is copying necessary here?  Why not to use const reference?
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@418
PS4, Line 418:
> nit: I don't think the parentheses are necessary here
Done


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@413
PS5, Line 413:
> nit: add punctuation to all your comments (add a period at the end of the s
Done


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@511
PS5, Line 511:
> this kind of api doc should go into the header file, not the implementation
Done


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@527
PS5, Line 527:
> since we are not using this, can we just make this /* exact_match= */ nullp
We cannot make it nullptr , because exact match is being de-referenced in 
functions called by SeekAtOrAfter. I have added a comment that exact_match 
value is not used after the current call.


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@565
PS5, Line 565: tig
> s/Get/Return/
Done


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@568
PS5, Line 568: const ColumnSchema  = cont_row.schema()->column(i);
> remove or VLOG(2)
Done


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@581
PS5, Line 581: rena2(1024);
> nit: use more descriptive variable names here
Done


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@588
PS5, Line 588: let_schema(
> how about rename this to: SkipToNextScan()
Done


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@602
PS5, Line 602: //store the row id that represents the last relevant entry 
(upper bound) wrt the
> I'm not comfortable merging this patch without this feature
Done


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@662
PS5, Line 662:
> exclusive upper bound
Done


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@668
PS5, Line 668:
 : bool exact
> what guarantees the skip_scan_upper_bound_idx_ has been set to upper_bound_
This is a valid point. I have  updated the skip_scan_upper_bound_idx_ value 
here now.


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@675
PS5, Line 675: ow
> wording nit: Check to see whether we have seeked backwards. If so, we need
Done


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@684
PS5, Line 684: // We didn't find an exact match for our lower bound, so 
re-seek.
> can we convert this to a while-loop instead of a do-while-loop, now that th
Done



[kudu-CR] KUDU-1291. Efficiently support predicates on non-prefix key components

2018-08-03 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#7).

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
..

KUDU-1291. Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 920 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/7
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 7
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-03 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#6).

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..

Kudu-1291 Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 920 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/6
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 6
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-03 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#15).

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..

Kudu-1291 Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 920 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/15
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 15
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-03 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#14).

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..

Kudu-1291 Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 925 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/14
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 14
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-02 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10983/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10983/5//COMMIT_MSG@7
PS5, Line 7: Kudu-1291
nit: s/Kudu-1291/KUDU-1291/

Also preferably add a period or a colon after the JIRA number, like KUDU-1291. 
Efficiently blah blah



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 5
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 02 Aug 2018 18:33:23 +
Gerrit-HasComments: Yes


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-02 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..


Patch Set 5:

> Ah, I see -- thank you for letting me know.  I thought since the WIP tag was 
> removed it's more or less ready.

No worries, it's actually pretty close, would love to get your feedback on the 
algorithm


--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 5
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 02 Aug 2018 18:32:02 +
Gerrit-HasComments: No


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-01 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..


Patch Set 5:

> > I took a quick look: my comments are nits regarding the style and
 > other minor stuff; I didn't look the semantics or the patch yet.
 >
 > The patch is still at a point that I think we should mostly focus
 > on substance rather than nits, although it will need a detailed nit
 > pass later.

Ah, I see -- thank you for letting me know.  I thought since the WIP tag was 
removed it's more or less ready.


--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 5
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 02 Aug 2018 00:46:06 +
Gerrit-HasComments: No


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-01 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..


Patch Set 5:

> I took a quick look: my comments are nits regarding the style and other minor 
> stuff; I didn't look the semantics or the patch yet.

The patch is still at a point that I think we should mostly focus on substance 
rather than nits, although it will need a detailed nit pass later.


--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 5
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 02 Aug 2018 00:42:45 +
Gerrit-HasComments: No


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-01 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..


Patch Set 5:

(5 comments)

still reviewing, here are some initial comments

http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/cfile/cfile_reader.h@340
PS5, Line 340: set_current_value
document this parameter in the header file comment


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@413
PS5, Line 413:   // Initialize predicate column id to an upperbound value
nit: add punctuation to all your comments (add a period at the end of the 
sentence)


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@511
PS5, Line 511: // Increment "num_cols" no of columns
this kind of api doc should go into the header file, not the implementation


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@527
PS5, Line 527: _match
since we are not using this, can we just make this /* exact_match= */ nullptr ?


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/index_skipscan-test.cc@400
PS5, Line 400:   int predicate_col_id = 2;
please add a random test case that does not put the predicate on the last column



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 5
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 02 Aug 2018 00:41:02 +
Gerrit-HasComments: Yes


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-01 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..


Patch Set 4:

(7 comments)

I took a quick look: my comments are nits regarding the style and other minor 
stuff; I didn't look the semantics or the patch yet.

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.h@343
PS4, Line 343:   std::string GetCurrentValue();
nit: add const specifier for the method.  Also, consider returning const 
reference from this method, if possible.


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc@788
PS4, Line 788: prepared_blocks_
what if prepared_blocks_ is empty?  If that the thing-that-cannot-be, add 
DCHECK() for the non-emptiness condition.


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@248
PS4, Line 248: ScanSpec *spec
style nit: here and elsewhere -- we tend to stick the asterisk and ampersand to 
the type, not to the variable/parameter name.


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@399
PS4, Line 399: ScanSpec
If 'spec' instance is not modified here, why not to pass it as a const 
reference?


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@414
PS4, Line 414: std::unordered_map predicates
Is copying necessary here?  Why not to use const reference?


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@418
PS4, Line 418: (it->second)
nit: I don't think the parentheses are necessary here


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@500
PS4, Line 500: skip_scan_enabled_ = CanUseSkipScan(spec);
Why not to assign the 'skip_scan_enabled_' field in the CanUseSkipScan() method 
itself since other aux members (like 'suffix_key_column_id_') are assigned in 
that method already?  And rename CanUseSkipScan() into TryEnableSkipScan() or 
alike?



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 4
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Thu, 02 Aug 2018 00:33:01 +
Gerrit-HasComments: Yes


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-01 Thread Anupama Gupta (Code Review)
Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..


Patch Set 4:

(22 comments)

Please review the changes.

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc@785
PS4, Line 785:   Slice ret;
> add: DCHECK_EQ(nullptr, validx_iter_);
Added a check for this. Actually no, I think there is no guarantee that value 
index will always be string. I have added a flag to invoke this function 
(set_current_value) now . I wonder if there is an alternative to deal with this.


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@204
PS4, Line 204:   // This function is used to place the validx_iter_ at the next 
greater "prefix_key".
> nit: you should try to maintain the same order in the header file as the .c
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@283
PS4, Line 283: skip_scan_enabled_
> rename to use_skip_scan_ because this is confusingly similar to the gflag n
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@286
PS4, Line 286: suffix_key_pred_value_
> rename to skip_scan_predicate_value_
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@289
PS4, Line 289: suffix_key_column_id_
> rename to skip_scan_predicate_column_id_
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@291
PS4, Line 291:   // Row id of the last entry of "suffix_key" predicate value 
wrt to the
 :   // "prefix_key" currently pointed to by validx_iter_
> use something like this as the description:
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@293
PS4, Line 293: int
> make this an int64_t
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@538
PS4, Line 538:   if (skip_scan_enabled_ && (static_cast(cur_idx_) > 
suffix_key_upper_bound_idx_)) {
> Can you break this out into its own function?
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@540
PS4, Line 540: bool suffix_key_match, continue_seeking_next_prefix = false;
> nit: declare these variables separately
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@542
PS4, Line 542: Status next_entry_match, suffix_key_upper_bound_match;
> Can we avoid maintaining so much state outside the loop?
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@602
PS4, Line 602: continue_seeking_next_prefix  = true;
> not indented
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@56
PS4, Line 56: kRandom
> the random tests here are not very random. Can you add an actually-random t
Thanks for this suggestion. I have added the random tests now.


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@145
PS4, Line 145: int32_t kNumDistinctP1 = 2;
 : int16_t kNumDistinctP2 = 10;
 : int kNumDistinctP3 = 5;
 : int8_t kNumDistinctP4 = 1;
 : int8_t kNumDistinctP5 = 20;
> declare all of these variables const
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@163
PS4, Line 163: int32_t
> int16_t for p2 here and in the below loops
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@226
PS4, Line 226: //Only 1 row inserted
> nit: formatting, should be:
Done.


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@253
PS4, Line 253: for (int i = 1; i <= 1; i++) {
> remove this
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@255
PS4, Line 255: i+1
> style nit: please put spaces around infix operators
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@275
PS4, Line 275: p1 ++
> style nit: please no spaces before postfix operators
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@297
PS4, Line 297:
> style nit: collapse >1 consecutive empty lines into a single empty line so
Done


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@306
PS4, Line 306:   void ScanTablet(ScanSpec *spec, vector *results, const 
char *descr) {
> warning: parameter 'descr' is unused 

[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-01 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#5).

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..

Kudu-1291 Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 788 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/5
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 5
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-01 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#13).

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..

Kudu-1291 Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 788 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/13
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 13
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-01 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#12).

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..

Kudu-1291 Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 784 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/12
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 12
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-08-01 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#11).

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..

Kudu-1291 Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 798 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/11
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 11
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-07-31 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@56
PS4, Line 56: kRandom
the random tests here are not very random. Can you add an actually-random test 
case that generates random values?

You could do something like this pseudocode:

  Random random(SeedRandom());
  int predicate_val = random.NextInt();
  num_matching = GenerateData(random, num_rows, predicate_col_id, 
predicate_val); // Generates and inserts given number of rows using the given 
PRNG, returns # rows generated that match the predicate_val on predicate_col.
  Spec spec = GeneratePredicate(predicate_col_id, predicate_val);
  ASSERT_EQ(num_matching, ScanWithSpec(spec)); // Scan with given spec and 
return the number of matching results.

The cool thing about SeedRandom() is that if you see a failure you can 
reproduce the same pseudorandom sequence by passing --test_random_seed to your 
test when running it (it logs the random seed generated each time it's called)


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@145
PS4, Line 145: int32_t kNumDistinctP1 = 2;
 : int16_t kNumDistinctP2 = 10;
 : int kNumDistinctP3 = 5;
 : int8_t kNumDistinctP4 = 1;
 : int8_t kNumDistinctP5 = 20;
declare all of these variables const


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@163
PS4, Line 163: int32_t
int16_t for p2 here and in the below loops


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@226
PS4, Line 226: //Only 1 row inserted
nit: formatting, should be:

  // Only 1 row inserted.


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@253
PS4, Line 253: for (int i = 1; i <= 1; i++) {
remove this


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@255
PS4, Line 255: i+1
style nit: please put spaces around infix operators


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@275
PS4, Line 275: p1 ++
style nit: please no spaces before postfix operators


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@297
PS4, Line 297:
style nit: collapse >1 consecutive empty lines into a single empty line so more 
code fits on the screen, here and elsewhere


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@306
PS4, Line 306:   void ScanTablet(ScanSpec *spec, vector *results, const 
char *descr) {
> warning: parameter 'descr' is unused [misc-unused-parameters]
see tidy bot's suggestion, or remove descr


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@312
PS4, Line 312: /*for (const string  : *results) {
 :   LOG(INFO) << str;
 : }*/
remove this


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/index_skipscan-test.cc@551
PS4, Line 551: default: {
why do we have a default case? let's just also give this one a name



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 4
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 01 Aug 2018 00:49:13 +
Gerrit-HasComments: Yes


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-07-31 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..


Patch Set 4:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.cc@793
PS3, Line 793:
> Yes, this works for string and binary values stored as uint8_t byte array.
What happens if the cfile is storing something other than the ad-hoc index? 
This is not in general safe, we should put in additional safeguards if this is 
how we want to do this, but I am a little skeptical about the approach, need to 
think about it a bit more.


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/cfile/cfile_reader.cc@785
PS4, Line 785:   Slice ret;
add: DCHECK_EQ(nullptr, validx_iter_);

Also, are we sure that the value index can only ever be a string?


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.h
File src/kudu/common/encoded_key.h:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.h@63
PS3, Line 63: um_columns = 0)
> Done
I don't see num_columns documented in the header file comment in rev 4


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@204
PS4, Line 204:   // This function is used to place the validx_iter_ at the next 
greater "prefix_key".
nit: you should try to maintain the same order in the header file as the .cc 
file per the style guide


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@283
PS4, Line 283: skip_scan_enabled_
rename to use_skip_scan_ because this is confusingly similar to the gflag name 
FLAGS_enable_skip_scan


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@286
PS4, Line 286: suffix_key_pred_value_
rename to skip_scan_predicate_value_


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@289
PS4, Line 289: suffix_key_column_id_
rename to skip_scan_predicate_column_id_


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@291
PS4, Line 291:   // Row id of the last entry of "suffix_key" predicate value 
wrt to the
 :   // "prefix_key" currently pointed to by validx_iter_
use something like this as the description:

  // Row id of the next row that does not match the predicate.
  // This is an exclusive upper bound on our scan range.
  // A value of -1 indicates that the upper bound is not known.


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@293
PS4, Line 293: suffix_key_upper_bound_idx_
rename to skip_scan_upper_bound_idx_


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.h@293
PS4, Line 293: int
make this an int64_t


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@538
PS4, Line 538:   if (skip_scan_enabled_ && (static_cast(cur_idx_) > 
suffix_key_upper_bound_idx_)) {
Can you break this out into its own function?


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@540
PS4, Line 540: bool suffix_key_match, continue_seeking_next_prefix = false;
nit: declare these variables separately


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@542
PS4, Line 542: Status next_entry_match, suffix_key_upper_bound_match;
Can we avoid maintaining so much state outside the loop?


http://gerrit.cloudera.org:8080/#/c/10983/4/src/kudu/tablet/cfile_set.cc@602
PS4, Line 602: continue_seeking_next_prefix  = true;
not indented



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 4
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 31 Jul 2018 23:17:11 +
Gerrit-HasComments: Yes


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-07-30 Thread Anupama Gupta (Code Review)
Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..


Patch Set 4:

(41 comments)

Thanks for the helpful comments Mike. Please review the changes.

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.cc@793
PS3, Line 793:
> This only works if the cell is a pointer, like in the case of a string valu
Yes, this works for string and binary values stored as uint8_t byte array. The 
ad-hoc index stores the keys that are either encoded as binary prefix blocks or 
binary dict blocks.


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.h
File src/kudu/common/encoded_key.h:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.h@63
PS3, Line 63: um_columns = 0)
> document this parameter in the comment
Done


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.cc
File src/kudu/common/encoded_key.cc:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.cc@111
PS3, Line 111:  num_columns, arena)) {
> I thought we were going to generalize this to any column in the compound PK
Right, I have updated this now.


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/partial_row.h@494
PS3, Line 494: rivate:
> This class is a part of the public C++ client API (see KUDU_EXPORT) so we w
Thanks, it makes sense. I have now added the calling class (CfileSet) as a 
friend class of KuduPartialRow.


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/partial_row.h@498
PS3, Line 498: friend class RowOpe
> Why is this private field now public? Should not be necessary.
This field is used in the function - CFileSet::Iterator::PrepareBatch , to set 
the minimum values for some columns. I am not quite sure if there is a better 
alternative to this.


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/wire_protocol-test-util.h
File src/kudu/common/wire_protocol-test-util.h:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/wire_protocol-test-util.h@38
PS3, Line 38: inline void 
AddTestRowWithNullableStringToPB(RowOperationsPB::Type op_type,
> move this to the test where it's being used
I have removed this now.


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h@203
PS3, Line 203:
 :   // This function is used to place the va
> These methods need doc comments
Done


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h@230
PS3, Line 230: initted_(false),
> add documentation for this method
Done


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h@264
PS3, Line 264:   size_t cur_idx_;
 :   size_t prepared_count_;
> these variables should be documented with a comment
Done


http://gerrit.cloudera.org:8080/#/c/10983/1/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/1/src/kudu/tablet/cfile_set.cc@669
PS1, Line 669:
> instead of this loop, I wonder if we can instead "peek" at the last value s
Makes sense. But I am not quite sure if we can "peek" at the last seeked 
ordinal in the ad-hoc index column here.


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@395
PS3, Line 395: tate.
> let's rename this CanUseSkipScan(), have it return a boolean, and set
Done


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@397
PS3, Line 397: }
> Add a flag DEFINE_bool(enable_skip_scan, true, "...") that allows us to tur
Done


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@400
PS3, Line 400:   int num_key_cols =  
base_data_->tablet_schema().num_key_columns();
> looks like a compile error? probably some unstaged changes
Done


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@415
PS3, Line 415:   for (auto it=predicates.begin(); it!=predicates.end(); ++it) {
> error: use of undeclared identifier 'suffix_key_id_' [clang-diagnostic-erro
Done


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@418
PS3, Line 418: string col_name = (it->second).column().name();
> error: use of undeclared identifier 'suffix_key_id_' [clang-diagnostic-erro
Done


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@429
PS3, Line 429:   nonfirst_key_pred_exists = true;
> error: use of undeclared identifier 'suffix_key_id_'; did 

[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-07-30 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10983

to look at the new patch set (#4).

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..

Kudu-1291 Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 889 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/10983/4
--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 4
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] Kudu-1291 Efficiently support predicates on non-prefix key components

2018-07-30 Thread Anupama Gupta (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11067

to look at the new patch set (#9).

Change subject: Kudu-1291 Efficiently support predicates on non-prefix key 
components
..

Kudu-1291 Efficiently support predicates on non-prefix key components

This patch implements index skip scan approach in the case when there
exists an equality predicate on any of the non-first
primary key columns.

This feature uses the following approach:
1. Seek at the first unique prefix key in the ad-hoc index tree.
2. Seek to the relevant entry corresponding to the unique prefix key
   found in 1. and the predicate value on the suffix key column.
3. To handle the case where multiple entries exist with respect to
   a unique prefix key, store the upper bound index of these entries.
4. Repeat steps 1-3, till all the unique prefix keys are scanned.

Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
---
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/common/encoded_key.cc
M src/kudu/common/encoded_key.h
M src/kudu/common/partial_row.h
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/index_skipscan-test.cc
9 files changed, 890 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/11067/9
--
To view, visit http://gerrit.cloudera.org:8080/11067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iffbc240233e5ad52f6a1c7912d4e44baf32a5f2d
Gerrit-Change-Number: 11067
Gerrit-PatchSet: 9
Gerrit-Owner: Anupama Gupta 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot