Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: WIP 1. Added correctness test for skip scan 2. Extended the 
approach for equality query on any PK column
......................................................................


Patch Set 3:

(29 comments)

I did a first pass but need to spend more time looking at the changes in 
cfile_set.cc to understand the approach.

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.h@401
PS3, Line 401:   std::string cur_val_;
needs comment: what does cur_val_ mean?


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.h@463
PS3, Line 463: key
s/key/value of the cell/

Also note that this consumes the cell from the perspective of the iterator


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.cc@793
PS3, Line 793: reinterpret_cast<Slice*>(cdv.data())
This only works if the cell is a pointer, like in the case of a string value, 
right?


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.h
File src/kudu/common/encoded_key.h:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.h@63
PS3, Line 63: prefix_key_only
document this parameter in the comment


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.cc
File src/kudu/common/encoded_key.cc:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.cc@111
PS3, Line 111: tablet_schema.num_key_columns()-1
I thought we were going to generalize this to any column in the compound PK?


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/partial_row.h@494
PS3, Line 494: Status Set(int32_t column_idx, const uint8_t* val);
This class is a part of the public C++ client API (see KUDU_EXPORT) so we will 
need to use some other method to achieve the same effect as exposing this 
method directly. The problem is that this method exposes implementation details 
that we don't want application developers to depend on. One other way to do it 
is to have a switch statement. If we want to allow access just for Kudu itself, 
we could also add the calling class as a friend class of KuduPartialRow so the 
friend class can call private methods.


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/partial_row.h@498
PS3, Line 498: uint8_t* row_data_;
Why is this private field now public? Should not be necessary.


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/wire_protocol-test-util.h
File src/kudu/common/wire_protocol-test-util.h:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/wire_protocol-test-util.h@38
PS3, Line 38: inline Schema GetTestSchema() {
move this to the test where it's being used


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h@203
PS3, Line 203:   Status SeekToNextPrefixKey();
             :   void BuildNewKey(KuduPartialRow *p_row);
These methods need doc comments


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h@230
PS3, Line 230:   void EnableSkipScan(ScanSpec *spec);
add documentation for this method


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h@264
PS3, Line 264:   bool skip_scan_enabled_ = false;
             :   const void* suffix_eq_pred_;
these variables should be documented with a comment


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@395
PS3, Line 395: EnableSkipScan
let's rename this CanUseSkipScan(), have it return a boolean, and set

  skip_scan_enabled_ = CanUseSkipScan(scan_spec)

at the call site.


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@397
PS3, Line 397:   if (num_key_cols <= 1) {
Add a flag DEFINE_bool(enable_skip_scan, true, "...") that allows us to turn 
off skip scan and use it here, i.e.

  if (!FLAG_enable_skip_scan || num_key_cols <= 1) {
    return false;
  }


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@400
PS3, Line 400:   suffix_key_id_ = num_key_cols;
> error: use of undeclared identifier 'suffix_key_id_' [clang-diagnostic-erro
looks like a compile error? probably some unstaged changes


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@511
PS3, Line 511:   LOG(INFO) << "initial prepared count = " << prepared_count_;
The following huge block of code doesn't have any comments in it... mind adding 
some?

Maybe every 20 or 30 lines write a comment explaining what we are trying to do?


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@524
PS3, Line 524: and
use && not 'and'


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc
File src/kudu/tablet/index_skipscan-test-2.cc:

PS3:
don't name this file -2 ... also this test is not included in 
tablet/CMakeLists.txt so Jenkins will not build and run it


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@41
PS3, Line 41: #include "kudu/tablet/tablet_metrics.h"
> warning: #includes are not sorted properly [llvm-include-order]
please address tidy bot's comments


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@62
PS3, Line 62:   OnePK = 1, TwoPK = 2, ThreePK = 3, FourPK = 4, FivePK = 5, 
Random=6
nit: constants such as these should be named kOnePK, kTwoPK, etc according to 
the Google style guide, see 
https://google.github.io/styleguide/cppguide.html#Constant_Names


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@87
PS3, Line 87: 1
use kOnePK here instead of a magic number, also below


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@122
PS3, Line 122:
             :       default: {
             :         CHECK_OK(builder.AddKeyColumn("P1", STRING));
             :         CHECK_OK(builder.AddKeyColumn("P2", STRING));
             :         CHECK_OK(builder.AddColumn("K1", INT16));
             :         break;
             :       }
is this used?


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@173
PS3, Line 173:               CHECK_OK(row.SetInt32(0, p1));
All of your test cases use sequential row values. We should add a test with 
non-sequential and "holey" test values because it seems to be that there is a 
class of bugs that will only manifest with a sparse set of values being scanned.


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@184
PS3, Line 184:         int32_t distinct_p1 = 2;
             :         int16_t distinct_p2 = 10;
             :         int distinct_p3 = 5;
             :         int8_t distinct_p4 = 1;
this seems very copy / paste, if the intention is to use the same values for 
the same column across these test cases then let's use some constants like 
kNumDistinctP1 = 2 at the top of the test or something


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@273
PS3, Line 273:     case 2: {
describe your test case here


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@276
PS3, Line 276: p1
nit: avoid using non-descriptive variable names like p1, I'm not sure what it 
means


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@284
PS3, Line 284:       {
describe this test


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@294
PS3, Line 294:       {
please add a very short description of what we're testing here


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tserver/tablet_server-test-2.cc
File src/kudu/tserver/tablet_server-test-2.cc:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tserver/tablet_server-test-2.cc@158
PS3, Line 158: TEST_F(TabletServerTest, TestScanNewData) {
Add a comment describing the purpose of this test and use a different name for 
this test since this name is too generic.

If this test is a benchmark then maybe name it something like 
IndexSkipScanBenchmark


http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tserver/tablet_server-test-base.cc
File src/kudu/tserver/tablet_server-test-base.cc:

http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tserver/tablet_server-test-base.cc@85
PS3, Line 85:     //: schema_(GetSimpleTestSchema()),
            :       : schema_(GetTestSchema()),
do this in your test child classes, not in the base class



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 3
Gerrit-Owner: Anupama Gupta <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 27 Jul 2018 03:27:59 +0000
Gerrit-HasComments: Yes

Reply via email to