Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 )
Change subject: WIP 1. Added correctness test for skip scan 2. Extended the approach for equality query on any PK column ...................................................................... Patch Set 3: (29 comments) I did a first pass but need to spend more time looking at the changes in cfile_set.cc to understand the approach. http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.h@401 PS3, Line 401: std::string cur_val_; needs comment: what does cur_val_ mean? http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.h@463 PS3, Line 463: key s/key/value of the cell/ Also note that this consumes the cell from the perspective of the iterator http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/cfile/cfile_reader.cc@793 PS3, Line 793: reinterpret_cast<Slice*>(cdv.data()) This only works if the cell is a pointer, like in the case of a string value, right? http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.h File src/kudu/common/encoded_key.h: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.h@63 PS3, Line 63: prefix_key_only document this parameter in the comment http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.cc File src/kudu/common/encoded_key.cc: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/encoded_key.cc@111 PS3, Line 111: tablet_schema.num_key_columns()-1 I thought we were going to generalize this to any column in the compound PK? http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/partial_row.h File src/kudu/common/partial_row.h: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/partial_row.h@494 PS3, Line 494: Status Set(int32_t column_idx, const uint8_t* val); This class is a part of the public C++ client API (see KUDU_EXPORT) so we will need to use some other method to achieve the same effect as exposing this method directly. The problem is that this method exposes implementation details that we don't want application developers to depend on. One other way to do it is to have a switch statement. If we want to allow access just for Kudu itself, we could also add the calling class as a friend class of KuduPartialRow so the friend class can call private methods. http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/partial_row.h@498 PS3, Line 498: uint8_t* row_data_; Why is this private field now public? Should not be necessary. http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/wire_protocol-test-util.h File src/kudu/common/wire_protocol-test-util.h: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/common/wire_protocol-test-util.h@38 PS3, Line 38: inline Schema GetTestSchema() { move this to the test where it's being used http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h@203 PS3, Line 203: Status SeekToNextPrefixKey(); : void BuildNewKey(KuduPartialRow *p_row); These methods need doc comments http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h@230 PS3, Line 230: void EnableSkipScan(ScanSpec *spec); add documentation for this method http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.h@264 PS3, Line 264: bool skip_scan_enabled_ = false; : const void* suffix_eq_pred_; these variables should be documented with a comment http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@395 PS3, Line 395: EnableSkipScan let's rename this CanUseSkipScan(), have it return a boolean, and set skip_scan_enabled_ = CanUseSkipScan(scan_spec) at the call site. http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@397 PS3, Line 397: if (num_key_cols <= 1) { Add a flag DEFINE_bool(enable_skip_scan, true, "...") that allows us to turn off skip scan and use it here, i.e. if (!FLAG_enable_skip_scan || num_key_cols <= 1) { return false; } http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@400 PS3, Line 400: suffix_key_id_ = num_key_cols; > error: use of undeclared identifier 'suffix_key_id_' [clang-diagnostic-erro looks like a compile error? probably some unstaged changes http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@511 PS3, Line 511: LOG(INFO) << "initial prepared count = " << prepared_count_; The following huge block of code doesn't have any comments in it... mind adding some? Maybe every 20 or 30 lines write a comment explaining what we are trying to do? http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/cfile_set.cc@524 PS3, Line 524: and use && not 'and' http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc File src/kudu/tablet/index_skipscan-test-2.cc: PS3: don't name this file -2 ... also this test is not included in tablet/CMakeLists.txt so Jenkins will not build and run it http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@41 PS3, Line 41: #include "kudu/tablet/tablet_metrics.h" > warning: #includes are not sorted properly [llvm-include-order] please address tidy bot's comments http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@62 PS3, Line 62: OnePK = 1, TwoPK = 2, ThreePK = 3, FourPK = 4, FivePK = 5, Random=6 nit: constants such as these should be named kOnePK, kTwoPK, etc according to the Google style guide, see https://google.github.io/styleguide/cppguide.html#Constant_Names http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@87 PS3, Line 87: 1 use kOnePK here instead of a magic number, also below http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@122 PS3, Line 122: : default: { : CHECK_OK(builder.AddKeyColumn("P1", STRING)); : CHECK_OK(builder.AddKeyColumn("P2", STRING)); : CHECK_OK(builder.AddColumn("K1", INT16)); : break; : } is this used? http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@173 PS3, Line 173: CHECK_OK(row.SetInt32(0, p1)); All of your test cases use sequential row values. We should add a test with non-sequential and "holey" test values because it seems to be that there is a class of bugs that will only manifest with a sparse set of values being scanned. http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@184 PS3, Line 184: int32_t distinct_p1 = 2; : int16_t distinct_p2 = 10; : int distinct_p3 = 5; : int8_t distinct_p4 = 1; this seems very copy / paste, if the intention is to use the same values for the same column across these test cases then let's use some constants like kNumDistinctP1 = 2 at the top of the test or something http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@273 PS3, Line 273: case 2: { describe your test case here http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@276 PS3, Line 276: p1 nit: avoid using non-descriptive variable names like p1, I'm not sure what it means http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@284 PS3, Line 284: { describe this test http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tablet/index_skipscan-test-2.cc@294 PS3, Line 294: { please add a very short description of what we're testing here http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tserver/tablet_server-test-2.cc File src/kudu/tserver/tablet_server-test-2.cc: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tserver/tablet_server-test-2.cc@158 PS3, Line 158: TEST_F(TabletServerTest, TestScanNewData) { Add a comment describing the purpose of this test and use a different name for this test since this name is too generic. If this test is a benchmark then maybe name it something like IndexSkipScanBenchmark http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tserver/tablet_server-test-base.cc File src/kudu/tserver/tablet_server-test-base.cc: http://gerrit.cloudera.org:8080/#/c/10983/3/src/kudu/tserver/tablet_server-test-base.cc@85 PS3, Line 85: //: schema_(GetSimpleTestSchema()), : : schema_(GetTestSchema()), do this in your test child classes, not in the base class -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 3 Gerrit-Owner: Anupama Gupta <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Fri, 27 Jul 2018 03:27:59 +0000 Gerrit-HasComments: Yes
