Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
......................................................................


Patch Set 15:

(39 comments)

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@792
PS14, Line 792: // Currently this block holds encoded composite key.
              :   Arena arena(16*1024);
              :
              :   DCHECK(!prepared_blocks_.empty());
              :
> nit: As a general rule of thumb, if something is indicative of a programmer
Thanks for this pointer. I have removed the DCHECK() in this case.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@797
PS14, Line 797: us
> nit: I know some of the older code has this reversed, but for new code, we
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871
PS14, Line 871:   last_prepare_idx_ = b->first_row_idx() + 
b->dblk_->GetCurrentIndex();
              :   last_prepare_count_ = 0;
              :
              :   p
> Not sure if this is necessary to avoid breaking encapsulation, as I think w
ACK.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871
PS14, Line 871:   last_prepare_idx_ = b->first_row_idx() + 
b->dblk_->GetCurrentIndex();
              :   last_prepare_count_ = 0;
              :
              :   p
> It's worth documenting in the header the side effect of `cache_seeked_value
If `cache_seeked_value` is true, then a call to StoreCurrentValue() will not 
seek an extra row. Instead, due to the side effect of using 
CopyNextValues(size_t *n, ColumnDataView *dst) the first value from the last 
prepared PK column block will be fetched into 'dst'.

I have now documented this in the header file.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@790
PS15, Line 790: Todo
> nit: s/Todo/TODO/
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@793
PS15, Line 793: 16*1024
> Why did we go from using a constant back to using a magic number? http://wi
Made this a member variable of CFileIterator as a followup of the review 
comment in cfile_set.cc (L553). After discussing with Alexey, we came to the 
conclusion that it is safe to initialize Arena with an initial buffer size of 
1M. If required, the Arena size grows wrt the data being allocated to it.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@795
PS15, Line 795:   DCHECK(!prepared_blocks_.empty());
              :   if (prepared_blocks_.empty()) {
              :     return Status::NotFound("blocks not found");
              :   }
> This has a code smell. https://martinfowler.com/bliki/CodeSmell.html In deb
Yes it is possible. I have removed DCHECK and handled this condition by 
returning Status::NotFound.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc
File src/kudu/common/encoded_key.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc@92
PS14, Line 92:                                        gscoped_ptr<EncodedKey> 
*key,
             :                                        Arena* arena, int32_t 
num_columns
> Per the GSG (https://google.github.io/styleguide/cppguide.html):
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@175
PS15, Line 175: // Due to the caveat(see below) listed for 
SkipToNextScan(size_t *remaining),
              : // this class should not reference a separate "client" class 
that uses key_iter->cur_val.
> I find this part of the comment confusing and I think we should just remove
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@248
PS15, Line 248: Caveat:
> I don't think we need to specifically call this out as a caveat
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@249
PS15, Line 249: key_iter->cur_val_
> How about just key_iter_? Is that sufficient? Talking about a private membe
Ok. Makes sense. Changed this to just key->iter_.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@362
PS15, Line 362: skip_scan_num_seeks_cutoff
> missing underscore suffix on member variable
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

PS14:
> nit: s/unique prefix/distinct prefix
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@174
PS14, Line 174: //
              : // Due to the caveat(see below) listed for 
SkipToNextScan(size_t *remaining),
              : // this class should not reference a separate "client" class 
that uses key_iter->cur_val.
> Hmm, I'm not sure what this means. See my comment in cfile_reader.cc re: an
Removed this, as per Mike's comment.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@208
PS14, Line 208:
              :   // Decode the currently-seeked key into 'enc_key'.
              :   Status
> Before getting into the nitty gritty of each method, I think it'd be helpfu
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@209
PS14, Line 209: // Decode the currently-seeked key into 'enc_key'.
              :   Status DecodeCurrentKey(Arena* arena, 
gscoped_ptr<EncodedKey>* enc_key);
              :
              :   // This function is used to place the validx_iter_ at the 
next greater prefix key.
              :   // prefix key refers to the first "num_prefix_cols" columns 
of the current key.
              :   // current key is the key currently pointed to by 
validx_iter_ (prior to calling
              :   // this function).
              :   // If 'cache_seeked_value' is true, the validx_iter_ will 
store the value seeked to.
              :   Status SeekToNextPrefixKey(size_t num_prefix_cols, bool 
cache_seeked_value);
              :
              :   // Seek to the next predicate match within the current prefix.
              :   Status SeekToRowWithCurPrefixMatchingPred(const 
gscoped_ptr<EncodedKey>& enc_key);
              :
              :   // Build the key with the same prefix as 'cur_enc_key', that 
has
              :   // 'skip_scan_predicate_value_' in its predicate column,
              :   // and the minimum possible value for all other columns.
              :   gscoped_ptr<EncodedKey> GetKeyWithPredicateVal(KuduPartialRow 
*p_row,
              :                                                  const 
gscoped_ptr<EncodedKey>& cur_enc_key);
              :
              :   // Returns true if the given encoded key matches the skip 
scan predicate.
              :   bool CheckPredicateMatch(const gscoped_ptr<EncodedKey>& 
enc_key) const;
              :
              :   // Check if the column values in the range corresponding to 
the given
              :   // inclusive column id range [start_col_id, end_col_id] are 
equal between the
              :   // two given keys.
              :   bool KeyColumnsMatch(const gscoped_ptr<EncodedKey>& key1,
              :                        const gscoped_ptr<EncodedKey>& key2,
              :                        int start_col_id, int end_col_id) const;
              :
              :   // This method implements a "skip-scan" optimization, 
allowing a scan to use
              :   // the primary key index to efficiently seek to matching rows 
where there are
              :   // predicates on compound key columns that do not necessarily 
include the
              :   // leading column of the primary key. At the time of writing, 
only a single
              :   // equality predicate is supported, although the algorithm 
can support ranges
              :   // of values.
              :   //
              :   // This method should be invoked during the PrepareBatch() 
phase of the row
              :   // iterator lifecycle.
              :   //
              :   // Caveat:
              :   // This method assumes exclusive access to key_iter->cur_val_.
              :   //
              :   // The in-out parameter 'remaining' refers to the number of 
rows remaining to
              :   // scan. When this method is invoked, 'remaining' should 
contain the maximum
              :   // number of remaining rows available to scan. Once this 
method returns,
              :   // 'remaining' will contain the number of rows to scan to 
consume the
              :   // available matching rows according to the equality 
predicate. Note:
              :   // 'remaining' will always be at least 1, although it is a 
TODO to allow it
              :   // to be 0 (0 violates CHECK conditions elsewhere in the scan 
code).
              :   //
              :   // Currently, skip scan will be dynamically disabled when the 
number of seeks
              :   // for unique prefix keys exceeds sqrt(#total rows). We use 
sqrt(#total rows)
              :   // as a cutoff because based on performance tests on upto 10M 
rows per tablet,
              :   // the scan time for skip scan is the same as that of the 
current flow until
              :   // #seeks = sqrt(#total_rows). Further increase in #seeks 
leads to a drop in
              :   // skip scan performance wrt the current flow. This cutoff 
value is stored in
              :   // 'skip_scan_num_seeks_cutoff'.
              :
              :   // Preconditions upon entering this method:
              :   // * key_iter_ is not NULL.
              :   //
              :   // Postconditions upon exiting this method:
              :   // 1. cur_idx_ is updated to the row_id of the next 
row(containing the next
              :   //    higher unique prefix) to scan.
              :   // 2. 'remaining' stores the number the entries to be scanned 
in the current scan range.
              :   //
              :   // See the .cc file for details on the approach and the 
implementation.
              :   Status SkipToNextScan(size_t *remaining);
              :
> These could all be private, right?
Yes. Made them all private.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@299
PS14, Line 299: key(PK)
> nit: add a space
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@301
PS14, Line 301: predicate_column"
> nit: remove underscore
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@362
PS14, Line 362: int64_t skip_scan_num_seeks_cutoff;
              :
> nit: follow private members with an underscore
Done


http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@551
PS12, Line 551:     "Failed to decode current value from primary key index");
              :   return Status::OK();
              : }
> +1
Done.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@562
PS14, Line 562:   RETURN_NOT_OK(EncodedKey::IncrementEncodedKeyColumns(
              :       base_data_->tablet_schema(),
              :       &enc_key, &arena, num_prefix_cols));
              :
              :   if (cache_seeked_value) {
              :     // Set the predicate column to the predicate value in case 
we can find a
              :     // predicate match in one search. As a side effect, 
GetKeyWithPredicateVal()
              :
> Please document in the .h that if `cache_seeked_value` is true, this will s
Yes, I have now documented the same.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@593
PS14, Line 593:       /* cache_seeked_value= */ true,
              :       /* exact_match= */ nullptr);
> Please sprinkle a couple TODOs about how to generalize this patch. E.g.:
Incorporated this. Plus, added a similar TODO to find the skip scan upper bound 
key to support in-range predicates.


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@680
PS14, Line 680:     !lower_bound_key_found && loop_num < 
FLAGS_skip_scan_short_circuit_loops;
              :        loop_num++) {
              :     DCHECK_LT(cur_idx_, skip_scan_upper_bound_idx_);
> nit: since the steps are described in this method, we can probably get away
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@770
PS14, Line 770: reak;
> Adding a comment would definitely help.  That might be a comment for the me
Done


http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@770
PS14, Line 770: reak;
> We effectively have two pointers we are tracking with skip-scan: the primar
Added this explanation.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@557
PS15, Line 557: Arena arena(FLAGS_max_encoded_key_size_bytes);
> And the best approach I think is suggested by Andrew -- have an instance of
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@557
PS15, Line 557: Arena arena(FLAGS_max_encoded_key_size_bytes);
> I think it's possible to pass an instance of arena from the upper level to
Thanks for the clarification, Alexey. Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@571
PS15, Line 571: (
> nit: this and the pairing brace are not needed.
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@597
PS15, Line 597: KuduPartialRow *p_row,
              :     const gscoped_ptr<EncodedKey>& cur_enc_key
> style nit: usually we have input parameter specified first, then go output
Done. Also modified this function signature to return Status instead. And added 
gscoped_ptr<EncodedKey>* enc_key as an output parameter.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@603
PS15, Line 603: auto const &value
> style nit: const auto& value
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@606
PS15, Line 606: p_row->Set(col_id, data);
> What if this returns non-OK?
Added RETURN_NOT_OK() in this case. This function will return STATUS now.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@611
PS15, Line 611: p_row->Set(skip_scan_predicate_column_id_, suffix_col_value);
> ditto
Added RETURN_NOT_OK() in this case too.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@619
PS15, Line 619: (
> nit: this and the closing parenthesis are not needed
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@742
PS15, Line 742: CheckPredicateMatch(lower_bound_key);
> I'm not sure this is what is needed: CheckPredicateMatch() verifies that on
Actually we only need the predicate column to match in this case. Added this in 
the comment.  On a side note, you are correct but in this case even if the 
prefix keys do not match, still the current flow will execute as expected for 
the next prefix key.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@811
PS15, Line 811: ERROR
> I think this should be WARNING, because there are some legit cases where Sk
Thanks. Makes sense.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc
File src/kudu/tablet/index_skipscan-test.cc:

PS15:
> We discussed that offline, but just to re-iterate:
As we discussed I have added the test case that handles the case when one or 
more of the column(s)  may contain the min and(or) max values of the respective 
data types. Regarding the scenario, where the cardinality of predicate-matching 
rows is close to the number of all rows, I think it is covered in the 
non-random cases. Still, in case you think it is important I will add this too. 
Please let me know.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc@18
PS15, Line 18: #include <stddef.h>
> nit: please use c++ style include:
Done


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc@106
PS15, Line 106: s.IsAlreadyPresent
> What if num_matching was incremented, but the IsAlreadyPresent() was report
Makes sense. Done.


http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc@129
PS15, Line 129: {
> nit: here and elsewhere under this switch -- remove those braces, they are
Done



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 15
Gerrit-Owner: Anupama Gupta <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Anupama Gupta <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 17 Aug 2018 04:35:45 +0000
Gerrit-HasComments: Yes

Reply via email to