Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 )
Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components ...................................................................... Patch Set 15: (39 comments) http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@792 PS14, Line 792: // Currently this block holds encoded composite key. : Arena arena(16*1024); : : DCHECK(!prepared_blocks_.empty()); : > nit: As a general rule of thumb, if something is indicative of a programmer Thanks for this pointer. I have removed the DCHECK() in this case. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@797 PS14, Line 797: us > nit: I know some of the older code has this reversed, but for new code, we Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871 PS14, Line 871: last_prepare_idx_ = b->first_row_idx() + b->dblk_->GetCurrentIndex(); : last_prepare_count_ = 0; : : p > Not sure if this is necessary to avoid breaking encapsulation, as I think w ACK. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871 PS14, Line 871: last_prepare_idx_ = b->first_row_idx() + b->dblk_->GetCurrentIndex(); : last_prepare_count_ = 0; : : p > It's worth documenting in the header the side effect of `cache_seeked_value If `cache_seeked_value` is true, then a call to StoreCurrentValue() will not seek an extra row. Instead, due to the side effect of using CopyNextValues(size_t *n, ColumnDataView *dst) the first value from the last prepared PK column block will be fetched into 'dst'. I have now documented this in the header file. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@790 PS15, Line 790: Todo > nit: s/Todo/TODO/ Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@793 PS15, Line 793: 16*1024 > Why did we go from using a constant back to using a magic number? http://wi Made this a member variable of CFileIterator as a followup of the review comment in cfile_set.cc (L553). After discussing with Alexey, we came to the conclusion that it is safe to initialize Arena with an initial buffer size of 1M. If required, the Arena size grows wrt the data being allocated to it. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/cfile/cfile_reader.cc@795 PS15, Line 795: DCHECK(!prepared_blocks_.empty()); : if (prepared_blocks_.empty()) { : return Status::NotFound("blocks not found"); : } > This has a code smell. https://martinfowler.com/bliki/CodeSmell.html In deb Yes it is possible. I have removed DCHECK and handled this condition by returning Status::NotFound. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc File src/kudu/common/encoded_key.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc@92 PS14, Line 92: gscoped_ptr<EncodedKey> *key, : Arena* arena, int32_t num_columns > Per the GSG (https://google.github.io/styleguide/cppguide.html): Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@175 PS15, Line 175: // Due to the caveat(see below) listed for SkipToNextScan(size_t *remaining), : // this class should not reference a separate "client" class that uses key_iter->cur_val. > I find this part of the comment confusing and I think we should just remove Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@248 PS15, Line 248: Caveat: > I don't think we need to specifically call this out as a caveat Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@249 PS15, Line 249: key_iter->cur_val_ > How about just key_iter_? Is that sufficient? Talking about a private membe Ok. Makes sense. Changed this to just key->iter_. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.h@362 PS15, Line 362: skip_scan_num_seeks_cutoff > missing underscore suffix on member variable Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: PS14: > nit: s/unique prefix/distinct prefix Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@174 PS14, Line 174: // : // Due to the caveat(see below) listed for SkipToNextScan(size_t *remaining), : // this class should not reference a separate "client" class that uses key_iter->cur_val. > Hmm, I'm not sure what this means. See my comment in cfile_reader.cc re: an Removed this, as per Mike's comment. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@208 PS14, Line 208: : // Decode the currently-seeked key into 'enc_key'. : Status > Before getting into the nitty gritty of each method, I think it'd be helpfu Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@209 PS14, Line 209: // Decode the currently-seeked key into 'enc_key'. : Status DecodeCurrentKey(Arena* arena, gscoped_ptr<EncodedKey>* enc_key); : : // This function is used to place the validx_iter_ at the next greater prefix key. : // prefix key refers to the first "num_prefix_cols" columns of the current key. : // current key is the key currently pointed to by validx_iter_ (prior to calling : // this function). : // If 'cache_seeked_value' is true, the validx_iter_ will store the value seeked to. : Status SeekToNextPrefixKey(size_t num_prefix_cols, bool cache_seeked_value); : : // Seek to the next predicate match within the current prefix. : Status SeekToRowWithCurPrefixMatchingPred(const gscoped_ptr<EncodedKey>& enc_key); : : // Build the key with the same prefix as 'cur_enc_key', that has : // 'skip_scan_predicate_value_' in its predicate column, : // and the minimum possible value for all other columns. : gscoped_ptr<EncodedKey> GetKeyWithPredicateVal(KuduPartialRow *p_row, : const gscoped_ptr<EncodedKey>& cur_enc_key); : : // Returns true if the given encoded key matches the skip scan predicate. : bool CheckPredicateMatch(const gscoped_ptr<EncodedKey>& enc_key) const; : : // Check if the column values in the range corresponding to the given : // inclusive column id range [start_col_id, end_col_id] are equal between the : // two given keys. : bool KeyColumnsMatch(const gscoped_ptr<EncodedKey>& key1, : const gscoped_ptr<EncodedKey>& key2, : int start_col_id, int end_col_id) const; : : // This method implements a "skip-scan" optimization, allowing a scan to use : // the primary key index to efficiently seek to matching rows where there are : // predicates on compound key columns that do not necessarily include the : // leading column of the primary key. At the time of writing, only a single : // equality predicate is supported, although the algorithm can support ranges : // of values. : // : // This method should be invoked during the PrepareBatch() phase of the row : // iterator lifecycle. : // : // Caveat: : // This method assumes exclusive access to key_iter->cur_val_. : // : // The in-out parameter 'remaining' refers to the number of rows remaining to : // scan. When this method is invoked, 'remaining' should contain the maximum : // number of remaining rows available to scan. Once this method returns, : // 'remaining' will contain the number of rows to scan to consume the : // available matching rows according to the equality predicate. Note: : // 'remaining' will always be at least 1, although it is a TODO to allow it : // to be 0 (0 violates CHECK conditions elsewhere in the scan code). : // : // Currently, skip scan will be dynamically disabled when the number of seeks : // for unique prefix keys exceeds sqrt(#total rows). We use sqrt(#total rows) : // as a cutoff because based on performance tests on upto 10M rows per tablet, : // the scan time for skip scan is the same as that of the current flow until : // #seeks = sqrt(#total_rows). Further increase in #seeks leads to a drop in : // skip scan performance wrt the current flow. This cutoff value is stored in : // 'skip_scan_num_seeks_cutoff'. : : // Preconditions upon entering this method: : // * key_iter_ is not NULL. : // : // Postconditions upon exiting this method: : // 1. cur_idx_ is updated to the row_id of the next row(containing the next : // higher unique prefix) to scan. : // 2. 'remaining' stores the number the entries to be scanned in the current scan range. : // : // See the .cc file for details on the approach and the implementation. : Status SkipToNextScan(size_t *remaining); : > These could all be private, right? Yes. Made them all private. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@299 PS14, Line 299: key(PK) > nit: add a space Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@301 PS14, Line 301: predicate_column" > nit: remove underscore Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@362 PS14, Line 362: int64_t skip_scan_num_seeks_cutoff; : > nit: follow private members with an underscore Done http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@551 PS12, Line 551: "Failed to decode current value from primary key index"); : return Status::OK(); : } > +1 Done. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@562 PS14, Line 562: RETURN_NOT_OK(EncodedKey::IncrementEncodedKeyColumns( : base_data_->tablet_schema(), : &enc_key, &arena, num_prefix_cols)); : : if (cache_seeked_value) { : // Set the predicate column to the predicate value in case we can find a : // predicate match in one search. As a side effect, GetKeyWithPredicateVal() : > Please document in the .h that if `cache_seeked_value` is true, this will s Yes, I have now documented the same. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@593 PS14, Line 593: /* cache_seeked_value= */ true, : /* exact_match= */ nullptr); > Please sprinkle a couple TODOs about how to generalize this patch. E.g.: Incorporated this. Plus, added a similar TODO to find the skip scan upper bound key to support in-range predicates. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@680 PS14, Line 680: !lower_bound_key_found && loop_num < FLAGS_skip_scan_short_circuit_loops; : loop_num++) { : DCHECK_LT(cur_idx_, skip_scan_upper_bound_idx_); > nit: since the steps are described in this method, we can probably get away Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@770 PS14, Line 770: reak; > Adding a comment would definitely help. That might be a comment for the me Done http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@770 PS14, Line 770: reak; > We effectively have two pointers we are tracking with skip-scan: the primar Added this explanation. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@557 PS15, Line 557: Arena arena(FLAGS_max_encoded_key_size_bytes); > And the best approach I think is suggested by Andrew -- have an instance of Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@557 PS15, Line 557: Arena arena(FLAGS_max_encoded_key_size_bytes); > I think it's possible to pass an instance of arena from the upper level to Thanks for the clarification, Alexey. Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@571 PS15, Line 571: ( > nit: this and the pairing brace are not needed. Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@597 PS15, Line 597: KuduPartialRow *p_row, : const gscoped_ptr<EncodedKey>& cur_enc_key > style nit: usually we have input parameter specified first, then go output Done. Also modified this function signature to return Status instead. And added gscoped_ptr<EncodedKey>* enc_key as an output parameter. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@603 PS15, Line 603: auto const &value > style nit: const auto& value Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@606 PS15, Line 606: p_row->Set(col_id, data); > What if this returns non-OK? Added RETURN_NOT_OK() in this case. This function will return STATUS now. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@611 PS15, Line 611: p_row->Set(skip_scan_predicate_column_id_, suffix_col_value); > ditto Added RETURN_NOT_OK() in this case too. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@619 PS15, Line 619: ( > nit: this and the closing parenthesis are not needed Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@742 PS15, Line 742: CheckPredicateMatch(lower_bound_key); > I'm not sure this is what is needed: CheckPredicateMatch() verifies that on Actually we only need the predicate column to match in this case. Added this in the comment. On a side note, you are correct but in this case even if the prefix keys do not match, still the current flow will execute as expected for the next prefix key. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/cfile_set.cc@811 PS15, Line 811: ERROR > I think this should be WARNING, because there are some legit cases where Sk Thanks. Makes sense. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc File src/kudu/tablet/index_skipscan-test.cc: PS15: > We discussed that offline, but just to re-iterate: As we discussed I have added the test case that handles the case when one or more of the column(s) may contain the min and(or) max values of the respective data types. Regarding the scenario, where the cardinality of predicate-matching rows is close to the number of all rows, I think it is covered in the non-random cases. Still, in case you think it is important I will add this too. Please let me know. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc@18 PS15, Line 18: #include <stddef.h> > nit: please use c++ style include: Done http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc@106 PS15, Line 106: s.IsAlreadyPresent > What if num_matching was incremented, but the IsAlreadyPresent() was report Makes sense. Done. http://gerrit.cloudera.org:8080/#/c/10983/15/src/kudu/tablet/index_skipscan-test.cc@129 PS15, Line 129: { > nit: here and elsewhere under this switch -- remove those braces, they are Done -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 15 Gerrit-Owner: Anupama Gupta <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Anupama Gupta <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Fri, 17 Aug 2018 04:35:45 +0000 Gerrit-HasComments: Yes
