Anupama Gupta has posted comments on this change.

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 

Patch Set 14:


Please review the changes.
File src/kudu/cfile/cfile_reader.h:
PS12, Line 345: bool cache_seeked_value
> nit: Even though this is calling StoreCurrentValue(), I think calling this
PS12, Line 348:   const std::string& GetCurrentValue() const;
> Add 'const' modifier for this method.
PS12, Line 472:   // Value currently pointed to by validx_iter_.
              :   std::string cur_val_;
> +1
Andrew, if we had a single public function GetCurrentValue(), it would
actually lead to different results because it uses CopyNextValues(size_t *n,
ColumnDataView *dst). CopyNextValues(..) fetches the next 'n' values from the
block into 'dst'. Hence the need to store the fetched value in 'cur_val_'.
File src/kudu/cfile/cfile_reader.h:
PS10, Line 473:   std::string cur_val_;
> This breaks encapsulation. We should document something along these lines i
Listed this caveat in CFileSet::Iterator. Please let me know if it is fine.
File src/kudu/cfile/
PS12, Line 792: DCHECK(!prepared_blocks_.empty());
> What happens if this does not hold in release build?
Makes sense. It now returns a non-OK status (Status::NotFound) in this case.
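For illustration only, the pattern discussed here (an explicit check that
returns a non-OK status instead of relying on a DCHECK, which is compiled out
in release builds) could look roughly like the sketch below. The Status struct
and FirstValueOfLastBlock() are hypothetical stand-ins written for this
example; they are not the Kudu classes or the code in the patch.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal stand-in for a status type (not kudu::Status).
struct Status {
  bool ok_flag;
  std::string msg;
  static Status OK() { return {true, ""}; }
  static Status NotFound(std::string m) { return {false, std::move(m)}; }
  bool ok() const { return ok_flag; }
};

// Hypothetical helper: copies the first value of the last prepared block
// into *out. The empty case is checked explicitly and reported to the
// caller, so the behavior is the same in debug and release builds.
Status FirstValueOfLastBlock(
    const std::vector<std::vector<int>>& prepared_blocks, int* out) {
  if (prepared_blocks.empty() || prepared_blocks.back().empty()) {
    return Status::NotFound("no prepared blocks");
  }
  *out = prepared_blocks.back().front();
  return Status::OK();
}
```

A caller would then test the returned status (for example with ok()) before
using the output value.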
File src/kudu/tablet/cfile_set.h:
PS12, Line 213: // prefix key refers to the first "num_prefix_cols" columns of 
the current key.
              :   // current key is
> nit: Reword as "If `cache_seeked_value` is true, the predicate column itera
Rephrased as: "If 'cache_seeked_value' is true, the validx_iter_ will store the
value seeked to.", since the iterator is a validx_iter_.
PS12, Line 220: Status SeekToRowWithCurPrefixMatchingPred(const 
gscoped_ptr<EncodedKey>& enc_key);
> nit: reword as "Build the key with the same prefix as `cur_enc_key`, that h
File src/kudu/tablet/
PS12, Line 19: #include <algorit
> nit: please use '#include <cmath>' instead and place that among other C++-s
PS12, Line 76:
> Per our discussion with Mike today, I thought the idea was to have this dis
You are right. I assumed this would be disabled right before merging (in the
final patch). Set this to 'false' now, as we are now testing both the scenarios
PS12, Line 435:
> nit: you could use 'auto' here.
Removed predicates var as per comment on L436.
PS12, Line 436: spec.predi
> nit: this can just be `spec.predicates()`, then there would be no need for
PS12, Line 440: // Get the column id from the predicate
> nit: move this out of the loop, maybe up by L413 and then use it in definin
PS12, Line 444: col_id ==
> Could find_column() return -1 here?
Yes, if the column is not found in the schema. I am not sure whether to
continue the loop in this case or simply return with skip scan disabled. Any
suggestions?
PS12, Line 462:
              :     // Store the predicate column id.
              :     skip_scan_predicate_column_id_ = min_col_id;
              :     // Store the predicate value.
              :     skip_scan_predicate_value_ = pred_value;
              :     // Store the cutoff on the number of skip scan seeks.
> It was originally written as described, but I asked her to make the change,
Alexey, please let me know if you feel that this change should be reverted.
PS12, Line 542:
> nit: stick the asterisk to the type of the parameter.
PS12, Line 551: atus CFileSet::Iterator::SeekToNextPrefixKey(size_t 
num_prefix_cols, bool cache_seeked_value) {
              :   gscoped_ptr<EncodedKey> enc_key;
              :   Arena arena(kEncodedCompositeKeyMaxSize);
> Hmm.. This is doing a lot of heap-allocating and gets called _very often_.
Noted. I will have to look more into this.
PS12, Line 561:
              :   if (cache_seeked_value) {
              :     // Set the predicate column to the predicate value in case 
we can find a
              :     // predicate match in one search. As a side effect, 
              :     // sets minimum values on the columns after the predicate 
value, which is
              :     // required for correctness here.
              :     KuduPartialRow partial_row(&(base_data_->tablet_schema()));
              :     enc_key = GetKeyWithPredicateVal(&partial_row, enc_key);
              :   }
              :   return key_iter_->SeekAtOrAfter(*enc_key,
              :       /* cache_seeked_value= */ cache_seeked_value,
              :       /* exact_match= */ nullptr);
              : }
              : Status CFileSet::Iterator::SeekToR
> I think this would be clearer as
PS12, Line 583: // If we got this far, the current key doesn't match the 
predicate, so search
              :   // for the next key that matches the current prefix and 
              :   KuduPartialRow partial_row(&(base_data_->tablet_schema()));
              :   gscoped_ptr<EncodedKey> key_with_pred_value =
> Hmm.. IMO SeekToNext* should always move forward at least one row, unless w
Yes, you are right. Renamed this function to 
PS12, Line 662: /    If this matches, this is the lower bound of our desired 
              :   // 3. If we found our desired lower bound, find an upper 
bound for the scan
              :   //    by searching for the next row key matching one value 
higher than the
              :   //    highest value that will match our predicate.
              :   skip_scan_upper_bound_idx_ = upper_bound_idx_;
              :   size_t skip_scan_lower_bound_idx = cur_idx_;
              :   // Whether we found our lower bound key.
              :   bool lower_bound_key_found = false;
> Can you also label down below where 1., 2., and 3. are?
Added the labels "Step 1...", "Step 2...", "Step 3..." in the comments below.
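As a rough aid for readers of this thread, the general skip-scan idea behind
those steps can be sketched on a plain sorted vector. This is a simplified
illustration under stated assumptions, not the code in the patch: keys are
hypothetical (prefix, predicate column) integer pairs, the predicate is an
equality, and std::lower_bound / std::upper_bound stand in for the iterator
seeks. The upper bound is found with std::upper_bound rather than by seeking
to "value + 1", which gives the same position without risking overflow.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical composite key: (prefix column, predicate column).
using Key = std::pair<int, int>;

// Returns the indexes of rows in the sorted vector `keys` whose second
// column equals `value`, seeking once per distinct prefix instead of
// visiting every row.
std::vector<std::size_t> SkipScan(const std::vector<Key>& keys, int value) {
  std::vector<std::size_t> matches;
  auto it = keys.begin();
  while (it != keys.end()) {
    const int prefix = it->first;
    const Key target(prefix, value);
    // Seek to the first row >= (prefix, value); an exact hit is the lower
    // bound of the matching range for this prefix.
    auto lower = std::lower_bound(it, keys.end(), target);
    if (lower != keys.end() && *lower == target) {
      // Find the upper bound: the first row greater than (prefix, value).
      auto upper = std::upper_bound(lower, keys.end(), target);
      for (auto m = lower; m != upper; ++m) {
        matches.push_back(static_cast<std::size_t>(m - keys.begin()));
      }
      lower = upper;
    }
    // Seek to the next distinct prefix: the first row whose prefix column
    // is greater than the current one.
    it = std::upper_bound(lower, keys.end(), prefix,
                          [](int p, const Key& k) { return p < k.first; });
  }
  return matches;
}
```

With few distinct prefixes and many rows per prefix, this performs a handful
of binary searches per prefix rather than a full scan, which is the case a
skip scan is meant to help.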
PS12, Line 704:
> I don't think upper_bound_idx_ changes for the duration of a scan, so maybe
Done. Added this as a member variable of CFileSet::Iterator and initialized it 
in CFileSet::Iterator::TryEnableSkipScan.
PS12, Line 721:     s = SeekToRowWithCurPrefixMatchingPred(next_prefix_key);
> This decodes the current key, and then immediately after we call SeekToNext
Makes sense. I have made this change now.
File src/kudu/tablet/
PS12, Line 92: (
> what if it was any other error but IsAlreadyPresent?
Added CHECK_OK() in else block to handle other cases.
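A minimal sketch of that shape of error handling follows, assuming a
hypothetical InsertRow() and a hand-written Status type (neither is the Kudu
API or the test code in the patch). A duplicate row is tolerated, and any
other failure aborts, which is what a CHECK_OK-style macro would do.

```cpp
#include <cstdlib>
#include <iostream>
#include <set>

// Hypothetical status type for this example only (not kudu::Status).
enum class Code { kOk, kAlreadyPresent, kIOError };
struct Status {
  Code code;
  bool ok() const { return code == Code::kOk; }
  bool IsAlreadyPresent() const { return code == Code::kAlreadyPresent; }
};

// Hypothetical insert: reports kAlreadyPresent for duplicate keys.
Status InsertRow(std::set<int>* rows, int key) {
  return rows->insert(key).second ? Status{Code::kOk}
                                  : Status{Code::kAlreadyPresent};
}

// Returns true if the row was inserted and false if it already existed.
// Any other error is treated as fatal instead of being silently ignored.
bool InsertIgnoringDuplicates(std::set<int>* rows, int key) {
  const Status s = InsertRow(rows, key);
  if (s.IsAlreadyPresent()) {
    return false;
  }
  if (!s.ok()) {
    // Stand-in for a CHECK_OK(s)-style fatal check.
    std::cerr << "unexpected insert failure" << std::endl;
    std::abort();
  }
  return true;
}
```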
PS12, Line 270: void ScanTablet(ScanSpec* spec, vector<string>
> Does it make sense to add additional dimension to the test: whether the ski
Thanks for the reference, Alexey. Done.
PS12, Line 280:  following set of tests
> nit: you could use NO_FATALS() shortcut instead for better readability
Thanks for this, Alexey. Done.
PS12, Line 321:         auto pred_p1 = 
ColumnPredicate::Equality(schema_.column(0), &value_p1);
              :         auto pred_p2 = 
ColumnPredicate::Equality(schema_.column(1), &value_p2);
              :         spec.AddPredicate(pred_p1);
              :         spec.AddPredicate(pred_p2);
              :         vector<string> results;
              :         NO_FATALS(ScanTablet(&spec, &results, "Exact match on 
P1 and P2"));
              :         EXPECT_EQ(1, results.size());
              :       }
              :       break;
              :     c
> nit: use the same layout of scopes and braces as in other cases (that's for
Changed to a more uniform layout. Please let me know if this looks good.

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 14
Gerrit-Owner: Anupama Gupta <>
Gerrit-Reviewer: Alexey Serbin <>
Gerrit-Reviewer: Andrew Wong <>
Gerrit-Reviewer: Anupama Gupta <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 11 Aug 2018 08:14:07 +0000
Gerrit-HasComments: Yes
