Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10983 )

Change subject: KUDU-1291. Efficiently support predicates on non-prefix key 
components
......................................................................


Patch Set 7:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h@44
PS7, Line 44: class EncodedKey;
move this down to line 55


http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@527
PS5, Line 527:
> We cannot make it nullptr , because exact match is being de-referenced in f
Then within SeekAtOrAfter() we can pass a temporary variable to those functions 
and only pass exact_match back if it's not a nullptr.


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@75
PS7, Line 75: return_from_skip_scan
instead of whether to enable this, this flag should specify how many loops 
before returning, perhaps with -1 meaning unlimited. Something like:

  DEFINE_int32(skip_scan_short_circuit_loops, 100,
      "Max number of skip attempts the skip scan optimization should make 
before "
      "returning control to the main loop. -1 means there is no maximum.");


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@514
PS7, Line 514: seek_to_upper_bound_key
How about:

  s/nul_cols/num_prefix_cols/
  s/seek_to_upper_bound_key/cache_seeked_value/

And then break this into two separate functions:

  Status CFileSet::Iterator::SeekToNextPrefixKey(size_t num_prefix_cols, bool 
cache_seeked_value)

and

  Status CFileSet::Iterator::SeekToNextPredicateMatchCurPrefix(size_t 
num_prefix_cols)

because it seems like a better separation of responsibilities to have a method 
that seeks to the next prefix key and a different method that seeks for a match 
on the predicate, where the former can be used for both the upper and lower 
bound search, and the latter is only used for the lower bound search.


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@516
PS7, Line 516: 1024
according to https://kudu.apache.org/docs/known_issues.html#_primary_keys I 
think this should be 16KB, so 16 * 1024, and we should use a constant for that 
value at the top of this file (inside the namespace), perhaps: static const int 
kPrimaryKeyMaxSize = 16 * 1024;


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@518
PS7, Line 518:   // Convert current key slice to encoded key.
             :   RETURN_NOT_OK_PREPEND(EncodedKey::DecodeEncodedString(
             :       base_data_->tablet_schema(), &arena,
             :       Slice(key_iter_->GetCurrentValue()), &enc_key), "Invalid 
scan prefix key");
This seems to be repeated in a couple of places. How about we factor this out 
into its own function, like:

  Status CFileSet::Iterator::DecodeCurrentKey(Arena* arena, 
gscoped_ptr<EncodedKey>* enc_key) {
    RETURN_NOT_OK_PREPEND(EncodedKey::DecodeEncodedString(
        base_data_->tablet_schema(), arena,
        key_iter_->GetCurrentValue(), enc_key), "Failed to decode current value 
from primary key index");
    return Status::OK();
 }

So that we can call it here and elsewhere like:

  Arena arena(...);
  gscoped_ptr<EncodedKey> enc_key;
  RETURN_NOT_OK(DecodeCurrentKey(&arena, &enc_key));


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@533
PS7, Line 533:   // Set the predicate column with the predicate value.
             :   // This is done to avoid overshooting the lower bound on the 
predicate value
             :   // in the current scan range.
             :   KuduPartialRow partial_row(&(base_data_->tablet_schema()));
             :   gscoped_ptr<EncodedKey> key_with_pred_value =
             :       GetKeyWithPredicateVal(&partial_row, enc_key.Pass());
             :
             :   return key_iter_->SeekAtOrAfter(*key_with_pred_value, 
&exact_match, /* set_current_value= */ true);
Like I noted above, I think this part should be factored into its own function, 
like:

  // Seek to the next predicate match within the current prefix.
  Status CFileSet::Iterator::SeekToNextPredicateMatchCurPrefix(size_t 
num_prefix_cols) {
    gscoped_ptr<EncodedKey> enc_key;
    Arena arena(1024);
    RETURN_NOT_OK(DecodeCurrentKey(&arena, &enc_key));

    // Check to see if the current key matches the predicate value. If so, then
    // there is no need to seek forward.
    if (CheckPredicateMatch(enc_key)) {
      return Status::OK();
    }

    // If we got this far, the current key doesn't match the predicate, so 
search
    // for the next key that matches the current prefix and predicate.
    KuduPartialRow partial_row(&(base_data_->tablet_schema()));
    gscoped_ptr<EncodedKey> key_with_pred_value =
        GetKeyWithPredicateVal(&partial_row, enc_key);
    return key_iter_->SeekAtOrAfter(*key_with_pred_value,
                                    /* exact_match= */ nullptr,
                                    /* set_current_value= */ true);
  }


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@579
PS7, Line 579: bool CFileSet::Iterator::CheckKeyMatch(const std::vector<const 
void *> &raw_keys, int start_col_id, int end_col_id) {
             :   // Encode the key currently pointed to by validx_iter_.
             :   Arena arena2(1024);
             :   gscoped_ptr<EncodedKey> cur_enc_key;
             :   EncodedKey::DecodeEncodedString(
             :       base_data_->tablet_schema(), &arena2,
             :       Slice(key_iter_->GetCurrentValue()), &cur_enc_key);
             :
             :   for (int col_id = start_col_id; col_id <= end_col_id; 
col_id++) {
             :       if 
(base_data_->tablet_schema().column(col_id).Stringify(raw_keys[col_id]) !=
             :          
base_data_->tablet_schema().column(col_id).Stringify(cur_enc_key->raw_keys()[col_id]))
 {
             :         return false;
             :       }
             :   }
             :   return true;
             : }
Instead of checking that the key matches, why not just check that the 
predicates match the given key? Ex:

  // Returns true if the given encoded key matches the skip scan predicate.
  bool CFileSet::Iterator::CheckPredicateMatch(const gscoped_ptr<EncodedKey>& 
enc_key) const {
    return 
base_data_->tablet_schema().column(skip_scan_predicate_column_id_).Compare(
        skip_scan_predicate_value_,
        enc_key->raw_keys()[skip_scan_predicate_column_id_]) == 0;
  }


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@596
PS7, Line 596: // This is a three seek approach for index skip scan 
implementation:
             : // 1. Place the validx_iter_ at the entry containing the next
             : //    unique "prefix_key" value.
             : // 2. Place the iterator at or after the entry formed by the 
"prefix_key"
             : //    found in 1. and the predicate value.
             : // 3. If the seek in 2. results in exact match with the 
predicate value,
             : //    store the row id that represents the last relevant entry 
(upper bound) wrt the
             : //    current "prefix_key"(found in 1.)
move this inside the function to line 605, since it describes what we are doing 
inside the function and this is not a header file


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@609
PS7, Line 609:   bool lower_bound_key_found;
             :   bool predicate_match = false;
can we merge these two variables into the single variable lower_bound_key_found?


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@614
PS7, Line 614:   // Whether to seek for the next unique prefix, this flag is 
false
             :   // when the prefix key gets incremented while seeking for the 
lower bound
             :   // entry in the loop.
             :   bool continue_seeking_next_prefix = true;
I think it would be more descriptive of the reasoning behind the program logic 
to rename this variable like so:

  // Whether the lower bound prefix key rolled over between the first and
  // second seek to determine the lower bound key.
  bool prefix_key_rolled = false;


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@621
PS7, Line 621:   while (!predicate_match) {
how about:

  for (int loop_num = 0;
       !lower_bound_key_found && loop_num < FLAGS_skip_scan_short_circuit_loops;
       loop_num++) {


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@678
PS7, Line 678:
add: CHECK_OK(s) to ensure we didn't get something unexpected in 's'


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@681
PS7, Line 681: CheckKeyMatch
Why do we care if we have an exact key match? Don't we only care if we have a 
predicate match here?


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@684
PS7, Line 684:     // We didn't find an exact match for our lower bound, so 
re-seek.
             :     if (!predicate_match)  {
             :       // If the prefix key has changed already, do not seek for 
the next prefix
             :       // in the next iteration.
             :       if (  !CheckKeyMatch(key_with_pred_value->raw_keys(),
             :                          0, skip_scan_predicate_column_id_ - 1)) 
{
             :         continue_seeking_next_prefix = false;
             :       }
             :       continue;
             :     }
I think this could be equivalently written as:

    // We weren't able to find a predicate match for our lower bound key, so 
loop and search again.
    if (!lower_bound_key_found)  {
      // If the prefix key rolled between our initial lower bound next prefix
      // seek and our seek to the predicate match with that prefix, it's
      // possible that the latest prefix will have a predicate match, so on our
      // next iteration of the loop, we should not seek to the next prefix.
      if (!KeyColumnsMatch(next_prefix_key, lower_bound_key,
                           /* start_col_id= */ 0,
                           /* end_col_id= */ skip_scan_predicate_column_id_ - 
1)) {
        prefix_key_rolled = true;
       }
       continue;
     }

Where KeyColumnsMatch looks like:

  // Check if the column values in the range corresponding to the given
  // inclusive column id range [start_col_id, end_col_id] are equal between the
  // two given keys.
  bool CFileSet::Iterator::KeyColumnsMatch(const gscoped_ptr<EncodedKey>& key1,
                                           const gscoped_ptr<EncodedKey>& key2,
                                           int start_col_id, int end_col_id) 
const {
    const auto& schema = base_data_->tablet_schema();
    for (int col_id = start_col_id; col_id <= end_col_id; col_id++) {
      if (schema.column(col_id).Compare(key1->raw_keys()[col_id], 
key2->raw_keys()[col_id]) != 0) {
        return false;
      }
    }
    return true;
  }

Because we can easily save the results (current key) of our two lower bound key 
seeks and then avoid lots of additional encoding by just passing them in here 
at the call site.


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@703
PS7, Line 703:     // We add 1 here because we want to increment the value of a 
prefix that
             :     // includes even our predicate column.
             :     // Note: After this step, the current value pointed by 
validx_iter_ is an
             :     // exclusive upper bound on our scan range, therefore, 
key_iter->cur_val_ is not updated.
             :     // to avoid skipping over this current "prefix_key" in the 
next iteration.
             :     s = SeekToNextPrefixKey(skip_scan_predicate_column_id_+1, /* 
seek_to_upper_bound_key */ true);
I think it would be easier to follow the logic if this said something more 
along the lines of:

    int num_prefix_cols_including_predicate = skip_scan_predicate_column_id_ + 
1;
    // Do not cache the seeked value for the upper bound because the next time
    // we seek for the lower bound we want to be able to get a match on the
    // previous upper bound -- it's possible that it has a predicate match.
    s = SeekToNextPrefixKey(num_prefix_cols_including_predicate, /* 
cache_seeked_value=*/ false);


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@723
PS7, Line 723: continue
remove (doesn't do anything at the end of the loop)


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@730
PS7, Line 730: TODO
change this to TODO(anupama)


http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@749
PS7, Line 749: (static_cast<int>(cur_idx_) >= skip_scan_upper_bound_idx_)
Move the check for this boundary condition inside SkipToNextScan() to 
encapsulate the use of the skip_scan_upper_bound_idx_ variable into a single 
method instead of having to do checks for it so high up in the program logic 
and having to worry about this condition being part of the preconditions 
imposed upon the caller of the method and therefore part of its API contract.

So at the top of SkipToNextScan() you can just do:

  // Keep scanning if we're still in the range that needs scanning from our
  // previous seek.
  if (cur_idx_ < skip_scan_upper_bound_idx_) {
    *remaining = std::max<int64_t>(skip_scan_upper_bound_idx_ - cur_idx_, 1);
    return;
  }



--
To view, visit http://gerrit.cloudera.org:8080/10983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f
Gerrit-Change-Number: 10983
Gerrit-PatchSet: 7
Gerrit-Owner: Anupama Gupta <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Anupama Gupta <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sun, 05 Aug 2018 17:49:45 +0000
Gerrit-HasComments: Yes

Reply via email to