Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 )
Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components ...................................................................... Patch Set 7: (19 comments) http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.h@44 PS7, Line 44: class EncodedKey; move this down to line 55 http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/5/src/kudu/tablet/cfile_set.cc@527 PS5, Line 527: > We cannot make it nullptr , because exact match is being de-referenced in f Then within SeekAtOrAfter() we can pass a temporary variable to those functions and only pass exact_match back if it's not a nullptr. http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@75 PS7, Line 75: return_from_skip_scan instead of whether to enable this, this flag should specify how many loops before returning, perhaps with -1 meaning unlimited. Something like: DEFINE_int32(skip_scan_short_circuit_loops, 100, "Max number of skip attempts the skip scan optimization should make before " "returning control to the main loop. -1 means there is no maximum."); http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@514 PS7, Line 514: seek_to_upper_bound_key How about: s/nul_cols/num_prefix_cols/ s/seek_to_upper_bound_key/cache_seeked_value/ And then break this into two separate functions: Status CFileSet::Iterator::SeekToNextPrefixKey(size_t num_prefix_cols, bool cache_seeked_value) and Status CFileSet::Iterator::SeekToNextPredicateMatchCurPrefix(size_t num_prefix_cols) because it seems like a better separation of responsibilities to have a method that seeks to the next prefix key and a different method that seeks for a match on the predicate, where the former can be used for both the upper and lower bound search, and the latter is only used for the lower bound search. http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@516 PS7, Line 516: 1024 according to https://kudu.apache.org/docs/known_issues.html#_primary_keys I think this should be 16KB, so 16 * 1024, and we should use a constant for that value at the top of this file (inside the namespace), perhaps: static const int kPrimaryKeyMaxSize = 16 * 1024; http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@518 PS7, Line 518: // Convert current key slice to encoded key. : RETURN_NOT_OK_PREPEND(EncodedKey::DecodeEncodedString( : base_data_->tablet_schema(), &arena, : Slice(key_iter_->GetCurrentValue()), &enc_key), "Invalid scan prefix key"); This seems to be repeated in a couple of places. How about we factor this out into its own function, like: Status CFileSet::Iterator::DecodeCurrentKey(Arena* arena, gscoped_ptr<EncodedKey>* enc_key) { RETURN_NOT_OK_PREPEND(EncodedKey::DecodeEncodedString( base_data_->tablet_schema(), arena, key_iter_->GetCurrentValue(), enc_key), "Failed to decode current value from primary key index"); return Status::OK(); } So that we can call it here and elsewhere like: Arena arena(...); gscoped_ptr<EncodedKey> enc_key; RETURN_NOT_OK(DecodeCurrentKey(&arena, &enc_key)); http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@533 PS7, Line 533: // Set the predicate column with the predicate value. : // This is done to avoid overshooting the lower bound on the predicate value : // in the current scan range. : KuduPartialRow partial_row(&(base_data_->tablet_schema())); : gscoped_ptr<EncodedKey> key_with_pred_value = : GetKeyWithPredicateVal(&partial_row, enc_key.Pass()); : : return key_iter_->SeekAtOrAfter(*key_with_pred_value, &exact_match, /* set_current_value= */ true); Like I noted above, I think this part should be factored into its own function, like: // Seek to the next predicate match within the current prefix. Status CFileSet::Iterator::SeekToNextPredicateMatchCurPrefix(size_t num_prefix_cols) { gscoped_ptr<EncodedKey> enc_key; Arena arena(1024); RETURN_NOT_OK(DecodeCurrentKey(&arena, &enc_key)); // Check to see if the current key matches the predicate value. If so, then // there is no need to seek forward. if (CheckPredicateMatch(enc_key)) { return Status::OK(); } // If we got this far, the current key doesn't match the predicate, so search // for the next key that matches the current prefix and predicate. KuduPartialRow partial_row(&(base_data_->tablet_schema())); gscoped_ptr<EncodedKey> key_with_pred_value = GetKeyWithPredicateVal(&partial_row, enc_key); return key_iter_->SeekAtOrAfter(*key_with_pred_value, /* exact_match= */ nullptr, /* set_current_value= */ true); } http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@579 PS7, Line 579: bool CFileSet::Iterator::CheckKeyMatch(const std::vector<const void *> &raw_keys, int start_col_id, int end_col_id) { : // Encode the key currently pointed to by validx_iter_. : Arena arena2(1024); : gscoped_ptr<EncodedKey> cur_enc_key; : EncodedKey::DecodeEncodedString( : base_data_->tablet_schema(), &arena2, : Slice(key_iter_->GetCurrentValue()), &cur_enc_key); : : for (int col_id = start_col_id; col_id <= end_col_id; col_id++) { : if (base_data_->tablet_schema().column(col_id).Stringify(raw_keys[col_id]) != : base_data_->tablet_schema().column(col_id).Stringify(cur_enc_key->raw_keys()[col_id])) { : return false; : } : } : return true; : } Instead of checking that the key matches, why not just check that the predicates match the given key? Ex: // Returns true if the given encoded key matches the skip scan predicate. bool CFileSet::Iterator::CheckPredicateMatch(const gscoped_ptr<EncodedKey>& enc_key) const { return base_data_->tablet_schema().column(skip_scan_predicate_column_id_).Compare( skip_scan_predicate_value_, enc_key->raw_keys()[skip_scan_predicate_column_id_]) == 0; } http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@596 PS7, Line 596: // This is a three seek approach for index skip scan implementation: : // 1. Place the validx_iter_ at the entry containing the next : // unique "prefix_key" value. : // 2. Place the iterator at or after the entry formed by the "prefix_key" : // found in 1. and the predicate value. : // 3. If the seek in 2. results in exact match with the predicate value, : // store the row id that represents the last relevant entry (upper bound) wrt the : // current "prefix_key"(found in 1.) move this inside the function to line 605, since it describes what we are doing inside the function and this is not a header file http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@609 PS7, Line 609: bool lower_bound_key_found; : bool predicate_match = false; can we merge these two variables into the single variable lower_bound_key_found? http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@614 PS7, Line 614: // Whether to seek for the next unique prefix, this flag is false : // when the prefix key gets incremented while seeking for the lower bound : // entry in the loop. : bool continue_seeking_next_prefix = true; I think it would be more descriptive of the reasoning behind the program logic to rename this variable like so: // Whether the lower bound prefix key rolled over between the first and // second seek to determine the lower bound key. bool prefix_key_rolled = false; http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@621 PS7, Line 621: while (!predicate_match) { how about: for (int loop_num = 0; !lower_bound_key_found && loop_num < FLAGS_skip_scan_short_circuit_loops; loop_num++) { http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@678 PS7, Line 678: add: CHECK_OK(s) to ensure we didn't get something unexpected in 's' http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@681 PS7, Line 681: CheckKeyMatch Why do we care if we have an exact key match? Don't we only care if we have a predicate match here? http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@684 PS7, Line 684: // We didn't find an exact match for our lower bound, so re-seek. : if (!predicate_match) { : // If the prefix key has changed already, do not seek for the next prefix : // in the next iteration. : if ( !CheckKeyMatch(key_with_pred_value->raw_keys(), : 0, skip_scan_predicate_column_id_ - 1)) { : continue_seeking_next_prefix = false; : } : continue; : } I think this could be equivalently written as: // We weren't able to find a predicate match for our lower bound key, so loop and search again. if (!lower_bound_key_found) { // If the prefix key rolled between our initial lower bound next prefix // seek and our seek to the predicate match with that prefix, it's // possible that the latest prefix will have a predicate match, so on our // next iteration of the loop, we should not seek to the next prefix. if (!KeyColumnsMatch(next_prefix_key, lower_bound_key, /* start_col_id= */ 0, /* end_col_id= */ skip_scan_predicate_column_id_ - 1)) { prefix_key_rolled = true; } continue; } Where KeyColumnsMatch looks like: // Check if the column values in the range corresponding to the given // inclusive column id range [start_col_id, end_col_id] are equal between the // two given keys. bool CFileSet::Iterator::KeyColumnsMatch(const gscoped_ptr<EncodedKey>& key1, const gscoped_ptr<EncodedKey>& key2, int start_col_id, int end_col_id) const { const auto& schema = base_data_->tablet_schema(); for (int col_id = start_col_id; col_id <= end_col_id; col_id++) { if (schema.column(col_id).Compare(key1->raw_keys()[col_id], key2->raw_keys()[col_id]) != 0) { return false; } } return true; } Because we can easily save the results (current key) of our two lower bound key seeks and then avoid lots of additional encoding by just passing them in here at the call site. http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@703 PS7, Line 703: // We add 1 here because we want to increment the value of a prefix that : // includes even our predicate column. : // Note: After this step, the current value pointed by validx_iter_ is an : // exclusive upper bound on our scan range, therefore, key_iter->cur_val_ is not updated. : // to avoid skipping over this current "prefix_key" in the next iteration. : s = SeekToNextPrefixKey(skip_scan_predicate_column_id_+1, /* seek_to_upper_bound_key */ true); I think it would be easier to follow the logic if this said something more along the lines of: int num_prefix_cols_including_predicate = skip_scan_predicate_column_id_ + 1; // Do not cache the seeked value for the upper bound because the next time // we seek for the lower bound we want to be able to get a match on the // previous upper bound -- it's possible that it has a predicate match. s = SeekToNextPrefixKey(num_prefix_cols_including_predicate, /* cache_seeked_value=*/ false); http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@723 PS7, Line 723: continue remove (doesn't do anything at the end of the loop) http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@730 PS7, Line 730: TODO change this to TODO(anupama) http://gerrit.cloudera.org:8080/#/c/10983/7/src/kudu/tablet/cfile_set.cc@749 PS7, Line 749: (static_cast<int>(cur_idx_) >= skip_scan_upper_bound_idx_) Move the check for this boundary condition inside SkipToNextScan() to encapsulate the use of the skip_scan_upper_bound_idx_ variable into a single method instead of having to do checks for it so high up in the program logic and having to worry about this condition being part of the preconditions imposed upon the caller of the method and therefore part of its API contract. So at the top of SkipToNextScan() you can just do: // Keep scanning if we're still in the range that needs scanning from our // previous seek. if (cur_idx_ < skip_scan_upper_bound_idx_) { *remaining = std::max<int64_t>(skip_scan_upper_bound_idx_ - cur_idx_, 1); return; } -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 7 Gerrit-Owner: Anupama Gupta <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Anupama Gupta <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Sun, 05 Aug 2018 17:49:45 +0000 Gerrit-HasComments: Yes
