Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 )
Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components ...................................................................... Patch Set 15: (16 comments) http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@792 PS14, Line 792: // Currently this block holds encoded composite key. : Arena arena(16*1024); : : DCHECK(!prepared_blocks_.empty()); : nit: As a general rule of thumb, if something is indicative of a programmer error, we'll add a CHECK or DCHECK, whereas if there's an issue with the underlying data, we'll return an error. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@797 PS14, Line 797: us nit: I know some of the older code has this reversed, but for new code, we favor placing the sigil immediately after the var name (PreparedBlock* pblk), here and elsewhere. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/cfile/cfile_reader.cc@871 PS14, Line 871: last_prepare_idx_ = b->first_row_idx() + b->dblk_->GetCurrentIndex(); : last_prepare_count_ = 0; : : p It's worth documenting in the header the side effect of `cache_seeked_value`, i.e. if it is set to true, this call to StoreCurrentValue() will seek an extra row, right? Another approach I chatted with you about before was that maybe we can avoid pushing this bookkeeping so low by slightly adjusting the APIs: Status SeekAtOrAfter(const EncodedKey& encoded_key, bool* exact_match); // Same behavior as before this patch. Status ConsumeNextValue(string* val); // Combines StoreCurrentValue() and GetCurrentValue() Then, callers (i.e. CFileSet::Iterator) could use key_iter->ConsumeNextValue() immediately after SeekAtOrAfter() instead of doing SeekAtOrAfter(/* cache_seeked_values=*/true), the CFileIterator::cur_val_ member could be moved to the CFileSet::Iterator with a more telling name, like 'skip_scan_cur_key_' or somesuch, and we wouldn't have to have caveats and comments explaining the restricted access to cur_val_. Might be a pretty substantial change, but it might with the broken encapsulation. WDYT? http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc File src/kudu/common/encoded_key.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/common/encoded_key.cc@92 PS14, Line 92: gscoped_ptr<EncodedKey> *key, : Arena* arena, int32_t num_columns Per the GSG (https://google.github.io/styleguide/cppguide.html): nit: spacing nit: output parameters should come after input parameters http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: PS14: nit: s/unique prefix/distinct prefix http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@174 PS14, Line 174: // : // Due to the caveat(see below) listed for SkipToNextScan(size_t *remaining), : // this class should not reference a separate "client" class that uses key_iter->cur_val. Hmm, I'm not sure what this means. See my comment in cfile_reader.cc re: an alternate approach. Also nit: add a space in "caveat (see" http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@208 PS14, Line 208: : // Decode the currently-seeked key into 'enc_key'. : Status Before getting into the nitty gritty of each method, I think it'd be helpful to start out by adding a block comment with: - a brief, high-level overview of what a skip scan is - definitions of "predicate column", "prefix columns", "prefix keys", in the context of a skip scan - definitions of a lower and upper bound in the context of a skip scan, and how they're used when seeking With the definitions in one place, it'd be easier to read about, and you wouldn't have to sprinkle definitions throughout the below comments. You've already got a good base at L238. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@209 PS14, Line 209: // Decode the currently-seeked key into 'enc_key'. : Status DecodeCurrentKey(Arena* arena, gscoped_ptr<EncodedKey>* enc_key); : : // This function is used to place the validx_iter_ at the next greater prefix key. : // prefix key refers to the first "num_prefix_cols" columns of the current key. : // current key is the key currently pointed to by validx_iter_ (prior to calling : // this function). : // If 'cache_seeked_value' is true, the validx_iter_ will store the value seeked to. : Status SeekToNextPrefixKey(size_t num_prefix_cols, bool cache_seeked_value); : : // Seek to the next predicate match within the current prefix. : Status SeekToRowWithCurPrefixMatchingPred(const gscoped_ptr<EncodedKey>& enc_key); : : // Build the key with the same prefix as 'cur_enc_key', that has : // 'skip_scan_predicate_value_' in its predicate column, : // and the minimum possible value for all other columns. : gscoped_ptr<EncodedKey> GetKeyWithPredicateVal(KuduPartialRow *p_row, : const gscoped_ptr<EncodedKey>& cur_enc_key); : : // Returns true if the given encoded key matches the skip scan predicate. : bool CheckPredicateMatch(const gscoped_ptr<EncodedKey>& enc_key) const; : : // Check if the column values in the range corresponding to the given : // inclusive column id range [start_col_id, end_col_id] are equal between the : // two given keys. : bool KeyColumnsMatch(const gscoped_ptr<EncodedKey>& key1, : const gscoped_ptr<EncodedKey>& key2, : int start_col_id, int end_col_id) const; : : // This method implements a "skip-scan" optimization, allowing a scan to use : // the primary key index to efficiently seek to matching rows where there are : // predicates on compound key columns that do not necessarily include the : // leading column of the primary key. At the time of writing, only a single : // equality predicate is supported, although the algorithm can support ranges : // of values. : // : // This method should be invoked during the PrepareBatch() phase of the row : // iterator lifecycle. : // : // Caveat: : // This method assumes exclusive access to key_iter->cur_val_. : // : // The in-out parameter 'remaining' refers to the number of rows remaining to : // scan. When this method is invoked, 'remaining' should contain the maximum : // number of remaining rows available to scan. Once this method returns, : // 'remaining' will contain the number of rows to scan to consume the : // available matching rows according to the equality predicate. Note: : // 'remaining' will always be at least 1, although it is a TODO to allow it : // to be 0 (0 violates CHECK conditions elsewhere in the scan code). : // : // Currently, skip scan will be dynamically disabled when the number of seeks : // for unique prefix keys exceeds sqrt(#total rows). We use sqrt(#total rows) : // as a cutoff because based on performance tests on upto 10M rows per tablet, : // the scan time for skip scan is the same as that of the current flow until : // #seeks = sqrt(#total_rows). Further increase in #seeks leads to a drop in : // skip scan performance wrt the current flow. This cutoff value is stored in : // 'skip_scan_num_seeks_cutoff'. : : // Preconditions upon entering this method: : // * key_iter_ is not NULL. : // : // Postconditions upon exiting this method: : // 1. cur_idx_ is updated to the row_id of the next row(containing the next : // higher unique prefix) to scan. : // 2. 'remaining' stores the number the entries to be scanned in the current scan range. : // : // See the .cc file for details on the approach and the implementation. : Status SkipToNextScan(size_t *remaining); : These could all be private, right? http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@299 PS14, Line 299: key(PK) nit: add a space http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@301 PS14, Line 301: predicate_column" nit: remove underscore http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.h@362 PS14, Line 362: int64_t skip_scan_num_seeks_cutoff; : nit: follow private members with an underscore http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/12/src/kudu/tablet/cfile_set.cc@551 PS12, Line 551: "Failed to decode current value from primary key index"); : return Status::OK(); : } > Noted. I will have to look more into this. Might be helpful: from util/memory/arena.h and the definition of Reset(): // Unless allocations exceed max_buffer_size, repetitive filling up and // resetting normally lead to quickly settling memory footprint and ceasing // buffer allocations, as the arena keeps reusing a single, large buffer. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@562 PS14, Line 562: RETURN_NOT_OK(EncodedKey::IncrementEncodedKeyColumns( : base_data_->tablet_schema(), : &enc_key, &arena, num_prefix_cols)); : : if (cache_seeked_value) { : // Set the predicate column to the predicate value in case we can find a : // predicate match in one search. As a side effect, GetKeyWithPredicateVal() : Please document in the .h that if `cache_seeked_value` is true, this will search for a different value than if it were not. Is that the intended API? http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@593 PS14, Line 593: /* cache_seeked_value= */ true, : /* exact_match= */ nullptr); Please sprinkle a couple TODOs about how to generalize this patch. E.g.: TODO(anupama): to support in-range predicates, generalize this to return a key with the same prefix as cur_enc_key, a predicate column populated with the lower bound of the predicate values, and the minimum value for all other columns. http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@680 PS14, Line 680: !lower_bound_key_found && loop_num < FLAGS_skip_scan_short_circuit_loops; : loop_num++) { : DCHECK_LT(cur_idx_, skip_scan_upper_bound_idx_); nit: since the steps are described in this method, we can probably get away with removing "in three seek approach", here and below. Also probably don't need the extra lines of "/////////////..." http://gerrit.cloudera.org:8080/#/c/10983/14/src/kudu/tablet/cfile_set.cc@770 PS14, Line 770: reak; How does this happen? Could you elaborate a bit? What does seeking backwards mean? -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 15 Gerrit-Owner: Anupama Gupta <anupama.gu...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Anupama Gupta <anupama.gu...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Wed, 15 Aug 2018 06:04:58 +0000 Gerrit-HasComments: Yes