Andrew Wong has posted comments on this change.

Change subject: Predicate evaluation pushdown

Patch Set 7:

Commit Message:

Line 16:
> Nit: it would be nice to have the document in the Kudu source tree when it'
Will do.

Line 20:
> Ditto.
File src/kudu/benchmarks/tpch/

Line 87: DEFINE_string(tpch_path_to_dbgen_dir, ".",
> This does not look as a good default.  I think ".", "./tpch" or "./tpch-dbg
Done, this was not meant to be in the patch.
File src/kudu/cfile/

Line 252: // store their min/max values. CopyNextAndEval in these blocks could
> Ditto for the asterisk (star) as for the corresponding header file.

Line 259:   ctx->SetDecoderEvalSupported();
> Is this still relevant?  I think you have implemented CopyNextAndEval for p
This could be extended to other blocks. If, for instance, in integer blocks, we 
begin storing the the min and max of the block, we could either short-circuit 
or evaluate slice-by-slice.

Right now, anything that doesn't support decoder-level evaluation will just set 
eval_complete to false and go down the copy&evaluate path.

Line 264: 
> terminate all comments with a period.

Line 289:     // been cleared, in which case we can skip evaluation.
> nit: if I'm not mistaken, the style guide mentions if/else should be format
File src/kudu/cfile/

Line 191:       dict_decoder_(iter->GetDictDecoder()),
> Why not to move those into the initializer list?
File src/kudu/cfile/binary_dict_block.h:

Line 131:                          ColumnEvalContext *ctx,
> Nit: it seems in the majority of the code in this file the asterisk tends t

Line 140:     data_decoder_->SeekForward(n);
> drop 'virtual' and replace OVERRIDE with override, here and above.
This didn't get picked up by the last patch, but it will be fixed.
File src/kudu/cfile/binary_dict_block.h:

Line 131:                          ColumnEvalContext *ctx,
> Style nit: the rest of this file uses the "asterisk is closer to the type" 
File src/kudu/cfile/

Line 290:   Slice *out = reinterpret_cast<Slice *>(dst->data());
> If the 'i' variable is not needed outside of the cycle, why not to move it 
File src/kudu/cfile/

Line 306: Status BinaryPlainBlockDecoder::CopyNextValues(size_t *n, 
ColumnDataView *dst) {
> I wonder whether this function could share code with the new one using some
File src/kudu/cfile/binary_plain_block.h:

Line 110: 
> What if SeekForward returned OK only if it moved forward at the specified n
Fair point, may be confusing. An "incomplete" scan isn't an error, but rather 
it means we've reached the end of the decoder.
The status is not used, so it makes more sense for this to be void. 
Alternatively, the if HasNext() is false, it should return an error status and 
OK() otherwise.
File src/kudu/cfile/binary_plain_block.h:

Line 111:   void SeekForward(size_t *n) override {
> Does it make sense to implement this method in the base class instead?
I agree it'd be nice, but I'm hesitant since they are dependent on private 
members that may not be defined for all block encodings (e.g. dict blocks don't 
have a cur_idx_ member, but it is used in SeekForward).
We could just define these members in the base class and just leave them as 
unused when they aren't used, something to consider, definitely.
Edit: yes this can be done by using Count() and GetCurrentIdx(), but the issue 
is still that there needs to be functions that can modify cur_idx_, which is 
not a member of all blockdecoders.

Line 112:     DCHECK(HasNext());
> Why not to return non-OK status in that case instead?  I.e., always assign 
It isn't an error to not seek forward the specified number of positions. This 
case occurs when the batch is started near the end of a decoder. The batch may 
request 1000 rows, but the decoder itself may only have 100 left, in which case 
*n will be set to 100. This is the contract for CopyNextValues() as well.
File src/kudu/cfile/block_encodings.h:

PS2, Line 150: Fetch the next values from the block a
> I think this should be up to n, right?  That's at least the contract of Cop
You're right. *n should be set to the largest value <= to itself without going 
past the end of the block. The contract should be the same as CopyNextValues.
File src/kudu/cfile/block_encodings.h:

Line 162:                                  ColumnDataView *dst) {
> What if CopyNextValues() returned non-OK code?  Would it make sense to chec
Done. Should be wrapped with RETURN_NOT_OK()
File src/kudu/cfile/block_encodings.h:

Line 138:   virtual void SeekForward(size_t *n) = 0;
> I think it'd be better to use 'int' here, so you can DCHECK_GE(*n, 0) at th
I see, just to clarify, this would be used to detect cases where *n is too 
large and overflows.

Line 157:   // function is that the context is marked as supporting decoder 
eval. This
> I think there's a subtle requirement here that, given a particular encoding
Right, the current contract, although not explicitly enforced right now, is 
that all blocks of a block encoding will always set eval_complete to true or 
false. Enum state should work.

Line 162:                                  ColumnDataView *dst) {
File src/kudu/cfile/bshuf_block.h:

Line 278:   void SeekForward(size_t *n) override {
> Consider returning non-OK status code (like Status::Incomplete()) instead o
See comments in other implementations of SeekForward.
File src/kudu/cfile/

Line 728:       if (ctx_->pred()->predicate_type() == PredicateType::None) {
> wrap comments at 80 columns if possible (100 max).
File src/kudu/cfile/

Line 670:     }
> typo

Line 728:       if (ctx_->pred()->predicate_type() == PredicateType::None) {
> I think a getter like 'ctx_->try_decoder_predicate_evaluation()' would be a

Line 733:         if (ctx_->pred()->EvaluateCell(static_cast<const void 
*>(&cur_string))) {
> would rather add a wrapper in the test code instead of adding a compatibili

Line 747: }
> should remove this

Line 977: 
> again a getter would centralize these checks and make the code more readabl
File src/kudu/cfile/

Line 488:   ctx->SetDecoderEvalNotSupported();
> shouldn't this be done in PrepareEvalContext?
Done. This was put here to prevent Scan from being called before 
SetDecoderEvalNotSupported was called. This only happened in cfile-test, but 
can definitely be avoided in test rather than in code.

PS7, Line 489: ctx->block()
> given the number of times you use ctx->block() in this function I think it'
File src/kudu/cfile/cfile_reader.h:

PS2, Line 385: dicate o
> predicate

Line 387:   // is shared among the multiple BinaryDictBlockDecoders in a single 
> This seems like a strange place to put this method; why isn't it done by th
A single vocabulary is shared among the entire cfile, which may contain 
multiple dictionary blocks. This means the predicate set itself is owned by the 
cfile, not the decoders, and this provides the decoders a way to access it.
File src/kudu/cfile/cfile_reader.h:

Line 24: #include "kudu/common/columnblock.h"
> forward decl is sufficient
There seem to be a fair amount of dependencies that get pretty hairy when I 
forward declare this.

Line 222:   // Get the ordinal index that the iterator is currently pointed to.
> would be nice to avoid this "duplicate API" thing where one set of things g

Line 261:   virtual Status FinishBatch() = 0;
> see above, would be good to collapse the two APIs

Line 394: 
> this isn't returning a set of predicates, though -- it's returning a set of

Line 395:   struct PreparedBlock {
> should rename to something more specifically about dictionary... maybe GetC
Right, my thinking was it is a set of everything that satisfies the predicate. 
I see your point, will change.

Line 468: 
> rename (per above)
File src/kudu/cfile/cfile_reader.h:

PS7, Line 390: ;
> nit: space after ;

PS7, Line 463: or>cod
> nit: space after >
File src/kudu/cfile/plain_block.h:

Line 211:   virtual void SeekForward(size_t *n) OVERRIDE {
> Does it make sense to implement this method in the base class instead?
See replies in other implementations of SeekForward. The base abstract class 
does not have the right private members to implement this. Am considering 
exposing a public API for these members.
File src/kudu/common/column_eval_context.h:

PS2, Line 17: 
> The style guide forbids non-const reference args and fields.  This is a lit
eval_complete is a flag that gets set at the decoder level to indicate that 
decoder-level evaluation is supported by the block. It tells the materializing 
iterator that the decoder is only smart enough to copy values and that it still 
needs to evaluate each cell.

It could take a pointer though, which would have the same functionality and be 

PS2, Line 24: 
> ampersand should be with the type, same with the pointer * below.  I think 
File src/kudu/common/column_eval_context.h:

Line 36:     : col_idx_(col_idx),
> indent colon 4 spaces, and wrap list at commas
File src/kudu/common/column_eval_context.h:

Line 43:   // Column index in the parent CFileSet.
> I think this is incorrect and it's actually the index within the projection

Line 57:   // Must be greater than size 0 and must be large enough to cover the 
> isn't "greater than size 0" redundant with "large enough to cover the numbe

Line 79:   void SetDecoderEvalNotSupported() {
> particularly seems like here it should be a CHECK that it's kNotSet. Otherw
File src/kudu/common/

Line 270: 
> move this above the anonymous namespace above, the function defined in it i

Line 277:   // Going a step further we could do runtime codegen to inline the
> The IsNotNull and None branches shouldn't be necessary given the prior chec
File src/kudu/common/column_predicate.h:

Line 130:   //   selection vector to 0.
> I think this is going to be really slow, because it forces a bunch of branc
File src/kudu/common/column_predicate.h:

Line 128:   // This is evaluated as an 'AND' with the current contents of *sel:
> this comment can be more general -- i.e "when the predicate is based on a c

Line 129:   // - If the predicate evaluates to false, sets the appropriate bit 
in the
> nit: "should not" is vague. "Do not"? What happens if they do? It seems fun
Right, I had written this comment and then changed the contract. This comment 
is deprecated.

Line 130:   //   selection vector to 0.
> should clarify that this is a physical_type (same below)
File src/kudu/common/

PS7, Line 522:  = ColumnEvalContext
> can just be:

Line 526:     if (ctx.pred()->predicate_type() == PredicateType::None) {
> does a None predicate ever make it this far into the code? I would think th
Put this here since my unit tests cover None operations, but yes, in a real 
scan, this won't be reached.

Line 549:     ColumnEvalContext ctx = ColumnEvalContext(col_idx,
> same

Line 553:     ctx.SetDecoderEvalNotSupported();
> why's this necessary? shouldn't the nullptr predicate be sufficient for the
File src/kudu/common/generic_iterators.h:

Line 28: #include "kudu/common/column_predicate.h"
> this new include isn't used in this header file (include it in the .cc if i
File src/kudu/common/iterator.h:

Line 24: #include "kudu/common/rowblock.h"
> I think you can get away with forward decls here

Line 102: 
> is this still needed? It seems like the code is now calling the other overl

Line 103:   // Finish the current batch.
> need doc.
File src/kudu/common/iterator.h:

Line 35: class ScanSpec;
> keep this list sorted.
File src/kudu/common/rowblock.h:

Line 123: // and the view will move forward by the appropriate amount. In this 
way, the
> It's not clear why it would be necessary to have this wrapper for nullptr S

Line 125: class SelectionVectorView {
> Nit: the check for sel_vec_ is already performed by the Advance() method.

Line 133:     row_offset_ += skip;
> Why to advance the row_offset_ if there is no underlying data?  Please add 
There are some underlying structural issues that are being addressed. These all 
stem from an issue where the scan path diverges depending on the 
decoder-evaluation-support. Working on merging these paths, so should be able 
to avoid these.

Line 146:   }
> Nit: tab/shift is missing
File src/kudu/common/rowblock.h:

Line 121: // The SelectionVectorView keeps track of where in the selection 
vector a given
> doc

PS4, Line 137: + r
> nit: space around ' + ' (here and below)

Line 141:     BitmapClear(sel_vec_->mutable_bitmap(), row_offset_ + row_idx);
> all these if()s seem expensive as these are hot-path functions. Maybe we ca
The null selection vector is a result of there being two paths down which a 
scan can go. Collapsing them should solve this.

Line 168: // of the latter doesn't mean you _should_.
> SelectionVector * const sel_vec_;
File src/kudu/common/rowblock.h:

Line 132:     DCHECK_LE(skip, sel_vec_->nrows() - row_offset_);
> seems to me that these checks should take into account row_offset_
Right, should be comparing row_offset_+row_idx and nrows.

PS6, Line 154: 
> put * with type here and below

Line 155: 
> This seems a little strange, I would have expected the method to return the
I didn't want to return a pointer to an off-word address, but I agree that the 
whole point of this class is to keep an moving window.
Regardless, I think we can get by by just having the clear/set functions. I 
don't see a case where we would want the actual view into the bitmap for any 
other reason than to set or clear. Even now, SetBit is unused, but I've kept it 
in since I think it will be useful in future features (e.g. OR evaluation).
File src/kudu/common/types.h:

PS3, Line 59: bool AreConsecutive(const void* a, const voi
> Consider introducing typedef for this functor.
This should not have been in this patch.
File src/kudu/tablet/CMakeLists.txt:

Line 99: ADD_KUDU_TEST(tablet-decoder-eval-test)
Not included in the patch.
File src/kudu/tablet/

Line 105:   }
> Insert a space above.

PS6, Line 106: 
> I think this names a little confusing, since the wrapping and context are a

PS6, Line 107: ator *iter,
             :                            size_t col_idx,
> wrapping
File src/kudu/tablet/

Line 112:     ctx.SetDecoderEvalNotSupported();
> same comment here as elsewhere -- if there's no predicate we shouldn't have

Line 115:  private:
> nit: add blank line before 'private:'
File src/kudu/tablet/

Line 453:       Status s = col_iters_[i]->FinishBatch();
> nit: why WARNING, not ERROR?
I don't think not preparing the column would be a fatal error (it wouldn't 
cause a crash), so WARNING feels right. Other areas in this code use 
LOG(WARNING) as well.
File src/kudu/tablet/delta_store.h:

Line 176:   // Returns true if there might exist deltas to be applied. It is 
safe to
> typo: "there there"
File src/kudu/tablet/

Line 821:   // TODO: change the API to take in the col_to_apply and check for 
deltas on
> I think this could be better done by:
By the time this is called on the scan path, PrepareBatch() should indeed have 
already been called, although it would be good to do this check anyway.
DCHECK(prepared_) should suffice for the third bullet, since prepared_=true 
implies that PrepareBatch() has just been called, so the blocks should be 
popped to the point where we're within the current prepared batch.
As I understand it, if all of the delta blocks are past the end of the decoder, 
or if there are no blocks, ApplyUpdates should be a no-op.
File src/kudu/tablet/

Line 368:     return true;
> Consider using empty() method instead.

Line 372:       return true;
> Consider using empty() method instead of size().
File src/kudu/tablet/

Line 474: 
> Why is it necessary to have this alias?
It was there because there is a divergence in the scan-path whether or not you 
want the decoder-level evaluation or not. I am working on merging these paths 
so only one is needed (and PredPushedNextBlock will be taken out in favor of 
NextBlock with some flag)
File src/kudu/tablet/

Line 1: // Licensed to the Apache Software Foundation (ASF) under one
> nit: license header

PS4, Line 92: 
> Is this passed by a reference and then modified by the function?  If yes, t

PS4, Line 100: 
             :   void FillTestTablet(size_t cardinality, size_t strlen) {
             :     if (GetParam() == LARGE && !AllowSlowTests()) {
             :       return;
             :     }
             :     size_t nrows = static_cast<size_t>(GetParam());
             :     RowBuilder rb(client_schema_);
             :     LocalTabletWriter writer(tablet().get(), &client_schema_);
             :     KuduPartialRow row(&client_schema_);
             :     // Fill tablet with repeated pattern of strings.
             :     // e.g. Cardinality: 3, strlen: 2: "00", "01", "02", "00", 
"01", "02", ...
             :     for (int64_t i = 0; i < nrows; i++) {
             :       CHECK_OK(row.SetInt32(0, i));
             :       CHECK_OK(row.SetInt32(1, i * 10));
             :       CHECK_OK(row.SetStringCopy(2, StringPrintf(
             :               Substitute("%0$0$1", strlen, PRId64).c_str(),
             :               i%cardinality)));
> Is it possible to re-factor this separating the common code into a class me
File src/kudu/tablet/

Line 35: #else
Can be shortened.

Line 44: 
Not used.

PS6, Line 89: per) {
> here and below, prefer to use stings::Substitute instead of StringPrintf.  
Looks like Substitute can't specify the a string length in formatting, but at 
least it can still get away from doing concat and to_string.

Line 96:       last_chunk_count = upper - lower;
> remove?

PS6, Line 128: 
> const probably isn't necessary?
File src/kudu/tablet/tablet-test-util.h:

PS4, Line 144: inline
> Why is it necessary to have this function in-lined?  Just because there isn
This is used to benchmark the times, so to get the behavior of just the code in 
this function, it makes sense for this to be inlined.

PS4, Line 145: int* fetched
> Should be passed by pointer, if following google's style guide: https://goo
File src/kudu/tablet/tablet-test-util.h:

Line 146:                                                int limit = INT_MAX) {
> 'limit' is never used, right?
No, removed.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <>
Gerrit-Reviewer: Alexey Serbin <>
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Dan Burkert <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes

Reply via email to