Todd Lipcon has posted comments on this change. Change subject: Predicate evaluation pushdown ......................................................................
Patch Set 7: (31 comments) http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/cfile/binary_plain_block.cc File src/kudu/cfile/binary_plain_block.cc: Line 306: Status BinaryPlainBlockDecoder::CopyNextValues(size_t *n, ColumnDataView *dst) { I wonder whether this function could share code with the new one using some templating (to force that they're separately instantiated) http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/cfile/binary_prefix_block.h File src/kudu/cfile/binary_prefix_block.h: Line 99: cur_idx_ += *n; surprised this doesn't need to update cur_val_ and next_ptr_ http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 488: ctx->SetDecoderEvalNotSupported(); shouldn't this be done in PrepareEvalContext? PS7, Line 489: ctx->block() given the number of times you use ctx->block() in this function I think it's worth just doing: ColumnBlock* dst = ctx->block() http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: Line 355: // Sets context if needed. This must happen before a call to no need for this line (since it's already described in the interface above) PS7, Line 390: ; nit: space after ; PS7, Line 463: or>cod nit: space after > Line 480: // Predicate and selection vector associated with the iterator would this ever be null? worth a comment explaining when it might be http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/cfile/rle_block.h File src/kudu/cfile/rle_block.h: Line 190: void SeekForward(size_t *n) override { hrm, I'm skeptical of this function, I'd think it would need to update the rle_decoder_ as well. Maybe update encoding-test so that this is covered? Line 413: void SeekForward(size_t *n) override { same http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/common/column_eval_context.h File src/kudu/common/column_eval_context.h: Line 26: // get passed down to the decoders during predicate push-down. this is now more generally the set of objects passed down to the column iterator and decoder during column materialization, right? Line 31: public: nit: indent one Line 43: // Column index in the parent CFileSet. I think this is incorrect and it's actually the index within the projection schema, rather than within the CFileSet. Perhaps we should call it projection_idx or something to clarify? Line 51: // I.e. BinaryDictBlockDecoder.CopyNextAndEval(), CFileIterator.Scan() not quite following this comment. when is this used outside of scans? Line 54: // Selection vector reflecting the result of the predicate evaluation. should clarify whether this is an "out" or an "in-out" type parameter. In other words, if the scanner sees that a column is already non-selected in here, is it free to skip reading that row? I'm guessing so, but good to clarify. Line 57: // Must be greater than size 0 and must be large enough to cover the number isn't "greater than size 0" redundant with "large enough to cover the number of rows"? Line 63: // materializing_iterator_decoder_eval to false, forcing evaluation to fall I think this comment needs an update Line 74: if (decoder_eval_status_ == kNotSet) { it seems weird that this function is a silent no-op in the case that it's already been set. If it's set in a "conflicting" manner, what happens? seems like it should either be a DCHECK, or by a 'TrySetDecoderEvalSupported' type function that returns a bool about whether it successfully changed modes or not. Line 79: void SetDecoderEvalNotSupported() { particularly seems like here it should be a CHECK that it's kNotSet. Otherwise you might try to switch from supported to non-supported, and only partially have filled in the selection vector http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/common/generic_iterators.cc File src/kudu/common/generic_iterators.cc: Line 26: #include "kudu/common/column_eval_context.h" nit: sort the includes PS7, Line 522: = ColumnEvalContext can just be: ColumnEvalContext ctx(...) instead of the extra temporary construction and assignment Also, given that we now use ColumnEvalContext for all materialization, and not just the one with predicate evaluation, we should rename to ColumnMaterializationContext Line 526: if (ctx.pred()->predicate_type() == PredicateType::None) { does a None predicate ever make it this far into the code? I would think that the None would be handled way up above this layer (and this could be a DCHECK) Line 549: ColumnEvalContext ctx = ColumnEvalContext(col_idx, same Line 553: ctx.SetDecoderEvalNotSupported(); why's this necessary? shouldn't the nullptr predicate be sufficient for the code to know it doesn't need to evaluate any predicate? http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/common/generic_iterators.h File src/kudu/common/generic_iterators.h: Line 28: #include "kudu/common/column_predicate.h" this new include isn't used in this header file (include it in the .cc if it's only used in the .cc) http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/common/iterator.h File src/kudu/common/iterator.h: Line 97: // col_idx is within the projection schema, not the underlying schema. this comment probably needs to move to the context's constructor (since 'col_idx' is no longer a parameter here) http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/common/rowblock.h File src/kudu/common/rowblock.h: PS7, Line 121: The nit: "A SelectionVectorView" (rather than 'the', because there are probably many such instances) http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/common/types.h File src/kudu/common/types.h: Line 399 nit: removed a blank line here http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/tablet/cfile_set-test.cc File src/kudu/tablet/cfile_set-test.cc: Line 112: ctx.SetDecoderEvalNotSupported(); same comment here as elsewhere -- if there's no predicate we shouldn't have to do this, right? Line 115: private: nit: add blank line before 'private:' http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/tablet/tablet-test-util.h File src/kudu/tablet/tablet-test-util.h: Line 146: int limit = INT_MAX) { 'limit' is never used, right? -- To view, visit http://gerrit.cloudera.org:8080/3990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <andrew.w...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Dan Burkert <d...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes