Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext ......................................................................
Patch Set 14: (70 comments) http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 93: RETURN_IF_ERROR(build_expr->Init(*(intermediate_row_desc_.get()), state)); > build_expr is leaked if this returns. Good catch. Fixed here and everywhere. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: > This all works out a lot cleaner - nice! Done Line 124: RETURN_IF_ERROR(AggFn::Create(analytic_node.analytic_functions[i], > I think there's a potential resource leak if the expr Init() function fails Done Line 166: fn_pool_.reset(new MemPool(expr_mem_tracker()));; > extra semicolon Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: Line 181: // TODO: codegen table sink > I'm not sure what this means. Is it something to be fixed in this patch? Ot I think this is IMPALA-4356. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 126: /// Not used in DataStreamSender and PHJBuilder. > Or NljBuilder. Maybe easier to just say that they're not used in all subcla Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: PS14, Line 451: inline > Is the inline needed here? The inline hint is for inlining into EvalConjuncts(). http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 261: /// Conjuncts and their evaluators in this node. > Maybe mention who owns the pointers? Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: PS14, Line 86: expr_evaluator->root() > Is the plan to eventually pass around the Expr itself and avoid going via t Really depends on the context. I imagine most cases we will use Expr directly. In some cases when Expr is not directly available we may still go through the evaluator. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/filter-context.h File be/src/exec/filter-context.h: Line 120: FilterContext() { } > nit: can just remove this and rely on the default constructor Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 115: /// computing the size of a scratch row. > Maybe mention what 'pool' is for Done Line 415: Status Init(ObjectPool* pool, RuntimeState* state, int num_build_tuples); > What is pool used for? Comments added. Line 513: MemPool* mem_pool_; > Mention that it's not owned? I suppose that's implied. Comments added. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: PS14, Line 294: > tab? Done Line 733: vector<ScalarExprEvaluator*>& output_expr_evaluators) > Can't this be a const& still? The vector isn't being modified is it? Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-rcfile-scanner.cc File be/src/exec/hdfs-rcfile-scanner.cc: Line 26: #include "exprs/scalar-expr.h" > Maybe not needed? Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: PS14, Line 122: DCHECK(conjuncts_map_[entry.first].empty()); Use find() instead. Line 147: // TODO: Move this to Prepare() > Is this TODO in this patch, or does it depend on another change? I am thinking of moving it when PlanNode is introduced. Line 165: new RowDescriptor(min_max_tuple_desc_, /* is_nullable */ false)); > This is a little inconsistent - above on line 121 we create the RowDescript Good point. I think heap allocated may be safer in case the expression keeps a reference to it. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 146: const std::vector<ScalarExprEvaluator*> min_max_conjunct_evaluators() const { > Shouldn't this return a const&, rather than copying the vector? Yes, right. Looks like the existing code is broken. Line 393: /// the per-scanner ScannerContext.. > Add something like "These correspond to the exprs in 'filter_exprs_'" Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: Line 362: MemPool filter_mem_pool(expr_mem_tracker()); > This is a little subtle - could do with a comment. Done Line 388: for (auto& ctx: filter_ctxs) { > Maybe use a goto and an exit block at the bottom of the function to enforce Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/kudu-scanner.h File be/src/exec/kudu-scanner.h: Line 95: boost::scoped_ptr<MemPool> expr_mem_pool_; > Put it in 'obj_pool_'? Appears to be more consistent with the pattern of hdfs-scanner to leave it as a scoped_ptr. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 164: pool_->Add(build_expr); > Maybe stick with the usual idiom of wrapping pool_->Add() around the alloca Done Line 694: AggFnEvaluator::ShallowClone(parent->pool_, agg_fn_pool.get(), > This should go into 'partition_pool_'. Otherwise if we have an Agg in a sub Done Line 1574: // TODO: consider moving the following codegen logic to AggFn. > This would make sense (not in this patch though). Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: Line 109: // TODO: Move to Prepare(). > Is this to do in this patch or are you planning to do it as part of a futur In a future change. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: Line 243: /// Init() function inherited from DataSink. Overridden to be a no-op for now. > This is a bit of a red flag to me - I think we should structure things so t Init() is being called from DataSink::Create() for most DataSink types. Only the blocking-join build sides are the exceptions right now as they aren't true data sinks of fragments yet. Once they become true data sinks, they will also be initialized in DataSink::Create(). It just happens that the function name "Init" conflicts with the existing Init() function for PhjBuilder for now and I don't see a easy way to merge them. The original Init() function is now renamed to InitExprsAndFilters(). Also added a TODO to merge them once PhjBuilder becomes a true data sink. Line 381: /// List of filters to build. > Mention that there's a correspondence with 'filter_exprs_'? Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: PS14, Line 95: nd > and Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn-evaluator.cc File be/src/exprs/agg-fn-evaluator.cc: Line 84: AggFnEvaluator::~AggFnEvaluator() { > Can we add a DCHECK() that it was closed? Done PS14, Line 87: inline > probably not needed Done PS14, Line 91: inline > probably not needed Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: Line 250 > Consider leaving these as inline functions in the header - AnalyticEvalNode Done Line 72: /// Convenience functions for creating evaluators for multiple aggregate functions. > Mention that the caller is responsible for closing the created evaluators r Done Line 174: /// True if this evaluator is created from a Clone() call. > ShallowClone() Done PS14, Line 175: is_clone_ > Maybe is_shallow_clone_ to be more precise - although there is only one typ Keep as-is. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn.cc File be/src/exprs/agg-fn.cc: Line 87: RETURN_IF_ERROR(LibCache::instance()->GetSoFunctionPtr(fn_.hdfs_location, > Nit: I don't think the remainder of this function needs all the vertical wh Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn.h File be/src/exprs/agg-fn.h: PS14, Line 52: state > value?. "state" could be taken to mean that it initialises the evaluator or Done Line 81: /// TODO: The aggregation node has custom codegen paths for a few of the builtins. > Consider removing the TODO - it's a bit vague and unclear what if anything It corresponds to the TODO in the CodegenUpdateSlot() in PartitionedAggregationNode. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 53: /// --- Expr overview > Nice overview Done Line 81: static Status CreateTree(const TExpr& texpr, ObjectPool* pool, Expr* root); > I think this should be protected since it's only called from ScalarExpr and Done Line 94: bool is_constant() const { return is_constant_; } > I think is_constant() should maybe only be defined in the ScalarExpr sub-hi Done Line 100: /// Simple debug string that provides no expr subclass-specific information. > Comment seems stale since there's no implementation. Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/hive-udf-call.h File be/src/exprs/hive-udf-call.h: Line 28: using namespace impala_udf; > I guess some of these 'using namespace' instances were pre-existing - still Switched to "using impala_udf::*" hidden inside impala namespace. Apparently, it's not legal to have random using declaration inside a class declaration. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/literal.h File be/src/exprs/literal.h: Line 28: using namespace impala_udf; > See other comment about using namespace in headers. Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: Line 154: // Owner of mem_pool_ needs to call FreeAll() on it. > We don't really know what the owner wants to do. Maybe something like "Memo Done Line 436: // TODO: is there a better way to do this? > Consider removing TODO - I don't think we're likely to fix it. Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: Line 29: using namespace impala_udf; > I'm not sure if this works, but maybe if we just move this "using" into the Switch to "using impala_udf::*" inside impala namespace. PS14, Line 43: types > type Done Line 122: void* GetValue(const ScalarExpr& e, const TupleRow* row); > Maybe GetSubExprValue()? I don't feel strongly but might make the intent cl This is used internally by the other GetValue() so it's not strictly used for sub-expression evaluation only. Line 144: Status GetError(int start_idx = 0, int end_idx = -1) const WARN_UNUSED_RESULT; > Arguments maybe need explanation. Done Line 147: void PrintValue(const TupleRow* row, std::string* str); > Does this evaluate the expr? Yes. Comments updated. Line 153: /// The last two are helper functions. > Not sure which "last two" refers to. Comment fixed. Line 157: /// Frees all local allocations made by fn_ctxs_. This can be called when result > Thinking out loud about future work: I think to fix memory transfer I'll ne Yes, it'd work out but not sure the difference in overhead for not reusing the free pool buffer. It's relying on the underlying buffer pool to recycle the buffers, right ? Is that something supported in the new buffer pool ? PS14, Line 158: last two > Same here - maybe out of sync with the code? Done Line 183: /// Users of private GetValue() or 'pool_'. > It might make sense to move the methods that the "friend class" is mean to Done PS14, Line 204: the evaluation > maybe something like "Stores the result of the last evaluation of the the r Comment fixed. We may use this for sub-expression evaluation too. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: PS14, Line 398: subclassed > I think this should be "overridden" Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: Line 164: bool is_interpreted = scalar_fn_wrapper_ == nullptr; > Thank you! This is much clearer. Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/slot-ref.h File be/src/exprs/slot-ref.h: Line 24: using namespace impala_udf; > I think generally we try to avoid using entire namespaces in the headers. M Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/data-stream-sender.h File be/src/runtime/data-stream-sender.h: Line 49: /// TODO: Move this to the equivalent of PlanNode class for DataSink. > Is there a JIRA for this? This is essentially the next step for IMPALA-4192. The comment is not quite right. It's fixed in the new patch. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 486: DCHECK(partition_expr->IsLiteral()); > Can you add a comment in CatalogObjects.thrift that these are always litera Done Line 491: // literals Partition exprs are not used in the codegen case. Don't codegen > Missing . Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/runtime-filter.h File be/src/runtime/runtime-filter.h: Line 93: const TRuntimeFilterDesc& filter_desc_; > Comment what owns the filter? Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/sorter.h File be/src/runtime/sorter.h: Line 105: Status Init(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool); > Comment what the arguments are for? The names could maybe be more descripti Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/service/fe-support.cc File be/src/service/fe-support.cc: PS14, Line 194: mem_pool > expr_mem_pool? Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/udf/udf.h File be/src/udf/udf.h: Line 111: /// TODO: Move FRAGMENT_LOCAL states to query_state for multi-threading. > Do we have a JIRA trackign this? IMPALA-5356. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/util/tuple-row-compare.h File be/src/util/tuple-row-compare.h: Line 82: Status Open(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool); > Similar to my comment in Sorter - it's unclear what these different argumen Done -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
