Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext ......................................................................
Patch Set 17: (62 comments) http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 79: DCHECK_EQ(intermediate_tuple_desc_->slots().size(), output_tuple_desc_->slots().size()); move to c'tor Line 124: DCHECK_EQ(agg_fns_.size(), agg_fn_evaluators_.size()); nice, the separation is making the code a lot crisper. http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.h File be/src/exec/aggregation-node.h: Line 53: class AggregationNode : public ExecNode { it's sad that this still needs to be maintained. please apply my comments to the partitioined-*.{cc,h} as well, where it makes sense Line 80: /// FunctionContexts objects are stored in ObjectPool of RuntimeState. so what is agg_fn_pool_ for then? Line 98: boost::scoped_ptr<RowDescriptor> intermediate_row_desc_; any reason this can't go in state->obj_pool()? Line 150: /// Cross-compiled accessor for &agg_fn_evaluators_[0]. Used by the codegen'ed code. "for agg_fn_evaluators_.data()" is more idiomatic. are you going to do a rename of "evaluator[s]" suffixes to "evl[s]" or "eval[s]" at the end? Line 165: /// slot_idx is the idx into aggregate_exprs_ (does not include grouping exprs). update comment http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: Line 892: AggFn::Close(analytic_fns_); having an expr::close() and having to call that here is a bit counterintuitive (and manual, since we need to do that for every expr). would be nice to move that to a utility function in runtimestate (track the cache entries in some ancillary structure and then free them all at once instead of piecemeal). that would reduce the likelihood of forgetting this somewhere. not necessary to do it as part of this patch, but a todo would be nice. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.h File be/src/exec/analytic-eval-node.h: Line 210: ScalarExpr* partition_by_eq_expr_; initialize inline Line 215: ScalarExpr* order_by_eq_expr_; same Line 216: ScalarExprEvaluator* order_by_eq_expr_evaluator_; same http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: Line 100: RETURN_IF_ERROR((*sink)->Init(state, thrift_sink, thrift_output_exprs)); why not just call create() directly here? http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 87: static Status Create(ObjectPool* pool, RuntimeState* state, state already contains a pool Line 123: boost::scoped_ptr<MemPool> expr_mem_pool_; do we really need a separate mempool for that? Line 131: virtual Status Init(RuntimeState* state, const TDataSink& tsink, move in/out param to back http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exchange-node.cc File be/src/exec/exchange-node.cc: Line 131: Status ExchangeNode::QueryMaintenance(RuntimeState* state) { why did we not need this before? http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 460: if (!EvalConjunct(evaluators[i], row)) return false; can't you just inline the call here? http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 149: static bool EvalConjunct(ScalarExprEvaluator* evaluator, TupleRow* row); evalpredicate a predicate is a conjunct if it's part of a conjunctive clause, ie, a predicate on its own can't be a conjunct. Line 151: /// Evaluate the conjuncts in 'evaluators' over 'row'. add convenience function for a vector? Line 263: /// object pool. why? Line 290: boost::scoped_ptr<MemPool> expr_mem_pool_; what mem pool did we use for that before? in general, we want to get away from having mempools all over the place http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/filter-context.h File be/src/exec/filter-context.h: Line 102: Status CloneFrom(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool, state contains pool Line 103: const FilterContext& from); move const args to front Line 114: /// Codegen Eval() by codegen'ing the expression evaluations and replacing the type update comment Line 117: static Status CodegenEval(LlvmCodeGen* codegen, ScalarExpr* filter_expr, does filter_expr get updated in this function? http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 129: void FreeBuildLocalAllocations(); FreeLocal<>Allocations()? http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hbase-scan-node.cc File be/src/exec/hbase-scan-node.cc: Line 249: DCHECK_EQ(conjunct_evaluators_.size(), conjuncts_.size()); this would benefit from a convenience function for a vector http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hbase-table-writer.h File be/src/exec/hbase-table-writer.h: Line 96: /// The evaluators are owned by sink which owns this table writer. "the sink" http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-avro-table-writer.h File be/src/exec/hdfs-avro-table-writer.h: Line 68: const std::vector<ScalarExprEvaluator*>& output_expr_evaluators); do these come from the sink? you're already passing in the sink. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-parquet-table-writer.h File be/src/exec/hdfs-parquet-table-writer.h: Line 57: const std::vector<ScalarExprEvaluator*>& output_expr_evaluators); get those out of sink http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 387: /// the per-scanner ScannerContext. Correspond with exprs in 'filter_exprs_'. "Corresponds to" http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: Line 122: // Setup the scan node's dictionary filtering conjuncts map. set up http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: Line 166: friend class DataSink; or make everything that's private in datasink protected? Line 168: virtual Status Init(RuntimeState* state, const TDataSink& tsink, param order Line 202: const std::vector<ScalarExprEvaluator*>& evaluators, std::string* key); param order http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 431: if (ExecNode::EvalConjuncts(&conjunct_evaluators_[0], conjunct_evaluators_.size(), row)) { uniformly switch to vector<>.data() instead of &vector<>[0], the latter is undefined for empty vectors. jim is in the process of fixing all existing instances up. Line 482: if (ExecNode::EvalConjuncts(&conjunct_evaluators_[0], conjuncts_.size(), row)) { same Line 675: ht_ctx_.reset(); why? we want to stop tearing down control structures Line 1394: // void UpdateSlot(FunctionContext* agg_fn_ctx, ScalarExprEvaluator* agg_expr_evaluator, isn't this for the input expr? http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.h File be/src/exec/partitioned-aggregation-node.h: Line 448: Tuple* ConstructIntermediateTuple(std::vector<AggFnEvaluator*>& agg_fn_evaluators, why non-const? Line 453: Tuple* ConstructIntermediateTuple(std::vector<AggFnEvaluator*>& agg_fn_evaluators, same here Line 479: void UpdateTuple(AggFnEvaluator** agg_fn_evaluators, Tuple* tuple, TupleRow* row, the signatures don't conform to our standard. leave todo in class header Line 491: Tuple* GetOutputTuple(vector<AggFnEvaluator*>& agg_fn_evaluators, missing std; why non-const? there are several more non-const vector<>& here Line 527: /// Accessor for the expression contexts of an AggFnEvaluator. Returns an array of update comment also, what is i? Line 641: /// Assumes is_merge = false; update comment to describe params http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/select-node.cc File be/src/exec/select-node.cc: Line 97: ScalarExprEvaluator* const* conjunct_evaluators = &conjunct_evaluators_[0]; .data() http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: Line 80: /// TODO: Move the evaluation of constant input arguments to AggFn setup. once that's done, can open() go away? http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: Line 266: const std::vector<ScalarExprEvaluator*>& partition_key_value_evaluators() const { value_evaluators feels redundant. partition_key_evls? Line 290: /// This evaluator is safe to be shared by all fragment instances as all expressions plural Line 497: static Status CreatePartKeyExprs(ObjectPool* pool, const HdfsTableDescriptor* hdfs_tbl) move pool to back why const* instead of const&, can this really be null? http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: Line 187: RETURN_IF_ERROR(DataSink::Create(obj_pool(), runtime_state_, fragment_ctx_, runtime_state_ already has an obj_pool, remove first param http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: Line 276: Status status = DescriptorTbl::Create( single line? http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/sorter.h File be/src/runtime/sorter.h: Line 92: /// 'sort_materialize_exprs' are the slot exprs used to materialize the tuples to be i find this harder to follow than with the original name. these are the exprs that materialize the individual slots of the sort tuple, no? sort_tuple_input_exprs or something like that, analogous to agg? Line 96: Sorter(TupleRowComparator& compare_less_than, if you need non-const, then switch to * http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 77: // 'type' indicates the type of the expression value. it feels like this should live somewhere else (in /runtime/raw-value?) Line 80: if (value == nullptr) return; dcheck col_val != null Line 246: exprs[i]->Close(); evl->root()->Close() and then iterate over collection http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/util/tuple-row-compare.cc File be/src/util/tuple-row-compare.cc: Line 51: opened_ = false; meaning you can call open() again, which in turns re-creates the evaluators? is that intentional? or should the create() call move into an init() function? http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/util/tuple-row-compare.h File be/src/util/tuple-row-compare.h: Line 99: (*codegend_compare_fn_)(&ordering_expr_evaluators_lhs_[0], .data() Line 135: bool opened_; initialize inline, that way you don't need to change the c'tor if you ever move this around Line 146: std::vector<bool> is_asc_; make const, while you're here Line 147: std::vector<int8_t> nulls_first_; same -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes