Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext ......................................................................
Patch Set 16: (61 comments) http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 79: DCHECK_EQ(intermediate_tuple_desc_->slots().size(), output_tuple_desc_->slots().size()); > move to c'tor Done Line 124: &agg_fn_evaluators_)); > nice, the separation is making the code a lot crisper. Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 92: new SlotRef(desc) : new SlotRef(desc, TYPE_BOOLEAN); > Tab? Done http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.h File be/src/exec/aggregation-node.h: Line 53: class AggregationNode : public ExecNode { > it's sad that this still needs to be maintained. please apply my comments t Done Line 80: /// FunctionContexts objects are stored in ObjectPool of RuntimeState. > so what is agg_fn_pool_ for then? Stale comment. It's the backing MemPools of 'agg_fn_evaluators_'. Line 98: boost::scoped_ptr<RowDescriptor> intermediate_row_desc_; > any reason this can't go in state->obj_pool()? Done Line 150: /// Cross-compiled accessor for &agg_fn_evaluators_[0]. Used by the codegen'ed code. > "for agg_fn_evaluators_.data()" is more idiomatic. Done Line 165: /// slot_idx is the idx into aggregate_exprs_ (does not include grouping exprs). > update comment Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: Line 892: AggFn::Close(analytic_fns_); > having an expr::close() and having to call that here is a bit counterintuit Expr::Close() will be called in PlanNode::Close() in the future. This is done here as a transitionary step. Also, I don't see Expr or evaluators any different from other data structures in the exec node. You can extend the argument to any structures you have in an exec node and argue that you can add them to a container which automatically call Close() on them. I don't see the need to create a special case for Expr. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.h File be/src/exec/analytic-eval-node.h: Line 210: ScalarExpr* partition_by_eq_expr_; > initialize inline Done Line 215: ScalarExpr* order_by_eq_expr_; > same Done Line 216: ScalarExprEvaluator* order_by_eq_expr_evaluator_; > same Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: Line 100: RETURN_IF_ERROR((*sink)->Init(state, thrift_sink, thrift_output_exprs)); > why not just call create() directly here? Not all data_sink have output_exprs. Some of them have partition_exprs only. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 123: boost::scoped_ptr<MemPool> expr_mem_pool_; > do we really need a separate mempool for that? What other MemPool can we use then ? Line 131: virtual Status Init(RuntimeState* state, const TDataSink& tsink, > move in/out param to back Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exchange-node.cc File be/src/exec/exchange-node.cc: Line 131: Status ExchangeNode::QueryMaintenance(RuntimeState* state) { > why did we not need this before? This was not needed before as the exchange node owns the evaluators before and was added to vector of evaluators to free periodically. The evaluator is not stored in the tuple row comparator. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 460: if (!EvalConjunct(evaluators[i], row)) return false; > can't you just inline the call here? Not sure I understand the comment. EvalConjunct() is already inlined. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 263: /// object pool. > why? evaluators are execution related stuff so they are stored in the exec nodes' obj_pool (similar to other stuff which live in the obj_pool_). exprs are fragment level entity so it makes sense for it to live in fragment level's obj_pool. Line 290: boost::scoped_ptr<MemPool> expr_mem_pool_; > what mem pool did we use for that before? in general, we want to get away f We were passing the expr_mem_tracker directly into the ExprContext so there is one MemPool per ExprContext before. Now all exprs in an exec node share the expr_mem_pool of the exec node. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/filter-context.h File be/src/exec/filter-context.h: Line 102: Status CloneFrom(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool, > state contains pool Caller can pass in pool with shorter life span than the entire query (e.g. scanner thread). Line 103: const FilterContext& from); > move const args to front Done Line 114: /// Codegen Eval() by codegen'ing the expression evaluations and replacing the type > update comment Done Line 117: static Status CodegenEval(LlvmCodeGen* codegen, ScalarExpr* filter_expr, > does filter_expr get updated in this function? Yes. Its field ir_compute_fn_ may be populated. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 129: void FreeBuildLocalAllocations(); > FreeLocal<>Allocations()? "LocalAllocations" is a specific terminology we refer to at other places in our code base so not a good idea to split it up. One can look at the names as BuildSide's / ProbeSide's LocalAllocations(). http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hbase-scan-node.cc File be/src/exec/hbase-scan-node.cc: Line 249: DCHECK_EQ(conjunct_evaluators_.size(), conjuncts_.size()); > this would benefit from a convenience function for a vector You probably still need to pass the expr and its evaluator as argument so there's not much saving. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hbase-table-writer.h File be/src/exec/hbase-table-writer.h: Line 96: /// The evaluators are owned by sink which owns this table writer. > "the sink" Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-avro-table-writer.h File be/src/exec/hdfs-avro-table-writer.h: Line 68: const std::vector<ScalarExprEvaluator*>& output_expr_evaluators); > do these come from the sink? you're already passing in the sink. Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-parquet-table-writer.h File be/src/exec/hdfs-parquet-table-writer.h: Line 57: const std::vector<ScalarExprEvaluator*>& output_expr_evaluators); > get those out of sink Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 387: /// the per-scanner ScannerContext. Correspond with exprs in 'filter_exprs_'. > "Corresponds to" Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: Line 122: // Setup the scan node's dictionary filtering conjuncts map. > set up Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: Line 166: friend class DataSink; > or make everything that's private in datasink protected? friend class removed. Not needed. Line 168: virtual Status Init(RuntimeState* state, const TDataSink& tsink, > param order Done Line 202: const std::vector<ScalarExprEvaluator*>& evaluators, std::string* key); > param order The first two params are constant params. Not sure what you meant. http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 431: if (ExecNode::EvalConjuncts(&conjunct_evaluators_[0], conjunct_evaluators_.size(), row)) { > uniformly switch to vector<>.data() instead of &vector<>[0], the latter is Done Line 482: if (ExecNode::EvalConjuncts(&conjunct_evaluators_[0], conjuncts_.size(), row)) { > same Done Line 675: ht_ctx_.reset(); > why? To remove any possible reference to it. We already called ht_ctx_->Close(state) the line before. Line 1394: // void UpdateSlot(FunctionContext* agg_fn_ctx, ScalarExprEvaluator* agg_expr_evaluator, > isn't this for the input expr? Stale comment. Comment updated. http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.h File be/src/exec/partitioned-aggregation-node.h: Line 448: Tuple* ConstructIntermediateTuple(std::vector<AggFnEvaluator*>& agg_fn_evaluators, > why non-const? Done Line 453: Tuple* ConstructIntermediateTuple(std::vector<AggFnEvaluator*>& agg_fn_evaluators, > same here Done Line 479: void UpdateTuple(AggFnEvaluator** agg_fn_evaluators, Tuple* tuple, TupleRow* row, > the signatures don't conform to our standard. leave todo in class header In what way ? Line 491: Tuple* GetOutputTuple(vector<AggFnEvaluator*>& agg_fn_evaluators, > missing std; why non-const? Done Line 527: /// Accessor for the expression contexts of an AggFnEvaluator. Returns an array of > update comment Stale declaration. Removed. Line 641: /// Assumes is_merge = false; > update comment to describe params Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/select-node.cc File be/src/exec/select-node.cc: Line 97: ScalarExprEvaluator* const* conjunct_evaluators = &conjunct_evaluators_[0]; > .data() Done http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: Line 80: /// TODO: Move the evaluation of constant input arguments to AggFn setup. > once that's done, can open() go away? Not really as we still have to open the evaluator for the input expressions which involve allocating the function context etc http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: Line 96: } > I was even thinking it might be less error-prone if, on failure, this close I believe the caller has to call ExecNode::Close() anyway to free other resources so I don't see this being more error-prone than other existing control structures in ExecNode. http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: Line 266: const std::vector<ScalarExprEvaluator*>& partition_key_value_evaluators() const { > value_evaluators feels redundant. partition_key_evls? The function name seems fine to me. I renamed it partition_key_value_evals(). Line 290: /// This evaluator is safe to be shared by all fragment instances as all expressions > plural Done Line 497: static Status CreatePartKeyExprs(ObjectPool* pool, const HdfsTableDescriptor* hdfs_tbl) > move pool to back Done http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: Line 187: RETURN_IF_ERROR(DataSink::Create(obj_pool(), runtime_state_, fragment_ctx_, > runtime_state_ already has an obj_pool, remove first param Done http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: Line 276: Status status = DescriptorTbl::Create( > single line? Done http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/sorter.h File be/src/runtime/sorter.h: Line 92: /// 'sort_materialize_exprs' are the slot exprs used to materialize the tuples to be > i find this harder to follow than with the original name. these are the exp Renamed to sort_tuple_exprs Line 96: Sorter(TupleRowComparator& compare_less_than, > if you need non-const, then switch to * Done http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 77: // 'type' indicates the type of the expression value. > it feels like this should live somewhere else (in /runtime/raw-value?) This is a static function used only in FE requests. Don't see why it needs to be moved. Line 80: if (value == nullptr) return; > dcheck col_val != null Done Line 246: exprs[i]->Close(); > evl->root()->Close() and then iterate over collection This would require a const cast which I try to avoid as much as possible. eval->root() return const ScalarExpr& http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/util/tuple-row-compare.cc File be/src/util/tuple-row-compare.cc: Line 51: opened_ = false; > meaning you can call open() again, which in turns re-creates the evaluators The problem is that the ordering_expr_evaluators_rhs_ is cloned from ordering_expr_evaluators_lhs_ which cannot be done after Open() is called anyway. I suppose it's an okay exception to call ScalarExprEvaluator::Open() in Init() once and not call it again when the exec node is re-opened. We currently call ScalarExprEvaluator::Open() on the re-open of an Exec node in a subplan but I don't really see the point of it as the initialization is skipped after the first time anyway. I renamed Open() to Init() and removed opened_ and updated the callers to call Init() once. http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/util/tuple-row-compare.h File be/src/util/tuple-row-compare.h: Line 99: (*codegend_compare_fn_)(&ordering_expr_evaluators_lhs_[0], > .data() Done Line 135: bool opened_; > initialize inline, that way you don't need to change the c'tor if you ever Done Line 146: std::vector<bool> is_asc_; > make const, while you're here Done Line 147: std::vector<int8_t> nulls_first_; > same This one cannot be constant due to the way it's initialized in the constructor. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
