Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext ......................................................................
Patch Set 10: (57 comments) http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 66: output_tuple_desc_(descs.GetTupleDescriptor(output_tuple_id_)), > initialize in .h where possible While I appreciate the suggestion, I prefer not to add more unrelated clean-up to this already gigantic patch. Line 88: desc->type() == grouping_exprs_[i]->type()); > i would intuitively think that the build row is the input row. renamed to intermediate_row_desc_. Line 97: > it feels dicey to pass a reference to an automatic variable, Init() might h Done Line 232: int count = 0; > isn't that the same as const X**? Not really :-). const X** is pointer to pointer to constant X. X* const * is pointer to constant pointer to X. The latter fits the bill of vector::data(). http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 122: /// Output expressions to convert row batches onto output values. > It's confusing that these aren't used in all subclasses of DataSink (e.g. D Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/hash-table.h File be/src/exec/hash-table.h: PS9, Line 485: ? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: PS9, Line 69: > Does this mean that multiple scanner threads share a single MemPool? That d Nice catch. I hit similar bugs in HDFS scanners before. Those scanners also have a per-scanner MemPool. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/sort-node.h File be/src/exec/sort-node.h: PS9, Line 68: :vector<ScalarExpr > materialize_tuple_ doesn't seem to be defined. I think we always need these Done Line 69: > See my comment about the name in sorter.h. Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/subplan-node.h File be/src/exec/subplan-node.h: Line 65: /// tree rooted at 'node' and does any initialization that is required as a result of > * and do any initialization that is required as a result of setting the sub Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/topn-node.h File be/src/exec/topn-node.h: PS9, Line 78: output_tuple_exprs_; > output_tuple_exprs_? Usually when we have exprs materializing tuples it's n Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.cc File be/src/exprs/agg-fn-evaluator.cc: Line 96: MemPool* mem_pool, AggFnEvaluator** result) { > move input params to front Done Line 161: Status AggFnEvaluator::Open( > why not const&? Done Line 514: > single line, here and subsequent functions, where possible Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: Line 80: /// and caches all constant input arguments. > regarding the caching: is there already a todo to move that into the expr s TODO added. PS9, Line 94: o, it > Maybe this should be called ShallowClone() to make it more explicit. Withou Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn.cc File be/src/exprs/agg-fn.cc: Line 68: } > scalarexpr does that in create. do the same for this class? Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.cc File be/src/exprs/expr.cc: Line 87: for (int i = 0; i < num_children; ++i) { > we don't use 'const ..' elsewhere for automatic variables. Done Line 88: *child_node_idx += 1; > formatting Done Line 92: return Status::OK(); > what is this referring to? Stale comment. Removed. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 82: > this is really CreateChildren, not CreateTree. Moved in/out params to end. As for the naming, this function actually builds out the entire expression tree rooted at 'root'. CreateChildren() doesn't seem to fit. May be I misunderstood your comment. Line 134: /// Called recursively to create children expr trees for sub-expressions. > misleading: the root of the created tree is not 'parent' (the root is insta I refactored this function and CreateTree() a bit so it's easier to follow now. Line 135: /// > "for subexpressions" Done Line 140: /// root: root of the new tree. Created and initialized by the caller. > let's call this root_idx, makes it easier to follow It's called child_node_idx in the new patch. Line 146: /// !status.ok() if tree is inconsistent or corrupt > while we're cleaning up, move in/out params to end Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: Line 69: > initialize vars in .h where practical Done Line 82: (*evaluator)->fn_ctxs_ptr_ = (*evaluator)->fn_ctxs_.data(); > why do you need that (for a newly created object)? Removed. Line 86: DCHECK_EQ(root.fn_ctx_idx_, -1); > do that in CreateFnCtxs? May be easier to do it here given CreateFnCtxs() is called recursively. Line 133: FunctionContext::FunctionStateScope scope = > what's the need for guarding against multiple open calls? (how would that h In a subplan node, we do call Open() multiple times. Line 138: Status ScalarExprEvaluator::Open( > is this really instance_local? if so, leave todo in the right places (presu Good point. TODO added here and udf.h Line 155: closed_ = true; > instead put a unique_ptr into fn_ctxs? Good idea but it may not jive well with fn_ctx_ptr_ which is used by the codegend code which expects to take an array of pointer. I guess we can add more IR in ScalarFnCall::GetCodegendComputeFn() to make it work but may not be worth it. http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: Line 72: WARN_UNUSED_RESULT; > todo? TODO added. I noticed that some of the fragment local states being shared may not be easy to move: for instance, like-predicate is implemented as a ScalarFnCall() and its shared fragment local state (i.e. RE2 object) is not created until the built-in prepare() function is called. I believe it could be a violation of the existing UDF contract to call prepare() in Expr::Init(). We can convert all of our built-in functions to store the fragment local states in Expr but we cannot convert UDF to do so at least until C6. The divergence of UDF and built-in is different from our existing approach of treating built-in and UDF the same way. This may be up for discussion once we get around to tackling IMPALA-4743. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: Line 55: /// rooted at a node is defined by [fn_ctx_idx_start_, fn_ctx_idx_end_). > defined Done Line 65: static Status Create(const ScalarExpr& expr, RuntimeState* state, ObjectPool* pool, > move in/out params to back Done Line 81: static Status Open(const std::vector<ScalarExprEvaluator*>& evaluators, > why a & and not a const&? Done Line 89: static void Close(const std::vector<ScalarExprEvaluator*>& evaluators, > same here Done Line 108: const std::vector<ScalarExprEvaluator*>& evaluators, > 'this expr': does this refer to 'this' or to parameter 'expr'? same for 'th Done Line 182: friend class ScalarFnCall; > "and live" Done Line 205: bool is_clone_ = false; > initialized_ is less awkward Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: Line 87: Status ScalarExpr::Create(const vector<TExpr>& texprs, const RowDescriptor& row_desc, > is there a reason this gets called before init()? if so, point it out. There is no ordering requirement wrt Init(). http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: Line 155: friend class AggFnEvaluator; > ? Doesn't work. Cannot cast vector<AggFn*> to vector<Expr*>; http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: Line 78: /// ScalarExpr contains query compile-time information about an expression (e.g. > "static states": in expr.h you call this query compile-time information, le Done Line 98: /// is stored in ObjectPool 'state->obj_pool()' and returned in 'expr'. 'row_desc' is > adapt comment (you're still referring to parameter pool) Done Line 156: friend class AggregationNode; > why are these all friends? it looks like more of the "protected" functions The goal is to hide both Init() and Get*Val() from irrelevant classes to enforce the idea that expressions should be evaluated through an evaluator. For instance, an exec node should have no business of calling Init() of a ScalarExpr. The peculiarity of C++ prevents a derived class of ScalarExpr from calling child[0]->Get*Val() directly as child[0] is of type ScalarExpr* (i.e. base class). It requires casting child[0] to the exact derived class type. This is a bit unfortunate. One potential workaround is to promote the private static Get*Val() functions below to protected and make all derived classes of ScalarExpr call them instead. Not particularly straightforward either. Line 192: /// nodes which need FunctionContext in the tree. 'next_fn_ctx_idx' is the index > "functioncontext vector in an evaluator": include reference to exact member Done Line 194: /// called recursively down the tree. > describe param Done Line 199: /// Creates a single ScalarExpr node based on 'texpr_node' and returns it > there is no more fn_ctx_idx_ptr Done Line 237: RuntimeState* state, ScalarExprEvaluator* evaluator) const > move in/out params to back Done. I think 'scope' was placed last due to the use of default parameter value which is not needed anymore. It appears to be a prevalent pattern to not always put in/out params near the end in the argument list. Line 247: RuntimeState* state, ScalarExprEvaluator* evaluator) const; > same here Done Line 260: llvm::Value* (*args)[2]); > same here Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: Line 164: bool is_interpreted = scalar_fn_wrapper_ == nullptr; > 'interpreted' or 'is_interpreted' is a bit less awkward Done Line 179: // Only evaluate constant arguments at the top level of function contexts. > what does 'top level of function contexts' refer to? It means the function context in the evaluator which clones copy from. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/runtime/data-stream-sender.h File be/src/runtime/data-stream-sender.h: Line 134: /// per-row partition values for shuffling exchange; > move to class comment, so it's not as easy to overlook. either of this clas Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/runtime/sorter.h File be/src/runtime/sorter.h: PS9, Line 182: sort_tuple_exprs_; > Maybe something like 'sort_tuple_exprs_'? I think the important thing is th Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 78: static void SetTColumnValue( > move input params to front It's used only to serialize the result after evaluating constant expression sent from the FE. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/util/tuple-row-compare.cc File be/src/util/tuple-row-compare.cc: Line 49: ScalarExprEvaluator::Close(ordering_expr_evaluators_rhs_, state); > opened_ = false? Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/util/tuple-row-compare.h File be/src/util/tuple-row-compare.h: PS9, Line 141: ordering_expr_evaluators > ordering_expr_evaluators_lhs_ to match the expr name? Done -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
