Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
......................................................................


Patch Set 10:

(57 comments)

http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 66:     output_tuple_desc_(descs.GetTupleDescriptor(output_tuple_id_)),
> initialize in .h where possible
While I appreciate the suggestion, I prefer not to add more unrelated clean-up 
to this already gigantic patch.


Line 88:         desc->type() == grouping_exprs_[i]->type());
> i would intuitively think that the build row is the input row.
renamed to intermediate_row_desc_.


Line 97: 
> it feels dicey to pass a reference to an automatic variable, Init() might h
Done


Line 232:   int count = 0;
> isn't that the same as const X**?
Not really :-).

const X** is pointer to pointer to constant X.
X* const * is pointer to constant pointer to X.

The latter fits the bill of vector::data().


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 122:   /// Output expressions to convert row batches onto output values.
> It's confusing that these aren't used in all subclasses of DataSink (e.g. D
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

PS9, Line 485: 
?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

PS9, Line 69: 
> Does this mean that multiple scanner threads share a single MemPool? That d
Nice catch. I hit similar bugs in HDFS scanners before. Those scanners also 
have a per-scanner MemPool.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/sort-node.h
File be/src/exec/sort-node.h:

PS9, Line 68: :vector<ScalarExpr
> materialize_tuple_ doesn't seem to be defined. I think we always need these
Done


Line 69: 
> See my comment about the name in sorter.h.
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/subplan-node.h
File be/src/exec/subplan-node.h:

Line 65:   /// tree rooted at 'node' and does any initialization that is 
required as a result of
> * and do any initialization that is required as a result of setting the sub
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

PS9, Line 78: output_tuple_exprs_;
> output_tuple_exprs_? Usually when we have exprs materializing tuples it's n
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.cc
File be/src/exprs/agg-fn-evaluator.cc:

Line 96:     MemPool* mem_pool, AggFnEvaluator** result) {
> move input params to front
Done


Line 161: Status AggFnEvaluator::Open(
> why not const&?
Done


Line 514: 
> single line, here and subsequent functions, where possible
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 80:   /// and caches all constant input arguments.
> regarding the caching: is there already a todo to move that into the expr s
TODO added.


PS9, Line 94: o, it
> Maybe this should be called ShallowClone() to make it more explicit. Withou
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn.cc
File be/src/exprs/agg-fn.cc:

Line 68:   }
> scalarexpr does that in create. do the same for this class?
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 87:   for (int i = 0; i < num_children; ++i) {
> we don't use 'const ..' elsewhere for automatic variables.
Done


Line 88:     *child_node_idx += 1;
> formatting
Done


Line 92:   return Status::OK();
> what is this referring to?
Stale comment. Removed.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 82: 
> this is really CreateChildren, not CreateTree.
Moved in/out params to end.

As for the naming, this function actually builds out the entire expression tree 
rooted at 'root'. CreateChildren() doesn't seem to fit. May be I misunderstood 
your comment.


Line 134:   /// Called recursively to create children expr trees for 
sub-expressions.
> misleading: the root of the created tree is not 'parent' (the root is insta
I refactored this function and CreateTree() a bit so it's easier to follow now.


Line 135:   ///
> "for subexpressions"
Done


Line 140:   ///   root: root of the new tree. Created and initialized by the 
caller.
> let's call this root_idx, makes it easier to follow
It's called child_node_idx in the new patch.


Line 146:   ///   !status.ok() if tree is inconsistent or corrupt
> while we're cleaning up, move in/out params to end
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

Line 69: 
> initialize vars in .h where practical
Done


Line 82:     (*evaluator)->fn_ctxs_ptr_ = (*evaluator)->fn_ctxs_.data();
> why do you need that (for a newly created object)?
Removed.


Line 86:     DCHECK_EQ(root.fn_ctx_idx_, -1);
> do that in CreateFnCtxs?
May be easier to do it here given CreateFnCtxs() is called recursively.


Line 133:   FunctionContext::FunctionStateScope scope =
> what's the need for guarding against multiple open calls? (how would that h
In a subplan node, we do call Open() multiple times.


Line 138: Status ScalarExprEvaluator::Open(
> is this really instance_local? if so, leave todo in the right places (presu
Good point. TODO added here and udf.h


Line 155:   closed_ = true;
> instead put a unique_ptr into fn_ctxs?
Good idea but it may not jive well with fn_ctx_ptr_ which is used by the 
codegend code which expects to take an array of pointer. I guess we can add 
more IR in ScalarFnCall::GetCodegendComputeFn() to make it work but may not be 
worth it.


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

Line 72:       WARN_UNUSED_RESULT;
> todo?
TODO added. I noticed that some of the fragment local states being shared may 
not be easy to move: for instance, like-predicate is implemented as a 
ScalarFnCall() and its shared fragment local state (i.e. RE2 object) is not 
created until the built-in prepare() function is called. I believe it could be 
a violation of the existing UDF contract to call prepare() in Expr::Init().

We can convert all of our built-in functions to store the fragment local states 
in Expr but we cannot convert UDF to do so at least until C6. The divergence of 
UDF and built-in is different from our existing approach of treating built-in 
and UDF the same way. This may be up for discussion once we get around to 
tackling IMPALA-4743.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

Line 55: /// rooted at a node is defined by [fn_ctx_idx_start_, 
fn_ctx_idx_end_).
> defined
Done


Line 65:   static Status Create(const ScalarExpr& expr, RuntimeState* state, 
ObjectPool* pool,
> move in/out params to back
Done


Line 81:   static Status Open(const std::vector<ScalarExprEvaluator*>& 
evaluators,
> why a & and not a const&?
Done


Line 89:   static void Close(const std::vector<ScalarExprEvaluator*>& 
evaluators,
> same here
Done


Line 108:       const std::vector<ScalarExprEvaluator*>& evaluators,
> 'this expr': does this refer to 'this' or to parameter 'expr'? same for 'th
Done


Line 182:   friend class ScalarFnCall;
> "and live"
Done


Line 205:   bool is_clone_ = false;
> initialized_ is less awkward
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr.cc
File be/src/exprs/scalar-expr.cc:

Line 87: Status ScalarExpr::Create(const vector<TExpr>& texprs, const 
RowDescriptor& row_desc,
> is there a reason this gets called before init()? if so, point it out.
There is no ordering requirement wrt Init().


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

Line 155:   friend class AggFnEvaluator;
> ?
Doesn't work.

Cannot cast vector<AggFn*> to vector<Expr*>;


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

Line 78: /// ScalarExpr contains query compile-time information about an 
expression (e.g.
> "static states": in expr.h you call this query compile-time information, le
Done


Line 98:   /// is stored in ObjectPool 'state->obj_pool()' and returned in 
'expr'. 'row_desc' is
> adapt comment (you're still referring to parameter pool)
Done


Line 156:   friend class AggregationNode;
> why are these all friends? it looks like more of the "protected" functions 
The goal is to hide both Init() and Get*Val() from irrelevant classes to 
enforce the idea that expressions should be evaluated through an evaluator. For 
instance, an exec node should have no business of calling Init() of a 
ScalarExpr.

The peculiarity of C++ prevents a derived class of ScalarExpr from calling 
child[0]->Get*Val() directly as child[0] is of type ScalarExpr* (i.e. base 
class). It requires casting child[0] to the exact derived class type. This is a 
bit unfortunate. One potential workaround is to promote the private static 
Get*Val() functions below to protected and make all derived classes of 
ScalarExpr call them instead. Not particularly straightforward either.


Line 192:   /// nodes which need FunctionContext in the tree. 'next_fn_ctx_idx' 
is the index
> "functioncontext vector in an evaluator": include reference to exact member
Done


Line 194:   /// called recursively down the tree.
> describe param
Done


Line 199:   /// Creates a single ScalarExpr node based on 'texpr_node' and 
returns it
> there is no more fn_ctx_idx_ptr
Done


Line 237:       RuntimeState* state, ScalarExprEvaluator* evaluator) const
> move in/out params to back
Done. I think 'scope' was placed last due to the use of default parameter value 
which is not needed anymore.

It appears to be a prevalent pattern to not always put in/out params near the 
end in the argument list.


Line 247:       RuntimeState* state, ScalarExprEvaluator* evaluator) const;
> same here
Done


Line 260:       llvm::Value* (*args)[2]);
> same here
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

Line 164:   bool is_interpreted = scalar_fn_wrapper_ == nullptr;
> 'interpreted' or 'is_interpreted' is a bit less awkward
Done


Line 179:   // Only evaluate constant arguments at the top level of function 
contexts.
> what does 'top level of function contexts' refer to?
It means the function context in the evaluator which clones copy from.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/runtime/data-stream-sender.h
File be/src/runtime/data-stream-sender.h:

Line 134:   /// per-row partition values for shuffling exchange;
> move to class comment, so it's not as easy to overlook. either of this clas
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

PS9, Line 182: sort_tuple_exprs_;
> Maybe something like 'sort_tuple_exprs_'? I think the important thing is th
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 78: static void SetTColumnValue(
> move input params to front
It's used only to serialize the result after evaluating constant expression 
sent from the FE.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

Line 49:   ScalarExprEvaluator::Close(ordering_expr_evaluators_rhs_, state);
> opened_ = false?
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

PS9, Line 141: ordering_expr_evaluators
> ordering_expr_evaluators_lhs_ to match the expr name?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to