Alex Behm has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
......................................................................


Patch Set 14:

(13 comments)

The new code is much clearer! I think we can still improve it further though. 
Happy to go over the subtleties in person if you prefer.

http://gerrit.cloudera.org:8080/#/c/5816/14/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 138:   while (const_expr_list_idx_ < const_expr_lists_.size() && 
!row_batch->AtCapacity() && !ReachedLimit()) {
long line


Line 144:   if (const_expr_list_idx_ == const_expr_lists_.size()) *eos = true;
*eos = const_expr_list_idx_ == const_expr_lists_.size();


Line 196:     if (child_batch_.get() == NULL) {
nullptr


Line 210:       // There are only 3 ways of getting out of this loop:
We also break out if we fetch an empty child batch


Line 214:       RETURN_IF_ERROR(QueryMaintenance(state));
remove


Line 274:   if (const_todo_) {
Not a big deal, but I think we should do the const exprs last because most of 
the time this branch will not be taken.


Line 275:     RETURN_IF_ERROR(GetNextConst(state, row_batch, &done));
why not pass &const_todo_ directly, and same for the other cases below


Line 278:     RETURN_IF_ERROR(GetNextPassThrough(state, row_batch, &done));
It might be simpler (== less error prone) overall to order the operands based 
on passthrough, or create two lists of child indexes (passthrough and 
non-passthrough), populated in Init().


Line 282:       child_idx_ = 0;
The code here and the setting of child_idx_ in Open() is kind of subtle. 
There's a baked in assumption that passthrough is done before materialized, but 
the code that makes that work is spread in different places. I think using two 
separate child lists, or ordering the operands would help clarify this.

I'm ok with ordering or separating, but I do have a slight preference for 
ordering because then the evaluation order is clear even at the plan level. 
Otherwise, you need to understand the union implementation in detail to know 
how it gets evaluated.


Line 286:     RETURN_IF_ERROR(GetNextMaterialized(state, row_batch, &done));
add a DCHECK here that child_eos_ is true if there were any passthrough children


Line 290:   *eos = ReachedLimit() || (!const_todo_ && !passthrough_todo_ && 
!materialize_todo_);
Isn't the last condition the same as !materialized_todo_ since we are going 
over the cases strictly in order?


http://gerrit.cloudera.org:8080/#/c/5816/14/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 39: /// and expressions don't need to be evaluated. The UnionNode pulls 
row batches
Comment says union goes through children sequentially, which makes to maintain 
imo.


Line 107:   Status GetNextConst(RuntimeState* state, RowBatch* row_batch, bool* 
eos);
Use a different name then eos, e.g. 'done' to avoid confusion with the real eos.

We might not need the param at all if we order the children or separate the 
child lists into passthrough and non-passthrough


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to