Dan Hecht has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
......................................................................


Patch Set 19:

(14 comments)

nice cleanup

http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 111:   DCHECK_LT(child_idx_, children_.size());
this DCHECK should come before any use of child_idx_ since if it's violated, it 
doesn't make sense to call e.g. IsChildPassthrough() with chidl_idx_.


Line 114:     
DCHECK(child(child_idx_)->row_desc().LayoutEquals(row_batch->row_desc()));
what do these dchecks have to do with child_eos_? i.e. shouldn't they true 
regardless of the value of child_eos_?


PS19, Line 132: child_idx_++
++child_idx_ per our style.


PS19, Line 147: child_idx_ < children_.size()
Would make more sense as HasMoreMaterialized()


PS19, Line 150: children
children that need materialization


PS19, Line 153: Row
Child row batch


Line 168:       // There are only 3 ways of getting out of this loop:
it would be helpful to concisely say what this loop is responsible for given 
it's complexity.  Something like:
This loop fetches row batches from a single child and materializes each output 
row, until one of these conditions:
1) ...


Line 193:         COUNTER_SET(rows_returned_counter_, num_rows_returned_);
should we do: child_batch_.reset() here? we shouldn't ever reference it again, 
but seems cleaner to keep the invariant that child_batch_ always corresponds to 
the open child (otherwise its memory references aren't valid).


PS19, Line 200:     // We end up here iff one of the following is true (or 
both).
              :     // 1. We are done consuming all batches from the current 
child and we need to move on
              :     //    to the next child.
              :     // 2. The output row batch is at capacity.
this could be summarized for a quicker read:

// Either we've finished the current child or the output batch is at capacity, 
or both.


PS19, Line 204:  In other words, the only way to not end up here if we entered 
the outer loop is if
              :     // the limit is reached.
rather than say that in a comment, how about:

DCHECK(!ReachedLimit());


Line 263:       (!HasMorePassthrough() && !HasMoreMaterialized() && 
!HasMoreConst(state));
shouldn't we put lines 254-263 inside a do-while loop, so that we don't return 
partially filled row-batches when we still have output to produce?  Returning 
partially filled row-batches is legal but can be confusing to clients 
(admittedly it can happen for other reasons currently, but it'd be nice to 
avoid doing that unless there's a good reason).

do {
...
while (!*eos && !row_batch->AtCapacity());


http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.h
File be/src/exec/union-node.h:

PS19, Line 115: child_idx
'child_idx'


Line 129:     return child_idx_ >= first_materialized_child_idx_ && child_idx_ 
< children_.size();
this suggestion is okay to ignore, but i find these kind of conditions easier 
to read as:

x <= y && y < z

so that it looks more close to how you'd read it in math: x <= y < z.

That said, I don't see how the condition:

first_materialized_child_idx_ <= child_idx_ < children_.size() 

is correct for whether there are more materialized children. i.e. when 
child_idx_ < first_materialized_child_idx, we still have more materalized 
children, right?

So, shouldn't this be:

// We have children that need materialization and haven't processed them all 
yet.
first_materizlied_child_idx_ != children_.size() && child_idx_ < 
children_.size()


http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exprs/slot-ref.cc
File be/src/exprs/slot-ref.cc:

Line 94:     DCHECK(false);
this is confusing. either it's an invariant or it's not. it's even more 
confusing because now it makes this case and the one at line 79 different for 
no good reason, and future code readers won't know why.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to