Alex Behm has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
......................................................................


Patch Set 7:

(40 comments)

http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 145:   DCHECK(child(0)->row_desc().IsPrefixOfEquivalent(row_desc()));
This should be IsPrefixOf() because we sanity checking the row descriptors of 
exec nodes (and not row batches).


http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 115:   // passthrough case, the child can be closed right away because the 
row batch received
the child can be closed right away -> the child was already closed?


Line 116:   // from the child is copied (more details below).
accuracy: the row batch wasn't copied


Line 121:   if (child_idx_ < children_.size() && 
isChildPassThrough(child_idx_)) {
Suggest comment:
// Handle passthrough children. We pass 'row_batch' directly into the GetNext() 
call on the child.


Line 122:     // If the rows from the current child can be passed through, the 
physical row layout
This comment doesn't seem to add anything, I suggest removing it.


Line 131:     // It may be possible that the row batch is not empty, so we save 
the previous value.
More details would be helpful. If the batch has rows at this point, I'd imagine 
it can cause all sorts of other problems. How can the batch already have rows?


Line 148:       // 'needs_deep_copy' lets us safely close the child in the next 
GetNext() call.
style: 'needs_deep_copy' is not a visible variable here, suggest just saying 
"Marking the batch as needing a deep copy let's us ...


Line 154: 
DCHECK that child_idx_ is not passthrough here


http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 67:   // Which children can be passed through, without being materialized.
... without evaluating and materializing their exprs.


http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 412:   /// Return true if the physical layout of this descriptor matches 
the physical layout
matches that of other_desc


Line 413:   /// of other_desc, but not necessarily ids.
bot not necessarily the id.


Line 565:   /// of other_desc, but not necessarily ids.
but not necessarily the ids


http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/service/query-options.h
File be/src/service/query-options.h:

Line 38:       TImpalaQueryOptions::DISABLE_UNION_PASSTHROUGH + 1);\
I tend to prefer ENABLE_UNION_PASSTHROUGH. To me positive phrasing is a little 
easier to understand. What do you think?


http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

Line 227:   public boolean isEquivalent(SlotDescriptor other) {
Unfortunately, the term 'equivalent' already has a different meaning in the FE 
for slots, so it would be good to the existing term fro this new one. Maybe 
isLayoutEquivalent()?


http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java:

Line 616:   public List<Expr> getUnionNodeResultExprs() {
getUnionResultExprs()


http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

Line 47:  * a child has an identical tuple layout as the output of the union 
node, the
... as the output of the union node, and the child only has naked SlotRefs as 
result exprs, then the child is marked as 'passthrough'. The rows of 
passthrough children are directly returned by the union node, instead of 
materializing the child's result exprs into new tuples.


Line 57:   protected final List<Expr> resultExprs_;
unionResultExprs_ to make distinguish it from the resultExprLists_ and such


Line 73:   // If false, no batches from child nodes would be passed through in 
the backend.
Comment should describe what this flag is. Also you mean "true" right?


Line 76:   // Indicates for which child nodes batches can be passed through in 
the backend.
remove "in the backend" (it's clear that execution happens there)


Line 81:   protected UnionNode(PlanNodeId id, TupleId tupleId) {
Is this c'tor still needed? If not, remove.


Line 89:                       List<Expr> resultExprs, boolean isInSubplan) {
indentation, unionResultExprs


Line 182:    * Compute whether we can pass through rows without materializing 
for the given child.
Can combine/simplify like this:

Returns true if rows from the child with 'childTupleIds' and 'childResultExprs' 
can be returned directly by the union node (without materialization into a new 
tuple).

Might be good to list the conditions for passthrough compatibility.


Line 189:       Analyzer analyzer, List<TupleId> childTupleIds, List<Expr> 
childExprList) {
childExprList -> childResultExprs


Line 190:     if (analyzer.getQueryOptions().isDisable_union_passthrough()) 
return false;
seems clearer to move this into init() and not execute any of the passthrough 
code


Line 193:     // TODO: Remove this as part of IMPALA-4179.
Move TODO to the class comment


Line 194:     if (isInSubplan_) return false;
same here, seems easier to move this check into init()


Line 205:     // Verify that the union node has one slot for every expression.
union node -> union tuple descriptor


Line 212:     if (resultExprs_.size() != 
childTupleDescriptor.getSlots().size()) return false;
I don't think this tricky check is correct because it won't allow passthrough 
for something like:

select int_col, int_col, int_col from functional.alltypes
union all
select int_col, int_col, int_col from functional.alltypes


Line 218:       SlotRef unionSlot = resultExprs_.get(i).unwrapSlotRef(false);
unionSlotRef, childSlotRef


Line 220:       if (!unionTupleDescriptor.getSlots().get(i).isMaterialized()) 
continue;
move above the unwrapSlotRef() calls


Line 221:       
Preconditions.checkState(unionSlot.getDesc().getParent().getId().equals(tupleId_));
Don't think we need this check, but something like this would be good:

Preconditions.checkStateNotNull(unionSlotRef);


Line 223:         Preconditions.checkState(!(childSlot instanceof SlotRef));
No need for this check


Line 262:     // Compute which nodes can be passed through.
which child nodes


Line 266:       // Check that if the child outputs a single tuple, then it's 
not nullable. Tuple
move into computePassThrough


http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

Line 329: # IMPALA-3586: The operand with the Kudu scan cannot be passed 
through because it's not
because id is not-nullable (primary key)


Line 346: select id from functional_kudu.alltypes
do select *


http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:

Line 3104: # IMPALA-3678: Both union operands should produce rows with 
non-nullable slots which
remove "should"


Line 3124: # IMPALA-3678: The Union operands that contain a join should not be 
passed through,
nice


Line 3184: select COUNT(c.c_custkey), COUNT(v.tot_price)
lowercase count for consistency


http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:

Line 1126: select COUNT(c.c_custkey), COUNT(v.tot_price)
lowercase count for consistency


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to