Alex Behm has posted comments on this change. Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through ......................................................................
Patch Set 5: (42 comments) http://gerrit.cloudera.org:8080/#/c/5816/5//COMMIT_MSG Commit Message: Line 14: Testing: Would be nice to get some idea of the perf improvement. The JIRA has an interesting query. A small microbenchmark would also be useful. http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 102 There was a specific reason why Open() was called here. There is an expectation that GetNext() returns quickly after Open() is called. This expectation has to do with our client/server interaction and the query state transitions. The query goes into the FINISHED state once Open() succeeded on the coordinator fragment. Does test_rows_availability.py succeed? Line 62: pass_through_children_ = tnode.union_node.pass_through; move to initializer list in cosntructor Line 122: if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) child(child_idx_ - 1)->Close(state); Needs comment Line 124: if (child_idx_ < children_.size() && isPassThrough(child_idx_)) { High-level comment what is happening here (passthrough). Line 140: if (child_eos_) { Add a comment why it's not ok to Close() the child in this passthrough mode even if the child is at eos. In the meterialization case below, we can and do close the child as soon as possible. Line 141: row_batch->MarkNeedsDeepCopy(); Needs comment Line 150: if (child_idx_ < children_.size() || Might as well reverse this check and move it up, set eos to true and return OK. Easier to see that we're just going to skip the remaining code. http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 36: /// Node that merges the results of its children by materializing their update comment Line 79: std::vector<bool> pass_through_children_; is_child_passthrough_? The existing variable name makes it sound like it is a list of children that are pass through, but there is actually an entry for all children. Move this member out of the "Members that need to be reset()" section. Line 100: inline bool isPassThrough(int idx) { idx -> child_idx const method Line 101: DCHECK(idx < pass_through_children_.size()); DCHECK_LT http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 132: if (this->type() != other_desc.type()) return false; can get rid of this-> http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: Line 554: /// Comparison is done by the contents of the tuple descriptors and not the ids. I'd prefer to preserve the meaning of these existing functions (IsPrefixOf() and Equals(). We have several interesting DCHECKs that require the ids (and not just the physical layout) to be identical. If you really need these functions we should give them new names kind of like we have in the FE for this check. http://gerrit.cloudera.org:8080/#/c/5816/5/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: Line 433: // List of booleans that indicates which children can be passed through Remove "List of booleans" since that's redundant Line 435: 4: required list<bool> pass_through is_child_passthrough (good to keep names consistent) http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: Line 222: Would be good to keep the name of this function and the BE equivalent the same. Line 223: public boolean hasEqualPhysicalLayout(SlotDescriptor other) { needs brief comment Line 224: if (!this.getType().matchesType(other.getType())) return false; shouldn't the types be equal()? Line 226: if (this.getByteSize() != other.getByteSize()) return false; can remove 'this' http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java: Line 155: // List of output expressions of the Union node. This should be the same as result List of output expressions produced by the union without the ORDER BY portion (if any). Same as resultExprs_ if there is no ORDER BY. Line 156: // resultExprs_ if the UnionStmt does not have an Order By. Otherwise resultExprs_ Let's avoid referring to specific plan nodes at this stage and instead try to describe 'semantically' what these exprs contain. Line 158: private List<Expr> unionNodeResultExprs_ = Lists.newArrayList(); unionResultExprs_ Line 191: for (Expr e: other.unionNodeResultExprs_) unionNodeResultExprs_.add(e.clone()); use Expr.cloneList() Line 501: for (UnionOperand op: operands_) { combine with the loop over operands_ in L526 http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/planner/UnionNode.java File fe/src/main/java/org/apache/impala/planner/UnionNode.java: Line 46: * the corresponding result exprs into a new tuple. update comment to reflect passthrough capability Line 56: // List of output expressions of the Union node. List of union result exprs of the originating UnionStmt. Used for determining passthrough-compatibility of children. No need to new this list. final Move this above resultExprLists_ to hopefully minimize confusion by visual separation. Line 69: protected boolean passThroughEnabled = true; Seems clearer to make this isInSubplan_ or something and pass that in the constructor as well. The class comment should describe the passthrough capability and why we don't want it inside a subplan. Line 72: protected List<Boolean> passThrough_ = Lists.newArrayList(); isChildPassthrough (at least something that is consistent across FE/BE and thrift) Line 76: protected UnionNode(PlanNodeId id, TupleId tupleId) { remove? Line 263: TupleDescriptor this_tuple_desc = analyzer.getDescTbl().getTupleDesc(tupleId_); use FE camel-case style Line 281: msg.union_node = new TUnionNode( add Preconditions check that asserts the correct size of passThrough Line 299: if (!passThrough_.isEmpty()) { Isn't this always non-empty? Line 300: List<String> passThroughNodes = Lists.newArrayList(); passThroughNodeIds Line 308: Joiner.on(", ").join(passThroughNodes) + "\n"); nit: we usually don't add a spaces after the comma for explain plan stuff Line 309: } might be nice to produce "all" instead of listing all plan node ids if all children are passthrough http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test: Line 187: | pass through nodes: 01, 02 pass-through-operands: (to be consistent with our existing "constant-operands") http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/empty.test File testdata/workloads/functional-planner/queries/PlannerTest/empty.test: Line 506: # IMPALA-3586: Verify that Union pass through is disabled in subplans. Might be good to add a separate test for this, since this one is kind of weird for it (unions with only a single operand) http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/union.test File testdata/workloads/functional-planner/queries/PlannerTest/union.test: Line 3085: | partitions=4/4 files=4 size=460B Add a new Kudu planner test (kudu.test) for: select * from functional.alltypes union all select * from functional_kudu.alltypes The operand with the Kudu scan cannot be passed through. However if both operands are Kudu scans, then they can be passed through. http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-query/queries/QueryTest/union.test File testdata/workloads/functional-query/queries/QueryTest/union.test: Line 1069: # IMPALA-3586: This query caused an issue because the tuple size of the children No need to describe the failure mode of that specific bug you hit during development. Better to describe what case this test is covering: Input tuples that have non-nullable slots. I believe that this should now do passthrough right? Line 1082: # IMPALA-3586: Test the case where no nodes are passed though. Is this not already covered? Line 1093: # IMPALA-3586: Test the case where 1 node is passed though, and one is not. Not already covered? -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
