Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through ......................................................................
Patch Set 5: (56 comments) Thanks for the reviews. http://gerrit.cloudera.org:8080/#/c/5816/5//COMMIT_MSG Commit Message: Line 14: Testing: > Would be nice to get some idea of the perf improvement. The JIRA has an int I'll do a benchmark after the next patch (codegen). Or do you think it's worth doing a benchmark for both patches? http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 122: if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) child(child_idx_ - 1)->Close(state); > Can you add a comment that this only applies to passthrough? E.g. "The prev Done Line 133: row_batch->set_num_rows(limit_ - num_rows_returned_); > There's a corner case that breaks this calculation. The problem is that 'ro Done. I added saving the number of rows. I am not sure that this case is possible though, because passthrough is disabled if we're in a subplan. I don't think that adding a test you suggested would be useful also because passthrough is disabled in that case. Line 150: if (child_idx_ < children_.size() || > Is this just an optimisation? Might be best to remove it and keep the code Moved this check to the top as Alex suggested. That also takes care of the case if someone calls getnext after the Union node set eos to true (which shouldnt happen). http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 102 > There was a specific reason why Open() was called here. There is an expecta Done Line 62: pass_through_children_ = tnode.union_node.pass_through; > move to initializer list in cosntructor I noticed that vectors don't get initialized in the constructor in other nodes. For example, is_asc_order_ in sort_node.h (It's done in Init there). Do you still think it's a good idea to move it out of Init? Line 122: if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) child(child_idx_ - 1)->Close(state); > Needs comment Done Line 124: if (child_idx_ < children_.size() && isPassThrough(child_idx_)) { > High-level comment what is happening here (passthrough). Done Line 140: if (child_eos_) { > Add a comment why it's not ok to Close() the child in this passthrough mode Done Line 141: row_batch->MarkNeedsDeepCopy(); > Needs comment Done Line 150: if (child_idx_ < children_.size() || > Might as well reverse this check and move it up, set eos to true and return Great suggestion, Done. http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 99: /// Returns true if the child can be passed through. > Nit: "if the child at 'idx'" Done http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 36: /// Node that merges the results of its children by materializing their > update comment Done Line 79: std::vector<bool> pass_through_children_; > is_child_passthrough_? Done Line 100: inline bool isPassThrough(int idx) { > idx -> child_idx Done Line 101: DCHECK(idx < pass_through_children_.size()); > DCHECK_LT Done. Why do we have DCHECK_LT or DCHECK_GT? Why not just use DCHECK? Is it because then it will be able to print the actual values if the dcheck fails? http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 132: if (this->type() != other_desc.type()) return false; > can get rid of this-> Done http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: Line 541: > This comment needs updating to reflect the new behaviour. I kept Equals unmodified and added Equivalent(). http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: Line 554: /// Comparison is done by the contents of the tuple descriptors and not the ids. > I'd prefer to preserve the meaning of these existing functions (IsPrefixOf( Done http://gerrit.cloudera.org:8080/#/c/5816/5/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: Line 433: // List of booleans that indicates which children can be passed through > Remove "List of booleans" since that's redundant Done Line 435: 4: required list<bool> pass_through > is_child_passthrough (good to keep names consistent) Done http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: Line 228: if (this.getNullIndicatorByte() != other.getNullIndicatorByte()) return false; > Also need to compare the NullIndicatorBit(). Done. It actually turns out that that a non-nullable Kudu column does not get a null bit. For example in this query the tuple size is 8 bytes (no null bits) for both operands: select kudu_idx from functional_kudu.alltypesagg_idx limit 5 union all select count(*) from functional.alltypestiny; Added it to planner tests. http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: Line 222: > Would be good to keep the name of this function and the BE equivalent the s Done Line 223: public boolean hasEqualPhysicalLayout(SlotDescriptor other) { > needs brief comment Done Line 224: if (!this.getType().matchesType(other.getType())) return false; > shouldn't the types be equal()? Done Line 226: if (this.getByteSize() != other.getByteSize()) return false; > can remove 'this' Done http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java: Line 155: // List of output expressions of the Union node. This should be the same as result > List of output expressions produced by the union without the ORDER BY porti Done Line 156: // resultExprs_ if the UnionStmt does not have an Order By. Otherwise resultExprs_ > Let's avoid referring to specific plan nodes at this stage and instead try Done Line 158: private List<Expr> unionNodeResultExprs_ = Lists.newArrayList(); > unionResultExprs_ Done Line 191: for (Expr e: other.unionNodeResultExprs_) unionNodeResultExprs_.add(e.clone()); > use Expr.cloneList() Done Line 501: for (UnionOperand op: operands_) { > combine with the loop over operands_ in L526 Done http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: Line 1536: if (ctx_.hasSubplan()) unionNode.disablePassthrough(); > Add a TODO to remove this as part of IMPALA-4179. Otherwise I might forget. Added a TODO in a different file. http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/UnionNode.java File fe/src/main/java/org/apache/impala/planner/UnionNode.java: Line 68: // If false, no child nodes would be passed through in the backend. > nit: "batches from child nodes" to be a bit clearer Done Line 71: // Indicates which child nodes can be passed through in the backend. > nit: "batches from child nodes" to be a bit clearer Done PS4, Line 176: /* > /** Done Line 177: * Compute the children for which rows can be forwarded by the Union node without being > It's a little unclear what the input is. Done Line 183: // Pass through is only done for the simple case where the row has a single tuple. > What's the motivation for this? Is it because the union output is always a Yes, I think so. Added comment. Another motivation is that it's very rare for the tuple layout to match exactly for all operands if the number of tuples is greater than 1, (for example both sides would have to have a join with an identical layout). Line 266: analyzer, children_.get(i).getTupleIds(), resultExprLists_.get(i))); > I *think* in principle we may need to also check the nullable tuple IDs, si Not exactly sure what you mean. We are guaranteed that the layout will be identical by line 214 in this file. Also, I don't think the output tuple of Union is always non-nullable. Line 299: if (!passThrough_.isEmpty()) { > We probably don't want to print this at explain_level MINIMAL. Done http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/planner/UnionNode.java File fe/src/main/java/org/apache/impala/planner/UnionNode.java: Line 56: // List of output expressions of the Union node. > List of union result exprs of the originating UnionStmt. Used for determini Done. I think this still needs to be initialized to an empty list. Because the union node might not have any children. The other alternative is to set it to null, but it would make the code less clean. I moved the initilization to the constructor. Line 69: protected boolean passThroughEnabled = true; > Seems clearer to make this isInSubplan_ or something and pass that in the c Done Line 72: protected List<Boolean> passThrough_ = Lists.newArrayList(); > isChildPassthrough (at least something that is consistent across FE/BE and Done Line 76: protected UnionNode(PlanNodeId id, TupleId tupleId) { > remove? We actually use both constructors. This constructor is used for creating a node with no children. (only constant expressions). Line 263: TupleDescriptor this_tuple_desc = analyzer.getDescTbl().getTupleDesc(tupleId_); > use FE camel-case style removed Line 281: msg.union_node = new TUnionNode( > add Preconditions check that asserts the correct size of passThrough Done Line 299: if (!passThrough_.isEmpty()) { > Isn't this always non-empty? It's empty in the case that this node only has const exprs. Removed the check anyways because the for loop takes care of the empty case. Line 300: List<String> passThroughNodes = Lists.newArrayList(); > passThroughNodeIds Done Line 308: Joiner.on(", ").join(passThroughNodes) + "\n"); > nit: we usually don't add a spaces after the comma for explain plan stuff Done Line 309: } > might be nice to produce "all" instead of listing all plan node ids if all Done http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test: Line 187: | pass through nodes: 01, 02 > pass-through-operands: Done http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/empty.test File testdata/workloads/functional-planner/queries/PlannerTest/empty.test: Line 506: # IMPALA-3586: Verify that Union pass through is disabled in subplans. > Might be good to add a separate test for this, since this one is kind of we Done http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/union.test File testdata/workloads/functional-planner/queries/PlannerTest/union.test: Line 3085: | partitions=4/4 files=4 size=460B > Add a new Kudu planner test (kudu.test) for: Done http://gerrit.cloudera.org:8080/#/c/5816/4/testdata/workloads/functional-query/queries/QueryTest/union.test File testdata/workloads/functional-query/queries/QueryTest/union.test: Line 1069: # IMPALA-3586: This query caused an issue because the tuple size of the children > I think this may be something to do with count(*) being non-nullable. Maybe The bug is already fixed. During development I did not take into account the whether a slot is nullable when comparing tuples. This is fixed now. http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-query/queries/QueryTest/union.test File testdata/workloads/functional-query/queries/QueryTest/union.test: Line 1069: # IMPALA-3586: This query caused an issue because the tuple size of the children > No need to describe the failure mode of that specific bug you hit during de Done. Yes, this should be passthrough. Line 1082: # IMPALA-3586: Test the case where no nodes are passed though. > Is this not already covered? I think it's better to have an explicit test case for this. There are very few tests in this file that are similar to this one. Line 1093: # IMPALA-3586: Test the case where 1 node is passed though, and one is not. > Not already covered? Same as above. Let me know if you disagree. -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
