Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................


Patch Set 5:

(56 comments)

Thanks for the reviews.

http://gerrit.cloudera.org:8080/#/c/5816/5//COMMIT_MSG
Commit Message:

Line 14: Testing:
> Would be nice to get some idea of the perf improvement. The JIRA has an int
I'll do a benchmark after the next patch (codegen). Or do you think it's worth 
doing a benchmark for both patches?


http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 122:   if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) 
child(child_idx_ - 1)->Close(state);
> Can you add a comment that this only applies to passthrough? E.g. "The prev
Done


Line 133:       row_batch->set_num_rows(limit_ - num_rows_returned_);
> There's a corner case that breaks this calculation. The problem is that 'ro
Done. I added saving the number of rows. I am not sure that this case is 
possible though, because passthrough is disabled if we're in a subplan. I don't 
think that adding a test you suggested would be useful also because passthrough 
is disabled in that case.


Line 150:   if (child_idx_ < children_.size() ||
> Is this just an optimisation? Might be best to remove it and keep the code 
Moved this check to the top as Alex suggested. That also takes care of the case 
if someone calls getnext after the Union node set eos to true (which shouldnt 
happen).


http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 102
> There was a specific reason why Open() was called here. There is an expecta
Done


Line 62:   pass_through_children_ = tnode.union_node.pass_through;
> move to initializer list in cosntructor
I noticed that vectors don't get initialized in the constructor in other nodes. 
For example, is_asc_order_ in sort_node.h (It's done in Init there).

Do you still think it's a good idea to move it out of Init?


Line 122:   if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) 
child(child_idx_ - 1)->Close(state);
> Needs comment
Done


Line 124:   if (child_idx_ < children_.size() && isPassThrough(child_idx_)) {
> High-level comment what is happening here (passthrough).
Done


Line 140:     if (child_eos_) {
> Add a comment why it's not ok to Close() the child in this passthrough mode
Done


Line 141:       row_batch->MarkNeedsDeepCopy();
> Needs comment
Done


Line 150:   if (child_idx_ < children_.size() ||
> Might as well reverse this check and move it up, set eos to true and return
Great suggestion, Done.


http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 99:   /// Returns true if the child can be passed through.
> Nit: "if the child at 'idx'"
Done


http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 36: /// Node that merges the results of its children by materializing their
> update comment
Done


Line 79:   std::vector<bool> pass_through_children_;
> is_child_passthrough_?
Done


Line 100:   inline bool isPassThrough(int idx) {
> idx -> child_idx
Done


Line 101:     DCHECK(idx < pass_through_children_.size());
> DCHECK_LT
Done.

Why do we have DCHECK_LT or DCHECK_GT? Why not just use DCHECK? Is it because 
then it will be able to print the actual values if the dcheck fails?


http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

Line 132:   if (this->type() != other_desc.type()) return false;
> can get rid of this->
Done


http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 541: 
> This comment needs updating to reflect the new behaviour.
I kept Equals unmodified and added Equivalent().


http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 554:   /// Comparison is done by the contents of the tuple descriptors and 
not the ids.
> I'd prefer to preserve the meaning of these existing functions (IsPrefixOf(
Done


http://gerrit.cloudera.org:8080/#/c/5816/5/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

Line 433:   // List of booleans that indicates which children can be passed 
through
> Remove "List of booleans" since that's redundant
Done


Line 435:   4: required list<bool> pass_through
> is_child_passthrough (good to keep names consistent)
Done


http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

Line 228:     if (this.getNullIndicatorByte() != other.getNullIndicatorByte()) 
return false;
> Also need to compare the NullIndicatorBit().
Done. It actually turns out that that a non-nullable Kudu column does not get a 
null bit. For example in this query the tuple size is 8 bytes (no null bits) 
for both operands:
select kudu_idx from functional_kudu.alltypesagg_idx limit 5 union all select 
count(*) from functional.alltypestiny;

Added it to planner tests.


http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

Line 222: 
> Would be good to keep the name of this function and the BE equivalent the s
Done


Line 223:   public boolean hasEqualPhysicalLayout(SlotDescriptor other) {
> needs brief comment
Done


Line 224:     if (!this.getType().matchesType(other.getType())) return false;
> shouldn't the types be equal()?
Done


Line 226:     if (this.getByteSize() != other.getByteSize()) return false;
> can remove 'this'
Done


http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java:

Line 155:   // List of output expressions of the Union node. This should be the 
same as result
> List of output expressions produced by the union without the ORDER BY porti
Done


Line 156:   // resultExprs_ if the UnionStmt does not have an Order By. 
Otherwise resultExprs_
> Let's avoid referring to specific plan nodes at this stage and instead try 
Done


Line 158:   private List<Expr> unionNodeResultExprs_ = Lists.newArrayList();
> unionResultExprs_
Done


Line 191:     for (Expr e: other.unionNodeResultExprs_) 
unionNodeResultExprs_.add(e.clone());
> use Expr.cloneList()
Done


Line 501:       for (UnionOperand op: operands_) {
> combine with the loop over operands_ in L526
Done


http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1536:     if (ctx_.hasSubplan()) unionNode.disablePassthrough();
> Add a TODO to remove this as part of IMPALA-4179. Otherwise I might forget.
Added a TODO in a different file.


http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

Line 68:   // If false, no child nodes would be passed through in the backend.
> nit: "batches from child nodes" to be a bit clearer
Done


Line 71:   // Indicates which child nodes can be passed through in the backend.
> nit: "batches from child nodes" to be a bit clearer
Done


PS4, Line 176: /*
> /**
Done


Line 177:    * Compute the children for which rows can be forwarded by the 
Union node without being
> It's a little unclear what the input is.
Done


Line 183:     // Pass through is only done for the simple case where the row 
has a single tuple.
> What's the motivation for this? Is it because the union output is always a 
Yes, I think so. Added comment.

Another motivation is that it's very rare for the tuple layout to match exactly 
for all operands if the number of tuples is greater than 1, (for example both 
sides would have to have a join with an identical layout).


Line 266:             analyzer, children_.get(i).getTupleIds(), 
resultExprLists_.get(i)));
> I *think* in principle we may need to also check the nullable tuple IDs, si
Not exactly sure what you mean. We are guaranteed that the layout will be 
identical by line 214 in this file.

Also, I don't think the output tuple of Union is always non-nullable.


Line 299:     if (!passThrough_.isEmpty()) {
> We probably don't want to print this at explain_level MINIMAL.
Done


http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

Line 56:   // List of output expressions of the Union node.
> List of union result exprs of the originating UnionStmt. Used for determini
Done. I think this still needs to be initialized to an empty list. Because the 
union node might not have any children. The other alternative is to set it to 
null, but it would make the code less clean. I moved the initilization to the 
constructor.


Line 69:   protected boolean passThroughEnabled = true;
> Seems clearer to make this isInSubplan_ or something and pass that in the c
Done


Line 72:   protected List<Boolean> passThrough_ = Lists.newArrayList();
> isChildPassthrough (at least something that is consistent across FE/BE and 
Done


Line 76:   protected UnionNode(PlanNodeId id, TupleId tupleId) {
> remove?
We actually use both constructors. This constructor is used for creating a node 
with no children. (only constant expressions).


Line 263:     TupleDescriptor this_tuple_desc = 
analyzer.getDescTbl().getTupleDesc(tupleId_);
> use FE camel-case style
removed


Line 281:     msg.union_node = new TUnionNode(
> add Preconditions check that asserts the correct size of passThrough
Done


Line 299:     if (!passThrough_.isEmpty()) {
> Isn't this always non-empty?
It's empty in the case that this node only has const exprs. Removed the check 
anyways because the for loop takes care of the empty case.


Line 300:       List<String> passThroughNodes = Lists.newArrayList();
> passThroughNodeIds
Done


Line 308:             Joiner.on(", ").join(passThroughNodes) + "\n");
> nit: we usually don't add a spaces after the comma for explain plan stuff
Done


Line 309:       }
> might be nice to produce "all" instead of listing all plan node ids if all 
Done


http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

Line 187: |  pass through nodes: 01, 02
> pass-through-operands:
Done


http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/empty.test
File testdata/workloads/functional-planner/queries/PlannerTest/empty.test:

Line 506: # IMPALA-3586: Verify that Union pass through is disabled in subplans.
> Might be good to add a separate test for this, since this one is kind of we
Done


http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:

Line 3085: |     partitions=4/4 files=4 size=460B
> Add a new Kudu planner test (kudu.test) for:
Done


http://gerrit.cloudera.org:8080/#/c/5816/4/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:

Line 1069: # IMPALA-3586: This query caused an issue because the tuple size of 
the children
> I think this may be something to do with count(*) being non-nullable. Maybe
The bug is already fixed. During development I did not take into account the 
whether a slot is nullable when comparing tuples. This is fixed now.


http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:

Line 1069: # IMPALA-3586: This query caused an issue because the tuple size of 
the children
> No need to describe the failure mode of that specific bug you hit during de
Done. Yes, this should be passthrough.


Line 1082: # IMPALA-3586: Test the case where no nodes are passed though.
> Is this not already covered?
I think it's better to have an explicit test case for this. There are very few 
tests in this file that are similar to this one.


Line 1093: # IMPALA-3586: Test the case where 1 node is passed though, and one 
is not.
> Not already covered?
Same as above. Let me know if you disagree.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to