Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 6:

(26 comments)

Patch is looking good!

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91: /**
> My point is that in "select * from A union B", A.col and B.col doesn't appe
Got it, good point! Now I see where your original formulation is coming from. 
How about this:

Slot A has a value transfer to slot B if a predicate on A can be applied to B 
at some point in the plan. Determining the lowest correct placement of that 
predicate is subject to the conventional assignment rules.


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1987
PS6, Line 1987:         Preconditions.checkState(false, "Condensed transitive 
closure doesn't equal to " +
throw new IllegalStatsException(<msg>)


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1988
PS6, Line 1988:             "uncondensed transitive closure. Uncondensed 
graph:\n" + tc +
Graph (capital)


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1989
PS6, Line 1989:             "Condensed Graph:\n" + condensedTc);
I think we need a \n before "Condensed" as well


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2095
PS6, Line 2095:     Collections.sort(result);
comment why we are sorting here, also adjust function comment accordingly


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@692
PS6, Line 692:    * 1. The tree structures are the same and every pair of 
corresponding Expr nodes have
The tree structures ignoring implicit casts are the same.

(we don't explicitly check the return type, that's really part of condition 3 
localEquals())


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@712
PS6, Line 712:     return localEquals(that);
Move this above the loop to detect mismatches faster?


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@716
PS6, Line 716:    * Local eq comparator. It returns whether this expr is equal 
to 'that' ignoring
Returns true if this expr is equal to 'that' ignoring children.


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@406
PS6, Line 406:       // For right anti and semi joins the lhs join slots does 
not appear in the output.
Right anti and semi joins do not produce NULLs so should the output partition 
not be lhsJoinPartition? Like you said, the slots in the rhsJoinPartition are 
not returned from such joins.


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@413
PS6, Line 413:       // Otherwise it's good to use the lhs partition.
Otherwise we're good to...


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@25
PS6, Line 25: import static java.lang.Math.addExact;
remove (not needed and does not compile on Java7)


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@62
PS6, Line 62:     /** Return an iterator of vertex IDs with an edge from 
srcVid. */
remove (comment in super class is sufficient)


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@176
PS6, Line 176:         final int[] condensedAdjList = 
condensed_.adjList_[sccIds_[srcVid]];
members should be private


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@185
PS6, Line 185:             adjListPos += 1;
++adjListPos


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@194
PS6, Line 194:           memberPos += 1;
++memberPos


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@223
PS6, Line 223:       // Step 2: Compute the reflexive transitive closure  . 
O(V(V+E))
extra spaces before O


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@356
PS6, Line 356:         condensedAdjList[srcSccId] = new int[bs.cardinality()];
same code is here and in L139, factor our into a helper


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntArrayList.java
File fe/src/main/java/org/apache/impala/util/IntArrayList.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntArrayList.java@20
PS6, Line 20: import java.util.Arrays;
not needed


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntArrayList.java@21
PS6, Line 21: import java.util.Iterator;
not needed


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntArrayList.java@88
PS6, Line 88:       int pos_ = 0;
private


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntIterator.java
File fe/src/main/java/org/apache/impala/util/IntIterator.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntIterator.java@28
PS6, Line 28:       int pos_ = 0;
private


http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
File fe/src/test/java/org/apache/impala/util/IntArrayListTest.java:

http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java@22
PS6, Line 22: /** Unit test for intArrayList. */
remove (pretty clear from the class name)


http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
File testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test:

http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1055
PS6, Line 1055: # The last join in this query is the test subject and the first 
2 joins are used to get
The second part needs a clearer explanation, for example,

the first two joins are used to ensure 'a' is made nullable before the fragment 
executing the full outer join


http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1056
PS6, Line 1056: # around a check in refsNullableTupleId().
* I think the existing refsNullableTupleId() is subtly wrong. It should not 
matter whether an input fragment has made a tuple nullable already. If the 
tuple goes through an outer join, then the partition compatibility may not hold.
* I believe your change in createPartitionedHashJoinFragment() is the correct 
one and makes the refsNullableTupleId() check obsolete.

Hopefully this can help simplify this test.

You can also add another test to test the RANDOM partitioning of full outer 
joins that does not rely on aggregation (e.g. it tests join partition 
compatibility instead)


This query shows the incorrectness of the refsNullableTupeId() check:

select /* +straight_join */ t2.id, count(*)
from functional.alltypes t1
left outer join /* +shuffle */ functional.alltypessmall t2
  on t1.int_col = t2.int_col
right outer join /* +shuffle */ functional.alltypestiny t3
  on t2.id = t3.id
group by t2.id

Can you please file a new JIRA to track the refsNullableTupleId() bug? We can 
chat to see whether it's sensible to fix that in a separate patch or not.


http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1057
PS6, Line 1057: select STRAIGHT_JOIN a.int_col, count(*) from 
functional.alltypes as a inner join
use /* +straight_join */ and also annotate the joins with /* +shuffle */ to 
make sure we get partitioned joins


http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1097
PS6, Line 1097: 04:HASH JOIN [INNER JOIN, PARTITIONED]
I understand we need join 05 to make 'a' nullable but what's this join for?



--
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Comment-Date: Fri, 17 Nov 2017 08:06:38 +0000
Gerrit-HasComments: Yes

Reply via email to