Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )
Change subject: IMPALA-5976: Remove equivalence class computation in FE ...................................................................... Patch Set 6: (26 comments) Patch is looking good! http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91 PS3, Line 91: /** > My point is that in "select * from A union B", A.col and B.col doesn't appe Got it, good point! Now I see where your original formulation is coming from. How about this: Slot A has a value transfer to slot B if a predicate on A can be applied to B at some point in the plan. Determining the lowest correct placement of that predicate is subject to the conventional assignment rules. http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1987 PS6, Line 1987: Preconditions.checkState(false, "Condensed transitive closure doesn't equal to " + throw new IllegalStatsException(<msg>) http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1988 PS6, Line 1988: "uncondensed transitive closure. Uncondensed graph:\n" + tc + Graph (capital) http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1989 PS6, Line 1989: "Condensed Graph:\n" + condensedTc); I think we need a \n before "Condensed" as well http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2095 PS6, Line 2095: Collections.sort(result); comment why we are sorting here, also adjust function comment accordingly http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@692 PS6, Line 692: * 1. The tree structures are the same and every pair of corresponding Expr nodes have The tree structures ignoring implicit casts are the same. (we don't explicitly check the return type, that's really part of condition 3 localEquals()) http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@712 PS6, Line 712: return localEquals(that); Move this above the loop to detect mismatches faster? http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@716 PS6, Line 716: * Local eq comparator. It returns whether this expr is equal to 'that' ignoring Returns true if this expr is equal to 'that' ignoring children. http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@406 PS6, Line 406: // For right anti and semi joins the lhs join slots does not appear in the output. Right anti and semi joins do not produce NULLs so should the output partition not be lhsJoinPartition? Like you said, the slots in the rhsJoinPartition are not returned from such joins. http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@413 PS6, Line 413: // Otherwise it's good to use the lhs partition. Otherwise we're good to... http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java File fe/src/main/java/org/apache/impala/util/Graph.java: http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@25 PS6, Line 25: import static java.lang.Math.addExact; remove (not needed and does not compile on Java7) http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@62 PS6, Line 62: /** Return an iterator of vertex IDs with an edge from srcVid. */ remove (comment in super class is sufficient) http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@176 PS6, Line 176: final int[] condensedAdjList = condensed_.adjList_[sccIds_[srcVid]]; members should be private http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@185 PS6, Line 185: adjListPos += 1; ++adjListPos http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@194 PS6, Line 194: memberPos += 1; ++memberPos http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@223 PS6, Line 223: // Step 2: Compute the reflexive transitive closure . O(V(V+E)) extra spaces before O http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/Graph.java@356 PS6, Line 356: condensedAdjList[srcSccId] = new int[bs.cardinality()]; same code is here and in L139, factor our into a helper http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntArrayList.java File fe/src/main/java/org/apache/impala/util/IntArrayList.java: http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntArrayList.java@20 PS6, Line 20: import java.util.Arrays; not needed http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntArrayList.java@21 PS6, Line 21: import java.util.Iterator; not needed http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntArrayList.java@88 PS6, Line 88: int pos_ = 0; private http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntIterator.java File fe/src/main/java/org/apache/impala/util/IntIterator.java: http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/main/java/org/apache/impala/util/IntIterator.java@28 PS6, Line 28: int pos_ = 0; private http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java File fe/src/test/java/org/apache/impala/util/IntArrayListTest.java: http://gerrit.cloudera.org:8080/#/c/8317/6/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java@22 PS6, Line 22: /** Unit test for intArrayList. */ remove (pretty clear from the class name) http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test File testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test: http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1055 PS6, Line 1055: # The last join in this query is the test subject and the first 2 joins are used to get The second part needs a clearer explanation, for example, the first two joins are used to ensure 'a' is made nullable before the fragment executing the full outer join http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1056 PS6, Line 1056: # around a check in refsNullableTupleId(). * I think the existing refsNullableTupleId() is subtly wrong. It should not matter whether an input fragment has made a tuple nullable already. If the tuple goes through an outer join, then the partition compatibility may not hold. * I believe your change in createPartitionedHashJoinFragment() is the correct one and makes the refsNullableTupleId() check obsolete. Hopefully this can help simplify this test. You can also add another test to test the RANDOM partitioning of full outer joins that does not rely on aggregation (e.g. it tests join partition compatibility instead) This query shows the incorrectness of the refsNullableTupeId() check: select /* +straight_join */ t2.id, count(*) from functional.alltypes t1 left outer join /* +shuffle */ functional.alltypessmall t2 on t1.int_col = t2.int_col right outer join /* +shuffle */ functional.alltypestiny t3 on t2.id = t3.id group by t2.id Can you please file a new JIRA to track the refsNullableTupleId() bug? We can chat to see whether it's sensible to fix that in a separate patch or not. http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1057 PS6, Line 1057: select STRAIGHT_JOIN a.int_col, count(*) from functional.alltypes as a inner join use /* +straight_join */ and also annotate the joins with /* +shuffle */ to make sure we get partitioned joins http://gerrit.cloudera.org:8080/#/c/8317/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test@1097 PS6, Line 1097: 04:HASH JOIN [INNER JOIN, PARTITIONED] I understand we need join 05 to make 'a' nullable but what's this join for? -- To view, visit http://gerrit.cloudera.org:8080/8317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a Gerrit-Change-Number: 8317 Gerrit-PatchSet: 6 Gerrit-Owner: Tianyi Wang <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Comment-Date: Fri, 17 Nov 2017 08:06:38 +0000 Gerrit-HasComments: Yes
