Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )
Change subject: IMPALA-5976: Remove equivalence class computation in FE ...................................................................... Patch Set 4: (24 comments) http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91 PS3, Line 91: /** > I think If the predicate transfer is the starting point to consider whether Fair point. I was thinking of a definition like this: Slot A has a value transfer to slot B if for all rows containing both slots slot B has the same value as slot A or the tuple containing slot B is NULL. http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@102 PS4, Line 102: * Slot value transfer: Slot value transfers: http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@105 PS4, Line 105: * Its symmetric closure is a equivalence relation. Its equivalence class is called slot What does "Its" refer to here? I think it's easier to state less formally: Each slot is contained in exactly one equivalence class. A slot equivalence class is a set of slots where each pair of slots has a mutual value transfer. Equivalence classes are assigned an arbitrary id to distinguish them from another. http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1527 PS4, Line 1527: // select * from (select A.a, B.b from A left join B on A.a=B.b) v where b is null to drive it home even further use "B.col is null" as the predicate to show that the NULL-checking predicate is unrelated to the slots participating in the equivalence http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1648 PS4, Line 1648: // A map from equivalence class IDs to equivalence classese. The equivalence classes typo: classes http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1813 PS4, Line 1813: * Returns a map of slot equivalence classes on the set of slots in given tuples. the given tuples http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1838 PS4, Line 1838: * propagation. Each mapping assigns every slot in srcSids to slot in destTid which has Each mapping assigns every slot in srcSids to a slot in destTid which has a value transfer from srcSid. http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1989 PS4, Line 1989: + "\n" + tc + "Condensed Graph:\n" + condensedTc); move the first "\n" to previous line http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1998 PS4, Line 1998: // transform equality predicates into a transfer graph remove (pretty clear from function comment) http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@691 PS4, Line 691: interface SlotRefComparator { Move this into SlotRef since it's SlotRef specific? http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@696 PS4, Line 696: * Returns if this expr matches 'that'. Two exprs match if: Returns true if this expr matches 'that' http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@699 PS4, Line 699: * 2. For every pair of corresponding SlotRef, slotRefCmp.matches returns true. For every pair of corresponding SlotRefs, slotRefCmp.matches() returns true. http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@700 PS4, Line 700: * 3. For every pair of corresponding non-SlotRef exprs, localEquals returns true. localEquals() (we generally use () for function names to make it clear we are referring to a function) http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@719 PS4, Line 719: if (fn_ == null && that.fn_ == null) return true; I think we should separate matches() and localEquals() more cleanly. I think localEquals() should be non-abstract and have a default implementation that checks the type and fn_ of this and that. Subclasses can override and call super.localEquals() first. http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@727 PS4, Line 727: * children and fn_. Not checking fn_ seems a little strange. The function semantics would be cleaner if localEquals() compared this and that without children. Following my above suggestion will also avoid the mysterious "return true" implementations of localEquals() in a few places. http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@962 PS4, Line 962: * Return a new list without duplicate exprs (according to cmp). according to matches() using 'cmp' http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@198 PS4, Line 198: * A wrapper around localEquals checking type equality, used for A wrapper around localEquals(). (no need to explain further, the "checking type equality" is more confusing than helpful) http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@399 PS3, Line 399: boolean partitionedByRight = node.getJoinOp() == RIGHT_OUTER_JOIN || > This is the output partitioning of this fragment. If it is a right outer jo Makes sense, please add a comment to explain. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java File fe/src/main/java/org/apache/impala/util/Graph.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@52 PS3, Line 52: */ > Yes, reflexiveTransitiveClosure.dsts() is expensive. But I think this is mo This addresses the performance issue, but it does not address the testing/readability issue. If we expose a generic API like this on every subclass of Graph then this signals to the reader that it is meaningful to call this function on every subclass and therefore we need to test that all variants actually work. We "could" add tests that show SccCondensedGraph.reflexiveTransitiveClosure() works correctly. But that is wasted effort and unnecessary code because we never intend/require reflexiveTransitiveClosure() to be called on an SccCondensedGraph. So the bigger issue to me is the "unnecessary abstraction" which adds cognitive burden to readers. Here's a proposal to solve this issue. * I believe we need reflexiveTransitiveClosure() on WritableGraph for validate() and and RandomAccessibleGraph for condensedReflexiveTransitiveClosure() * How about we make reflexiveTransitiveClosure() a member of RandomAccessibleGraph or we make reflexiveTransitiveClosure() accept a RandomAccessibleGraph or something that makes the call specific RandomAccessibleGraph * To address the validate() use case we can add a simple function that converts a WritableGraph to a RandomAccessibleGraph This way we don't expose unnecessary generality and our existing tests cover all the code. http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java File fe/src/main/java/org/apache/impala/util/Graph.java: http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@91 PS4, Line 91: public int numVertices() { return adjList_.length; } @Override http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@106 PS4, Line 106: // Adjacency list. Each in list the adjacency list is sorted. Don't understand the second sentence http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@134 PS4, Line 134: * An strongly-connected-component(SCC)-condensed graph. Vertices are mapped to their A graph condensed by its strongly-connected components (SCC). Vertices are mapped to their SCCs and an inner graph on the SCCs is stored. http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@152 PS4, Line 152: public int numVertices() { return sccIds_.length; } @Override http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@199 PS4, Line 199: public static SccCondensedGraph condensedReflexiveTransitiveClosure(WritableGraph g) { Very clear now! Thanks. -- To view, visit http://gerrit.cloudera.org:8080/8317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a Gerrit-Change-Number: 8317 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi Wang <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Comment-Date: Tue, 14 Nov 2017 01:26:29 +0000 Gerrit-HasComments: Yes
