Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 4:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91: /**
> I think If the predicate transfer is the starting point to consider whether
Fair point. I was thinking of a definition like this:

Slot A has a value transfer to slot B if for all rows containing both slots 
slot B has the same value as slot A or the tuple containing slot B is NULL.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@102
PS4, Line 102:  * Slot value transfer:
Slot value transfers:


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@105
PS4, Line 105:  * Its symmetric closure is a equivalence relation. Its 
equivalence class is called slot
What does "Its" refer to here?

I think it's easier to state less formally:

Each slot is contained in exactly one equivalence class. A slot equivalence 
class is a set of slots where each pair of slots has a mutual value transfer. 
Equivalence classes are assigned an arbitrary id to distinguish them from 
another.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1527
PS4, Line 1527:       // select * from (select A.a, B.b from A left join B on 
A.a=B.b) v where b is null
to drive it home even further use "B.col is null" as the predicate to show that 
the NULL-checking predicate is unrelated to the slots participating in the 
equivalence


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1648
PS4, Line 1648:     // A map from equivalence class IDs to equivalence 
classese. The equivalence classes
typo: classes


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1813
PS4, Line 1813:    * Returns a map of slot equivalence classes on the set of 
slots in given tuples.
the given tuples


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1838
PS4, Line 1838:    * propagation. Each mapping assigns every slot in srcSids to 
slot in destTid which has
Each mapping assigns every slot in srcSids to a slot in destTid which has a 
value transfer from srcSid.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1989
PS4, Line 1989:             + "\n" + tc + "Condensed Graph:\n" + condensedTc);
move the first "\n" to previous line


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1998
PS4, Line 1998:     // transform equality predicates into a transfer graph
remove (pretty clear from function comment)


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@691
PS4, Line 691:   interface SlotRefComparator {
Move this into SlotRef since it's SlotRef specific?


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@696
PS4, Line 696:    * Returns if this expr matches 'that'. Two exprs match if:
Returns true if this expr matches 'that'


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@699
PS4, Line 699:    * 2. For every pair of corresponding SlotRef, 
slotRefCmp.matches returns true.
For every pair of corresponding SlotRefs, slotRefCmp.matches() returns true.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@700
PS4, Line 700:    * 3. For every pair of corresponding non-SlotRef exprs, 
localEquals returns true.
localEquals()

(we generally use () for function names to make it clear we are referring to a 
function)


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@719
PS4, Line 719:     if (fn_ == null && that.fn_ == null) return true;
I think we should separate matches() and localEquals() more cleanly. I think 
localEquals() should be non-abstract and have a default implementation that 
checks the type and fn_ of this and that. Subclasses can override and call 
super.localEquals() first.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@727
PS4, Line 727:    * children and fn_.
Not checking fn_ seems a little strange. The function semantics would be 
cleaner if localEquals() compared this and that without children. Following my 
above suggestion will also avoid the mysterious "return true" implementations 
of localEquals() in a few places.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@962
PS4, Line 962:    * Return a new list without duplicate exprs (according to 
cmp).
according to matches() using 'cmp'


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@198
PS4, Line 198:    * A wrapper around localEquals checking type equality, used 
for
A wrapper around localEquals().

(no need to explain further, the "checking type equality" is more confusing 
than helpful)


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@399
PS3, Line 399:     boolean partitionedByRight = node.getJoinOp() == 
RIGHT_OUTER_JOIN ||
> This is the output partitioning of this fragment. If it is a right outer jo
Makes sense, please add a comment to explain.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@52
PS3, Line 52:    */
> Yes, reflexiveTransitiveClosure.dsts() is expensive. But I think this is mo
This addresses the performance issue, but it does not address the 
testing/readability issue. If we expose a generic API like this on every 
subclass of Graph then this signals to the reader that it is meaningful to call 
this function on every subclass and therefore we need to test that all variants 
actually work.

We "could" add tests that show SccCondensedGraph.reflexiveTransitiveClosure() 
works correctly. But that is wasted effort and unnecessary code because we 
never intend/require reflexiveTransitiveClosure() to be called on an 
SccCondensedGraph. So the bigger issue to me is the "unnecessary abstraction" 
which adds cognitive burden to readers.

Here's a proposal to solve this issue.
* I believe we need reflexiveTransitiveClosure() on WritableGraph for 
validate() and and RandomAccessibleGraph for 
condensedReflexiveTransitiveClosure()
* How about we make reflexiveTransitiveClosure() a member of 
RandomAccessibleGraph or we make reflexiveTransitiveClosure() accept a 
RandomAccessibleGraph or something that makes the call specific 
RandomAccessibleGraph
* To address the validate() use case we can add a simple function that converts 
a WritableGraph to a RandomAccessibleGraph

This way we don't expose unnecessary generality and our existing tests cover 
all the code.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@91
PS4, Line 91:     public int numVertices() { return adjList_.length; }
@Override


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@106
PS4, Line 106:     // Adjacency list. Each in list the adjacency list is sorted.
Don't understand the second sentence


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@134
PS4, Line 134:    * An strongly-connected-component(SCC)-condensed graph. 
Vertices are mapped to their
A graph condensed by its strongly-connected components (SCC). Vertices are 
mapped to their SCCs and an inner graph on the SCCs is stored.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@152
PS4, Line 152:     public int numVertices() { return sccIds_.length; }
@Override


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@199
PS4, Line 199:     public static SccCondensedGraph 
condensedReflexiveTransitiveClosure(WritableGraph g) {
Very clear now! Thanks.



--
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Comment-Date: Tue, 14 Nov 2017 01:26:29 +0000
Gerrit-HasComments: Yes

Reply via email to