Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 4:

(62 comments)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@281
PS3, Line 281:
> Let's please avoid code changes unrelated to the purpose of this patch as m
Done.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91: /**
> We need a precise definition. The original definition was more precise.
I think If the predicate transfer is the starting point to consider whether 
there is a value transfer between slots, it should be defined as so. The 
original sentence "is computed based on..." is not really a definition to me.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@93
PS3, Line 93:  *
> Not sure this part adds value. What's the significance of these statements
Removed. I intended to use these statements to show that the symmetric closure 
is an equivalence relation.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@277
PS3, Line 277:     // Tracks all warnings (e.g. non-fatal errors) that were 
generated during analysis.
> The SCC-condensed graph representation of all slot value transfers.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@278
PS3, Line 278:     // These are passed to the backend and eventually propagated 
to the shell. Maps from
> public/private?
Done. private.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1146
PS3, Line 1146:           conjunctIds.add(e.getId());
> a mutual value transfer
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1467
PS3, Line 1467:     List<TupleId> tids = Lists.newArrayList();
> how about: if every source slot has a value transfer to a slot in destTid
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1513
PS3, Line 1513:       boolean hasOuterJoinedTuple = false;
> Let's simplify the example and make it as clear as possible:
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1618
PS3, Line 1618:         // check if we already created this predicate
> Need a definition of equivalence class in the Analyzer class comment. You d
I think they are the same.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1636
PS3, Line 1636:    * equivalent slot sets of lhsTids and rhsTids.
> Garbled sentence, please clean up
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1956
PS3, Line 1956:   public boolean isVisible(TupleId tid) {
> the registered value transfers
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1973
PS3, Line 1973:         new 
WritableGraph(globalState_.descTbl.getMaxSlotId().asInt() + 1);
> missing space in string
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1982
PS3, Line 1982:       RandomAccessibleGraph reference =
> Add value-transfer edges to 'g' based on the registered equi-join conjuncts
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2059
PS3, Line 2059:       if (tblRef.getJoinOp() == JoinOperator.LEFT_OUTER_JOIN
> of the given slot id
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2062
PS3, Line 2062:         g.addEdge(outerSlot.asInt(), innerSlot.asInt());
> slot -> sid
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2066
PS3, Line 2066:       }
> Unfortunate that we have to do this. Should probably clean this up later by
Ack


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2073
PS3, Line 2073:    * Time complexity: O(V) where V = number of slots
> Simpler to avoid "image set" and just use your second sentence to describe.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2087
PS3, Line 2087:    * Time complexity: O(V) where V = number of slots
> Second sentence should be part of the equivalence class definition in the A
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2089
PS3, Line 2089:   public List<SlotId> getValueTransferTargets(SlotId srcSid) {
> We generally avoid these @param for brevity. I concede they might make sens
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@136
PS3, Line 136:   public boolean localEquals(Expr that) { return true; }
> Missing implementation?
localEquals only checks members defined in this subclass. ArithmeticExprs' 
node-local information is fn_, which is defined and compared in Expr.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@687
PS3, Line 687:     return Joiner.on(", ").join(strings);
> * Confusing name because we have a BinaryPredicate Expr and it's not clear
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@700
PS3, Line 700:    * 3. For every pair of corresponding non-SlotRef exprs, 
localEquals returns true.
> I think similar() is confusing, how about matches() or equivalent()
Done.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@720
PS3, Line 720:     if (fn_ == null || that.fn_ == null) return false; // One 
null, one not
> It returns whether this expr is equal to that ignoring children.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@721
PS3, Line 721:     // Both fn_'s are not null
> Fix comment: Only one expr given to this function
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@723
PS3, Line 723:   }
> protected? Doesn't seem like a good public interface to force callers to ch
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@728
PS3, Line 728:    * Caller should guarantee that 'this' and 'that' have the 
same type.
> public? Move to the other predicates above. Also LOCAL_EQ_PREDICATE
Changed to SlotrefComparator and moved into SlotRef.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@784
PS3, Line 784:     if (id_ == null) {
> update comment, should this function live in Analyzer instead?
Comment is updated and it's moved to Analyzer.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
File fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@121
PS3, Line 121:     if (v == 0 && 1.0 / v == Double.NEGATIVE_INFINITY) return 
false;
> Let's keep this patch focused on minimize unrelated changes like this one.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@182
PS3, Line 182:       if (type_ == null) {
> Let's keep this patch focused on minimize unrelated changes like this one.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
File fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@522
PS3, Line 522:     public boolean isCompatible(AnalyticExpr analyticExpr) {
> Please revert. This reformatted condition is pretty much incomprehensible t
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@399
PS3, Line 399:     boolean partitionedByRight = node.getJoinOp() == 
RIGHT_OUTER_JOIN ||
> I don't understand this change. What does the join op have to do with how t
This is the output partitioning of this fragment. If it is a right outer join 
then the output is partitioned by the rhs join exprs, not lhs because the nulls 
in lhs can be in any partition. The partitioning compatibility check doesn't 
use equivalence class anymore and is wrong without this.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1377
PS3, Line 1377:       TableRef rhsTblRef = analyzer.getTableRef(rhsId);
> simplify with analyzer.getTupleId(lhsSid)
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@31
PS3, Line 31:
> The flow of calls/code is somewhat hard to follow for our specific use case
Specialized most of them except for reflexiveTransitiveClosure(). It is called 
with different types for validation purpose.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@40
PS3, Line 40:       sb.append(i).append(" => ");
> mention that BFS is implemented iteratively to avoid unbounded stack usage
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@43
PS3, Line 43:       }
> This can be called on any type of Graph but we don't test that it actually
It is called on RandomAccessibleGraph and WritableGraph. The former is called 
in condensedReflexiveTransitiveClosure and the latter is called in 
Analyzer.computeValueTransferGraph to generate a reference for graph 
validation. Maybe use a precondition check to exclude SccCondensedGraph?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@49
PS3, Line 49:   /**
> How about moving this out of the loop and preallocating it to numVertices()
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@52
PS3, Line 52:    */
> This line in particular is confusing because dsts() is abstract and it's no
Yes, reflexiveTransitiveClosure.dsts() is expensive. But I think this is more 
of a problem with the dsts interface. dsts() is not a good API because:
1. SccCondensedGraph.dsts() constructs a new array every time it's called.
2. The return value of SccCondensedGraph.dsts is owned by the caller. On the 
other hand the return value of RandomAccessibleGraph.dsts() and 
WritableGraph.dsts() is read only. There isn't a good way to enforce the 
read-only property.
3. RandomAccessibleGraph just needs an int[] rather than an IntArrayList. But 
dsts() forces it to use the latter.

I changed dsts() interface to returning an iterator. In this way 
SccCondensedGraph.dsts() doesn't need to "decompress" the result and the caller 
cannot modify the data in Graphs. Does this solve this problem?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@53
PS3, Line 53:   public RandomAccessibleGraph reflexiveTransitiveClosure() {
> style: ++j
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@98
PS3, Line 98:     public void addEdge(int srcVid, int dstVid) {
> Duplicate edges are allowed.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@128
PS3, Line 128:       int idx = Arrays.binarySearch(adjList_[srcVid], dstVid);
> lists
I just checked Wikipedia and a lecture note. An adjacency list is a list of 
lists. According to this, "sorted adjacency list" might be vague. I added some 
comments there.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@132
PS3, Line 132:
> Construct from a list of sorted adjacency lists.
Reworded a little. But an adjacency list is a list of lists.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@150
PS3, Line 150:     }
> for the typical -> because the typical
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@210
PS3, Line 210:      * Get the ID of the strongly connected component containing 
'vid'.
> What types of input Graphs are really supported/expected?
This function is merged into condensedReflexiveTransitiveClosure(). See below.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@222
PS3, Line 222:     /**
> This takes a Graph as input. Does it really work on any type of Graph? I do
Changed to WritableGraph.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@223
PS3, Line 223:      * Get an array of vertices IDs in the scc. The caller 
shouldn't modify the returned
> Flow here is confusing because we create several intermediate SccCondensedG
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@257
PS3, Line 257:         // SCC as vid.
> Imo this doesn't really give the intuition for why the algorithm works.
Changed to a link.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@258
PS3, Line 258:         int unAssignedStackPos;
> lowest -> smallest?
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@268
PS3, Line 268:       int nextDfsIndex = 0;
> What kind of graph is expected here?
WritableGraph. Done.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@271
PS3, Line 271: u
> vertex ids
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@273
PS3, Line 273:           DfsContext ctx = dfsStack.peek();
> A mapping from vertex id to its DFS preordering index. -1 means not visited
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@282
PS3, Line 282:             // All successors have been searched. Check if this 
is the root of an SCC.
> * final
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@299
PS3, Line 299:               // Tree edge. DFS this successor. ctx.dstIt is not 
advanced until the DFS
> int nextDfsIndex = 0;
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@313
PS3, Line 313:               ctx.dstIt.next();
> typo: been
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@359
PS3, Line 359:         while (refIt.hasNext() || condensedIt.hasNext()) {
> What kind of graph is expected here?
WritableGraph. Done.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java
File fe/src/main/java/org/apache/impala/util/IntArrayList.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@26
PS3, Line 26: public class IntArrayList {
> private, you can add a data() getter
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@29
PS3, Line 29:
> public
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@33
PS3, Line 33:     data_ = new int[capacity];
> public
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@44
PS3, Line 44:       int[] newData = new int[Math.max(data_.length * 2, 
capacity)];
> style: remove spaces after and before paranthesis
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@58
PS3, Line 58:   /**
> remove "some"
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@67
PS3, Line 67:
> single line where it fits; apply everywhere in this file
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@115
PS3, Line 115:
> Do we really need this? If not, please remove
Removed. It was used to enable Collections.sort() in Graph.validate().


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
File fe/src/test/java/org/apache/impala/util/IntArrayListTest.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java@49
PS3, Line 49:       fail();
> We should be sure to test all public functionality. As far as I can tell th
Done



--
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Comment-Date: Fri, 03 Nov 2017 01:36:08 +0000
Gerrit-HasComments: Yes

Reply via email to