Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )
Change subject: IMPALA-5976: Remove equivalence class computation in FE ...................................................................... Patch Set 4: (62 comments) http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@281 PS3, Line 281: > Let's please avoid code changes unrelated to the purpose of this patch as m Done. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91 PS3, Line 91: /** > We need a precise definition. The original definition was more precise. I think If the predicate transfer is the starting point to consider whether there is a value transfer between slots, it should be defined as so. The original sentence "is computed based on..." is not really a definition to me. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@93 PS3, Line 93: * > Not sure this part adds value. What's the significance of these statements Removed. I intended to use these statements to show that the symmetric closure is an equivalence relation. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@277 PS3, Line 277: // Tracks all warnings (e.g. non-fatal errors) that were generated during analysis. > The SCC-condensed graph representation of all slot value transfers. Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@278 PS3, Line 278: // These are passed to the backend and eventually propagated to the shell. Maps from > public/private? Done. private. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1146 PS3, Line 1146: conjunctIds.add(e.getId()); > a mutual value transfer Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1467 PS3, Line 1467: List<TupleId> tids = Lists.newArrayList(); > how about: if every source slot has a value transfer to a slot in destTid Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1513 PS3, Line 1513: boolean hasOuterJoinedTuple = false; > Let's simplify the example and make it as clear as possible: Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1618 PS3, Line 1618: // check if we already created this predicate > Need a definition of equivalence class in the Analyzer class comment. You d I think they are the same. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1636 PS3, Line 1636: * equivalent slot sets of lhsTids and rhsTids. > Garbled sentence, please clean up Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1956 PS3, Line 1956: public boolean isVisible(TupleId tid) { > the registered value transfers Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1973 PS3, Line 1973: new WritableGraph(globalState_.descTbl.getMaxSlotId().asInt() + 1); > missing space in string Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1982 PS3, Line 1982: RandomAccessibleGraph reference = > Add value-transfer edges to 'g' based on the registered equi-join conjuncts Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2059 PS3, Line 2059: if (tblRef.getJoinOp() == JoinOperator.LEFT_OUTER_JOIN > of the given slot id Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2062 PS3, Line 2062: g.addEdge(outerSlot.asInt(), innerSlot.asInt()); > slot -> sid Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2066 PS3, Line 2066: } > Unfortunate that we have to do this. Should probably clean this up later by Ack http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2073 PS3, Line 2073: * Time complexity: O(V) where V = number of slots > Simpler to avoid "image set" and just use your second sentence to describe. Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2087 PS3, Line 2087: * Time complexity: O(V) where V = number of slots > Second sentence should be part of the equivalence class definition in the A Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2089 PS3, Line 2089: public List<SlotId> getValueTransferTargets(SlotId srcSid) { > We generally avoid these @param for brevity. I concede they might make sens Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@136 PS3, Line 136: public boolean localEquals(Expr that) { return true; } > Missing implementation? localEquals only checks members defined in this subclass. ArithmeticExprs' node-local information is fn_, which is defined and compared in Expr. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@687 PS3, Line 687: return Joiner.on(", ").join(strings); > * Confusing name because we have a BinaryPredicate Expr and it's not clear Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@700 PS3, Line 700: * 3. For every pair of corresponding non-SlotRef exprs, localEquals returns true. > I think similar() is confusing, how about matches() or equivalent() Done. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@720 PS3, Line 720: if (fn_ == null || that.fn_ == null) return false; // One null, one not > It returns whether this expr is equal to that ignoring children. Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@721 PS3, Line 721: // Both fn_'s are not null > Fix comment: Only one expr given to this function Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@723 PS3, Line 723: } > protected? Doesn't seem like a good public interface to force callers to ch Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@728 PS3, Line 728: * Caller should guarantee that 'this' and 'that' have the same type. > public? Move to the other predicates above. Also LOCAL_EQ_PREDICATE Changed to SlotrefComparator and moved into SlotRef. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@784 PS3, Line 784: if (id_ == null) { > update comment, should this function live in Analyzer instead? Comment is updated and it's moved to Analyzer. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java File fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@121 PS3, Line 121: if (v == 0 && 1.0 / v == Double.NEGATIVE_INFINITY) return false; > Let's keep this patch focused on minimize unrelated changes like this one. Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@182 PS3, Line 182: if (type_ == null) { > Let's keep this patch focused on minimize unrelated changes like this one. Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java File fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@522 PS3, Line 522: public boolean isCompatible(AnalyticExpr analyticExpr) { > Please revert. This reformatted condition is pretty much incomprehensible t Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@399 PS3, Line 399: boolean partitionedByRight = node.getJoinOp() == RIGHT_OUTER_JOIN || > I don't understand this change. What does the join op have to do with how t This is the output partitioning of this fragment. If it is a right outer join then the output is partitioned by the rhs join exprs, not lhs because the nulls in lhs can be in any partition. The partitioning compatibility check doesn't use equivalence class anymore and is wrong without this. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1377 PS3, Line 1377: TableRef rhsTblRef = analyzer.getTableRef(rhsId); > simplify with analyzer.getTupleId(lhsSid) Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java File fe/src/main/java/org/apache/impala/util/Graph.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@31 PS3, Line 31: > The flow of calls/code is somewhat hard to follow for our specific use case Specialized most of them except for reflexiveTransitiveClosure(). It is called with different types for validation purpose. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@40 PS3, Line 40: sb.append(i).append(" => "); > mention that BFS is implemented iteratively to avoid unbounded stack usage Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@43 PS3, Line 43: } > This can be called on any type of Graph but we don't test that it actually It is called on RandomAccessibleGraph and WritableGraph. The former is called in condensedReflexiveTransitiveClosure and the latter is called in Analyzer.computeValueTransferGraph to generate a reference for graph validation. Maybe use a precondition check to exclude SccCondensedGraph? http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@49 PS3, Line 49: /** > How about moving this out of the loop and preallocating it to numVertices() Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@52 PS3, Line 52: */ > This line in particular is confusing because dsts() is abstract and it's no Yes, reflexiveTransitiveClosure.dsts() is expensive. But I think this is more of a problem with the dsts interface. dsts() is not a good API because: 1. SccCondensedGraph.dsts() constructs a new array every time it's called. 2. The return value of SccCondensedGraph.dsts is owned by the caller. On the other hand the return value of RandomAccessibleGraph.dsts() and WritableGraph.dsts() is read only. There isn't a good way to enforce the read-only property. 3. RandomAccessibleGraph just needs an int[] rather than an IntArrayList. But dsts() forces it to use the latter. I changed dsts() interface to returning an iterator. In this way SccCondensedGraph.dsts() doesn't need to "decompress" the result and the caller cannot modify the data in Graphs. Does this solve this problem? http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@53 PS3, Line 53: public RandomAccessibleGraph reflexiveTransitiveClosure() { > style: ++j Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@98 PS3, Line 98: public void addEdge(int srcVid, int dstVid) { > Duplicate edges are allowed. Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@128 PS3, Line 128: int idx = Arrays.binarySearch(adjList_[srcVid], dstVid); > lists I just checked Wikipedia and a lecture note. An adjacency list is a list of lists. According to this, "sorted adjacency list" might be vague. I added some comments there. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@132 PS3, Line 132: > Construct from a list of sorted adjacency lists. Reworded a little. But an adjacency list is a list of lists. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@150 PS3, Line 150: } > for the typical -> because the typical Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@210 PS3, Line 210: * Get the ID of the strongly connected component containing 'vid'. > What types of input Graphs are really supported/expected? This function is merged into condensedReflexiveTransitiveClosure(). See below. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@222 PS3, Line 222: /** > This takes a Graph as input. Does it really work on any type of Graph? I do Changed to WritableGraph. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@223 PS3, Line 223: * Get an array of vertices IDs in the scc. The caller shouldn't modify the returned > Flow here is confusing because we create several intermediate SccCondensedG Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@257 PS3, Line 257: // SCC as vid. > Imo this doesn't really give the intuition for why the algorithm works. Changed to a link. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@258 PS3, Line 258: int unAssignedStackPos; > lowest -> smallest? Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@268 PS3, Line 268: int nextDfsIndex = 0; > What kind of graph is expected here? WritableGraph. Done. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@271 PS3, Line 271: u > vertex ids Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@273 PS3, Line 273: DfsContext ctx = dfsStack.peek(); > A mapping from vertex id to its DFS preordering index. -1 means not visited Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@282 PS3, Line 282: // All successors have been searched. Check if this is the root of an SCC. > * final Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@299 PS3, Line 299: // Tree edge. DFS this successor. ctx.dstIt is not advanced until the DFS > int nextDfsIndex = 0; Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@313 PS3, Line 313: ctx.dstIt.next(); > typo: been Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@359 PS3, Line 359: while (refIt.hasNext() || condensedIt.hasNext()) { > What kind of graph is expected here? WritableGraph. Done. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java File fe/src/main/java/org/apache/impala/util/IntArrayList.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@26 PS3, Line 26: public class IntArrayList { > private, you can add a data() getter Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@29 PS3, Line 29: > public Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@33 PS3, Line 33: data_ = new int[capacity]; > public Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@44 PS3, Line 44: int[] newData = new int[Math.max(data_.length * 2, capacity)]; > style: remove spaces after and before paranthesis Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@58 PS3, Line 58: /** > remove "some" Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@67 PS3, Line 67: > single line where it fits; apply everywhere in this file Done http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@115 PS3, Line 115: > Do we really need this? If not, please remove Removed. It was used to enable Collections.sort() in Graph.validate(). http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java File fe/src/test/java/org/apache/impala/util/IntArrayListTest.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java@49 PS3, Line 49: fail(); > We should be sure to test all public functionality. As far as I can tell th Done -- To view, visit http://gerrit.cloudera.org:8080/8317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a Gerrit-Change-Number: 8317 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi Wang <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Comment-Date: Fri, 03 Nov 2017 01:36:08 +0000 Gerrit-HasComments: Yes
