Tim Armstrong has posted comments on this change. Change subject: IMPALA-3643/IMPALA-5344: Fix FE tests on Java 8 ......................................................................
Patch Set 5: (6 comments) I had a few high-level comments: * A lot of this depends on order being preserved across multiple classes. If one class depends on another class returning things in a particular order, we should document that as part of the interface (either in a comment or via the types) * In a lot of cases, if it's important that a particular data structure preserves insertion order, we should declare the member or return value as a LinkedHash* instead of just a Hash*. Otherwise it's all very mysterious. * I think in many cases, instead of trying to preserve an arbitrary order across multiple processing steps, we should instead put the elements into a canonical order at the point where we iterate over them. It's hard to understand why things work otherwise. http://gerrit.cloudera.org:8080/#/c/7073/5/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: Line 443: Map<String, List<PrivilegeRequest>> tablePrivReqs = Maps.newLinkedHashMap(); Mention that the LinkedHashMap is used to preserve the order. I also wonder if TreeMap would be better - that way the order would be alphabetical rather than arbitrary. http://gerrit.cloudera.org:8080/#/c/7073/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: Line 203: public final Map<ExprId, Expr> conjuncts = Maps.newLinkedHashMap(); The changes in here all feel a bit arbitrary. Do we understand why it's necessary to change all of these? Line 1811: private Map<EquivalenceClassId, List<SlotId>> getEquivClasses(List<TupleId> tids) { I don't understand why the order should matter here. It looks like there is some interaction between this, DisjointSet, and createEquivConjuncts() but it's not clear what the contract between the components is with regards to ordering. http://gerrit.cloudera.org:8080/#/c/7073/5/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java File fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java: It's really hard to understand why the ordering is necessary in most of these places or what the contracts are between the different data structures. Is there a way we can just enforce the ordering on output, rather than trying to maintain an arbitrary order the whole way through. http://gerrit.cloudera.org:8080/#/c/7073/5/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: Line 187: Maps.newLinkedHashMap(innerStmt.getTblProperties()); It was not obvious the me that getTblProperties() had any specific order. I had to look at the cup file to figure that out! http://gerrit.cloudera.org:8080/#/c/7073/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: Line 437: public Set<RuntimeFilter> getRuntimeFilters() { I think it would be less brittle to just return a list of filters sorted by filterId here. That also will make it less likely that we will unintentionally change the result if we change the algorithms in here. -- To view, visit http://gerrit.cloudera.org:8080/7073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iad8e1dccec3a51293a109c420bd2b88b9d1e0625 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
