Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10771 )
Change subject: IMPALA-110: Support for multiple DISTINCT ...................................................................... Patch Set 5: (13 comments) I'm a +2 on the backend part. I took a look at the frontend and I think the high-level structure makes sense. The predicate assignment and smaps logic is a bit beyond me but I could invest some time in understanding that better. http://gerrit.cloudera.org:8080/#/c/10771/4/be/src/exec/streaming-aggregation-node.cc File be/src/exec/streaming-aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/10771/4/be/src/exec/streaming-aggregation-node.cc@139 PS4, Line 139: continue; > I'm not sure that its that much clearer, and the logic isn't quite the same Seems ok, thanks for explaining. http://gerrit.cloudera.org:8080/#/c/10771/5/be/src/exec/streaming-aggregation-node.cc File be/src/exec/streaming-aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/10771/5/be/src/exec/streaming-aggregation-node.cc@158 PS5, Line 158: This nit: caps http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@36 PS5, Line 36: * Encapsulates all the information needed to compute a list of aggregate functions with It might be worth explaining how this relates to MultiAggregateInfo - why it is separate (we use a different strategy for generating plans). http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@38 PS5, Line 38: * A mix distinct and non-distinct aggregation functions is allowed as long as all nit: mix of http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@205 PS5, Line 205: throw new AnalysisException( Is this reachable now? Can it be a precondition? I don't see this error message in a test so either it's not reachable or we're missing test coverage. http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java: http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java@33 PS5, Line 33: * Base class for AggregateInfo and AnalyticInfo containing the intermediate and output Update? http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@40 PS5, Line 40: * SELECT block including their distributed execution. Might be worth explaining more directly the relationship to AggregateInfo up the top. http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@366 PS5, Line 366: getTranspseGroupingExprs typo: transpose http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@521 PS5, Line 521: ExprSubstitutionMap result = new ExprSubstitutionMap(lhsExprs, rhsExprs); nit: unnecessary intermediate variable 'result' http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@176 PS5, Line 176: // TODO: If this is the transposition phase, then we can push conjuncts that Is this an important optimisation? http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@277 PS5, Line 277: MURMUR_HASH Tianyi decided to use FastHash for exchanges over murmur because it has slightly better properties (I can find the JIRA if needed). I wonder if we should also be using FastHash here instead. It probably doesn't make a difference in practice I guess so unclear if it's worth the effort, but it seems like the better thing to do. http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@453 PS5, Line 453: // TODO: Determine node-level spillable buffer size. Does it even make sense? Probably not if we're making the decision at the aggregator level. http://gerrit.cloudera.org:8080/#/c/10771/5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@519 PS5, Line 519: ResourceProfile result = new ResourceProfileBuilder() nit: Unnecessary intermediate var. -- To view, visit http://gerrit.cloudera.org:8080/10771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4 Gerrit-Change-Number: 10771 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Marshall <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Thomas Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Fri, 17 Aug 2018 20:31:12 +0000 Gerrit-HasComments: Yes
