Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10771 )
Change subject: IMPALA-110: Support for multiple DISTINCT ...................................................................... Patch Set 10: (2 comments) Thanks for applying the changes! http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@53 PS10, Line 53: contains the phase 2 aggregate functions and : * grouping exprs > No, those will be in aggInfo.secondPhaseDistinctAggInfo.mergeAggInfo As far as I understood mergeAggInfo is only relevant in distributed plans when we need to merge the results of several backends. But, in phase second (not merge) we still need to merge the non-distinct aggregates, because they are grouped too fine grained (GROUP BY exprs + DISTINCT exprs, while they only need to be grouped on GROUP BY exprs). The following query and plan seems to support this: set num_nodes = 1; explain select count(date_string_col), avg(distinct month) from functional.alltypes group by year; +------------------------------------------------------------+ | Explain String | +------------------------------------------------------------+ | Max Per-Host Resource Reservation: Memory=3.88MB Threads=2 | | Per-Host Resource Estimates: Memory=74MB | | Codegen disabled by planner | | | | PLAN-ROOT SINK | | | | | 02:AGGREGATE [FINALIZE] | | | output: avg(month), count:merge(date_string_col) | | | group by: year | | | | | 01:AGGREGATE | | | output: count(date_string_col) | | | group by: year, month | | | | | 00:SCAN HDFS [functional.alltypes] | | partitions=24/24 files=24 size=478.45KB | +------------------------------------------------------------+ I also attached to the impalad via a debugger too see what's in the aggInfo's aggregateExprs_ in the different phases. They align with that we see in the plan. http://gerrit.cloudera.org:8080/#/c/10771/11/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/10771/11/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@56 PS11, Line 56: nit: plus -- To view, visit http://gerrit.cloudera.org:8080/10771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4 Gerrit-Change-Number: 10771 Gerrit-PatchSet: 10 Gerrit-Owner: Thomas Marshall <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Thomas Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 10 Sep 2018 15:10:45 +0000 Gerrit-HasComments: Yes
