Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10771 )

Change subject: IMPALA-110: Support for multiple DISTINCT
......................................................................


Patch Set 10:

(2 comments)

Thanks for applying the changes!

http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@53
PS10, Line 53: contains the phase 2 aggregate functions and
             :  *     grouping exprs
> No, those will be in aggInfo.secondPhaseDistinctAggInfo.mergeAggInfo
As far as I understood mergeAggInfo is only relevant in distributed plans when 
we need to merge the results of several backends.

But, in phase second (not merge) we still need to merge the non-distinct 
aggregates, because they are grouped too fine grained (GROUP BY exprs + 
DISTINCT exprs, while they only need to be grouped on GROUP BY exprs).

The following query and plan seems to support this:

 set num_nodes = 1;
 explain select count(date_string_col), avg(distinct month) from 
functional.alltypes group by year;

 +------------------------------------------------------------+
 | Explain String                                             |
 +------------------------------------------------------------+
 | Max Per-Host Resource Reservation: Memory=3.88MB Threads=2 |
 | Per-Host Resource Estimates: Memory=74MB                   |
 | Codegen disabled by planner                                |
 |                                                            |
 | PLAN-ROOT SINK                                             |
 | |                                                          |
 | 02:AGGREGATE [FINALIZE]                                    |
 | |  output: avg(month), count:merge(date_string_col)        |
 | |  group by: year                                          |
 | |                                                          |
 | 01:AGGREGATE                                               |
 | |  output: count(date_string_col)                          |
 | |  group by: year, month                                   |
 | |                                                          |
 | 00:SCAN HDFS [functional.alltypes]                         |
 |    partitions=24/24 files=24 size=478.45KB                 |
 +------------------------------------------------------------+


I also attached to the impalad via a debugger too see what's in the aggInfo's 
aggregateExprs_ in the different phases. They align with that we see in the 
plan.


http://gerrit.cloudera.org:8080/#/c/10771/11/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/10771/11/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@56
PS11, Line 56:
nit: plus



--
To view, visit http://gerrit.cloudera.org:8080/10771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4
Gerrit-Change-Number: 10771
Gerrit-PatchSet: 10
Gerrit-Owner: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 10 Sep 2018 15:10:45 +0000
Gerrit-HasComments: Yes

Reply via email to