jaltekruse opened a new pull request #27224: [SPARK-30523][SQL] - Collapse nested aggregates
URL: https://github.com/apache/spark/pull/27224
   ### What changes were proposed in this pull request?
   Combines two adjacent Aggregate operators into one, if the first one is not 
   If the outer aggregate references the outputs of aggregate functions in the inner
   aggregate, check whether they are used in the outer aggregate in a way that can be
   collapsed into a single aggregate. A sum of sums, a max of max, or a min of min are all
   collapsible. An avg over avg is not collapsible, because different numbers of raw rows
   will have contributed to the partial averages of the inner aggregate.
   Min and Max can be folded in the case described above, or if they are applied to
   the group by columns from the inner aggregate, as they can safely be computed just
   using the set of unique values.
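
As a small illustration of the rules above, here is a sketch in plain Python rather than code from the patch. The rows and the `earnings` column are made up for the example. It groups the rows the way the inner aggregate would, then compares re-aggregating the partial results against aggregating the raw rows directly. All rows share one year, so the outer aggregate forms a single group.

```python
from collections import defaultdict

# Hypothetical raw rows: (course, year, earnings). Not data from the PR.
rows = [
    ("dotNET", 2012, 10.0),
    ("dotNET", 2012, 20.0),
    ("dotNET", 2012, 30.0),
    ("Java", 2012, 100.0),
]

# Inner aggregate: GROUP BY course, year.
groups = defaultdict(list)
for course, year, earnings in rows:
    groups[(course, year)].append(earnings)

values = [earnings for _, _, earnings in rows]
partial_sums = [sum(g) for g in groups.values()]
partial_maxes = [max(g) for g in groups.values()]
partial_mins = [min(g) for g in groups.values()]
partial_avgs = [sum(g) / len(g) for g in groups.values()]

# Sum of sums, max of max and min of min match a single pass over the raw rows.
sum_matches = sum(partial_sums) == sum(values)
max_matches = max(partial_maxes) == max(values)
min_matches = min(partial_mins) == min(values)

# Avg of avg does not: the dotNET group has three rows and the Java group one,
# yet both partial averages get equal weight in the outer average.
avg_of_avgs = sum(partial_avgs) / len(partial_avgs)
true_avg = sum(values) / len(values)

# Min (or max) over a group-by column only needs the distinct key values.
distinct_courses = {course for course, _ in groups}
min_key_matches = min(distinct_courses) == min(course for course, _, _ in rows)

print(sum_matches, max_matches, min_matches, min_key_matches)
print(avg_of_avgs, true_avg)
```

With these rows the averages come out as 60.0 for the avg of avgs and 40.0 for the avg over the raw rows, which is why that case cannot be collapsed.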
   SELECT sum(sumAgg) AS a, year FROM (
        SELECT sum(1) AS sumAgg, course, year FROM courseSales GROUP BY course, year
   ) GROUP BY year
   // can be collapsed to
   SELECT sum(1) AS `a`, year FROM courseSales GROUP BY year

   SELECT sum(agg), min(a), b FROM (
        SELECT sum(1) AS agg, a, b FROM testData2 GROUP BY a, b
   ) GROUP BY b
   // can be collapsed to
   SELECT sum(1) AS `sum(agg)`, min(a) AS `min(a)`, b FROM testData2 GROUP BY b
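
The equivalence claimed in the second example can be checked on a small table. The sketch below uses Python's built-in `sqlite3` module rather than Spark, so it only confirms that the nested and collapsed queries return the same rows. It does not exercise the optimizer rule from this PR. The table contents, the subquery alias `t`, and the `ORDER BY` clauses are assumptions added for the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testData2 (a INTEGER, b INTEGER)")
# Hypothetical rows, including duplicates so that sum(1) differs from the
# number of distinct (a, b) groups.
conn.executemany(
    "INSERT INTO testData2 VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (3, 2)],
)

nested = """
    SELECT sum(agg), min(a), b FROM (
        SELECT sum(1) AS agg, a, b FROM testData2 GROUP BY a, b
    ) AS t GROUP BY b ORDER BY b
"""
collapsed = """
    SELECT sum(1), min(a), b FROM testData2 GROUP BY b ORDER BY b
"""

nested_rows = conn.execute(nested).fetchall()
collapsed_rows = conn.execute(collapsed).fetchall()
conn.close()

# Both queries should yield one row per value of b with identical aggregates.
print(nested_rows)
print(collapsed_rows)
```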
   ### Why are the changes needed?
   Improve performance of nested aggregation queries.
   ### Does this PR introduce any user-facing change?
   ### How was this patch tested?
   Lots of tests were added with the changeset.
