jaltekruse opened a new pull request #27224: [SPARK-30523][SQL] - Collapse nested aggregates
URL: https://github.com/apache/spark/pull/27224

### What changes were proposed in this pull request?

Combines two adjacent Aggregate operators into one when the inner one is not necessary.

If the outer aggregate references the outputs of aggregate functions in the inner aggregate, check whether the outer aggregate functions use them in a way that can be collapsed into a single aggregate. A sum of sums, a max of maxes, or a min of mins is collapsible. An avg over avg is not collapsible, because a different number of raw rows may have contributed to each partial average in the inner aggregate.

Min and max can be folded in the case described above, or when they reference the group-by columns of the inner aggregate, since they can safely be computed from the set of unique values alone.

```
SELECT sum(sumAgg) AS a, year FROM (
  SELECT sum(1) AS sumAgg, course, year FROM courseSales GROUP BY course, year
) GROUP BY year

-- can be collapsed to
SELECT sum(1) AS `a`, year FROM courseSales GROUP BY year
```

```
SELECT sum(agg), min(a), b FROM (
  SELECT sum(1) AS agg, a, b FROM testData2 GROUP BY a, b
) GROUP BY b

-- can be collapsed to
SELECT sum(1) AS `sum(agg)`, min(a) AS `min(a)`, b FROM testData2 GROUP BY b
```

### Why are the changes needed?

Improve performance of nested aggregation queries.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Lots of tests added with the changeset.
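
As a quick sanity check outside Spark, the equivalences described above can be reproduced with Python's built-in `sqlite3` module. This is only a sketch of the semantics, not the optimizer rule itself: the rows and the `earnings` column are made up for illustration, and SQLite stands in for Spark SQL. It compares the nested and collapsed forms for a sum of sums and for a min over a group-by column, and shows that avg over avg gives a different answer from a single avg.

```python
import sqlite3

# Hypothetical sample data. The table name courseSales comes from the PR
# description; the earnings column and all values are assumptions.
rows = [
    ("dotNET", 2012, 10000),
    ("dotNET", 2012, 5000),
    ("Java", 2012, 20000),
    ("dotNET", 2013, 48000),
    ("Java", 2013, 30000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courseSales (course TEXT, year INTEGER, earnings INTEGER)"
)
conn.executemany("INSERT INTO courseSales VALUES (?, ?, ?)", rows)


def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Sum of sums: the nested form and the collapsed form agree.
nested_sum = run("""
    SELECT sum(sumAgg) AS a, year FROM (
        SELECT sum(1) AS sumAgg, course, year
        FROM courseSales GROUP BY course, year
    ) GROUP BY year ORDER BY year
""")
collapsed_sum = run(
    "SELECT sum(1) AS a, year FROM courseSales GROUP BY year ORDER BY year"
)

# Min over an inner group-by column: only the distinct values matter,
# so the inner aggregate can be dropped.
nested_min = run("""
    SELECT min(course), year FROM (
        SELECT sum(1) AS agg, course, year
        FROM courseSales GROUP BY course, year
    ) GROUP BY year ORDER BY year
""")
collapsed_min = run(
    "SELECT min(course), year FROM courseSales GROUP BY year ORDER BY year"
)

# Avg over avg: NOT equivalent to a single avg, because the inner groups
# contain different numbers of raw rows.
nested_avg = run("""
    SELECT avg(avgAgg), year FROM (
        SELECT avg(earnings) AS avgAgg, course, year
        FROM courseSales GROUP BY course, year
    ) GROUP BY year ORDER BY year
""")
flat_avg = run(
    "SELECT avg(earnings), year FROM courseSales GROUP BY year ORDER BY year"
)

print("sum of sums collapsible:", nested_sum == collapsed_sum)
print("min of group-by column collapsible:", nested_min == collapsed_min)
print("avg of avg collapsible:", nested_avg == flat_avg)
```

With this data the 2012 group has two `dotNET` rows and one `Java` row, which is what makes the average of per-course averages diverge from the average over all rows.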