Jason Altekruse created SPARK-30523:
---------------------------------------

             Summary: Collapse back to back aggregations into a single 
aggregate to reduce the number of shuffles
                 Key: SPARK-30523
                 URL: https://issues.apache.org/jira/browse/SPARK-30523
             Project: Spark
          Issue Type: Improvement
          Components: Optimizer
    Affects Versions: 3.0.0
            Reporter: Jason Altekruse


Queries containing nested aggregate operations can in some cases be computable 
with a single phase of aggregation. This Jira seeks to introduce a new 
optimizer rule to identify some of those cases and rewrite plans to avoid 
needlessly re-shuffling and generating the aggregation hash table data twice.

Some examples of collapsible aggregates:
{code:java}
SELECT sum(sumAgg) as a, year from (
      select sum(1) as sumAgg, course, year FROM courseSales GROUP BY course, 
year
) group by year

// can be collapsed to
SELECT sum(1) as `a`, year from courseSales group by year
{code}
{code}
SELECT sum(agg), min(a), b from (
     select sum(1) as agg, a, b FROM testData2 GROUP BY a, b
     ) group by b

// can be collapsed to
SELECT sum(1) as `sum(agg)`, min(a) as `min(a)`, b from testData2 group by b
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to