Cheng Hao created SPARK-4367:
--------------------------------

             Summary: Process the "distinct" value before shuffling for aggregation
                 Key: SPARK-4367
                 URL: https://issues.apache.org/jira/browse/SPARK-4367
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Cheng Hao


Most aggregate functions (e.g. average) over a "distinct" value require all of 
the records in the same group to be shuffled to a single node. As an 
optimization, those records can be partially aggregated before shuffling, 
which would probably reduce the shuffle overhead significantly. 
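
The idea can be sketched outside of Spark as a small simulation. This is only 
an illustration under assumptions, not Spark SQL's implementation: the function 
names (partial_distinct, avg_distinct) and the sample data are hypothetical, 
and each "partition" is just a Python list of (group, value) records. Each 
partition first deduplicates its values per group, so only distinct pairs would 
cross the network, and the final average is computed after the partial sets 
are merged.

```python
from collections import defaultdict

def partial_distinct(partition):
    """Map side: keep each (group, value) pair once per partition."""
    seen = defaultdict(set)
    for group, value in partition:
        seen[group].add(value)
    return seen

def avg_distinct(partitions):
    """Return ({group: AVG(DISTINCT value)}, number of shuffled records)."""
    merged = defaultdict(set)
    shuffled = 0
    for partition in partitions:
        for group, values in partial_distinct(partition).items():
            shuffled += len(values)  # records that would cross the network
            merged[group] |= values  # reduce side: union the partial sets
    result = {g: sum(v) / len(v) for g, v in merged.items()}
    return result, shuffled

# Hypothetical input: two partitions holding 8 raw records in total.
partitions = [
    [("a", 1), ("a", 1), ("a", 2), ("b", 5)],
    [("a", 2), ("a", 3), ("b", 5), ("b", 5)],
]
result, shuffled = avg_distinct(partitions)
print(result, shuffled)
```

In this toy input 6 records are shuffled instead of 8. The saving depends 
entirely on how many duplicate values each partition holds per group, so it 
can be large for low-cardinality columns and negligible for nearly unique ones.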



