Cheng Lian created SPARK-2554:
---------------------------------

             Summary: CountDistinct and SumDistinct should do partial aggregation
                 Key: SPARK-2554
                 URL: https://issues.apache.org/jira/browse/SPARK-2554
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.0.1, 1.0.2
            Reporter: Cheng Lian


{{CountDistinct}} and {{SumDistinct}} should first do a partial aggregation and 
return the set of unique values in each partition as the partial result. Shuffle 
IO can be greatly reduced in cases where there are only a few unique values.
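
A minimal sketch of the idea, using plain Python collections in place of Spark partitions. This is not Spark SQL code: the function names (partial_distinct, merge_distinct, count_distinct, sum_distinct) and the sample data are hypothetical and only illustrate the proposed two-phase evaluation. NULL handling is left out.

```python
from typing import Iterable, List, Set


def partial_distinct(partition: Iterable[int]) -> Set[int]:
    """Partial step (before the shuffle): reduce one partition to its unique values."""
    return set(partition)


def merge_distinct(partials: Iterable[Set[int]]) -> Set[int]:
    """Final step (after the shuffle): union the per-partition sets."""
    merged: Set[int] = set()
    for partial in partials:
        merged |= partial
    return merged


def count_distinct(partitions: List[List[int]]) -> int:
    """COUNT(DISTINCT x) evaluated as partial sets followed by a merge."""
    return len(merge_distinct(partial_distinct(p) for p in partitions))


def sum_distinct(partitions: List[List[int]]) -> int:
    """SUM(DISTINCT x) evaluated as partial sets followed by a merge."""
    return sum(merge_distinct(partial_distinct(p) for p in partitions))


# Hypothetical input: three partitions with many repeated values.
partitions = [[1, 2, 2, 3], [3, 3, 4], [1, 1, 1]]

# Number of values that would cross the shuffle in each strategy.
rows_without_partial = sum(len(p) for p in partitions)
rows_with_partial = sum(len(partial_distinct(p)) for p in partitions)

print(count_distinct(partitions), sum_distinct(partitions))
print(rows_without_partial, rows_with_partial)
```

In this toy input the partial step sends 6 values instead of 10. The saving grows as values repeat more often within a partition, and it disappears when nearly every value is unique.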



--
This message was sent by Atlassian JIRA
(v6.2#6252)
