GitHub user stanzhai opened a pull request:

    https://github.com/apache/spark/pull/19301

    [SPARK-22084][SQL] Fix performance regression in aggregation strategy

    ## What changes were proposed in this pull request?
    
    This PR fixes a performance regression in the aggregation strategy that was introduced in Spark 2.0.
    
    For the following SQL:
    
    ```SQL
    SELECT a, SUM(b) AS b0, SUM(b) AS b1 
    FROM VALUES(1, 1), (2, 2) AS (a, b) 
    GROUP BY a
    ```
    
    Before the fix:
    
    ```
    == Physical Plan ==
    *HashAggregate(keys=[a#11], functions=[sum(cast(b#12 as bigint)), sum(cast(b#12 as bigint))])
    +- Exchange hashpartitioning(a#11, 200)
       +- *HashAggregate(keys=[a#11], functions=[partial_sum(cast(b#12 as bigint)), partial_sum(cast(b#12 as bigint))])
          +- LocalTableScan [a#11, b#12]
    ```
    
    After the fix:
    
    ```
    == Physical Plan ==
    *HashAggregate(keys=[a#11], functions=[sum(cast(b#12 as bigint))])
    +- Exchange hashpartitioning(a#11, 2)
       +- *HashAggregate(keys=[a#11], functions=[partial_sum(cast(b#12 as bigint))])
          +- LocalTableScan [a#11, b#12]
    ```
    
    ## How was this patch tested?
    
    WIP
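
The difference between the two physical plans in the description above can be illustrated with a small toy model. This is only a sketch of the general idea (compute each semantically identical aggregate once and let every output alias read the shared result). It is not the code in this patch and does not use any Spark classes; `Sum` and `plan_aggregates` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Sum(NamedTuple):
    """Toy stand-in for an aggregate such as sum(cast(b as bigint)).

    Hypothetical type for illustration; not a Spark class.
    """
    column: str
    cast_to: str


def plan_aggregates(select_list):
    """Return (distinct aggregates, mapping of alias -> index of its aggregate).

    Aggregates that compare equal are kept only once, so the toy "plan"
    evaluates each of them a single time however many aliases refer to it.
    """
    distinct = []
    slot_of = {}
    for alias, agg in select_list:
        if agg not in distinct:
            distinct.append(agg)
        slot_of[alias] = distinct.index(agg)
    return distinct, slot_of


# Mirrors SELECT a, SUM(b) AS b0, SUM(b) AS b1 ... GROUP BY a
select_list = [("b0", Sum("b", "bigint")), ("b1", Sum("b", "bigint"))]
distinct, slot_of = plan_aggregates(select_list)
print(len(distinct))  # number of aggregates actually evaluated
print(slot_of)        # both aliases point at the same aggregate
```

In this model, both `b0` and `b1` map to the single shared aggregate, which corresponds to the one `sum(cast(b#12 as bigint))` entry in the plan after the fix.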

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/stanzhai/spark improve-aggregate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19301.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19301
    
----
commit 6f555c20c5c6d2821410aff671758ba73cd8f300
Author: Stan Zhai <[email protected]>
Date:   2017-09-19T09:27:35Z

    use hashCode as exprId

commit 5aaae4caa6225ecc6d174afb2eefa8d68af5471a
Author: Stan Zhai <[email protected]>
Date:   2017-09-19T09:53:56Z

    typo

commit adce4740c3c41000215f5d7cc0285701d15bb7cf
Author: Stan Zhai <[email protected]>
Date:   2017-09-20T07:12:23Z

    Merge branch 'master' of https://github.com/apache/spark into improve-aggregate

commit bf7d2cf103e2a0caf1538e3df5c174df173cfc56
Author: Stan Zhai <[email protected]>
Date:   2017-09-21T05:19:20Z

    Merge branch 'master' of https://github.com/apache/spark into improve-aggregate

----

