GitHub user stanzhai opened a pull request:
https://github.com/apache/spark/pull/19301
[SPARK-22084][SQL] Fix performance regression in aggregation strategy
## What changes were proposed in this pull request?
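This PR fixes a performance regression in the aggregation strategy that was
introduced in Spark 2.0: when the same aggregate function appears more than
once in a query, it is computed once per occurrence instead of once in total.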
For the following SQL:
```SQL
SELECT a, SUM(b) AS b0, SUM(b) AS b1
FROM VALUES(1, 1), (2, 2) AS (a, b)
GROUP BY a
```
Before the fix:
```
== Physical Plan ==
*HashAggregate(keys=[a#11], functions=[sum(cast(b#12 as bigint)), sum(cast(b#12 as bigint))])
+- Exchange hashpartitioning(a#11, 200)
   +- *HashAggregate(keys=[a#11], functions=[partial_sum(cast(b#12 as bigint)), partial_sum(cast(b#12 as bigint))])
      +- LocalTableScan [a#11, b#12]
```
After the fix:
```
== Physical Plan ==
*HashAggregate(keys=[a#11], functions=[sum(cast(b#12 as bigint))])
+- Exchange hashpartitioning(a#11, 2)
   +- *HashAggregate(keys=[a#11], functions=[partial_sum(cast(b#12 as bigint))])
      +- LocalTableScan [a#11, b#12]
```
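The first commit is titled "use hashCode as exprId", which hints that the duplicate `sum` comes from how aggregate occurrences are identified. The Python below is a simplified model of that idea, not Spark's code. It assumes each aggregate occurrence carries a unique result ID, so plain equality treats the two `SUM(b)` calls as different, and that deduplication keys on a canonical form that ignores that ID. `AggExpr`, `result_id`, and `dedupe_aggregates` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical ID generator: every aggregate occurrence gets a fresh result ID.
_next_id = count(1)


@dataclass(frozen=True)
class AggExpr:
    """Hypothetical stand-in for an aggregate expression such as SUM(b)."""
    func: str   # aggregate function name, e.g. "sum"
    child: str  # input expression, e.g. "cast(b as bigint)"
    # Unique per occurrence; included in ==, so two SUM(b) calls compare unequal.
    result_id: int = field(default_factory=lambda: next(_next_id))

    def canonical(self):
        # Ignore result_id: both occurrences of SUM(b) compute the same value.
        return (self.func, self.child)


def dedupe_aggregates(outputs):
    """Given (alias, AggExpr) pairs, return the distinct aggregates to compute
    and a map from each output alias to the index of the aggregate it reads."""
    distinct = []
    index_by_key = {}
    alias_to_slot = {}
    for alias, agg in outputs:
        key = agg.canonical()
        if key not in index_by_key:
            index_by_key[key] = len(distinct)
            distinct.append(agg)
        alias_to_slot[alias] = index_by_key[key]
    return distinct, alias_to_slot


if __name__ == "__main__":
    # Models SUM(b) AS b0, SUM(b) AS b1 from the query above.
    outputs = [
        ("b0", AggExpr("sum", "cast(b as bigint)")),
        ("b1", AggExpr("sum", "cast(b as bigint)")),
    ]
    print(outputs[0][1] == outputs[1][1])  # False: result IDs differ
    distinct, slots = dedupe_aggregates(outputs)
    print(len(distinct))  # 1
    print(slots)          # {'b0': 0, 'b1': 0}
```

Keying on a canonical form rather than on object equality is what collapses `b0` and `b1` onto a single computed `sum`, matching the "after" plan. A real planner would also need to leave non-deterministic aggregates alone, since two occurrences of such an expression are not guaranteed to produce the same value.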
## How was this patch tested?
WIP
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/stanzhai/spark improve-aggregate
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19301.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19301
----
commit 6f555c20c5c6d2821410aff671758ba73cd8f300
Author: Stan Zhai <[email protected]>
Date: 2017-09-19T09:27:35Z
use hashCode as exprId
commit 5aaae4caa6225ecc6d174afb2eefa8d68af5471a
Author: Stan Zhai <[email protected]>
Date: 2017-09-19T09:53:56Z
typo
commit adce4740c3c41000215f5d7cc0285701d15bb7cf
Author: Stan Zhai <[email protected]>
Date: 2017-09-20T07:12:23Z
Merge branch 'master' of https://github.com/apache/spark into improve-aggregate
commit bf7d2cf103e2a0caf1538e3df5c174df173cfc56
Author: Stan Zhai <[email protected]>
Date: 2017-09-21T05:19:20Z
Merge branch 'master' of https://github.com/apache/spark into improve-aggregate
----