GitHub user maropu commented on the issue:
https://github.com/apache/spark/pull/17164
This PR adds a new SQL option, `spark.sql.aggregate.preferSortAggregate`,
that makes the planner prefer `SortAggregate`, so that sort aggregation is
easy to test in `DataFrameAggregateSuite.scala`. In some cases (e.g., when the
input data is already sorted in the cache), sort aggregation is faster than
hash aggregation (see https://issues.apache.org/jira/browse/SPARK-18591).
However, Spark currently does not adaptively select sort aggregation in these
cases, so I think this option would be useful for letting users control the
aggregate strategy. What do you think? cc: @hvanhovell If you agree, I'd like
to open a separate PR that adds this option before this PR is reviewed.
https://github.com/apache/spark/compare/master...maropu:SPARK-16844-3
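
To illustrate why sorted input can favor sort aggregation, here is a minimal
sketch in plain Python. It is not Spark's implementation; the function names
`hash_aggregate` and `sort_aggregate` are hypothetical and exist only for this
example. The hash variant builds a table of all keys, while the sort variant
streams over input that is assumed to be already sorted by key and keeps only
one running group at a time.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Sum values per key using a hash table; input order does not matter."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


def sort_aggregate(sorted_rows):
    """Sum values per key by streaming over rows already sorted by key.

    Assumption: the caller guarantees the rows are sorted by key, so each
    key forms one contiguous run and no hash table is needed.
    """
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(sorted_rows, key=itemgetter(0))
    }


# Hypothetical input, already sorted by key (as in the cached-and-sorted case).
rows = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5)]
assert hash_aggregate(rows) == sort_aggregate(rows)
```

Both functions return the same result on sorted input; the difference is only
in how much state each one has to hold while aggregating.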