[GitHub] [spark] maropu commented on pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-06-28 Thread GitBox
maropu commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-650844833 The current approach looks a bit hacky because you implemented the optimization directly in generated aggregate code. You might be able to use `Aggregation` in `SparkStrategies`

[GitHub] [spark] maropu commented on pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-06-27 Thread GitBox
maropu commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-650636794 > No, the final aggregation will take care of giving the right results. This is more like setting the aggregation mode to
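The claim that the final aggregation still gives the right results can be illustrated with a minimal sketch, assuming a decomposable aggregate (SUM) over (key, value) rows. The function names and the "pass-through" mode are hypothetical illustrations, not Spark's implementation: the partial phase either combines rows per key or forwards them unchanged, and the final phase merges whatever it receives, so duplicate keys in its input are harmless.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (grouping key, value); a simplified stand-in for a row

def partial_aggregate(rows: Iterable[Row]) -> List[Row]:
    """Partial (map-side) SUM: combine rows with the same key within one partition."""
    acc: Dict[str, int] = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return list(acc.items())

def pass_through(rows: Iterable[Row]) -> List[Row]:
    """Hypothetical 'skip partial aggregation' mode: forward each row unchanged."""
    return list(rows)

def final_aggregate(partials: Iterable[Row]) -> Dict[str, int]:
    """Final (reduce-side) SUM: merge partial results; repeated keys are summed."""
    result: Dict[str, int] = defaultdict(int)
    for key, partial_sum in partials:
        result[key] += partial_sum
    return dict(result)

partitions = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5)]]

combined = [r for p in partitions for r in partial_aggregate(p)]
skipped = [r for p in partitions for r in pass_through(p)]

print(final_aggregate(combined))  # {'a': 4, 'b': 6, 'c': 5}
print(final_aggregate(skipped))   # {'a': 4, 'b': 6, 'c': 5}
```

Skipping the partial step only moves the combining work to the final step; it does not change the answer, because a single unaggregated row is itself a valid partial SUM.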

[GitHub] [spark] maropu commented on pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-06-27 Thread GitBox
maropu commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-650552251 > it is more of a manual step and can be used only if the user knows the nature of the data upfront. Like in my benchmark, where we expect all but a few grouping keys to be

[GitHub] [spark] maropu commented on pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-06-20 Thread GitBox
maropu commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-646974298 > When the cardinality of grouping column is close to the total number of records being processed Ur, one more question: how do we know that the cardinality is close to the
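One possible answer to "how do we know the cardinality is close to the number of records" is to measure it at runtime rather than ask the user. The sketch below is a hypothetical heuristic, not something proposed in the PR: it counts distinct keys among the first `sample_size` rows and reports whether the distinct-to-rows ratio exceeds `ratio_threshold`. Both parameters are assumed tuning knobs.

```python
from typing import Hashable, Iterable

def should_skip_partial_agg(keys: Iterable[Hashable],
                            sample_size: int = 1000,
                            ratio_threshold: float = 0.9) -> bool:
    """Return True if the first rows have nearly as many distinct keys as rows.

    A ratio near 1.0 means partial aggregation would barely reduce the row
    count, so its hashing/sorting work is likely wasted.
    """
    seen = set()
    count = 0
    for key in keys:
        if count >= sample_size:
            break  # only inspect a bounded prefix of the input
        seen.add(key)
        count += 1
    if count == 0:
        return False  # no data, nothing to decide
    return len(seen) / count >= ratio_threshold

print(should_skip_partial_agg(range(10_000)))                  # True
print(should_skip_partial_agg(i % 10 for i in range(10_000)))  # False
```

This consumes the iterator it inspects; a real engine would need to keep the sampled rows and feed them into whichever mode it picks. A prefix sample can also mislead when the input is sorted or skewed.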

[GitHub] [spark] maropu commented on pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-06-20 Thread GitBox
maropu commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-646973316 > When the cardinality of grouping column is close to the total number of records being processed, sorting the data and spilling it to disk is not required, since it is kind of
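The idea of avoiding sort and spill in the partial step can be sketched as a partial SUM with a fixed-capacity hash map. This is an illustrative sketch under stated assumptions, not the PR's code: `capacity` is a hypothetical stand-in for the memory budget. Once the map is full, rows with new keys are emitted unaggregated instead of being sorted and spilled, and the final aggregation is left to merge them.

```python
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, int]  # (grouping key, value)

def bounded_partial_sum(rows: Iterable[Row], capacity: int) -> Iterator[Row]:
    """Partial SUM with a fixed-size hash map and no sort/spill fallback.

    Keys already in the map keep being combined; a row whose key does not
    fit is emitted as-is for the final aggregation to merge.
    """
    buffer: Dict[str, int] = {}
    for key, value in rows:
        if key in buffer:
            buffer[key] += value
        elif len(buffer) < capacity:
            buffer[key] = value
        else:
            yield (key, value)  # pass through instead of sorting and spilling
    yield from buffer.items()   # flush the combined keys at the end

rows = [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("d", 5)]
print(list(bounded_partial_sum(rows, capacity=2)))
# [('c', 3), ('d', 5), ('a', 5), ('b', 2)]
```

When almost every key is distinct, nearly every row takes the pass-through branch, so the output is about as large as the input either way; skipping the sort and spill then avoids extra I/O without changing the final result.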

[GitHub] [spark] maropu commented on pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-06-18 Thread GitBox
maropu commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-645846710 add to whitelist

[GitHub] [spark] maropu commented on pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-06-17 Thread GitBox
maropu commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-645729114 ok to test