maropu commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-650844833
The current approach looks a bit hacky because the optimization is implemented
directly in the generated aggregate code. You might be able to use
`Aggregation` in `SparkStrategies`
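
[Editor's note: a minimal sketch of the planning-time alternative being suggested. The types below are simplified stand-ins, not Spark's real planner API (the real `Aggregation` strategy lives in `org.apache.spark.sql.execution.SparkStrategies`); the point is that the skip/no-skip decision could be made once while choosing physical operators, instead of branching inside the generated aggregate code.]

```scala
// Illustrative sketch only: hypothetical simplified types, not Spark internals.
sealed trait LogicalPlan
sealed trait SparkPlan

// Stand-in for a logical aggregate carrying a cardinality estimate.
case class Aggregate(groupingKeyCount: Long, child: LogicalPlan) extends LogicalPlan
case class PartialThenFinalAgg(child: LogicalPlan) extends SparkPlan
case class FinalOnlyAgg(child: LogicalPlan) extends SparkPlan

object AggregationStrategy {
  // Hypothetical planner hook: pick the plan shape up front, based on what
  // is known about the grouping keys at planning time.
  def plan(plan: LogicalPlan, expectedRows: Long): Option[SparkPlan] = plan match {
    case Aggregate(keys, child) if keys.toDouble / expectedRows > 0.9 =>
      Some(FinalOnlyAgg(child))        // partial agg would barely reduce data
    case Aggregate(_, child) =>
      Some(PartialThenFinalAgg(child)) // normal two-phase aggregation
    case _ => None
  }
}
```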
maropu commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-650636794
> No, the Final aggregation will take care of giving the right results.
This is more like setting the Aggregation mode to
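
[Editor's note: the comment is truncated, but the modes it refers to are Spark's `AggregateMode`s. The mode names below are the real ones from `org.apache.spark.sql.catalyst.expressions.aggregate`; the pipeline comments are an illustration, not planner code.]

```scala
// Mode names match org.apache.spark.sql.catalyst.expressions.aggregate.AggregateMode.
sealed trait AggregateMode
case object Partial      extends AggregateMode // map side: raw rows -> partial buffers
case object PartialMerge extends AggregateMode // merges partial buffers (e.g. for distinct)
case object Final        extends AggregateMode // reduce side: merge buffers -> results
case object Complete     extends AggregateMode // single pass: raw rows -> results, no merge

// Normal two-phase aggregate:   scan -> Partial -> shuffle -> Final
// With the partial step skipped, the reduce side must consume raw rows,
// which is roughly what "setting the Aggregation mode to ..." alludes to.
```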
maropu commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-650552251
> it is more of a manual step and can be used only if the user knows the
nature of the data upfront. Like in my benchmark, where we expect all but a
few grouping keys to be
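
[Editor's note: a hedged usage sketch of the manual step being described. The excerpt does not show the actual config key this PR adds, so `spark.sql.aggregate.skipPartialAggregate` below is a placeholder, not the patch's real flag.]

```scala
import org.apache.spark.sql.SparkSession

object SkipPartialAggDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("skip-partial-agg-demo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The user asserts upfront that the grouping keys are nearly unique, so
    // the map-side combine would reduce almost nothing and only pay the
    // sort/spill cost. NOTE: placeholder config key, not the PR's real flag.
    spark.conf.set("spark.sql.aggregate.skipPartialAggregate", "true")

    // "id" is unique here, so partial aggregation cannot shrink the data.
    val df = spark.range(1000000L).toDF("id")
    df.groupBy($"id").count().collect()

    spark.stop()
  }
}
```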
maropu commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-646974298
> When the cardinality of the grouping column is close to the total number of
records being processed
Ur, one more question; how do we know that the cardinality is close to the
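
[Editor's note: one plausible answer, sketched below, is a runtime heuristic rather than an upfront guess: watch how fast distinct keys accumulate relative to rows consumed during the map-side pass, and abandon the partial aggregate when the ratio stays near 1.0. This is an assumption for illustration, not code from the patch; the sample size and threshold are made up.]

```scala
import scala.collection.mutable

object PartialAggHeuristic {
  val SampleRows = 100000 // rows to observe before deciding (assumed value)
  val SkipThreshold = 0.9 // distinct/total ratio above which partial agg is abandoned

  /** Returns true if the sampled prefix suggests partial aggregation is useless. */
  def shouldSkipPartialAgg(keys: Iterator[Long]): Boolean = {
    val seen = mutable.HashSet.empty[Long]
    var total = 0L
    while (keys.hasNext && total < SampleRows) {
      seen += keys.next()
      total += 1
    }
    total > 0 && seen.size.toDouble / total >= SkipThreshold
  }
}
```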
maropu commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-646973316
> When the cardinality of the grouping column is close to the total number of
records being processed, sorting the data and spilling to disk is not required,
since it is kind of
maropu commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-645846710
add to whitelist
maropu commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-645729114
ok to test