maropu commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-646973316
> When the cardinality of grouping column is close to the total number of records being processed, the sorting of data spilling to disk is not required, since it is kind of no-op and we can directly use rows in Final aggregation.

I have not looked into the code yet, but I have one question: does this optimization provide a benefit only when codegen is enabled? When I read the description above, I thought it was a more general one.
