[GitHub] [spark] maropu commented on pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

GitBox Sat, 20 Jun 2020 03:21:50 -0700


maropu commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-646973316



   > When the cardinality of grouping column is close to the total number of 
records being processed, the sorting of data spilling to disk is not required, 
since it is kind of no-op and we can directly use rows in Final aggregation.
   
   I have not looked into the code yet, but I have one question: does this 
optimization only help when codegen is enabled? From the description above, 
I thought it was more general.
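
   For illustration only, the quoted claim can be sketched outside Spark. 
This is a minimal Python sketch under stated assumptions, not code from the 
PR: `partial_aggregate` and `reduction_ratio` are hypothetical helpers that 
mimic a per-partition partial count. It suggests that when nearly every row 
has a distinct grouping key, the partial step emits about as many rows as it 
reads, so sorting or spilling there would not reduce the work left for the 
final aggregation.

```python
from collections import Counter


def partial_aggregate(keys):
    """Hypothetical stand-in for a per-partition partial COUNT(*) GROUP BY key."""
    return Counter(keys)


def reduction_ratio(keys):
    """Rows emitted by the partial step divided by rows read (1.0 = no reduction)."""
    keys = list(keys)
    if not keys:
        return 1.0
    return len(partial_aggregate(keys)) / len(keys)


# High cardinality: every key is distinct, so the partial step reduces nothing.
high = reduction_ratio(range(1000))

# Low cardinality: 1000 rows collapse into 10 groups.
low = reduction_ratio(i % 10 for i in range(1000))

print(f"high-cardinality ratio: {high}, low-cardinality ratio: {low}")
```

   Nothing in this sketch depends on code generation, which is why the 
description reads as a general property of partial aggregation.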


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



