maropu edited a comment on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-673753587
> But shuffle still happens during the Aggregate here, right? Splitting does not change the total amount of shuffled data; it is just divided into several parts. Does it really result in a significant improvement?

As @viirya said above, I have the same question. Why can this reduce the amount of shuffle writes (and improve the performance)? In the case of `expand -> partial aggregates`, the aggregates seem to have the same **total** amount of output size.
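The sizing argument above can be sketched as a toy model in plain Python (this is an illustration of the counting logic only, not Spark's actual `Expand`/`HashAggregate` operators): an expand over grouping sets replicates each input row once per grouping-set id, and since every expanded group key carries its grouping-set id, each key lands in exactly one of the split aggregates, so the total number of partial-aggregate output rows is unchanged.

```python
from collections import Counter

# Toy input: (group key, value) pairs.
rows = [("a", 1), ("a", 2), ("b", 3)]
grouping_sets = [0, 1]  # two grouping-set ids, as an expand would emit

# Expand: replicate every row once per grouping-set id.
expanded = [(gid, key) for gid in grouping_sets for (key, _) in rows]

# One partial aggregate over all expanded rows (count per group key).
combined = Counter(expanded)

# Split: one partial aggregate per grouping-set id.
split = [Counter(kv for kv in expanded if kv[0] == gid)
         for gid in grouping_sets]

# The total number of output rows (a proxy for shuffle-write size)
# is the same whether the partial aggregate is split or not.
assert sum(len(c) for c in split) == len(combined)
```

Under this model, splitting changes only how the partial-aggregate work is partitioned, not the total output row count, which is what the question above is getting at.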
