maropu edited a comment on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-673753587
> But shuffle still happens during the Aggregate here, right? Splitting does not change the total amount of shuffled data; it is just divided into several parts. Does it really result in a significant improvement?

As @viirya said above, I have the same question. Why can this reduce the amount of shuffle writes (and improve the performance)? In the case of `expand -> partial aggregates`, the aggregates seem to have the same **total** amount of output size.
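The sizing argument above can be sketched as a toy model in plain Python (this is an illustration of the counting logic only, not Spark's actual `Expand`/`HashAggregate` operators): an expand over grouping sets replicates each input row once per grouping-set id, and since every expanded group key carries its grouping-set id, each key lands in exactly one of the split aggregates, so the total number of partial-aggregate output rows is unchanged.

```python
from collections import Counter

# Toy input: (group key, value) pairs.
rows = [("a", 1), ("a", 2), ("b", 3)]
grouping_sets = [0, 1]  # two grouping-set ids, as an expand would emit

# Expand: replicate every row once per grouping-set id.
expanded = [(gid, key) for gid in grouping_sets for (key, _) in rows]

# One partial aggregate over all expanded rows (count per group key).
combined = Counter(expanded)

# Split: one partial aggregate per grouping-set id.
split = [Counter(kv for kv in expanded if kv[0] == gid)
         for gid in grouping_sets]

# The total number of output rows (a proxy for shuffle-write size)
# is the same whether the partial aggregate is split or not.
assert sum(len(c) for c in split) == len(combined)
```

Under this model, splitting changes only how the partial-aggregate work is partitioned, not the total output row count, which is what the question above is getting at.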
