Karl-WangSK commented on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-673868646
Yes. The shuffle output per row is the same, because the size of each row is the same. As the benchmark shows: cube on 7 fields k1, k2, k3, k4, k5, k6, k7 (128x projections) versus cube on 6 fields k1, k2, k3, k4, k5, k6 (64x projections). The expanded data size only doubles, but the time goes from 2.4 min to 8.7 min, far more than double. Shuffle performance is affected by data size, especially when memory is limited. The original data I created is about 20 MB and executor memory is 1 GB, so when the data expands to 64x or 128x, it has a big impact on shuffle performance.
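To make the expansion factors above concrete, here is a minimal sketch (not Spark code, just the arithmetic) of why CUBE over n fields replicates each input row 2^n times: CUBE produces one grouping set per subset of the grouped columns, and each input row contributes to every grouping set.

```python
from itertools import combinations

def cube_grouping_sets(cols):
    """All grouping sets produced by CUBE(cols): every subset, 2^n total."""
    sets = []
    for r in range(len(cols) + 1):
        sets.extend(combinations(cols, r))
    return sets

def expansion_factor(n_cols):
    """CUBE replicates each input row once per grouping set."""
    return 2 ** n_cols

# 6 fields -> 64 grouping sets, 7 fields -> 128, matching the benchmark.
assert expansion_factor(6) == 64
assert expansion_factor(7) == 128

# A ~20 MB input (the size used in the benchmark) expands to roughly:
print(20 * expansion_factor(6), "MB")  # 1280 MB
print(20 * expansion_factor(7), "MB")  # 2560 MB
```

Both expanded sizes far exceed the 1 GB executor memory in the benchmark, which is consistent with the disproportionate slowdown: the 128x case spills and shuffles much more data relative to available memory than the 64x case.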