Karl-WangSK commented on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-673868646
Yes. The shuffle output per row is the same, because the size of each row is the same. As the benchmark shows: cube on 7 fields k1, k2, k3, k4, k5, k6, k7 (128x projections) versus cube on 6 fields k1, k2, k3, k4, k5, k6 (64x projections). The expanded data size only doubles, but the time goes from 2.4 min to 8.7 min, far more than double. Shuffle performance is affected by data size, especially when memory is limited. The original data I created is about 20 MB and executor memory is 1 GB, so when the data expands to 64x or 128x, it has a big impact on shuffle performance.
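To make the expansion factors above concrete, here is a minimal sketch (not Spark code, just the arithmetic) of why CUBE over n fields replicates each input row 2^n times: CUBE produces one grouping set per subset of the grouped columns, and each input row contributes to every grouping set.

```python
from itertools import combinations

def cube_grouping_sets(cols):
    """All grouping sets produced by CUBE(cols): every subset, 2^n total."""
    sets = []
    for r in range(len(cols) + 1):
        sets.extend(combinations(cols, r))
    return sets

def expansion_factor(n_cols):
    """CUBE replicates each input row once per grouping set."""
    return 2 ** n_cols

# 6 fields -> 64 grouping sets, 7 fields -> 128, matching the benchmark.
assert expansion_factor(6) == 64
assert expansion_factor(7) == 128

# A ~20 MB input (the size used in the benchmark) expands to roughly:
print(20 * expansion_factor(6), "MB")  # 1280 MB
print(20 * expansion_factor(7), "MB")  # 2560 MB
```

Both expanded sizes far exceed the 1 GB executor memory in the benchmark, which is consistent with the disproportionate slowdown: the 128x case spills and shuffles much more data relative to available memory than the 64x case.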