carsonwang commented on issue #20303: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
URL: https://github.com/apache/spark/pull/20303#issuecomment-459666711

@justinuang, in that article, only a few queries benefit from optimizing the join type or handling skewed joins at runtime. Most queries benefit only from setting the reducer number, which improved performance by about 1-20%. The percentage also depends on how we set the shuffle partition number in non-AE mode and on minNumPostShufflePartitions/maxNumPostShufflePartitions in AE mode. For a small data scale, the default shuffle partition number of 200 is enough. But for the 100 TB data scale, we set it to 10976 so that all queries can run.
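To make the settings mentioned above concrete, here is a minimal sketch of how the two modes might be configured. The helper names `shuffle_conf` and `to_submit_args` are hypothetical, and the full keys `spark.sql.adaptive.minNumPostShufflePartitions` and `spark.sql.adaptive.maxNumPostShufflePartitions` are an assumption based on the short names in the comment; check them against the Spark build you run. The values 200 and 10976 come from the comment, while the AE bounds are purely illustrative.

```python
def shuffle_conf(adaptive, shuffle_partitions=200, min_post=None, max_post=None):
    """Return Spark SQL settings as a dict of string keys and values.

    Hypothetical helper: in non-AE mode it pins a fixed shuffle partition
    count; in AE mode it only bounds the post-shuffle partition number.
    """
    conf = {"spark.sql.adaptive.enabled": str(adaptive).lower()}
    if not adaptive:
        conf["spark.sql.shuffle.partitions"] = str(shuffle_partitions)
        return conf
    # Assumed key names; they may differ between Spark versions.
    if min_post is not None:
        conf["spark.sql.adaptive.minNumPostShufflePartitions"] = str(min_post)
    if max_post is not None:
        conf["spark.sql.adaptive.maxNumPostShufflePartitions"] = str(max_post)
    return conf


def to_submit_args(conf):
    """Flatten a settings dict into spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Non-AE, small data scale: the default of 200 partitions.
small_scale = shuffle_conf(adaptive=False)
# Non-AE, 100 TB data scale: the value quoted in the comment.
large_scale = shuffle_conf(adaptive=False, shuffle_partitions=10976)
# AE mode: the bounds here are illustrative, not taken from the comment.
ae_run = shuffle_conf(adaptive=True, min_post=1, max_post=10976)

print(" ".join(to_submit_args(large_scale)))
```

The measured speedup depends on which of these baselines the AE run is compared against, which is why the non-AE partition number matters for the reported percentage.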
