carsonwang commented on issue #20303: [SPARK-23128][SQL] A new approach to do 
adaptive execution in Spark SQL
URL: https://github.com/apache/spark/pull/20303#issuecomment-459666711
 
 
   @justinuang, in that article, only a few queries benefit from 
optimizing the join type or handling skewed joins at runtime. Most 
queries benefit only from setting the reducer number, which improved 
performance by about 1-20%. The percentage also depends on how we set the 
shuffle partition number in non-AE mode and 
minNumPostShufflePartitions/maxNumPostShufflePartitions in AE. For a small 
data scale, the default shuffle partition number of 200 is enough. But at the 
100 TB data scale, we set it to 10976 so that all queries can run.
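
For illustration, here is a minimal sketch of how the two modes might be 
configured. The fully qualified `spark.sql.adaptive.*` key names for the 
min/max settings are assumed from the names above and may differ between 
Spark versions and this PR; the min/max values and the app name are 
hypothetical, and only 10976 comes from the numbers quoted here.

```scala
import org.apache.spark.sql.SparkSession

// Toggle between the two modes being compared.
val adaptive = true

val base = SparkSession.builder().appName("ae-partition-sketch") // hypothetical name

val configured =
  if (adaptive) {
    base
      .config("spark.sql.adaptive.enabled", "true")
      // Assumed key names; the bounds below are illustrative only.
      .config("spark.sql.adaptive.minNumPostShufflePartitions", "1")
      .config("spark.sql.adaptive.maxNumPostShufflePartitions", "10976")
  } else {
    // Non-AE mode: one fixed reducer count for every shuffle.
    base.config("spark.sql.shuffle.partitions", "10976")
  }

val spark = configured.getOrCreate()
```

In non-AE mode the fixed value has to be large enough for the heaviest 
query, so lighter queries pay for it too. That is why the measured gain 
depends on the baseline setting.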

