cloud-fan edited a comment on issue #26409: [SPARK-29655][SQL] Enable adaptive 
execution should not add more ShuffleExchange
URL: https://github.com/apache/spark/pull/26409#issuecomment-552502824
 
 
   Let's think about the expected behavior. This is truly a cost problem, but we 
should figure out a simple rule, since estimating cost is not realistic in Spark 
for now.
   1. We should not blindly avoid shuffle: too few buckets leads to poor 
performance, as parallelism is low.
   2. We should avoid shuffle if the number of buckets is reasonable.
   
   Imagine we join a bucketed table with a big table. We can avoid shuffling the 
bucketed table if its number of buckets is reasonable as the number of partitions 
for shuffling the big table. It's hard to define "reasonable" here, and I think 
it's OK to take the value of `spark.sql.shuffle.partitions` as the reasonable 
number.
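   The rule above could be sketched as a tiny helper. Note this is a hypothetical illustration of the heuristic being discussed, not code from this PR or from Spark itself; the function name and signature are made up:

   ```python
   # Hypothetical sketch of the heuristic discussed above (not Spark code):
   # reuse the bucketed table's output partitioning (i.e. avoid shuffling it)
   # only when the bucket count provides enough parallelism, taking
   # spark.sql.shuffle.partitions as the threshold for "reasonable".

   def should_avoid_shuffle(num_buckets: int, shuffle_partitions: int) -> bool:
       # Rule 2: if the bucket count is at least the configured shuffle
       # partitions, parallelism is fine, so shuffle only the big table
       # into num_buckets partitions instead of shuffling both sides.
       # Rule 1: otherwise, too few buckets would cap parallelism, so we
       # should not blindly avoid the shuffle.
       return num_buckets >= shuffle_partitions

   should_avoid_shuffle(200, 200)  # enough buckets: avoid the extra shuffle
   should_avoid_shuffle(4, 200)    # too few buckets: keep the shuffle
   ```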

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
