[GitHub] [spark] thomasg19930417 commented on pull request #34542: [SPARK-37267][SQL] OptimizeSkewInRebalancePartitions support optimize non-root node

via GitHub Thu, 23 Mar 2023 05:21:07 -0700


thomasg19930417 commented on PR #34542:
URL: https://github.com/apache/spark/pull/34542#issuecomment-1481097597


   When using this optimization to insert into Hive dynamic partitions, we found 
that when spark.sql.shuffle.partitions is set very large, some result files are 
particularly large and the tasks that write them take a long time. These files 
are not split according to the adaptive target size. There is no problem when 
spark.sql.shuffle.partitions is set very small.
   Config:
   spark.sql.adaptive.enabled true
   spark.sql.adaptive.coalescePartitions.enabled true
   spark.sql.adaptive.skewedJoin.enabled true
   spark.sql.optimizer.dynamicPartitionPruning.enabled false
   spark.sql.adaptive.advisoryPartitionSizeInBytes 128M
   spark.sql.adaptive.coalescePartitions.minPartitionSize 20M
   celeborn.shuffle.rangeReadFilter.enabled true
   celeborn.shuffle.partitionSplit.mode hard
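
To make the expectation concrete, here is a minimal Python sketch of the arithmetic implied by the report: with an advisory size of 128M, an oversized partition would be expected to end up as roughly `ceil(size / 128M)` pieces. The helper names `parse_size` and `expected_splits` are hypothetical and are not Spark APIs. This is not Spark's implementation: the real rule works from shuffle map-output statistics, so actual splits can differ.

```python
import math

# Size suffixes as Spark interprets them (binary units).
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as '128M' into bytes (hypothetical helper)."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def expected_splits(partition_bytes, advisory_bytes):
    """Number of pieces a partition would need so each is at most advisory_bytes.

    Assumption: this is only the naive arithmetic a user would expect,
    not the algorithm Spark actually runs.
    """
    return max(1, math.ceil(partition_bytes / advisory_bytes))


# Value taken from the reported config above.
advisory = parse_size("128M")

# A hypothetical 1 GiB skewed partition would be expected to become 8 pieces.
splits_for_1g = expected_splits(parse_size("1G"), advisory)
```

Comparing this expected count with the number of files actually written for a skewed partition is one way to show that the split did not happen.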
   
   
   
![image](https://user-images.githubusercontent.com/20243868/227201276-3d9be34e-4e05-4c5b-869d-e35018e73174.png)
   




