Wan Kun created SPARK-40096:
-------------------------------

             Summary: Finalize shuffle merge is slow when connection creation fails
                 Key: SPARK-40096
                 URL: https://issues.apache.org/jira/browse/SPARK-40096
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Wan Kun


*How to reproduce this issue*
 * Enable push-based shuffle.
 * Remove some merger nodes before the finalize RPCs are sent.
 * The driver tries to connect to the shuffle services on those mergers and 
sends the finalize RPCs one by one. Each connection attempt times out only after 
SPARK_NETWORK_IO_CONNECTIONCREATIONTIMEOUT_KEY (120s by default), so the total 
delay grows with the number of removed mergers.

 
We can send these RPCs in the *shuffleMergeFinalizeScheduler* thread pool and 
handle the connection creation exception for each merger.
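
To illustrate the idea, here is a minimal, self-contained Java sketch. It is not 
Spark's code: FinalizeSketch, sendFinalize, finalizeAll and the merger names are 
hypothetical, and the connection timeout is simulated with a short sleep instead 
of a real 120s network timeout. It only shows that submitting one task per 
merger to a thread pool, and catching the connection failure inside each task, 
bounds the wait to roughly one timeout instead of one timeout per dead merger.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FinalizeSketch {
    // Hypothetical stand-in for "connect to the merger and send the finalize RPC".
    // An unreachable merger blocks for timeoutMs and then fails, which mimics a
    // connection creation timeout.
    static void sendFinalize(String merger, Set<String> unreachable, long timeoutMs)
            throws IOException, InterruptedException {
        if (unreachable.contains(merger)) {
            Thread.sleep(timeoutMs);
            throw new IOException("Connection creation timed out: " + merger);
        }
    }

    // Submits one task per merger so that a dead merger delays only its own task.
    // Returns the mergers whose finalize RPC could not be sent.
    static List<String> finalizeAll(List<String> mergers, Set<String> unreachable,
                                    long timeoutMs, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String merger : mergers) {
                futures.add(pool.submit(() -> {
                    try {
                        sendFinalize(merger, unreachable, timeoutMs);
                        return true;
                    } catch (IOException e) {
                        // Handle the connection failure here instead of
                        // letting it abort the whole finalize step.
                        return false;
                    }
                }));
            }
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < mergers.size(); i++) {
                try {
                    if (!futures.get(i).get()) {
                        failed.add(mergers.get(i));
                    }
                } catch (ExecutionException e) {
                    // Any other task failure also counts as "not finalized".
                    failed.add(mergers.get(i));
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> mergers = List.of("merger-1", "merger-2", "merger-3", "merger-4");
        Set<String> removed = Set.of("merger-2", "merger-3", "merger-4");
        long start = System.nanoTime();
        List<String> failed = finalizeAll(mergers, removed, 300, mergers.size());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("failed=" + failed + " elapsedMs=" + elapsedMs);
    }
}
```

With three removed mergers and a simulated 300 ms timeout, a sequential loop 
would wait about 900 ms, while the pooled version waits about one timeout, as 
long as the pool has at least as many threads as there are unreachable mergers.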



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

