Wan Kun created SPARK-40096:
-------------------------------

             Summary: Finalize shuffle merge slow due to connection creation fails
                 Key: SPARK-40096
                 URL: https://issues.apache.org/jira/browse/SPARK-40096
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Wan Kun

*How to reproduce this issue*
 * Enable push-based shuffle.
 * Remove some merger nodes before the finalize RPCs are sent.
 * The driver tries to connect to the shuffle services on those merger nodes and sends the finalize RPCs one by one. Each connection attempt times out after SPARK_NETWORK_IO_CONNECTIONCREATIONTIMEOUT_KEY (120s by default), so the delays add up across the removed nodes.

We can send these RPCs in the *shuffleMergeFinalizeScheduler* thread pool and handle the connection creation exception.
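
The proposal above can be illustrated with a minimal, self-contained sketch. It is not Spark's code: the class `FinalizeSketch`, the methods `sendFinalizeRpc` and `finalizeAll`, and the merger names are all hypothetical, a plain `java.util.concurrent` pool stands in for the *shuffleMergeFinalizeScheduler* thread pool, and a short sleep simulates the connection creation timeout. It only shows the idea of sending each finalize RPC as its own task and catching the connection failure per merger.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FinalizeSketch {

    // Hypothetical stand-in for "create a connection and send one finalize RPC".
    // A removed merger blocks for the connection timeout and then fails.
    static void sendFinalizeRpc(String merger, Set<String> removed, long connectTimeoutMs)
            throws IOException, InterruptedException {
        if (removed.contains(merger)) {
            Thread.sleep(connectTimeoutMs);
            throw new IOException("Connection creation to " + merger + " timed out");
        }
    }

    // Sends one finalize RPC per merger as a separate pool task.
    // Returns merger -> true if the RPC was sent, false if it failed.
    static Map<String, Boolean> finalizeAll(List<String> mergers, Set<String> removed,
                                            long connectTimeoutMs) {
        // Assumption: a plain fixed pool stands in for the scheduler thread pool.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, mergers.size()));
        Map<String, Boolean> results = new ConcurrentHashMap<>();
        for (String merger : mergers) {
            pool.execute(() -> {
                try {
                    sendFinalizeRpc(merger, removed, connectTimeoutMs);
                    results.put(merger, true);
                } catch (IOException e) {
                    // Handle the connection creation failure for this merger only.
                    results.put(merger, false);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    results.put(merger, false);
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for all tasks; the bound is generous relative to one timeout.
            pool.awaitTermination(connectTimeoutMs * 10 + 1000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> mergers = List.of("merger-1", "merger-2", "merger-3", "merger-4");
        Set<String> removed = Set.of("merger-2", "merger-4");
        long start = System.nanoTime();
        Map<String, Boolean> results = finalizeAll(mergers, removed, 200L);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(results + " in about " + elapsedMs + " ms");
    }
}
```

In this sketch the removed mergers time out concurrently, so the total wait is roughly one timeout rather than one timeout per removed merger, which is what sending the RPCs one by one would cost.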