Zhongwei Zhu created SPARK-41953:
------------------------------------

             Summary: Shuffle output location refetch during shuffle migration in decommission
                 Key: SPARK-41953
                 URL: https://issues.apache.org/jira/browse/SPARK-41953
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.3.1
            Reporter: Zhongwei Zhu

When shuffle migration is enabled during Spark decommission, shuffle data is migrated to live executors, and the latest location is then updated in MapOutputTracker. This has some issues:

# Executors fetch map output locations only at the beginning of the reduce stage, so any shuffle output location change in the middle of the reduce stage causes a FetchFailed, because the reducer fetches from the old location. Even though stage retries can resolve this, it still wastes a lot of resources, since all shuffle reads and computation done before the FetchFailed partition are thrown away.
# During stage retries, fewer running tasks cause more executors to be decommissioned, so shuffle data locations keep changing. In the worst case, a stage could need many retries, further breaking the SLA.

So I propose supporting a refetch of map output locations during the reduce phase if shuffle migration is enabled and the FetchFailed is caused by a decommissioned dead executor. The detailed steps as
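
The proposal can be illustrated with a small toy simulation. This is only a sketch and does not reproduce the detailed steps of the issue: every name in it (`ToyMapOutputTracker`, `fetch_block`, `reduce_with_refetch`, `max_refetches`) is hypothetical and not a Spark API. It assumes the reducer holds a snapshot of locations taken at the start of the reduce stage and, on a fetch failure from a decommissioned executor, asks the tracker for the latest locations instead of failing the stage.

```python
from dataclasses import dataclass, field


class FetchFailed(Exception):
    """Raised when a map output cannot be read from the requested executor."""


@dataclass
class ToyMapOutputTracker:
    """Hypothetical stand-in for the driver-side tracker: map id -> executor id."""
    locations: dict = field(default_factory=dict)
    decommissioned: set = field(default_factory=set)

    def migrate(self, map_id, new_executor):
        # Shuffle migration: mark the old executor as decommissioned
        # and record the new location of the map output.
        self.decommissioned.add(self.locations[map_id])
        self.locations[map_id] = new_executor


def fetch_block(executors, executor_id, map_id):
    """Read one map output from an executor's local store, or fail."""
    store = executors.get(executor_id, {})
    if map_id not in store:
        raise FetchFailed(f"map {map_id} not found on {executor_id}")
    return store[map_id]


def reduce_with_refetch(tracker, executors, cached, map_ids, max_refetches=3):
    """Fetch all map outputs, refreshing `cached` locations when they go stale."""
    cached = dict(cached)  # snapshot taken at the start of the reduce stage
    results = []
    refetches = 0
    for map_id in map_ids:
        while True:
            try:
                results.append(fetch_block(executors, cached[map_id], map_id))
                break
            except FetchFailed:
                stale = cached[map_id]
                # Refetch only if the failure comes from a decommissioned
                # executor and the refetch budget is not exhausted; otherwise
                # propagate the failure (which would trigger a stage retry).
                if stale not in tracker.decommissioned or refetches >= max_refetches:
                    raise
                cached = dict(tracker.locations)
                refetches += 1
    return results, refetches


# Demo: map output 1 is migrated after the reducer took its snapshot.
tracker = ToyMapOutputTracker(locations={0: "exec-1", 1: "exec-2"})
executors = {"exec-1": {0: b"a"}, "exec-2": {1: b"b"}, "exec-3": {}}
snapshot = dict(tracker.locations)

executors["exec-3"][1] = executors["exec-2"].pop(1)
tracker.migrate(1, "exec-3")

results, refetches = reduce_with_refetch(tracker, executors, snapshot, [0, 1])
print(results, refetches)
```

In this toy run the output already read for map 0 is kept, and only the stale location for map 1 is refreshed, which is the resource saving the proposal is after. The `max_refetches` bound is an assumption added here so that an executor that is decommissioned but not yet migrated cannot cause an endless loop.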