Zhongwei Zhu created SPARK-41953:
------------------------------------

             Summary: Shuffle output location refetch during shuffle migration 
in decommission
                 Key: SPARK-41953
                 URL: https://issues.apache.org/jira/browse/SPARK-41953
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.3.1
            Reporter: Zhongwei Zhu


When shuffle migration is enabled during Spark decommission, shuffle data is 
migrated to live executors, and the latest location is then updated in 
MapOutputTracker. This has some issues:
 # Executors fetch map output locations only at the beginning of the reduce 
stage, so any change of shuffle output location in the middle of the reduce 
stage causes FetchFailed, because the reducer fetches from the old location. 
Even though stage retries can resolve this, they still waste a lot of 
resources: all shuffle read and compute done before the FetchFailed partition 
is thrown away.
 # During stage retries, fewer running tasks cause more executors to be 
decommissioned, so shuffle data locations keep changing. In the worst case, a 
stage could need many retries, further breaking the SLA.
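
The first issue can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Spark code: `FetchFailed`, `run_reducer`, and the executor names are hypothetical stand-ins, and the "tracker" is a plain dict from map id to executor.

```python
class FetchFailed(Exception):
    """Toy stand-in for a reducer's fetch failure (not Spark's actual class)."""


def run_reducer(tracker, live, migrate_after=None):
    """Fetch every map output using locations looked up once, at task start.

    tracker: dict of map id -> executor holding that output (toy tracker).
    live: set of executors that are still running.
    migrate_after: optional (map_id, src, dst); once map_id has been fetched,
        src is decommissioned and its outputs move to dst.
    Returns the number of blocks fetched; raises FetchFailed on a stale location.
    """
    snapshot = dict(tracker)  # locations are read only here, never again
    fetched = 0
    for map_id in sorted(snapshot):
        executor = snapshot[map_id]
        if executor not in live:
            # Everything fetched so far is discarded with the failed task.
            raise FetchFailed(
                f"map {map_id} gone from {executor}; {fetched} fetched block(s) wasted"
            )
        fetched += 1
        if migrate_after and migrate_after[0] == map_id:
            _, src, dst = migrate_after
            for m in tracker:
                if tracker[m] == src:
                    tracker[m] = dst  # the tracker is updated, the snapshot is not
            live.discard(src)
    return fetched
```

Calling `run_reducer` with `migrate_after=(0, "exec-2", "exec-3")` on a tracker where map 2 lives on `exec-2` raises `FetchFailed` even though the tracker already points at `exec-3`, because the reducer keeps using its initial snapshot.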

So I propose supporting a refetch of map output locations during the reduce 
phase if shuffle migration is enabled and the FetchFailed is caused by a 
decommissioned, dead executor. The detailed steps as
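
The detailed steps are not included in this message. Purely as an illustration of the general idea, and not as the reporter's design, the sketch below shows one way a reducer could refetch locations instead of failing. Every name here (`ToyTracker`, `fetch_block`, `reduce_with_refetch`, `max_refetches`) is hypothetical and none of it is Spark's API.

```python
class FetchFailed(Exception):
    """Toy stand-in for a fetch failure; records which executor was tried."""

    def __init__(self, map_id, executor):
        super().__init__(f"map {map_id} not found on {executor}")
        self.map_id = map_id
        self.executor = executor


class ToyTracker:
    """Hypothetical, simplified model of a map output tracker."""

    def __init__(self, locations):
        self.locations = dict(locations)  # map id -> executor
        self.decommissioned = set()

    def migrate(self, src, dst):
        """Move all of src's map outputs to dst and mark src decommissioned."""
        self.locations = {
            m: (dst if e == src else e) for m, e in self.locations.items()
        }
        self.decommissioned.add(src)


def fetch_block(tracker, map_id, executor):
    """Succeed only if the block currently lives on the executor we asked."""
    if tracker.locations[map_id] != executor:
        raise FetchFailed(map_id, executor)
    return f"block-{map_id}"


def reduce_with_refetch(tracker, migration_enabled=True, max_refetches=3,
                        after_fetch=None):
    """Fetch all map outputs, refreshing locations on a recoverable failure."""
    locations = dict(tracker.locations)  # initial lookup at task start
    blocks, refetches = {}, 0
    for map_id in sorted(locations):
        while map_id not in blocks:
            try:
                blocks[map_id] = fetch_block(tracker, map_id, locations[map_id])
            except FetchFailed as err:
                recoverable = (
                    migration_enabled
                    and err.executor in tracker.decommissioned
                    and refetches < max_refetches
                )
                if not recoverable:
                    raise  # assumed fallback: the existing stage-retry path
                locations = dict(tracker.locations)  # refetch, keep fetched blocks
                refetches += 1
        if after_fetch is not None:
            after_fetch(map_id)  # test hook to trigger a migration mid-reduce
    return blocks, refetches
```

In this model, a migration that happens after the first block is fetched costs one extra location lookup rather than a failed task, and already fetched blocks are kept. With `migration_enabled=False`, or when the failing executor was not decommissioned, the `FetchFailed` propagates as before.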


