[jira] [Created] (SPARK-41953) Shuffle output location refetch during shuffle migration in decommission

Zhongwei Zhu (Jira) Mon, 09 Jan 2023 17:34:06 -0800

Zhongwei Zhu created SPARK-41953:
------------------------------------

             Summary: Shuffle output location refetch during shuffle migration 
in decommission
                 Key: SPARK-41953
                 URL: https://issues.apache.org/jira/browse/SPARK-41953
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.3.1
            Reporter: Zhongwei Zhu



When shuffle migration is enabled during Spark decommission, shuffle data is 
migrated to live executors and the latest locations are then updated in the 
MapOutputTracker. This has some issues:
 # Executors fetch map output locations only at the beginning of the reduce 
stage, so any shuffle output location change in the middle of the reduce stage 
causes FetchFailed because the reducer fetches from the old location. Even 
though stage retries can solve this, they still waste a lot of resources: all 
shuffle reads and computation done before the partition that hit FetchFailed 
are thrown away.
 # During stage retries, fewer running tasks cause more executors to be 
decommissioned, so shuffle data locations keep changing. In the worst case, a 
stage could need many retries, further breaking the SLA.
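
The first issue can be illustrated with a toy model. This is a minimal sketch 
in plain Python, not Spark code: ToyMapOutputTracker, reduce_with_snapshot and 
the executor/map ids are hypothetical names chosen for illustration, and the 
"migrations" argument simply simulates a decommission happening mid-stage.

```python
# Toy model only; none of these names are Spark APIs.

class FetchFailed(Exception):
    """Raised when a reducer asks an executor that no longer serves the block."""


class ToyMapOutputTracker:
    """Maps each map output id to the executor currently holding it."""

    def __init__(self, locations):
        self.locations = dict(locations)

    def snapshot(self):
        # Reducers copy the locations once, at the start of the reduce stage.
        return dict(self.locations)

    def migrate(self, map_id, new_executor):
        # Migration updates the tracker, but not copies already handed out.
        self.locations[map_id] = new_executor


def reduce_with_snapshot(tracker, live, migrations):
    """Fetch all map outputs using a single up-front location snapshot.

    migrations maps a step index to (map_id, old_executor, new_executor) and
    is applied just before that step to simulate a decommission.
    """
    cached = tracker.snapshot()
    fetched = []
    for step, map_id in enumerate(sorted(cached)):
        if step in migrations:
            moved_id, old_exec, new_exec = migrations[step]
            tracker.migrate(moved_id, new_exec)
            live.discard(old_exec)
        if cached[map_id] not in live:
            # The stale snapshot still points at the decommissioned executor.
            raise FetchFailed(
                f"{map_id}: {cached[map_id]} is gone after "
                f"{len(fetched)} completed fetches"
            )
        fetched.append(map_id)
    return fetched


if __name__ == "__main__":
    tracker = ToyMapOutputTracker(
        {"m0": "exec-2", "m1": "exec-2", "m2": "exec-1"}
    )
    live = {"exec-1", "exec-2", "exec-3"}
    try:
        reduce_with_snapshot(tracker, live, {2: ("m2", "exec-1", "exec-3")})
    except FetchFailed as err:
        print("FetchFailed:", err)
```

In this model the tracker already knows the new location of m2 when the fetch 
fails, yet the work done for m0 and m1 is lost because the reducer never looks 
at the tracker again.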

So I propose supporting a refetch of map output locations during the reduce 
phase if shuffle migration is enabled and the FetchFailed is caused by a 
decommissioned, dead executor. The detailed steps as
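
The detailed steps are not included in this message. The general idea of the 
proposal can still be sketched with a toy model. This is plain Python, not 
Spark code, and it does not reproduce the reporter's steps: 
ToyMapOutputTracker, reduce_with_refetch, the "decommissioned" set and the 
"migration_enabled" flag are all hypothetical names, and a single refetch per 
failure is an assumption made to keep the example short.

```python
# Toy model only; none of these names are Spark APIs.

class FetchFailed(Exception):
    """Raised when a reducer asks an executor that no longer serves the block."""


class ToyMapOutputTracker:
    """Maps each map output id to the executor currently holding it."""

    def __init__(self, locations):
        self.locations = dict(locations)

    def snapshot(self):
        return dict(self.locations)

    def migrate(self, map_id, new_executor):
        self.locations[map_id] = new_executor


def fetch(executor, map_id, live):
    """Pretend to fetch one block; fail if the executor is no longer live."""
    if executor not in live:
        raise FetchFailed(f"{map_id} is not available on {executor}")
    return f"{map_id}@{executor}"


def reduce_with_refetch(tracker, live, decommissioned, migrations,
                        migration_enabled=True):
    """Fetch all map outputs, refreshing locations on a qualifying failure.

    migrations maps a step index to (map_id, old_executor, new_executor) and
    is applied just before that step to simulate a decommission.
    """
    cached = tracker.snapshot()
    fetched = []
    for step, map_id in enumerate(sorted(cached)):
        if step in migrations:
            moved_id, old_exec, new_exec = migrations[step]
            tracker.migrate(moved_id, new_exec)
            live.discard(old_exec)
            decommissioned.add(old_exec)
        try:
            fetched.append(fetch(cached[map_id], map_id, live))
        except FetchFailed:
            stale = cached[map_id]
            # Refetch only under the two conditions named in the proposal.
            if not (migration_enabled and stale in decommissioned):
                raise
            cached = tracker.snapshot()
            if cached[map_id] == stale:
                raise  # the tracker has no newer location, so give up
            fetched.append(fetch(cached[map_id], map_id, live))
    return fetched


if __name__ == "__main__":
    tracker = ToyMapOutputTracker(
        {"m0": "exec-2", "m1": "exec-2", "m2": "exec-1"}
    )
    live = {"exec-1", "exec-2", "exec-3"}
    result = reduce_with_refetch(
        tracker, live, set(), {2: ("m2", "exec-1", "exec-3")}
    )
    print(result)
```

With the refetch in place, the blocks fetched before the failure are kept and 
only the failed block is retried against its new location. Any other 
FetchFailed still propagates and is left to the normal stage retry.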



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

