[jira] [Updated] (SPARK-44635) Handle shuffle fetch failures in decommissions

2023-09-11 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-44635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-44635:
---
Labels: pull-request-available  (was: )

> Handle shuffle fetch failures in decommissions
> --
>
> Key: SPARK-44635
> URL: https://issues.apache.org/jira/browse/SPARK-44635
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Bo Zhang
> Priority: Major
> Labels: pull-request-available
>
> Spark's decommission feature supports migrating shuffle data. However, a 
> shuffle data fetcher looks up the block location (`BlockManagerId`) only 
> once, when it is initialized. If a shuffle read task runs long enough for 
> the data to be migrated in the meantime, the fetcher still targets the 
> stale location, which can lead to shuffle fetch failures.
>
> To mitigate this, shuffle data fetchers should be able to look up the 
> updated locations after a decommission and fetch from there instead.
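
The proposed mitigation can be illustrated with a minimal sketch. This is not
Spark code and does not reflect the actual patch: `FetchFailed`,
`fetch_with_relocation`, `lookup_location`, `fetch`, and `max_relocations` are
all hypothetical names, and locations are modeled as plain strings rather than
`BlockManagerId` objects. It only shows the idea of re-resolving a block's
location after a failed fetch instead of trusting the location captured at
initialization.

```python
from typing import Callable, Optional


class FetchFailed(Exception):
    """Hypothetical error: a block could not be read from a given location."""


def fetch_with_relocation(
    block_id: str,
    initial_location: str,
    lookup_location: Callable[[str], Optional[str]],
    fetch: Callable[[str, str], bytes],
    max_relocations: int = 3,
) -> bytes:
    """Fetch a block, re-resolving its location after each failed attempt.

    lookup_location stands in for whatever service tracks where migrated
    shuffle data now lives; fetch stands in for the actual network read.
    """
    location = initial_location
    for _ in range(max_relocations + 1):
        try:
            return fetch(block_id, location)
        except FetchFailed:
            # Ask for the current location instead of reusing the stale one.
            updated = lookup_location(block_id)
            if updated is None or updated == location:
                # The block has not moved, so this is a genuine failure.
                raise
            location = updated
    raise FetchFailed(f"{block_id}: gave up after {max_relocations} relocations")
```

Bounding the number of relocations keeps a task from chasing a block
indefinitely if the reported location keeps changing without ever serving
the data.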



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44635) Handle shuffle fetch failures in decommissions

2023-08-02 Thread Bo Zhang (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-44635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Zhang updated SPARK-44635:
-
Description: 
Spark's decommission feature supports migrating shuffle data. However, a 
shuffle data fetcher looks up the block location (`BlockManagerId`) only once, 
when it is initialized. If a shuffle read task runs long enough for the data 
to be migrated in the meantime, the fetcher still targets the stale location, 
which can lead to shuffle fetch failures.

To mitigate this, shuffle data fetchers should be able to look up the updated 
locations after a decommission and fetch from there instead.

  was: Spark's decommission feature supports migrating shuffle data. However, 
a shuffle data fetcher looks up the block location (`BlockManagerId`) only 
once, when it is initialized. This can lead to shuffle fetch failures when 
shuffle read tasks are long-running.




