[ https://issues.apache.org/jira/browse/SPARK-26907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-26907.
----------------------------------
    Resolution: Invalid

> Does ShuffledRDD Replication Work With External Shuffle Service
> ---------------------------------------------------------------
>
>                 Key: SPARK-26907
>                 URL: https://issues.apache.org/jira/browse/SPARK-26907
>             Project: Spark
>          Issue Type: Question
>          Components: Block Manager, YARN
>    Affects Versions: 2.3.2
>            Reporter: Han Altae-Tran
>            Priority: Major
>
> I am interested in high-replication environments for extreme fault 
> tolerance (e.g. 10x replication), but have noticed that when using 
> groupBy or groupWith followed by persist (with 10x replication), a single 
> node failure can still fail the entire stage with FetchFailedException.
>  
> Is this because the External Shuffle Service writes and serves intermediate 
> shuffle data only to/from the local disk attached to the executor that 
> generated it, causing Spark to ignore replicated shuffle data (from 
> the persist) that could be served elsewhere? If so, is there any way to 
> increase the replication factor of the External Shuffle Service to make it 
> fault tolerant?
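
For reference, the replication factor the question refers to is set when persisting an RDD, and can be sketched as below (a hypothetical Scala snippet, assuming an existing SparkContext `sc`; note that this replication applies only to the persisted blocks managed by the BlockManager, not to the shuffle files served by the External Shuffle Service):

```scala
import org.apache.spark.storage.StorageLevel

// Built-in levels only go up to 2x replication (e.g. MEMORY_AND_DISK_2);
// higher factors require constructing a custom StorageLevel.
// StorageLevel(useDisk, useMemory, useOffHeap, deserialized, replication)
val tenX = StorageLevel(true, true, false, true, 10)

val grouped = sc
  .parallelize(1 to 1000)
  .map(x => (x % 10, x))
  .groupByKey()

// Replicates the persisted partition blocks 10x across executors.
// The shuffle data produced by groupByKey still lives only on the
// local disk of the executor that wrote it, which is why losing that
// node can surface a FetchFailedException despite the replication.
grouped.persist(tenX)
```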



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org