mridulm opened a new pull request, #40307:
URL: https://github.com/apache/spark/pull/40307

   ### What changes were proposed in this pull request?
   
   Currently, when an executor node is lost, Spark assumes that the shuffle 
data on that node is lost as well. This is not necessarily the case if a shuffle 
component manages the shuffle data and maintains it reliably (for example, 
in a distributed filesystem or in a disaggregated shuffle cluster).
   
   ### Why are the changes needed?
   
   Downstream projects carry patches to Apache Spark to work around this 
issue. For example, Apache Celeborn has 
[this patch](https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch).
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Enhances the `ShuffleDriverComponents` API, but defaults to the current behavior.
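
A minimal, self-contained sketch of what an opt-in of this kind could look like. It does not use Spark's real classes: `ShuffleDriverComponentsSketch`, the method name `supportsReliableStorage`, the two implementations, and `shouldDropShuffleOutputs` are all assumptions made for illustration. The point it shows is that a default method returning `false` keeps the current behavior for existing plugins, while a plugin that stores shuffle data reliably can override it.

```java
// Hypothetical stand-in for Spark's ShuffleDriverComponents; not the real interface.
interface ShuffleDriverComponentsSketch {
    // Assumed method name. Defaulting to false preserves the current behavior:
    // shuffle data is treated as lost together with the executor node.
    default boolean supportsReliableStorage() {
        return false;
    }
}

// Hypothetical plugin whose shuffle data lives on executor-local disks.
// It inherits the default, so nothing changes for it.
final class LocalDiskShuffleComponents implements ShuffleDriverComponentsSketch {
}

// Hypothetical plugin backed by a distributed filesystem or a
// disaggregated shuffle cluster, which outlives any single executor node.
final class RemoteShuffleComponents implements ShuffleDriverComponentsSketch {
    @Override
    public boolean supportsReliableStorage() {
        return true;
    }
}

final class ReliableShuffleSketch {
    // Hypothetical helper: should the map outputs of a lost executor be discarded?
    // They are discarded only when the plugin does not store them reliably.
    static boolean shouldDropShuffleOutputs(ShuffleDriverComponentsSketch components) {
        return !components.supportsReliableStorage();
    }

    public static void main(String[] args) {
        System.out.println(shouldDropShuffleOutputs(new LocalDiskShuffleComponents()));
        System.out.println(shouldDropShuffleOutputs(new RemoteShuffleComponents()));
    }
}
```

With a default method, existing implementations compile and behave as before; only a plugin that explicitly overrides the method changes how executor loss is handled.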
   
   
   ### How was this patch tested?
   
   Existing unit tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

