[GitHub] [spark] mridulm opened a new pull request, #40307: Draft: SPARK-42689: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

via GitHub Mon, 06 Mar 2023 11:31:26 -0800


mridulm opened a new pull request, #40307:
URL: https://github.com/apache/spark/pull/40307


   ### What changes were proposed in this pull request?
   
   Currently, when an executor node is lost, Spark assumes that the shuffle data 
on that node is lost as well. This is not necessarily the case if a shuffle 
component manages the shuffle data and maintains it reliably (for example, in a 
distributed filesystem or in a disaggregated shuffle cluster). This PR lets 
`ShuffleDriverComponents` declare whether shuffle data is reliably stored.
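
To make the distinction concrete, here is a minimal, self-contained sketch of the decision described above. It is not Spark's scheduler or map output tracker code: the class `ShuffleOutputTrackerSketch`, its methods, and the `shuffleDataReliablyStored` flag are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracker (not a Spark class): remembers which executor
// produced each map output.
final class ShuffleOutputTrackerSketch {
    private final Map<Integer, String> mapOutputToExecutor = new HashMap<>();
    // Assumption: true when an external component keeps shuffle data
    // outside the executor's node.
    private final boolean shuffleDataReliablyStored;

    ShuffleOutputTrackerSketch(boolean shuffleDataReliablyStored) {
        this.shuffleDataReliablyStored = shuffleDataReliablyStored;
    }

    void registerMapOutput(int mapId, String executorId) {
        mapOutputToExecutor.put(mapId, executorId);
    }

    // Returns the map IDs whose output must be recomputed after the
    // given executor is lost, in ascending order.
    List<Integer> onExecutorLost(String executorId) {
        List<Integer> lost = new ArrayList<>();
        if (shuffleDataReliablyStored) {
            // The data lives elsewhere, so nothing is treated as lost.
            return lost;
        }
        // Otherwise assume everything written by that executor is gone.
        mapOutputToExecutor.entrySet().removeIf(entry -> {
            if (entry.getValue().equals(executorId)) {
                lost.add(entry.getKey());
                return true;
            }
            return false;
        });
        Collections.sort(lost);
        return lost;
    }
}
```

With the flag set to `false`, losing an executor invalidates every map output it registered; with the flag set to `true`, the same event invalidates nothing.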
   
   ### Why are the changes needed?
   
   Downstream projects carry patches to Apache Spark to work around this 
issue; for example, Apache Celeborn has 
[this patch](https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch).
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Enhances the `ShuffleDriverComponents` API, but defaults to current behavior.
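
As a rough illustration of how an interface can gain such a declaration while defaulting to current behavior, the sketch below uses a Java default method. `ShuffleDriverComponentsSketch` is a simplified stand-in for the real interface, and the method name `supportsReliableStorage` and both implementing classes are assumptions made for this example, not taken from this description.

```java
// Simplified stand-in for Spark's ShuffleDriverComponents; the real
// interface declares other methods that are omitted here.
interface ShuffleDriverComponentsSketch {
    // Hypothetical method name. Returning false by default keeps the
    // existing assumption that shuffle data is lost with the executor,
    // so existing implementations need no changes.
    default boolean supportsReliableStorage() {
        return false;
    }
}

// Hypothetical implementation that keeps shuffle files on executor-local
// disk and therefore inherits the default.
final class LocalDiskDriverComponents implements ShuffleDriverComponentsSketch {
}

// Hypothetical implementation backed by a remote shuffle service or a
// distributed filesystem, which opts in explicitly.
final class RemoteStoreDriverComponents implements ShuffleDriverComponentsSketch {
    @Override
    public boolean supportsReliableStorage() {
        return true;
    }
}
```

Because the method has a default body, only plugins that actually store shuffle data reliably need to override it.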
   
   
   ### How was this patch tested?
   
   Existing unit tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

