mridulm opened a new pull request, #40307: URL: https://github.com/apache/spark/pull/40307
### What changes were proposed in this pull request?

Currently, when an executor node is lost, we assume that the shuffle data on that node is lost as well. This is not necessarily the case if a shuffle component manages the shuffle data and maintains it reliably (for example, in a distributed filesystem or in a disaggregated shuffle cluster).

### Why are the changes needed?

Downstream projects carry patches to Apache Spark in order to work around this issue. For example, Apache Celeborn has [this](https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch).

### Does this PR introduce _any_ user-facing change?

Enhances the `ShuffleDriverComponents` API, but defaults to the current behavior.

### How was this patch tested?

Existing unit tests.
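
To make the idea above concrete, here is a minimal, self-contained Java sketch of how a driver-side component might tell a scheduler that shuffle data survives executor loss. Every name in it (`ShuffleDriverPlugin`, `storesShuffleDataReliably`, `ExecutorLossHandler`) is hypothetical and chosen for illustration only; it is not the interface or method this PR adds, and it does not reflect Spark's actual scheduler code.

```java
import java.util.*;

// Hypothetical stand-in for a driver-side shuffle plugin; NOT Spark's real interface.
interface ShuffleDriverPlugin {
    // Defaults to false, mirroring the current behavior described above:
    // shuffle data is assumed to be lost together with the executor.
    default boolean storesShuffleDataReliably() {
        return false;
    }
}

// Hypothetical bookkeeping that decides what to recompute when an executor is lost.
final class ExecutorLossHandler {
    private final ShuffleDriverPlugin plugin;
    // executor id -> shuffle ids that have map output registered on that executor
    private final Map<String, Set<Integer>> outputsByExecutor = new HashMap<>();

    ExecutorLossHandler(ShuffleDriverPlugin plugin) {
        this.plugin = plugin;
    }

    void registerOutput(String executorId, int shuffleId) {
        outputsByExecutor.computeIfAbsent(executorId, k -> new TreeSet<>()).add(shuffleId);
    }

    // Returns the shuffle ids whose map output must be recomputed.
    Set<Integer> onExecutorLost(String executorId) {
        if (plugin.storesShuffleDataReliably()) {
            // Data lives outside the executor (e.g. a remote shuffle service),
            // so nothing is unregistered and nothing needs recomputing.
            return Collections.emptySet();
        }
        Set<Integer> lost = outputsByExecutor.remove(executorId);
        return lost == null ? Collections.emptySet() : lost;
    }
}
```

With the default plugin, losing an executor invalidates every shuffle output registered on it. A plugin that overrides the method to return `true` leaves the registrations untouched, which is the behavior a reliable external shuffle store would want.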