[ https://issues.apache.org/jira/browse/SPARK-32217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153157#comment-17153157 ]
Apache Spark commented on SPARK-32217:
--------------------------------------

User 'agrawaldevesh' has created a pull request for this issue:
https://github.com/apache/spark/pull/29032

> Track whether the worker is also being decommissioned along with an executor
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-32217
>                 URL: https://issues.apache.org/jira/browse/SPARK-32217
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Devesh Agrawal
>            Priority: Major
>
> When an executor is decommissioned, we would like to know whether its
> shuffle data is truly going to be lost. With the external shuffle service,
> this means knowing that the worker (the node the executor runs on) is also
> going to be lost.
>
> (We don't need to worry about disaggregated remote shuffle storage at
> present, since it is only used at a couple of web companies; when shuffle
> data is stored remotely, it is not lost with a decommissioned executor.)
>
> We know for sure that a worker is being decommissioned when the Master is
> asked to decommission it. For other schedulers:
> * YARN support for decommissioning isn't implemented yet. The idea would be
>   for YARN preemption not to mark the worker as lost, while machine-level
>   decommissioning (e.g. for kernel upgrades) would mark it as lost.
> * K8s doesn't quite work with the external shuffle service yet, so when the
>   executor is lost, the worker isn't lost with it.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
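The distinction the issue draws — executor-only decommission versus whole-worker decommission — can be sketched as a small Scala model. This is a hedged illustration only: the names `ExecutorDecommissionInfo`, `workerDecommissioned`, and `isShuffleDataLost` are assumptions for this sketch, not necessarily the identifiers introduced by the linked PR.

```scala
// Sketch of the decommission signal described in the issue above.
// All names here are illustrative, not the actual Spark internals.

/** Info attached when an executor is asked to decommission. */
case class ExecutorDecommissionInfo(
    message: String,
    // True when the whole worker/host is going away (e.g. the Master was
    // asked to decommission the worker), not just this one executor.
    workerDecommissioned: Boolean)

/** Shuffle data served by an external shuffle service outlives the executor
  * itself, but not the loss of the whole worker. Without the external
  * shuffle service, shuffle files live and die with the executor. */
def isShuffleDataLost(
    info: ExecutorDecommissionInfo,
    externalShuffleServiceEnabled: Boolean): Boolean = {
  if (externalShuffleServiceEnabled) info.workerDecommissioned
  else true
}
```

Under this sketch, a YARN-preemption-style event would carry `workerDecommissioned = false` (shuffle data survives behind the external shuffle service), while a kernel-upgrade-style host decommission would carry `workerDecommissioned = true`.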