mridulm commented on pull request #35683: URL: https://github.com/apache/spark/pull/35683#issuecomment-1082010575
> If decommissioning is enabled, shuffle blocks are already migrated to another node before the executor exits. However, this is best effort. So shuffle data might be left if shuffle blocks migration is left incomplete or the executor doesn't exit cleanly. Can you please clarify what are the expectations for shuffle data when a node is put in decommissioning state? I am not sure I follow. In k8s, where there is no shuffle service - it makes sense to move shuffle blocks when executor is getting decommissioned. In yarn, particularly with dynamic resource allocation (but even without it), it is very common for executors to exit - while shuffle service continues to serve the shuffle blocks generated on that node, which might have no active executors for application on them. We should not be moving shuffle blocks when an executor is exiting - only when the node is actually getting decommissioned (they might be the same for k8s, not for yarn). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
