mridulm commented on pull request #35683:
URL: https://github.com/apache/spark/pull/35683#issuecomment-1082010575


   > If decommissioning is enabled, shuffle blocks are already migrated to 
another node before the executor exits. However, this is best effort. So 
shuffle data might be left if shuffle blocks migration is left incomplete or 
the executor doesn't exit cleanly. Can you please clarify what are the 
expectations for shuffle data when a node is put in decommissioning state?
   
   
   I am not sure I follow.
   In k8s, where there is no shuffle service - it makes sense to move shuffle 
blocks when executor is getting decommissioned.
   In yarn, particularly with dynamic resource allocation (but even without 
it), it is very common for executors to exit - while shuffle service continues 
to serve the shuffle blocks generated on that node, which might have no active 
executors for application on them.
   
   We should not be moving shuffle blocks when an executor is exiting - only 
when the node is actually getting decommissioned (they might be the same for 
k8s, not for yarn).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to