holdenk opened a new pull request, #37821:
URL: https://github.com/apache/spark/pull/37821

   ### What changes were proposed in this pull request?
   
   
   Propagate the decommission executor loss reason in K8s during onDisconnect.
   
   ### Why are the changes needed?
   Currently, if an executor has been sent a decommission message and then 
disconnects from the scheduler, we only disable the executor and depend on the 
K8s status events to drive the rest of the state transitions. However, the K8s 
status events can become overwhelmed on large clusters. We should therefore 
check whether an executor is in a decommissioning state when it disconnects and 
use that loss reason instead of waiting on the K8s status events, which gives 
us more accurate logging information.
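
The control flow described above can be sketched roughly as follows. This is a toy Python model, not the Scala code in the patch: the class `SchedulerBackendSketch`, its fields, and the reason string `"decommissioned"` are all hypothetical names chosen for illustration. It only shows the decision made at disconnect time.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SchedulerBackendSketch:
    """Toy stand-in for a scheduler backend. All names are hypothetical."""

    # Executors that have been sent a decommission message.
    decommissioning: Set[str] = field(default_factory=set)
    # Executors disabled while waiting for an external status event.
    disabled: Set[str] = field(default_factory=set)
    # Executors removed, mapped to the loss reason that was recorded.
    removed: Dict[str, str] = field(default_factory=dict)

    def decommission(self, executor_id: str) -> None:
        """Record that a decommission message was sent to the executor."""
        self.decommissioning.add(executor_id)

    def on_disconnected(self, executor_id: str) -> None:
        """Handle an executor dropping its connection to the scheduler."""
        if executor_id in self.decommissioning:
            # The loss reason is already known, so record it right away
            # instead of waiting for a cluster-manager status event.
            self.decommissioning.discard(executor_id)
            self.removed[executor_id] = "decommissioned"
        else:
            # Reason unknown: only disable the executor and let a later
            # status event drive the remaining state transitions.
            self.disabled.add(executor_id)


backend = SchedulerBackendSketch()
backend.decommission("exec-1")
backend.on_disconnected("exec-1")  # known to be decommissioning
backend.on_disconnected("exec-2")  # no decommission message was sent
```

In this sketch, `exec-1` ends up in `removed` with the decommission reason, while `exec-2` is only placed in `disabled`, which mirrors the two paths the description distinguishes.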
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Logging output will change.
   
   ### How was this patch tested?
   Existing unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


