mridulm edited a comment on pull request #32114:
URL: https://github.com/apache/spark/pull/32114#issuecomment-819120744


   I am a little confused by the gap between the PR description and the 
subsequent discussion.
   What exactly is the behavior we are trying to converge towards or address?
   
   When the heartbeat master expires an executor, it not only sends a 
`StopExecutor` to get the executor to exit voluntarily, but also gets the cluster 
manager to force termination (in case of a missing or hung executor). So in steady 
state, once transitional/overlapping updates are done, the executor should be 
gone as far as the driver is concerned.
   
   My understanding was that there is a race here between the cluster manager 
notifying the application (after killing the executor) and the executor's 
heartbeat/block manager re-registration, which ends up causing a dead executor 
to be marked live indefinitely.
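
To make the suspected ordering concrete, here is a toy model of that race. It is plain Python, not Spark code: `ToyDriver`, `register`, `on_executor_lost`, and `replay` are hypothetical names made up for illustration, and the model assumes the driver keeps nothing but a set of executors it believes are live.

```python
# Toy model of the suspected race; not Spark code, all names are hypothetical.
class ToyDriver:
    def __init__(self):
        self.live = set()  # executors the driver believes are alive

    def register(self, executor_id):
        # Stands in for executor / block manager (re-)registration.
        self.live.add(executor_id)

    def on_executor_lost(self, executor_id):
        # Stands in for the cluster manager reporting that it killed the executor.
        self.live.discard(executor_id)


def replay(events):
    """Apply (kind, executor_id) events in order and return the live set."""
    driver = ToyDriver()
    for kind, executor_id in events:
        if kind == "register":
            driver.register(executor_id)
        elif kind == "lost":
            driver.on_executor_lost(executor_id)
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return driver.live


# Benign ordering: the re-registration lands before the loss notification.
benign = [("register", "exec-1"), ("register", "exec-1"), ("lost", "exec-1")]
# Racy ordering: the loss notification lands first, then a stale re-registration.
racy = [("register", "exec-1"), ("lost", "exec-1"), ("register", "exec-1")]

print(replay(benign))  # set()
print(replay(racy))    # {'exec-1'}
```

In the racy ordering nothing ever removes `exec-1` again, which is the "dead executor marked live indefinitely" outcome as I understand it. Whether the real code paths can interleave this way is exactly what I am asking about.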
   
   Is this the only case we are addressing? Or are there other paths that 
are impacted?
   
   (@Ngone51, I am not sure whether standalone mode has nuances that I am missing here.)



