mridulm edited a comment on pull request #32114: URL: https://github.com/apache/spark/pull/32114#issuecomment-819120744
I am getting a little confused between the PR description and the subsequent discussion. What exactly is the behavior we are trying to converge towards or address? Expiring an executor from the heartbeat master not only sends a `StopExecutor` to get the executor to exit voluntarily, but also gets the cluster manager to force termination (in case of an MIA/hung executor). So in steady state, once transitional/overlapping updates are done, the executor should be gone as far as the driver is concerned. My understanding was that there is a race here between the cluster manager notifying the application (after killing the executor) and the executor's heartbeat/blockmanager re-registration, which ends up causing a dead executor to be marked live indefinitely. Is this the only case we are addressing, or are there other paths that are impacted? (@Ngone51 Not sure if standalone has nuances that I am missing here.)
