seayoun commented on issue #26975: [SPARK-26975][CORE] Stage retry and executor crash cause app hung up forever
URL: https://github.com/apache/spark/pull/26975#issuecomment-568658757

> I'm not sure killing tasks can work. There is no guarantee that a task can always be killed successfully. And even if we can, we may send out the kill request, and immediately get the executor lost event before the task is killed.
>
> I think we should accept the fact that a running task may be useless as its corresponding partition is completed, and deal with it well. e.g. when seeing executor lost, don't reschedule tasks whose corresponding partitions are already completed.

I think it doesn't matter. If the driver gets the executor lost event before the task is killed, the TSM will call `handleFailedTask` and will not reschedule the task. Also, if the task finishes before it is killed, the app processes its success or failure status in `handleSuccessfulTask` or `handleFailedTask`: in `handleSuccessfulTask` we mark it as `Killed(another stage succeeded)`, and in `handleFailedTask` we do not reschedule the task.
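To make the argument concrete, here is a minimal toy model of the behavior described above. It is not Spark's implementation: `ToyTaskSetManager`, its fields, and its method names are hypothetical stand-ins that only mirror the roles of the TSM, `handleSuccessfulTask`, and `handleFailedTask`, assuming the TSM knows which partitions another stage attempt has already completed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTaskSetManager:
    """Hypothetical, simplified stand-in for a task set manager (not Spark code)."""
    completed: set = field(default_factory=set)   # partitions finished by another attempt
    pending: list = field(default_factory=list)   # partitions queued for rescheduling

    def mark_partition_completed(self, partition: int) -> None:
        # Another stage attempt finished this partition; it no longer needs to run here.
        self.completed.add(partition)
        if partition in self.pending:
            self.pending.remove(partition)

    def handle_successful_task(self, partition: int) -> str:
        # A task that finishes before the kill arrives is recorded as killed
        # if its partition was already completed elsewhere.
        if partition in self.completed:
            return "Killed(another stage succeeded)"
        self.completed.add(partition)
        return "Success"

    def handle_failed_task(self, partition: int) -> bool:
        # Returns True if the task was queued for rescheduling.
        if partition in self.completed:
            return False  # partition already done: do not reschedule
        self.pending.append(partition)
        return True

    def handle_executor_lost(self, running_partitions: list) -> None:
        # Every task that was running on the lost executor is treated as failed.
        for partition in running_partitions:
            self.handle_failed_task(partition)


tsm = ToyTaskSetManager()
tsm.mark_partition_completed(0)      # partition 0 completed by another attempt
tsm.handle_executor_lost([0, 1])     # executor running partitions 0 and 1 is lost
print(tsm.pending)                   # only partition 1 is queued again
```

In this sketch the order of the kill request and the executor lost event does not matter: either path ends in the failure handler, which skips partitions that are already completed.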
