Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/8887#issuecomment-148865197
Sorry, you're right, in your code the DAG scheduler is updated right away.
Only the task set manager is not.
The delay is hopefully not that long. As part of this change I'm waking up
the monitor thread in the YARN AM and that should get the reason for recent
exits right away... but "right away" still depends on the RM having been
updated by the NMs with that info, and responding to the AM, and the AM
notifying the driver about it. If the reason doesn't arrive on the first try,
each later attempt takes longer, since the AM tries not to overload the RM with requests.
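
To give a rough feel for how the wait can grow when the first request misses, here is a minimal sketch of a capped, doubling retry interval. This is only an illustration: the function names, the doubling rule, and the 200 ms / 3000 ms values are assumptions made for the example, not the intervals or logic the YARN AM actually uses.

```python
def poll_delays(initial_ms, max_ms, attempts):
    """Return the wait before each retry, doubling each time up to a cap.

    Hypothetical helper: the doubling policy and the parameters are
    assumptions for illustration, not Spark's real AM behavior.
    """
    delays = []
    delay = initial_ms
    for _ in range(attempts):
        delays.append(delay)
        # Back off so repeated misses do not flood the RM with requests.
        delay = min(delay * 2, max_ms)
    return delays


def total_wait_ms(initial_ms, max_ms, attempts):
    """Total time spent waiting if every one of `attempts` retries is needed."""
    return sum(poll_delays(initial_ms, max_ms, attempts))


if __name__ == "__main__":
    print(poll_delays(200, 3000, 6))
    print(total_wait_ms(200, 3000, 6))
```

Under these made-up numbers, a reason that shows up on the first retry costs 200 ms, while one that needs six retries costs several seconds in total, which is the kind of delay I mean above.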
Also, another thing is that this can become a building block for doing more
intelligent things with #7786; in that case, we're proactively removing
executors from the pool, so the executor's process hasn't really gone away yet.
But anyway, if you don't feel like that's important I can apply your
suggestion and go with the simpler approach. I just didn't feel my original
approach was that complicated to start with.