GitHub user tgravescs commented on the issue:
https://github.com/apache/spark/pull/14162
So we had specifically decided on this behavior when this was first
written. The reason is that a problem with the Spark shuffle service shouldn't
stop other services on the NM from running fine, i.e. the MapReduce shuffle
service. The node still works fine for MR even if there is a bug in the Spark
shuffle service. This was definitely a concern when we first released this;
it isn't as much of an issue now.
We talked about this again recently and again decided to leave this
behavior. The reasoning is that it should fail fast: as soon as the executor
registers, it would fail, so there wouldn't be any wasted work. I guess this
could cause the job to fail if it kept trying to launch on some bad node. Or
is it not actually killing the executor?
In what case are you seeing this issue? I'm OK with changing it if we
have a good reason.