Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/14162
> If its the leveldb file not being created, that should be fixed by
> aab99d3
That's great, and in my view that also means that any failure in the
startup of the shuffle service should actually be caused by something wrong
with the environment, and not the shuffle service code, so this change
shouldn't harm anybody. :-)
> For instance, lets say we have a bug in the spark shuffle services
To be fair, if you have a bug in another part of the shuffle service that is
not in the startup path, it could still take out your whole cluster. That can't
be fixed until the NM runs aux services in separate processes.
> This also should get better once we have the node blacklisting stuff in.
Are you talking about SPARK-8425? If you are, I don't think that changes
anything here, since the executor isn't even coming up, and that blacklisting
is based on tasks failing.
I'm not against adding an option, I just don't see it as really
necessary. But if you feel strongly that the Spark shuffle service shouldn't
ever affect NM startup, then I can add it.
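
To make the trade-off concrete, here is a rough sketch of what such an option could look like. It is not the actual shuffle service code: `init_aux_service`, `stop_on_failure`, `AuxServiceInitError` and `broken_start` are hypothetical names used only for illustration.

```python
class AuxServiceInitError(Exception):
    """Hypothetical error raised when an auxiliary service fails to start."""


def init_aux_service(start, stop_on_failure):
    """Run `start` and report whether the service came up.

    stop_on_failure=True:  re-raise, so the host process (the NM) fails too.
    stop_on_failure=False: log the error and let the host keep running
                           without the service.
    """
    try:
        start()
        return True
    except Exception as exc:
        if stop_on_failure:
            raise AuxServiceInitError(str(exc)) from exc
        print(f"aux service failed to start, continuing without it: {exc}")
        return False


def broken_start():
    # Stand-in for an environment problem during startup (hypothetical).
    raise OSError("cannot create recovery file")
```

With `stop_on_failure=False`, `init_aux_service(broken_start, False)` returns `False` and the caller carries on. With `True`, the same call raises, which corresponds to the shuffle service failure stopping NM startup.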