Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/1180#issuecomment-49887688
The reason for maxNumExecutorsFailure isn't that YARN can't give you more
executors; it's that something has happened to enough of your executors that
you should suspect a problem with the application itself. One silly example:
you start an application and ask for 8G of memory but are only running a
32-bit JDK. Every executor will fail to launch, since you're asking for more
memory than a 32-bit JDK can handle.
There will always be errors, like disk failures or machines going down, where
executors simply get restarted. But if executors keep failing for other
reasons, then there is likely a problem with Spark or the application code.
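
As a hedged sketch of how this plays out in practice: in recent Spark versions the threshold is exposed through the `spark.yarn.max.executor.failures` property (the exact name and default may differ by version), so a deployment that expects occasional disk/node failures could raise it rather than have the application aborted:

```scala
import org.apache.spark.SparkConf

// Sketch only: assumes the failure threshold is read from
// spark.yarn.max.executor.failures (check your Spark version's docs).
val conf = new SparkConf()
  .setAppName("example-app")
  // Tolerate more executor failures before the whole application is
  // treated as broken, e.g. on a cluster with known-flaky nodes.
  .set("spark.yarn.max.executor.failures", "16")
```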