Steve Loughran created YARN-1073:
------------------------------------
Summary: NM to recognise when it can't spawn processes and stop
accepting containers
Key: YARN-1073
URL: https://issues.apache.org/jira/browse/YARN-1073
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Affects Versions: 2.1.0-beta
Environment: OS/X with not enough file handles
Reporter: Steve Loughran
When too many containers were created with a claimed resource use of 0 RAM or
vCores, the NM reached a state where exec() was continually failing, but
nothing seemed to recognise this and blacklist the node.
Something should notice that all container launches for an app/container are
failing and act on it. While AMs can and should code for this, NM failure is
something to be handled at the YARN level.
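
One way to picture the requested behaviour is a counter of consecutive launch failures that, past a threshold, tells the NM to stop accepting containers. The sketch below is only an illustration under stated assumptions: `LaunchFailureTracker`, its method names, and the threshold are hypothetical and are not part of any existing YARN API, and how the NM would actually report itself as unhealthy to the RM is left open.

```java
// Hypothetical sketch: this class does not exist in YARN.
// It tracks consecutive container launch failures on one node.
final class LaunchFailureTracker {
    // Assumed policy: this many failures in a row means the node cannot spawn processes.
    private final int threshold;
    // Failures seen since the last successful launch, capped at the threshold.
    private int consecutiveFailures;

    LaunchFailureTracker(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    // Call when a container process was started successfully; resets the streak.
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    // Call when exec() (or the equivalent launch step) failed.
    synchronized void recordFailure() {
        if (consecutiveFailures < threshold) {
            consecutiveFailures++;
        }
    }

    // True while the node should keep accepting new containers.
    synchronized boolean shouldAcceptContainers() {
        return consecutiveFailures < threshold;
    }
}
```

A single successful launch resets the streak, so a transient failure does not take the node out of service; whether that is the right policy for file-handle exhaustion is a design question for the issue itself.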