[jira] [Updated] (YARN-1073) NM to recognise when it can't spawn process and stop accepting containers

Steve Loughran (JIRA) Fri, 16 Aug 2013 13:28:51 -0700

     [ https://issues.apache.org/jira/browse/YARN-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Steve Loughran updated YARN-1073:
---------------------------------

    Summary: NM to recognise when it can't spawn process and stop accepting 
containers  (was: NM to recognise when it can't span process and stop accepting 
containers)
    
> NM to recognise when it can't spawn process and stop accepting containers
> -------------------------------------------------------------------------
>
>                 Key: YARN-1073
>                 URL: https://issues.apache.org/jira/browse/YARN-1073
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.1.0-beta
>         Environment: OS/X with not enough file handles
>            Reporter: Steve Loughran
>            Priority: Minor
>
> When too many containers were created with a claimed resource use of 0 RAM or 
> vCores, the NM reached a state where exec() was continually failing, but 
> nothing seemed to recognise this and blacklist the node.
> Something should note that all container launches for an app/container 
> are failing and act on it. While AMs can/should code for this, NM failure is 
> a YARN-level concern.
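
The behaviour requested above could be sketched roughly as follows. This is only an illustration of the idea (count consecutive launch failures and stop accepting containers once a threshold is reached), not the NodeManager's actual code: the class name LaunchFailureTracker, its methods, and the threshold value are all hypothetical and do not correspond to any existing YARN API.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper, not part of YARN: tracks consecutive container
 * launch failures so a node can stop accepting work when process
 * spawning appears to be broken (e.g. exec() failing repeatedly).
 */
final class LaunchFailureTracker {
    // Number of consecutive failures at which the node is considered unhealthy.
    private final int threshold;
    // Failures seen since the last successful launch.
    private final AtomicInteger consecutiveFailures = new AtomicInteger(0);

    LaunchFailureTracker(int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
    }

    /** Call when a container process was spawned successfully. */
    void recordSuccess() {
        // One success shows that spawning works again, so reset the count.
        consecutiveFailures.set(0);
    }

    /** Call when spawning a container process failed. */
    void recordFailure() {
        consecutiveFailures.incrementAndGet();
    }

    /** True while the failure streak is below the threshold. */
    boolean isAcceptingContainers() {
        return consecutiveFailures.get() < threshold;
    }
}
```

With an assumed threshold of 3, three failed launches in a row would make isAcceptingContainers() return false until a launch succeeds again. How that state would be reported to the ResourceManager so the node gets blacklisted is left open here, as it is in the issue.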

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
