[ https://issues.apache.org/jira/browse/YARN-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Naganarasimha G R updated YARN-2487:
------------------------------------
    Description:
There are some scenarios where an AM does not get containers and waits indefinitely. We faced one such scenario, which causes the applications to hang:

Consider a cluster with 2 NMs of 8 GB each, and 2 applications (MR2) launched in the default queue, where each AM takes 2 GB.
Each AM is placed on a different NM. Each AM then requests a container with 7 GB of memory.
As only 6 GB is available on each NM, both applications hang forever.

To avoid such scenarios, I would like to propose a generic timeout feature for all AMs in YARN, such that if no containers are assigned to an application for a defined period, then YARN can time out the application attempt.
The default can be set to 0, in which case the RM will not time out the app attempt, and the user can set their own timeout when submitting the application.

  was:
There are some scenarios where AM will not get containers and indefinetely waiting. We faced one such sceanrio which makes the applications to get hung :
Consider a cluster setup which has 2 NMS of each 8GB resource,
And 2 applications are launched in the default queue where in each AM is taking 2 GB each.
Each AM is placed in each of the NM. Now each AM is requesting for container of 7Gb mem resource .
As in each NM only 6GB resource is available both the applications are hung forever.
To avoid such scenarios i would to propose generic timeout feature for all AM's @ the yarn side such that if no containers are assigned for an application for a defined period than yarn can timeout the application attempt.
Default can be set to 0 where in RM will not timeout the app attempt and user can set his own timeout when he submits the application

> Need to support timeout of AM When no containers are assigned to it for a defined period
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-2487
>                 URL: https://issues.apache.org/jira/browse/YARN-2487
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
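
The scenario and the proposed timeout rule from this issue can be sketched as a small standalone Python model. It is only an illustration under stated assumptions: it does not use any YARN API, and the names `Node`, `can_place`, and `should_time_out` are hypothetical, as are the timeout values in seconds. It shows why a 7 GB request cannot be satisfied even though 12 GB is free cluster-wide (a container must fit on a single NM), and how a timeout of 0 would leave the current wait-forever behavior unchanged.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a NodeManager's memory accounting."""
    capacity_gb: int
    used_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


def can_place(nodes, request_gb):
    """True if at least one single node has room for the requested container."""
    return any(n.free_gb >= request_gb for n in nodes)


def should_time_out(seconds_without_container, timeout_seconds):
    """Hypothetical rule from the proposal: 0 disables the timeout (the default)."""
    if timeout_seconds == 0:
        return False
    return seconds_without_container >= timeout_seconds


# Two 8 GB NMs, each already hosting one 2 GB AM.
nodes = [Node(capacity_gb=8, used_gb=2), Node(capacity_gb=8, used_gb=2)]

print([n.free_gb for n in nodes])   # [6, 6]
print(can_place(nodes, 7))          # False: neither NM can host a 7 GB container
print(should_time_out(600, 0))      # False: default of 0 never times out
print(should_time_out(600, 300))    # True: user-set timeout has elapsed
```

With the default of 0, both application attempts in the scenario would keep waiting; with a user-supplied timeout, the attempt would be failed once the period passes without an assignment.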