[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147934#comment-14147934 ]
Karthik Kambatla commented on YARN-2604:
----------------------------------------

Actually, I think the JIRA here is slightly different from the one reported in YARN-56. IIUC, YARN-56 wants to tackle the case where the resources on a node are large enough to accommodate the request, but those resources are (partially) taken by other applications and are "currently unavailable". Using a timeout, as suggested there, seems like a reasonable approach.

This JIRA was meant to handle the case where no node (even if it were completely free) can accommodate the request. This case can be partially fixed through better configuration - set max-allocation-mb to a value less than or equal to the most memory available on any single node. However, if that largest node fails, the config will be outdated. We could either handle this separately or just fall back on YARN-56. Thoughts?

> Scheduler should consider max-allocation-* in conjunction with the largest
> node
> -------------------------------------------------------------------------------
>
>                 Key: YARN-2604
>                 URL: https://issues.apache.org/jira/browse/YARN-2604
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 2.5.1
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>
> If the scheduler max-allocation-* values are larger than the resources
> available on the largest node in the cluster, an application requesting
> resources between the two values will be accepted by the scheduler but the
> requests will never be satisfied. The app essentially hangs forever.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
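For reference, the configuration workaround described above could look like the following yarn-site.xml fragment. This is a sketch with illustrative values, assuming a cluster whose largest NodeManager offers 96 GB of memory and 32 vcores; the actual values must match your largest node's capacity:

```xml
<!-- yarn-site.xml: cap scheduler allocations at the largest node's capacity.
     Values below are illustrative (96 GB / 32 vcores assumed). -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>98304</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>32</value>
</property>
```

As noted above, this is only a partial fix: if the largest node is lost, these statically configured limits become stale and requests between the new largest node's capacity and these values would again hang.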