[ https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ram Venkatesh updated YARN-2362:
--------------------------------
    Summary: Capacity Scheduler: apps with requests that exceed current capacity can starve pending apps  (was: Capacity Scheduler: apps with requests that exceed capacity can starve pending apps)

> Capacity Scheduler: apps with requests that exceed current capacity can
> starve pending apps
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-2362
>                 URL: https://issues.apache.org/jira/browse/YARN-2362
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.4.1
>            Reporter: Ram Venkatesh
>
> Cluster configuration:
> Total memory: 8GB
> yarn.scheduler.minimum-allocation-mb 256
> yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test-only config)
> App 1 makes a request for 4.6 GB, succeeds, and transitions to the RUNNING state.
> It subsequently makes a request for another 4.6 GB, which cannot be granted, so it
> waits.
> App 2 makes a request for 1 GB but never receives it, so the app stays in the
> ACCEPTED state forever.
> I think this can happen in leaf queues that are near capacity.
> The fix is likely in LeafQueue.java assignContainers near line 861, where it
> returns if the assignment would exceed queue capacity, instead of checking whether
> requests for other active applications can be met.
> {code:title=LeafQueue.java|borderStyle=solid}
>       // Check queue max-capacity limit
>       if (!assignToQueue(clusterResource, required)) {
> -       return NULL_ASSIGNMENT;
> +       break;
>       }
> {code}
> With this change, the scenario above allows App 2 to start and finish while
> App 1 continues to wait.
> I have a patch available, but I am wondering if the current behavior is by design.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
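The starvation the report describes can be sketched as a standalone simulation. This is not YARN code; the class, method, and memory figures below are illustrative stand-ins for the allocation pass in LeafQueue.assignContainers. The only point it demonstrates is the difference between giving up on the whole pass when one request exceeds remaining capacity (the current "return NULL_ASSIGNMENT") versus skipping that app and trying the rest (the proposed "break"):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (hypothetical, not YARN code) of the allocation loop
// discussed in YARN-2362.
public class StarvationSketch {

  // Walk pending requests in order.
  // stopOnOverCapacity == true  : mimics "return NULL_ASSIGNMENT" -- the first
  //                               request over remaining capacity halts the
  //                               whole pass, starving every later app.
  // stopOnOverCapacity == false : mimics the proposed "break" -- the oversized
  //                               request is skipped and later apps still get
  //                               a chance to be assigned.
  static List<Integer> assign(int[] requestsMb, int freeMb,
                              boolean stopOnOverCapacity) {
    List<Integer> granted = new ArrayList<>();
    for (int i = 0; i < requestsMb.length; i++) {
      if (requestsMb[i] > freeMb) {
        if (stopOnOverCapacity) {
          return granted;   // old behavior: give up on all remaining apps
        }
        continue;           // proposed behavior: move on to the next app
      }
      freeMb -= requestsMb[i];
      granted.add(i);       // request fits: grant it
    }
    return granted;
  }

  public static void main(String[] args) {
    // 8 GB cluster; App 1 already holds 4.6 GB, leaving roughly 3.4 GB free.
    // Pending: App 1 wants another 4.6 GB (4710 MB), App 2 wants 1 GB.
    int[] pending = {4710, 1024};
    int freeMb = 3482;

    System.out.println("old: " + assign(pending, freeMb, true));  // []
    System.out.println("new: " + assign(pending, freeMb, false)); // [1]
  }
}
```

With the old behavior App 2's easily satisfiable 1 GB request is never even considered; with the proposed behavior App 2 (index 1) is granted while App 1's oversized request simply waits, matching the outcome described in the report.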