[
https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075925#comment-14075925
]
Sunil G commented on YARN-2362:
-------------------------------
Possible duplicate of YARN-1631
> Capacity Scheduler: apps with requests that exceed current capacity can
> starve pending apps
> -------------------------------------------------------------------------------------------
>
> Key: YARN-2362
> URL: https://issues.apache.org/jira/browse/YARN-2362
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 2.4.1
> Reporter: Ram Venkatesh
>
> Cluster configuration:
> Total memory: 8GB
> yarn.scheduler.minimum-allocation-mb 256
> yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config)
> App 1 requests 4.6 GB; the request succeeds and the app transitions to the
> RUNNING state. It then requests another 4.6 GB, which cannot be granted, so
> it waits.
> App 2 requests 1 GB but never receives it, so the app stays in the
> ACCEPTED state forever.
> I think this can happen in any leaf queue that is near capacity.
> The fix is likely in LeafQueue.java, in assignContainers near line 861, which
> returns as soon as an assignment would exceed queue capacity instead of
> checking whether requests from other active applications can be met.
> {code:title=LeafQueue.java|borderStyle=solid}
> // Check queue max-capacity limit
> if (!assignToQueue(clusterResource, required)) {
> -  return NULL_ASSIGNMENT;
> +  break;
> }
> {code}
> With this change, the scenario above allows App 2 to start and finish while
> App 1 continues to wait.
> I have a patch available, but I am wondering whether the current behavior is
> by design.
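
For illustration only, the difference between the two behaviours in the quoted
description can be sketched as a small standalone Java program. This is not
the actual LeafQueue code: the class LeafQueueSketch, the App type, and the
skipOnLimit flag are hypothetical names. It assumes the limit check sits in an
inner loop nested inside the loop over active applications, so the proposed
`break` moves on to the next application. It also ignores AM memory and
minimum-allocation rounding, and it treats 4.6 GB as roughly 4710 MB.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeafQueueSketch {
    /** Hypothetical stand-in for an application with one pending request (in MB). */
    public static final class App {
        final String name;
        final int pendingMb;

        public App(String name, int pendingMb) {
            this.name = name;
            this.pendingMb = pendingMb;
        }
    }

    /**
     * One simplified scheduling pass over the applications of a leaf queue.
     * Returns the names of the apps whose pending request was granted.
     *
     * skipOnLimit == false models the current behaviour (give up on the first
     * request that would exceed the queue limit); true models the proposed one
     * (skip that application and try the next).
     */
    public static List<String> assignContainers(
            List<App> apps, int usedMb, int queueMaxMb, boolean skipOnLimit) {
        List<String> granted = new ArrayList<>();
        for (App app : apps) {
            if (usedMb + app.pendingMb > queueMaxMb) {
                if (skipOnLimit) {
                    continue; // plays the role of the proposed `break`
                }
                return granted; // plays the role of `return NULL_ASSIGNMENT`
            }
            usedMb += app.pendingMb;
            granted.add(app.name);
        }
        return granted;
    }

    public static void main(String[] args) {
        // 8 GB queue; App 1 already holds ~4.6 GB and asks for ~4.6 GB more,
        // App 2 asks for 1 GB.
        List<App> apps = Arrays.asList(new App("App 1", 4710), new App("App 2", 1024));
        System.out.println(assignContainers(apps, 4710, 8192, false)); // prints []
        System.out.println(assignContainers(apps, 4710, 8192, true));  // prints [App 2]
    }
}
```

In this sketch the first call grants nothing because App 1's oversized request
ends the pass, while the second call skips App 1 and grants App 2's 1 GB
request, which matches the outcome described in the report.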
--
This message was sent by Atlassian JIRA
(v6.2#6252)