[ https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075925#comment-14075925 ]

Sunil G commented on YARN-2362:
-------------------------------

Possible duplicate of YARN-1631

> Capacity Scheduler: apps with requests that exceed current capacity can 
> starve pending apps
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-2362
>                 URL: https://issues.apache.org/jira/browse/YARN-2362
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.4.1
>            Reporter: Ram Venkatesh
>
> Cluster configuration:
> Total memory: 8GB
> yarn.scheduler.minimum-allocation-mb 256
> yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test-only config)
> App 1 makes a request for 4.6 GB, which succeeds, and the app transitions to the 
> RUNNING state. It subsequently makes another request for 4.6 GB, which cannot be 
> granted, so it waits.
> App 2 makes a request for 1 GB but never receives it, so it stays in the 
> ACCEPTED state forever.
> I think this can happen in leaf queues that are near capacity.
> The fix is likely in LeafQueue.java, in assignContainers near line 861: it 
> returns as soon as an assignment would exceed the queue max-capacity, instead of 
> checking whether requests from other active applications could still be met.
> {code:title=LeafQueue.java|borderStyle=solid}
>            // Check queue max-capacity limit
>            if (!assignToQueue(clusterResource, required)) {
> -            return NULL_ASSIGNMENT;  // current: abandons the pass for all remaining apps
> +            break;                   // proposed: skip only this application's requests
>            }
> {code}
> With this change, the scenario above allows App 2 to start and finish while 
> App 1 continues to wait.
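> For illustration, a self-contained toy model of this scheduling pass is sketched 
> below. It is not YARN code: the StarvationSketch class, the Request record, and 
> the simplified assignContainers signature are all made up, and the 
> per-application/per-priority loops are flattened into one loop over requests, so 
> the proposed break shows up as skipping to the next application. It replays the 
> 8 GB / 4.6 GB / 1 GB scenario above and contrasts the two behaviors.
> {code:title=StarvationSketch.java|borderStyle=solid}
> // Toy model only, not YARN code: contrasts "return" vs "skip to the next app"
> // when a request would push the queue past its capacity.
> import java.util.ArrayList;
> import java.util.List;
> 
> public class StarvationSketch {
> 
>     static final int QUEUE_CAPACITY_MB = 8192;   // 8 GB cluster from the report
> 
>     // A pending request: the app's name and the container size it wants, in MB.
>     record Request(String app, int mb) {}
> 
>     // One scheduling pass. With skipInsteadOfReturn == false this mimics the
>     // current "return NULL_ASSIGNMENT" behavior; with true it mimics the
>     // proposed "break" (move on to the next application).
>     static List<String> assignContainers(List<Request> pending, int usedMb,
>                                          boolean skipInsteadOfReturn) {
>         List<String> assigned = new ArrayList<>();
>         for (Request r : pending) {
>             if (usedMb + r.mb() > QUEUE_CAPACITY_MB) {   // queue max-capacity check
>                 if (skipInsteadOfReturn) {
>                     continue;        // proposed: try the next application
>                 }
>                 return assigned;     // current: give up for the whole pass
>             }
>             usedMb += r.mb();
>             assigned.add(r.app());
>         }
>         return assigned;
>     }
> 
>     public static void main(String[] args) {
>         // App 1 already holds 4.6 GB and asks for another 4.6 GB; App 2 wants 1 GB.
>         int usedMb = 4710;
>         List<Request> pending = List.of(new Request("app1-second-container", 4710),
>                                         new Request("app2-am", 1024));
>         System.out.println("return behavior: " + assignContainers(pending, usedMb, false)); // []
>         System.out.println("break behavior:  " + assignContainers(pending, usedMb, true));  // [app2-am]
>     }
> }
> {code}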
> I have a patch available, but wondering if the current behavior is by design.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
