[ 
https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075476#comment-14075476
 ] 

Chen He commented on YARN-2362:
-------------------------------

This is interesting. In general, user may not submit an application that asks 
for 50% of the whole cluster resources. It is possible that a cluster has more 
than 2 applications. If third application finishes, App2 can get enough 
resource and run. Then, "deadlock" breaks. Is this reasonable, [~venkateshrin] ?

> Capacity Scheduler apps with requests that exceed capacity can starve pending 
> apps
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-2362
>                 URL: https://issues.apache.org/jira/browse/YARN-2362
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.4.1
>            Reporter: Ram Venkatesh
>
> Cluster configuration:
> Total memory: 8GB
> yarn.scheduler.minimum-allocation-mb 256
> yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config)
> App 1 makes a request for 4.6 GB, succeeds, app transitions to RUNNING state. 
> It subsequently makes a request for 4.6 GB, which cannot be granted and it 
> waits.
> App 2 makes a request for 1 GB - never receives it, so the app stays in the 
> ACCEPTED state for ever.
> I think this can happen in leaf queues that are near capacity.
> The fix is likely in LeafQueue.java assignContainers near line 861, where it 
> returns if the assignment would exceed queue capacity, instead of checking if 
> requests for other active applications can be met.
> {code:title=LeafQueue.java|borderStyle=solid}
>            // Check queue max-capacity limit
>            if (!assignToQueue(clusterResource, required)) {
> -            return NULL_ASSIGNMENT;
> +            break;
>            }
> {code}
> With this change, the scenario above allows App 2 to start and finish while 
> App 1 continues to wait.
> I have a patch available, but wondering if the current behavior is by design.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to