[ https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ram Venkatesh updated YARN-2362:
--------------------------------
    Description: 
Cluster configuration:
Total memory: 8 GB
yarn.scheduler.minimum-allocation-mb 256
yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test-only config)

App 1 makes a request for 4.6 GB, which succeeds, and the app transitions to the RUNNING state. It subsequently makes a second request for 4.6 GB, which cannot be granted, so it waits.

App 2 makes a request for 1 GB but never receives it, so the app stays in the ACCEPTED state forever.

I think this can happen in leaf queues that are near capacity. The fix is likely in LeafQueue.java assignContainers, near line 861, where the method returns if the assignment would exceed the queue's capacity instead of checking whether requests from other active applications could be met.

{code:title=LeafQueue.java|borderStyle=solid}
// Check queue max-capacity limit
if (!assignToQueue(clusterResource, required)) {
-  return NULL_ASSIGNMENT;
+  break;
}
{code}

With this change, the scenario above allows App 2 to start and finish while App 1 continues to wait. I have a patch available, but I am wondering whether the current behavior is by design.

> Capacity Scheduler apps with requests that exceed capacity can starve pending apps
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-2362
>                 URL: https://issues.apache.org/jira/browse/YARN-2362
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.4.1
>            Reporter: Ram Venkatesh

--
This message was sent by Atlassian JIRA
(v6.2#6252)
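The effect of the proposed `return` → `break` change can be sketched with a small, self-contained simulation. This is not the actual CapacityScheduler code; the class name, method name, and parameters below are hypothetical, and the real `break` skips only the blocked application's remaining priorities, which this sketch approximates by moving on to the next application:

```java
import java.util.List;

public class StarvationSketch {

    // Simplified stand-in for the assignContainers loop over active apps.
    // stopQueueOnFirstMiss == true mimics the current behavior: the first
    // request that exceeds remaining queue capacity stops assignment for
    // the whole queue (return NULL_ASSIGNMENT). false mimics the proposed
    // behavior: skip the blocked app and keep trying later apps (break).
    static double assign(List<Double> requestsGb, double freeGb,
                         boolean stopQueueOnFirstMiss) {
        double assigned = 0.0;
        for (double req : requestsGb) {
            if (req > freeGb - assigned) {
                if (stopQueueOnFirstMiss) {
                    return assigned;  // current behavior: whole queue stops here
                }
                continue;             // proposed behavior: only this app waits
            }
            assigned += req;
        }
        return assigned;
    }

    public static void main(String[] args) {
        // 8 GB cluster, 4.6 GB already held by App 1's first container,
        // so 3.4 GB remains free in the queue.
        double freeGb = 8.0 - 4.6;
        // App 1's pending 4.6 GB ask is considered before App 2's 1 GB ask.
        List<Double> requests = List.of(4.6, 1.0);

        System.out.println(assign(requests, freeGb, true));   // 0.0 -> App 2 starves
        System.out.println(assign(requests, freeGb, false));  // 1.0 -> App 2 runs
    }
}
```

Under the first (current) policy, App 1's unsatisfiable 4.6 GB request blocks the queue and App 2 gets nothing; under the second, App 2's 1 GB request is granted while App 1 keeps waiting, matching the scenario described in the issue.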