[jira] [Commented] (YARN-2362) Capacity Scheduler: apps with requests that exceed current capacity can starve pending apps

2014-07-28 Thread Sunil G (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075925#comment-14075925 ]

Sunil G commented on YARN-2362:
---

Possible duplicate of YARN-1631

 Capacity Scheduler: apps with requests that exceed current capacity can 
 starve pending apps
 ---

 Key: YARN-2362
 URL: https://issues.apache.org/jira/browse/YARN-2362
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.4.1
Reporter: Ram Venkatesh

 Cluster configuration:
 Total memory: 8GB
 yarn.scheduler.minimum-allocation-mb 256
 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test-only config)
 App 1 makes a request for 4.6 GB, which succeeds, and the app transitions to the 
 RUNNING state. It subsequently makes another request for 4.6 GB, which cannot be 
 granted, so it waits.
 App 2 makes a request for 1 GB and never receives it, so it stays in the 
 ACCEPTED state forever.
 I think this can happen in leaf queues that are near capacity.
 The fix is likely in LeafQueue.java, in assignContainers near line 861, where it 
 returns if the assignment would exceed the queue capacity instead of checking 
 whether requests for other active applications can be met.
 {code:title=LeafQueue.java|borderStyle=solid}
     // Check queue max-capacity limit
     if (!assignToQueue(clusterResource, required)) {
-      return NULL_ASSIGNMENT;
+      break;
     }
 {code}
 With this change, in the scenario above App 2 can start and finish while 
 App 1 continues to wait.
 I have a patch available, but I am wondering whether the current behavior is by 
 design.
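 For reference, below is a minimal, self-contained toy model of the scenario 
 (plain Java; every class and method name is made up, and this is not the actual 
 CapacityScheduler code). It runs one scheduling pass with each behaviour: with 
 the return-style check App 2 is never even looked at while App 1's oversized 
 request is outstanding, whereas with the break-style behaviour App 2 gets its 1 GB.
 {code:title=StarvationSketch.java (illustrative only)|borderStyle=solid}
// Toy model of one scheduling pass over a leaf queue. All names are made up;
// this is not CapacityScheduler code, only an illustration of the effect.
import java.util.Arrays;
import java.util.List;

public class StarvationSketch {

  static class App {
    final String name;
    final int requestMb;      // outstanding request, in MB
    boolean scheduled;
    App(String name, int requestMb) { this.name = name; this.requestMb = requestMb; }
  }

  // One pass over the apps, oldest first.
  static void assignOnce(List<App> apps, int queueCapacityMb, int usedMb,
                         boolean returnOnOverCapacity) {
    for (App app : apps) {
      if (app.scheduled) {
        continue;
      }
      // Stand-in for the assignToQueue() max-capacity check.
      if (usedMb + app.requestMb > queueCapacityMb) {
        if (returnOnOverCapacity) {
          return;             // current behaviour: every later app is skipped too
        }
        continue;             // effect of the proposed break: try the next app
      }
      app.scheduled = true;   // pretend the container was allocated
      usedMb += app.requestMb;
    }
  }

  public static void main(String[] args) {
    for (boolean returnOnOverCapacity : new boolean[] {true, false}) {
      // App 1 already holds ~4.6 GB (4710 MB) and wants another 4.6 GB; App 2 wants 1 GB.
      List<App> apps = Arrays.asList(new App("app1", 4710), new App("app2", 1024));
      assignOnce(apps, 8192, 4710, returnOnOverCapacity);
      System.out.println("returnOnOverCapacity=" + returnOnOverCapacity
          + " -> app2 scheduled: " + apps.get(1).scheduled);
    }
  }
}
 {code}
 (The single loop above collapses the real per-application and per-priority 
 nesting; in the real code the proposed break exits the priority loop and the 
 application loop then moves on to the next app.)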





[jira] [Commented] (YARN-2362) Capacity Scheduler: apps with requests that exceed current capacity can starve pending apps

2014-07-28 Thread Wangda Tan (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075970#comment-14075970 ]

Wangda Tan commented on YARN-2362:
--

I think we should fix this:
{code}
   if (!assignToQueue(clusterResource, required)) {
-    return NULL_ASSIGNMENT;
+    break;
   }
{code}
The {{return NULL_ASSIGNMENT}} statement means that if an app submitted earlier 
cannot allocate a resource in the queue, the rest of the apps in the queue cannot 
allocate resources either.

The {{break}} looks better to me.
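
To make the control flow concrete, here is a stripped-down sketch of roughly how 
the loop is shaped (stand-in types, not the real LeafQueue code): the capacity 
check sits in the per-priority loop, which is nested inside the per-application 
loop, so a {{return NULL_ASSIGNMENT}} there abandons every remaining application 
for this scheduling pass, while a {{break}} only abandons the current 
application's remaining priorities and lets the outer loop continue.
{code}
// Illustrative shape only; App stands in for the real per-application type and
// the numbers in main() are the 8 GB / 4.6 GB / 1 GB scenario from the description.
import java.util.Arrays;
import java.util.List;

public class AssignContainersShape {

  static class App {
    final String name;
    final int[] requiredMbByPriority;   // one outstanding request per priority
    App(String name, int... requiredMbByPriority) {
      this.name = name;
      this.requiredMbByPriority = requiredMbByPriority;
    }
  }

  static String assignContainers(List<App> activeApplications,
                                 int queueCapacityMb, int usedMb) {
    for (App app : activeApplications) {              // applications, oldest first
      for (int required : app.requiredMbByPriority) { // priorities of this app
        // The max-capacity check under discussion:
        if (usedMb + required > queueCapacityMb) {
          // return "NULL_ASSIGNMENT";  // current code: no later app is looked at
          break;                        // proposed: skip only this app's remaining priorities
        }
        // ... user-limit check and the actual allocation would go here ...
        return "assigned " + required + "MB to " + app.name;
      }
    }
    return "NULL_ASSIGNMENT";
  }

  public static void main(String[] args) {
    List<App> apps = Arrays.asList(new App("app1", 4710), new App("app2", 1024));
    System.out.println(assignContainers(apps, 8192, 4710)); // prints: assigned 1024MB to app2
  }
}
{code}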

And I agree this is a duplicate of YARN-1631.
