[ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616708#comment-13616708
 ] 

Zhijie Shen commented on YARN-276:
----------------------------------

IMO, the essential problem is that maxActiveApplications is a loose bound. See 
the formula below.

1. clusterResource * maximumApplicationMasterResourcePercent = minAllocation * 
maxActiveApplications.

maxActiveApplications is computed by assuming that each application's AM 
requires only minAllocation. In fact, an AM container may require more. Therefore,

2. clusterResource * maximumApplicationMasterResourcePercent = minAllocation * 
maxActiveApplications = (minAllocation_1 + minAllocation_2 + ... + 
minAllocation_k) <= (requestedResource_1 + requestedResource_2 + ... + 
requestedResource_k), where k = maxActiveApplications.

Hence, when maxActiveApplications applications are activated and their AMs 
each require more than minAllocation, the AMs may use more than 
maximumApplicationMasterResourcePercent of clusterResource, and in the worst 
case their requests may even exceed clusterResource.
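
To make the gap in formula 2 concrete, here is a rough sketch with made-up 
numbers. It tracks memory only (in MB); the class and method names are 
hypothetical, and the round-up in maxActiveApplications is an assumption, not 
the actual CapacityScheduler code.

```java
// Illustrative sketch only: memory in MB, hypothetical names and numbers.
final class LooseBoundSketch {
    /** AM budget: clusterResource * maximumApplicationMasterResourcePercent. */
    static long amBudgetMb(long clusterMb, int amPercent) {
        return clusterMb * amPercent / 100;
    }

    /** Formula 1 solved for maxActiveApplications (rounded up, at least 1; an assumption). */
    static long maxActiveApplications(long clusterMb, int amPercent, long minAllocationMb) {
        long budget = amBudgetMb(clusterMb, amPercent);
        return Math.max(1, (budget + minAllocationMb - 1) / minAllocationMb);
    }

    /** Memory consumed if every activated AM requests amRequestMb. */
    static long amUsageMb(long activeApps, long amRequestMb) {
        return activeApps * amRequestMb;
    }

    public static void main(String[] args) {
        long clusterMb = 102_400;      // assumed 100 GB cluster
        int amPercent = 10;            // assumed maximum-am-resource-percent of 10%
        long minAllocationMb = 1_024;  // assumed minimum allocation

        long k = maxActiveApplications(clusterMb, amPercent, minAllocationMb);
        System.out.println("maxActiveApplications = " + k);
        System.out.println("AM budget (MB) = " + amBudgetMb(clusterMb, amPercent));
        // Each AM asks for twice minAllocation, so usage is twice the budget.
        System.out.println("AM usage (MB)  = " + amUsageMb(k, 2 * minAllocationMb));
    }
}
```

With these assumed values the count-based limit admits 10 applications, but 
AMs that each request 2048 MB would together take 20480 MB against a budget 
of 10240 MB.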

@nemon's solution looks good; it is effectively a stricter bound on the 
maximum number of active applications. Whenever an application is to be 
activated, the following criterion is checked.

3. clusterResource * maximumApplicationMasterResourcePercent - 
ApplicationMasterResource >= requestedResource.

The issue here is that whenever this criterion is met, the maxActiveApplications 
limit should be met as well, because the new criterion is stricter. So instead 
of adding the new criterion, how about replacing maxActiveApplications with it?
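
A minimal sketch of what criterion 3 could look like as the only activation 
check, assuming memory-only accounting in MB. The class and method names are 
hypothetical and are not taken from the attached patches.

```java
// Illustrative sketch only: hypothetical names, memory in MB, no locking.
final class AmResourceCheckSketch {
    private final long amBudgetMb;  // clusterResource * maximumApplicationMasterResourcePercent
    private long amUsedMb = 0;      // ApplicationMasterResource currently held by active AMs

    AmResourceCheckSketch(long clusterMb, int amPercent) {
        this.amBudgetMb = clusterMb * amPercent / 100;
    }

    /** Criterion 3: activate only if budget - used >= requestedResource. */
    boolean tryActivate(long requestedMb) {
        if (amBudgetMb - amUsedMb >= requestedMb) {
            amUsedMb += requestedMb;
            return true;
        }
        return false;  // the application stays pending
    }

    /** Return the AM's resource to the budget when the application finishes. */
    void release(long requestedMb) {
        amUsedMb -= requestedMb;
    }

    long amUsedMb() {
        return amUsedMb;
    }
}
```

Because the check is against what each AM actually requests, no separate 
application count is needed to keep AM usage within the budget.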
                
> Capacity Scheduler can hang when submit many jobs concurrently
> --------------------------------------------------------------
>
>                 Key: YARN-276
>                 URL: https://issues.apache.org/jira/browse/YARN-276
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.0.0, 2.0.1-alpha
>            Reporter: nemon lou
>         Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
> YARN-276.patch, YARN-276.patch, YARN-276.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In hadoop2.0.1, when I submit many jobs concurrently, the Capacity 
> Scheduler can hang with most resources taken up by AMs, leaving too few 
> resources for tasks. All applications then hang.
> The cause is that "yarn.scheduler.capacity.maximum-am-resource-percent" is not 
> checked directly. Instead, this property is only used to compute 
> maxActiveApplications, and maxActiveApplications is computed from 
> minimumAllocation (not from what the AMs actually use).

