[ 
https://issues.apache.org/jira/browse/YARN-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111761#comment-15111761
 ] 

Wangda Tan commented on YARN-4618:
----------------------------------

Nice catch, [~bibinchundatt].

I'm not entirely sure if changing int to long is the only solution.

[~vvasudev] has a JIRA, YARN-3926, to add a resource type and unit to the 
"Resource" object. Instead of using "MB", it could use "GB" for such a big 
cluster.
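
To make the arithmetic concrete, here is a rough sketch (not YARN code; the class and method names are made up for illustration). It takes the 600k pending containers of 4 GB each from the description below and shows that the total wraps negative when held as an int in MB, while it fits either as a long in MB or as an int in GB.

```java
final class PendingMemoryUnits {
    // Hypothetical helper: total pending memory in MB, held in an int.
    // Plain int multiplication wraps silently on overflow.
    static int pendingMbAsInt(int containers, int mbPerContainer) {
        return containers * mbPerContainer;
    }

    // Same total, widened to long before multiplying, so it cannot wrap here.
    static long pendingMbAsLong(int containers, int mbPerContainer) {
        return (long) containers * mbPerContainer;
    }

    // Same total expressed in GB; multiplyExact throws instead of wrapping.
    static int pendingGbAsInt(int containers, int gbPerContainer) {
        return Math.multiplyExact(containers, gbPerContainer);
    }

    public static void main(String[] args) {
        int containers = 600_000;
        System.out.println(pendingMbAsInt(containers, 4096));  // -1837367296
        System.out.println(pendingMbAsLong(containers, 4096)); // 2457600000
        System.out.println(pendingGbAsInt(containers, 4));     // 2400000
    }
}
```

Either change avoids the wrap for this cluster size; a coarser unit only moves the limit further out, whereas a long removes it for any realistic cluster.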

> RM Stops allocating containers if large number of pending containers
> --------------------------------------------------------------------
>
>                 Key: YARN-4618
>                 URL: https://issues.apache.org/jira/browse/YARN-4618
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>
> In one of our tests we found that when the RM has a very large number of 
> pending container requests to serve, it stops assigning containers.
> The simulated cluster has 100 TB 
> Root total = 600k containers = 
> Queue 1 = 300k containers = 1328800000 MB
> Queue 2 = 300k containers = 1428800000 MB
> Each container request is for 4 GB. 
> {{ParentQueue#assignContainers}} is as follows:
> {noformat}
>     // Check if this queue need more resource, simply skip allocation if this
>     // queue doesn't need more resources.
>     if (!super.hasPendingResourceRequest(node.getPartition(),
>         clusterResource, schedulingMode)) {
>       if (LOG.isDebugEnabled()) {
>         LOG.debug("Skip this queue=" + getQueuePath()
>             + ", because it doesn't need more resource, schedulingMode="
>             + schedulingMode.name() + " node-partition="
>             + node.getPartition());
>       }
>       return CSAssignment.NULL_ASSIGNMENT;
>     }
> {noformat}
> When the pending resource exceeds Integer.MAX_VALUE, it overflows and becomes 
> *negative* ({{- 167XXXXXXX MB}}), so NULL_ASSIGNMENT is always returned.
> Tool used for testing: SLS.
> When checking for a pending resource request, we should first check whether any 
> pending containers (from getMetrics()) are waiting to be served. If there are, 
> return true; otherwise fall back to the other checks for increase requests.
> Thoughts?
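
For reference, the check order proposed in the description could look roughly like the sketch below. This is only an illustration under stated assumptions: QueueState and its fields are hypothetical stand-ins, not YARN classes, and the fallback is reduced to a single memory comparison.

```java
final class PendingCheckSketch {
    // Hypothetical stand-in for a queue's pending state; not a YARN class.
    static final class QueueState {
        final int pendingContainers; // e.g. a count taken from queue metrics
        final int pendingMemoryMb;   // may already have wrapped negative

        QueueState(int pendingContainers, int pendingMemoryMb) {
            this.pendingContainers = pendingContainers;
            this.pendingMemoryMb = pendingMemoryMb;
        }
    }

    // Look at the container count first; fall back to the resource check
    // (standing in for the increase-request checks) only when it is zero.
    static boolean hasPendingResourceRequest(QueueState q) {
        if (q.pendingContainers > 0) {
            return true;
        }
        return q.pendingMemoryMb > 0;
    }

    public static void main(String[] args) {
        // Wrapped (negative) pending memory, but containers are still waiting.
        QueueState overflowed = new QueueState(600_000, -1837367296);
        System.out.println(hasPendingResourceRequest(overflowed)); // true
    }
}
```

With this ordering a wrapped memory value no longer causes the queue to be skipped, although the underlying overflow is still there.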



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
