[ 
https://issues.apache.org/jira/browse/YARN-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110215#comment-15110215
 ] 

Naganarasimha G R commented on YARN-4618:
-----------------------------------------

Good catch [~bibinchundatt]!
I think {{org.apache.hadoop.yarn.api.records.Resource}} should have used *long* 
for memory, at least.
In a normal scenario we might not see such a high number of containers, but 
in the future each container may well ask for more memory (100GB or more), 
and then this will be easily reproduced. 
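
To illustrate the point about *long*, here is a minimal, self-contained sketch. It does not use YARN's {{Resource}} class; the class and method names are hypothetical, and the two per-queue figures are the ones quoted in the description below. It only shows that summing those values in an int wraps to a negative number while a long holds the true total.

```java
// Hypothetical demo class; not part of YARN.
final class PendingMemoryOverflow {

    // Sums per-queue pending memory (MB) in an int, the way an
    // int-typed memory field would accumulate it.
    static int sumAsInt(int[] pendingMb) {
        int total = 0;
        for (int mb : pendingMb) {
            total += mb; // silently wraps once the sum passes Integer.MAX_VALUE
        }
        return total;
    }

    // Same sum, accumulated in a long so it cannot wrap for these inputs.
    static long sumAsLong(int[] pendingMb) {
        long total = 0L;
        for (int mb : pendingMb) {
            total += mb;
        }
        return total;
    }

    public static void main(String[] args) {
        // Queue 1 and Queue 2 pending memory in MB, as quoted in the issue.
        int[] queues = {1328800000, 1428800000};
        System.out.println("int  total (MB): " + sumAsInt(queues));
        System.out.println("long total (MB): " + sumAsLong(queues));
    }
}
```

With an int accumulator the root total comes out negative, so any "is pending > 0" style check on it fails even though requests are outstanding.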

> RM Stops allocating containers if large number of pending containers
> --------------------------------------------------------------------
>
>                 Key: YARN-4618
>                 URL: https://issues.apache.org/jira/browse/YARN-4618
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>
> In one of our tests we found that when the RM has a very large number of 
> pending container requests to serve, it stops assigning containers.
> Root total = 6 lakhs containers = 
> Queue 1 = 3 lakh containers = 1328800000 MB
> Queue 2 = 3+ lakh containers = 1428800000 MB
> Each container request is for 4GB. 
> {{ParentQueue#assignContainers}} is as below
> {noformat}
>     // Check if this queue need more resource, simply skip allocation if this
>     // queue doesn't need more resources.
>     if (!super.hasPendingResourceRequest(node.getPartition(),
>         clusterResource, schedulingMode)) {
>       if (LOG.isDebugEnabled()) {
>         LOG.debug("Skip this queue=" + getQueuePath()
>             + ", because it doesn't need more resource, schedulingMode="
>             + schedulingMode.name() + " node-partition="
>             + node.getPartition());
>       }
>       return CSAssignment.NULL_ASSIGNMENT;
>     }
> {noformat}
> When the pending resource exceeds the int MAX_VALUE, it overflows and becomes 
> *negative* ({{-167XXXXXXX MB}}), so NULL_ASSIGNMENT is always returned.
> Tool used for testing: SLS.
> When checking for a pending resource request, we should first check whether 
> any pending containers (from getMetrics()) remain to be served. If pending 
> containers are available, return true; otherwise fall back to the other check 
> for increase requests.
> Thoughts?
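
The check proposed in the description could look roughly like the sketch below. This is only an illustration under stated assumptions: the class, method names, and parameters are hypothetical and are not YARN's API, and the existing check is modelled simply as "pending memory is greater than zero".

```java
// Hypothetical sketch of the proposed ordering; not YARN code.
final class PendingCheckSketch {

    // Assumed model of the current behaviour: only the (int) pending
    // memory is inspected, so an overflowed negative value reads as
    // "nothing pending".
    static boolean memoryOnlyCheck(int pendingMemoryMb) {
        return pendingMemoryMb > 0;
    }

    // Proposed ordering: look at the pending container count first,
    // and only fall back to the resource check (for example, for
    // increase requests) when no containers are pending.
    static boolean hasPendingResourceRequest(int pendingContainers,
                                             int pendingMemoryMb) {
        if (pendingContainers > 0) {
            return true;
        }
        return memoryOnlyCheck(pendingMemoryMb);
    }

    public static void main(String[] args) {
        int overflowedMb = 1328800000 + 1428800000; // wraps to a negative int
        System.out.println("memory-only check: " + memoryOnlyCheck(overflowedMb));
        System.out.println("container-first check: "
            + hasPendingResourceRequest(600000, overflowedMb));
    }
}
```

The container count stays far below the int range here, so it is still a reliable signal after the memory total has wrapped.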



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
