[ 
https://issues.apache.org/jira/browse/YARN-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986114#comment-15986114
 ] 

Ravi Prakash commented on YARN-6378:
------------------------------------

Negative usedResources values correlate very strongly with applications being 
moved from one queue to another. For example, on one cluster that was started 
on March 11, usedResources did not go negative until somebody moved an 
application from another queue into the afflicted queue on April 7. Since then, 
the queue has shown negative usedResources.

This might actually be a race condition. It seems that 
[LeafQueue.detachContainer|https://github.com/apache/hadoop/blob/28eb2aabebd15c15a357d86e23ca407d3c85211c/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java#L1890]
 neglects to lock the LeafQueue object. By comparison, when a container 
completes, the same update is made only after acquiring a lock on the LeafQueue 
object in 
[LeafQueue.completedContainer|https://github.com/apache/hadoop/blob/28eb2aabebd15c15a357d86e23ca407d3c85211c/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java#L1538].
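
To illustrate the kind of lost update suspected here, below is a minimal sketch. It is not the actual LeafQueue code: QueueUsageSketch, its methods, and the 1024 MB container size are all hypothetical names and values chosen for illustration. One thread adds to a usage counter while holding the lock, and another subtracts from it either with or without the lock. If the unlocked subtraction interleaves with a locked addition, an addition can be overwritten, and the counter can end up below zero even though every allocation was matched by a release.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a queue's usage bookkeeping; NOT the real LeafQueue. */
class QueueUsageSketch {
    private long usedMemoryMb = 0;

    // Allocation path: updates the counter while holding the object lock.
    synchronized void allocate(long mb) {
        usedMemoryMb += mb;
    }

    // Completion-style path: releases while holding the object lock.
    synchronized void releaseLocked(long mb) {
        usedMemoryMb -= mb;
    }

    // Suspected move-style path: the same read-modify-write, but without the lock.
    void releaseUnlocked(long mb) {
        usedMemoryMb -= mb;
    }

    synchronized long usedMemoryMb() {
        return usedMemoryMb;
    }

    /**
     * Runs one allocating thread and one releasing thread, each performing
     * the given number of 1024 MB updates, and returns the final counter.
     */
    static long run(boolean lockOnRelease, int iterations) {
        QueueUsageSketch queue = new QueueUsageSketch();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> {
            for (int i = 0; i < iterations; i++) {
                queue.allocate(1024);
            }
        });
        pool.execute(() -> {
            for (int i = 0; i < iterations; i++) {
                if (lockOnRelease) {
                    queue.releaseLocked(1024);
                } else {
                    queue.releaseUnlocked(1024);
                }
            }
        });
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("worker threads did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return queue.usedMemoryMb();
    }

    public static void main(String[] args) {
        // With the lock on both paths the updates always balance out.
        System.out.println("locked release:   " + run(true, 1_000_000));
        // Without the lock the result varies from run to run and may be nonzero.
        System.out.println("unlocked release: " + run(false, 1_000_000));
    }
}
```

With the lock taken on both paths the final value is always 0. With the unlocked release the result is timing-dependent: it may be 0, positive, or negative, which would be consistent with the symptom appearing only after an application is moved between queues.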

> Negative usedResources memory in CapacityScheduler
> --------------------------------------------------
>
>                 Key: YARN-6378
>                 URL: https://issues.apache.org/jira/browse/YARN-6378
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>
> Courtesy of Thomas Nystrand, we found that on two of our clusters configured 
> with the CapacityScheduler, usedResources occasionally becomes negative, 
> e.g.
> {code}
> 2017-03-15 11:10:09,449 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> assignedContainer application attempt=appattempt_1487222361993_17177_000001 
> container=Container: [ContainerId: container_1487222361993_17177_01_000014, 
> NodeId: <SOMENODE>:27249, NodeHttpAddress: <SOMENODE>:8042, Resource: 
> <memory:6656, vCores:1>, Priority: 2, Token: null, ] queue=<somequeuename>: 
> capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:-1024, vCores:3>, 
> usedCapacity=0.03409091, absoluteUsedCapacity=0.006818182, numApps=1, 
> numContainers=3 clusterResource=<memory:1249280, vCores:440> type=RACK_LOCAL
> {code}


