[ 
https://issues.apache.org/jira/browse/YARN-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968360#comment-15968360
 ] 

Ravi Prakash commented on YARN-6378:
------------------------------------

I downloaded the RM logs (thanks again, DP team) on dogfood. The RM for 
firstdata was restarted on 02-16. The first occurrence of negative resources 
since then was on 03-01:
{code}
2017-03-01 13:35:20,813 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
Re-sorting completed queue: root.etl stats: etl: capacity=0.2, 
absoluteCapacity=0.2, usedResources=<memory:1024, vCores:1>, 
usedCapacity=0.011363636, absoluteUsedCapacity=0.0022727272, numApps=1, 
numContainers=1
2017-03-01 13:35:20,813 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Application attempt appattempt_1487222361993_12379_000001 released container 
container_1487222361993_12379_01_000061 on node: host: 
203-35.as1.altiscale.com:26469 #containers=9 available=<memory:58368, 
vCores:35> used=<memory:66560, vCores:9> with event: RELEASED
2017-03-01 13:35:20,813 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Null container completed...
2017-03-01 13:35:20,813 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1487222361993_12379_01_000068 Container Transitioned from RUNNING to 
RELEASED
2017-03-01 13:35:20,813 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
 Completed container: container_1487222361993_12379_01_000068 in state: 
RELEASED event:RELEASED
2017-03-01 13:35:20,813 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released 
container container_1487222361993_12379_01_000068 of capacity <memory:8192, 
vCores:1> on host 203-03.as1.altiscale.com:27249, which currently has 7 
containers, <memory:53760, vCores:7> used and <memory:71168, vCores:37> 
available, release resources=true
2017-03-01 13:35:20,813 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: etl 
used=<memory:-7168, vCores:0> numContainers=0 user=vijayasarathyparanthaman 
user-resources=<memory:-7168, vCores:0>
2017-03-01 13:35:20,813 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
completedContainer container=Container: [ContainerId: 
container_1487222361993_12379_01_000068, NodeId: 
203-03.as1.altiscale.com:27249, NodeHttpAddress: 203-03.as1.altiscale.com:8042, 
Resource: <memory:8192, vCores:1>, Priority: 2, Token: Token { kind: 
ContainerToken, service: 10.247.57.232:27249 }, ] queue=etl: capacity=0.2, 
absoluteCapacity=0.2, usedResources=<memory:-7168, vCores:0>, usedCapacity=0.0, 
absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:1249280, 
vCores:440>{code}
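
The numbers in the excerpt above are at least self-consistent: the etl queue reported usedResources=<memory:1024, vCores:1> just before the release, the released container was <memory:8192, vCores:1>, and the difference is the <memory:-7168, vCores:0> logged afterwards. A minimal sketch of that check follows. The Resource tuple and subtract helper are hypothetical stand-ins for illustration, not YARN's own classes, and the check does not explain why the queue's usage was lower than the container being released.

```python
from collections import namedtuple

# Hypothetical stand-in for a <memory, vCores> pair; not YARN's Resource class.
Resource = namedtuple("Resource", ["memory", "vcores"])


def subtract(used, released):
    """Component-wise subtraction with no clamping at zero."""
    return Resource(used.memory - released.memory,
                    used.vcores - released.vcores)


# Values copied from the 2017-03-01 13:35:20,813 log entries.
used_before = Resource(memory=1024, vcores=1)  # etl usedResources before release
released = Resource(memory=8192, vcores=1)     # container ..._12379_01_000068

used_after = subtract(used_before, released)
print(used_after)
```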

At 12:53, usedResources on etl was still <memory:0, vCores:0>:
{code}
2017-03-01 12:53:17,934 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
completedContainer container=Container: [ContainerId: 
container_1487222361993_12294_01_000001, NodeId: 
202-33.as1.altiscale.com:33675, NodeHttpAddress: 202-33.as1.altiscale.com:8042, 
Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: 
ContainerToken, service: 10.247.57.237:33675 }, ] queue=etl: capacity=0.2, 
absoluteCapacity=0.2, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:1249280, 
vCores:440>
2017-03-01 12:53:17,934 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
Re-sorting completed queue: root.etl stats: etl: capacity=0.2, 
absoluteCapacity=0.2, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
absoluteUsedCapacity=0.0, numApps=1, numContainers=0
{code}
Something happened between 12:53 and 13:35 that drove usedResources negative. 
I am going to investigate.
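
For anyone who wants to repeat the search on their own RM log, the scan for the first negative usedResources can be sketched roughly as below. This is an illustration only: the function name and the regular expression are assumptions, it assumes each log entry sits on one physical line (the excerpts in this message are wrapped by the mailer), and the sample lines are shortened from the entries quoted above.

```python
import re

# Matches the queue-level field, e.g. usedResources=<memory:-7168, vCores:0>
USED_RE = re.compile(r"usedResources=<memory:(-?\d+), vCores:(-?\d+)>")


def first_negative_used(lines):
    """Return (line_number, memory, vcores) for the first entry whose
    usedResources has a negative component, or None if none is found."""
    for lineno, line in enumerate(lines, start=1):
        for match in USED_RE.finditer(line):
            memory, vcores = int(match.group(1)), int(match.group(2))
            if memory < 0 or vcores < 0:
                return lineno, memory, vcores
    return None


# Shortened sample entries (hypothetical truncation of the real log lines).
sample = [
    "2017-03-01 12:53:17,934 INFO LeafQueue: queue=etl: "
    "usedResources=<memory:0, vCores:0>, usedCapacity=0.0",
    "2017-03-01 13:35:20,813 INFO LeafQueue: queue=etl: "
    "usedResources=<memory:-7168, vCores:0>, usedCapacity=0.0",
]

result = first_negative_used(sample)
print(result)
```

To scan a real file, pass an open file object instead of the sample list; the function only needs an iterable of lines.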

> Negative usedResources memory in CapacityScheduler
> --------------------------------------------------
>
>                 Key: YARN-6378
>                 URL: https://issues.apache.org/jira/browse/YARN-6378
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>
> Courtesy Thomas Nystrand, we found that on one of our clusters configured 
> with the CapacityScheduler, usedResources occasionally becomes negative. 
> e.g.
> {code}
> 2017-03-15 11:10:09,449 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> assignedContainer application attempt=appattempt_1487222361993_17177_000001 
> container=Container: [ContainerId: container_1487222361993_17177_01_000014, 
> NodeId: <SOMENODE>:27249, NodeHttpAddress: <SOMENODE>:8042, Resource: 
> <memory:6656, vCores:1>, Priority: 2, Token: null, ] queue=<somequeuename>: 
> capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:-1024, vCores:3>, 
> usedCapacity=0.03409091, absoluteUsedCapacity=0.006818182, numApps=1, 
> numContainers=3 clusterResource=<memory:1249280, vCores:440> type=RACK_LOCAL
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
