[ https://issues.apache.org/jira/browse/YARN-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravi Prakash updated YARN-6378:
-------------------------------
Comment: was deleted
(was: I downloaded the RM logs (thanks again, DP team) on dogfood. The RM for
firstdata was restarted on 02-16. Since then, negative resources first
appeared on 03-01.
{code}
2017-03-01 13:35:20,813 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Re-sorting completed queue: root.etl stats: etl: capacity=0.2,
absoluteCapacity=0.2, usedResources=<memory:1024, vCores:1>,
usedCapacity=0.011363636, absoluteUsedCapacity=0.0022727272, numApps=1,
numContainers=1
2017-03-01 13:35:20,813 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Application attempt appattempt_1487222361993_12379_000001 released container
container_1487222361993_12379_01_000061 on node: host:
203-35.as1.altiscale.com:26469 #containers=9 available=<memory:58368,
vCores:35> used=<memory:66560, vCores:9> with event: RELEASED
2017-03-01 13:35:20,813 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Null container completed...
2017-03-01 13:35:20,813 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1487222361993_12379_01_000068 Container Transitioned from RUNNING to
RELEASED
2017-03-01 13:35:20,813 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
Completed container: container_1487222361993_12379_01_000068 in state:
RELEASED event:RELEASED
2017-03-01 13:35:20,813 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released
container container_1487222361993_12379_01_000068 of capacity <memory:8192,
vCores:1> on host 203-03.as1.altiscale.com:27249, which currently has 7
containers, <memory:53760, vCores:7> used and <memory:71168, vCores:37>
available, release resources=true
2017-03-01 13:35:20,813 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: etl
used=<memory:-7168, vCores:0> numContainers=0 user=vijayasarathyparanthaman
user-resources=<memory:-7168, vCores:0>
2017-03-01 13:35:20,813 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
completedContainer container=Container: [ContainerId:
container_1487222361993_12379_01_000068, NodeId:
203-03.as1.altiscale.com:27249, NodeHttpAddress: 203-03.as1.altiscale.com:8042,
Resource: <memory:8192, vCores:1>, Priority: 2, Token: Token { kind:
ContainerToken, service: 10.247.57.232:27249 }, ] queue=etl: capacity=0.2,
absoluteCapacity=0.2, usedResources=<memory:-7168, vCores:0>, usedCapacity=0.0,
absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:1249280,
vCores:440>
{code}
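The arithmetic in these lines is exact. Just before container_1487222361993_12379_01_000068 is released, etl reports usedResources=<memory:1024, vCores:1>. Subtracting the container's <memory:8192, vCores:1> gives exactly the logged <memory:-7168, vCores:0>. One reading is that the 8192 MB container was never (fully) charged to etl, or that its charge had already been removed. The minimal sketch below assumes that the queue's usage is decremented component-wise with no floor at zero. The Res class and release method are hypothetical and are not YARN's Resource/Resources API.

```java
// Hypothetical sketch, not YARN's actual Resource/Resources classes: models a
// queue's usedResources being decremented when a container completes, with no
// clamping at zero.
class ReleaseArithmetic {
    static final class Res {
        final long memoryMb;
        final int vCores;

        Res(long memoryMb, int vCores) {
            this.memoryMb = memoryMb;
            this.vCores = vCores;
        }

        @Override
        public String toString() {
            return "<memory:" + memoryMb + ", vCores:" + vCores + ">";
        }
    }

    // Component-wise subtraction of the released container from the queue's usage.
    static Res release(Res queueUsed, Res container) {
        return new Res(queueUsed.memoryMb - container.memoryMb,
                       queueUsed.vCores - container.vCores);
    }

    public static void main(String[] args) {
        Res etlBefore = new Res(1024, 1);  // etl usedResources just before the release
        Res container = new Res(8192, 1);  // container_1487222361993_12379_01_000068
        Res etlAfter = release(etlBefore, container);
        System.out.println("etl used=" + etlAfter); // prints: etl used=<memory:-7168, vCores:0>
        if (etlAfter.memoryMb < 0 || etlAfter.vCores < 0) {
            // The queue released more than it had recorded as used.
            System.out.println("negative usage: released more than this queue had recorded");
        }
    }
}
```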
At 12:53, usedResources on etl was <memory:0, vCores:0>:
{code}
2017-03-01 12:53:17,934 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
completedContainer container=Container: [ContainerId:
container_1487222361993_12294_01_000001, NodeId:
202-33.as1.altiscale.com:33675, NodeHttpAddress: 202-33.as1.altiscale.com:8042,
Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind:
ContainerToken, service: 10.247.57.237:33675 }, ] queue=etl: capacity=0.2,
absoluteCapacity=0.2, usedResources=<memory:0, vCores:0>, usedCapacity=0.0,
absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:1249280,
vCores:440>
2017-03-01 12:53:17,934 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Re-sorting completed queue: root.etl stats: etl: capacity=0.2,
absoluteCapacity=0.2, usedResources=<memory:0, vCores:0>, usedCapacity=0.0,
absoluteUsedCapacity=0.0, numApps=1, numContainers=0
{code}
Something happened between 12:53 and 13:35. I'm going to investigate.)
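To narrow the window in which etl's accounting first went negative, you can scan the RM log for the queue's usedResources values. The helper below is a hypothetical sketch and is not part of Hadoop. It makes three assumptions:

* Each log entry is on a single line. The excerpts above are wrapped in this message.
* Entries start with the log4j timestamp format shown above.
* The queue's stats lines contain `<queue>: capacity=` followed by `usedResources=<memory:M, vCores:V>`, as both the LeafQueue and ParentQueue lines above do.

It returns the last timestamp at which memory was non-negative and the first at which it was negative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical log-scanning helper (not part of Hadoop); assumes one RM log entry per line.
class NegativeUsageScanner {
    // The "usedResources=<memory:M, vCores:V>" fragment printed in CapacityScheduler queue stats.
    static final Pattern USED = Pattern.compile("usedResources=<memory:(-?\\d+), vCores:(-?\\d+)>");
    // The log4j timestamp at the start of each entry, e.g. "2017-03-01 13:35:20,813".
    static final Pattern TS = Pattern.compile("^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})");

    /** Returns {lastNonNegativeTimestamp, firstNegativeTimestamp}; either may be null. */
    static String[] narrowWindow(Iterable<String> lines, String queue) {
        // Matches "queue=etl: capacity=" (LeafQueue) and "stats: etl: capacity=" (ParentQueue).
        Pattern queueStats = Pattern.compile("(?:^|[\\s=])" + Pattern.quote(queue) + ": capacity=");
        String lastOk = null;
        for (String line : lines) {
            if (!queueStats.matcher(line).find()) continue;
            Matcher used = USED.matcher(line);
            Matcher ts = TS.matcher(line);
            if (!used.find() || !ts.find()) continue;
            long memoryMb = Long.parseLong(used.group(1));
            if (memoryMb < 0) return new String[] {lastOk, ts.group(1)};
            lastOk = ts.group(1);
        }
        return new String[] {lastOk, null};
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: NegativeUsageScanner <rm-log-file> <leaf-queue-name>");
            return;
        }
        // Stream the file so large RM logs are not loaded into memory at once.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            String[] window = narrowWindow(lines::iterator, args[1]);
            System.out.println("last non-negative: " + window[0] + ", first negative: " + window[1]);
        }
    }
}
```

For example, `java NegativeUsageScanner.java <rm-log-file> etl` (Java 11+ single-file launch) would print the two timestamps that bracket the first negative value.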
> Negative usedResources memory in CapacityScheduler
> --------------------------------------------------
>
> Key: YARN-6378
> URL: https://issues.apache.org/jira/browse/YARN-6378
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, resourcemanager
> Affects Versions: 2.7.2
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
>
> Thanks to Thomas Nystrand, we found that on one of our clusters configured
> with the CapacityScheduler, usedResources occasionally becomes negative,
> e.g.:
> {code}
> 2017-03-15 11:10:09,449 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> assignedContainer application attempt=appattempt_1487222361993_17177_000001
> container=Container: [ContainerId: container_1487222361993_17177_01_000014,
> NodeId: <SOMENODE>:27249, NodeHttpAddress: <SOMENODE>:8042, Resource:
> <memory:6656, vCores:1>, Priority: 2, Token: null, ] queue=<somequeuename>:
> capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:-1024, vCores:3>,
> usedCapacity=0.03409091, absoluteUsedCapacity=0.006818182, numApps=1,
> numContainers=3 clusterResource=<memory:1249280, vCores:440> type=RACK_LOCAL
> {code}
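The capacity figures in these logs stay positive even when memory is negative. They fit a dominant-resource calculation (as done by Hadoop's DominantResourceCalculator), where a queue's share is the larger of its memory share and its vCore share. This calculator setting is inferred from the numbers and is not stated in the report. With usedResources=<memory:-1024, vCores:3> and clusterResource=<memory:1249280, vCores:440>, the vCore share 3/440 ≈ 0.006818 dominates. Dividing it by absoluteCapacity=0.2 gives usedCapacity ≈ 0.03409, which matches the log. A negative memory value is therefore masked in usedCapacity, so monitoring that only watches capacity percentages would likely miss it. The sketch below uses hypothetical method names and simplifies the queue-limit step.

```java
// Hypothetical sketch of a dominant-share capacity computation; not Hadoop's
// actual CSQueueUtils/DominantResourceCalculator code.
class DominantShareSketch {
    // Dominant share of a resource vector relative to the cluster: the larger of
    // its memory fraction and its vCore fraction.
    static double dominantShare(long memoryMb, int vCores, long clusterMemoryMb, int clusterVCores) {
        return Math.max((double) memoryMb / clusterMemoryMb, (double) vCores / clusterVCores);
    }

    // absoluteUsedCapacity: the queue's usage as a share of the whole cluster.
    static double absoluteUsedCapacity(long memoryMb, int vCores, long clusterMemoryMb, int clusterVCores) {
        return dominantShare(memoryMb, vCores, clusterMemoryMb, clusterVCores);
    }

    // usedCapacity: usage relative to the queue's guaranteed share. Simplified: a
    // limit of absoluteCapacity x cluster has a dominant share of absoluteCapacity.
    static double usedCapacity(long memoryMb, int vCores, long clusterMemoryMb, int clusterVCores,
                               double absoluteCapacity) {
        return absoluteUsedCapacity(memoryMb, vCores, clusterMemoryMb, clusterVCores) / absoluteCapacity;
    }

    public static void main(String[] args) {
        long clusterMem = 1249280;
        int clusterVCores = 440;
        // usedResources=<memory:-1024, vCores:3> from the assignedContainer line
        System.out.println(absoluteUsedCapacity(-1024, 3, clusterMem, clusterVCores)); // log: 0.006818182
        System.out.println(usedCapacity(-1024, 3, clusterMem, clusterVCores, 0.2));    // log: 0.03409091
        // usedResources=<memory:-7168, vCores:0>: the negative memory share loses to 0.0
        System.out.println(usedCapacity(-7168, 0, clusterMem, clusterVCores, 0.2));    // log: usedCapacity=0.0
    }
}
```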
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)