qingwu.fu created YARN-1458:
-------------------------------
Summary: hadoop2.2.0 fairscheduler ResourceManager Event Processor thread blocked
Key: YARN-1458
URL: https://issues.apache.org/jira/browse/YARN-1458
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Environment: CentOS, kernel 2.6.18-238.19.1.el5, x86_64; Hadoop 2.2.0
Reporter: qingwu.fu
The ResourceManager$SchedulerEventDispatcher$EventProcessor thread becomes
blocked when clients submit a large number of jobs. The problem is not easy to
reproduce: we ran the test cluster for days before it occurred again. The
output of the jstack command on the ResourceManager pid:
"ResourceManager Event Processor" prio=10 tid=0x00002aaab0c5f000 nid=0x5dd3
waiting for monitor entry [0x0000000043aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
    - waiting to lock <0x000000070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
    at java.lang.Thread.run(Thread.java:744)
……
"FairSchedulerUpdateThread" daemon prio=10 tid=0x00002aaab0a2c800 nid=0x5dc8
runnable [0x00000000433a2000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
    - locked <0x000000070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
    - locked <0x000000070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
    at java.lang.Thread.run(Thread.java:744)
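
For illustration, the pattern in the two stack traces (one thread RUNNABLE while holding the FairScheduler monitor <0x000000070026b6e0>, the other BLOCKED waiting for the same monitor) can be reproduced in isolation with a minimal Java sketch. The class, method, and thread names below are hypothetical and are not taken from the Hadoop source. The sketch only demonstrates the thread states; it makes no claim about why the update thread holds the lock for so long.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class MonitorContentionSketch {
    // Hypothetical stand-in for the scheduler instance both threads synchronize on.
    private static final Object SCHEDULER_LOCK = new Object();

    /**
     * Starts a holder thread that spins inside a synchronized block and a
     * waiter thread that tries to enter the same monitor.
     * Returns {holderState, waiterState}, sampled while the monitor is held.
     */
    public static Thread.State[] observeStates() throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        AtomicBoolean release = new AtomicBoolean(false);

        Thread holder = new Thread(() -> {
            synchronized (SCHEDULER_LOCK) {
                lockHeld.countDown();
                // Busy loop standing in for a long-running computation under the lock.
                while (!release.get()) {
                    // spin
                }
            }
        }, "update-thread-sketch");

        Thread waiter = new Thread(() -> {
            synchronized (SCHEDULER_LOCK) {
                // Reached only after the holder leaves its synchronized block.
            }
        }, "event-processor-sketch");

        holder.start();
        lockHeld.await();   // make sure the holder owns the monitor first
        waiter.start();

        // Poll (for up to about 5 s) until the waiter is stuck on the monitor.
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (waiter.getState() != Thread.State.BLOCKED
                && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        Thread.State[] states = { holder.getState(), waiter.getState() };

        // Let the holder finish so both threads can terminate.
        release.set(true);
        holder.join();
        waiter.join();
        return states;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] s = observeStates();
        System.out.println("holder=" + s[0] + " waiter=" + s[1]);
    }
}
```

When reading a real dump, a lock holder that stays RUNNABLE at the same frames across several consecutive jstack outputs suggests that the work done under the lock is not finishing, as opposed to a classic deadlock in which both threads would be BLOCKED.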