[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837050#comment-13837050 ]
qingwu.fu commented on YARN-1458:
---------------------------------
Thanks, Sandy. We were confused by your point that "If it returns 0 we should
just set the fair shares of all the considered schedulables to 0." We took it to
mean that every app's weight should be set to 0 whenever any one app's weight
is 0, which is why we proposed the idea above.
But we now agree with your point that "If size based weight is turned on and an
app has 0 demand, I think giving it 0 fair share is the correct thing to do."
It adheres more closely to the fair-share principle.
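A minimal Java sketch of the behaviour agreed on above, under stated assumptions: the size-based weight is modeled as log2(1 + demand in MB), so an app with zero demand gets weight 0 (treat that exact formula as an assumption); fair shares are found by doubling an upper bound on the weight-to-resource ratio and then binary-searching it; min and max shares are ignored. The class and method names (`ZeroWeightFairShare`, `sizeBasedWeight`, `usedAt`, `computeShares`) are hypothetical, and this is not the actual patch. A zero-weight app simply ends up with a 0 share, and if every weight is 0 the search is skipped and all shares are 0.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: weight-proportional fair shares with a guard for the
 * all-zero-weight case. Min/max shares are deliberately left out.
 */
public class ZeroWeightFairShare {

    /** Assumed size-based weight: log2(1 + demandMb); zero demand gives weight 0. */
    static double sizeBasedWeight(long demandMb) {
        return Math.log1p(demandMb) / Math.log(2);
    }

    /** Resources used if each schedulable receives weight * ratio, truncated to a long. */
    static long usedAt(double ratio, double[] weights) {
        long used = 0;
        for (double w : weights) {
            used += (long) (w * ratio);
        }
        return used;
    }

    /** Returns one fair share of totalResource per weight. */
    static long[] computeShares(double[] weights, long totalResource) {
        long[] shares = new long[weights.length];
        // Guard: with all weights 0 no ratio uses any resources, so the
        // doubling loop below could never exit. Every share stays 0 instead.
        boolean allZero = Arrays.stream(weights).allMatch(w -> w == 0.0);
        if (allZero || totalResource <= 0) {
            return shares;
        }
        // Grow an upper bound on the weight-to-resource ratio by doubling.
        double rMax = 1.0;
        while (usedAt(rMax, weights) < totalResource) {
            rMax *= 2.0;
        }
        // Binary search; invariant: usedAt(left) < total <= usedAt(right).
        double left = 0.0;
        double right = rMax;
        for (int i = 0; i < 25; i++) {
            double mid = (left + right) / 2.0;
            if (usedAt(mid, weights) < totalResource) {
                left = mid;
            } else {
                right = mid;
            }
        }
        // A zero-weight schedulable gets 0 * right = 0.
        for (int i = 0; i < weights.length; i++) {
            shares[i] = (long) (weights[i] * right);
        }
        return shares;
    }

    public static void main(String[] args) {
        double[] mixed = { sizeBasedWeight(0), sizeBasedWeight(1023) };
        System.out.println(Arrays.toString(computeShares(mixed, 8192)));
        double[] idle = { sizeBasedWeight(0), sizeBasedWeight(0) };
        System.out.println(Arrays.toString(computeShares(idle, 8192)));
    }
}
```

With one zero-demand app and one app demanding 1023 MB, the zero-demand app gets 0 and the other gets the full 8192; with two zero-demand apps both shares are 0 and no search runs.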
> In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
> --------------------------------------------------------------------------------------
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 2.2.0
> Environment: CentOS 2.6.18-238.19.1.el5 x86_64
> Hadoop 2.2.0
> Reporter: qingwu.fu
> Labels: patch
> Original Estimate: 408h
> Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor thread blocks when
> clients submit lots of jobs. It is not easy to reproduce; we ran the test
> cluster for days to reproduce it. The output of jstack on the ResourceManager pid:
> {code}
> "ResourceManager Event Processor" prio=10 tid=0x00002aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x0000000043aa9000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
>     - waiting to lock <0x000000070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
>     at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x00002aaab0a2c800 nid=0x5dc8 runnable [0x00000000433a2000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
>     - locked <0x000000070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
>     - locked <0x000000070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
>     at java.lang.Thread.run(Thread.java:744)
> {code}
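The trace is consistent with a search inside `ComputeFairShares.computeShares` that never finishes while `FairScheduler.update` holds the scheduler monitor, which leaves `removeApplication` waiting on that monitor. A minimal Java sketch of that failure mode, assuming (as the method names in the trace suggest but do not confirm) that `computeShares` first doubles an upper bound on the weight-to-resource ratio until the resources used reach the total: if every schedulable has weight 0, the resources used stay 0 for every ratio, so the doubling never ends. The names `UnboundedRatioSearch`, `usedAt` and `findsUpperBound` are hypothetical, and the step cap exists only so the demonstration terminates.

```java
/**
 * Hypothetical model of the upper-bound search on the weight-to-resource
 * ratio. A step cap is added so this demonstration always terminates.
 */
public class UnboundedRatioSearch {

    /** Resources used if each schedulable receives weight * ratio, truncated to a long. */
    static long usedAt(double ratio, double[] weights) {
        long used = 0;
        for (double w : weights) {
            // 0.0 * ratio is 0 for finite ratios; 0.0 * Infinity is NaN, and (long) NaN is 0.
            used += (long) (w * ratio);
        }
        return used;
    }

    /**
     * Doubles rMax, starting at 1.0, until the resources used reach totalResource.
     * Returns true if that happens within maxSteps doublings, false otherwise.
     */
    static boolean findsUpperBound(double[] weights, long totalResource, int maxSteps) {
        double rMax = 1.0;
        for (int step = 0; step <= maxSteps; step++) {
            if (usedAt(rMax, weights) >= totalResource) {
                return true;
            }
            rMax *= 2.0; // reaches Infinity after 1024 doublings
        }
        return false; // an uncapped loop would keep spinning here
    }

    public static void main(String[] args) {
        double[] mixed = { 0.0, 10.0 };  // one zero-demand app, one busy app
        double[] allZero = { 0.0, 0.0 }; // every app has zero demand
        System.out.println("mixed: bound found = " + findsUpperBound(mixed, 8192, 2000));
        System.out.println("all zero: bound found = " + findsUpperBound(allZero, 8192, 2000));
    }
}
```

In this model, even overflow does not end the loop: once `rMax` becomes `Infinity`, `0.0 * Infinity` is `NaN`, which the `(long)` cast turns into 0, so the used resources never reach the total.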
--
This message was sent by Atlassian JIRA
(v6.1#6144)