[ https://issues.apache.org/jira/browse/YARN-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504772#comment-15504772 ]
Hudson commented on YARN-5540:
------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10461 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10461/])
YARN-5540. Scheduler spends too much time looking at empty priorities. (jlowe: rev 7558dbbb481eab055e794beb3603bbe5671a4b4c)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAppSchedulingInfo.java
> scheduler spends too much time looking at empty priorities
> ----------------------------------------------------------
>
> Key: YARN-5540
> URL: https://issues.apache.org/jira/browse/YARN-5540
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler, fairscheduler, resourcemanager
> Affects Versions: 2.7.2
> Reporter: Nathan Roberts
> Assignee: Jason Lowe
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: YARN-5540-branch-2.7.004.patch,
> YARN-5540-branch-2.8.004.patch, YARN-5540.001.patch, YARN-5540.002.patch,
> YARN-5540.003.patch, YARN-5540.004.patch
>
>
> We're starting to see the capacity scheduler run out of scheduling horsepower
> when running 500-1000 applications on clusters of around 4K nodes.
> The problem is amplified by Tez applications, which use many more priorities
> (sometimes in the hundreds) than typical MapReduce applications, so the loop
> in the scheduler that examines every priority within every running
> application becomes a hotspot. The priorities appear to stay around forever,
> even when there is no remaining resource request at that priority, so we
> spend a lot of time looking at nothing.
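> The committed fix touches AppSchedulingInfo.java; the sketch below only
> illustrates the general idea and is not the actual patch (the class and
> field names here are simplified stand-ins). It keeps a set of priorities
> that still have pending containers and drops a priority as soon as its
> count reaches zero, so the scheduler's per-application loop never revisits
> empty priorities:
> {noformat}
> import java.util.Map;
> import java.util.Set;
> import java.util.concurrent.ConcurrentSkipListMap;
> import java.util.concurrent.ConcurrentSkipListSet;
>
> // Illustrative sketch, not the YARN-5540 patch itself.
> class AppSchedulingInfoSketch {
>   // Priorities that still have at least one outstanding request.
>   private final Set<Integer> prioritiesWithRequests =
>       new ConcurrentSkipListSet<>();
>   // Outstanding container counts per priority (a simplified stand-in for
>   // the per-priority ResourceRequest maps in AppSchedulingInfo).
>   private final Map<Integer, Integer> outstanding =
>       new ConcurrentSkipListMap<>();
>
>   void addRequest(int priority, int numContainers) {
>     outstanding.merge(priority, numContainers, Integer::sum);
>     prioritiesWithRequests.add(priority);
>   }
>
>   void allocate(int priority) {
>     Integer remaining = outstanding.computeIfPresent(priority, (p, n) -> n - 1);
>     if (remaining != null && remaining <= 0) {
>       // The key idea: remove the empty priority so the scheduler's
>       // per-application loop never visits it again.
>       outstanding.remove(priority);
>       prioritiesWithRequests.remove(priority);
>     }
>   }
>
>   // The scheduler loop iterates only non-empty priorities.
>   Iterable<Integer> schedulablePriorities() {
>     return prioritiesWithRequests;
>   }
> }
> {noformat}
> With bookkeeping like this, an application with hundreds of stale Tez
> priorities costs the scheduler nothing once those requests are satisfied.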
> jstack snippet:
> {noformat}
> "ResourceManager Event Processor" #28 prio=5 os_prio=0 tid=0x00007fc2d453e800
> nid=0x22f3 runnable [0x00007fc2a8be2000]
> java.lang.Thread.State: RUNNABLE
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequest(SchedulerApplicationAttempt.java:210)
> - eliminated <0x00000005e73e5dc0> (a
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:852)
> - locked <0x00000005e73e5dc0> (a
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
> - locked <0x00000003006fcf60> (a
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:527)
> - locked <0x00000003001b22f8> (a
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:415)
> - locked <0x00000003001b22f8> (a
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1224)
> - locked <0x0000000300041e40> (a
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
> {noformat}