[
https://issues.apache.org/jira/browse/YARN-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Lowe updated YARN-5540:
-----------------------------
Attachment: YARN-5540.001.patch
The main problem is that a scheduler key is never removed from the
collection of scheduler keys, even when there are no further asks for that key.
There's also a separate issue where we can fail to clean up the underlying hash
map keys underneath a particular scheduler key, but I believe that's more of a
memory issue than a performance issue. The performance issue arises because
the scheduler's inner loop iterates over the scheduler keys, so it's
important to remove keys we know are no longer necessary.
When I first started this patch I tried to clean up everything with the
bookkeeping, including all the keys from the underlying requests hashmap. This
made for a much larger patch and added new, interesting NPE possibilities, since
requests could disappear in cases that are impossible today. For example, the
current code goes out of its way to avoid removing the ANY request for a
scheduler key. As such I decided to focus just on the scheduler key set size
problem, which makes for a smaller patch that should still fix the main problem
behind this JIRA.
Attaching a patch for trunk for review. The main idea is to reference count
the various scheduler keys and remove them once their refcount goes to zero.
We increment the refcount for a key when the corresponding ANY request goes
from zero to non-zero or if there's a container increment request against that
scheduler key when there wasn't one before. Similarly we decrement the
refcount for a key when the corresponding ANY request goes from non-zero to
zero or if there are no container increment requests when there were some
before. When a scheduler key's refcount goes from 0 to 1, it is inserted into
the collection of scheduler keys, and when it goes from 1 to 0, it is removed
from the collection. This also has the nice property that deactivation checks
simply become an isEmpty check on the collection of scheduler keys rather than
a loop over that collection.
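The refcount bookkeeping can be sketched roughly as follows (the class and method names here are hypothetical for illustration, not the actual names in the patch):

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch: a scheduler key stays in the active set only while something
// still references it -- a non-zero ANY ask or a pending container
// increment request. Iteration only ever sees keys with refcount > 0.
public class SchedulerKeySet<K extends Comparable<K>> {
  // key -> refcount
  private final ConcurrentSkipListMap<K, Integer> activeKeys =
      new ConcurrentSkipListMap<>();

  // Called when the ANY ask for a key goes from zero to non-zero, or
  // when the first container increment request arrives for the key.
  public void incrementKeyRef(K key) {
    activeKeys.merge(key, 1, Integer::sum);
  }

  // Called when the ANY ask goes from non-zero to zero, or when the
  // last container increment request for the key goes away. Returning
  // null from compute() removes the entry from the map.
  public void decrementKeyRef(K key) {
    activeKeys.compute(key, (k, v) -> (v == null || v <= 1) ? null : v - 1);
  }

  // Deactivation check becomes a simple isEmpty instead of a scan.
  public boolean hasActiveRequests() {
    return !activeKeys.isEmpty();
  }

  public Iterable<K> activeSchedulerKeys() {
    return activeKeys.keySet();
  }
}
```

The point is that the scheduler's inner loop over activeSchedulerKeys() never sees a key with no outstanding asks, so the loop cost tracks the number of live priorities rather than every priority the application has ever used.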
Once we've agreed on a version for trunk, I'll put up separate patches for
branch-2.8 and branch-2.7 due to changes from YARN-5392 and YARN-1651,
respectively.
> scheduler spends too much time looking at empty priorities
> ----------------------------------------------------------
>
> Key: YARN-5540
> URL: https://issues.apache.org/jira/browse/YARN-5540
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler, fairscheduler, resourcemanager
> Affects Versions: 2.7.2
> Reporter: Nathan Roberts
> Assignee: Jason Lowe
> Attachments: YARN-5540.001.patch
>
>
> We're starting to see the capacity scheduler run out of scheduling horsepower
> when running 500-1000 applications on clusters with 4K nodes or so.
> This seems to be amplified by TEZ applications. TEZ applications have many
> more priorities (sometimes in the hundreds) than typical MR applications, and
> therefore the loop in the scheduler that examines every priority within
> every running application starts to become a hotspot. The priorities appear to
> stay around forever, even when there is no remaining resource request at that
> priority, causing us to spend a lot of time looking at nothing.
> jstack snippet:
> {noformat}
> "ResourceManager Event Processor" #28 prio=5 os_prio=0 tid=0x00007fc2d453e800 nid=0x22f3 runnable [0x00007fc2a8be2000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequest(SchedulerApplicationAttempt.java:210)
>         - eliminated <0x00000005e73e5dc0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:852)
>         - locked <0x00000005e73e5dc0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
>         - locked <0x00000003006fcf60> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:527)
>         - locked <0x00000003001b22f8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:415)
>         - locked <0x00000003001b22f8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1224)
>         - locked <0x0000000300041e40> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)