[ https://issues.apache.org/jira/browse/YARN-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-5540:
-----------------------------
    Attachment: YARN-5540.001.patch

The main problem is that a scheduler key is never removed from the collection 
of scheduler keys, even when there are no further asks for that key.  There's 
also a separate issue where we can fail to clean up the underlying hash map 
keys beneath a particular scheduler key, but I believe that's more of a memory 
issue than a performance issue.  The performance issue arises because the 
scheduler's inner loop iterates over the scheduler keys, so it's important to 
remove keys we know are no longer necessary.
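
To make the hotspot concrete, here is a minimal, hypothetical sketch of that 
inner loop.  The class and member names are illustrative only and do not 
correspond to the actual YARN sources:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Illustrative stand-in for per-application scheduling state; in the
// behavior described above, keys are added to schedulerKeys but never
// removed, even after their asks drain to zero.
class SchedulerKeyLoopSketch {
  private final Set<Integer> schedulerKeys = new ConcurrentSkipListSet<>();
  // Outstanding ANY asks per scheduler key.
  private final Map<Integer, Integer> outstandingAnyAsks = new HashMap<>();

  int countSchedulableKeys() {
    int schedulable = 0;
    // This loop runs for every application on every scheduling pass, so
    // stale keys with zero remaining asks still cost iteration time.
    for (int key : schedulerKeys) {
      Integer asks = outstandingAnyAsks.get(key);
      if (asks == null || asks == 0) {
        continue; // nothing to schedule here, yet we paid to visit the key
      }
      schedulable++;
    }
    return schedulable;
  }
}
{code}

With hundreds of stale priorities per TEZ application and hundreds of running 
applications, those wasted visits add up quickly.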

When I first started this patch I tried to clean up all of the bookkeeping, 
including the keys in the underlying requests hashmap.  That made for a much 
larger patch and added new, interesting NPE possibilities, since requests 
could disappear in cases that are impossible today.  For example, the current 
code goes out of its way to avoid removing the ANY request for a scheduler 
key.  As such I decided to focus just on the scheduler key set size problem, 
which yields a more focused patch that should still fix the main problem 
behind this JIRA.

Attaching a patch for trunk for review.  The main idea is to reference-count 
the scheduler keys and remove each one once its refcount drops to zero.  We 
increment the refcount for a key when the corresponding ANY request goes from 
zero to non-zero or when a container increment request appears against that 
scheduler key where there wasn't one before.  Similarly, we decrement the 
refcount for a key when the corresponding ANY request goes from non-zero to 
zero or when the last container increment request against that key goes away.  
When a scheduler key's refcount goes from 0 to 1 it is inserted into the 
collection of scheduler keys, and when it goes from 1 to 0 it is removed from 
the collection.  This also has the nice property that deactivation checks 
simply become an isEmpty check on the collection of scheduler keys rather than 
a loop over that collection.
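
For illustration, here is a minimal sketch of that reference-counting scheme.  
All names here are hypothetical; this is just the idea, not the patch itself:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Minimal sketch of reference-counted scheduler keys.  The collection the
// scheduler iterates (activeKeys) only ever contains keys with live asks.
class SchedulerKeyRefCounter<K extends Comparable<K>> {
  private final Map<K, Integer> refCounts = new HashMap<>();
  private final Set<K> activeKeys = new ConcurrentSkipListSet<>();

  // Called when an ANY ask for the key goes from zero to non-zero, or when
  // a container increment request appears for a key that had none.
  void incrementKeyRef(K key) {
    if (refCounts.merge(key, 1, Integer::sum) == 1) {
      activeKeys.add(key); // 0 -> 1: key becomes visible to the scheduler
    }
  }

  // Called when the ANY ask drops back to zero, or when the last container
  // increment request against the key goes away.
  void decrementKeyRef(K key) {
    Integer count = refCounts.computeIfPresent(key, (k, v) -> v - 1);
    if (count != null && count == 0) {
      refCounts.remove(key);
      activeKeys.remove(key); // 1 -> 0: key no longer worth iterating
    }
  }

  // Deactivation check becomes a simple isEmpty rather than a loop.
  boolean hasPendingAsks() {
    return !activeKeys.isEmpty();
  }
}
{code}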

Once we've agreed on a version for trunk I'll put up separate patches for 
branch-2.8 and branch-2.7, which differ due to changes from YARN-5392 and 
YARN-1651, respectively.


> scheduler spends too much time looking at empty priorities
> ----------------------------------------------------------
>
>                 Key: YARN-5540
>                 URL: https://issues.apache.org/jira/browse/YARN-5540
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, fairscheduler, resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: Jason Lowe
>         Attachments: YARN-5540.001.patch
>
>
> We're starting to see the capacity scheduler run out of scheduling horsepower 
> when running 500-1000 applications on clusters with 4K nodes or so.
> This seems to be amplified by TEZ applications. TEZ applications have many 
> more priorities (sometimes in the hundreds) than typical MR applications, 
> and therefore the loop in the scheduler that examines every priority within 
> every running application starts to be a hotspot. The priorities appear to 
> stay around forever, even when there is no remaining resource request at 
> that priority, causing us to spend a lot of time looking at nothing.
> jstack snippet:
> {noformat}
> "ResourceManager Event Processor" #28 prio=5 os_prio=0 tid=0x00007fc2d453e800 
> nid=0x22f3 runnable [0x00007fc2a8be2000]
>    java.lang.Thread.State: RUNNABLE
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequest(SchedulerApplicationAttempt.java:210)
>         - eliminated <0x00000005e73e5dc0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:852)
>         - locked <0x00000005e73e5dc0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
>         - locked <0x00000003006fcf60> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:527)
>         - locked <0x00000003001b22f8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:415)
>         - locked <0x00000003001b22f8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1224)
>         - locked <0x0000000300041e40> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
> {noformat}


