[
https://issues.apache.org/jira/browse/YARN-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435138#comment-15435138
]
Jason Lowe commented on YARN-5540:
----------------------------------
Thanks for the review!
bq. you can remove the TODO: Shouldn't we activate even if numContainers = 0
since you are now taking care of it.
Unless I'm missing something, it's still not handling it. Activation will only
occur if the ANY request's numContainers > 0, because we won't go through that
TODO-commented code if numContainers <= 0.
bq. You do not really need to pass the schedulerKey around since you can
extract it from the request
True, but that's significantly more expensive since it requires object creation
and adds garbage collection overhead. As such I thought it was far preferable
to pass the existing object rather than create a copy.
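To make the allocation cost concrete, here is a minimal sketch in Java of the
difference between reusing the caller's key and re-deriving one from the
request on every update. The class and method names are hypothetical stand-ins,
not the actual YARN types.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-ins for the scheduler types; not the real YARN classes.
final class Request {
  final int priority;
  int numContainers;
  Request(int priority, int numContainers) {
    this.priority = priority;
    this.numContainers = numContainers;
  }
}

final class Key {
  final int priority;
  Key(int priority) { this.priority = priority; }

  // Re-deriving a key from the request allocates a fresh object (and eventual
  // GC work) on every call in the scheduling hot path.
  static Key of(Request req) { return new Key(req.priority); }

  @Override public boolean equals(Object o) {
    return o instanceof Key && ((Key) o).priority == priority;
  }
  @Override public int hashCode() { return priority; }
}

final class AppRequests {
  private final ConcurrentMap<Key, Request> outstanding = new ConcurrentHashMap<>();

  // Preferred: the caller already holds the key, so reuse it.
  void update(Key key, Request req) {
    outstanding.put(key, req);
  }

  // Also works, but creates a throwaway Key object per update.
  void updateDerivingKey(Request req) {
    outstanding.put(Key.of(req), req);
  }
}
{code}
The per-call savings is tiny, but these paths run for every application on
every scheduling pass, so avoiding a throwaway object per lookup keeps the
garbage collection overhead down.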
bq. shouldn't we probably merge the 2 data structures (the resourceRequestMap
ConcurrentHashMap and the schedulerKeys TreeSet) with a ConcurrentSkipListMap
No, that will break the delta protocol. There are cases when we want to remove
the scheduler key from the collection but *not* remove the map of requests that
go with that key. In other words, there are cases where there are no more
containers to allocate for a scheduler key but the RM should not forget the
outstanding locality-specific requests that have been sent for that key. The
concurrent task limiting feature of MAPREDUCE-5583 is one example that
leverages this. The MapReduce job sends the full list of locality requests up
front then artificially lowers the ANY request count to the concurrent limit.
As requests are fulfilled it bumps the ANY request back up to the concurrent
limit _without re-sending the locality-specific requests_. The RM should still
remember them because it's a delta protocol, so there's no need to re-send
them. If we pulled out the entire request map when there are no more
containers to allocate for that scheduler key, then the RM would forget the
locality-specific requests when the ANY request is bumped back up, breaking
the delta protocol semantics.
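As a minimal sketch of that two-structure bookkeeping (illustrative names and
simplified types, not the actual patch): the key set drives what the scheduler
iterates, while the request map is only ever changed by explicit updates from
the AM, which is what preserves the delta semantics.
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Illustrative only: one application's outstanding requests, simplified.
final class OutstandingRequests {
  // Everything the AM has sent for each scheduler key, keyed by resource name
  // ("*" = ANY, plus rack- and host-level entries). Never dropped just because
  // the ANY count hits zero -- this is the delta-protocol state.
  private final Map<Integer, Map<String, Integer>> requestsByKey = new HashMap<>();
  // Only the keys that currently have ANY containers to allocate; this is what
  // the scheduler loop walks.
  private final TreeSet<Integer> activeKeys = new TreeSet<>();

  // AM updates the ANY request for a key with a new container count.
  synchronized void updateAnyRequest(int key, int numContainers) {
    requestsByKey.computeIfAbsent(key, k -> new HashMap<>()).put("*", numContainers);
    if (numContainers > 0) {
      activeKeys.add(key);      // activate: the scheduler should look at this key
    } else {
      activeKeys.remove(key);   // deactivate, but keep requestsByKey intact so
                                // the locality-specific entries are not forgotten
    }
  }

  // AM sends the locality-specific (host/rack) requests once, up front.
  synchronized void addLocalityRequest(int key, String resourceName, int numContainers) {
    requestsByKey.computeIfAbsent(key, k -> new HashMap<>()).put(resourceName, numContainers);
  }

  // The scheduler only iterates keys that still have work to do.
  synchronized Iterable<Integer> keysToSchedule() {
    return new TreeSet<>(activeKeys);
  }
}
{code}
Under that shape the MAPREDUCE-5583 pattern works as described: the job sends
its locality requests once, lowers the ANY count to the concurrent limit
(deactivating the key when it hits zero), and later raising the ANY count
simply re-activates the key while the previously sent locality entries are
still sitting in the request map.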
> scheduler spends too much time looking at empty priorities
> ----------------------------------------------------------
>
> Key: YARN-5540
> URL: https://issues.apache.org/jira/browse/YARN-5540
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler, fairscheduler, resourcemanager
> Affects Versions: 2.7.2
> Reporter: Nathan Roberts
> Assignee: Jason Lowe
> Attachments: YARN-5540.001.patch
>
>
> We're starting to see the capacity scheduler run out of scheduling horsepower
> when running 500-1000 applications on clusters with 4K nodes or so.
> This seems to be amplified by TEZ applications. TEZ applications have many
> more priorities (sometimes in the hundreds) than typical MR applications, and
> therefore the loop in the scheduler that examines every priority within
> every running application starts to be a hotspot. The priorities appear to
> stay around forever, even when there is no remaining resource request at that
> priority, causing us to spend a lot of time looking at nothing.
> jstack snippet:
> {noformat}
> "ResourceManager Event Processor" #28 prio=5 os_prio=0 tid=0x00007fc2d453e800 nid=0x22f3 runnable [0x00007fc2a8be2000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequest(SchedulerApplicationAttempt.java:210)
>         - eliminated <0x00000005e73e5dc0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:852)
>         - locked <0x00000005e73e5dc0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
>         - locked <0x00000003006fcf60> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:527)
>         - locked <0x00000003001b22f8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:415)
>         - locked <0x00000003001b22f8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1224)
>         - locked <0x0000000300041e40> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
> {noformat}
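For context on the hotspot in the trace above, here is a rough sketch of the
shape of the change being discussed; the code is illustrative only, not the
actual LeafQueue or the attached patch. The point is simply that the
per-application scan visits only keys with outstanding ANY containers rather
than every priority the application has ever used.
{code:java}
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative shape of the hot loop from the stack trace above; not the real
// LeafQueue code. "Priorities" stand in for the per-application scheduler keys.
final class PriorityScanSketch {
  // priority -> outstanding ANY container count (entries linger even at zero)
  final Map<Integer, Integer> anyCountByPriority = new TreeMap<>();
  // priorities that still have containers to allocate
  final Set<Integer> activePriorities = new TreeSet<>();

  // Old shape: every scheduling pass walks every priority the application has
  // ever used, including ones whose count dropped to zero long ago.
  void scanAll() {
    for (Map.Entry<Integer, Integer> e : anyCountByPriority.entrySet()) {
      if (e.getValue() <= 0) {
        continue;   // wasted work: nothing left to allocate at this priority
      }
      // ... attempt assignment for this priority ...
    }
  }

  // New shape: only the priorities that were activated (count > 0) are visited,
  // so hundreds of empty priorities no longer cost anything per pass.
  void scanActive() {
    for (Integer p : activePriorities) {
      // ... attempt assignment for this priority ...
    }
  }
}
{code}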