Jason Lowe commented on YARN-2176:

AppSchedulingInfo already determines when an app is actively requesting 
resources so that it can update the QueueMetrics.activeApplications metric.  
(Confusingly, LeafQueue also has an activeApplications collection, which 
actually holds all running applications, not just the ones requesting.)

It would be nice to leverage the work already being done by AppSchedulingInfo, 
which currently calls the ActiveUsersManager activateApplication and 
deactivateApplication methods when necessary.  CapacityScheduler could 
potentially have a derived ActiveUsersManager class that additionally notifies 
the LeafQueue, so the queue can track requesting and non-requesting apps 
separately.  To preserve allocation semantics we'd have to track the original 
order of the applications, so that activating an application inserts it into 
the list of requesting applications in the same position relative to other 
requesting applications, regardless of how many times it has been activated or 
deactivated.
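
As a rough illustration of the ordering requirement, here is a minimal sketch 
of a queue-side tracker.  RequestingAppTracker and all of its methods are 
hypothetical names, not existing YARN classes, and plain String ids stand in 
for ApplicationId.  It assumes each application is assigned a sequence number 
once, when it is added to the queue, and that the derived ActiveUsersManager 
would call activate/deactivate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of YARN): tracks requesting apps in original order. */
class RequestingAppTracker {
    private long nextSeq = 0;
    // Sequence number assigned once per app, when it is first added to the queue.
    private final Map<String, Long> seqByApp = new HashMap<>();
    // Only the apps currently requesting, sorted by their original sequence number.
    private final TreeMap<Long, String> requesting = new TreeMap<>();

    /** Called when an app is added to the queue; fixes its relative order. */
    void addApplication(String appId) {
        if (!seqByApp.containsKey(appId)) {
            seqByApp.put(appId, nextSeq++);
        }
    }

    /** Called when the app starts requesting; re-inserts it at its original position. */
    void activate(String appId) {
        Long seq = seqByApp.get(appId);
        if (seq != null) {
            requesting.put(seq, appId);
        }
    }

    /** Called when the app has no outstanding requests; it keeps its sequence number. */
    void deactivate(String appId) {
        Long seq = seqByApp.get(appId);
        if (seq != null) {
            requesting.remove(seq);
        }
    }

    /** Called when the app leaves the queue entirely. */
    void removeApplication(String appId) {
        Long seq = seqByApp.remove(appId);
        if (seq != null) {
            requesting.remove(seq);
        }
    }

    /** Snapshot of the requesting apps in original order, for the scheduling loop. */
    List<String> requestingInOrder() {
        return new ArrayList<>(requesting.values());
    }
}
```

With something along these lines, the assignment loop would iterate over 
requestingInOrder() instead of every running application.  Activation and 
deactivation each cost O(log n) in the number of requesting apps, and the 
sketch is not thread-safe, so it would need to rely on the queue's existing 
locking.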

> CapacityScheduler loops over all running applications rather than actively 
> requesting apps
> ------------------------------------------------------------------------------------------
>                 Key: YARN-2176
>                 URL: https://issues.apache.org/jira/browse/YARN-2176
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>    Affects Versions: 2.4.0
>            Reporter: Jason Lowe
> The capacity scheduler performance is primarily dominated by 
> LeafQueue.assignContainers, and that currently loops over all applications 
> that are running in the queue.  It would be more efficient if we looped over 
> just the applications that are actively asking for resources rather than all 
> applications, as there could be thousands of applications running but only a 
> few hundred that are currently asking for resources.

This message was sent by Atlassian JIRA
