Jason Lowe commented on YARN-2176:

It is less efficient to lump them all together.  As for whether we need to make 
the optimization, do we know that this overhead is significant?  IIRC we're not 
re-sorting all applications from scratch each time we allocate but rather only 
moving individual apps in the sort order as they are added/updated.  That's an 
O(lg N) operation, which is not going to budge a whole lot when N is moving from 
hundreds to thousands, and especially if we don't invoke the operation very 
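
To make the cost argument above concrete, here is a minimal sketch of moving a single app within a sorted collection instead of sorting everything again. It is not the CapacityScheduler code: the class SortedAppsSketch, the App type, and the priorityKey field are all hypothetical names made up for illustration, and a java.util.TreeSet stands in for whatever ordered structure the scheduler really uses.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class SortedAppsSketch {
    /** Hypothetical stand-in for a scheduler application; not a YARN class. */
    public static final class App {
        public final int id;
        long priorityKey; // hypothetical sort key, assumed for this sketch

        public App(int id, long priorityKey) {
            this.id = id;
            this.priorityKey = priorityKey;
        }
    }

    // Ordered by key, with the id as a tie-breaker so distinct apps never compare equal.
    private final TreeSet<App> ordered = new TreeSet<>(
        Comparator.comparingLong((App a) -> a.priorityKey)
                  .thenComparingInt(a -> a.id));

    /** O(lg N) insertion of a new app. */
    public void add(App app) {
        ordered.add(app);
    }

    /**
     * Moves one app to its new position: remove, change the key, re-insert.
     * Each step is O(lg N); the other N-1 apps are never touched.
     * The removal has to happen before the key changes, otherwise the
     * tree can no longer find the element.
     */
    public void update(App app, long newKey) {
        ordered.remove(app);
        app.priorityKey = newKey;
        ordered.add(app);
    }

    /** The app currently first in the sort order. */
    public App first() {
        return ordered.first();
    }

    public int size() {
        return ordered.size();
    }
}
```

Since log2(10) is about 3.3, a tenfold increase in N adds only about three levels to a balanced tree, which is why the per-update cost barely moves between hundreds and thousands of apps.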

> CapacityScheduler loops over all running applications rather than actively 
> requesting apps
> ------------------------------------------------------------------------------------------
>                 Key: YARN-2176
>                 URL: https://issues.apache.org/jira/browse/YARN-2176
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>    Affects Versions: 2.4.0
>            Reporter: Jason Lowe
>
> The capacity scheduler performance is primarily dominated by 
> LeafQueue.assignContainers, and that currently loops over all applications 
> that are running in the queue.  It would be more efficient if we looped over 
> just the applications that are actively asking for resources rather than all 
> applications, as there could be thousands of applications running but only a 
> few hundred that are currently asking for resources.
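
One possible shape for the proposal in the description is sketched below, assuming the queue can be told whenever an app's outstanding request count changes. Nothing here is taken from LeafQueue: ActiveAppsSketch, App, setPending, and appsToVisit are hypothetical names, and the sketch ignores the sort order that the real scheduler maintains.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ActiveAppsSketch {
    /** Hypothetical stand-in for a running application; not a YARN class. */
    public static final class App {
        public final int id;
        int pending; // outstanding resource requests, assumed for this sketch

        public App(int id) {
            this.id = id;
        }
    }

    private final Set<App> running = new LinkedHashSet<>(); // every app in the queue
    private final Set<App> active = new LinkedHashSet<>();  // only apps asking for resources

    public void addApp(App app) {
        running.add(app);
    }

    public void removeApp(App app) {
        running.remove(app);
        active.remove(app);
    }

    /** Keeps the active set in sync whenever an app's pending count changes. */
    public void setPending(App app, int pending) {
        app.pending = pending;
        if (pending > 0) {
            active.add(app);
        } else {
            active.remove(app);
        }
    }

    /** Ids an allocation pass would visit: the active apps, not all running apps. */
    public List<Integer> appsToVisit() {
        List<Integer> ids = new ArrayList<>();
        for (App app : active) {
            ids.add(app.id);
        }
        return ids;
    }

    public int runningCount() {
        return running.size();
    }
}
```

With thousands of running apps and a few hundred active ones, the allocation loop in this sketch only walks the few hundred; the price is the bookkeeping in setPending on every request change.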
