Jason Lowe commented on YARN-3136:

I have one main concern with the patch.  AbstractYarnScheduler declares the 
applications map but does not actually instantiate it.  If we remove the 
scheduler lock from getTransferredContainers, then we are assuming that derived 
schedulers will use a concurrent map when they instantiate the applications 
map.  Therefore I think AbstractYarnScheduler should declare applications as a 
ConcurrentMap rather than just a plain ol' Map to better enforce that 
assumption.  Otherwise another derived scheduler could, in theory, use a map 
that's not thread-safe and subtly break this.
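A minimal sketch of the suggestion, assuming heavily simplified stand-ins for the real classes: AbstractSchedulerSketch, FifoSchedulerSketch, and TransferredContainersDemo are hypothetical names, application IDs and containers are plain Strings, and the real YARN types and method signatures are not reproduced.  Declaring the field as ConcurrentMap means a subclass that tries to supply a plain HashMap fails to compile, so the lock-free read in getTransferredContainers cannot be silently broken.  Passing the map in through the constructor is an illustrative choice, not how the patch does it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified stand-in for AbstractYarnScheduler.
abstract class AbstractSchedulerSketch {
    // Declared as ConcurrentMap rather than Map: every subclass must supply a
    // thread-safe map, which the unsynchronized reader below depends on.
    protected final ConcurrentMap<String, List<String>> applications;

    protected AbstractSchedulerSketch(ConcurrentMap<String, List<String>> applications) {
        this.applications = applications;
    }

    // Not synchronized: reads the concurrent map without taking the scheduler lock.
    public List<String> getTransferredContainers(String appId) {
        List<String> containers = applications.get(appId);
        return containers == null ? Collections.<String>emptyList() : containers;
    }

    // Updates that also touch other scheduler state may still hold the lock.
    public synchronized void addApplication(String appId, List<String> containers) {
        // Store an immutable copy so readers never see a list being mutated.
        applications.put(appId, Collections.unmodifiableList(new ArrayList<>(containers)));
    }
}

// Hypothetical derived scheduler.
class FifoSchedulerSketch extends AbstractSchedulerSketch {
    FifoSchedulerSketch() {
        // Passing new HashMap<>() here would not compile.
        super(new ConcurrentHashMap<>());
    }
}

public class TransferredContainersDemo {
    public static void main(String[] args) {
        FifoSchedulerSketch scheduler = new FifoSchedulerSketch();
        scheduler.addApplication("app_1", Arrays.asList("container_1", "container_2"));
        System.out.println(scheduler.getTransferredContainers("app_1")); // [container_1, container_2]
        System.out.println(scheduler.getTransferredContainers("app_2")); // []
    }
}
```

The type system then documents the thread-safety requirement instead of leaving it as an unstated convention.  ConcurrentHashMap retrievals generally do not block, so an AM calling getTransferredContainers no longer waits behind node heartbeats holding the scheduler lock.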

> getTransferredContainers can be a bottleneck during AM registration
> -------------------------------------------------------------------
>                 Key: YARN-3136
>                 URL: https://issues.apache.org/jira/browse/YARN-3136
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: scheduler
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Sunil G
>         Attachments: 0001-YARN-3136.patch
> 
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
> stuck waiting for the scheduler lock trying to call getTransferredContainers.  
> The scheduler lock is highly contended, especially on a large cluster with 
> many nodes heartbeating, and it would be nice if we could find a way to 
> eliminate the need to grab this lock during this call.  We've already done 
> similar work during AM allocate calls to make sure they don't needlessly grab 
> the scheduler lock, and it would be good to do so here as well, if possible.

This message was sent by Atlassian JIRA
