Carlo Curino commented on YARN-4198:

[~kshukla], I am happy to collaborate, but we already have a patch in the works 
with [~atumanov] and [~chris.douglas]. We tested it at scale and it seems to 
work well; we now want to double-check it carefully, clean it up, and submit it 
for review. However, as this is a very delicate piece, it would be great if you 
could help us go over it and analyze it carefully. It is also likely that we 
missed some further opportunities for improvement.

The general observation is that we are holding a bunch of big locks (e.g., CS) 
to make modifications to data structures that could be protected by much more 
fine-grained locks, or made concurrency-safe and not locked at all (as the 
entire CS operates on a stale view of the cluster state anyway, due to 
heartbeats etc.).
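
To illustrate the idea, here is a minimal sketch. It is not the patch itself, and 
every class, field, and method name in it is hypothetical rather than taken from 
the CapacityScheduler code. It assumes per-node state can live in a 
concurrency-safe map, and that a single capacity value can be guarded by its 
own narrow read/write lock instead of one scheduler-wide monitor.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: names are invented for illustration only.
class NodeTrackerSketch {
    // Concurrency-safe map: per-node updates need no scheduler-wide lock.
    private final ConcurrentHashMap<String, Integer> freeMemoryMb =
        new ConcurrentHashMap<>();

    // Narrow lock that guards only the capacity field below.
    private final ReentrantReadWriteLock capacityLock =
        new ReentrantReadWriteLock();
    private int queueCapacityMb;

    // Heartbeat path: a single atomic put, no global lock held.
    void onHeartbeat(String nodeId, int reportedFreeMb) {
        freeMemoryMb.put(nodeId, reportedFreeMb);
    }

    // Atomically reserve memory on one node; false if unknown or too small.
    boolean tryReserve(String nodeId, int requestMb) {
        boolean[] reserved = {false};
        freeMemoryMb.computeIfPresent(nodeId, (id, free) -> {
            if (free >= requestMb) {
                reserved[0] = true;
                return free - requestMb;
            }
            return free; // leave the entry unchanged
        });
        return reserved[0];
    }

    // Returns 0 for nodes that have not reported yet.
    int freeMb(String nodeId) {
        return freeMemoryMb.getOrDefault(nodeId, 0);
    }

    // Writers take the exclusive side of the narrow lock.
    void setQueueCapacityMb(int capacityMb) {
        capacityLock.writeLock().lock();
        try {
            queueCapacityMb = capacityMb;
        } finally {
            capacityLock.writeLock().unlock();
        }
    }

    // Readers share the lock, so concurrent reads do not block each other.
    int getQueueCapacityMb() {
        capacityLock.readLock().lock();
        try {
            return queueCapacityMb;
        } finally {
            capacityLock.readLock().unlock();
        }
    }
}
```

In this sketch a heartbeat for one node never contends with a reservation on 
another node, and capacity readers do not serialize behind each other. Whether 
the same split is safe for the real scheduler depends on invariants that span 
several structures, which is exactly what needs careful review.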

We will post something soon, and I would really like your help on 
reviewing/extending this.

> CapacityScheduler locking / synchronization improvements
> --------------------------------------------------------
>                 Key: YARN-4198
>                 URL: https://issues.apache.org/jira/browse/YARN-4198
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Carlo Curino
> In the context of YARN-4193 (which stresses the RM/CS performance) we found 
> several performance problems in the locking/synchronization of the 
> CapacityScheduler, as well as inconsistencies that do not normally surface 
> (incorrect locking order of queues protected by CS locks, etc.). This JIRA 
> proposes several refactorings that improve this.

This message was sent by Atlassian JIRA