Naganarasimha G R commented on YARN-4416:

bq. Hence with this new lock, we are getting a hierarchy. Is this intentional.?
Yes Sunil, even I was skeptical about it, but went ahead with [~wangda]'s 
suggestion, as similar read-write locks are already held in queueCapacity and 
resource-usage, and some methods were already updating them without holding 
the LeafQueue lock. Further, I was of the opinion that the ordering policy 
should not depend on LeafQueue for multithreaded consistency, as it is an 
independent entity and can be used elsewhere.

bq. we access the iterator from ordering policy under LeafQueue lock, so I 
could see that, now we have some methods in LeafQueue which is removed with 
LeafQueue lock and directly used only new lock from OrderingPolicy.
Still, all the methods that modify the ordering policy are called while 
holding the lock on LeafQueue, and if any other place modifies it in the 
future, it needs to ensure that the lock on LeafQueue is held first. Also, the 
TreeSet iterator fails fast when the underlying set gets modified.
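
To illustrate the fail-fast point, here is a minimal single-threaded sketch. The class and the app names are made up for illustration and are not taken from the patch. Note that the JDK documents fail-fast behaviour as best-effort, so it helps detect a missing lock but is not a substitute for one.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical demo class, not part of YARN.
class FailFastIteratorDemo {

    /** Returns true if the iterator noticed that the set changed underneath it. */
    static boolean modificationDetected() {
        TreeSet<String> apps = new TreeSet<>();
        apps.add("app_1");
        apps.add("app_2");
        apps.add("app_3");

        Iterator<String> it = apps.iterator();
        it.next();          // iteration has started
        apps.add("app_4");  // structural change made outside the iterator

        try {
            it.next();      // the iterator checks the set's modification count here
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("modification detected: " + modificationDetected());
    }
}
```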

But anyway, we need to evaluate the impact on performance. I am planning to 
run SLS with and without these changes to validate it.

Further, IMO we could have a read-write lock in LeafQueue, which would avoid 
taking the synchronized lock on LeafQueue for the getters (reads) in the leaf 
queue. Thoughts?
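
A rough sketch of what I mean, assuming a simplified stand-in class (QueueSketch and its fields are hypothetical, not the actual LeafQueue or AbstractCSQueue code). Getters take only the read lock, so readers do not contend on the object monitor, while updates take the write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for a queue class; not the YARN code.
class QueueSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final String queueName;
    private float absoluteUsedCapacity;

    QueueSketch(String queueName) {
        this.queueName = queueName;
    }

    // Read path: many readers may hold the read lock at the same time.
    float getAbsoluteUsedCapacity() {
        lock.readLock().lock();
        try {
            return absoluteUsedCapacity;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Write path: exclusive, blocks readers only while the update runs.
    void setAbsoluteUsedCapacity(float value) {
        lock.writeLock().lock();
        try {
            absoluteUsedCapacity = value;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // toString() goes through the getter and never takes the object monitor.
    @Override
    public String toString() {
        return queueName + ": absoluteUsedCapacity=" + getAbsoluteUsedCapacity();
    }
}
```

With this shape, a reader such as toString() waits only while a writer actually holds the write lock, not whenever some other thread is inside any synchronized method of the queue.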

> Deadlock due to synchronised get Methods in AbstractCSQueue
> -----------------------------------------------------------
>                 Key: YARN-4416
>                 URL: https://issues.apache.org/jira/browse/YARN-4416
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Minor
>         Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> deadlock.log
> While debugging in Eclipse, I came across a scenario wherein I had to get to 
> know the name of the queue, but every time I tried to inspect the queue it 
> hung. On seeing the stack I realized there was a deadlock, but on analysis 
> found out that it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized 
> and would be better handled through read and write locks.

This message was sent by Atlassian JIRA
