[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043932#comment-15043932
 ] 

Sunil G commented on YARN-4416:
-------------------------------

Sorry, I was not very clear in my earlier comments.

Almost all APIs exposed by LeafQueue are called while holding the queue lock. 
Hence, with this new lock, we end up with a lock hierarchy. Is this intentional? 
I ask because we are going to introduce a new lock in a major code path.
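
To illustrate what I mean by a hierarchy, here is a minimal sketch. The class and field names are hypothetical and this is not the actual CapacityScheduler code. It only shows an outer (queue) lock and an inner (policy) lock that are always acquired in the same order:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: an outer "queue" lock and an inner "policy" lock.
// Every path that needs both takes them in the same order (queue, then policy).
class LockOrderSketch {
    private final ReentrantLock queueLock = new ReentrantLock();  // outer lock
    private final ReentrantLock policyLock = new ReentrantLock(); // inner lock
    private int activeApps = 0;

    // Writer path: outer lock first, then inner lock.
    void submit() {
        queueLock.lock();
        try {
            policyLock.lock();
            try {
                activeApps++;
            } finally {
                policyLock.unlock();
            }
        } finally {
            queueLock.unlock();
        }
    }

    // Reader path: takes only the inner lock. It never takes the inner lock
    // and then the outer one, so it cannot invert the order used by submit().
    int getActiveApps() {
        policyLock.lock();
        try {
            return activeApps;
        } finally {
            policyLock.unlock();
        }
    }

    public static void main(String[] args) {
        LockOrderSketch sketch = new LockOrderSketch();
        sketch.submit();
        sketch.submit();
        System.out.println(sketch.getActiveApps());
    }
}
```

As long as no code path takes the inner lock and then asks for the outer one, the extra level should not introduce a deadlock by itself, but it is one more lock on a hot path.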

Also, in LeafQueue#assignContainers
{code}
    for (Iterator<FiCaSchedulerApp> assignmentIterator =
        orderingPolicy.getAssignmentIterator(); assignmentIterator.hasNext();) {
      FiCaSchedulerApp application = assignmentIterator.next();

{code}

we access the iterator from the ordering policy under the LeafQueue lock. I can 
see that some methods in LeafQueue no longer take the LeafQueue lock and rely 
only on the new lock in OrderingPolicy. So we need to be slightly careful here: 
we should ensure that we do not delete any item without holding the LeafQueue 
lock. (Deletion is currently done under the LeafQueue lock, so there are no 
issues as of now.)
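
A minimal sketch of the invariant I am describing, using hypothetical stand-in classes (SketchLeafQueue and SketchOrderingPolicy are made up for illustration and do not mirror the real implementation). The policy's own lock protects individual add/remove calls, but it is not held while a caller walks the iterator, so removal has to stay under the queue lock as well:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the queue; its monitor plays the "LeafQueue lock".
class SketchLeafQueue {
    private final SketchOrderingPolicy policy = new SketchOrderingPolicy();

    synchronized void addApp(String app) {
        policy.add(app);
    }

    // Removal stays under the queue lock, so it cannot run while
    // assignContainers() is in the middle of its iteration.
    synchronized boolean removeApp(String app) {
        return policy.remove(app);
    }

    // Iterates the policy's entries while holding the queue lock.
    synchronized List<String> assignContainers() {
        List<String> visited = new ArrayList<>();
        for (Iterator<String> it = policy.getAssignmentIterator(); it.hasNext();) {
            visited.add(it.next());
        }
        return visited;
    }

    public static void main(String[] args) {
        SketchLeafQueue queue = new SketchLeafQueue();
        queue.addApp("app-1");
        queue.addApp("app-2");
        queue.removeApp("app-1");
        System.out.println(queue.assignContainers()); // prints [app-2]
    }
}

// Hypothetical stand-in for an ordering policy guarded by its own lock.
class SketchOrderingPolicy {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> apps = new ArrayList<>();

    void add(String app) {
        lock.writeLock().lock();
        try {
            apps.add(app);
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean remove(String app) {
        lock.writeLock().lock();
        try {
            return apps.remove(app);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Returns a live iterator over the internal list. The policy lock is not
    // held while the caller iterates, so the caller needs its own guard.
    Iterator<String> getAssignmentIterator() {
        return apps.iterator();
    }
}
```

If removeApp were called without the queue lock, the policy lock alone would not stop it from modifying the list while assignContainers is iterating.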

> Deadlock due to synchronised get Methods in AbstractCSQueue
> -----------------------------------------------------------
>
>                 Key: YARN-4416
>                 URL: https://issues.apache.org/jira/browse/YARN-4416
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Minor
>         Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario in which I needed to 
> find out the name of the queue, but every time I tried to inspect the queue 
> it hung. Looking at the stack, I realized there was a deadlock, but on 
> analysis I found that it was caused only by *queue.toString()* during 
> debugging, because {{AbstractCSQueue.getAbsoluteUsedCapacity}} is synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized 
> and would be better handled through read and write locks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
