[ 
https://issues.apache.org/jira/browse/YARN-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968653#comment-13968653
 ] 

Steve Loughran commented on YARN-1929:
--------------------------------------

I'm +1 to the change to composite service, as well as to making the serviceXYZ 
operations desynchronized (the state entry point in the public method is 
synchronized to prevent re-entrancy).

I'll leave it to others to look at the remaining code and comment.

Now, there is one little quirk introduced by desynchronizing the serviceStart() 
and serviceStop() methods. Although it is still impossible for more than one 
thread to successfully enter either method, the following sequence is possible:
{code}
Thread 1: service.start()
Thread 1: service.serviceStart() begins

Thread 2: service.stop()
Thread 2: service.serviceStop() begins
Thread 2: service.serviceStop() completes

Thread 1: service.serviceStart() completes
{code}

That's because we're not making any attempt to include transitional states; 
doing so generally makes things too complex, and that includes handling the 
problem of "what is the policy if I try to call stop midway through starting?"

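For anyone who wants to see that interleaving happen, here is a minimal sketch of it. It is an illustration only: SketchService, LifecycleRaceSketch and the two latches are hypothetical names made up for this example, and the class models just the behaviour described above (lock held for the state transition, serviceStart()/serviceStop() run outside it). It does not reproduce Hadoop's AbstractService or the patch. The latches exist only to force the ordering so the run is repeatable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical driver: forces the start/stop interleaving described above.
class LifecycleRaceSketch {
    static List<String> runInterleaving() {
        SketchService service = new SketchService();
        Thread t1 = new Thread(service::start, "Thread 1");
        Thread t2 = new Thread(() -> {
            // Wait until Thread 1 is inside serviceStart(), then stop.
            SketchService.awaitQuietly(service.startBegan);
            service.stop();
        }, "Thread 2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(service.events);
    }

    public static void main(String[] args) {
        runInterleaving().forEach(System.out::println);
    }
}

// Hypothetical stand-in for a service; NOT the Hadoop AbstractService API.
class SketchService {
    enum State { INITED, STARTED, STOPPED }

    private final Object stateLock = new Object();
    private State state = State.INITED;

    final List<String> events = Collections.synchronizedList(new ArrayList<>());
    final CountDownLatch startBegan = new CountDownLatch(1);
    final CountDownLatch stopDone = new CountDownLatch(1);

    public void start() {
        // Only the state entry is synchronized, so a second start() is a no-op.
        synchronized (stateLock) {
            if (state != State.INITED) {
                return;
            }
            state = State.STARTED;
        }
        serviceStart(); // runs outside the lock
    }

    public void stop() {
        synchronized (stateLock) {
            if (state == State.STOPPED) {
                return;
            }
            state = State.STOPPED;
        }
        serviceStop(); // runs outside the lock
    }

    protected void serviceStart() {
        log("serviceStart() begins");
        startBegan.countDown();
        awaitQuietly(stopDone); // simulates a slow startup that outlasts stop()
        log("serviceStart() completes");
    }

    protected void serviceStop() {
        log("serviceStop() begins");
        log("serviceStop() completes");
        stopDone.countDown();
    }

    private void log(String what) {
        events.add(Thread.currentThread().getName() + ": " + what);
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Running main prints the four serviceStart()/serviceStop() events in the same order as the sequence above: the service is already in the STOPPED state by the time serviceStart() returns, which is exactly the case a subclass has to tolerate.
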
> DeadLock in RM when automatic failover is enabled.
> --------------------------------------------------
>
>                 Key: YARN-1929
>                 URL: https://issues.apache.org/jira/browse/YARN-1929
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>         Environment: Yarn HA cluster
>            Reporter: Rohith
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: yarn-1929-1.patch
>
>
> Deadlock detected in RM when automatic failover is enabled.
> {noformat}
> Found one Java-level deadlock:
> =============================
> "Thread-2":
>   waiting to lock monitor 0x00007fb514303cf0 (object 0x00000000ef153fd0, a 
> org.apache.hadoop.ha.ActiveStandbyElector),
>   which is held by "main-EventThread"
> "main-EventThread":
>   waiting to lock monitor 0x00007fb514750a48 (object 0x00000000ef154020, a 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService),
>   which is held by "Thread-2"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)