[ 
https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057190#comment-15057190
 ] 

Sangjin Lee commented on YARN-4247:
-----------------------------------

FYI, anyone who ports YARN-2005 without YARN-3361 will run into this issue 
pretty easily. If we ever decide to backport YARN-2005 to 2.6.x or 2.7.x, 
YARN-3361 needs to be backported too, or this should be fixed in the way this 
patch suggests.

There are a couple of things that are not quite correct with the patch.
- the call to {{hasMasterContainer()}} in {{SchedulerApplicationAttempt}} is 
inverted: it should be {{!hasMasterContainer()}}
- {{masterContainer}} should be {{volatile}} to guarantee memory visibility 
across threads
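
For anyone reading this later, here is a rough sketch of what the two points above amount to. It is not the patch itself and not the real Hadoop class: {{AttemptSketch}}, {{setMasterContainer()}} and {{isWaitingForMasterContainer()}} are made-up names for illustration, and the container is a plain {{Object}}. Only {{masterContainer}}, {{hasMasterContainer()}} and the {{volatile}} modifier come from the comments above.

```java
// Hypothetical sketch only; not the actual Hadoop class or the YARN-4247 patch.
class AttemptSketch {

    // volatile: a write made by one thread is visible to later reads in
    // other threads, so readers do not need to take a lock to see it.
    private volatile Object masterContainer;

    // Hypothetical setter, assumed to be called once the AM container is known.
    void setMasterContainer(Object container) {
        this.masterContainer = container;
    }

    boolean hasMasterContainer() {
        return masterContainer != null;
    }

    // Hypothetical caller. The attempt is still waiting for its AM container
    // only while it does NOT have one, hence the negation.
    boolean isWaitingForMasterContainer() {
        return !hasMasterContainer();
    }

    public static void main(String[] args) {
        AttemptSketch attempt = new AttemptSketch();
        System.out.println(attempt.isWaitingForMasterContainer()); // true
        attempt.setMasterContainer(new Object());
        System.out.println(attempt.isWaitingForMasterContainer()); // false
    }
}
```

Reading a cached {{volatile}} field this way needs no lock, which is the kind of lock-free check that avoids holding two locks at once; whether the real fix looks like this depends on the patch.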

Adding these comments for posterity.

> Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing 
> events
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-4247
>                 URL: https://issues.apache.org/jira/browse/YARN-4247
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, resourcemanager
>            Reporter: Anubhav Dhoot
>            Assignee: Anubhav Dhoot
>            Priority: Blocker
>         Attachments: YARN-4247.001.patch, YARN-4247.001.patch
>
>
> We see this deadlock in our testing where events do not get processed and we 
> see this in the logs before the RM dies of OOM
> {noformat}
> 2015-10-08 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000
> 2015-10-08 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
