[
https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057190#comment-15057190
]
Sangjin Lee commented on YARN-4247:
-----------------------------------
FYI, anyone who ports YARN-2005 without YARN-3361 will run into this issue
fairly easily. If we ever decide to backport YARN-2005 to 2.6.x or 2.7.x,
YARN-3361 needs to be backported as well, or this issue should be fixed the way
this patch suggests.
A couple of things in the patch are not quite correct:
- the call to {{hasMasterContainer()}} in {{ScheduledApplicationAttempt}} is
inverted: it should be {{!hasMasterContainer()}}
- {{masterContainer}} should be {{volatile}} so that a write to it is visible
to other threads that read it
Adding these comments for posterity.
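For readers of this thread, the two corrections can be illustrated with a small hypothetical sketch. The class {{MasterContainerHolder}}, its methods, and the use of a {{String}} container id are assumptions made for illustration; this is not the actual Hadoop code or the patch itself. It assumes the guard is meant to record the master container only when none has been recorded yet, and shows why the field is {{volatile}}: it is written under a lock but read without one, so a caller that already holds some other lock never needs this object's lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch only: not Hadoop's SchedulerApplicationAttempt.
public class MasterContainerHolder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // volatile: a write made while holding the write lock becomes visible
    // to threads that read the field without acquiring any lock.
    private volatile String masterContainer;

    // Lock-free read: a caller already holding another lock (e.g. an
    // attempt-level lock) does not acquire this lock, so no lock-ordering
    // cycle between the two objects can form on this path.
    public String getMasterContainer() {
        return masterContainer;
    }

    public boolean hasMasterContainer() {
        return masterContainer != null;
    }

    // Assumed semantics: record the container only if none is set yet,
    // hence the negated check.
    public void setMasterContainerIfAbsent(String containerId) {
        lock.writeLock().lock();
        try {
            if (!hasMasterContainer()) {
                masterContainer = containerId;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With the non-negated check, the field would never be set on the first call, and every reader would keep seeing {{null}}.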
> Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing
> events
> ---------------------------------------------------------------------------------
>
> Key: YARN-4247
> URL: https://issues.apache.org/jira/browse/YARN-4247
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, resourcemanager
> Reporter: Anubhav Dhoot
> Assignee: Anubhav Dhoot
> Priority: Blocker
> Attachments: YARN-4247.001.patch, YARN-4247.001.patch
>
>
> We see this deadlock in our testing: events stop being processed, and the
> following appears in the logs before the RM dies of OOM:
> {noformat}
> 2015-10-08 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000
> 2015-10-08 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000
> {noformat}