[ https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wangda Tan updated YARN-2594:
-----------------------------
Attachment: YARN-2594.patch
[~zxu],
bq. It will be good to use a local variable to save currentAttempt to avoid any
potential null pointer exception in the future.
Good catch! Addressed.
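For readers following the thread, the local-variable suggestion can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the attached patch: the class {{AppSketch}}, the nested {{Attempt}} type, and the method names are hypothetical, and it assumes currentAttempt is a volatile field that is read without holding the readLock.

```java
public class AppSketch {
    /** Hypothetical stand-in for an application attempt. */
    static final class Attempt {
        private final String usage;

        Attempt(String usage) {
            this.usage = usage;
        }

        String getUsageReport() {
            return usage;
        }
    }

    /** Hypothetical placeholder returned when there is no attempt yet. */
    static final String EMPTY_REPORT = "<empty>";

    // Assumption for this sketch: this field is NOT protected by the readLock.
    // It is volatile so that readers observe the most recent write.
    private volatile Attempt currentAttempt;

    void setCurrentAttempt(Attempt attempt) {
        this.currentAttempt = attempt;
    }

    String getUsageReport() {
        // Read the field exactly once into a local variable. Checking the
        // field for null and then dereferencing the field again could hit a
        // NullPointerException if another thread resets it in between.
        Attempt attempt = this.currentAttempt;
        if (attempt != null) {
            return attempt.getUsageReport();
        }
        return EMPTY_REPORT;
    }
}
```

Because the null check and the call both use the same local reference, a concurrent write to the field cannot change what this method dereferences.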
[~kasha],
bq. We need to handle getFinalApplicationStatus, and maybe
createAndGetApplicationReport as well. In the latter, we can replace direct
access of diagnostics with getDiagnostics to avoid races on diagnostics.
{{getFinalApplicationStatus}} accesses statemachine.getCurrentState(), and
{{createAndGetApplicationReport}} accesses
statemachine.getCurrentState() as well as other fields.
To keep the scope limited to the problem we can see now, I would suggest
leaving the other fields as-is.
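For reference, the accessor-based alternative mentioned in the quoted comment could look roughly like the sketch below. It is not part of the patch and is not the actual RM code: {{DiagnosticsSketch}}, {{appendDiagnostics}} and {{createReport}} are hypothetical names, and the sketch assumes diagnostics is a StringBuilder guarded by a ReentrantReadWriteLock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DiagnosticsSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Assumption for this sketch: diagnostics is guarded by the lock above.
    private final StringBuilder diagnostics = new StringBuilder();

    void appendDiagnostics(String message) {
        lock.writeLock().lock();
        try {
            diagnostics.append(message);
        } finally {
            lock.writeLock().unlock();
        }
    }

    String getDiagnostics() {
        lock.readLock().lock();
        try {
            // Copy under the read lock so callers never see a partial append.
            return diagnostics.toString();
        } finally {
            lock.readLock().unlock();
        }
    }

    String createReport() {
        lock.readLock().lock();
        try {
            // Going through the accessor instead of touching the field
            // directly; the read lock is reentrant, so nesting is safe here.
            return "diagnostics=" + getDiagnostics();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The point of the sketch is only that every read of diagnostics goes through one locked accessor; whether that is worth doing in this patch is the scope question discussed above.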
bq. Also, it would be nice to add a comment next to the declaration of
currentAttempt to say it is not protected by the readLock.
Addressed.
New patch attached.
> Potential deadlock in RM when querying ApplicationResourceUsageReport
> ---------------------------------------------------------------------
>
> Key: YARN-2594
> URL: https://issues.apache.org/jira/browse/YARN-2594
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Karam Singh
> Assignee: Wangda Tan
> Priority: Blocker
> Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch
>
>
> ResourceManager sometimes becomes unresponsive.
> There was no exception in the ResourceManager log; it contains only the
> following type of messages:
> {code}
> 2014-09-19 19:13:45,241 INFO event.AsyncDispatcher
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
> 2014-09-19 19:30:26,312 INFO event.AsyncDispatcher
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
> 2014-09-19 19:47:07,351 INFO event.AsyncDispatcher
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
> 2014-09-19 20:03:48,460 INFO event.AsyncDispatcher
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
> 2014-09-19 20:20:29,542 INFO event.AsyncDispatcher
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
> 2014-09-19 20:37:10,635 INFO event.AsyncDispatcher
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
> 2014-09-19 20:53:51,722 INFO event.AsyncDispatcher
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
> {code}