[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148049#comment-14148049
 ] 

Karthik Kambatla commented on YARN-2594:
----------------------------------------

Thanks for working on this, Wangda. 

As I see it, we could adopt the approach in the current patch. If we do, we 
should avoid taking the readLock in the other get methods that access 
{{RMAppImpl#currentAttempt}}; {{RMAppAttemptImpl}} should handle the 
thread-safety of its own fields.
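
For illustration only, here is a minimal sketch of that idea. It is not the 
attached patch: {{App}} and {{Attempt}} are hypothetical stand-ins for 
{{RMAppImpl}} and {{RMAppAttemptImpl}}, and the {{progress}} field is an 
assumed example. The app-level getter reads the attempt reference without 
an app-level lock and relies on the attempt to guard its own state.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AttemptLockingSketch {

  /** Hypothetical stand-in for RMAppAttemptImpl: it guards its own fields. */
  public static class Attempt {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private float progress; // assumed example field

    public float getProgress() {
      lock.readLock().lock();
      try {
        return progress;
      } finally {
        lock.readLock().unlock();
      }
    }

    public void setProgress(float value) {
      lock.writeLock().lock();
      try {
        progress = value;
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  /** Hypothetical stand-in for RMAppImpl. */
  public static class App {
    // volatile so getters see the latest reference without an app-level lock
    private volatile Attempt currentAttempt;

    public Attempt startNewAttempt() {
      Attempt attempt = new Attempt();
      currentAttempt = attempt;
      return attempt;
    }

    // No app-level readLock here: read the reference once, then delegate.
    public float getProgress() {
      Attempt attempt = currentAttempt;
      return attempt == null ? 0.0f : attempt.getProgress();
    }
  }

  public static void main(String[] args) {
    App app = new App();
    System.out.println("before any attempt: " + app.getProgress());
    app.startNewAttempt().setProgress(0.5f);
    System.out.println("after update: " + app.getProgress());
  }
}
```

With this shape, a thread calling the app-level getter never holds the app 
lock while it waits on the attempt lock, which removes one possible 
lock-ordering cycle.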

Either in addition to or instead of the current approach, we really need to 
clean up {{SchedulerApplicationAttempt}}. Most of the methods there are 
synchronized, and many of them just call synchronized methods in 
{{AppSchedulingInfo}}. Needless to say, this is more involved and we need to 
be very careful. 
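
As a rough sketch of the pattern described here (hypothetical classes, not 
the real {{SchedulerApplicationAttempt}} or {{AppSchedulingInfo}} code): 
{{AttemptBefore}} holds its own monitor while acquiring the inner one, 
whereas {{AttemptAfter}} simply delegates to the already thread-safe inner 
object. This assumes the outer method touches no state of its own, which 
would have to be checked method by method.

```java
public class NestedSyncSketch {

  /** Hypothetical stand-in for AppSchedulingInfo: already thread-safe. */
  public static class SchedulingInfo {
    private int pending;

    public synchronized void addPending(int n) {
      pending += n;
    }

    public synchronized int getPending() {
      return pending;
    }
  }

  /** Current shape: outer monitor is held while the inner one is acquired. */
  public static class AttemptBefore {
    private final SchedulingInfo info = new SchedulingInfo();

    public synchronized void addPending(int n) {
      info.addPending(n); // nested lock: this -> info
    }

    public synchronized int getPending() {
      return info.getPending(); // nested lock: this -> info
    }
  }

  /** Possible cleanup: pure delegation needs no outer monitor. */
  public static class AttemptAfter {
    private final SchedulingInfo info = new SchedulingInfo();

    public void addPending(int n) {
      info.addPending(n); // only info's monitor is taken
    }

    public int getPending() {
      return info.getPending();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    AttemptAfter attempt = new AttemptAfter();
    Runnable work = () -> {
      for (int i = 0; i < 1000; i++) {
        attempt.addPending(1);
      }
    };
    Thread a = new Thread(work);
    Thread b = new Thread(work);
    a.start();
    b.start();
    a.join();
    b.join();
    System.out.println("pending = " + attempt.getPending());
  }
}
```

Holding fewer monitors at once leaves fewer lock-order combinations that 
can form a cycle with other components.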

I am open to adopting the first approach in this JIRA and filing follow-up 
JIRAs to address the second approach. 

PS: We really need to set up jcarder or something to identify most of these 
deadlock paths. 

> ResourceManager sometimes becomes unresponsive
> ----------------------------------------------
>
>                 Key: YARN-2594
>                 URL: https://issues.apache.org/jira/browse/YARN-2594
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Karam Singh
>            Assignee: Wangda Tan
>            Priority: Blocker
>         Attachments: YARN-2594.patch
>
>
> ResourceManager sometimes becomes unresponsive:
> There was no exception in the ResourceManager log; it contains only the 
> following type of messages:
> {code}
> 2014-09-19 19:13:45,241 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
> 2014-09-19 19:30:26,312 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
> 2014-09-19 19:47:07,351 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
> 2014-09-19 20:03:48,460 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
> 2014-09-19 20:20:29,542 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
> 2014-09-19 20:37:10,635 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
> 2014-09-19 20:53:51,722 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
