[ 
https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492963#comment-14492963
 ] 

Xuan Gong commented on YARN-3127:
---------------------------------

[~Naganarasimha] Sorry for the late reply.
So the solution here is to avoid sending events to the System metrics 
publisher during RM application recovery from the state store. That looks 
fine for solving the current issue.

But here is a case that I think might not work:
* we start the RM and ATS correctly
* an RM failover/restart happens during the transition from FINAL_SAVING to 
FINISHED
* with the original code, when we recover the applications, we send an 
appFinished event to the System metrics publisher to update the app status 
in ATS
* with the patch, we will not do that. In this case, will ATS never get the 
app status update (changing the app status from started to finished)? This 
looks like a regression introduced by the patch.
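
To make the scenario above concrete, here is a minimal Java sketch. It is not 
the actual RM code: RecoverySketch, FakeTimelineStore, recover() and the 
publishOnRecovery flag are all hypothetical names, and the model assumes (as 
described above) that appFinished would normally be published only once the 
app reaches FINISHED.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the scenario; none of these names exist in YARN.
class RecoverySketch {

    /** Stand-in for ATS: remembers the last status published per application. */
    static class FakeTimelineStore {
        final Map<String, String> status = new HashMap<>();

        void appStarted(String appId) {
            status.put(appId, "STARTED");
        }

        void appFinished(String appId) {
            status.put(appId, "FINISHED");
        }
    }

    /**
     * Models recovery of one application from the state store.
     * publishOnRecovery == true  models the original behaviour,
     * publishOnRecovery == false models the behaviour with the patch.
     */
    static void recover(String appId, boolean finalStateStored,
                        boolean publishOnRecovery, FakeTimelineStore ats) {
        if (finalStateStored && publishOnRecovery) {
            ats.appFinished(appId);
        }
    }

    /** Runs the failover scenario and returns the status ATS ends up with. */
    static String statusAfterFailover(boolean publishOnRecovery) {
        FakeTimelineStore ats = new FakeTimelineStore();
        String appId = "application_0001"; // hypothetical id

        // RM and ATS are both up: the start event reaches ATS.
        ats.appStarted(appId);

        // The final state is saved (FINAL_SAVING), but the RM goes down
        // before the transition to FINISHED, so appFinished is never sent.
        boolean finalStateStored = true;

        // The newly active RM recovers the application from the state store.
        recover(appId, finalStateStored, publishOnRecovery, ats);
        return ats.status.get(appId);
    }

    public static void main(String[] args) {
        System.out.println("original: " + statusAfterFailover(true));
        System.out.println("patched:  " + statusAfterFailover(false));
    }
}
```

In this model the patched path leaves the application in its started state 
in ATS, which is the gap I am asking about.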

Did I miss anything?

> Apphistory url crashes when RM switches with ATS enabled
> --------------------------------------------------------
>
>                 Key: YARN-3127
>                 URL: https://issues.apache.org/jira/browse/YARN-3127
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, timelineserver
>    Affects Versions: 2.6.0
>         Environment: RM HA with ATS
>            Reporter: Bibin A Chundatt
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3127.20150213-1.patch, YARN-3127.20150329-1.patch
>
>
> 1. Start RM with HA and ATS configured and run some YARN applications
> 2. Once the applications have finished successfully, start the timeline server
> 3. Now fail over HA from active to standby
> 4. Access the timeline server URL <IP>:<PORT>/applicationhistory
> Result: the application history URL fails with the info below
> {quote}
> 2015-02-03 20:28:09,511 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the applications.
> java.lang.reflect.UndeclaredThrowableException
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
>       at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:80)
>       at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
>       at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
>       at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>       at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>       ...
> Caused by: org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: The entity for application attempt appattempt_1422972608379_0001_000001 doesn't exist in the timeline store
>       at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplicationAttempt(ApplicationHistoryManagerOnTimelineStore.java:151)
>       at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.generateApplicationReport(ApplicationHistoryManagerOnTimelineStore.java:499)
>       at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAllApplications(ApplicationHistoryManagerOnTimelineStore.java:108)
>       at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:84)
>       at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:81)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       ... 51 more
> 2015-02-03 20:28:09,512 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory
> org.apache.hadoop.yarn.webapp.WebAppException: Error rendering block: nestLevel=6 expected 5
>       at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>       at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
> {quote}
> Behaviour with AHS with file-based history store:
>       - Apphistory URL is working
>       - No attempt entries are shown for each application.
>
> Based on initial analysis, when the RM switches, application attempts from 
> the state store are not replayed; only applications are.
> So when the /applicationhistory URL is accessed, it tries to look up every 
> attempt ID and fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
