[ 
https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621281#comment-14621281
 ] 

Eric Payne commented on YARN-3905:
----------------------------------

{{org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable}} 
constructs what it believes should be the AM container ID when creating a new 
{{GetContainerReportRequest}}.
{code}
        // AM container is always the first container of the attempt
        final GetContainerReportRequest request =
            GetContainerReportRequest.newInstance(ContainerId.newContainerId(
              appAttemptReport.getApplicationAttemptId(), 1));
{code}
- After the RM is restarted, container IDs contain an {{e##}} string, which the 
above code doesn't take into consideration
- The AM container is not always _000001 due to the way reservations work. We 
have seen "non-first" AM containers in practice.
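The sketch below is a minimal, self-contained model of how the container ID string changes once an RM epoch is present. It is not Hadoop's {{ContainerId}} implementation; {{ContainerIdSketch}} is a hypothetical class. It assumes that, as with YARN's 64-bit container IDs, the epoch occupies the bits above the low 40 and that a non-zero epoch is rendered as an {{e##_}} prefix.

{code:java}
// Hypothetical, simplified model of a YARN container ID string (not Hadoop's class).
// Assumption: the RM epoch lives in the bits above the low 40 of the 64-bit ID,
// and a non-zero epoch is rendered as an "e##_" segment after "container_".
final class ContainerIdSketch {
    static final int EPOCH_SHIFT = 40;                          // assumed bit split
    static final long SEQUENCE_MASK = (1L << EPOCH_SHIFT) - 1;  // low 40 bits

    static String format(long clusterTimestamp, int appId, int attemptId, long containerId) {
        long epoch = containerId >>> EPOCH_SHIFT;     // 0 for IDs issued before any RM restart
        long sequence = containerId & SEQUENCE_MASK;  // per-attempt container sequence number
        StringBuilder sb = new StringBuilder("container_");
        if (epoch > 0) {
            sb.append(String.format("e%02d_", epoch));
        }
        sb.append(String.format("%d_%04d_%02d_%06d",
                clusterTimestamp, appId, attemptId, sequence));
        return sb.toString();
    }

    public static void main(String[] args) {
        long ts = 1436472584878L;
        long guessed = 1L;                             // "sequence 1" with no epoch bits set
        long afterRestart = (1L << EPOCH_SHIFT) | 1L;  // epoch 1, sequence 1
        System.out.println(format(ts, 1, 1, guessed));
        System.out.println(format(ts, 1, 1, afterRestart));
    }
}
{code}

The second printed ID carries the {{e01_}} segment, so an ID rebuilt from sequence number 1 alone cannot equal it.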

As a result of the above code, the container ID in the 
{{GetContainerReportRequest}} may not match the actual AM container ID for jobs 
run before the RM restart, and will not match the AM container ID of any job 
run after the RM is restarted.

So, when {{ApplicationHistoryManagerImpl}} compares the ID of the passed 
container with its cache from the history store, it can't find a match and 
throws the NPE.
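The failure mode can be sketched with a plain map lookup. This is a hypothetical model, not the actual {{ApplicationHistoryManagerImpl}} code: {{HistoryCacheSketch}} and {{ContainerHistory}} are illustrative names. The only part taken from the stack trace is the shape of the path: look the container up, then convert it.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical record of one container from the history store (illustrative only).
final class ContainerHistory {
    final String containerId;
    final String host;

    ContainerHistory(String containerId, String host) {
        this.containerId = containerId;
        this.host = host;
    }
}

// Hypothetical cache keyed by container ID string; not the real history manager.
final class HistoryCacheSketch {
    private final Map<String, ContainerHistory> containers = new HashMap<>();

    void record(ContainerHistory history) {
        containers.put(history.containerId, history);
    }

    // Look the container up by ID, then convert it without checking the result.
    String getContainerReport(String containerId) {
        ContainerHistory history = containers.get(containerId); // null when the ID doesn't match
        return convertToReport(history);
    }

    private static String convertToReport(ContainerHistory history) {
        return history.containerId + " on " + history.host;     // NPE when history is null
    }

    public static void main(String[] args) {
        HistoryCacheSketch cache = new HistoryCacheSketch();
        // After the RM restart, the recorded AM container ID carries an epoch prefix.
        cache.record(new ContainerHistory("container_e01_1436472584878_0001_01_000001", "node1"));
        System.out.println(cache.getContainerReport("container_e01_1436472584878_0001_01_000001"));
        try {
            // A constructed ID without the prefix misses the cache.
            cache.getContainerReport("container_1436472584878_0001_01_000001");
        } catch (NullPointerException e) {
            System.out.println("NPE: no cached container matches the constructed ID");
        }
    }
}
{code}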

In {{AppBlock#generateApplicationTable}}, instead of constructing the AM's 
container ID, I suggest using {{appAttemptReport#getAMContainerId}}:
{code}
        final GetContainerReportRequest request =
            GetContainerReportRequest.newInstance(
                    appAttemptReport.getAMContainerId());
{code}
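A minimal sketch of why the suggested change helps, using a hypothetical {{AttemptReportSketch}} that models only the {{getAMContainerId}} accessor of the attempt report, with IDs as plain strings. The example assumes a job that never saw an RM restart but whose AM landed in container {{_000002}} (the reservation case above), so the guessed ID is wrong even without an epoch prefix.

{code:java}
// Hypothetical stand-in for the attempt report: only the AM container ID
// accessor is modeled, with IDs kept as plain strings.
final class AttemptReportSketch {
    private final String attemptId;
    private final String amContainerId;

    AttemptReportSketch(String attemptId, String amContainerId) {
        this.attemptId = attemptId;
        this.amContainerId = amContainerId;
    }

    String getApplicationAttemptId() { return attemptId; }

    String getAMContainerId() { return amContainerId; }
}

final class AmLookupSketch {
    // Current approach: assume the AM is container 000001 and rebuild its ID
    // without any epoch prefix.
    static String guessAmContainerId(long clusterTimestamp, int appId, int attemptId) {
        return String.format("container_%d_%04d_%02d_%06d",
                clusterTimestamp, appId, attemptId, 1);
    }

    // Suggested approach: use the AM container ID the attempt report recorded.
    static String reportedAmContainerId(AttemptReportSketch report) {
        return report.getAMContainerId();
    }

    public static void main(String[] args) {
        AttemptReportSketch report = new AttemptReportSketch(
                "appattempt_1436472584878_0002_000001",
                "container_1436472584878_0002_01_000002"); // AM was not container 000001
        String guessed = guessAmContainerId(1436472584878L, 2, 1);
        String reported = reportedAmContainerId(report);
        System.out.println("guessed:  " + guessed);
        System.out.println("reported: " + reported);
        System.out.println("match: " + guessed.equals(reported));
    }
}
{code}

Trusting the ID the RM recorded, rather than re-deriving it, covers both the epoch prefix and the non-first AM container case.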

> Application History Server UI NPEs when accessing apps run after RM restart
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3905
>                 URL: https://issues.apache.org/jira/browse/YARN-3905
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>    Affects Versions: 2.7.0, 2.8.0, 2.7.1
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>
> From the Application History URL (http://RmHostName:8188/applicationhistory), 
> clicking on the application ID of an app that was run after the RM daemon has 
> been restarted results in a 500 error:
> {noformat}
> Sorry, got error 500
> Please consult RFC 2616 for meanings of the error code.
> {noformat}
> The stack trace is as follows:
> {code}
> 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001
> 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_000001.
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205)
>         at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272)
>         at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
>         at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
