[ 
https://issues.apache.org/jira/browse/YARN-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202667#comment-14202667
 ] 

Hadoop QA commented on YARN-2816:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12680250/YARN-2816.001.patch
  against trunk revision a71e930.

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

      {color:red}-1 javac{color}.  The applied patch generated 1396 javac 
compiler warnings (more than the trunk's current 1220 warnings).

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5790//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5790//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5790//console

This message is automatically generated.

> NM fail to start with NPE during container recovery
> ---------------------------------------------------
>
>                 Key: YARN-2816
>                 URL: https://issues.apache.org/jira/browse/YARN-2816
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-2816.000.patch, YARN-2816.001.patch, 
> leveldb_records.txt
>
>
> NM fail to start with NPE during container recovery.
> We saw the following crash happen:
> 2014-10-30 22:22:37,211 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  failed in state INITED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
>       at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
>       at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
> The reason is some DB files used in NMLeveldbStateStoreService are 
> accidentally deleted to save disk space at 
> /tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state. This leaves some incomplete 
> container record which don't have CONTAINER_REQUEST_KEY_SUFFIX(startRequest) 
> entry in the DB. When container is recovered at 
> ContainerManagerImpl#recoverContainer, 
> The NullPointerException at the following code cause NM shutdown.
> {code}
>     StartContainerRequest req = rcs.getStartRequest();
>     ContainerLaunchContext launchContext = req.getContainerLaunchContext();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to