[jira] [Commented] (YARN-3112) AM restart and keep containers from previous attempts, then new container launch failed

Jason Lowe (JIRA) Thu, 29 Jan 2015 12:57:57 -0800

    [ https://issues.apache.org/jira/browse/YARN-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297651#comment-14297651 ]


Jason Lowe commented on YARN-3112:
----------------------------------

The MapReduce AM supports recovery of completed tasks upon restart, but it does 
not support reacquiring active containers.  Doing so would require the tasks to 
determine the new address of the subsequent AM attempt so the umbilical 
connection can be re-established.
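The constraint in the comment can be made concrete with a small hypothetical model in Java. This is not MapReduce code. It assumes that a task learns its AM's address only once, when its container is launched. `AmRegistry` and its methods are invented for illustration and do not correspond to any YARN or MapReduce API. After a restart, a task that holds its launch-time address still points at the first attempt. A task that re-resolves the address on each use would find the new attempt, and that is the kind of discovery step the comment says reacquiring active containers would need.

```java
import java.util.function.Supplier;

// Hypothetical model of the task-to-AM umbilical; not Hadoop/MapReduce code.
public class UmbilicalSketch {

    // A running task and the rule it uses to find its AM.
    static final class Task {
        private final Supplier<String> amLocator;

        Task(Supplier<String> amLocator) { this.amLocator = amLocator; }

        // Address the task would try when (re)connecting its umbilical.
        String connectTarget() { return amLocator.get(); }
    }

    // Invented discovery service that always knows the live AM attempt's address.
    static final class AmRegistry {
        private volatile String current;

        AmRegistry(String initial) { current = initial; }

        void amRestarted(String newAddress) { current = newAddress; }

        String lookup() { return current; }
    }

    public static void main(String[] args) {
        AmRegistry registry = new AmRegistry("am-host-a:40001"); // hypothetical address
        String launchTimeAddress = registry.lookup();

        Task staticTask = new Task(() -> launchTimeAddress); // fixed at container launch
        Task discoveringTask = new Task(registry::lookup);   // re-resolved on every use

        registry.amRestarted("am-host-b:40002"); // second AM attempt comes up elsewhere

        System.out.println(staticTask.connectTarget());      // stale: first attempt's address
        System.out.println(discoveringTask.connectTarget()); // current: second attempt's address
    }
}
```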

> AM restart and keep containers from previous attempts, then new container launch failed
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3112
>                 URL: https://issues.apache.org/jira/browse/YARN-3112
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications, resourcemanager
>    Affects Versions: 2.6.0
>         Environment: real Linux cluster
>            Reporter: Jack Chen
>
> This error is very similar to YARN-1795 and YARN-1839, but I have checked the
> solutions in those JIRAs, and the patches are already included in my version. I
> think this error is caused by the NMTokens differing between the old and new
> application attempts. The new AM has inherited the old tokens from the previous
> AM according to my configuration (keepContainers=true), so the tokens for new
> containers are replaced by the old ones in the NMTokenCache.
>
> 2015-01-29 10:04:49,603 ERROR [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1422546145900_0001_02_000002 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for ixk02:47625
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:256)
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:246)
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:132)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:401)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:367)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:722)
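The failing frame in the trace is `ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy`. As the message suggests, assume that the proxy looks up a cached NMToken keyed by the NodeManager address and raises `InvalidToken` when no entry exists. Under that assumption, the check can be sketched with a plain map. `NmTokenLookupSketch` and its exception type are hypothetical stand-ins, not Hadoop's `NMTokenCache` or `SecretManager.InvalidToken`. In this model, the message means only that no entry exists for the target address. It does not by itself distinguish a token that was never delivered to the new attempt from one that was dropped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a per-AM NMToken lookup; not Hadoop code.
public class NmTokenLookupSketch {

    // Stand-in for SecretManager.InvalidToken in this sketch.
    static final class InvalidTokenException extends Exception {
        InvalidTokenException(String message) { super(message); }
    }

    // NodeManager address ("host:port") -> opaque token value.
    private final Map<String, String> tokensByNode = new HashMap<>();

    void put(String nodeAddr, String token) { tokensByNode.put(nodeAddr, token); }

    // Look up the token for the target NM and fail if none is cached.
    String tokenFor(String nodeAddr) throws InvalidTokenException {
        String token = tokensByNode.get(nodeAddr);
        if (token == null) {
            throw new InvalidTokenException("No NMToken sent for " + nodeAddr);
        }
        return token;
    }

    public static void main(String[] args) {
        NmTokenLookupSketch cache = new NmTokenLookupSketch();
        cache.put("nodeA:45454", "token-from-attempt-1"); // hypothetical entry
        try {
            cache.tokenFor("ixk02:47625"); // no entry cached for this NodeManager
        } catch (InvalidTokenException e) {
            System.out.println(e.getMessage());
        }
    }
}
```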



