Jack Chen created YARN-3112:
-------------------------------

             Summary: AM restart and keep containers from previous attempts, then new container launch failed
                 Key: YARN-3112
                 URL: https://issues.apache.org/jira/browse/YARN-3112
             Project: Hadoop YARN
          Issue Type: Bug
          Components: applications, resourcemanager
    Affects Versions: 2.6.0
         Environment: in real linux cluster
            Reporter: Jack Chen

This error is very similar to YARN-713 and YARN-1839, but I have checked the fixes for those issues, and the patches are already included in my version. I think the error is caused by a mismatch between the NMTokens of the old and new application attempts. With keepContainers=true in my configuration, the new AM inherits the tokens of the previous AM, so the tokens for new containers are replaced by the old ones in the NMTokenCache.

2015-01-29 10:04:49,603 ERROR [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1422546145900_0001_02_000002 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for ixk02:47625
    at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:256)
    at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:246)
    at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:132)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:401)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:367)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
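
To make the suspected mechanism easier to discuss, here is a minimal toy model of a per-node token cache. It is only a sketch under stated assumptions, not the real NMTokenCache code: the class TokenCacheSketch, the ToyToken type, the attempt numbers, and the host ixk01 are all hypothetical names introduced for illustration. It assumes the cache keeps one token per NodeManager address and that a launch fails when no token is cached for the target node.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: this is NOT org.apache.hadoop.yarn.client.api.NMTokenCache.
class TokenCacheSketch {

    // Hypothetical token that remembers which application attempt it was issued for.
    static final class ToyToken {
        final String nodeAddr;
        final int attemptId;

        ToyToken(String nodeAddr, int attemptId) {
            this.nodeAddr = nodeAddr;
            this.attemptId = attemptId;
        }
    }

    // Assumption: one token per NodeManager address ("host:port").
    private final Map<String, ToyToken> tokens = new ConcurrentHashMap<>();

    // Storing a token for an address replaces whatever was cached for it before.
    void put(ToyToken token) {
        tokens.put(token.nodeAddr, token);
    }

    // Mirrors the failure mode in the log: no cached token means the launch cannot proceed.
    ToyToken getOrFail(String nodeAddr) {
        ToyToken token = tokens.get(nodeAddr);
        if (token == null) {
            throw new IllegalStateException("No NMToken sent for " + nodeAddr);
        }
        return token;
    }

    public static void main(String[] args) {
        TokenCacheSketch cache = new TokenCacheSketch();

        // Token inherited from attempt 1 for a node that still runs a kept container
        // (ixk01 is a made-up host name).
        cache.put(new ToyToken("ixk01:47625", 1));

        // Attempt 2 gets a new container on ixk02, but no token for that node is cached.
        try {
            cache.getOrFail("ixk02:47625");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "No NMToken sent for ixk02:47625"
        }

        // A later put for the same address overwrites the earlier token.
        cache.put(new ToyToken("ixk01:47625", 2));
        System.out.println(cache.getOrFail("ixk01:47625").attemptId); // prints 2
    }
}
```

In this model whichever token is stored last for an address wins, and a node with no cached token fails with the same message as in the log above. Whether the real cache ends up in either state after an AM restart with keepContainers=true is exactly what needs to be confirmed.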