[ https://issues.apache.org/jira/browse/YARN-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jian He resolved YARN-2433.
---------------------------

       Resolution: Fixed
    Fix Version/s: 2.5.0
         Assignee: Jian He  (was: Wangda Tan)

> Stale token used by restarted AM (with previous containers retained) to
> request new container
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-2433
>                 URL: https://issues.apache.org/jira/browse/YARN-2433
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.4.0, 2.4.1
>            Reporter: Yingda Chen
>            Assignee: Jian He
>             Fix For: 2.5.0
>
>
> With Hadoop 2.4, container retention is supported across AM
> crash-and-restart. However, after an AM is restarted with containers
> retained, it appears to be using the stale token to start new container. This
> leads to the error below. To truly support container retention, AM should be
> able to communicate with previous container(s) with the old token and ask for
> new container with new token.
> This could be similar to YARN-1321 which was reported and fixed earlier.
> ERROR:
> Unauthorized request to start container. \nNMToken for application attempt :
> appattempt_1408130608672_0065_000001 was used for starting container with
> container token issued for application attempt :
> appattempt_1408130608672_0065_000002
> STACK trace:
> {code}
> hadoop.ipc.ProtobufRpcEngine$Invoker.invoke
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 | 103:
> Response <- YINGDAC1.redmond.corp.microsoft.com/10.121.136.231:45454:
> startContainers {services_meta_data { key: "mapreduce_shuffle" value:
> "\000\0004\372" } failed_requests { container_id { app_attempt_id {
> application_id { id: 65 cluster_timestamp: 1408130608672 } attemptId: 2 } id:
> 2 } exception { message: "Unauthorized request to start container. \nNMToken
> for application attempt : appattempt_1408130608672_0065_000001 was used for
> starting container with container token issued for application attempt :
> appattempt_1408130608672_0065_000002" trace:
> "org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to
> start container. \nNMToken for application attempt :
> appattempt_1408130608672_0065_000001 was used for starting container with
> container token issued for application attempt :
> appattempt_1408130608672_0065_000002\r\n\tat
> org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:48)\r\n\tat
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeStartRequest(ContainerManagerImpl.java:508)\r\n\tat
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:571)\r\n\tat
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:538)\r\n\tat
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)\r\n\tat
> org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)\r\n\tat
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)\r\n\tat
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)\r\n\tat
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)\r\n\tat
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)\r\n\tat
> java.security.AccessController.doPrivileged(Native Method)\r\n\tat
> javax.security.auth.Subject.doAs(Subject.java:415)\r\n\tat
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)\r\n\tat
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)\r\n" class_name:
> "org.apache.hadoop.yarn.exceptions.YarnException" } }}
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)