[ 
https://issues.apache.org/jira/browse/YARN-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-2433.
---------------------------

       Resolution: Fixed
    Fix Version/s: 2.5.0
         Assignee: Jian He  (was: Wangda Tan)

> Stale token used by restarted AM (with previous containers retained) to request new container
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-2433
>                 URL: https://issues.apache.org/jira/browse/YARN-2433
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.4.0, 2.4.1
>            Reporter: Yingda Chen
>            Assignee: Jian He
>             Fix For: 2.5.0
>
>
> With Hadoop 2.4, container retention is supported across AM crash-and-restart.
> However, after an AM is restarted with its containers retained, it appears to
> use the stale NMToken from the previous attempt to start new containers. This
> leads to the error below. To truly support container retention, the AM should
> be able to communicate with previous container(s) using the old token and
> request new containers using the new token.
> This may be similar to YARN-1321, which was reported and fixed earlier.
> ERROR: 
> Unauthorized request to start container. \nNMToken for application attempt : 
> appattempt_1408130608672_0065_000001 was used for starting container with 
> container token issued for application attempt : 
> appattempt_1408130608672_0065_000002
> Stack trace:
> {code}
> hadoop.ipc.ProtobufRpcEngine$Invoker.invoke 
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 | 103: 
> Response <- YINGDAC1.redmond.corp.microsoft.com/10.121.136.231:45454: 
> startContainers {services_meta_data { key: "mapreduce_shuffle" value: 
> "\000\0004\372" } failed_requests { container_id { app_attempt_id { 
> application_id { id: 65 cluster_timestamp: 1408130608672 } attemptId: 2 } id: 
> 2 } exception { message: "Unauthorized request to start container. \nNMToken 
> for application attempt : appattempt_1408130608672_0065_000001 was used for 
> starting container with container token issued for application attempt : 
> appattempt_1408130608672_0065_000002" trace: 
> "org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to 
> start container. \nNMToken for application attempt : 
> appattempt_1408130608672_0065_000001 was used for starting container with 
> container token issued for application attempt : 
> appattempt_1408130608672_0065_000002\r\n\tat
> org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:48)\r\n\tat
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeStartRequest(ContainerManagerImpl.java:508)\r\n\tat
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:571)\r\n\tat
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:538)\r\n\tat
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)\r\n\tat
> org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)\r\n\tat
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)\r\n\tat
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)\r\n\tat
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)\r\n\tat
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)\r\n\tat
> java.security.AccessController.doPrivileged(Native Method)\r\n\tat
> javax.security.auth.Subject.doAs(Subject.java:415)\r\n\tat
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)\r\n\tat
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)\r\n" class_name:
> "org.apache.hadoop.yarn.exceptions.YarnException" } }}
> {code}
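The error in this report comes from a consistency check on the NodeManager: as the message states, a start request is rejected when the NMToken and the container token were issued for different application attempts. Below is a minimal, hypothetical Java model of that rule, assuming only what the error message implies. The class `StaleNmTokenSketch`, the `AttemptId` record and `authorizeStart` are illustrative names, not YARN APIs, and the real logic in `ContainerManagerImpl.authorizeStartRequest` is not reproduced here. It shows the failing case from the log (attempt-1 NMToken, attempt-2 container token) next to the case the report expects to succeed.

```java
import java.util.Optional;

/**
 * Hypothetical, simplified model of the NodeManager check behind this error.
 * These are NOT YARN classes; they only mirror the rule implied by the error
 * message: the NMToken and the container token must name the same attempt.
 */
public class StaleNmTokenSketch {

    /** Illustrative stand-in for an application attempt id. */
    record AttemptId(long clusterTimestamp, int appId, int attempt) {
        @Override
        public String toString() {
            // Same textual shape as the ids in the log, e.g.
            // appattempt_1408130608672_0065_000001
            return String.format("appattempt_%d_%04d_%06d", clusterTimestamp, appId, attempt);
        }
    }

    /**
     * Returns the rejection message if the tokens belong to different
     * attempts, or an empty Optional if the start request is authorized.
     */
    static Optional<String> authorizeStart(AttemptId nmTokenAttempt,
                                           AttemptId containerTokenAttempt) {
        if (nmTokenAttempt.equals(containerTokenAttempt)) {
            return Optional.empty();
        }
        return Optional.of("Unauthorized request to start container. \nNMToken for application attempt : "
                + nmTokenAttempt
                + " was used for starting container with container token issued for application attempt : "
                + containerTokenAttempt);
    }

    public static void main(String[] args) {
        AttemptId first = new AttemptId(1408130608672L, 65, 1);
        AttemptId second = new AttemptId(1408130608672L, 65, 2);

        // Failure mode from the report: the restarted AM (attempt 2) presents the
        // attempt-1 NMToken together with an attempt-2 container token.
        System.out.println(authorizeStart(first, second).orElse("authorized"));

        // Case the report expects to work: the NMToken used to start a new
        // container was issued for the same attempt as its container token.
        System.out.println(authorizeStart(second, second).orElse("authorized")); // prints "authorized"
    }
}
```

Because this model compares whole attempt ids rather than only application ids, any NMToken carried over from a previous attempt is treated as stale for new start requests, which matches the behaviour described in the report.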



--
This message was sent by Atlassian JIRA
(v6.2#6252)
