[ https://issues.apache.org/jira/browse/YARN-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yingda Chen updated YARN-2433:
------------------------------

    Description: 
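With Hadoop 2.4, containers can be retained across an AM crash and restart.
However, after an AM restarts with its containers retained, it appears to use a
stale token to start a new container: the NMToken issued to the previous
attempt (appattempt_1408130608672_0065_000001) is sent together with a
container token issued to the new attempt (appattempt_1408130608672_0065_000002),
and the NodeManager rejects the request with the error below. To truly support
container retention, the AM should be able to communicate with its previous
container(s) using the old token and request new containers using the new
token.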
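
A minimal sketch of the behavior proposed above, assuming the AM keeps its
NMTokens in a map keyed by node *and* application attempt rather than by node
alone. SimpleNMToken and AttemptAwareTokenStore are hypothetical illustration
classes, not Hadoop's NMToken or NMTokenCache APIs. If tokens were keyed by node
only, whichever token was stored last (possibly the stale one from the previous
attempt) would be used for every container on that node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an NMToken: records which NodeManager it is valid
// for and which application attempt it was issued to. Not the Hadoop class.
final class SimpleNMToken {
    final String nodeAddress;
    final String appAttemptId;

    SimpleNMToken(String nodeAddress, String appAttemptId) {
        this.nodeAddress = nodeAddress;
        this.appAttemptId = appAttemptId;
    }
}

// Hypothetical AM-side store keyed by (node, attempt) rather than by node alone,
// so a token from the previous attempt never shadows the new attempt's token.
final class AttemptAwareTokenStore {
    private final Map<String, SimpleNMToken> tokens = new HashMap<>();

    private static String key(String nodeAddress, String appAttemptId) {
        return nodeAddress + "|" + appAttemptId;
    }

    // Store a token, e.g. one carried over from the previous attempt or one
    // newly issued to the current attempt.
    void put(SimpleNMToken token) {
        tokens.put(key(token.nodeAddress, token.appAttemptId), token);
    }

    // Pick the token issued to the same attempt as the target container
    // (the attempt embedded in its container ID); null if none is known.
    SimpleNMToken tokenFor(String nodeAddress, String containerAppAttemptId) {
        return tokens.get(key(nodeAddress, containerAppAttemptId));
    }
}
```

With this lookup, a new container whose ID carries attempt 000002 is always
paired with the NMToken issued to attempt 000002, while retained containers from
attempt 000001 can still be reached with the old token.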

This may be similar to YARN-1321, which was reported and fixed earlier.

ERROR: 
Unauthorized request to start container. \nNMToken for application attempt : 
appattempt_1408130608672_0065_000001 was used for starting container with 
container token issued for application attempt : 
appattempt_1408130608672_0065_000002
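
The message suggests that the NodeManager rejects the start request because the
application attempt inside the NMToken differs from the one inside the container
token, even though both belong to the same application (1408130608672_0065). The
following Java sketch is a simplified, hypothetical model of that check, assuming
the two attempt IDs must be equal; it is not the actual
ContainerManagerImpl.authorizeStartRequest code, and StartAuthorizationModel and
its methods are illustrative names.

```java
// Hypothetical, simplified model of the NodeManager-side start check implied
// by the error message; not the actual ContainerManagerImpl code.
final class StartAuthorizationModel {
    // Splits "appattempt_<clusterTimestamp>_<appId>_<attempt>" into its parts.
    private static String[] parse(String appAttemptId) {
        String[] parts = appAttemptId.split("_");
        if (parts.length != 4 || !parts[0].equals("appattempt")) {
            throw new IllegalArgumentException("Not an attempt ID: " + appAttemptId);
        }
        return parts;
    }

    // Application part, e.g. "1408130608672_0065".
    static String applicationPart(String appAttemptId) {
        String[] parts = parse(appAttemptId);
        return parts[1] + "_" + parts[2];
    }

    // Attempt number, e.g. 1 for "..._000001".
    static int attemptNumber(String appAttemptId) {
        return Integer.parseInt(parse(appAttemptId)[3]);
    }

    // Returns null when the request is authorized; otherwise an error message
    // shaped like the one in this report. Java's "\n" is a real newline, which
    // the report shows in escaped form.
    static String authorizeStart(String nmTokenAttemptId, String containerTokenAttemptId) {
        if (nmTokenAttemptId.equals(containerTokenAttemptId)) {
            return null;
        }
        return "Unauthorized request to start container. \nNMToken for application attempt : "
                + nmTokenAttemptId
                + " was used for starting container with container token issued for application attempt : "
                + containerTokenAttemptId;
    }
}
```

With the IDs from this report, applicationPart returns 1408130608672_0065 for
both attempts while attemptNumber returns 1 and 2, which is the AM-restart
situation this issue describes.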

Stack trace:
{code}
hadoop.ipc.ProtobufRpcEngine$Invoker.invoke
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 | 103:
Response <- YINGDAC1.redmond.corp.microsoft.com/10.121.136.231:45454:
startContainers {services_meta_data { key: "mapreduce_shuffle" value:
"\000\0004\372" } failed_requests { container_id { app_attempt_id {
application_id { id: 65 cluster_timestamp: 1408130608672 } attemptId: 2 } id: 2
} exception { message: "Unauthorized request to start container. \nNMToken for
application attempt : appattempt_1408130608672_0065_000001 was used for
starting container with container token issued for application attempt :
appattempt_1408130608672_0065_000002" trace:
"org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start
container. \nNMToken for application attempt :
appattempt_1408130608672_0065_000001 was used for starting container with
container token issued for application attempt :
appattempt_1408130608672_0065_000002\r\n\tat
org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:48)\r\n\tat
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeStartRequest(ContainerManagerImpl.java:508)\r\n\tat
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:571)\r\n\tat
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:538)\r\n\tat
org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)\r\n\tat
org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)\r\n\tat
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)\r\n\tat
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)\r\n\tat
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)\r\n\tat
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)\r\n\tat
java.security.AccessController.doPrivileged(Native Method)\r\n\tat
javax.security.auth.Subject.doAs(Subject.java:415)\r\n\tat
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)\r\n\tat
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)\r\n" class_name:
"org.apache.hadoop.yarn.exceptions.YarnException" } }}
{code}






> Stale token used by restarted AM (with previous containers retained) to 
> request new container
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-2433
>                 URL: https://issues.apache.org/jira/browse/YARN-2433
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0, 2.4.1
>            Reporter: Yingda Chen
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)
