[jira] [Commented] (YARN-2433) Stale token used by restarted AM (with previous containers retained) to request new container
[ https://issues.apache.org/jira/browse/YARN-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114152#comment-14114152 ]

Yingda Chen commented on YARN-2433:

Looks like this is related to YARN-2371.

Stale token used by restarted AM (with previous containers retained) to request new container

Key: YARN-2433
URL: https://issues.apache.org/jira/browse/YARN-2433
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.4.0, 2.4.1
Reporter: Yingda Chen
Assignee: Wangda Tan

With Hadoop 2.4, container retention is supported across AM crash-and-restart. However, after an AM is restarted with containers retained, it appears to use a stale token to start new containers. This leads to the error below. To truly support container retention, the AM should be able to communicate with previous container(s) using the old token and request new containers using the new token. This could be similar to YARN-1321, which was reported and fixed earlier.

ERROR: Unauthorized request to start container.
NMToken for application attempt : appattempt_1408130608672_0065_01 was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_02

Stack trace:
{code}
hadoop.ipc.ProtobufRpcEngine$Invoker.invoke org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 | 103: Response - YINGDAC1.redmond.corp.microsoft.com/10.121.136.231:45454: startContainers {services_meta_data { key: mapreduce_shuffle value: \000\0004\372 } failed_requests { container_id { app_attempt_id { application_id { id: 65 cluster_timestamp: 1408130608672 } attemptId: 2 } id: 2 } exception { message: Unauthorized request to start container.
NMToken for application attempt : appattempt_1408130608672_0065_01 was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_02
trace: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
NMToken for application attempt : appattempt_1408130608672_0065_01 was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_02
    at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:48)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeStartRequest(ContainerManagerImpl.java:508)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:571)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:538)
    at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)
    at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
class_name: org.apache.hadoop.yarn.exceptions.YarnException } }}
{code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
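The error text and the trace (which fails in ContainerManagerImpl.authorizeStartRequest) suggest that the NodeManager rejects a start request when the NMToken was issued to one application attempt (_01) while the container token belongs to another (_02). The sketch below is a simplified, hypothetical model of that comparison only, not the actual NodeManager code: `StartRequestAuthSketch`, `AttemptId`, and `authorizeStart` are illustrative names, and any other token validation the real NodeManager performs is omitted.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified model of the NodeManager check behind the
 * "Unauthorized request to start container" error. Not Hadoop code.
 */
final class StartRequestAuthSketch {

    /** Illustrative stand-in for an application attempt ID. */
    static final class AttemptId {
        final long clusterTimestamp;
        final int appId;
        final int attempt;

        AttemptId(long clusterTimestamp, int appId, int attempt) {
            this.clusterTimestamp = clusterTimestamp;
            this.appId = appId;
            this.attempt = attempt;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof AttemptId)) {
                return false;
            }
            AttemptId other = (AttemptId) o;
            return clusterTimestamp == other.clusterTimestamp
                && appId == other.appId
                && attempt == other.attempt;
        }

        @Override
        public int hashCode() {
            return Objects.hash(clusterTimestamp, appId, attempt);
        }

        // Mirrors the ID strings as they appear in the log in this report.
        @Override
        public String toString() {
            return String.format("appattempt_%d_%04d_%02d", clusterTimestamp, appId, attempt);
        }
    }

    /** Rejects the request when the NMToken and the container token come from different attempts. */
    static void authorizeStart(AttemptId nmTokenAttempt, AttemptId containerTokenAttempt) {
        if (!nmTokenAttempt.equals(containerTokenAttempt)) {
            throw new SecurityException("Unauthorized request to start container. \n"
                + "NMToken for application attempt : " + nmTokenAttempt
                + " was used for starting container with container token issued for application attempt : "
                + containerTokenAttempt);
        }
    }

    public static void main(String[] args) {
        AttemptId first = new AttemptId(1408130608672L, 65, 1);
        AttemptId second = new AttemptId(1408130608672L, 65, 2);
        authorizeStart(second, second); // NMToken and container from the same attempt: accepted
        try {
            authorizeStart(first, second); // stale NMToken from attempt 1: rejected
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model a restarted AM that still presents its first attempt's NMToken for a container allocated to the second attempt fails exactly as in the report.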
[jira] [Created] (YARN-2433) Stale token used by restarted AM (with previous containers retained) to request new container
Yingda Chen created YARN-2433:

Summary: Stale token used by restarted AM (with previous containers retained) to request new container
Key: YARN-2433
URL: https://issues.apache.org/jira/browse/YARN-2433
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.4.1, 2.4.0
Reporter: Yingda Chen

With Hadoop 2.4, container retention is supported across AM crash-and-restart. However, after an AM is restarted with containers retained, it appears to use a stale token to start new containers. This leads to the error below. To truly support container retention, the AM should be able to communicate with previous container(s) using the old token and request new containers using the new token. This could be similar to YARN-1321, which was reported and fixed earlier.

ERROR: Unauthorized request to start container.
NMToken for application attempt : appattempt_1408130608672_0065_01 was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_02

Stack trace:
hadoop.ipc.ProtobufRpcEngine$Invoker.invoke org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 | 103: Response - YINGDAC1.redmond.corp.microsoft.com/10.121.136.231:45454: startContainers {services_meta_data { key: mapreduce_shuffle value: \000\0004\372 } failed_requests { container_id { app_attempt_id { application_id { id: 65 cluster_timestamp: 1408130608672 } attemptId: 2 } id: 2 } exception { message: Unauthorized request to start container.
NMToken for application attempt : appattempt_1408130608672_0065_01 was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_02
trace: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
NMToken for application attempt : appattempt_1408130608672_0065_01 was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_02
    at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:48)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeStartRequest(ContainerManagerImpl.java:508)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:571)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:538)
    at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)
    at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
class_name: org.apache.hadoop.yarn.exceptions.YarnException } }}
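The description asks that a restarted AM keep using the old token for retained containers while using the new attempt's token for newly allocated containers. One way to picture that requirement is a per-node token cache keyed by both node address and issuing attempt, so that a lookup for a new container can never silently fall back to a stale token. The class below is a hypothetical sketch under that assumption; YARN's client library keeps its own per-node NMTokenCache, whose behavior is not reproduced here, and the node address and token strings are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical AM-side cache of NMTokens keyed by node address and issuing attempt.
 * Not YARN's NMTokenCache; token values are opaque placeholder strings.
 */
final class AttemptAwareTokenCache {

    // node address -> (attempt number -> token)
    private final Map<String, Map<Integer, String>> tokens = new HashMap<>();

    /** Records a token issued to the given attempt for the given node. */
    void put(String nodeAddress, int attempt, String token) {
        tokens.computeIfAbsent(nodeAddress, k -> new HashMap<>()).put(attempt, token);
    }

    /**
     * Returns the token issued to the attempt that owns the container.
     * Never falls back to another attempt's token, because a mismatched
     * token is what the NodeManager rejects.
     */
    String tokenForContainer(String nodeAddress, int containerAttempt) {
        Map<Integer, String> byAttempt = tokens.get(nodeAddress);
        String token = (byAttempt == null) ? null : byAttempt.get(containerAttempt);
        if (token == null) {
            throw new IllegalStateException(
                "No NMToken for attempt " + containerAttempt + " on " + nodeAddress);
        }
        return token;
    }

    public static void main(String[] args) {
        AttemptAwareTokenCache cache = new AttemptAwareTokenCache();
        cache.put("node1:45454", 1, "token-from-attempt-1"); // token for the retained container
        cache.put("node1:45454", 2, "token-from-attempt-2"); // token issued to the restarted AM
        System.out.println(cache.tokenForContainer("node1:45454", 1)); // retained container
        System.out.println(cache.tokenForContainer("node1:45454", 2)); // new container
    }
}
```

Failing loudly on a missing entry, rather than reusing whatever token is cached for the node, turns the silent stale-token case into an error on the AM side before any request reaches the NodeManager.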
[jira] [Updated] (YARN-2433) Stale token used by restarted AM (with previous containers retained) to request new container
[ https://issues.apache.org/jira/browse/YARN-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yingda Chen updated YARN-2433:

Description:
With Hadoop 2.4, container retention is supported across AM crash-and-restart. However, after an AM is restarted with containers retained, it appears to use a stale token to start new containers. This leads to the error below. To truly support container retention, the AM should be able to communicate with previous container(s) using the old token and request new containers using the new token. This could be similar to YARN-1321, which was reported and fixed earlier.

ERROR: Unauthorized request to start container.
NMToken for application attempt : appattempt_1408130608672_0065_01 was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_02

Stack trace:
{code}
hadoop.ipc.ProtobufRpcEngine$Invoker.invoke org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 | 103: Response - YINGDAC1.redmond.corp.microsoft.com/10.121.136.231:45454: startContainers {services_meta_data { key: mapreduce_shuffle value: \000\0004\372 } failed_requests { container_id { app_attempt_id { application_id { id: 65 cluster_timestamp: 1408130608672 } attemptId: 2 } id: 2 } exception { message: Unauthorized request to start container.
NMToken for application attempt : appattempt_1408130608672_0065_01 was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_02
trace: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
NMToken for application attempt : appattempt_1408130608672_0065_01 was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_02
    at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:48)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeStartRequest(ContainerManagerImpl.java:508)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:571)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:538)
    at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)
    at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
class_name: org.apache.hadoop.yarn.exceptions.YarnException } }}
{code}
[jira] [Assigned] (YARN-1138) yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows
[ https://issues.apache.org/jira/browse/YARN-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yingda Chen reassigned YARN-1138:

Assignee: (was: Yingda Chen)

I am not actively working on YARN for the time being.

yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows

Key: YARN-1138
URL: https://issues.apache.org/jira/browse/YARN-1138
Project: Hadoop YARN
Issue Type: Bug
Reporter: Yingda Chen

yarn-default.xml sets the yarn.application.classpath entry to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/,$HADOOP_COMMON_HOME/share/hadoop/common/lib/,$HADOOP_HDFS_HOME/share/hadoop/hdfs/,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib. This value does not work on Windows and needs to be fixed.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
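On Windows, the command shell references environment variables as %NAME% rather than $NAME, and Java classpath entries are separated by ';' rather than ':'. The helper below is a minimal, hypothetical sketch of rewriting a comma-separated, Unix-style yarn.application.classpath value for a target platform. It is not Hadoop's fix for this issue; `ClasspathRewriteSketch` and `rewrite` are illustrative names, and it does not touch path separators inside entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper that rewrites a Unix-style yarn.application.classpath value
 * for a target platform. Not Hadoop's fix for YARN-1138.
 */
final class ClasspathRewriteSketch {

    // Matches $NAME references such as $HADOOP_CONF_DIR (stops at '/', ',', etc.).
    private static final Pattern UNIX_VAR = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    /**
     * Splits the comma-separated entries, rewrites $NAME to %NAME% when targeting
     * Windows, and joins the entries with the platform's classpath separator.
     */
    static String rewrite(String commaSeparatedEntries, boolean windows) {
        List<String> entries = new ArrayList<>();
        for (String raw : commaSeparatedEntries.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // skip blanks left by stray commas or whitespace
            }
            entries.add(windows ? UNIX_VAR.matcher(entry).replaceAll("%$1%") : entry);
        }
        // ';' separates classpath entries on Windows, ':' on Unix-like systems.
        return String.join(windows ? ";" : ":", entries);
    }

    public static void main(String[] args) {
        String value = "$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/";
        // prints %HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%/share/hadoop/common/
        System.out.println(rewrite(value, true));
    }
}
```

Rewriting at the point where the classpath is assembled keeps a single platform-neutral configuration value; the alternative of shipping a separate Windows-specific default in the configuration file is equally possible.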