[jira] [Commented] (YARN-2433) Stale token used by restarted AM (with previous containers retained) to request new container

2014-08-28 Thread Yingda Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114152#comment-14114152
 ] 

Yingda Chen commented on YARN-2433:
---

Looks like this is related to YARN-2371.

 Stale token used by restarted AM (with previous containers retained) to 
 request new container
 -

 Key: YARN-2433
 URL: https://issues.apache.org/jira/browse/YARN-2433
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0, 2.4.1
Reporter: Yingda Chen
Assignee: Wangda Tan

 With Hadoop 2.4, container retention is supported across AM 
 crash-and-restart. However, after an AM is restarted with containers 
 retained, it appears to use a stale token (the NMToken issued to the 
 previous application attempt) to start a new container. This leads to the 
 error below. To truly support container retention, the AM should be able to 
 communicate with the previous container(s) using the old token and request 
 new containers using the new token. 
 This could be similar to YARN-1321, which was reported and fixed earlier.
 ERROR: 
 Unauthorized request to start container. \nNMToken for application attempt : 
 appattempt_1408130608672_0065_01 was used for starting container with 
 container token issued for application attempt : 
 appattempt_1408130608672_0065_02
 STACK trace:
 {code}
 hadoop.ipc.ProtobufRpcEngine$Invoker.invoke 
 org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 | 103: 
 Response - YINGDAC1.redmond.corp.microsoft.com/10.121.136.231:45454: 
 startContainers {services_meta_data { key: mapreduce_shuffle value: 
 \000\0004\372 } failed_requests { container_id { app_attempt_id { 
 application_id { id: 65 cluster_timestamp: 1408130608672 } attemptId: 2 } id: 
 2 } exception { message: Unauthorized request to start container. \nNMToken 
 for application attempt : appattempt_1408130608672_0065_01 was used for 
 starting container with container token issued for application attempt : 
 appattempt_1408130608672_0065_02 trace: 
 org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to 
 start container. \nNMToken for application attempt : 
 appattempt_1408130608672_0065_01 was used for starting container with 
 container token issued for application attempt : 
 appattempt_1408130608672_0065_02\r\n\tat 
 org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:48)\r\n\tat
  
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeStartRequest(ContainerManagerImpl.java:508)\r\n\tat
  
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:571)\r\n\tat
  
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:538)\r\n\tat
  
 org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)\r\n\tat
  
 org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)\r\n\tat
  
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)\r\n\tat
  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)\r\n\tat 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)\r\n\tat 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)\r\n\tat 
 java.security.AccessController.doPrivileged(Native Method)\r\n\tat 
 javax.security.auth.Subject.doAs(Subject.java:415)\r\n\tat 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)\r\n\tat
  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)\r\n class_name: 
 org.apache.hadoop.yarn.exceptions.YarnException } }}
 {code}
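
The check that rejects the request can be illustrated with a small, self-contained sketch. It is only a model inferred from the wording of the error message, not the actual NodeManager code: the class `NmTokenCheckSketch`, its nested `NmToken` and `ContainerToken` types, and the methods `authorizeStart` and `canStart` are all hypothetical names introduced here. The assumption is that a start request is refused whenever the attempt recorded in the NMToken differs from the attempt recorded in the container token.

```java
import java.util.Objects;

/** Hypothetical, simplified model of the attempt check behind the reported error. */
public class NmTokenCheckSketch {

    /** Stand-in for an NMToken: remembers the application attempt it was issued to. */
    static final class NmToken {
        final String appAttemptId;
        NmToken(String appAttemptId) { this.appAttemptId = appAttemptId; }
    }

    /** Stand-in for a container token: remembers the attempt the container was allocated to. */
    static final class ContainerToken {
        final String appAttemptId;
        ContainerToken(String appAttemptId) { this.appAttemptId = appAttemptId; }
    }

    /** Throws if the two tokens name different attempts (assumed behaviour, see lead-in). */
    static void authorizeStart(NmToken nmToken, ContainerToken containerToken) {
        if (!Objects.equals(nmToken.appAttemptId, containerToken.appAttemptId)) {
            throw new SecurityException(
                "Unauthorized request to start container. NMToken for application attempt : "
                + nmToken.appAttemptId
                + " was used for starting container with container token issued for"
                + " application attempt : " + containerToken.appAttemptId);
        }
    }

    /** Convenience wrapper: true when the start request would pass the check. */
    static boolean canStart(NmToken nmToken, ContainerToken containerToken) {
        try {
            authorizeStart(nmToken, containerToken);
            return true;
        } catch (SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Attempt IDs copied from the report above.
        NmToken stale = new NmToken("appattempt_1408130608672_0065_01");
        NmToken fresh = new NmToken("appattempt_1408130608672_0065_02");
        ContainerToken newContainer = new ContainerToken("appattempt_1408130608672_0065_02");

        System.out.println("stale NMToken accepted: " + canStart(stale, newContainer));
        System.out.println("fresh NMToken accepted: " + canStart(fresh, newContainer));
    }
}
```

Under this model, the restarted AM (attempt 02) presenting the NMToken of attempt 01 together with a container token of attempt 02 is exactly the mismatch described in the report.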



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2433) Stale token used by restarted AM (with previous containers retained) to request new container

2014-08-20 Thread Yingda Chen (JIRA)
Yingda Chen created YARN-2433:
-

 Summary: Stale token used by restarted AM (with previous 
containers retained) to request new container
 Key: YARN-2433
 URL: https://issues.apache.org/jira/browse/YARN-2433
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1, 2.4.0
Reporter: Yingda Chen


With Hadoop 2.4, container retention is supported across AM crash-and-restart. 
However, after an AM is restarted with containers retained, it appears to use 
a stale token (the NMToken issued to the previous application attempt) to 
start a new container. This leads to the error below. To truly support 
container retention, the AM should be able to communicate with the previous 
container(s) using the old token and request new containers using the new 
token. 

This could be similar to YARN-1321, which was reported and fixed earlier.

ERROR: 
Unauthorized request to start container. \nNMToken for application attempt : 
appattempt_1408130608672_0065_01 was used for starting container with 
container token issued for application attempt : 
appattempt_1408130608672_0065_02

STACK trace:

hadoop.ipc.ProtobufRpcEngine$Invoker.invoke 
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 | 103: 
Response - YINGDAC1.redmond.corp.microsoft.com/10.121.136.231:45454: 
startContainers {services_meta_data { key: mapreduce_shuffle value: 
\000\0004\372 } failed_requests { container_id { app_attempt_id { 
application_id { id: 65 cluster_timestamp: 1408130608672 } attemptId: 2 } id: 2 
} exception { message: Unauthorized request to start container. \nNMToken for 
application attempt : appattempt_1408130608672_0065_01 was used for 
starting container with container token issued for application attempt : 
appattempt_1408130608672_0065_02 trace: 
org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start 
container. \nNMToken for application attempt : 
appattempt_1408130608672_0065_01 was used for starting container with 
container token issued for application attempt : 
appattempt_1408130608672_0065_02\r\n\tat 
org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:48)\r\n\tat 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeStartRequest(ContainerManagerImpl.java:508)\r\n\tat
 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:571)\r\n\tat
 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:538)\r\n\tat
 
org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)\r\n\tat
 
org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)\r\n\tat
 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)\r\n\tat
 org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)\r\n\tat 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)\r\n\tat 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)\r\n\tat 
java.security.AccessController.doPrivileged(Native Method)\r\n\tat 
javax.security.auth.Subject.doAs(Subject.java:415)\r\n\tat 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)\r\n\tat
 org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)\r\n class_name: 
org.apache.hadoop.yarn.exceptions.YarnException } }}










[jira] [Updated] (YARN-2433) Stale token used by restarted AM (with previous containers retained) to request new container

2014-08-20 Thread Yingda Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingda Chen updated YARN-2433:
--

Description: 
With Hadoop 2.4, container retention is supported across AM crash-and-restart. 
However, after an AM is restarted with containers retained, it appears to use 
a stale token (the NMToken issued to the previous application attempt) to 
start a new container. This leads to the error below. To truly support 
container retention, the AM should be able to communicate with the previous 
container(s) using the old token and request new containers using the new 
token. 

This could be similar to YARN-1321, which was reported and fixed earlier.

ERROR: 
Unauthorized request to start container. \nNMToken for application attempt : 
appattempt_1408130608672_0065_01 was used for starting container with 
container token issued for application attempt : 
appattempt_1408130608672_0065_02

STACK trace:
{code}
hadoop.ipc.ProtobufRpcEngine$Invoker.invoke 
org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 | 103: 
Response - YINGDAC1.redmond.corp.microsoft.com/10.121.136.231:45454: 
startContainers {services_meta_data { key: mapreduce_shuffle value: 
\000\0004\372 } failed_requests { container_id { app_attempt_id { 
application_id { id: 65 cluster_timestamp: 1408130608672 } attemptId: 2 } id: 2 
} exception { message: Unauthorized request to start container. \nNMToken for 
application attempt : appattempt_1408130608672_0065_01 was used for 
starting container with container token issued for application attempt : 
appattempt_1408130608672_0065_02 trace: 
org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start 
container. \nNMToken for application attempt : 
appattempt_1408130608672_0065_01 was used for starting container with 
container token issued for application attempt : 
appattempt_1408130608672_0065_02\r\n\tat 
org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:48)\r\n\tat 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeStartRequest(ContainerManagerImpl.java:508)\r\n\tat
 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:571)\r\n\tat
 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:538)\r\n\tat
 
org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)\r\n\tat
 
org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)\r\n\tat
 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)\r\n\tat
 org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)\r\n\tat 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)\r\n\tat 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)\r\n\tat 
java.security.AccessController.doPrivileged(Native Method)\r\n\tat 
javax.security.auth.Subject.doAs(Subject.java:415)\r\n\tat 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)\r\n\tat
 org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)\r\n class_name: 
org.apache.hadoop.yarn.exceptions.YarnException } }}
{code}







[jira] [Assigned] (YARN-1138) yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows

2014-01-02 Thread Yingda Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingda Chen reassigned YARN-1138:
-

Assignee: (was: Yingda Chen)

I am not actively working on YARN for the time being.

 yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which 
 does not work on Windows
 ---

 Key: YARN-1138
 URL: https://issues.apache.org/jira/browse/YARN-1138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yingda Chen

 yarn-default.xml has the yarn.application.classpath entry set to 
 $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/,$HADOOP_COMMON_HOME/share/hadoop/common/lib/,$HADOOP_HDFS_HOME/share/hadoop/hdfs/,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib.
  The $VAR references are not expanded on Windows, so this default does not 
  work there and needs to be fixed.
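
To illustrate why a hard-coded `$VAR` reference is a problem, here is a minimal sketch that renders the same variable reference in Unix-shell syntax (`$VAR`) or Windows cmd syntax (`%VAR%`). The class `EnvRefSketch` and its methods `envRef` and `classpath` are hypothetical helpers written for this example, not part of Hadoop, and only the first two classpath entries from the report are used.

```java
/** Hypothetical helper showing platform-specific environment variable references. */
public class EnvRefSketch {

    /** Variable name plus the path suffix that follows it (first two entries from the report). */
    static final String[][] ENTRIES = {
        {"HADOOP_CONF_DIR", ""},
        {"HADOOP_COMMON_HOME", "/share/hadoop/common/"},
    };

    /** Renders a reference to an environment variable for the chosen platform. */
    static String envRef(String name, boolean windows) {
        return windows ? "%" + name + "%" : "$" + name;
    }

    /** Joins the entries with commas, using platform-appropriate variable references. */
    static String classpath(boolean windows) {
        StringBuilder sb = new StringBuilder();
        for (String[] entry : ENTRIES) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(envRef(entry[0], windows)).append(entry[1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the platform is detected from the os.name system property.
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        System.out.println("this platform: " + classpath(windows));
        System.out.println("unix form:     " + classpath(false));
        System.out.println("windows form:  " + classpath(true));
    }
}
```

A default value written only in the Unix form would be passed through literally on Windows, which matches the behaviour described in the report.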



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)