[jira] [Commented] (YARN-944) App failed with container launch failed even though container started

2015-05-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523347#comment-14523347
 ] 

Junping Du commented on YARN-944:
-

[~jianhe], we have improved diagnostics info a lot since this JIRA was filed. 
Do you think we still need any improvement here? If not, how about resolving it 
as "not a problem"?

> App failed with container launch failed even though container started
> -
>
> Key: YARN-944
> URL: https://issues.apache.org/jira/browse/YARN-944
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-944.patch, nm.log, rm.log, yarn-site.xml
>
>
> The container is the AM container. It started and the AM failed during RM 
> registration. The error message presented was about container launch failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-944) App failed with container launch failed even though container started

2014-03-15 Thread Oleg Zhurakousky (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936354#comment-13936354
 ] 

Oleg Zhurakousky commented on YARN-944:
---

Here is another way of reproducing it.
{code}
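// Imports needed to compile this inside a class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.client.ClientRMProxy;

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  // Create a proxy to the RM scheduler endpoint; no AMRM token is attached here.
  ApplicationMasterProtocol applicationsManager =
      ClientRMProxy.createRMProxy(conf, ApplicationMasterProtocol.class);
  RegisterApplicationMasterRequest request =
      RegisterApplicationMasterRequest.newInstance("", 0, "");
  RegisterApplicationMasterResponse response =
      applicationsManager.registerApplicationMaster(request);
}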
{code}
Executing the above while connected to a remote YARN (2.3.0) cluster results 
in:
{code}
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled.  Available:[TOKEN]
at org.apache.hadoop.ipc.Client.call(Client.java:1406)
   . . . .
{code}

Looking at the ipc Server code where the actual exception is triggered, I 
wonder what the rationale is for hardcoding TOKEN as one of the 
authentication methods, especially when SIMPLE is configured explicitly. 

> App failed with container launch failed even though container started
> -
>
> Key: YARN-944
> URL: https://issues.apache.org/jira/browse/YARN-944
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
>Priority: Blocker
> Attachments: YARN-944.patch, nm.log, rm.log, yarn-site.xml
>
>
> The container is the AM container. It started and the AM failed during RM 
> registration. The error message presented was about container launch failing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-944) App failed with container launch failed even though container started

2013-07-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717579#comment-13717579
 ] 

Bikas Saha commented on YARN-944:
-

Making this a blocker since it's making debugging quite difficult by hiding the 
root cause and giving a potentially misleading error message.

> App failed with container launch failed even though container started
> -
>
> Key: YARN-944
> URL: https://issues.apache.org/jira/browse/YARN-944
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
>Priority: Blocker
> Attachments: nm.log, rm.log, yarn-site.xml
>
>
> The container is the AM container. It started and the AM failed during RM 
> registration. The error message presented was about container launch failing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-944) App failed with container launch failed even though container started

2013-07-23 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717481#comment-13717481
 ] 

Hitesh Shah commented on YARN-944:
--

There are 2 different issues being reported in this jira: 

  - The first is the underlying error, which is due to the changes in 
YARN-701: default addresses of 0.0.0.0 combined with the use of ip in the 
tokens cause apps to fail. 
  - The second issue is what failure information is propagated back 
to the user. For any non-zero exit code, the user now sees a 
ShellExitCodeException. This change was done as part of YARN-814.

 


> App failed with container launch failed even though container started
> -
>
> Key: YARN-944
> URL: https://issues.apache.org/jira/browse/YARN-944
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
> Attachments: nm.log, rm.log, yarn-site.xml
>
>
> The container is the AM container. It started and the AM failed during RM 
> registration. The error message presented was about container launch failing.



[jira] [Commented] (YARN-944) App failed with container launch failed even though container started

2013-07-23 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716672#comment-13716672
 ] 

Sandy Ryza commented on YARN-944:
-

I'm facing this as well.  The error I'm getting on the AM side is

{code}

2013-07-23 10:58:33,426 ERROR [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Exception while registering
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:175)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:94)
at $Proxy29.registerApplicationMaster(Unknown Source)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:147)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:107)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:213)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:789)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1019)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1401)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1493)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1397)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1330)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled.  Available:[TOKEN]
at org.apache.hadoop.ipc.Client.call(Client.java:1428)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy28.registerApplicationMaster(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
... 22 more
{code}

Haven't looked into this deeply, but it seems like this is caused by YARN-701?



[jira] [Commented] (YARN-944) App failed with container launch failed even though container started

2013-07-19 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714291#comment-13714291
 ] 

Omkar Vinit Joshi commented on YARN-944:


By default resolve ip is set to true...
{code}
boolean useIp = conf.getBoolean(
  CommonConfigurationKeys.HADOOP_SECURITY_TOKEN_SERVICE_USE_IP,
  CommonConfigurationKeys.HADOOP_SECURITY_TOKEN_SERVICE_USE_IP_DEFAULT);
setTokenServiceUseIp(useIp);
{code}
Can you try setting the parameter below?
{code}
  
 
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:54313</value>
  <description>host is the hostname of the resourcemanager and port is the port
  on which the Applications in the cluster talk to the Resource Manager.</description>
</property>
  
 
   
{code}

I don't know whether we need to fix this..
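
To make the address mismatch concrete, here is a minimal, self-contained sketch in plain Java. It is not Hadoop code. It assumes that a token is looked up by a "service" key of the form ip:port when use_ip is true, and that one side derives that key from the 0.0.0.0 default while the other side uses the RM's real address. The class name, both helper methods, the address 10.0.0.5, the host name and port 8030 are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenServiceSketch {

    /** Hypothetical stand-in for deriving a token "service" key from an address. */
    static String serviceKey(String host, String ip, int port, boolean useIp) {
        return (useIp ? ip : host.toLowerCase()) + ":" + port;
    }

    /** Looks up a token by service key; null means there is no token to present. */
    static String selectToken(Map<String, String> tokens, String service) {
        return tokens.get(service);
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new HashMap<>();

        // Assumption: one side stamps the token using the wildcard default address.
        tokens.put(serviceKey("0.0.0.0", "0.0.0.0", 8030, true), "AMRM_TOKEN");

        // Assumption: the other side derives the key from the RM's real address.
        String wanted = serviceKey("rm.example.com", "10.0.0.5", 8030, true);
        System.out.println(wanted + " -> " + selectToken(tokens, wanted));

        // With an explicit address configured on both sides, the keys agree.
        tokens.put(serviceKey("rm.example.com", "10.0.0.5", 8030, true), "AMRM_TOKEN");
        System.out.println(wanted + " -> " + selectToken(tokens, wanted));
    }
}
```

In this sketch the first lookup finds nothing, so the client would have no token to present. That would be consistent with the "SIMPLE authentication is not enabled" error seen on the AM side, and it is why setting the scheduler address explicitly may help, but the real lookup path in Hadoop should be checked before relying on this reading.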



[jira] [Commented] (YARN-944) App failed with container launch failed even though container started

2013-07-19 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714119#comment-13714119
 ] 

Bikas Saha commented on YARN-944:
-

This is the error shown on the RM web UI for the application. The application 
container actually started, so this message is wrong.
{noformat}
Application application_1374261801151_0002 failed 2 times due to AM Container 
for appattempt_1374261801151_0002_02 exited with exitCode: 1 due to: 
Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
at org.apache.hadoop.util.Shell.run(Shell.java:373)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:258)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:74)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.
{noformat}
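
The distinction that the message above blurs can be shown with a small, self-contained Java sketch. It is not NodeManager code. It only uses java.lang.ProcessBuilder to separate "the process could not be launched" from "the process launched and later exited with a non-zero code". The class name, the describe helper and the sample commands are hypothetical, and the sketch assumes a Unix-like host where sh is on the PATH.

```java
import java.io.IOException;

public class LaunchDiagnosticsSketch {

    /** Runs a command and reports whether it failed to launch or exited non-zero. */
    static String describe(String... command) {
        Process process;
        try {
            // start() throws IOException when the program cannot be launched at all.
            process = new ProcessBuilder(command).inheritIO().start();
        } catch (IOException e) {
            return "launch failed: " + e.getMessage();
        }
        try {
            // The process did start; a non-zero code is a failure of the program itself.
            int exitCode = process.waitFor();
            return exitCode == 0
                ? "completed successfully"
                : "started, then exited with exitCode " + exitCode;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted while waiting";
        }
    }

    public static void main(String[] args) {
        // Launches fine but exits with code 1, like the AM in this report.
        System.out.println(describe("sh", "-c", "exit 1"));
        // Cannot be launched at all (hypothetical, nonexistent binary).
        System.out.println(describe("no-such-binary-for-this-sketch"));
    }
}
```

Reporting the two cases with different wording, as describe does, is one way to avoid telling the user that the launch failed when the container actually started.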
