[jira] [Commented] (YARN-944) App failed with container launch failed even though container started
[ https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523347#comment-14523347 ]

Junping Du commented on YARN-944:
---------------------------------

[~jianhe], we have improved the diagnostics info a lot since this JIRA was filed. Do you think we still need any improvement here? If not, how about resolving it as "not a problem"?

> App failed with container launch failed even though container started
> ---------------------------------------------------------------------
>
>                 Key: YARN-944
>                 URL: https://issues.apache.org/jira/browse/YARN-944
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Bikas Saha
>            Assignee: Jian He
>         Attachments: YARN-944.patch, nm.log, rm.log, yarn-site.xml
>
>
> The container is the AM container. It started and the AM failed during RM
> registration. The error message presented was about container launch failing.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-944) App failed with container launch failed even though container started
[ https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936354#comment-13936354 ]

Oleg Zhurakousky commented on YARN-944:
---------------------------------------

Here is another way of reproducing it.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.client.ClientRMProxy;

// Wrapper class added so the snippet compiles; its name is illustrative.
public class RegisterAmRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    ApplicationMasterProtocol applicationsManager =
        ClientRMProxy.createRMProxy(conf, ApplicationMasterProtocol.class);
    // Register with an empty host, port 0, and an empty tracking URL.
    RegisterApplicationMasterRequest request =
        RegisterApplicationMasterRequest.newInstance("", 0, "");
    RegisterApplicationMasterResponse response =
        applicationsManager.registerApplicationMaster(request);
  }
}
{code}
Executing the above while connected to a remote YARN (2.3.0) cluster results in:
{code}
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN]
        at org.apache.hadoop.ipc.Client.call(Client.java:1406)
        . . . .
{code}
Looking at the ipc Server code where the actual exception is triggered, I wonder what the rationale is for hardcoding TOKEN as one of the authentication methods, especially when SIMPLE is configured explicitly.
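The error text in the comment above is consistent with a server that advertises a fixed set of authentication methods for a protocol and rejects any client that offers a method outside that set. The following is a minimal, self-contained sketch of such a check. The class `AuthNegotiationSketch`, the `AuthMethod` enum, and `checkClientMethod` are hypothetical names chosen for illustration; this is not the Hadoop ipc Server code, and it assumes the server side enables only TOKEN for this protocol, as the message suggests.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for a server-side authentication check.
final class AuthNegotiationSketch {
    enum AuthMethod { SIMPLE, TOKEN, KERBEROS }

    // Rejects a client whose offered method is not in the enabled set.
    static void checkClientMethod(AuthMethod offered, Set<AuthMethod> enabled) {
        if (!enabled.contains(offered)) {
            throw new SecurityException(
                offered + " authentication is not enabled. Available:" + enabled);
        }
    }

    public static void main(String[] args) {
        // Assumption: the server enables only TOKEN for this protocol.
        Set<AuthMethod> enabled = EnumSet.of(AuthMethod.TOKEN);

        checkClientMethod(AuthMethod.TOKEN, enabled); // accepted, no exception

        try {
            checkClientMethod(AuthMethod.SIMPLE, enabled);
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, a client that holds no usable token falls back to SIMPLE and is rejected regardless of what the cluster-wide authentication setting says.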
[jira] [Commented] (YARN-944) App failed with container launch failed even though container started
[ https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717579#comment-13717579 ]

Bikas Saha commented on YARN-944:
---------------------------------

Making this a blocker since it's making debugging quite difficult by hiding the root cause and giving a potentially misleading error message.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-944) App failed with container launch failed even though container started
[ https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717481#comment-13717481 ]

Hitesh Shah commented on YARN-944:
----------------------------------

There are 2 different issues being reported in this jira:

- The first is the underlying error: due to the changes in YARN-701, default addresses of 0.0.0.0 combined with the use of the IP in the tokens cause apps to fail.
- The second is what failure information is propagated back to the user. For any non-zero exit code, the user now sees a ShellExitCodeException. This change was made as part of YARN-814.
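One plausible reading of the first issue can be sketched as follows: a client picks its token by looking up a service key, so a token stamped with a wildcard address is never found when the client looks it up under the address it actually connects to, and the client falls back to SIMPLE. The class `TokenLookupSketch`, its methods, the token name, and the addresses `0.0.0.0:8030` and `10.0.0.5:8030` are all hypothetical; this is an illustration of the mismatch, not the YARN token selector.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical credentials store keyed by "host:port" service strings.
final class TokenLookupSketch {
    private final Map<String, String> tokensByService = new HashMap<>();

    void addToken(String serviceKey, String tokenId) {
        tokensByService.put(serviceKey, tokenId);
    }

    // Uses TOKEN only if a token is stored under exactly this service key.
    String chooseAuthMethod(String serverServiceKey) {
        return tokensByService.containsKey(serverServiceKey) ? "TOKEN" : "SIMPLE";
    }

    public static void main(String[] args) {
        TokenLookupSketch credentials = new TokenLookupSketch();
        // Assumption: the token was issued against the default wildcard address.
        credentials.addToken("0.0.0.0:8030", "amrm-token");

        // Lookup under the address the client actually connects to: no match.
        System.out.println(credentials.chooseAuthMethod("10.0.0.5:8030"));
        // Lookup under the wildcard key: match.
        System.out.println(credentials.chooseAuthMethod("0.0.0.0:8030"));
    }
}
```

Setting an explicit scheduler address on both sides makes the two keys agree, which fits the workaround suggested elsewhere in this thread.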
[jira] [Commented] (YARN-944) App failed with container launch failed even though container started
[ https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716672#comment-13716672 ]

Sandy Ryza commented on YARN-944:
---------------------------------

I'm facing this as well. The error I'm getting on the AM side is
{code}
2013-07-23 10:58:33,426 ERROR [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Exception while registering
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:175)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:94)
        at $Proxy29.registerApplicationMaster(Unknown Source)
        at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:147)
        at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:107)
        at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:213)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:789)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1019)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1401)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1493)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1397)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1330)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN]
        at org.apache.hadoop.ipc.Client.call(Client.java:1428)
        at org.apache.hadoop.ipc.Client.call(Client.java:1381)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at $Proxy28.registerApplicationMaster(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
        ... 22 more
{code}
Haven't looked into this deeply, but it seems like this is caused by YARN-701?
[jira] [Commented] (YARN-944) App failed with container launch failed even though container started
[ https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714291#comment-13714291 ]

Omkar Vinit Joshi commented on YARN-944:
----------------------------------------

By default, resolving the token service to an IP is set to true:
{code}
boolean useIp = conf.getBoolean(
    CommonConfigurationKeys.HADOOP_SECURITY_TOKEN_SERVICE_USE_IP,
    CommonConfigurationKeys.HADOOP_SECURITY_TOKEN_SERVICE_USE_IP_DEFAULT);
setTokenServiceUseIp(useIp);
{code}
Can you try setting the parameter below?
{code}
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:54313</value>
  <description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager.</description>
</property>
{code}
I don't know whether we need to fix this.
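To make the effect of the "use IP" flag concrete, here is a minimal sketch of how a token service key might be derived from a socket address in the two modes. It is loosely modeled on the behavior described in this comment and is not the Hadoop implementation; the class `TokenServiceKeySketch`, the methods `serviceKey` and `fixedAddress`, the hostname `rm.example.com`, and the address `10.0.0.5` are hypothetical. Only the port 54313 is taken from the configuration shown above.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.Locale;

// Hypothetical helper that derives a "host:port" service key from an address.
final class TokenServiceKeySketch {

    // With useIp the key carries the numeric IP, otherwise the lower-cased hostname.
    static String serviceKey(InetSocketAddress addr, boolean useIp) {
        String host = useIp
            ? addr.getAddress().getHostAddress()
            : addr.getHostString().toLowerCase(Locale.ROOT);
        return host + ":" + addr.getPort();
    }

    // Builds an address with a fixed name and IP so that no DNS lookup is needed.
    static InetSocketAddress fixedAddress(String host, byte[] ip, int port) {
        try {
            return new InetSocketAddress(InetAddress.getByAddress(host, ip), port);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("invalid IP length", e);
        }
    }

    public static void main(String[] args) {
        InetSocketAddress rm =
            fixedAddress("rm.example.com", new byte[] {10, 0, 0, 5}, 54313);
        System.out.println(serviceKey(rm, true));  // 10.0.0.5:54313
        System.out.println(serviceKey(rm, false)); // rm.example.com:54313
    }
}
```

The two modes yield different keys for the same endpoint, so the issuer and the consumer of a token need to agree on both the flag and the configured address.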
[jira] [Commented] (YARN-944) App failed with container launch failed even though container started
[ https://issues.apache.org/jira/browse/YARN-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714119#comment-13714119 ]

Bikas Saha commented on YARN-944:
---------------------------------

This is the error shown on the RM web UI for the application. The application container actually started, so this message is wrong.
{noformat}
Application application_1374261801151_0002 failed 2 times due to AM Container for appattempt_1374261801151_0002_02 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
        at org.apache.hadoop.util.Shell.run(Shell.java:373)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:258)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:74)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.
{noformat}
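The distinction this comment asks for, between a container that never launched and one that launched and then exited with a non-zero code, can be sketched as a small diagnostics function. The class `ContainerDiagnosticsSketch`, the method `describe`, and the message wording are hypothetical; this is not how the NodeManager builds its diagnostics, only an illustration of the two cases being reported differently.

```java
// Hypothetical diagnostics builder that separates launch failures from exit failures.
final class ContainerDiagnosticsSketch {

    static String describe(boolean processStarted, int exitCode) {
        if (!processStarted) {
            // The process could not be started at all: a genuine launch failure.
            return "Container launch failed: process could not be started";
        }
        if (exitCode != 0) {
            // The process ran and failed on its own, e.g. during RM registration.
            return "Container started but exited with exitCode: " + exitCode
                + "; see the container logs for the root cause";
        }
        return "Container completed successfully";
    }

    public static void main(String[] args) {
        System.out.println(describe(false, -1));
        System.out.println(describe(true, 1));
        System.out.println(describe(true, 0));
    }
}
```

For the case in this report, `describe(true, 1)` would point the user at the container logs rather than at the launch step.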