bc Wong created YARN-2010:
-----------------------------
Summary: RM can't transition to active if it can't recover an app attempt
Key: YARN-2010
URL: https://issues.apache.org/jira/browse/YARN-2010
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
If the RM fails to recover an app attempt, it cannot transition to active, so
it never comes up. RM startup should be more resilient to this.
In this case, the underlying error is that the app was submitted before
Kerberos security was turned on. It makes sense for that app to fail, but
YARN should still start.
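One way to picture the requested behaviour is a recovery loop that isolates each application, so that a single failure is recorded rather than propagated. This is only a minimal sketch, not the RM's actual code: the class ResilientRecoverySketch, the AppRecoverer callback, and RecoveryResult are hypothetical names introduced for illustration, and application IDs are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types exist in Hadoop YARN.
class ResilientRecoverySketch {

    /** Hypothetical callback standing in for per-application recovery. */
    interface AppRecoverer {
        void recover(String appId) throws Exception;
    }

    /** Records which applications recovered and which did not. */
    static final class RecoveryResult {
        final List<String> recovered = new ArrayList<>();
        final List<String> failed = new ArrayList<>();
    }

    /**
     * Attempts to recover every application. A failure in one application
     * is logged and recorded, and the loop continues with the next one
     * instead of aborting the whole startup.
     */
    static RecoveryResult recoverAll(List<String> appIds, AppRecoverer recoverer) {
        RecoveryResult result = new RecoveryResult();
        for (String appId : appIds) {
            try {
                recoverer.recover(appId);
                result.recovered.add(appId);
            } catch (Exception e) {
                // Isolate the failure: only this application is marked failed.
                System.err.println("Failed to recover " + appId + ": " + e);
                result.failed.add(appId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("app_1", "app_2", "app_3");
        // Simulate one application whose recovery throws.
        RecoveryResult result = recoverAll(ids, appId -> {
            if (appId.equals("app_2")) {
                throw new IllegalArgumentException("Missing argument");
            }
        });
        System.out.println("recovered=" + result.recovered + ", failed=" + result.failed);
    }
}
```

With a loop of this shape, the caller could still transition to active and then mark the failed applications as failed, rather than refusing to start.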
{noformat}
2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException: Missing argument
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
... 5 more
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException: Missing argument
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 8 more
Caused by: java.lang.IllegalArgumentException: Missing argument
at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93)
at org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
at org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
... 13 more
{noformat}
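The innermost frame can be reproduced in isolation: the javax.crypto.spec.SecretKeySpec constructor throws IllegalArgumentException when it is handed a null argument. Presumably the key bytes were null here, because no client-to-AM master key would have been stored for an attempt submitted before security was enabled, but that is an inference from the trace and not something it states. The sketch below shows the unguarded call failing and a guarded alternative. MasterKeyGuardSketch and keyFromStoredBytes are hypothetical names, and the "HmacSHA1" algorithm is an assumption made for illustration.

```java
import java.util.Optional;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: this class is not part of Hadoop.
class MasterKeyGuardSketch {

    // Assumed algorithm name, used for illustration only.
    static final String ALGORITHM = "HmacSHA1";

    /**
     * Builds a key from stored bytes, returning empty when no usable key
     * was stored instead of letting SecretKeySpec throw.
     */
    static Optional<SecretKeySpec> keyFromStoredBytes(byte[] storedKey) {
        if (storedKey == null || storedKey.length == 0) {
            return Optional.empty();
        }
        return Optional.of(new SecretKeySpec(storedKey, ALGORITHM));
    }

    /** Returns true if constructing the key directly from null bytes throws. */
    static boolean unguardedCallThrows() {
        try {
            new SecretKeySpec(null, ALGORITHM);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded call throws: " + unguardedCallThrows());
        System.out.println("guarded, null bytes present: "
                + keyFromStoredBytes(null).isPresent());
        System.out.println("guarded, real bytes present: "
                + keyFromStoredBytes(new byte[] {1, 2, 3}).isPresent());
    }
}
```

A guard like this only addresses this one trigger. Catching recovery failures per application is the more general fix the report asks for.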