[jira] [Updated] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-10-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2010:
---
Attachment: yarn-2010-4.patch

 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
 yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
 ... 8 more 
 Caused by: java.lang.IllegalArgumentException: Missing argument 
 at javax.crypto.spec.SecretKeySpec.init(SecretKeySpec.java:93) 
 at 
 org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
  
 ... 13 more 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-10-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2010:
---
Attachment: yarn-2010-3.patch

Re-uploading the last patch, that has a single {{catch(Exception)}}.

[~vinodkv] - would you still prefer having multiple catch-blocks, one for each 
exception. IMO, catching {{ConnectException}} doesn't seem very readable; we 
could add a comment on why we are adding that catch, but we might not be able 
to enumerate all possible cases. That said, I am okay with catching 
ConnectException and Exception separately. Please advise. 

 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
 yarn-2010-3.patch, yarn-2010-3.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
 ... 8 more 
 Caused by: java.lang.IllegalArgumentException: Missing argument 
 at javax.crypto.spec.SecretKeySpec.init(SecretKeySpec.java:93) 
 at 
 org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
  
 ... 13 more 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2010:
---

Target Version/s: 2.6.0  (was: 2.5.0)

 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
 yarn-2010-3.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
 ... 8 more 
 Caused by: java.lang.IllegalArgumentException: Missing argument 
 at javax.crypto.spec.SecretKeySpec.init(SecretKeySpec.java:93) 
 at 
 org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
  
 ... 13 more 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-28 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2010:
---

Attachment: yarn-2010-3.patch

New patch that gets rid of the config and addresses the issue where the 
masterKey is null. 

 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
 yarn-2010-3.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
 ... 8 more 
 Caused by: java.lang.IllegalArgumentException: Missing argument 
 at javax.crypto.spec.SecretKeySpec.init(SecretKeySpec.java:93) 
 at 
 org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
  
 ... 13 more 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-12 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2010:
---

Attachment: yarn-2010-2.patch

New patch with following changes - 
# Noticed that RMAppManager#recoverApplication wasn't failing running 
applications in all the code-paths corresponding to failed recovery. Fixed that 
and cleaned it up futher.
# Changed the config name to be shorter. 
# Added comments to make sure we document why we are doing what we are doing.

 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
 ... 8 more 
 Caused by: java.lang.IllegalArgumentException: Missing argument 
 at javax.crypto.spec.SecretKeySpec.init(SecretKeySpec.java:93) 
 at 
 org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
  
 ... 13 more 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-05 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2010:
---

Priority: Critical  (was: Major)
Target Version/s: 2.5.0

 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2010.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
 ... 8 more 
 Caused by: java.lang.IllegalArgumentException: Missing argument 
 at javax.crypto.spec.SecretKeySpec.init(SecretKeySpec.java:93) 
 at 
 org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
  
 ... 13 more 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-02 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2010:
-

Attachment: YARN-2010.patch

Uploading patch without test written.

Thinking of how to write test, should complete flow need to consider or only 
RMAppAttempt.recoveryApplication() can be called.?!!

 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Rohith
 Attachments: YARN-2010.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
 ... 8 more 
 Caused by: java.lang.IllegalArgumentException: Missing argument 
 at javax.crypto.spec.SecretKeySpec.init(SecretKeySpec.java:93) 
 at 
 org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
  
 ... 13 more 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)