[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013201#comment-14013201 ]
Vinod Kumar Vavilapalli commented on YARN-2010:
-----------------------------------------------
bq. For applications that completed before the cluster was started in secured
mode, clientTokenMasterKey is null. After starting in secured mode, recovery of
these apps fails because clientTokenMasterKey is null. While recovering an
application, the RM should be able to decide whether that application ran in
secured or non-secured mode. This is possible by checking
clientTokenMasterKey for null.
bq. Please, can this be considered a "Blocker", as there seems to be no way to
recover from this and still transition to secured mode?
Apologies for coming in so late. When does this exception/crash manifest?
It seems this happens when you try to upgrade from a non-secure cluster to a
secure cluster. Is that so? That is a completely unsupportable use-case. There
are so many other things that will break when you do such an upgrade with
existing applications: think of the tokens needed, localized files, etc.
Just trying to make sure we are not fixing 'issues' to support unsupportable
use-cases.
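For reference, the null check proposed in the first quoted comment could be sketched roughly as follows. This is a hypothetical, self-contained Java illustration, not the actual RMAppAttemptImpl or ClientToAMTokenSecretManagerInRM code: the class HypotheticalKeyRecovery, the method recoverClientTokenKey, and the HmacSHA1 algorithm name are assumptions. It shows that javax.crypto.spec.SecretKeySpec rejects a null key with the same "Missing argument" message seen at the bottom of the trace, and that skipping key registration when no key was stored avoids that exception.

```java
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch only; names are illustrative, not the RM's real API.
public class HypotheticalKeyRecovery {

    // HMAC algorithm name assumed for illustration.
    static final String ALGORITHM = "HmacSHA1";

    /**
     * Returns the recovered client-token master key, or null if the attempt
     * ran without security (no master key was ever stored for it).
     */
    public static SecretKey recoverClientTokenKey(byte[] storedKey) {
        if (storedKey == null) {
            // Attempt was created in non-secure mode: nothing to register.
            return null;
        }
        return new SecretKeySpec(storedKey, ALGORITHM);
    }

    public static void main(String[] args) {
        // Unguarded path: a null key is rejected, as in the reported trace.
        try {
            new SecretKeySpec(null, ALGORITHM);
        } catch (IllegalArgumentException e) {
            System.out.println("unguarded: " + e.getMessage());
        }
        // Guarded path: recovering the same attempt no longer throws.
        System.out.println("guarded: " + recoverClientTokenKey(null));
        System.out.println("secure: "
                + recoverClientTokenKey(new byte[] {1, 2, 3}).getAlgorithm());
    }
}
```

Such a guard only avoids the crash; it does not make an application submitted on a non-secure cluster usable on a secure one, which is the concern raised above.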
> RM can't transition to active if it can't recover an app attempt
> ----------------------------------------------------------------
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.3.0
> Reporter: bc Wong
> Assignee: Rohith
> Priority: Critical
> Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch,
> yarn-2010-3.patch
>
>
> If the RM fails to recover an app attempt, it won't come up. We should make
> it more resilient.
> Specifically, the underlying error is that the app was submitted before
> Kerberos security was turned on. It makes sense for the app to fail in this
> case, but YARN itself should still start.
> {noformat}
> 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> 	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
> 	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
> 	... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException: Missing argument
> 	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
> 	... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException: Missing argument
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	... 8 more
> Caused by: java.lang.IllegalArgumentException: Missing argument
> 	at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93)
> 	at org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
> 	at org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
> 	... 13 more
> {noformat}
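The resilience asked for in the description (the app fails, but YARN still starts) amounts to isolating recovery failures per application instead of letting one bad record abort the transition to active. Below is a minimal sketch of that idea, assuming hypothetical names (ResilientRecoverySketch, AppRecoverer, recoverAll) and a plain map of application IDs to stored master keys standing in for the RM state store; it is not the RMAppManager API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "fail the app, not the RM"; not the real RM code.
public class ResilientRecoverySketch {

    // Hypothetical stand-in for the per-application recovery step.
    public interface AppRecoverer {
        void recover(String appId) throws Exception;
    }

    /** Recovers every stored app and returns the IDs that failed, instead of throwing. */
    public static List<String> recoverAll(Map<String, byte[]> storedApps, AppRecoverer recoverer) {
        List<String> failed = new ArrayList<>();
        for (String appId : storedApps.keySet()) {
            try {
                recoverer.recover(appId);
            } catch (Exception e) {
                // Record the failure and continue, so the remaining apps are still recovered.
                System.err.println("Failed to recover " + appId + ": " + e.getMessage());
                failed.add(appId);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, byte[]> stored = new LinkedHashMap<>();
        stored.put("app_1", new byte[] {1, 2, 3});
        stored.put("app_2", null); // e.g. submitted before Kerberos was enabled
        List<String> failed = recoverAll(stored, appId -> {
            if (stored.get(appId) == null) {
                throw new IllegalArgumentException("Missing argument");
            }
        });
        System.out.println("failed: " + failed);
    }
}
```

Catching per application keeps a single unrecoverable record from propagating up through transitionToActive; a real fix would presumably also move the failed application to a terminal state rather than only logging it.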
--
This message was sent by Atlassian JIRA
(v6.2#6252)