[ https://issues.apache.org/jira/browse/YARN-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306015#comment-16306015 ]
Sunil G commented on YARN-7692:
-------------------------------

Thanks [~charanh]. I'll share a patch that avoids checking priority ACLs during recovery.

> Resource Manager goes down when a user not included in a priority acl submits a job
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-7692
>                 URL: https://issues.apache.org/jira/browse/YARN-7692
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.0, 2.8.3, 3.0.0
>            Reporter: Charan Hebri
>            Assignee: Sunil G
>
> Test scenario
> ------------------
> 1. A cluster is created with no ACLs configured.
> 2. Submit jobs as an existing user, say 'user_a'.
> 3. Enable ACLs and create a priority ACL entry via the property yarn.scheduler.capacity.priority-acls. Do not include the user 'user_a' in this ACL.
> 4. Submit a job as 'user_a'.
> The observed behavior in this case is that the job is rejected because 'user_a' does not have permission to run it, which is the expected behavior. But the Resource Manager also goes down when it tries to recover previous applications and fails to recover them.
> Below is the exception seen,
> {noformat}
> 2017-12-27 10:52:30,064 INFO  conf.Configuration (Configuration.java:getConfResourceAsInputStream(2659)) - found resource yarn-site.xml at file:/etc/hadoop/3.0.0.0-636/0/yarn-site.xml
> 2017-12-27 10:52:30,065 INFO  scheduler.AbstractYarnScheduler (AbstractYarnScheduler.java:setClusterMaxPriority(911)) - Updated the cluste max priority to maxClusterLevelAppPriority = 10
> 2017-12-27 10:52:30,066 INFO  resourcemanager.ResourceManager (ResourceManager.java:transitionToActive(1177)) - Transitioning to active state
> 2017-12-27 10:52:30,097 INFO  resourcemanager.ResourceManager (ResourceManager.java:serviceStart(765)) - Recovery started
> 2017-12-27 10:52:30,102 INFO  recovery.RMStateStore (RMStateStore.java:checkVersion(747)) - Loaded RM state version info 1.5
> 2017-12-27 10:52:30,375 INFO  security.RMDelegationTokenSecretManager (RMDelegationTokenSecretManager.java:recover(196)) - recovering RMDelegationTokenSecretManager.
> 2017-12-27 10:52:30,380 INFO  resourcemanager.RMAppManager (RMAppManager.java:recover(561)) - Recovering 51 applications
> 2017-12-27 10:52:30,432 INFO  resourcemanager.RMAppManager (RMAppManager.java:recover(571)) - Successfully recovered 0 out of 51 applications
> 2017-12-27 10:52:30,432 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(776)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) does not have permission to submit/update application_1514268754125_0001 for 0
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2348)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:396)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:358)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:567)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1390)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1143)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1183)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1179)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1179)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) does not have permission to submit/update application_1514268754125_0001 for 0
>     ... 20 more
> 2017-12-27 10:52:30,434 INFO  service.AbstractService (AbstractService.java:noteFailure(273)) - Service RMActiveServices failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) does not have permission to submit/update application_1514268754125_0001 for 0
> org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) does not have permission to submit/update application_1514268754125_0001 for 0
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2348)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:396)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:358)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:567)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1390)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1143)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1183)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1179)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1179)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) does not have permission to submit/update application_1514268754125_0001 for 0
>     ... 20 more
> 2017-12-27 10:52:30,435 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping ResourceManager metrics system...
> 2017-12-27 10:52:30,435 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
> 2017-12-27 10:52:30,436 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(607)) - ResourceManager metrics system shutdown complete.
> 2017-12-27 10:52:30,436 INFO  event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(155)) - AsyncDispatcher is draining to stop, ignoring any new events.
> 2017-12-27 10:52:30,437 INFO  event.AsyncDispatcher (AsyncDispatcher.java:register(223)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
> 2017-12-27 10:52:30,438 INFO  security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:<init>(75)) - NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay: 900000ms
> 2017-12-27 10:52:30,438 INFO  security.RMContainerTokenSecretManager (RMContainerTokenSecretManager.java:<init>(79)) - ContainerTokenKeyRollingInterval: 86400000ms and ContainerTokenKeyActivationDelay: 900000ms
> 2017-12-27 10:52:30,438 INFO  security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:<init>(94)) - AMRMTokenKeyRollingInterval: 86400000ms and AMRMTokenKeyActivationDelay: 900000 ms
> 2017-12-27 10:52:30,439 INFO  recovery.RMStateStoreFactory (RMStateStoreFactory.java:getStore(33)) - Using RMStateStore implementation - class org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
> 2017-12-27 10:52:30,439 INFO  event.AsyncDispatcher (AsyncDispatcher.java:register(223)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
> 2017-12-27 10:52:30,439 WARN  curator.CuratorZookeeperClient (CuratorZookeeperClient.java:<init>(96)) - session timeout [10000] is less than connection timeout [15000]
> 2017-12-27 10:52:30,440 INFO  imps.CuratorFrameworkImpl (CuratorFrameworkImpl.java:start(235)) - Starting
> {noformat}
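
The failure mode in this report, and the direction of the fix mentioned in the comment (skip the priority ACL check while recovering), can be illustrated with a small standalone model. This is only a sketch and not the actual YARN patch: the class PriorityAclRecoverySketch, the StoredApp and AclDeniedException types, and the isRecovery flag are all hypothetical names invented for illustration, and none of them are Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: every name here is hypothetical and not part of Hadoop.
public class PriorityAclRecoverySketch {

    // Stand-in for the AccessControlException seen in the stack trace.
    static class AclDeniedException extends Exception {
        AclDeniedException(String message) {
            super(message);
        }
    }

    // Stand-in for an application saved in the state store.
    static final class StoredApp {
        final String appId;
        final String user;
        final int priority;

        StoredApp(String appId, String user, int priority) {
            this.appId = appId;
            this.user = user;
            this.priority = priority;
        }
    }

    private final Set<String> usersInPriorityAcl;

    PriorityAclRecoverySketch(Set<String> usersInPriorityAcl) {
        this.usersInPriorityAcl = usersInPriorityAcl;
    }

    // The ACL is enforced for new submissions only. During recovery the
    // application was already accepted under the earlier configuration,
    // so the check is skipped (assumption about the intended fix).
    int checkAndGetPriority(StoredApp app, boolean isRecovery) throws AclDeniedException {
        if (!isRecovery && !usersInPriorityAcl.contains(app.user)) {
            throw new AclDeniedException("User " + app.user
                + " does not have permission to submit/update "
                + app.appId + " for " + app.priority);
        }
        return app.priority;
    }

    // New submission path: the ACL applies.
    void submit(StoredApp app) throws AclDeniedException {
        checkAndGetPriority(app, false);
    }

    // Recovery path: one app outside the ACL must not abort the whole recovery.
    List<String> recover(List<StoredApp> stored) throws AclDeniedException {
        List<String> recovered = new ArrayList<>();
        for (StoredApp app : stored) {
            checkAndGetPriority(app, true);
            recovered.add(app.appId);
        }
        return recovered;
    }

    public static void main(String[] args) throws Exception {
        // ACL enabled after 'user_a' already had applications in the store.
        PriorityAclRecoverySketch rm =
            new PriorityAclRecoverySketch(new HashSet<>(Arrays.asList("user_b")));
        StoredApp old = new StoredApp("application_1514268754125_0001", "user_a", 0);

        System.out.println("Recovered: " + rm.recover(Arrays.asList(old)));
        try {
            rm.submit(old);
        } catch (AclDeniedException e) {
            System.out.println("Rejected new submission: " + e.getMessage());
        }
    }
}
```

In this model, passing isRecovery=false inside recover() reproduces the reported behavior: the first stored application owned by a user outside the ACL throws, and recovery of every application fails.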