[ 
https://issues.apache.org/jira/browse/YARN-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-7692:
-----------------------------

    Assignee: Sunil G

> Resource Manager goes down when a user not included in a priority acl submits 
> a job
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-7692
>                 URL: https://issues.apache.org/jira/browse/YARN-7692
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.0, 2.8.3, 3.0.0
>            Reporter: Charan Hebri
>            Assignee: Sunil G
>
> Test scenario
> ------------------
> 1. Create a cluster with no ACLs configured.
> 2. Submit jobs as an existing user, say 'user_a'.
> 3. Enable ACLs and create a priority ACL entry via the property 
> yarn.scheduler.capacity.priority-acls. Do not include the user 'user_a' in 
> this ACL.
> 4. Submit a job as 'user_a'.
> The job is rejected because 'user_a' does not have permission to run it, 
> which is the expected behavior. However, the Resource Manager also goes down 
> when it tries to recover the previously submitted applications and fails to 
> recover them.
> Below is the exception seen:
> {noformat}
> 2017-12-27 10:52:30,064 INFO  conf.Configuration 
> (Configuration.java:getConfResourceAsInputStream(2659)) - found resource 
> yarn-site.xml at file:/etc/hadoop/3.0.0.0-636/0/yarn-site.xml
> 2017-12-27 10:52:30,065 INFO  scheduler.AbstractYarnScheduler 
> (AbstractYarnScheduler.java:setClusterMaxPriority(911)) - Updated the cluste 
> max priority to maxClusterLevelAppPriority = 10
> 2017-12-27 10:52:30,066 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:transitionToActive(1177)) - Transitioning to active 
> state
> 2017-12-27 10:52:30,097 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(765)) - Recovery started
> 2017-12-27 10:52:30,102 INFO  recovery.RMStateStore 
> (RMStateStore.java:checkVersion(747)) - Loaded RM state version info 1.5
> 2017-12-27 10:52:30,375 INFO  security.RMDelegationTokenSecretManager 
> (RMDelegationTokenSecretManager.java:recover(196)) - recovering 
> RMDelegationTokenSecretManager.
> 2017-12-27 10:52:30,380 INFO  resourcemanager.RMAppManager 
> (RMAppManager.java:recover(561)) - Recovering 51 applications
> 2017-12-27 10:52:30,432 INFO  resourcemanager.RMAppManager 
> (RMAppManager.java:recover(571)) - Successfully recovered 0 out of 51 
> applications
> 2017-12-27 10:52:30,432 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(776)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) 
> does not have permission to submit/update application_1514268754125_0001 for 0
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2348)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:396)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:358)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:567)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1390)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1143)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1183)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1179)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1179)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>         at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>         at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.security.AccessControlException: User hrt_qa 
> (auth:SIMPLE) does not have permission to submit/update 
> application_1514268754125_0001 for 0
>         ... 20 more
> 2017-12-27 10:52:30,434 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(273)) - Service RMActiveServices failed in 
> state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) 
> does not have permission to submit/update application_1514268754125_0001 for 0
> org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) 
> does not have permission to submit/update application_1514268754125_0001 for 0
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2348)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:396)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:358)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:567)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1390)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1143)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1183)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1179)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1179)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>         at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>         at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.security.AccessControlException: User hrt_qa 
> (auth:SIMPLE) does not have permission to submit/update 
> application_1514268754125_0001 for 0
>         ... 20 more
> 2017-12-27 10:52:30,435 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(210)) - Stopping ResourceManager metrics 
> system...
> 2017-12-27 10:52:30,435 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
> 2017-12-27 10:52:30,436 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:shutdown(607)) - ResourceManager metrics system 
> shutdown complete.
> 2017-12-27 10:52:30,436 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:serviceStop(155)) - AsyncDispatcher is draining to 
> stop, ignoring any new events.
> 2017-12-27 10:52:30,437 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:register(223)) - Registering class 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
> 2017-12-27 10:52:30,438 INFO  security.NMTokenSecretManagerInRM 
> (NMTokenSecretManagerInRM.java:<init>(75)) - NMTokenKeyRollingInterval: 
> 86400000ms and NMTokenKeyActivationDelay: 900000ms
> 2017-12-27 10:52:30,438 INFO  security.RMContainerTokenSecretManager 
> (RMContainerTokenSecretManager.java:<init>(79)) - 
> ContainerTokenKeyRollingInterval: 86400000ms and 
> ContainerTokenKeyActivationDelay: 900000ms
> 2017-12-27 10:52:30,438 INFO  security.AMRMTokenSecretManager 
> (AMRMTokenSecretManager.java:<init>(94)) - AMRMTokenKeyRollingInterval: 
> 86400000ms and AMRMTokenKeyActivationDelay: 900000 ms
> 2017-12-27 10:52:30,439 INFO  recovery.RMStateStoreFactory 
> (RMStateStoreFactory.java:getStore(33)) - Using RMStateStore implementation - 
> class org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
> 2017-12-27 10:52:30,439 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:register(223)) - Registering class 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType 
> for class 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
> 2017-12-27 10:52:30,439 WARN  curator.CuratorZookeeperClient 
> (CuratorZookeeperClient.java:<init>(96)) - session timeout [10000] is less 
> than connection timeout [15000]
> 2017-12-27 10:52:30,440 INFO  imps.CuratorFrameworkImpl 
> (CuratorFrameworkImpl.java:start(235)) - Starting
> {noformat}
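
The failure mode in the trace above can be illustrated with a minimal, self-contained sketch. It is not the Hadoop code and not necessarily the fix adopted for this issue: RecoverySketch, AclDeniedException, checkPriorityAcl, recoverStrict and recoverTolerant are all hypothetical names. It only assumes what the log shows: the priority ACL check runs again during recovery, and one rejected application aborts recovery of every application.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class RecoverySketch {
    /** Stand-in for an ACL rejection; this is not Hadoop's AccessControlException. */
    static class AclDeniedException extends Exception {
        AclDeniedException(String message) {
            super(message);
        }
    }

    /** Hypothetical priority ACL check: only users listed in the ACL may submit. */
    static void checkPriorityAcl(String user, Set<String> allowedUsers)
            throws AclDeniedException {
        if (!allowedUsers.contains(user)) {
            throw new AclDeniedException(
                "User " + user + " does not have permission to submit/update");
        }
    }

    /** Strict recovery: the first ACL failure propagates and aborts the whole loop. */
    static int recoverStrict(List<String> appOwners, Set<String> allowedUsers)
            throws AclDeniedException {
        int recovered = 0;
        for (String owner : appOwners) {
            checkPriorityAcl(owner, allowedUsers);
            recovered++;
        }
        return recovered;
    }

    /** Tolerant recovery (one possible alternative): note the rejection and continue. */
    static List<String> recoverTolerant(List<String> appOwners, Set<String> allowedUsers) {
        List<String> rejectedOwners = new ArrayList<>();
        for (String owner : appOwners) {
            try {
                checkPriorityAcl(owner, allowedUsers);
            } catch (AclDeniedException e) {
                rejectedOwners.add(owner);
            }
        }
        return rejectedOwners;
    }

    public static void main(String[] args) {
        // user_a submitted applications before the ACL existed; only user_b is listed now.
        List<String> appOwners = Arrays.asList("user_a", "user_b");
        Set<String> allowedUsers = Collections.singleton("user_b");
        try {
            recoverStrict(appOwners, allowedUsers);
        } catch (AclDeniedException e) {
            System.out.println("Strict recovery aborted: " + e.getMessage());
        }
        System.out.println("Tolerant recovery rejected: "
            + recoverTolerant(appOwners, allowedUsers));
    }
}
```

In the strict variant, a stored application owned by 'user_a' is enough to stop recovery, which matches the "Successfully recovered 0 out of 51 applications" line. The tolerant variant shows only one way such a loop could keep going.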


