[ https://issues.apache.org/jira/browse/YARN-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Varun Saxena reassigned YARN-3804:
----------------------------------

    Assignee: Varun Saxena

> Both RM are on standBy state when kerberos user not in yarn.admin.acl
> ---------------------------------------------------------------------
>
>          Key: YARN-3804
>          URL: https://issues.apache.org/jira/browse/YARN-3804
>      Project: Hadoop YARN
>   Issue Type: Bug
>   Components: resourcemanager
>  Environment: Suse 11 Sp3, 2 RM, Secure
>     Reporter: Bibin A Chundatt
>     Assignee: Varun Saxena
>     Priority: Critical
>
> Steps to reproduce
> ================
> 1. Configure the cluster in secure mode
> 2. On the RM, configure yarn.admin.acl=dsperf
> 3. Configure yarn.resourcemanager.principal=yarn
> 4. Start both RMs
> Both RMs will stay in Standby forever
> {code}
> 2015-06-15 12:20:21,556 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=refreshAdminAcls TARGET=AdminService RESULT=FAILURE DESCRIPTION=Unauthorized userPERMISSIONS=
> 2015-06-15 12:20:21,556 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:645)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:518)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Can not execute refreshAdminAcls
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>         ... 4 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User yarn doesn't have permission to call 'refreshAdminAcls'
>         at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAcls(AdminService.java:230)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAdminAcls(AdminService.java:465)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:295)
>         ... 5 more
> Caused by: org.apache.hadoop.security.AccessControlException: User yarn doesn't have permission to call 'refreshAdminAcls'
>         at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:182)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:148)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAccess(AdminService.java:223)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAcls(AdminService.java:228)
>         ... 7 more
> {code}
> *Analysis*
> On each attempt to switch to Active, the RM calls refreshAdminAcls, and the RM user (yarn) does not have ACL permission for that call.
> The same switch to Active is then retried indefinitely, and {{ActiveStandbyElector#becomeActive()}} always returns false.
>
> *Expected*
> The RM should get a shutdown event after a few retries, or even at the first attempt, since the user under which it retries refreshAdminAcls can never be updated at runtime.
>
> *States from commands*
> ./yarn rmadmin -getServiceState rm2
> *standby*
> ./yarn rmadmin -getServiceState rm1
> *standby*
> ./yarn rmadmin -checkHealth rm1
> *echo $? = 0*
> ./yarn rmadmin -checkHealth rm2
> *echo $? = 0*

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
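
For illustration only, the behavior requested under *Expected* could be sketched roughly as below. This is not the YARN code and not a proposed patch: the class BoundedActivation, the Transition interface, and the PermanentFailure and TransientFailure exceptions are hypothetical names made up for this sketch. It only assumes that an ACL denial can be told apart from a transient error, so that the caller gives up instead of retrying forever.

```java
public class BoundedActivation {
    /** Result of trying to become active (hypothetical, for illustration). */
    enum Outcome { ACTIVE, SHUTDOWN }

    /** Hypothetical stand-in for a failure that retrying cannot fix, such as an ACL denial. */
    static class PermanentFailure extends Exception {
        PermanentFailure(String message) { super(message); }
    }

    /** Hypothetical stand-in for a failure that may succeed on a later attempt. */
    static class TransientFailure extends Exception {
        TransientFailure(String message) { super(message); }
    }

    /** Hypothetical abstraction over "transition to Active". */
    interface Transition {
        void run() throws PermanentFailure, TransientFailure;
    }

    /**
     * Tries the transition at most maxAttempts times.
     * A permanent failure ends the loop at once; transient failures are retried
     * until the attempt budget is used up.
     */
    static Outcome tryBecomeActive(Transition transition, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                transition.run();
                return Outcome.ACTIVE;
            } catch (PermanentFailure e) {
                // Retrying cannot help (the user stays the same at runtime), so stop now.
                return Outcome.SHUTDOWN;
            } catch (TransientFailure e) {
                // Fall through and try again while attempts remain.
            }
        }
        return Outcome.SHUTDOWN;
    }

    public static void main(String[] args) {
        Outcome outcome = tryBecomeActive(() -> {
            throw new PermanentFailure(
                "User yarn doesn't have permission to call 'refreshAdminAcls'");
        }, 3);
        System.out.println(outcome); // prints SHUTDOWN
    }
}
```

With a bounded loop like this, both RMs would stop rather than stay in Standby indefinitely, which makes the misconfiguration visible to the operator. Whether shutting down is the right reaction for the real RM is for the fix in this issue to decide.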