[ https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Naganarasimha G R updated YARN-3152:
------------------------------------
    Description:
I have two NNs in HA; they do not fail when the exclude file is not present (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in HA. I didn't create the exclude file at this point either. I applied the HA RM settings properly, and when I started both RMs I started getting this exception:

2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=All users are allowed
2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
	... 4 more
Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
	... 5 more
2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed
2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error)
2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session

The exception message is descriptive enough to resolve the problem, and it has been fixed by creating the exclude file.

I am just suggesting an improvement:
- Should RMs ignore the missing file as the NNs did?
- Should a single RM fail even when the file is not present?

I am suggesting this improvement to keep the behavior consistent when working in HA (both NNs and RMs).

  was:
I have two NNs in HA; they do not fail when the exclude file is not present (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in HA. I didn't create the exclude file at this point either.
I applied the HA RM settings properly, and when I started both RMs I started getting this exception:

2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=All users are allowed
2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
	... 4 more
Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
	... 5 more
2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed
2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error)
2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session

The exception message is descriptive enough to resolve the problem, and it has been fixed by creating the exclude file.

I am just suggesting an improvement:
- Should RMs ignore the missing file as the NNs did?
- Should a single RM fail even when the file is not present?

I am suggesting this improvement to keep the behavior consistent when working in HA (both NNs and RMs).


> Missing hadoop exclude file fails RMs in HA
> -------------------------------------------
>
>                 Key: YARN-3152
>                 URL: https://issues.apache.org/jira/browse/YARN-3152
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>         Environment: Debian 7
>            Reporter: Neill Lima
>            Assignee: Naganarasimha G R
>
> I have two NNs in HA; they do not fail when the exclude file is not present
> (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in
> HA. I didn't create the exclude file at this point either.
> I applied the HA RM settings properly, and when I started both RMs I started
> getting this exception:
>
> 2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=All users are allowed
> 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>     at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
>     at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>     ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
>     ... 5 more
> 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
> 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed
> 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
> 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session
>
> The exception message is descriptive enough to resolve the problem, and it
> has been fixed by creating the exclude file.
>
> I am just suggesting an improvement:
> - Should RMs ignore the missing file as the NNs did?
> - Should a single RM fail even when the file is not present?
>
> I am suggesting this improvement to keep the behavior consistent when working
> in HA (both NNs and RMs).
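
One way to read the first question in the description is that a missing exclude file could be treated as an empty exclude list rather than as an error. The sketch below illustrates that policy in plain Java, assuming a file with one host name per line. `LenientExcludeList` and its `read` method are hypothetical names invented for this example; they are not part of Hadoop and do not represent any fix applied for this issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not a Hadoop class): reads a host exclude list leniently. */
final class LenientExcludeList {
    private LenientExcludeList() {}

    /**
     * Returns the host names listed in the file, one per line, skipping blank lines.
     * Assumed policy: a missing file means "exclude nothing" instead of a failure.
     * Other I/O errors (for example, permission problems) still propagate.
     */
    static Set<String> read(Path excludeFile) throws IOException {
        Set<String> hosts = new HashSet<>();
        try {
            for (String line : Files.readAllLines(excludeFile)) {
                String host = line.trim();
                if (!host.isEmpty()) {
                    hosts.add(host);
                }
            }
        } catch (NoSuchFileException e) {
            // Warn and continue with an empty list rather than aborting the caller.
            System.err.println("Exclude file not found, excluding no hosts: " + excludeFile);
        }
        return hosts;
    }
}
```

Under this policy, a caller that refreshes its node lists would see an empty set for a path such as the one in the stack trace, and only genuine read errors would surface as exceptions. Whether that is the right trade-off for the RM is exactly the open question raised in this issue.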