I have a 3 node running HA cluster with the following configuration. Machine 1 - Name Node 1 - Resource Manager 1 Machine 2 - Name Node 2 - Resource Manager 2 Machine 3 - Data Node 1 - Node Manager 1
Performed Rolling upgrade Hadoop cluster from 2.5.2 to 2.7.2 version and finalized the same. All other services started properly. Resource Manager alone does not turned to active state at all. This could be strange but the issue occurs only if I have submitted any jobs(say MR) in my old version 2.5.2; My old version Hadoop cluster had no issues. Disabled automatic failover and tried manual failover too. Same error. Here is sample resource manager logs: 2016-09-12 01:03:50,888 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session 2016-09-12 01:03:50,907 INFO org.apache.zookeeper.ZooKeeper: Session: 0x1571d6758d500e7 closed 2016-09-12 01:03:51,908 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=rootserver25.root.dinesh.lan:2181,rootserver33.root.dinesh.lan:2181,rootserver36.root.dinesh.lan:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@45d89439 2016-09-12 01:03:51,909 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rootserver25.root.dinesh.lan/fe80:0:0:0:f9a7:4fa1:76de:c8eb%6:2181. Will not attempt to authenticate using SASL (unknown error) 2016-09-12 01:03:51,909 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to rootserver25.root.dinesh.lan/fe80:0:0:0:f9a7:4fa1:76de:c8eb%6:2181, initiating session 2016-09-12 01:03:51,940 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server rootserver25.root.dinesh.lan/fe80:0:0:0:f9a7:4fa1:76de:c8eb%6:2181, sessionid = 0x1571d6758d500e8, negotiated timeout = 10000 2016-09-12 01:03:51,941 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down 2016-09-12 01:03:51,941 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected. 2016-09-12 01:03:52,014 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced... 2016-09-12 01:03:52,015 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0b73796e63636c757374657212067265736d7232 2016-09-12 01:03:52,015 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /yarn-leader-election/synccluster/ActiveBreadCrumb to indicate that the local node is the most recent active... 2016-09-12 01:03:52,032 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:.../Hadoop/etc/hadoop/yarn-site.xml 2016-09-12 01:03:52,033 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=SYSTEM OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS 2016-09-12 01:03:52,033 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to active state 2016-09-12 01:03:52,034 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher 2016-09-12 01:03:52,034 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay: 900000ms 2016-09-12 01:03:52,034 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: ContainerTokenKeyRollingInterval: 86400000ms and ContainerTokenKeyActivationDelay: 900000ms 2016-09-12 01:03:52,035 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: AMRMTokenKeyRollingInterval: 86400000ms and AMRMTokenKeyActivationDelay: 900000 ms 2016-09-12 01:03:52,035 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreFactory: Using RMStateStore implementation - class org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore 2016-09-12 01:03:52,035 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler 2016-09-12 01:03:52,035 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager 2016-09-12 01:03:52,035 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Using Scheduler: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler 2016-09-12 01:03:52,035 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher 2016-09-12 01:03:52,035 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher 2016-09-12 01:03:52,036 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher 2016-09-12 01:03:52,036 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher 2016-09-12 01:03:52,036 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system already initialized! 2016-09-12 01:03:52,036 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.RMAppManager 2016-09-12 01:03:52,036 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType for class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher 2016-09-12 01:03:52,036 WARN org.apache.hadoop.metrics2.util.MBeans: Error creating MBean object name: Hadoop:service=ResourceManager,name=RMNMInfo org.apache.hadoop.metrics2.MetricsException: org.apache.hadoop.metrics2.MetricsException: Hadoop:service=ResourceManager,name=RMNMInfo already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:122) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newMBeanName(DefaultMetricsSystem.java:102) at org.apache.hadoop.metrics2.util.MBeans.getMBeanName(MBeans.java:92) at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:55) at org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo.<init>(RMNMInfo.java:59) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:549) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:984) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1008) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: org.apache.hadoop.metrics2.MetricsException: Hadoop:service=ResourceManager,name=RMNMInfo already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:118) ... 20 more 2016-09-12 01:03:52,037 WARN org.apache.hadoop.metrics2.util.MBeans: Failed to register MBean "null" javax.management.RuntimeOperationsException: Exception occurred trying to register the MBean at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:951) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324) at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522) at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57) at org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo.<init>(RMNMInfo.java:59) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:549) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:984) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1008) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: java.lang.IllegalArgumentException: No object name specified at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:949) ... 21 more 2016-09-12 01:03:52,037 INFO org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo: Registered RMNMInfo MBean 2016-09-12 01:03:52,037 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2016-09-12 01:03:52,038 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.resourcemanager.NodesListManager failed in state INITED; cause: org.apache.hadoop.metrics2.MetricsException: Metrics source ClusterMetrics already exists! org.apache.hadoop.metrics2.MetricsException: Metrics source ClusterMetrics already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229) at org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.registerMetrics(ClusterMetrics.java:74) at org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.getMetrics(ClusterMetrics.java:61) at org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.setDecomissionedNMsMetrics(NodesListManager.java:140) at org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.serviceInit(NodesListManager.java:81) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:551) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:984) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1008) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 2016-09-12 01:03:52,039 INFO org.apache.hadoop.service.AbstractService: Service RMActiveServices failed in state INITED; cause: org.apache.hadoop.metrics2.MetricsException: Metrics source ClusterMetrics already exists! org.apache.hadoop.metrics2.MetricsException: Metrics source ClusterMetrics already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229) at org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.registerMetrics(ClusterMetrics.java:74) at org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.getMetrics(ClusterMetrics.java:61) at org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.setDecomissionedNMsMetrics(NodesListManager.java:140) at org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.serviceInit(NodesListManager.java:81) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:551) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:984) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1008) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 2016-09-12 01:03:52,039 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STOPPED; cause: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:251) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:257) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:585) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:984) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1008) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)