The YARN resource manager keeps crashing in Ambari 2.0 + HDP 2.2.4.2 clusters
for me. The error log indicates that it can't connect to zookeeper, which
makes sense since I didn't provision zookeeper as I don't use it. I found the
relevant settings in the Ambari UI:
yarn.resourcemanager.zk-address = localhost:2181
yarn.resourcemanager.ha.enabled = false
Since HA is disabled, why is it trying to use Zookeeper at all? Attempts to
remove the zk-address setting, which was defaulted, are met with an error "This
field is required". Is there some way to stop the ResourceManager from
attempting to use Zookeeper? Should I open a JIRA ticket about this? Is this
an Ambari issue or a YARN issue?
Greg
Stack trace:
2015-05-13 14:49:32,144 FATAL resourcemanager.ResourceManager
(ResourceManager.java:main(1229)) - Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode =
ConnectionLoss for /rmstore
at
org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at
org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:581)
at
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
at
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /rmstore
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$1.run(ZKRMStateStore.java:300)
at
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$1.run(ZKRMStateStore.java:296)
at
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1076)
at
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1095)
at
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createRootDir(ZKRMStateStore.java:296)
at
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:279)
at
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:478)
at
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 12 more