Hello.
I have set up a Hadoop HA cluster with automatic failover for the NameNodes and the ResourceManagers, following these guides:

http://www.oracle.com/technetwork/articles/servers-storage-admin/hadoop-cluster-solaris-2203962.html#16
http://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

So, when I stop only a Hadoop daemon, failover works: the standby NameNode and the standby ResourceManager become active. But when I halt a whole server (one that is also a member of the ZooKeeper quorum), only the ResourceManager fails over.
I have tried many configurations.

Here is my zoo.cfg:

tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/zookeeper/data
clientPort=2181
cnxTimeout=3

server.1=name-node1:2888:3888
server.2=name-node2:2888:3888
server.3=resource-manager:2888:3888
server.4=resource-manager2:2888:3888
server.5=data-node1:2888:3888
server.6=data-node2:2888:3888

group.1=1:2:5
group.2=3:4:6
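The group.N lines make this a hierarchical quorum. As I read the ZooKeeper administrator's guide, a set of servers then counts as a quorum only if it holds a majority of the votes in a majority of the groups, and with two groups that means both groups. The sketch below is a simplified model of that rule for this zoo.cfg, not ZooKeeper's QuorumHierarchical code. It assumes every server has weight 1 (no weight.N lines are set), and the function and variable names are hypothetical.

```python
from __future__ import annotations

# Simplified model of ZooKeeper's hierarchical quorum rule: a set of live
# servers is a quorum if it holds more than half of the weight in more
# than half of the groups.

# group.1=1:2:5 and group.2=3:4:6 from zoo.cfg above
GROUPS = {1: {1, 2, 5}, 2: {3, 4, 6}}
# Assumption: no weight.N lines, so every server counts with weight 1
WEIGHTS = {sid: 1 for members in GROUPS.values() for sid in members}

# server.N lines from zoo.cfg above (N -> host)
HOSTS = {
    1: "name-node1",
    2: "name-node2",
    3: "resource-manager",
    4: "resource-manager2",
    5: "data-node1",
    6: "data-node2",
}


def has_quorum(live: set[int]) -> bool:
    """Return True if the live server ids satisfy the hierarchical rule."""
    groups_with_majority = 0
    for members in GROUPS.values():
        total = sum(WEIGHTS[sid] for sid in members)
        alive = sum(WEIGHTS[sid] for sid in members if sid in live)
        if 2 * alive > total:  # strict majority of this group's weight
            groups_with_majority += 1
    return 2 * groups_with_majority > len(GROUPS)  # strict majority of groups


if __name__ == "__main__":
    everyone = set(HOSTS)
    for sid, host in HOSTS.items():
        print(f"halt {host}: quorum={has_quorum(everyone - {sid})}")
```

Under these assumptions, halting any single host still leaves a quorum, while losing two hosts from the same group does not.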

core-site.xml

  <property>
    <name>ha.zookeeper.quorum</name>
    <value>name-node1:2181,name-node2:2181,data-node1:2181</value>
  </property>

yarn-site.xml

  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>resource-manager:2181,resource-manager2:2181,data-node2:2181</value>
  </property>
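The two services point at different subsets of the ensemble: the HDFS failover controllers use the ha.zookeeper.quorum list, and the ResourceManagers use yarn.resourcemanager.zk-address. The following sketch, with hypothetical helper names, parses both connect strings exactly as written above and lists which ZooKeeper servers each service can still reach when a given host is halted. It only compares host names; it assumes there is no chroot suffix in the connect string and says nothing about session timeouts.

```python
from __future__ import annotations

# Values copied from core-site.xml and yarn-site.xml above
HDFS_ZK = "name-node1:2181,name-node2:2181,data-node1:2181"
YARN_ZK = "resource-manager:2181,resource-manager2:2181,data-node2:2181"


def parse_connect_string(connect: str) -> list[tuple[str, int]]:
    """Split a 'host:port,host:port' string into (host, port) pairs.

    Assumes no '/chroot' suffix after the last port.
    """
    servers = []
    for entry in connect.split(","):
        host, _, port = entry.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers


def reachable(connect: str, halted_host: str) -> list[str]:
    """Return the hosts from the connect string other than the halted one."""
    return [host for host, _ in parse_connect_string(connect) if host != halted_host]


if __name__ == "__main__":
    for halted in ("name-node1", "resource-manager"):
        print(f"halt {halted}:")
        print("  HDFS ZKFC can use", reachable(HDFS_ZK, halted))
        print("  YARN RM can use  ", reachable(YARN_ZK, halted))
```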

When I halted the whole name-node1 host, I saw the following in the ZooKeeper log:

2015-05-21 13:24:22,177 [myid:5] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken for id 3, my id = 5, error =
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
2015-05-21 13:24:22,178 [myid:5] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
2015-05-21 13:24:22,179 [myid:5] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@697] - Interrupted while waiting for message on queue
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
2015-05-21 13:24:22,179 [myid:5] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
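In these QuorumCnxManager warnings, myid is the server writing the log and "id" is the peer whose connection broke; both are the N of the server.N lines in zoo.cfg. A small sketch (hypothetical names, and it assumes the log format shown above) that pulls both ids out of a log line and maps them back to host names:

```python
from __future__ import annotations

import re

# server.N lines from zoo.cfg above
ZOO_SERVERS = """\
server.1=name-node1:2888:3888
server.2=name-node2:2888:3888
server.3=resource-manager:2888:3888
server.4=resource-manager2:2888:3888
server.5=data-node1:2888:3888
server.6=data-node2:2888:3888
"""

SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):")
BROKEN_LINE = re.compile(r"Connection broken for id (\d+), my id = (\d+)")


def server_hosts(cfg: str) -> dict[int, str]:
    """Map each server id N to the host in its server.N line."""
    hosts = {}
    for line in cfg.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            hosts[int(match.group(1))] = match.group(2)
    return hosts


def describe_broken_connection(log_line: str, hosts: dict[int, str]) -> str | None:
    """Name the logging host and the lost peer, or return None if no match."""
    match = BROKEN_LINE.search(log_line)
    if match is None:
        return None
    peer, me = int(match.group(1)), int(match.group(2))
    return f"{hosts.get(me, '?')} lost its connection to {hosts.get(peer, '?')}"


if __name__ == "__main__":
    line = ("2015-05-21 13:24:22,177 [myid:5] - WARN "
            "[RecvWorker:3:QuorumCnxManager$RecvWorker@780] - "
            "Connection broken for id 3, my id = 5, error =")
    print(describe_broken_connection(line, server_hosts(ZOO_SERVERS)))
```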

When I halted the whole resource-manager host, I saw the following in the ZooKeeper log:

2015-05-21 13:24:22,990 [myid:4] - INFO [ProcessThread(sid:4 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x34d767b51ef0000 type:create cxid:0x9 zxid:0x1c0000004e txntype:-1 reqpath:n/a Error Path:/yarn-leader-election/dph-rm/ActiveStandbyElectorLock Error:KeeperErrorCode = NodeExists for /yarn-leader-election/dph-rm/ActiveStandbyElectorLock
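The NodeExists error on ActiveStandbyElectorLock is what a standby gets when it tries to create the lock znode while that znode still exists. As I understand Hadoop's ActiveStandbyElector, the lock is an ephemeral znode, so it goes away only when the session that created it closes or expires. The toy in-memory model below is not the ZooKeeper or Hadoop API (every class, method and session name is hypothetical); it only shows that sequence: the second create fails until the first owner's session ends.

```python
from __future__ import annotations


class NodeExistsError(Exception):
    """Raised when creating a znode that already exists (toy version)."""


class ToyZooKeeper:
    """In-memory stand-in for ephemeral znodes; not a real ZooKeeper client."""

    def __init__(self) -> None:
        self.ephemeral_owner: dict[str, str] = {}  # path -> owning session id

    def create_ephemeral(self, path: str, session: str) -> None:
        if path in self.ephemeral_owner:
            raise NodeExistsError(path)
        self.ephemeral_owner[path] = session

    def expire_session(self, session: str) -> None:
        # Ephemeral znodes are deleted when the session that owns them ends.
        for path, owner in list(self.ephemeral_owner.items()):
            if owner == session:
                del self.ephemeral_owner[path]


def try_become_active(zk: ToyZooKeeper, session: str, lock: str) -> bool:
    """Return True if this session created the lock, False if it already exists."""
    try:
        zk.create_ephemeral(lock, session)
        return True
    except NodeExistsError:
        return False


if __name__ == "__main__":
    LOCK = "/yarn-leader-election/dph-rm/ActiveStandbyElectorLock"
    zk = ToyZooKeeper()
    print("rm1 active:", try_become_active(zk, "rm1-session", LOCK))
    print("rm2 active:", try_become_active(zk, "rm2-session", LOCK))  # lock held by rm1
    zk.expire_session("rm1-session")  # e.g. rm1's session times out
    print("rm2 active:", try_become_active(zk, "rm2-session", LOCK))
```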

After this, ResMan2 became active.

What am I doing wrong?
Thanks.
