I'm trying to embed zookeeper inside my application and having trouble
testing a simple failure scenario.  My test goes as follows:

 

Start up a 3 node cluster

Create a client against all 3 nodes

Add data A

Stop node 3

Add data B

Start the previously stopped node 3

Create a new client against node 3 only

Make sure data A + B are present.

 

Most of the time this works as expected but every once in a while it hangs
on multiple platforms (win64 and lin64) on the call to 'add data B'.  My
client automatically handles connection loss events and will try to
automatically replay the message until it succeeds however it is never able
to.  From reading the doc in the ops guide it seems to say that 3 nodes will
work "Thus, a deployment that consists of three machines can handle one
failure."  Am I missing something?   Do I need more than 3 nodes?  Logs and
stack traces are the logic for create:

 

11/06/14 10:52:52 INFO ZooKeeper.DEBUGLOG: shutdown 2

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 1,
my id = 2, error = java.nio.channels.AsynchronousCloseException

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 3,
my id = 2, error = java.nio.channels.AsynchronousCloseException

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 2,
my id = 3, error = java.io.IOException: Channel eof

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 2,
my id = 1, error = java.io.IOException: Channel eof

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting
for message on queue

java.lang.InterruptedException

                at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.report
InterruptAfterWait(AbstractQueuedSynchronizer.java:1899)

                at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitN
anos(AbstractQueuedSynchronizer.java:1976)

                at
java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)

                at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnx
Manager.java:622)

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting
for message on queue

java.lang.InterruptedException

                at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.report
InterruptAfterWait(AbstractQueuedSynchronizer.java:1899)

                at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitN
anos(AbstractQueuedSynchronizer.java:1976)

                at
java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)

                at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnx
Manager.java:622)

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread

11/06/14 10:52:52 INFO quorum.Learner: shutdown called

java.lang.Exception: shutdown Follower

                at
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)

                at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)

11/06/14 10:52:52 INFO server.FinalRequestProcessor: shutdown of request
processor complete

11/06/14 10:52:52 WARN quorum.QuorumPeer: QuorumPeer main thread exited

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting
for message on queue

java.lang.InterruptedException

                at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.report
InterruptAfterWait(AbstractQueuedSynchronizer.java:1899)

                at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitN
anos(AbstractQueuedSynchronizer.java:1976)

                at
java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)

                at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnx
Manager.java:622)

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting
for message on queue

java.lang.InterruptedException

                at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.report
InterruptAfterWait(AbstractQueuedSynchronizer.java:1899)

                at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitN
anos(AbstractQueuedSynchronizer.java:1976)

                at
java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)

                at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnx
Manager.java:622)

11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread

11/06/14 10:52:52 ERROR quorum.QuorumCnxManager: Exception while listening

java.nio.channels.AsynchronousCloseException

                at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptible
Channel.java:185)

                at
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152)

                at
org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxMa
nager.java:478)

11/06/14 10:52:52 INFO quorum.QuorumCnxManager: Leaving listener

11/06/14 10:52:52 INFO quorum.FastLeaderElection: WorkerSender is down

11/06/14 10:52:52 INFO quorum.FastLeaderElection: WorkerReceiver is down

11/06/14 10:52:52 INFO zookeeper.LoggingWatcher: Saw event: WatchedEvent
state:Disconnected type:None path:null

11/06/14 10:52:52 ERROR zookeeper.ZooUtils$AieZooKeeper: ATTIVIO-PLATFORM-85
: Failed trying to talk to zookeeper, retrying... 

  org.apache.zookeeper.KeeperException$ConnectionLossException -
KeeperErrorCode = ConnectionLoss for /mugatu

org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /mugatu

                at
org.apache.zookeeper.KeeperException.create(KeeperException.java:90)

                at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)

                at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)

                at
com.attivio.app.config.zookeeper.ZooUtils$AieZooKeeper.create(ZooUtils.java:
277)

                at
com.attivio.app.config.zoo.ZooServerTest.check(ZooServerTest.java:116)

                at
com.attivio.app.config.zoo.ZooServerTest.helloQuroumZoo(ZooServerTest.java:7
8)

                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)

                at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39
)

                at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl
.java:25)

                at java.lang.reflect.Method.invoke(Method.java:597)

                at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.
java:44)

                at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.j
ava:15)

                at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.ja
va:41)

                at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.jav
a:20)

                at
com.attivio.junit.internal.runners.statements.FailAndStackDumpOnTimeout$2.ca
ll(FailAndStackDumpOnTimeout.java:63)

                at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

                at java.util.concurrent.FutureTask.run(FutureTask.java:138)

                at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
va:886)

                at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
08)

                at java.lang.Thread.run(Thread.java:619)

11/06/14 10:52:53 INFO zookeeper.ClientCnxn: Opening socket connection to
server localhost/127.0.0.1:16055

11/06/14 10:52:53 INFO zookeeper.ClientCnxn: Socket connection established
to localhost/127.0.0.1:16055, initiating session

11/06/14 10:52:53 INFO server.NIOServerCnxn: Accepted socket connection from
/127.0.0.1:44515

11/06/14 10:52:53 INFO server.NIOServerCnxn: Client attempting to renew
session 0x2308ea3fab10000 at /127.0.0.1:44515

11/06/14 10:52:53 ERROR server.NIOServerCnxn: Thread
Thread[NIOServerCxn.Factory:localhost/127.0.0.1:16055,5,main] died

java.lang.AssertionError

                at
org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:66)

                at
org.apache.zookeeper.server.NIOServerCnxn.finishSessionInit(NIOServerCnxn.ja
va:1552)

                at
org.apache.zookeeper.server.ZooKeeperServer.revalidateSession(ZooKeeperServe
r.java:529)

                at
org.apache.zookeeper.server.quorum.LeaderZooKeeperServer.revalidateSession(L
eaderZooKeeperServer.java:167)

                at
org.apache.zookeeper.server.ZooKeeperServer.reopenSession(ZooKeeperServer.ja
va:537)

                at
org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.j
ava:775)

                at
org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:485
)

                at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:521)

                at
org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262
)

11/06/14 10:52:56 ERROR quorum.LearnerHandler: Unexpected exception causing
shutdown while sock still open

java.net.SocketTimeoutException: Read timed out

                at java.net.SocketInputStream.socketRead0(Native Method)

                at
java.net.SocketInputStream.read(SocketInputStream.java:129)

                at
java.io.BufferedInputStream.fill(BufferedInputStream.java:218)

                at
java.io.BufferedInputStream.read(BufferedInputStream.java:237)

                at java.io.DataInputStream.readInt(DataInputStream.java:370)

                at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)

                at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.jav
a:84)

                at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)

                at
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:37
5)

11/06/14 10:52:56 WARN quorum.LearnerHandler: ******* GOODBYE
/127.0.0.1:34413 ********

11/06/14 10:53:06 INFO zookeeper.ClientCnxn: Client session timed out, have
not heard from server in 13333ms for sessionid 0x2308ea3fab10000, closing
socket connection and attempting reconnect

Reply via email to