I have experimented with ActiveMQ's replicated LevelDB store for quite some time now and finally came to the conclusion that it is very "flaky" and "brittle" at best.
I believe it is not ready for production yet and everybody who considers using it in a hot production environment should probably think twice (or trice). For example, I have a running active/passive cluster with three instances (one active, two passive) connecting to three Zookeeper nodes. I ran several tests against this cluster and then left it sitting over night. The next morning I came back I saw the following exceptions in the log files *without* any activity! On the active node: 2014-10-22 22:31:05,027 | WARN | Unexpected session error: java.io.IOException: Connection reset by peer | org.apache.activemq.leveldb.replicated.MasterLevelDBStore | hawtdispatch-DEFAULT-2 2014-10-22 22:31:05,028 | INFO | Slave has disconnected: d8175d91-6602-4abf-bdfc-2ab07c40997e | org.apache.activemq.leveldb.replicated.MasterLevelDBStore | hawtdispatch-DEFAULT-2 On node #2 (passive): 2014-10-22 22:31:04,796 | INFO | Client session timed out, have not heard from server in 29584ms for sessionid 0x24937f0a2a70001, closing socket connection and attempting reconnect | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk2:2181) 2014-10-22 22:31:05,098 | INFO | Slave stopped | org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ BrokerService[brokergrp1] Task-3 2014-10-22 22:31:05,494 | INFO | Opening socket connection to server uromahn-zk3/10.XX.YY.ZZ:2181. Will not attempt to authenticate using SASL (unknown error) | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk3:2181) 2014-10-22 22:31:05,524 | INFO | Socket connection established to uromahn-zk3/10.XX.YY.ZZ:2181, initiating session | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk3:2181) 2014-10-22 22:31:05,564 | WARN | Expiration detected: trying to restart... | org.apache.activemq.leveldb.replicated.groups.ZKClient | main-EventThread 2014-10-22 22:31:05,564 | INFO | Initiating client connection, connectString=uromahn-zk1:2181,uromahn-zk2:2181,uromahn-zk3:2181 sessionTimeout=2000 watcher=org.apache.activemq.leveldb.replicated.groups.ZKClient@3c53cec2 | org.apache.zookeeper.ZooKeeper | main-EventThread 2014-10-22 22:31:05,566 | INFO | Unable to reconnect to ZooKeeper service, session 0x24937f0a2a70001 has expired, closing socket connection | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk3:2181) 2014-10-22 22:31:05,572 | INFO | Opening socket connection to server uromahn-zk1/10.XX.AA.BB:2181. Will not attempt to authenticate using SASL (unknown error) | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk1:2181) 2014-10-22 22:31:05,579 | INFO | EventThread shut down | org.apache.zookeeper.ClientCnxn | main-EventThread 2014-10-22 22:31:05,597 | INFO | Socket connection established to uromahn-zk1/10.XX.AA.BB:2181, initiating session | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk1:2181) 2014-10-22 22:31:05,625 | INFO | Session establishment complete on server uromahn-zk1/10.XX.AA.BB:2181, sessionid = 0x14937f0a2840000, negotiated timeout = 4000 | org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk1:2181) 2014-10-22 22:31:05,745 | WARN | listeners are taking too long to process the events | org.apache.activemq.leveldb.replicated.groups.ChangeListenerSupport | ZooKeeper state change dispatcher thread On node #3 (passive): no entry int he log file for that time. And these are the log entries in my corresponding Zookeeper nodes: Node #1: 2014-10-22 22:31:05,604 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.XX.YY.ZZ:38788 2014-10-22 22:31:05,606 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.XX.YY.ZZ:38788 2014-10-22 22:31:05,612 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x14937f0a2840000 with negotiated timeout 4000 for client /10.XX.YY.ZZ:38788 Node #2: 2014-10-22 22:30:40,000 [myid:2] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x24937f0a2a70001, timeout of 4000ms exceeded 2014-10-22 22:30:40,002 [myid:2] - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24937f0a2a70001 2014-10-22 22:30:40,009 [myid:2] - INFO [CommitProcessor:2:NIOServerCnxn@1007] - Closed socket connection for client /10.XX.YY.ZZ:52422 which had sessionid 0x24937f0a2a70001 2014-10-22 22:31:05,639 [myid:2] - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14937f0a2840000 type:setData cxid:0x1 zxid:0x100000014 txntype:-1 reqpath:n/a Error Path:/activemq-grp1/leveldb-stores/00000000001 Error:KeeperErrorCode = NoNode for /activemq-grp1/leveldb-stores/00000000001 Node #3: 2014-10-22 22:30:39,972 [myid:3] - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@118] - Got zxid 0x100000012 expected 0x1 2014-10-22 22:31:05,494 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.XX.YY.ZZ:60802 2014-10-22 22:31:05,497 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x24937f0a2a70001 at /10.XX.YY.ZZ:60802 2014-10-22 22:31:05,497 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x24937f0a2a70001 2014-10-22 22:31:05,516 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@610] - Invalid session 0x24937f0a2a70001 for client /10.XX.YY.ZZ:60802, probably expired 2014-10-22 22:31:05,519 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.XX.YY.ZZ:60802 which had sessionid 0x24937f0a2a70001 One should note and emphasize again: during the time I see those messages in the log there was absolutely no activity on the broker and the Zookeeper nodes (those are exclusively for my brokers)! This simply doesn't give me the confidence of a robust solution - sorry guys! -- View this message in context: http://activemq.2283324.n4.nabble.com/ActiveMQ-Master-Slave-with-ZooKeeper-Session-Expiration-tp4686562p4686658.html Sent from the ActiveMQ - User mailing list archive at Nabble.com.