I have been experimenting with ActiveMQ's replicated LevelDB store for quite
some time now and have finally come to the conclusion that it is "flaky" and
"brittle" at best.

I believe it is not ready for production yet, and anybody who is considering
it for a live production environment should probably think twice (or
thrice).

For example, I have a running active/passive cluster with three instances
(one active, two passive) connecting to three ZooKeeper nodes. I ran several
tests against this cluster and then left it sitting overnight. When I came
back the next morning, I saw the following exceptions in the log files,
*without* any activity on the cluster!

On the active node:
2014-10-22 22:31:05,027 | WARN  | Unexpected session error:
java.io.IOException: Connection reset by peer |
org.apache.activemq.leveldb.replicated.MasterLevelDBStore |
hawtdispatch-DEFAULT-2
2014-10-22 22:31:05,028 | INFO  | Slave has disconnected:
d8175d91-6602-4abf-bdfc-2ab07c40997e |
org.apache.activemq.leveldb.replicated.MasterLevelDBStore |
hawtdispatch-DEFAULT-2

On node #2 (passive):
2014-10-22 22:31:04,796 | INFO  | Client session timed out, have not heard
from server in 29584ms for sessionid 0x24937f0a2a70001, closing socket
connection and attempting reconnect | org.apache.zookeeper.ClientCnxn |
main-SendThread(uromahn-zk2:2181)
2014-10-22 22:31:05,098 | INFO  | Slave stopped |
org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ
BrokerService[brokergrp1] Task-3
2014-10-22 22:31:05,494 | INFO  | Opening socket connection to server
uromahn-zk3/10.XX.YY.ZZ:2181. Will not attempt to authenticate using SASL
(unknown error) | org.apache.zookeeper.ClientCnxn |
main-SendThread(uromahn-zk3:2181)
2014-10-22 22:31:05,524 | INFO  | Socket connection established to
uromahn-zk3/10.XX.YY.ZZ:2181, initiating session |
org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk3:2181)
2014-10-22 22:31:05,564 | WARN  | Expiration detected: trying to restart...
| org.apache.activemq.leveldb.replicated.groups.ZKClient | main-EventThread
2014-10-22 22:31:05,564 | INFO  | Initiating client connection,
connectString=uromahn-zk1:2181,uromahn-zk2:2181,uromahn-zk3:2181
sessionTimeout=2000
watcher=org.apache.activemq.leveldb.replicated.groups.ZKClient@3c53cec2 |
org.apache.zookeeper.ZooKeeper | main-EventThread
2014-10-22 22:31:05,566 | INFO  | Unable to reconnect to ZooKeeper service,
session 0x24937f0a2a70001 has expired, closing socket connection |
org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk3:2181)
2014-10-22 22:31:05,572 | INFO  | Opening socket connection to server
uromahn-zk1/10.XX.AA.BB:2181. Will not attempt to authenticate using SASL
(unknown error) | org.apache.zookeeper.ClientCnxn |
main-SendThread(uromahn-zk1:2181)
2014-10-22 22:31:05,579 | INFO  | EventThread shut down |
org.apache.zookeeper.ClientCnxn | main-EventThread
2014-10-22 22:31:05,597 | INFO  | Socket connection established to
uromahn-zk1/10.XX.AA.BB:2181, initiating session |
org.apache.zookeeper.ClientCnxn | main-SendThread(uromahn-zk1:2181)
2014-10-22 22:31:05,625 | INFO  | Session establishment complete on server
uromahn-zk1/10.XX.AA.BB:2181, sessionid = 0x14937f0a2840000, negotiated
timeout = 4000 | org.apache.zookeeper.ClientCnxn |
main-SendThread(uromahn-zk1:2181)
2014-10-22 22:31:05,745 | WARN  | listeners are taking too long to process
the events |
org.apache.activemq.leveldb.replicated.groups.ChangeListenerSupport |
ZooKeeper state change dispatcher thread

On node #3 (passive):
No entries in the log file for that time.

And these are the log entries on the corresponding ZooKeeper nodes:
Node #1:
2014-10-22 22:31:05,604 [myid:1] - INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] -
Accepted socket connection from /10.XX.YY.ZZ:38788
2014-10-22 22:31:05,606 [myid:1] - INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client
attempting to establish new session at /10.XX.YY.ZZ:38788
2014-10-22 22:31:05,612 [myid:1] - INFO 
[CommitProcessor:1:ZooKeeperServer@617] - Established session
0x14937f0a2840000 with negotiated timeout 4000 for client /10.XX.YY.ZZ:38788

Node #2:
2014-10-22 22:30:40,000 [myid:2] - INFO 
[SessionTracker:ZooKeeperServer@347] - Expiring session 0x24937f0a2a70001,
timeout of 4000ms exceeded
2014-10-22 22:30:40,002 [myid:2] - INFO  [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@494] - Processed session termination for
sessionid: 0x24937f0a2a70001
2014-10-22 22:30:40,009 [myid:2] - INFO 
[CommitProcessor:2:NIOServerCnxn@1007] - Closed socket connection for client
/10.XX.YY.ZZ:52422 which had sessionid 0x24937f0a2a70001
2014-10-22 22:31:05,639 [myid:2] - INFO  [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when
processing sessionid:0x14937f0a2840000 type:setData cxid:0x1
zxid:0x100000014 txntype:-1 reqpath:n/a Error
Path:/activemq-grp1/leveldb-stores/00000000001 Error:KeeperErrorCode =
NoNode for /activemq-grp1/leveldb-stores/00000000001

Node #3:
2014-10-22 22:30:39,972 [myid:3] - WARN 
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@118] - Got zxid
0x100000012 expected 0x1
2014-10-22 22:31:05,494 [myid:3] - INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] -
Accepted socket connection from /10.XX.YY.ZZ:60802
2014-10-22 22:31:05,497 [myid:3] - INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client
attempting to renew session 0x24937f0a2a70001 at /10.XX.YY.ZZ:60802
2014-10-22 22:31:05,497 [myid:3] - INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating
client: 0x24937f0a2a70001
2014-10-22 22:31:05,516 [myid:3] - INFO 
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@610] - Invalid
session 0x24937f0a2a70001 for client /10.XX.YY.ZZ:60802, probably expired
2014-10-22 22:31:05,519 [myid:3] - INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed
socket connection for client /10.XX.YY.ZZ:60802 which had sessionid
0x24937f0a2a70001
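
The numbers in the excerpts above can be checked mechanically: the client on
node #2 reports 29584ms without hearing from the server, while the negotiated
session timeout is 4000ms. A minimal sketch of that comparison follows. The
class and method names are hypothetical, and it assumes the wrapped log line
has been joined back into a single line before it is passed in.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ActiveMQ or ZooKeeper.
public class SessionGapCheck {
    // Matches the ZooKeeper client message quoted in the node #2 log.
    private static final Pattern SILENCE =
        Pattern.compile("have not heard from server in (\\d+)ms");

    /** Returns the reported silence in ms, if the line carries that message. */
    static OptionalLong silenceMillis(String logLine) {
        Matcher m = SILENCE.matcher(logLine);
        return m.find()
            ? OptionalLong.of(Long.parseLong(m.group(1)))
            : OptionalLong.empty();
    }

    /** True when the reported silence is longer than the session timeout. */
    static boolean exceedsTimeout(String logLine, long negotiatedTimeoutMs) {
        OptionalLong gap = silenceMillis(logLine);
        return gap.isPresent() && gap.getAsLong() > negotiatedTimeoutMs;
    }

    public static void main(String[] args) {
        // Log text from node #2, joined into one line (assumption).
        String line = "Client session timed out, have not heard from server in "
            + "29584ms for sessionid 0x24937f0a2a70001";
        System.out.println(silenceMillis(line).getAsLong());
        System.out.println(exceedsTimeout(line, 4000L));
    }
}
```

This only restates what the logs already show: the silence reported by the
client is far longer than the 4000ms timeout the server enforced. It says
nothing about why the client went silent in the first place.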

I want to emphasize again: at the time these messages were logged, there was
absolutely no activity on the brokers or on the ZooKeeper nodes (which are
used exclusively by my brokers)!

This simply doesn't give me confidence that this is a robust solution.
Sorry, guys!



--
View this message in context: 
http://activemq.2283324.n4.nabble.com/ActiveMQ-Master-Slave-with-ZooKeeper-Session-Expiration-tp4686562p4686658.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.
