Hi,

We have an ActiveMQ multibroker network with 4 nodes. There is 1 master
node and 3 slave nodes,
every message is broadcasted between all nodes using persistent topics and
durable subscriptions.
ActiveMQ (version 5.9.1) is embedded into a Spring based Java Web
Application running on Tomcat 7.
It works well mostly, but today we had to face a serious issue for the
second time.

The connection between the master node and a slave node is lost:

WARN   2015.01.10 14:00:29,281
org.apache.activemq.network.DemandForwardingBridgeSupport - Caught an
exception processing local command
java.net.SocketException: Connection reset
        at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at
org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:115)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at
org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:176)
        at
org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:304)
        at
org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:286)
        at
org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85)
        at
org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104)
        at
org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
        at
org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81)
        at
org.apache.activemq.network.DemandForwardingBridgeSupport.serviceLocalCommand(DemandForwardingBridgeSupport.java:968)
        at
org.apache.activemq.network.DemandForwardingBridgeSupport$2.onCommand(DemandForwardingBridgeSupport.java:177)
        at
org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
        at
org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
        at
org.apache.activemq.transport.vm.VMTransport.iterate(VMTransport.java:247)
        at
org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:133)
        at
org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:48)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
INFO   2015.01.10 14:00:29,335
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000
shutdown due to a local error: java.io.EOFException
INFO   2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000
shutdown due to a local error: java.net.SocketException: Connection reset
INFO   2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000
shutdown due to a local error: java.io.EOFException
INFO   2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000
shutdown due to a local error: java.io.EOFException
INFO   2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000
shutdown due to a local error: java.io.EOFException
INFO   2015.01.10 14:00:29,347
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000
shutdown due to a local error: java.io.EOFException
WARN   2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Caught an
exception processing local command
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at java.io.DataInputStream.readFully(DataInputStream.java:169)
        at
org.apache.activemq.openwire.v10.BaseDataStreamMarshaller.tightUnmarshalByteSequence(BaseDataStreamMarshaller.java:440)
        at
org.apache.activemq.openwire.v10.MessageMarshaller.tightUnmarshal(MessageMarshaller.java:72)
        at
org.apache.activemq.openwire.v10.ActiveMQMessageMarshaller.tightUnmarshal(ActiveMQMessageMarshaller.java:66)
        at
org.apache.activemq.openwire.v10.ActiveMQTextMessageMarshaller.tightUnmarshal(ActiveMQTextMessageMarshaller.java:66)
        at
org.apache.activemq.openwire.OpenWireFormat.doUnmarshal(OpenWireFormat.java:371)
        at
org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:285)
        at
org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221)
        at
org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213)
        at
org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
        at java.lang.Thread.run(Thread.java:745)
INFO   2015.01.10 14:00:29,347
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000
shutdown due to a local error: java.io.EOFException
INFO   2015.01.10 14:00:29,383
org.apache.activemq.network.DemandForwardingBridgeSupport -
PERCMASTERBroker bridge to BALASSIBroker stopped


After a few seconds the connection has been reestablished automatically:

INFO   2015.01.10 14:00:45,563
org.apache.activemq.broker.TransportConnection - Started responder end of
duplex bridge clientConnector@ID:balaserver1-45698-1418680167169-1:1
INFO   2015.01.10 14:00:45,570
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#42 and tcp:///10.10.0.58:54625@62000
(BALASSIBroker) has been established.

But after then the cpu usage goes up to 100% (effectively killed the
application) and the log full of duplicate message errors:

WARN   2015.01.10 14:00:46,599
org.apache.activemq.broker.region.cursors.AbstractStoreCursor -
org.apache.activemq.broker.region.cursors.QueueStorePrefetch@4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=0,cacheEnabled=true,maxBatchSize:1,hasSpace:true
- cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:2:1:36636, 4
WARN   2015.01.10 14:00:46,639 org.apache.activemq.broker.region.Queue -
duplicate message from store
ID:balaserver1-45698-1418680167169-6:1:2:1:36636, redirecting for dlq
processing
WARN   2015.01.10 14:00:48,526
org.apache.activemq.broker.region.cursors.AbstractStoreCursor -
org.apache.activemq.broker.region.cursors.QueueStorePrefetch@4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:2,hasSpace:true
- cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:2:1:36637, 4
WARN   2015.01.10 14:00:48,528 org.apache.activemq.broker.region.Queue -
duplicate message from store
ID:balaserver1-45698-1418680167169-6:1:2:1:36637, redirecting for dlq
processing
WARN   2015.01.10 14:00:48,546
org.apache.activemq.broker.region.cursors.AbstractStoreCursor -
org.apache.activemq.broker.region.cursors.QueueStorePrefetch@4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:3,hasSpace:true
- cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:3:1:37026, 4
WARN   2015.01.10 14:00:48,547 org.apache.activemq.broker.region.Queue -
duplicate message from store
ID:balaserver1-45698-1418680167169-6:1:3:1:37026, redirecting for dlq
processing
WARN   2015.01.10 14:00:48,560
org.apache.activemq.broker.region.cursors.AbstractStoreCursor -
org.apache.activemq.broker.region.cursors.QueueStorePrefetch@4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:2,hasSpace:true
- cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:2:1:36638, 4

The only thing we could do is stop the application (on all nodes), clear
the activemq-data folder and restart the application.
When we simply tried to restart it, the startup process has stucked with no
error message, just using 100% of the CPU.

Do you have any idea what could have happened here?

Thanks,

Zoltan Nyari

Reply via email to