Hi, We have an ActiveMQ multibroker network with 4 nodes. There is 1 master node and 3 slave nodes, every message is broadcasted between all nodes using persistent topics and durable subscriptions. ActiveMQ (version 5.9.1) is embedded into a Spring based Java Web Application running on Tomcat 7. It works well mostly, but today we had to face a serious issue for the second time.
The connection between the master node and a slave node is lost: WARN 2015.01.10 14:00:29,281 org.apache.activemq.network.DemandForwardingBridgeSupport - Caught an exception processing local command java.net.SocketException: Connection reset at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118) at java.net.SocketOutputStream.write(SocketOutputStream.java:159) at org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:115) at java.io.DataOutputStream.flush(DataOutputStream.java:123) at org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:176) at org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:304) at org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:286) at org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85) at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104) at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68) at org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81) at org.apache.activemq.network.DemandForwardingBridgeSupport.serviceLocalCommand(DemandForwardingBridgeSupport.java:968) at org.apache.activemq.network.DemandForwardingBridgeSupport$2.onCommand(DemandForwardingBridgeSupport.java:177) at org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116) at org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50) at org.apache.activemq.transport.vm.VMTransport.iterate(VMTransport.java:247) at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:133) at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:48) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) INFO 2015.01.10 14:00:29,335 org.apache.activemq.network.DemandForwardingBridgeSupport - Network connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000 shutdown due to a local error: java.io.EOFException INFO 2015.01.10 14:00:29,346 org.apache.activemq.network.DemandForwardingBridgeSupport - Network connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000 shutdown due to a local error: java.net.SocketException: Connection reset INFO 2015.01.10 14:00:29,346 org.apache.activemq.network.DemandForwardingBridgeSupport - Network connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000 shutdown due to a local error: java.io.EOFException INFO 2015.01.10 14:00:29,346 org.apache.activemq.network.DemandForwardingBridgeSupport - Network connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000 shutdown due to a local error: java.io.EOFException INFO 2015.01.10 14:00:29,346 org.apache.activemq.network.DemandForwardingBridgeSupport - Network connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000 shutdown due to a local error: java.io.EOFException INFO 2015.01.10 14:00:29,347 org.apache.activemq.network.DemandForwardingBridgeSupport - Network connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000 shutdown due to a local error: java.io.EOFException WARN 2015.01.10 14:00:29,346 org.apache.activemq.network.DemandForwardingBridgeSupport - Caught an exception processing local command java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:197) at java.io.DataInputStream.readFully(DataInputStream.java:169) at org.apache.activemq.openwire.v10.BaseDataStreamMarshaller.tightUnmarshalByteSequence(BaseDataStreamMarshaller.java:440) at org.apache.activemq.openwire.v10.MessageMarshaller.tightUnmarshal(MessageMarshaller.java:72) at org.apache.activemq.openwire.v10.ActiveMQMessageMarshaller.tightUnmarshal(ActiveMQMessageMarshaller.java:66) at org.apache.activemq.openwire.v10.ActiveMQTextMessageMarshaller.tightUnmarshal(ActiveMQTextMessageMarshaller.java:66) at org.apache.activemq.openwire.OpenWireFormat.doUnmarshal(OpenWireFormat.java:371) at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:285) at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221) at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213) at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196) at java.lang.Thread.run(Thread.java:745) INFO 2015.01.10 14:00:29,347 org.apache.activemq.network.DemandForwardingBridgeSupport - Network connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819@62000 shutdown due to a local error: java.io.EOFException INFO 2015.01.10 14:00:29,383 org.apache.activemq.network.DemandForwardingBridgeSupport - PERCMASTERBroker bridge to BALASSIBroker stopped After a few seconds the connection has been reestablished automatically: INFO 2015.01.10 14:00:45,563 org.apache.activemq.broker.TransportConnection - Started responder end of duplex bridge clientConnector@ID:balaserver1-45698-1418680167169-1:1 INFO 2015.01.10 14:00:45,570 org.apache.activemq.network.DemandForwardingBridgeSupport - Network connection between vm://PERCMASTERBroker#42 and tcp:///10.10.0.58:54625@62000 (BALASSIBroker) has been established. But after then the cpu usage goes up to 100% (effectively killed the application) and the log full of duplicate message errors: WARN 2015.01.10 14:00:46,599 org.apache.activemq.broker.region.cursors.AbstractStoreCursor - org.apache.activemq.broker.region.cursors.QueueStorePrefetch@4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=0,cacheEnabled=true,maxBatchSize:1,hasSpace:true - cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:2:1:36636, 4 WARN 2015.01.10 14:00:46,639 org.apache.activemq.broker.region.Queue - duplicate message from store ID:balaserver1-45698-1418680167169-6:1:2:1:36636, redirecting for dlq processing WARN 2015.01.10 14:00:48,526 org.apache.activemq.broker.region.cursors.AbstractStoreCursor - org.apache.activemq.broker.region.cursors.QueueStorePrefetch@4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:2,hasSpace:true - cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:2:1:36637, 4 WARN 2015.01.10 14:00:48,528 org.apache.activemq.broker.region.Queue - duplicate message from store ID:balaserver1-45698-1418680167169-6:1:2:1:36637, redirecting for dlq processing WARN 2015.01.10 14:00:48,546 org.apache.activemq.broker.region.cursors.AbstractStoreCursor - org.apache.activemq.broker.region.cursors.QueueStorePrefetch@4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:3,hasSpace:true - cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:3:1:37026, 4 WARN 2015.01.10 14:00:48,547 org.apache.activemq.broker.region.Queue - duplicate message from store ID:balaserver1-45698-1418680167169-6:1:3:1:37026, redirecting for dlq processing WARN 2015.01.10 14:00:48,560 org.apache.activemq.broker.region.cursors.AbstractStoreCursor - org.apache.activemq.broker.region.cursors.QueueStorePrefetch@4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:2,hasSpace:true - cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:2:1:36638, 4 The only thing we could do is stop the application (on all nodes), clear the activemq-data folder and restart the application. When we simply tried to restart it, the startup process has stucked with no error message, just using 100% of the CPU. Do you have any idea what could have happened here? Thanks, Zoltan Nyari