If you haven't configured the broker to write a GC log, you can find out if you're doing full GCs by either attaching JConsole and looking at the Memory tab or by using the jstat -gc command. Both of those are only options if you haven't killed the process, so they probably won't help you now.
For what it's worth, I don't think Kevin's right about this being caused by continual full GCs. It's quite possible that a single full GC was the root cause of the disconnection that starts it all, but if the broker was doing continual full GCs like Kevin guessed, I don't think it would have successfully reconnected nor that you'd have seen all the log lines about duplicate messages. I think it's more likely that there's another bug related to how master/slave pairs handle network disconnection and reconnection. The log lines you posted look like they're from either the master or the slave that gets disconnected; which is it, and is there anything useful in the other's log? On Jan 10, 2015 3:35 PM, "Kevin Burton" <bur...@spinn3r.com> wrote: > $100 says it’s spending all its time doing full GC. Look at your GC log.. > I bet it will keep printing “Full GC” constantly. > > Kevin > > On Sat, Jan 10, 2015 at 7:19 AM, Zoltán Nyári <nyari.zol...@smartfront.hu> > wrote: > > > Hi, > > > > We have an ActiveMQ multibroker network with 4 nodes. There is 1 master > > node and 3 slave nodes, > > every message is broadcasted between all nodes using persistent topics > and > > durable subscriptions. > > ActiveMQ (version 5.9.1) is embedded into a Spring based Java Web > > Application running on Tomcat 7. > > It works well mostly, but today we had to face a serious issue for the > > second time. > > > > The connection between the master node and a slave node is lost: > > > > WARN 2015.01.10 14:00:29,281 > > org.apache.activemq.network.DemandForwardingBridgeSupport - Caught an > > exception processing local command > > java.net.SocketException: Connection reset > > at > > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118) > > at java.net.SocketOutputStream.write(SocketOutputStream.java:159) > > at > > > > > org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:115) > > at java.io.DataOutputStream.flush(DataOutputStream.java:123) > > at > > > > > org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:176) > > at > > > > > org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:304) > > at > > > > > org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:286) > > at > > > > > org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85) > > at > > > > > org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104) > > at > > > org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68) > > at > > > > > org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81) > > at > > > > > org.apache.activemq.network.DemandForwardingBridgeSupport.serviceLocalCommand(DemandForwardingBridgeSupport.java:968) > > at > > > > > org.apache.activemq.network.DemandForwardingBridgeSupport$2.onCommand(DemandForwardingBridgeSupport.java:177) > > at > > > > > org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116) > > at > > > > > org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50) > > at > > > org.apache.activemq.transport.vm.VMTransport.iterate(VMTransport.java:247) > > at > > > > > org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:133) > > at > > > org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:48) > > at > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:745) > > INFO 2015.01.10 14:00:29,335 > > org.apache.activemq.network.DemandForwardingBridgeSupport - Network > > connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 > > @62000 > > shutdown due to a local error: java.io.EOFException > > INFO 2015.01.10 14:00:29,346 > > org.apache.activemq.network.DemandForwardingBridgeSupport - Network > > connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 > > @62000 > > shutdown due to a local error: java.net.SocketException: Connection reset > > INFO 2015.01.10 14:00:29,346 > > org.apache.activemq.network.DemandForwardingBridgeSupport - Network > > connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 > > @62000 > > shutdown due to a local error: java.io.EOFException > > INFO 2015.01.10 14:00:29,346 > > org.apache.activemq.network.DemandForwardingBridgeSupport - Network > > connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 > > @62000 > > shutdown due to a local error: java.io.EOFException > > INFO 2015.01.10 14:00:29,346 > > org.apache.activemq.network.DemandForwardingBridgeSupport - Network > > connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 > > @62000 > > shutdown due to a local error: java.io.EOFException > > INFO 2015.01.10 14:00:29,347 > > org.apache.activemq.network.DemandForwardingBridgeSupport - Network > > connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 > > @62000 > > shutdown due to a local error: java.io.EOFException > > WARN 2015.01.10 14:00:29,346 > > org.apache.activemq.network.DemandForwardingBridgeSupport - Caught an > > exception processing local command > > java.io.EOFException > > at java.io.DataInputStream.readFully(DataInputStream.java:197) > > at java.io.DataInputStream.readFully(DataInputStream.java:169) > > at > > > > > org.apache.activemq.openwire.v10.BaseDataStreamMarshaller.tightUnmarshalByteSequence(BaseDataStreamMarshaller.java:440) > > at > > > > > org.apache.activemq.openwire.v10.MessageMarshaller.tightUnmarshal(MessageMarshaller.java:72) > > at > > > > > org.apache.activemq.openwire.v10.ActiveMQMessageMarshaller.tightUnmarshal(ActiveMQMessageMarshaller.java:66) > > at > > > > > org.apache.activemq.openwire.v10.ActiveMQTextMessageMarshaller.tightUnmarshal(ActiveMQTextMessageMarshaller.java:66) > > at > > > > > org.apache.activemq.openwire.OpenWireFormat.doUnmarshal(OpenWireFormat.java:371) > > at > > > > > org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:285) > > at > > > > > org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221) > > at > > > org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213) > > at > > org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196) > > at java.lang.Thread.run(Thread.java:745) > > INFO 2015.01.10 14:00:29,347 > > org.apache.activemq.network.DemandForwardingBridgeSupport - Network > > connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 > > @62000 > > shutdown due to a local error: java.io.EOFException > > INFO 2015.01.10 14:00:29,383 > > org.apache.activemq.network.DemandForwardingBridgeSupport - > > PERCMASTERBroker bridge to BALASSIBroker stopped > > > > > > After a few seconds the connection has been reestablished automatically: > > > > INFO 2015.01.10 14:00:45,563 > > org.apache.activemq.broker.TransportConnection - Started responder end of > > duplex bridge clientConnector@ID:balaserver1-45698-1418680167169-1:1 > > INFO 2015.01.10 14:00:45,570 > > org.apache.activemq.network.DemandForwardingBridgeSupport - Network > > connection between vm://PERCMASTERBroker#42 and tcp:///10.10.0.58:54625 > > @62000 > > (BALASSIBroker) has been established. > > > > But after then the cpu usage goes up to 100% (effectively killed the > > application) and the log full of duplicate message errors: > > > > WARN 2015.01.10 14:00:46,599 > > org.apache.activemq.broker.region.cursors.AbstractStoreCursor - > > org.apache.activemq.broker.region.cursors.QueueStorePrefetch@4ed7430a > > > :Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=0,cacheEnabled=true,maxBatchSize:1,hasSpace:true > > - cursor got duplicate: > ID:balaserver1-45698-1418680167169-6:1:2:1:36636, 4 > > WARN 2015.01.10 14:00:46,639 org.apache.activemq.broker.region.Queue - > > duplicate message from store > > ID:balaserver1-45698-1418680167169-6:1:2:1:36636, redirecting for dlq > > processing > > WARN 2015.01.10 14:00:48,526 > > org.apache.activemq.broker.region.cursors.AbstractStoreCursor - > > org.apache.activemq.broker.region.cursors.QueueStorePrefetch@4ed7430a > > > :Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:2,hasSpace:true > > - cursor got duplicate: > ID:balaserver1-45698-1418680167169-6:1:2:1:36637, 4 > > WARN 2015.01.10 14:00:48,528 org.apache.activemq.broker.region.Queue - > > duplicate message from store > > ID:balaserver1-45698-1418680167169-6:1:2:1:36637, redirecting for dlq > > processing > > WARN 2015.01.10 14:00:48,546 > > org.apache.activemq.broker.region.cursors.AbstractStoreCursor - > > org.apache.activemq.broker.region.cursors.QueueStorePrefetch@4ed7430a > > > :Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:3,hasSpace:true > > - cursor got duplicate: > ID:balaserver1-45698-1418680167169-6:1:3:1:37026, 4 > > WARN 2015.01.10 14:00:48,547 org.apache.activemq.broker.region.Queue - > > duplicate message from store > > ID:balaserver1-45698-1418680167169-6:1:3:1:37026, redirecting for dlq > > processing > > WARN 2015.01.10 14:00:48,560 > > org.apache.activemq.broker.region.cursors.AbstractStoreCursor - > > org.apache.activemq.broker.region.cursors.QueueStorePrefetch@4ed7430a > > > :Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:2,hasSpace:true > > - cursor got duplicate: > ID:balaserver1-45698-1418680167169-6:1:2:1:36638, 4 > > > > The only thing we could do is stop the application (on all nodes), clear > > the activemq-data folder and restart the application. > > When we simply tried to restart it, the startup process has stucked with > no > > error message, just using 100% of the CPU. > > > > Do you have any idea what could have happened here? > > > > Thanks, > > > > Zoltan Nyari > > > > > > -- > > Founder/CEO Spinn3r.com > Location: *San Francisco, CA* > blog: http://burtonator.wordpress.com > … or check out my Google+ profile > <https://plus.google.com/102718274791889610666/posts> > <http://spinn3r.com> >