You can run top in batch mode (top -b) and redirect its output to a file, which would let you test your theory that the box is under heavy CPU load.
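As a rough sketch of what that could look like: the helper name log_top, the interval, the snapshot count, and the log path are all placeholders made up for illustration, not anything taken from your setup. The -b, -d, and -n options are the standard procps top flags for batch mode, delay, and iteration count.

```shell
#!/bin/sh
# Record `top` snapshots to a file so CPU load can be reviewed after a hang.
# log_top is a hypothetical helper name; adjust the arguments to taste.
log_top() {
    interval="$1"   # seconds between snapshots (passed to top -d)
    count="$2"      # number of snapshots to take (passed to top -n)
    outfile="$3"    # file that receives the snapshots

    # -b runs top in batch mode: plain text output, no interactive screen.
    # Output is appended so that restarts do not wipe earlier evidence.
    top -b -d "$interval" -n "$count" >> "$outfile"
}
```

For example, running "log_top 10 8640 /var/tmp/top-dw.log &" would take a snapshot every 10 seconds for roughly a day. Each snapshot starts with a header line that includes the time of day, so you can line the snapshots up against the times the HUB reports the bridge as down. If the box really is starved, expect gaps between snapshots around the hang, which is useful evidence in itself.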
It's also possible to configure the JVM to write GC activity to a log file, which would let you see whether heavy GC activity is the root cause of the unresponsiveness.

If you can predict when the DW broker will become unresponsive, you could attach JVisualVM beforehand and use the CPU Sampler (don't use the CPU Profiler on a production server!) to try to determine what the broker is doing immediately before and during the period of unresponsiveness. But you have to be able to predict reasonably accurately when it will happen.

Also, what GC algorithm are you using?

Tim

On Mon, Sep 25, 2017 at 9:29 AM, bbuzzard <billy.buzz...@bnsflogistics.com> wrote:

> I'm using ActiveMQ-5.5.1 with a centralized broker feeding approximately
> twenty other brokers via NetworkBridges. All of the brokers except for one
> are working perfectly and have been for years. We recently moved our Data
> Warehouse (DW) to the cloud and that broker seems to hang up and stop
> communicating four or five times a day.
>
> I've used JMX to remotely monitor the centralized broker (HUB) and the DW
> broker. The HUB continues to move files to/from all of the brokers except
> for the DW. The HUB, via JMX, reports that the DW NetworkBridge is down,
> but the DW broker says the NetworkBridge is up.
>
> I turned on transport tracing for both the HUB and the DW brokers and I can
> clearly see the KeepAlive messages going to the DW broker and the responses
> coming back until the HUB reports the NetworkBridge to the DW is down. My
> JMX connection to the HUB continues to work and Heap and Nonheap usage seem
> well within design limits, but the JMX connection to the DW returns a
> timeout.
>
> I then tried logging into the DW (Linux box) and tried to run TOP. It took
> almost a minute for the letters T, O, P to echo back, which suggested to me
> that the box was under heavy CPU load. Just prior to the timeout, the DW
> JMX connection showed that Heap and Nonheap were within design limits.
> My supervisor asked two very valid questions: "How do I know if the DW Broker
> did or did not use up heap if I cannot see heap usage via JMX?" and "Could
> GC be stuck?" We also noticed that all ActiveMQ logging ceases while the
> broker is hung.
>
> The DW broker is supposed to run continuously. The DW itself instantiates
> several very large one-shot processes every ten minutes, and I suspect that
> this is what is causing the DW broker and JMX to hang.
>
> Does anyone have experience troubleshooting a problem like this? What
> should I do to prove that the problem is either the ActiveMQ broker or the
> processes that the DW is instantiating? If someone has seen this problem
> and fixed it, how did you fix it?
>
> The only way I found to fix the hung broker is to execute an activemq
> restart that times out after thirty seconds and then does a kill on the
> pid.
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
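On the GC logging mentioned at the top of this reply, here is a minimal sketch. It assumes a HotSpot JVM from the Java 6 to 8 era (newer JVMs use -Xlog:gc* instead of these flags) and assumes that your activemq start script passes ACTIVEMQ_OPTS through to the JVM. The log path is a placeholder.

```shell
# Assumed log location; pick a disk with free space.
GC_LOG=/var/tmp/activemq-gc.log

# Append GC logging flags to whatever ACTIVEMQ_OPTS already holds.
# -XX:+PrintGCApplicationStoppedTime records how long application threads
# were paused, which speaks directly to the "could GC be stuck?" question.
ACTIVEMQ_OPTS="${ACTIVEMQ_OPTS:-} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:${GC_LOG}"
export ACTIVEMQ_OPTS
```

Check how your start script builds ACTIVEMQ_OPTS before relying on this, since setting the variable yourself may replace defaults (such as memory settings) that the script would otherwise supply. Because the GC log is written by the JVM itself rather than read over JMX, it should still show heap occupancy and pause times from just before a hang, even when the JMX connection times out.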