Christian, after much pain and suffering I finally figured out what is going on. Our system is quite complicated and involves many producers that send large messages (600K-1.5M) to relatively few multi-threaded consumers (services) which run "forever". The producers are transient and can be killed by our custom job scheduler at any time via kill -9 to make room for other producers. We run the broker with a 10G heap.
The consumer is coded to group and cache Sessions with a Connection which has an inactivity timer associated with it. Every time a message is sent, the timer is restarted. If the timer pops (default 30 minutes), the Sessions, MessageProducers and the Connection are closed due to inactivity. This worked perfectly fine until about 4 weeks ago, when we started experiencing a broker OOM problem. While the broker was running we could see a steady (fast) rise in heap usage in JConsole. After a couple of days the broker's JVM would OOM.
The problem started happening when we introduced pingers for the Consumers. Every minute a pinger sends a message to a Consumer to make sure it's alive. The Consumer replies to the pinger request and restarts the inactivity timer. It took me a while to see the bug in our application, but eventually I determined that our timer behaves incorrectly because it is associated with a Connection, not with individual Sessions. Since the pinger replies keep restarting the Connection's timer, the timer never pops and the Sessions cached under that Connection are never closed. The Sessions go stale when their producers get killed, and any messages in the broker referenced by a ProducerExchange object are retained indefinitely, causing a leak in the broker. As you explained it to me, the broker uses a lazy approach to cleanup, meaning it cleans up on a new message from the Producer. In our case, the Producer never sends anything and thus no cleanup is ever done.
The fix for this is to keep a timestamp for every Session recording when it was last used to send a message to the broker. At fixed intervals a Session Reaper thread wakes up and checks the timestamp of every Session to determine whether it has been inactive for the maximum allowed time and, if so, closes it. So the problem was caused by an application bug combined with the fact that the broker takes a lazy approach to cleanup. As a side note, under the described scenario, I've noticed that the broker memory usage (shown in JConsole) indicated 0 even though there were a ton of messages in the heap with valid references (held by ProducerExchange).
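For anyone hitting the same thing, here is a minimal sketch of the Session Reaper idea described above. It is not our production code. The class name SessionReaper and the methods touch, reap, start and stop are made up for illustration, and AutoCloseable stands in for the JMS Session type so the sketch compiles without a JMS jar (as far as I know, Session only became AutoCloseable in JMS 2.0, so an older client would likely need a small wrapper).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical per-session idle reaper. S stands in for a JMS Session;
 * anything with a close() method will do for this sketch.
 */
class SessionReaper<S extends AutoCloseable> {
    // Last time each session was used to send a message, in System.nanoTime() units.
    private final Map<S, Long> lastUsedNanos = new ConcurrentHashMap<>();
    private final long maxIdleNanos;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "session-reaper");
                t.setDaemon(true); // do not keep the JVM alive just for the reaper
                return t;
            });

    SessionReaper(long maxIdle, TimeUnit unit) {
        this.maxIdleNanos = unit.toNanos(maxIdle);
    }

    /** Call this every time the session is used to send a message. */
    void touch(S session) {
        lastUsedNanos.put(session, System.nanoTime());
    }

    /** Wake up at fixed intervals and close sessions that have been idle too long. */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> reap(System.nanoTime()), period, period, unit);
    }

    /** Closes every session idle for at least maxIdle as of nowNanos; returns how many were closed. */
    int reap(long nowNanos) {
        int closed = 0;
        for (Map.Entry<S, Long> e : lastUsedNanos.entrySet()) {
            S session = e.getKey();
            Long seen = e.getValue();
            // remove(key, value) only succeeds if nobody touched the session since we read the timestamp
            if (nowNanos - seen >= maxIdleNanos && lastUsedNanos.remove(session, seen)) {
                try {
                    session.close();
                } catch (Exception ex) {
                    // A real implementation would log this; the session is dropped either way.
                }
                closed++;
            }
        }
        return closed;
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

The important difference from our original code is that the timestamp is per Session, so pinger traffic on one Session does not keep unrelated stale Sessions alive. Note that a sender can still race with the reaper and find its Session closed just after calling touch, so the sending code has to be prepared to recreate a Session.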
Thanks, Christian, for your help.
-Jerry C
