Another issue we noticed: when the broker goes OOM due to direct memory, it
doesn't create a heap dump (requested via "-XX:+HeapDumpOnOutOfMemoryError"),
even though the error thrown is the same "java.lang.OutOfMemoryError" that the
Oracle JVM docs say the flag responds to. Has anyone found a way to get a heap
dump for a DM OOM?
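
As far as we can tell, the flag only fires for OOMs raised inside the VM
itself, while "Direct buffer memory" is thrown from Java code
(java.nio.Bits.reserveMemory), so the hook never sees it. The only workaround
we have sketched so far is to watch the direct pool ourselves and dump the
heap programmatically before the error hits. A rough sketch, not
production-tested; the 90% threshold and the dump path are placeholders:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class DirectMemoryWatcher implements Runnable
{
    private static final long MAX_DIRECT = 4L << 30;  // our -XX:MaxDirectMemorySize=4g
    private static final double THRESHOLD = 0.9;      // dump once 90% is allocated

    @Override
    public void run()
    {
        try
        {
            HotSpotDiagnosticMXBean diagnostics =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            while (true)
            {
                for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class))
                {
                    // the "direct" pool tracks ByteBuffer.allocateDirect; "mapped" is separate
                    if ("direct".equals(pool.getName())
                            && pool.getMemoryUsed() > MAX_DIRECT * THRESHOLD)
                    {
                        // live=true restricts the dump to reachable objects
                        diagnostics.dumpHeap("/var/tmp/broker-dm-" + System.currentTimeMillis() + ".hprof", true);
                        return;
                    }
                }
                Thread.sleep(10_000);
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

We would start it with new Thread(new DirectMemoryWatcher(), "dm-watcher").start()
from anything running inside the broker JVM.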
("java.lang.OutOfMemoryError").Has anyone been able to find a way to get to heap dump for DM OOM? - Ramayan On Wed, Apr 19, 2017 at 11:21 AM, Ramayan Tiwari <[email protected]> wrote: > Alex, > > Below are the flow to disk logs from broker having 3million+ messages at > this time. We only have one virtual host. Time is in GMT. Looks like flow > to disk is active on the whole virtual host and not a queue level. > > When the same broker went OOM yesterday, I did not see any flow to disk > logs from when it was started until it crashed (crashed twice within 4hrs). > > > 4/19/17 4:17:43.509 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1014 : Message flow to disk active : Message memory use 3356539KB > exceeds threshold 3355443KB > 4/19/17 2:31:13.502 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1015 : Message flow to disk inactive : Message memory use 3354866KB > within threshold 3355443KB > 4/19/17 2:28:43.511 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1014 : Message flow to disk active : Message memory use 3358509KB > exceeds threshold 3355443KB > 4/19/17 2:20:13.500 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1015 : Message flow to disk inactive : Message memory use 3353501KB > within threshold 3355443KB > 4/19/17 2:18:13.500 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1014 : Message flow to disk active : Message memory use 3357544KB > exceeds threshold 3355443KB > 4/19/17 2:08:43.501 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1015 : Message flow to disk inactive : Message memory use 3353236KB > within threshold 3355443KB > 4/19/17 2:08:13.501 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1014 : Message flow to disk active : Message memory use 3356704KB > exceeds threshold 3355443KB > 4/19/17 2:00:43.500 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1015 : Message flow to disk inactive : Message memory use 3353511KB > within threshold 3355443KB > 4/19/17 2:00:13.504 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1014 : Message flow to disk active : Message memory use 3357948KB > exceeds threshold 3355443KB > 4/19/17 1:50:43.501 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1015 : Message flow to disk inactive : Message memory use 3355310KB > within threshold 3355443KB > 4/19/17 1:47:43.501 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1014 : Message flow to disk active : Message memory use 3365624KB > exceeds threshold 3355443KB > 4/19/17 1:43:43.501 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1015 : Message flow to disk inactive : Message memory use 3355136KB > within threshold 3355443KB > 4/19/17 1:31:43.509 AM INFO [Housekeeping[test]] - [Housekeeping[test]] > BRK-1014 : Message flow to disk active : Message memory use 3358683KB > exceeds threshold 3355443KB > > > After production release (2days back), we have seen 4 crashes in 3 > different brokers, this is the most pressing concern for us in decision if > we should roll back to 0.32. Any help is greatly appreciated. > > Thanks > Ramayan > > On Wed, Apr 19, 2017 at 9:36 AM, Oleksandr Rudyy <[email protected]> wrote: > >> Ramayan, >> Thanks for the details. I would like to clarify whether flow to disk was >> triggered today for 3 million messages? 
>>
>> Kind Regards,
>> Alex
>>
>> On 19 April 2017 at 17:10, Ramayan Tiwari <[email protected]> wrote:
>>
>> > Hi Alex,
>> >
>> > Thanks for your response, here are the details:
>> >
>> > We use a "direct" exchange without persistence (we specify
>> > NON_PERSISTENT while sending from the client) and use the BDB store.
>> > We use the JSON virtual host type. We are not using SSL.
>> >
>> > When the broker went OOM, we had around 1.3 million messages with an
>> > average message size of 100 bytes. Direct memory allocation (the value
>> > read from the MBean) kept going up, even though it should not need
>> > more DM to store that many messages. DM allocation stayed at 99% for
>> > about three and a half hours before the crash.
>> >
>> > Today, on the same broker, we have 3 million messages (same message
>> > size) and DM allocated is only at 8%. This looks like an issue with
>> > de-allocation, or a leak.
>> >
>> > I have uploaded the memory utilization graph here:
>> > https://drive.google.com/file/d/0Bwi0MEV3srPRVHFEbDlIYUpLaUE/view?usp=sharing
>> > The blue line is DM allocated, yellow is DM used (the sum of queue
>> > payloads) and red is heap usage.
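>> >
>> > In case it is useful, this is roughly how we sample the "allocated"
>> > figure (the host and port are placeholders for our JMX endpoint; the
>> > MBean itself is the standard platform BufferPool one, not
>> > Qpid-specific):
>> >
>> > import java.lang.management.BufferPoolMXBean;
>> > import java.lang.management.ManagementFactory;
>> > import javax.management.MBeanServerConnection;
>> > import javax.management.remote.JMXConnector;
>> > import javax.management.remote.JMXConnectorFactory;
>> > import javax.management.remote.JMXServiceURL;
>> >
>> > public class SampleDirectMemory
>> > {
>> >     public static void main(String[] args) throws Exception
>> >     {
>> >         JMXServiceURL url = new JMXServiceURL(
>> >                 "service:jmx:rmi:///jndi/rmi://broker-host:9099/jmxrmi");
>> >         try (JMXConnector connector = JMXConnectorFactory.connect(url))
>> >         {
>> >             MBeanServerConnection connection = connector.getMBeanServerConnection();
>> >             // the "direct" pool is what we plot as DM allocated
>> >             for (BufferPoolMXBean pool :
>> >                     ManagementFactory.getPlatformMXBeans(connection, BufferPoolMXBean.class))
>> >             {
>> >                 System.out.println(pool.getName()
>> >                         + ": used=" + pool.getMemoryUsed()
>> >                         + " capacity=" + pool.getTotalCapacity()
>> >                         + " buffers=" + pool.getCount());
>> >             }
>> >         }
>> >     }
>> > }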
>> >
>> > Thanks
>> > Ramayan
>> >
>> > On Wed, Apr 19, 2017 at 4:10 AM, Oleksandr Rudyy <[email protected]> wrote:
>> >
>> > > Hi Ramayan,
>> > >
>> > > Could you please share with us the details of the messaging use
>> > > case(s) which ended up in OOM on the broker side?
>> > > I would like to reproduce the issue on my local broker in order to
>> > > fix it. I would appreciate as much detail as possible, including
>> > > messaging topology, message persistence type, message sizes,
>> > > volumes, etc.
>> > >
>> > > Qpid Broker 6.0.x uses direct memory for keeping message content and
>> > > for receiving/sending data. Each plain connection utilizes 512K of
>> > > direct memory; each SSL connection uses 1M. Your memory settings
>> > > look OK to me.
>> > >
>> > > Kind Regards,
>> > > Alex
>> > >
>> > > On 18 April 2017 at 23:39, Ramayan Tiwari <[email protected]> wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > We are using Java broker 6.0.5, with a patch to use the
>> > > > MultiQueueConsumer feature. We just finished deploying to
>> > > > production and saw a couple of instances of the broker going OOM
>> > > > after running out of direct memory buffers (exceptions at the end
>> > > > of this email).
>> > > >
>> > > > Here is our setup:
>> > > > 1. Max heap 12g, max direct memory 4g (the opposite of the
>> > > > recommendation; however, for our use case message payloads are
>> > > > really small, ~400 bytes, well below the ~1KB per-message
>> > > > overhead). In perf testing, we were able to put 2 million messages
>> > > > through without any issues.
>> > > > 2. ~400 connections to the broker.
>> > > > 3. Each connection has 20 sessions, and one multi-queue consumer
>> > > > is attached to each session, listening to around 1000 queues.
>> > > > 4. We are still using the 0.16 client (I know).
>> > > >
>> > > > With the above setup, the baseline utilization (without any
>> > > > messages) for direct memory was around 230mb (410 connections each
>> > > > taking ~500KB).
>> > > >
>> > > > Based on our understanding of broker memory allocation, message
>> > > > payload should be the only thing adding to direct memory
>> > > > utilization (on top of that baseline); however, we are
>> > > > experiencing something completely different. In our last broker
>> > > > crash, we saw the broker constantly running with 90%+ of direct
>> > > > memory allocated, even when the message payload sum across all
>> > > > queues was only 6-8% (both percentages are against the available
>> > > > DM of 4gb). During these periods of high DM usage, heap usage was
>> > > > around 60% (of 12gb).
>> > > >
>> > > > We would like some help in understanding what could be causing
>> > > > these high DM allocations. Is there anything other than message
>> > > > payload and AMQP connections that uses DM and could be
>> > > > contributing to this usage?
>> > > >
>> > > > Another thing that puzzles us is the de-allocation of DM byte
>> > > > buffers. From log mining of heap and DM utilization, de-allocation
>> > > > of DM doesn't correlate with heap GC. If anyone has seen any
>> > > > documentation related to this, it would be very helpful if you
>> > > > could share it.
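>> > > >
>> > > > For what it's worth, our working mental model (which may well be
>> > > > wrong) is that a direct buffer's native memory is only released
>> > > > when the small heap-side ByteBuffer object is collected and its
>> > > > Cleaner runs, so DM only drops after whichever GC happens to
>> > > > collect those objects. A toy experiment we used to convince
>> > > > ourselves (standalone, not broker code; System.gc() is only a
>> > > > request, so the last reading is not guaranteed):
>> > > >
>> > > > import java.lang.management.BufferPoolMXBean;
>> > > > import java.lang.management.ManagementFactory;
>> > > > import java.nio.ByteBuffer;
>> > > > import java.util.ArrayList;
>> > > > import java.util.List;
>> > > >
>> > > > public class DirectBufferGcDemo
>> > > > {
>> > > >     public static void main(String[] args) throws Exception
>> > > >     {
>> > > >         BufferPoolMXBean direct = ManagementFactory
>> > > >                 .getPlatformMXBeans(BufferPoolMXBean.class).stream()
>> > > >                 .filter(p -> "direct".equals(p.getName()))
>> > > >                 .findFirst().get();
>> > > >
>> > > >         List<ByteBuffer> buffers = new ArrayList<>();
>> > > >         for (int i = 0; i < 64; i++)
>> > > >         {
>> > > >             buffers.add(ByteBuffer.allocateDirect(1024 * 1024)); // 64 x 1MB
>> > > >         }
>> > > >         System.out.println("allocated: " + direct.getMemoryUsed());
>> > > >
>> > > >         buffers.clear(); // drop the heap-side references
>> > > >         System.out.println("dropped:   " + direct.getMemoryUsed()); // native memory still held
>> > > >
>> > > >         System.gc();     // Cleaners free the native memory once the buffers are collected
>> > > >         Thread.sleep(1000);
>> > > >         System.out.println("after gc:  " + direct.getMemoryUsed());
>> > > >     }
>> > > > }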
>> > > >
>> > > > Thanks
>> > > > Ramayan
>> > > >
>> > > > *Exceptions*
>> > > >
>> > > > java.lang.OutOfMemoryError: Direct buffer memory
>> > > > at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_40]
>> > > > at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_40]
>> > > > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_40]
>> > > > at org.apache.qpid.bytebuffer.QpidByteBuffer.allocateDirect(QpidByteBuffer.java:474) ~[qpid-common-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.restoreApplicationBufferForWrite(NonBlockingConnectionPlainDelegate.java:93) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:60) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:506) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:285) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:124) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:504) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:337) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:87) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:462) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
>> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40]
>> > > > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_40]
>> > > >
>> > > > *Second exception*
>> > > >
>> > > > java.lang.OutOfMemoryError: Direct buffer memory
>> > > > at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_40]
>> > > > at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_40]
>> > > > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_40]
>> > > > at org.apache.qpid.bytebuffer.QpidByteBuffer.allocateDirect(QpidByteBuffer.java:474) ~[qpid-common-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.<init>(NonBlockingConnectionPlainDelegate.java:45) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnection.setTransportEncryption(NonBlockingConnection.java:625) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnection.<init>(NonBlockingConnection.java:117) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingNetworkTransport.acceptSocketChannel(NonBlockingNetworkTransport.java:158) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.SelectorThread$SelectionTask$1.run(SelectorThread.java:191) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:462) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
>> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40]
>> > > > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_40]
