Hi Alex, Any update on the fix for this? QPID-7753 has a fix version of 7.0.0; I am hoping the fix will also be back-ported to 6.0.x.
Thanks Ramayan On Mon, May 8, 2017 at 2:14 AM, Oleksandr Rudyy <[email protected]> wrote: > Hi Ramayan, > > Thanks for testing the patch and providing a feedback. > > Regarding direct memory utilization, the Qpid Broker caches up to 256MB of > direct memory internally in QpidByteBuffers. Thus, when testing the Broker > with only 256MB of direct memory, the entire direct memory could be cached > and it would look as if direct memory is never released. Potentially, you > can reduce the number of buffers cached on broker by changing context > variable 'broker.directByteBufferPoolSize'. By default, it is set to 1000. > With buffer size of 256K, it would give ~256M of cache. > > Regarding introducing lower and upper thresholds for 'flow to disk'. It > seems like a good idea and we will try to implement it early this week on > trunk first. > > Kind Regards, > Alex > > > On 5 May 2017 at 23:49, Ramayan Tiwari <[email protected]> wrote: > > > Hi Alex, > > > > Thanks for providing the patch. I verified the fix with same perf test, > and > > it does prevent broker from going OOM, however. DM utilization doesn't > get > > any better after hitting the threshold (where flow to disk is activated > > based on total used % across broker - graph in the link below). > > > > After hitting the final threshold, flow to disk activates and deactivates > > pretty frequently across all the queues. The reason seems to be because > > there is only one threshold currently to trigger flow to disk. Would it > > make sense to break this down to high and low threshold - so that once > flow > > to disk is active after hitting high threshold, it will be active until > the > > queue utilization (or broker DM allocation) reaches the low threshold. > > > > Graph and flow to disk logs are here: > > https://docs.google.com/document/d/1Wc1e-id- > WlpI7FGU1Lx8XcKaV8sauRp82T5XZV > > U-RiM/edit#heading=h.6400pltvjhy7 > > > > Thanks > > Ramayan > > > > On Thu, May 4, 2017 at 2:44 AM, Oleksandr Rudyy <[email protected]> > wrote: > > > > > Hi Ramayan, > > > > > > We attached to the QPID-7753 a patch with a work around for 6.0.x > branch. > > > It triggers flow to disk based on direct memory consumption rather than > > > estimation of the space occupied by the message content. The flow to > disk > > > should evacuate message content preventing running out of direct > memory. > > We > > > already committed the changes into 6.0.x and 6.1.x branches. It will be > > > included into upcoming 6.0.7 and 6.1.3 releases. > > > > > > Please try and test the patch in your environment. > > > > > > We are still working at finishing of the fix for trunk. > > > > > > Kind Regards, > > > Alex > > > > > > On 30 April 2017 at 15:45, Lorenz Quack <[email protected]> > wrote: > > > > > > > Hi Ramayan, > > > > > > > > The high-level plan is currently as follows: > > > > 1) Periodically try to compact sparse direct memory buffers. > > > > 2) Increase accuracy of messages' direct memory usage estimation to > > more > > > > reliably trigger flow to disk. > > > > 3) Add an additional flow to disk trigger based on the amount of > > > allocated > > > > direct memory. > > > > > > > > A little bit more details: > > > > 1) We plan on periodically checking the amount of direct memory > usage > > > and > > > > if it is above a > > > > threshold (50%) we compare the sum of all queue sizes with the > > amount > > > > of allocated direct memory. 
> > > > If the ratio falls below a certain threshold we trigger a > > compaction > > > > task which goes through all queues > > > > and copy's a certain amount of old message buffers into new ones > > > > thereby freeing the old buffers so > > > > that they can be returned to the buffer pool and be reused. > > > > > > > > 2) Currently we trigger flow to disk based on an estimate of how > much > > > > memory the messages on the > > > > queues consume. We had to use estimates because we did not have > > > > accurate size numbers for > > > > message headers. By having accurate size information for message > > > > headers we can more reliably > > > > enforce queue memory limits. > > > > > > > > 3) The flow to disk trigger based on message size had another > problem > > > > which is more pertinent to the > > > > current issue. We only considered the size of the messages and > not > > > how > > > > much memory we allocate > > > > to store those messages. In the FIFO use case those numbers will > be > > > > very close to each other but in > > > > use cases like yours we can end up with sparse buffers and the > > > numbers > > > > will diverge. Because of this > > > > divergence we do not trigger flow to disk in time and the broker > > can > > > go > > > > OOM. > > > > To fix the issue we want to add an additional flow to disk > trigger > > > > based on the amount of allocated direct > > > > memory. This should prevent the broker from going OOM even if the > > > > compaction strategy outlined above > > > > should fail for some reason (e.g., the compaction task cannot > keep > > up > > > > with the arrival of new messages). > > > > > > > > Currently, there are patches for the above points but they suffer > from > > > some > > > > thread-safety issues that need to be addressed. > > > > > > > > I hope this description helps. Any feedback is, as always, welcome. > > > > > > > > Kind regards, > > > > Lorenz > > > > > > > > > > > > > > > > On Sat, Apr 29, 2017 at 12:00 AM, Ramayan Tiwari < > > > [email protected] > > > > > > > > > wrote: > > > > > > > > > Hi Lorenz, > > > > > > > > > > Thanks so much for the patch. We have a perf test now to reproduce > > this > > > > > issue, so we did test with 256KB, 64KB and 4KB network byte buffer. > > > None > > > > of > > > > > these configurations help with the issue (or give any more > breathing > > > > room) > > > > > for our use case. We would like to share the perf analysis with the > > > > > community: > > > > > > > > > > https://docs.google.com/document/d/1Wc1e-id- > > > > WlpI7FGU1Lx8XcKaV8sauRp82T5XZV > > > > > U-RiM/edit?usp=sharing > > > > > > > > > > Feel free to comment on the doc if certain details are incorrect or > > if > > > > > there are questions. > > > > > > > > > > Since the short term solution doesn't help us, we are very > interested > > > in > > > > > getting some details on how the community plans to address this, a > > high > > > > > level description of the approach will be very helpful for us in > > order > > > to > > > > > brainstorm our use cases along with this solution. > > > > > > > > > > - Ramayan > > > > > > > > > > On Fri, Apr 28, 2017 at 9:34 AM, Lorenz Quack < > > [email protected]> > > > > > wrote: > > > > > > > > > > > Hello Ramayan, > > > > > > > > > > > > We are still working on a fix for this issue. > > > > > > In the mean time we had an idea to potentially workaround the > issue > > > > until > > > > > > a proper fix is released. 
> > > > > > > > > > > > The idea is to decrease the qpid network buffer size the broker > > uses. > > > > > > While this still allows for sparsely populated buffers it would > > > improve > > > > > > the overall occupancy ratio. > > > > > > > > > > > > Here are the steps to follow: > > > > > > * ensure you are not using TLS > > > > > > * apply the attached patch > > > > > > * figure out the size of the largest messages you are sending > > > > (including > > > > > > header and some overhead) > > > > > > * set the context variable "qpid.broker.networkBufferSize" to > > that > > > > > value > > > > > > but not smaller than 4096 > > > > > > * test > > > > > > > > > > > > Decreasing the qpid network buffer size automatically limits the > > > > maximum > > > > > > AMQP frame size. > > > > > > Since you are using a very old client we are not sure how well it > > > copes > > > > > > with small frame sizes where it has to split a message across > > > multiple > > > > > > frames. > > > > > > Therefore, to play it safe you should not set it smaller than the > > > > largest > > > > > > messages (+ header + overhead) you are sending. > > > > > > I do not know what message sizes you are sending but AMQP imposes > > the > > > > > > restriction that the framesize cannot be smaller than 4096 bytes. > > > > > > In the qpid broker the default currently is 256 kB. > > > > > > > > > > > > In the current state the broker does not allow setting the > network > > > > buffer > > > > > > to values smaller than 64 kB to allow TLS frames to fit into one > > > > network > > > > > > buffer. > > > > > > I attached a patch to this mail that lowers that restriction to > the > > > > limit > > > > > > imposed by AMQP (4096 Bytes). > > > > > > Obviously, you should not use this when using TLS. > > > > > > > > > > > > > > > > > > I hope this reduces the problems you are currently facing until > we > > > can > > > > > > complete the proper fix. > > > > > > > > > > > > Kind regards, > > > > > > Lorenz > > > > > > > > > > > > > > > > > > On Fri, 2017-04-21 at 09:17 -0700, Ramayan Tiwari wrote: > > > > > > > Thanks so much Keith and the team for finding the root cause. > We > > > are > > > > so > > > > > > > relieved that we fix the root cause shortly. > > > > > > > > > > > > > > Couple of things that I forgot to mention on the mitigation > steps > > > we > > > > > took > > > > > > > in the last incident: > > > > > > > 1) We triggered GC from JMX bean multiple times, it did not > help > > in > > > > > > > reducing DM allocated. > > > > > > > 2) We also killed all the AMQP connections to the broker when > DM > > > was > > > > at > > > > > > > 80%. This did not help either. The way we killed connections - > > > using > > > > > JMX > > > > > > > got list of all the open AMQP connections and called close from > > JMX > > > > > > mbean. > > > > > > > > > > > > > > I am hoping the above two are not related to root cause, but > > wanted > > > > to > > > > > > > bring it up in case this is relevant. > > > > > > > > > > > > > > Thanks > > > > > > > Ramayan > > > > > > > > > > > > > > On Fri, Apr 21, 2017 at 8:29 AM, Keith W <[email protected] > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Hello Ramayan > > > > > > > > > > > > > > > > I believe I understand the root cause of the problem. 
We > have > > > > > > > > identified a flaw in the direct memory buffer management > > employed > > > > by > > > > > > > > Qpid Broker J which for some messaging use-cases can lead to > > the > > > > OOM > > > > > > > > direct you describe. For the issue to manifest the > producing > > > > > > > > application needs to use a single connection for the > production > > > of > > > > > > > > messages some of which are short-lived (i.e. are consumed > > > quickly) > > > > > > > > whilst others remain on the queue for some time. Priority > > > queues, > > > > > > > > sorted queues and consumers utilising selectors that result > in > > > some > > > > > > > > messages being left of the queue could all produce this > patten. > > > > The > > > > > > > > pattern leads to a sparsely occupied 256K net buffers which > > > cannot > > > > be > > > > > > > > released or reused until every message that reference a > 'chunk' > > > of > > > > it > > > > > > > > is either consumed or flown to disk. The problem was > > introduced > > > > > with > > > > > > > > Qpid v6.0 and exists in v6.1 and trunk too. > > > > > > > > > > > > > > > > The flow to disk feature is not helping us here because its > > > > algorithm > > > > > > > > considers only the size of live messages on the queues. If > the > > > > > > > > accumulative live size does not exceed the threshold, the > > > messages > > > > > > > > aren't flown to disk. I speculate that when you observed that > > > > moving > > > > > > > > messages cause direct message usage to drop earlier today, > your > > > > > > > > message movement cause a queue to go over threshold, cause > > > message > > > > to > > > > > > > > be flown to disk and their direct memory references released. > > > The > > > > > > > > logs will confirm this is so. > > > > > > > > > > > > > > > > I have not identified an easy workaround at the moment. > > > > Decreasing > > > > > > > > the flow to disk threshold and/or increasing available direct > > > > memory > > > > > > > > should alleviate and may be an acceptable short term > > workaround. > > > > If > > > > > > > > it were possible for publishing application to publish short > > > lived > > > > > and > > > > > > > > long lived messages on two separate JMS connections this > would > > > > avoid > > > > > > > > this defect. > > > > > > > > > > > > > > > > QPID-7753 tracks this issue and QPID-7754 is a related this > > > > problem. > > > > > > > > We intend to be working on these early next week and will be > > > aiming > > > > > > > > for a fix that is back-portable to 6.0. > > > > > > > > > > > > > > > > Apologies that you have run into this defect and thanks for > > > > > reporting. > > > > > > > > > > > > > > > > Thanks, Keith > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 21 April 2017 at 10:21, Ramayan Tiwari < > > > > [email protected]> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > > > We have been monitoring the brokers everyday and today we > > found > > > > one > > > > > > > > instance > > > > > > > > > > > > > > > > > > where broker’s DM was constantly going up and was about to > > > crash, > > > > > so > > > > > > we > > > > > > > > > experimented some mitigations, one of which caused the DM > to > > > come > > > > > > down. 
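To illustrate Keith's suggestion above of keeping short-lived and long-lived messages on two separate JMS connections, here is a minimal sketch; the JNDI name, queue names and payloads are hypothetical and would need to be adapted:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.DeliveryMode;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.naming.InitialContext;

    public class SplitProducerConnections
    {
        public static void main(String[] args) throws Exception
        {
            // Hypothetical JNDI name; with the 0.16 client this would resolve to a Qpid
            // ConnectionFactory configured in a jndi.properties file.
            ConnectionFactory factory =
                    (ConnectionFactory) new InitialContext().lookup("qpidConnectionFactory");

            // One connection carries only short-lived messages ...
            Connection shortLived = factory.createConnection();
            // ... and a second connection carries only long-lived ones, so a message that sits
            // on a queue for hours no longer pins network buffers shared with quickly-consumed traffic.
            Connection longLived = factory.createConnection();

            Session shortLivedSession = shortLived.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Session longLivedSession = longLived.createSession(false, Session.AUTO_ACKNOWLEDGE);

            MessageProducer fastProducer =
                    shortLivedSession.createProducer(shortLivedSession.createQueue("fast.queue"));
            MessageProducer slowProducer =
                    longLivedSession.createProducer(longLivedSession.createQueue("slow.queue"));
            fastProducer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);
            slowProducer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);

            fastProducer.send(shortLivedSession.createTextMessage("consumed within seconds"));
            slowProducer.send(longLivedSession.createTextMessage("may sit on the queue for hours"));

            shortLived.close();
            longLived.close();
        }
    }

The point is simply that each connection gets its own network buffers, so long-lived messages no longer keep alive buffers that mostly contain already-consumed short-lived ones.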
> > > > > > > > > Following are the details, which might help us > understanding > > > the > > > > > > issue: > > > > > > > > > > > > > > > > > > Traffic scenario: > > > > > > > > > > > > > > > > > > DM allocation had been constantly going up and was at 90%. > > > There > > > > > > were two > > > > > > > > > queues which seemed to align with the theories that we had. > > > Q1’s > > > > > > size had > > > > > > > > > been large right after the broker start and had slow > > > consumption > > > > of > > > > > > > > > messages, queue size only reduced from 76MB to 75MB over a > > > period > > > > > of > > > > > > > > 6hrs. > > > > > > > > > > > > > > > > > > Q2 on the other hand, started small and was gradually > > growing, > > > > > queue > > > > > > size > > > > > > > > > went from 7MB to 10MB in 6hrs. There were other queues with > > > > traffic > > > > > > > > during > > > > > > > > > > > > > > > > > > this time. > > > > > > > > > > > > > > > > > > Action taken: > > > > > > > > > > > > > > > > > > Moved all the messages from Q2 (since this was our original > > > > theory) > > > > > > to Q3 > > > > > > > > > (already created but no messages in it). This did not help > > with > > > > the > > > > > > DM > > > > > > > > > growing up. > > > > > > > > > Moved all the messages from Q1 to Q4 (already created but > no > > > > > > messages in > > > > > > > > > it). This reduced DM allocation from 93% to 31%. > > > > > > > > > > > > > > > > > > We have the heap dump and thread dump from when broker was > > 90% > > > in > > > > > DM > > > > > > > > > allocation. We are going to analyze that to see if we can > get > > > > some > > > > > > clue. > > > > > > > > We > > > > > > > > > > > > > > > > > > wanted to share this new information which might help in > > > > reasoning > > > > > > about > > > > > > > > the > > > > > > > > > > > > > > > > > > memory issue. > > > > > > > > > > > > > > > > > > - Ramayan > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Apr 20, 2017 at 11:20 AM, Ramayan Tiwari < > > > > > > > > [email protected]> > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Keith, > > > > > > > > > > > > > > > > > > > > Thanks so much for your response and digging into the > > issue. > > > > > Below > > > > > > are > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > answer to your questions: > > > > > > > > > > > > > > > > > > > > 1) Yeah we are using QPID-7462 with 6.0.5. We couldn't > use > > > 6.1 > > > > > > where it > > > > > > > > > > was released because we need JMX support. Here is the > > > > destination > > > > > > > > format: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ""%s ; {node : { type : queue }, link : { x-subscribes : > { > > > > > > arguments : { > > > > > > > > > > x-multiqueue : [%s], x-pull-only : true }}}}";" > > > > > > > > > > > > > > > > > > > > 2) Our machines have 40 cores, which will make the number > > of > > > > > > threads to > > > > > > > > > > 80. This might not be an issue, because this will show up > > in > > > > the > > > > > > > > baseline DM > > > > > > > > > > > > > > > > > > > > > > > > > > > > > allocated, which is only 6% (of 4GB) when we just bring > up > > > the > > > > > > broker. > > > > > > > > > > > > > > > > > > > > 3) The only setting that we tuned WRT to DM is > > > > > flowToDiskThreshold, > > > > > > > > which > > > > > > > > > > > > > > > > > > > > > > > > > > > > > is set at 80% now. 
> > > > > > > > > > > > > > > > > > > > 4) Only one virtual host in the broker. > > > > > > > > > > > > > > > > > > > > 5) Most of our queues (99%) are priority, we also have > 8-10 > > > > > sorted > > > > > > > > queues. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6) Yeah we are using the standard 0.16 client and not > AMQP > > > 1.0 > > > > > > clients. > > > > > > > > > > The connection log line looks like: > > > > > > > > > > CON-1001 : Open : Destination : AMQP(IP:5672) : Protocol > > > > Version > > > > > : > > > > > > 0-10 > > > > > > > > : > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Client ID : test : Client Version : 0.16 : Client > Product : > > > > qpid > > > > > > > > > > > > > > > > > > > > We had another broker crashed about an hour back, we do > see > > > the > > > > > > same > > > > > > > > > > patterns: > > > > > > > > > > 1) There is a queue which is constantly growing, enqueue > is > > > > > faster > > > > > > than > > > > > > > > > > dequeue on that queue for a long period of time. > > > > > > > > > > 2) Flow to disk didn't kick in at all. > > > > > > > > > > > > > > > > > > > > This graph shows memory growth (red line - heap, blue - > DM > > > > > > allocated, > > > > > > > > > > yellow - DM used) > > > > > > > > > > > > > > > > > > > > https://drive.google.com/file/d/ > > > 0Bwi0MEV3srPRdVhXdTBncHJLY2c/ > > > > > > > > view?usp=sharing > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The below graph shows growth on a single queue (there are > > > 10-12 > > > > > > other > > > > > > > > > > queues with traffic as well, something large size than > this > > > > > queue): > > > > > > > > > > > > > > > > > > > > https://drive.google.com/file/d/ > > > 0Bwi0MEV3srPRWmNGbDNGUkJhQ0U/ > > > > > > > > view?usp=sharing > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Couple of questions: > > > > > > > > > > 1) Is there any developer level doc/design spec on how > Qpid > > > > uses > > > > > > DM? > > > > > > > > > > 2) We are not getting heap dumps automatically when > broker > > > > > crashes > > > > > > due > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > > > > > > > > DM (HeapDumpOnOutOfMemoryError not respected). Has anyone > > > > found a > > > > > > way > > > > > > > > to get > > > > > > > > > > > > > > > > > > > > > > > > > > > > > around this problem? > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > Ramayan > > > > > > > > > > > > > > > > > > > > On Thu, Apr 20, 2017 at 9:08 AM, Keith W < > > > [email protected] > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Ramayan > > > > > > > > > > > > > > > > > > > > > > We have been discussing your problem here and have a > > couple > > > > of > > > > > > > > questions. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have been experimenting with use-cases based on your > > > > > > descriptions > > > > > > > > > > > above, but so far, have been unsuccessful in > reproducing > > a > > > > > > > > > > > "java.lang.OutOfMemoryError: Direct buffer memory" > > > > condition. 
> > > > > > The > > > > > > > > > > > direct memory usage reflects the expected model: it > > levels > > > > off > > > > > > when > > > > > > > > > > > the flow to disk threshold is reached and direct memory > > is > > > > > > release as > > > > > > > > > > > messages are consumed until the minimum size for > caching > > of > > > > > > direct is > > > > > > > > > > > reached. > > > > > > > > > > > > > > > > > > > > > > 1] For clarity let me check: we believe when you say > > "patch > > > > to > > > > > > use > > > > > > > > > > > MultiQueueConsumer" you are referring to the patch > > attached > > > > to > > > > > > > > > > > QPID-7462 "Add experimental "pull" consumers to the > > broker" > > > > > and > > > > > > you > > > > > > > > > > > are using a combination of this "x-pull-only" with the > > > > > standard > > > > > > > > > > > "x-multiqueue" feature. Is this correct? > > > > > > > > > > > > > > > > > > > > > > 2] One idea we had here relates to the size of the > > > > virtualhost > > > > > IO > > > > > > > > > > > pool. As you know from the documentation, the Broker > > > > > > caches/reuses > > > > > > > > > > > direct memory internally but the documentation fails to > > > > > mentions > > > > > > that > > > > > > > > > > > each pooled virtualhost IO thread also grabs a chunk > > (256K) > > > > of > > > > > > direct > > > > > > > > > > > memory from this cache. By default the virtual host IO > > > pool > > > > is > > > > > > sized > > > > > > > > > > > Math.max(Runtime.getRuntime().availableProcessors() * > 2, > > > > 64), > > > > > > so if > > > > > > > > > > > you have a machine with a very large number of cores, > you > > > may > > > > > > have a > > > > > > > > > > > surprising large amount of direct memory assigned to > > > > > virtualhost > > > > > > IO > > > > > > > > > > > threads. Check the value of connectionThreadPoolSize > on > > > the > > > > > > > > > > > virtualhost > > > > > > > > > > > (http://<server>:<port>/api/latest/virtualhost/< > > > > > > virtualhostnodename>/<; > > > > > > > > virtualhostname>) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > to see what value is in force. What is it? It is > > possible > > > > to > > > > > > tune > > > > > > > > > > > the pool size using context variable > > > > > > > > > > > virtualhost.connectionThreadPool.size. > > > > > > > > > > > > > > > > > > > > > > 3] Tell me if you are tuning the Broker in way beyond > the > > > > > > direct/heap > > > > > > > > > > > memory settings you have told us about already. For > > > instance > > > > > > you are > > > > > > > > > > > changing any of the direct memory pooling settings > > > > > > > > > > > broker.directByteBufferPoolSize, default network > buffer > > > size > > > > > > > > > > > qpid.broker.networkBufferSize or applying any other > > > > > non-standard > > > > > > > > > > > settings? > > > > > > > > > > > > > > > > > > > > > > 4] How many virtual hosts do you have on the Broker? > > > > > > > > > > > > > > > > > > > > > > 5] What is the consumption pattern of the messages? Do > > > > consume > > > > > > in a > > > > > > > > > > > strictly FIFO fashion or are you making use of message > > > > > selectors > > > > > > > > > > > or/and any of the out-of-order queue types (LVQs, > > priority > > > > > queue > > > > > > or > > > > > > > > > > > sorted queues)? > > > > > > > > > > > > > > > > > > > > > > 6] Is it just the 0.16 client involved in the > > application? 
> > > > > Can > > > > > > I > > > > > > > > > > > check that you are not using any of the AMQP 1.0 > clients > > > > > > > > > > > (org,apache.qpid:qpid-jms-client or > > > > > > > > > > > org.apache.qpid:qpid-amqp-1-0-client) in the software > > > stack > > > > > (as > > > > > > either > > > > > > > > > > > consumers or producers) > > > > > > > > > > > > > > > > > > > > > > Hopefully the answers to these questions will get us > > closer > > > > to > > > > > a > > > > > > > > > > > reproduction. If you are able to reliable reproduce > it, > > > > > please > > > > > > share > > > > > > > > > > > the steps with us. > > > > > > > > > > > > > > > > > > > > > > Kind regards, Keith. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 20 April 2017 at 10:21, Ramayan Tiwari < > > > > > > [email protected]> > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > After a lot of log mining, we might have a way to > > explain > > > > the > > > > > > > > sustained > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > increased in DirectMemory allocation, the correlation > > > seems > > > > > to > > > > > > be > > > > > > > > with > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > growth in the size of a Queue that is getting > consumed > > > but > > > > at > > > > > > a much > > > > > > > > > > > > slower > > > > > > > > > > > > rate than producers putting messages on this queue. > > > > > > > > > > > > > > > > > > > > > > > > The pattern we see is that in each instance of broker > > > > crash, > > > > > > there is > > > > > > > > > > > > at > > > > > > > > > > > > least one queue (usually 1 queue) whose size kept > > growing > > > > > > steadily. > > > > > > > > > > > > It’d be > > > > > > > > > > > > of significant size but not the largest queue -- > > usually > > > > > there > > > > > > are > > > > > > > > > > > > multiple > > > > > > > > > > > > larger queues -- but it was different from other > queues > > > in > > > > > > that its > > > > > > > > > > > > size > > > > > > > > > > > > was growing steadily. The queue would also be moving, > > but > > > > its > > > > > > > > > > > > processing > > > > > > > > > > > > rate was not keeping up with the enqueue rate. > > > > > > > > > > > > > > > > > > > > > > > > Our theory that might be totally wrong: If a queue is > > > > moving > > > > > > the > > > > > > > > entire > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > time, maybe then the broker would keep reusing the > same > > > > > buffer > > > > > > in > > > > > > > > > > > > direct > > > > > > > > > > > > memory for the queue, and keep on adding onto it at > the > > > end > > > > > to > > > > > > > > > > > > accommodate > > > > > > > > > > > > new messages. But because it’s active all the time > and > > > > we’re > > > > > > pointing > > > > > > > > > > > > to > > > > > > > > > > > > the same buffer, space allocated for messages at the > > head > > > > of > > > > > > the > > > > > > > > > > > > queue/buffer doesn’t get reclaimed, even long after > > those > > > > > > messages > > > > > > > > have > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > been processed. Just a theory. 
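A rough worked example of this theory (and of the sparse-buffer flaw Keith describes above), using only numbers quoted in this thread and a deliberately worst-case model:

    public class SparseBufferEstimate
    {
        public static void main(String[] args)
        {
            long bufferSize = 256 * 1024;                   // broker network buffer size, 256 kB by default
            long messageSize = 100;                         // approximate payload size reported in this thread
            long maxDirectMemory = 4L * 1024 * 1024 * 1024; // the 4 GB direct memory limit in use here

            // Worst case: one long-lived message keeps each 256 kB buffer referenced, so the whole
            // buffer stays allocated even though most of it belonged to messages consumed long ago.
            long pinnedBuffers = maxDirectMemory / bufferSize;   // ~16384 buffers exhaust the 4 GB
            long livePayloadBytes = pinnedBuffers * messageSize; // the payload actually still on the queues

            System.out.printf("%d sparse buffers (%d MB allocated) pinned by only %d KB of live payload%n",
                    pinnedBuffers, maxDirectMemory / (1024 * 1024), livePayloadBytes / 1024);
        }
    }

Even if reality is far from this worst case, it shows how direct memory allocation can sit near the 4 GB limit while the live payload reported by the queues stays tiny.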
> > > > > > > > > > > > > > > > > > > > > > > > We are also trying to reproduce this using some perf > > > tests > > > > to > > > > > > enqueue > > > > > > > > > > > > with > > > > > > > > > > > > same pattern, will update with the findings. > > > > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > Ramayan > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 6:52 PM, Ramayan Tiwari > > > > > > > > > > > > <[email protected]> > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Another issue that we noticed is when broker goes > OOM > > > due > > > > > to > > > > > > direct > > > > > > > > > > > > > memory, it doesn't create heap dump (specified by > > > "-XX:+ > > > > > > > > > > > > > HeapDumpOnOutOfMemoryError"), even when the OOM > error > > > is > > > > > > same as > > > > > > > > what > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > mentioned in the oracle JVM docs > > > > > > ("java.lang.OutOfMemoryError"). > > > > > > > > > > > > > > > > > > > > > > > > > > Has anyone been able to find a way to get to heap > > dump > > > > for > > > > > > DM OOM? > > > > > > > > > > > > > > > > > > > > > > > > > > - Ramayan > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 11:21 AM, Ramayan Tiwari > > > > > > > > > > > > > <[email protected] > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Alex, > > > > > > > > > > > > > > > > > > > > > > > > > > > > Below are the flow to disk logs from broker > having > > > > > > 3million+ > > > > > > > > messages > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > at > > > > > > > > > > > > > > this time. We only have one virtual host. Time is > > in > > > > GMT. > > > > > > Looks > > > > > > > > like > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > flow > > > > > > > > > > > > > > to disk is active on the whole virtual host and > > not a > > > > > > queue level. > > > > > > > > > > > > > > > > > > > > > > > > > > > > When the same broker went OOM yesterday, I did > not > > > see > > > > > any > > > > > > flow to > > > > > > > > > > > > > > disk > > > > > > > > > > > > > > logs from when it was started until it crashed > > > (crashed > > > > > > twice > > > > > > > > within > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4hrs). 
> > > > > > > > > > > > > > 4/19/17 4:17:43.509 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active : Message memory use 3356539KB exceeds threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:31:13.502 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1015 : Message flow to disk inactive : Message memory use 3354866KB within threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:28:43.511 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active : Message memory use 3358509KB exceeds threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:20:13.500 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1015 : Message flow to disk inactive : Message memory use 3353501KB within threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:18:13.500 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active : Message memory use 3357544KB exceeds threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:08:43.501 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1015 : Message flow to disk inactive : Message memory use 3353236KB within threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:08:13.501 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active : Message memory use 3356704KB exceeds threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:00:43.500 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1015 : Message flow to disk inactive : Message memory use 3353511KB within threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:00:13.504 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active : Message memory use 3357948KB exceeds threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 1:50:43.501 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1015 : Message flow to disk inactive : Message memory use 3355310KB within threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 1:47:43.501 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active : Message memory use 3365624KB exceeds threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 1:43:43.501 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1015 : Message flow to disk inactive : Message memory use 3355136KB within threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 1:31:43.509 AM INFO [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active : Message memory use 3358683KB exceeds threshold 3355443KB
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > After production release (2 days back), we have seen 4 crashes in 3 different brokers; this is the most pressing concern for us in deciding whether we should roll back to 0.32. Any help is greatly appreciated.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 9:36 AM, Oleksandr Rudyy <[email protected]> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Ramayan,
> > > > > > > > > > > > > > > Thanks for the details. I would like to clarify whether flow to disk was triggered today for 3 million messages?
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The following logs are issued for flow to disk: > > > > > > > > > > > > > > > BRK-1014 : Message flow to disk active : > Message > > > > > memory > > > > > > use > > > > > > > > > > > > > > > {0,number,#}KB > > > > > > > > > > > > > > > exceeds threshold {1,number,#.##}KB > > > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : > > Message > > > > > > memory use > > > > > > > > > > > > > > > {0,number,#}KB within threshold > {1,number,#.##}KB > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kind Regards, > > > > > > > > > > > > > > > Alex > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 19 April 2017 at 17:10, Ramayan Tiwari < > > > > > > > > [email protected]> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Alex, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for your response, here are the > details: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We use "direct" exchange, without persistence > > (we > > > > > > specify > > > > > > > > > > > > > > > NON_PERSISTENT > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > that while sending from client) and use BDB > > > store. > > > > We > > > > > > use JSON > > > > > > > > > > > > > > > > virtual > > > > > > > > > > > > > > > host > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > type. We are not using SSL. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > When the broker went OOM, we had around 1.3 > > > million > > > > > > messages > > > > > > > > with > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 100 > > > > > > > > > > > > > > > bytes > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > average message size. Direct memory > allocation > > > > (value > > > > > > read from > > > > > > > > > > > > > > > > MBean) > > > > > > > > > > > > > > > kept > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > going up, even though it wouldn't need more > DM > > to > > > > > > store these > > > > > > > > many > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > messages. DM allocated persisted at 99% for > > > about 3 > > > > > > and half > > > > > > > > hours > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > before > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > crashing. 
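As a cross-check of the threshold figure in the BRK-1014/BRK-1015 lines above, assuming the threshold is simply the configured 80% of the 4 GB maximum direct memory: 4 GB = 4 * 1024 * 1024 KB = 4194304 KB, and 0.8 * 4194304 KB = 3355443.2 KB, which matches the 3355443KB the broker logs.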
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Today, on the same broker we have 3 million > > > > messages > > > > > > (same > > > > > > > > message > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > size) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and DM allocated is only at 8%. This seems > like > > > > there > > > > > > is some > > > > > > > > > > > > > > > > issue > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > de-allocation or a leak. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have uploaded the memory utilization graph > > > here: > > > > > > > > > > > > > > > > https://drive.google.com/file/d/ > > > > > > 0Bwi0MEV3srPRVHFEbDlIYUpLaUE/ > > > > > > > > > > > > > > > > view?usp=sharing > > > > > > > > > > > > > > > > Blue line is DM allocated, Yellow is DM Used > > (sum > > > > of > > > > > > queue > > > > > > > > > > > > > > > > payload) > > > > > > > > > > > > > > > and Red > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > is heap usage. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > Ramayan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 4:10 AM, Oleksandr > > Rudyy > > > > > > > > > > > > > > > > <[email protected]> > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Ramayan, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Could please share with us the details of > > > > messaging > > > > > > use > > > > > > > > case(s) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > which > > > > > > > > > > > > > > > > ended > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > up in OOM on broker side? > > > > > > > > > > > > > > > > > I would like to reproduce the issue on my > > local > > > > > > broker in > > > > > > > > order > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > fix > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > it. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I would appreciate if you could provide as > > much > > > > > > details as > > > > > > > > > > > > > > > > > possible, > > > > > > > > > > > > > > > > > including, messaging topology, message > > > > persistence > > > > > > type, > > > > > > > > message > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > sizes,volumes, etc. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Qpid Broker 6.0.x uses direct memory for > > > keeping > > > > > > message > > > > > > > > content > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > receiving/sending data. Each plain > connection > > > > > > utilizes 512K of > > > > > > > > > > > > > > > > > direct > > > > > > > > > > > > > > > > > memory. Each SSL connection uses 1M of > direct > > > > > > memory. Your > > > > > > > > > > > > > > > > > memory > > > > > > > > > > > > > > > > settings > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > look Ok to me. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kind Regards, > > > > > > > > > > > > > > > > > Alex > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 18 April 2017 at 23:39, Ramayan Tiwari > > > > > > > > > > > > > > > > > <[email protected]> > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We are using Java broker 6.0.5, with > patch > > to > > > > use > > > > > > > > > > > > > > > MultiQueueConsumer > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > feature. We just finished deploying to > > > > production > > > > > > and saw > > > > > > > > > > > > > > > > > > couple of > > > > > > > > > > > > > > > > > > instances of broker OOM due to running > out > > of > > > > > > DirectMemory > > > > > > > > > > > > > > > > > > buffer > > > > > > > > > > > > > > > > > > (exceptions at the end of this email). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Here is our setup: > > > > > > > > > > > > > > > > > > 1. Max heap 12g, max direct memory 4g > (this > > > is > > > > > > opposite of > > > > > > > > > > > > > > > > > > what the > > > > > > > > > > > > > > > > > > recommendation is, however, for our use > > cause > > > > > > message > > > > > > > > payload > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > really > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > small ~400bytes and is way less than the > > per > > > > > > message > > > > > > > > overhead > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > 1KB). 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > perf testing, we were able to put 2 > million > > > > > > messages without > > > > > > > > > > > > > > > > > > any > > > > > > > > > > > > > > > > issues. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. ~400 connections to broker. > > > > > > > > > > > > > > > > > > 3. Each connection has 20 sessions and > > there > > > is > > > > > > one multi > > > > > > > > > > > > > > > > > > queue > > > > > > > > > > > > > > > > consumer > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > attached to each session, listening to > > around > > > > > 1000 > > > > > > queues. > > > > > > > > > > > > > > > > > > 4. We are still using 0.16 client (I > know). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > With the above setup, the baseline > > > utilization > > > > > > (without any > > > > > > > > > > > > > > > messages) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > direct memory was around 230mb (with 410 > > > > > > connection each > > > > > > > > > > > > > > > > > > taking > > > > > > > > > > > > > > > 500KB). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Based on our understanding of broker > memory > > > > > > allocation, > > > > > > > > > > > > > > > > > > message > > > > > > > > > > > > > > > payload > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > should be the only thing adding to direct > > > > memory > > > > > > utilization > > > > > > > > > > > > > > > > > > (on > > > > > > > > > > > > > > > top of > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > baseline), however, we are experiencing > > > > something > > > > > > completely > > > > > > > > > > > > > > > different. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > our last broker crash, we see that broker > > is > > > > > > constantly > > > > > > > > > > > > > > > > > > running > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 90%+ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > direct memory allocated, even when > message > > > > > payload > > > > > > sum from > > > > > > > > > > > > > > > > > > all the > > > > > > > > > > > > > > > > > queues > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > is only 6-8% (these % are against > available > > > DM > > > > of > > > > > > 4gb). 
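Some quick arithmetic on these figures, using the per-connection cost Alex quotes above: 410 plain connections x 512 KB is roughly 205 MB, close to the ~230 MB baseline observed, and the pooled buffers of the virtualhost IO threads Keith mentions would plausibly account for much of the remainder (80 threads x 256 KB = 20 MB on a 40-core box). A payload sum of 6-8% of 4 GB, on the other hand, is only about 250-330 MB, so the ~90% (roughly 3.6 GB) actually allocated cannot be explained by message payload alone; that gap is what the sparse-buffer discussion above is about.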
> > > > > > > > During > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > these > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > high > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > DM usage period, heap usage was around > 60% > > > (of > > > > > > 12gb). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We would like some help in understanding > > what > > > > > > could be the > > > > > > > > > > > > > > > > > > reason > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > these > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > high DM allocations. Are there things > other > > > > than > > > > > > message > > > > > > > > > > > > > > > > > > payload > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > AMQP > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > connection, which use DM and could be > > > > > contributing > > > > > > to these > > > > > > > > > > > > > > > > > > high > > > > > > > > > > > > > > > usage? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Another thing where we are puzzled is the > > > > > > de-allocation of > > > > > > > > DM > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > byte > > > > > > > > > > > > > > > > > buffers. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From log mining of heap and DM > utilization, > > > > > > de-allocation of > > > > > > > > > > > > > > > > > > DM > > > > > > > > > > > > > > > doesn't > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > correlate with heap GC. If anyone has > seen > > > any > > > > > > documentation > > > > > > > > > > > > > > > related to > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > this, it would be very helpful if you > could > > > > share > > > > > > that. 
> > > > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > *Exceptions*
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer memory
> > > > > > > > > > > > > > > > > >     at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.bytebuffer.QpidByteBuffer.allocateDirect(QpidByteBuffer.java:474) ~[qpid-common-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.restoreApplicationBufferForWrite(NonBlockingConnectionPlainDelegate.java:93) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:60) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:506) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:285) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:124) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:504) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:337) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:87) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:462) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > *Second exception*
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer memory
> > > > > > > > > > > > > > > > > >     at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.bytebuffer.QpidByteBuffer.allocateDirect(QpidByteBuffer.java:474) ~[qpid-common-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.<init>(NonBlockingConnectionPlainDelegate.java:45) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.NonBlockingConnection.setTransportEncryption(NonBlockingConnection.java:625) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.NonBlockingConnection.<init>(NonBlockingConnection.java:117) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.NonBlockingNetworkTransport.acceptSocketChannel(NonBlockingNetworkTransport.java:158) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.SelectorThread$SelectionTask$1.run(SelectorThread.java:191) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:462) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >     at java.util.concurrent.
> > > > > > ThreadPoolExecutor$Worker.run( > > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617) > > > > > > > > > > > > > > > > > > ~[na:1.8.0_40] > > > > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745) > > > > > > ~[na:1.8.0_40] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------ > > > ------------------------------ > > > > > > --------- > > > > > > > > > > > To unsubscribe, e-mail: [email protected]. > > org > > > > > > > > > > > For additional commands, e-mail: > > > [email protected] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------ > > > --------- > > > > > > To unsubscribe, e-mail: [email protected] > > > > > > For additional commands, e-mail: [email protected] > > > > > > > > > > > > > > > > > > > > >
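For anyone trying to correlate these errors with direct memory utilization: both traces fail inside java.nio.ByteBuffer.allocateDirect, which draws from the JVM's "direct" buffer pool (capped by -XX:MaxDirectMemorySize). A minimal, self-contained sketch of how that pool can be inspected via the standard java.lang.management API is below; the DirectMemoryProbe class name is purely illustrative and not part of the broker sources.

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Prints the JVM's view of direct buffer usage, i.e. the pool that is
// exhausted when "OutOfMemoryError: Direct buffer memory" is thrown.
public class DirectMemoryProbe
{
    public static void main(String[] args)
    {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools)
        {
            // The pool named "direct" tracks ByteBuffer.allocateDirect(),
            // which QpidByteBuffer.allocateDirect() ultimately delegates to.
            if ("direct".equals(pool.getName()))
            {
                System.out.printf("direct buffers: count=%d used=%d bytes capacity=%d bytes%n",
                                  pool.getCount(),
                                  pool.getMemoryUsed(),
                                  pool.getTotalCapacity());
            }
        }
    }
}

Sampling these values periodically (or exposing them over JMX) gives the same utilization picture as the graphs referenced earlier in the thread, without attaching a profiler to the broker.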
