a.) Regarding your last answer (thanks for your effort, by the way): I'm aware of the relation between the heap and the systemUsage memoryLimit, and we make sure there are no illogical settings. The primary requirement is a stable system that runs 'forever' without any memory issues at any time, independent of the load/throughput. No one really wants to deal with memory settings that sit right at the edge of their limits.
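Just for illustration, this is the kind of sanity check we apply (the numbers are examples, not a recommendation): the memoryUsage limit is kept well below the JVM max heap, so the broker itself always has headroom besides the memory store.

  <!-- example only: with -Xmx1024m we would not give more than roughly half of the heap to the memory store -->
  <systemUsage>
    <systemUsage>
      <memoryUsage>
        <memoryUsage limit="500 mb" />
      </memoryUsage>
    </systemUsage>
  </systemUsage>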
You're right: the memory is completely consumed. And I can't guarantee that the checkpoint/cleanup ever finishes completely, so the system can stall without giving the GC a chance to release memory. It is the expiry check that causes this. The persistent stores themselves seem to be managed as expected (no issues, no inconsistency, no loss); our situation is independent of the storage (reproducible for both LevelDB and KahaDB). For KahaDB we have been using 16 MB journal files for years (this saves a huge amount of the space required for pending messages that are not consumed for some days due to offline situations on the client side).

Anyway, here is our current configuration you requested:

  <persistenceAdapter>
    <kahaDB directory="${activemq.base}/data/kahadb"
            enableIndexWriteAsync="true"
            journalMaxFileLength="16mb"
            indexWriteBatchSize="10000"
            indexCacheSize="10000" />
    <!--
    <levelDB directory="${activemq.base}/data/leveldb" logSize="33554432" />
    -->
  </persistenceAdapter>

b.) A proposal concerning AMQ-6115: In my view it is worth discussing the fact that one and the same memoryLimit parameter is used for both the regular browse/consume threads and the checkpoint/cleanup threads. There should always be enough space to browse/consume any queue, at least with prefetch 1, i.e. one of the next pending messages. Maybe, in this case, two well-balanced memoryLimit parameters with priority on consumption rather than checkpoint/cleanup would allow better regulation. Or something along those lines.

c.) Our results and an acceptable solution so far: After a thorough investigation (without changing the ActiveMQ source code), our conclusion for now is that we have to accept the limitations imposed by the single memoryLimit parameter that is used both for the #checkpoint/cleanup process and for browsing/consuming queues.

**1.) Memory**

There is no problem if we use a much higher memoryLimit (together with a larger max heap) to support both the per-destination message caching during the #checkpoint/cleanup workflow and our requirement to browse/consume messages. But more memory is not an option in our scenario; we have to work with a 1024m max heap and a 500m memoryLimit. Besides that, constantly raising the memoryLimit just because more persistent queues hold hundreds or thousands of pending messages, combined with certain offline/inactive consumer scenarios, should be discussed in detail (IMHO).

**2.) Persistent Adapters**

We ruled out the persistence adapters as the cause of the problem, because the behaviour does not change if we switch between different types of persistent stores (KahaDB, LevelDB, JDBC/PostgreSQL; a rough sketch of the JDBC variant follows below). During the debugging sessions with KahaDB we also see regular checkpoint handling, and the storage is managed as expected.
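For completeness, a rough sketch along the lines of the JDBC/PostgreSQL setup we tested (all connection details and the bean id here are placeholders; depending on the ActiveMQ version the DBCP class may be org.apache.commons.dbcp2.BasicDataSource instead):

  <persistenceAdapter>
    <jdbcPersistenceAdapter dataSource="#postgres-ds" />
  </persistenceAdapter>

  <!-- placeholder datasource bean, defined outside the <broker> element -->
  <bean id="postgres-ds" class="org.apache.commons.dbcp.BasicDataSource">
    <property name="driverClassName" value="org.postgresql.Driver" />
    <property name="url" value="jdbc:postgresql://localhost:5432/activemq" />
    <property name="username" value="activemq" />
    <property name="password" value="activemq" />
  </bean>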
**3.) Destination Policy / Expiration Check**

Our problem completely disappears if we disable caching and the expiration check; the latter is the actual cause of the problem.

The corresponding properties are documented, and there is a nice blog article about message priorities with a description that fits our scenario quite well:

- http://activemq.apache.org/how-can-i-support-priority-queues.html
- http://blog.christianposta.com/activemq/activemq-message-priorities-how-it-works/

We simply added useCache="false" and expireMessagesPeriod="0" to the policyEntry:

  <destinationPolicy>
    <policyMap>
      <policyEntries>
        <policyEntry queue=">"
                     producerFlowControl="false"
                     optimizedDispatch="true"
                     memoryLimit="128mb"
                     timeBeforeDispatchStarts="1000"
                     useCache="false"
                     expireMessagesPeriod="0">
          <dispatchPolicy>
            <strictOrderDispatchPolicy />
          </dispatchPolicy>
          <pendingQueuePolicy>
            <storeCursor />
          </pendingQueuePolicy>
        </policyEntry>
      </policyEntries>
    </policyMap>
  </destinationPolicy>

The consequences are clear if we no longer use in-memory caching and never check for message expiration. Since we use neither message expiration nor message priorities, and the current message dispatching is fast enough for us, this trade-off is acceptable given the system limitations.

One should also think about well-defined prefetch limits for memory consumption during specific workflows. Message sizes in our scenario range from 2 bytes up to approx. 100 KB, so more individual policyEntries and client-side consumer configurations could help to optimize the system behaviour with regard to performance and memory usage (see http://activemq.apache.org/per-destination-policies.html); a rough sketch follows below.
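As an illustration of that idea (the destination name and the numbers below are made up, not something we run in production), a dedicated policyEntry with small prefetch limits for the queues carrying the large messages could look roughly like this:

  <policyEntries>
    <!-- hypothetical entry for queues with large (~100 KB) messages -->
    <policyEntry queue="bulk.>" queuePrefetch="1" queueBrowsePrefetch="1" memoryLimit="32mb" />
    <!-- fallback for everything else -->
    <policyEntry queue=">" queuePrefetch="100" memoryLimit="128mb" />
  </policyEntries>

On the client side, prefetch can also be limited per consumer, e.g. with a destination option such as "TEST.QUEUE?consumer.prefetchSize=1" (syntax from memory, please double-check against the prefetch documentation).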
Cheers Klaus

Am 11.01.16 um 15:35 schrieb Tim Bain:
> I believe you are correct: browsing a persistent queue uses bytes
> from the memory store, because those bytes must be read from the
> persistence store into the memory store before they can be handed
> off to browsers or consumers. If all available bytes in the memory
> store are already in use, the messages can't be paged into the
> memory store, and so the operation that required them to be paged
> in will hang/fail.
>
> You can work around the problem by increasing your memory store
> size via trial-and-error until the problem goes away. Note that
> the broker itself needs some amount of memory, so you can't give
> the whole heap over to the memory store or you'll risk getting
> OOMs, which means you may need to increase the heap size as well.
> You can estimate how much memory the broker needs aside from the
> memory store by subtracting the bytes used for the memory store
> (539 MB) from the total heap bytes used as measured via JConsole or
> similar tools. I'd double (or more) that number to be safe, if it
> was me; the last thing I want to deal with in a production
> application (ActiveMQ or anything else) is running out of memory
> because I tried to cut the memory limits too close just to save a
> little RAM.
>
> All of that is how to work around the fact that before you try to
> browse your queue, something else has already consumed all
> available bytes in the memory store. If you want to dig into why
> that's happening, we'd need to try to figure out what those bytes
> are being used for and whether it's possible to change
> configuration values to reduce the usage so it fits into your
> current limit. There will definitely be more effort required than
> simply increasing the memory limit (and max heap size), but we can
> try if you're not able to increase the limits enough to fix the
> problem.
>
> If you want to go down that path, one thread to pull on is your
> observation that you "can browse/consume some Queues _until_ the
> #checkpoint call after 30 seconds." I assume from your reference
> to checkpointing that you're using KahaDB as your persistence
> store. Can you post the KahaDB portion of your config?
>
> Your statements here and in your StackOverflow post (
> http://stackoverflow.com/questions/34679854/how-to-avoid-blocking-of-queue-browsing-after-activemq-checkpoint-call)
> indicate that you think that the problem is that memory isn't getting
> garbage collected after the operation that needed it (i.e. the
> checkpoint) completes, but it's also possible that the checkpoint
> operation isn't completing because it can't get enough messages
> read into the memory store. Have you confirmed via the thread dump
> that there is not a checkpoint operation still in progress? Also,
> how large are your journal files that are getting checkpointed? If
> they're large enough that all messages for one file won't fit into
> the memory store, you might be able to prevent the problem by using
> smaller files.
>
> Tim
>
> On Jan 8, 2016 9:32 AM, "Klaus Pittig"
> <klaus.pit...@futura4retail.com> wrote:
>
>> If I increase the JVM max heap size (4GB), the behavior does not
>> change. In my point of view, the configured memoryLimit (500 MB)
>> works as expected (heapdump shows same max. size for the
>> TextMessage content, i.e. 55002 byte[] instances containing 539
>> MB total).
>>
>> However, trying to browse a queue shows no content, even if there
>> is enough heap memory available.
>>
>> As far as i understand the sourcecode, this also due to the
>> configured memoryLimit, because - i hope this is the answer you
>> expect - the calculation for available causes hasSpace = false.
>>
>> I found this here:
>>
>> AbstractPendingMessageCursor {
>>     public boolean hasSpace() {
>>         return systemUsage != null
>>                 ? (!systemUsage.getMemoryUsage().isFull(memoryUsageHighWaterMark))
>>                 : true;
>>     }
>>
>>     public boolean isFull() {
>>         return systemUsage != null
>>                 ? systemUsage.getMemoryUsage().isFull()
>>                 : false;
>>     }
>> }
>>
>> #hasSpace is in this case called during a click on a queue in
>> the Webconsole; see the 2 stacks during this workflow:
>>
>> Daemon Thread [Queue:aaa114] (Suspended (breakpoint at line 107 in QueueStorePrefetch))
>>   owns: QueueStorePrefetch (id=6036)
>>   owns: StoreQueueCursor (id=6037)
>>   owns: Object (id=6038)
>>   QueueStorePrefetch.doFillBatch() line: 107
>>   QueueStorePrefetch(AbstractStoreCursor).fillBatch() line: 381
>>   QueueStorePrefetch(AbstractStoreCursor).reset() line: 142
>>   StoreQueueCursor.reset() line: 159
>>   Queue.doPageInForDispatch(boolean, boolean) line: 1897
>>   Queue.pageInMessages(boolean) line: 2119
>>   Queue.iterate() line: 1596
>>   DedicatedTaskRunner.runTask() line: 112
>>   DedicatedTaskRunner$1.run() line: 42
>>
>> Daemon Thread [ActiveMQ VMTransport: vm://localhost#1] (Suspended (breakpoint at line 107 in QueueStorePrefetch))
>>   owns: QueueStorePrefetch (id=5974)
>>   owns: StoreQueueCursor (id=5975)
>>   owns: Object (id=5976)
>>   owns: Object (id=5977)
>>   QueueStorePrefetch.doFillBatch() line: 107
>>   QueueStorePrefetch(AbstractStoreCursor).fillBatch() line: 381
>>   QueueStorePrefetch(AbstractStoreCursor).reset() line: 142
>>   StoreQueueCursor.reset() line: 159
>>   Queue.doPageInForDispatch(boolean, boolean) line: 1897
>>   Queue.pageInMessages(boolean) line: 2119
>>   Queue.iterate() line: 1596
>>   Queue.wakeup() line: 1822
>>   Queue.addSubscription(ConnectionContext, Subscription) line: 491
>>   ManagedQueueRegion(AbstractRegion).addConsumer(ConnectionContext, ConsumerInfo) line: 399
>>   ManagedRegionBroker(RegionBroker).addConsumer(ConnectionContext, ConsumerInfo) line: 427
>>   ManagedRegionBroker.addConsumer(ConnectionContext, ConsumerInfo) line: 244
>>   AdvisoryBroker(BrokerFilter).addConsumer(ConnectionContext, ConsumerInfo) line: 102
>>   AdvisoryBroker.addConsumer(ConnectionContext, ConsumerInfo) line: 104
>>   CompositeDestinationBroker(BrokerFilter).addConsumer(ConnectionContext, ConsumerInfo) line: 102
>>   TransactionBroker(BrokerFilter).addConsumer(ConnectionContext, ConsumerInfo) line: 102
>>   StatisticsBroker(BrokerFilter).addConsumer(ConnectionContext, ConsumerInfo) line: 102
>>   BrokerService$5(MutableBrokerFilter).addConsumer(ConnectionContext, ConsumerInfo) line: 107
>>   TransportConnection.processAddConsumer(ConsumerInfo) line: 663
>>   ConsumerInfo.visit(CommandVisitor) line: 348
>>   TransportConnection.service(Command) line: 334
>>   TransportConnection$1.onCommand(Object) line: 188
>>   ResponseCorrelator.onCommand(Object) line: 116
>>   MutexTransport.onCommand(Object) line: 50
>>   VMTransport.iterate() line: 248
>>   DedicatedTaskRunner.runTask() line: 112
>>   DedicatedTaskRunner$1.run() line: 42
>>
>>
>> Setting queueBrowsePrefetch="1" and queuePrefetch="1" in the
>> PolicyEntry for queue=">" also has no effect.
>>
>>
>> Am 08.01.16 um 16:32 schrieb Tim Bain:
>>> If you increase your JVM size (4GB, 8GB, etc., the biggest your
>>> OS and hardware will support), does the behavior change? Does
>>> it truly take all available memory, or just all the memory that
>>> you've made available to it (which isn't tiny but really isn't
>>> all that big)?
>>>
>>> Also, how do you know that the MessageCursor seems to decide
>>> that there is not enough memory and stops delivery of queue
>>> content to browsers/consumers? What symptom tells you that?
>>> On Jan 8, 2016 8:25 AM, "Klaus Pittig"
>>> <klaus.pit...@futura4retail.com> wrote:
>>>
>>>> (related issue: https://issues.apache.org/jira/browse/AMQ-6115)
>>>>
>>>> There's a problem when Using ActiveMQ with a large number of
>>>> Persistence Queues (250) á 1000 persistent TextMessages á 10 KB.
>>>>
>>>> Our scenario requires these messages to remain in the storage
>>>> over a long time (days), until they are consumed (large
>>>> amounts of data are staged for distribution for many
>>>> consumer, that could be offline for some days).
>>>>
>>>> After the Persistence Store is filled with these Messages and
>>>> after a broker restart we can browse/consume some Queues
>>>> _until_ the #checkpoint call after 30 seconds.
>>>>
>>>> This call causes the broker to use all available memory and
>>>> never releases it for other tasks such as Queue
>>>> browse/consume. Internally the MessageCursor seems to decide,
>>>> that there is not enough memory and stops delivery of queue
>>>> content to browsers/consumers.
>>>>
>>>> => Is there a way to avoid this behaviour by configuration or
>>>> is this a bug?
>>>>
>>>> The expectation is, that we can consume/browse any queue
>>>> under all circumstances.
>>>>
>>>> Settings below are in production for some time now and
>>>> several recommendations are applied found in the ActiveMQ
>>>> documentation (destination policies, systemUsage, persistence
>>>> store options etc.)
>>>>
>>>> - Behaviour is tested with ActiveMQ: 5.11.2, 5.13.0 and 5.5.1.
>>>> - Memory Settings: Xmx=1024m
>>>> - Java: 1.8 or 1.7
>>>> - OS: Windows, MacOS, Linux
>>>> - PersistenceAdapter: KahaDB or LevelDB
>>>> - Disc: enough free space (200 GB) and physical memory (16 GB max).
>>>>
>>>> Besides the above mentioned settings we use the following
>>>> settings for the broker (btw: changing the memoryLimit to a
>>>> lower value like 1mb does not change the situation):
>>>>
>>>> <destinationPolicy>
>>>>   <policyMap>
>>>>     <policyEntries>
>>>>       <policyEntry queue=">" producerFlowControl="false"
>>>>                    optimizedDispatch="true" memoryLimit="128mb"
>>>>                    timeBeforeDispatchStarts="1000">
>>>>         <dispatchPolicy>
>>>>           <strictOrderDispatchPolicy />
>>>>         </dispatchPolicy>
>>>>         <pendingQueuePolicy>
>>>>           <storeCursor />
>>>>         </pendingQueuePolicy>
>>>>       </policyEntry>
>>>>     </policyEntries>
>>>>   </policyMap>
>>>> </destinationPolicy>
>>>>
>>>> <systemUsage>
>>>>   <systemUsage sendFailIfNoSpace="true">
>>>>     <memoryUsage>
>>>>       <memoryUsage limit="50 mb" />
>>>>     </memoryUsage>
>>>>     <storeUsage>
>>>>       <storeUsage limit="80000 mb" />
>>>>     </storeUsage>
>>>>     <tempUsage>
>>>>       <tempUsage limit="1000 mb" />
>>>>     </tempUsage>
>>>>   </systemUsage>
>>>> </systemUsage>
>>>>
>>>> If we set the **cursorMemoryHighWaterMark** in the
>>>> destinationPolicy to a higher value like **150** or **600**
>>>> depending on the difference between memoryUsage and the
>>>> available heap space relieves the situation a bit for a
>>>> workaround, but this is not really an option for production
>>>> systems in my point of view.
>>>>
>>>> Screenie with information from Oracle Mission Control showing
>>>> those ActiveMQTextMessage instances that are never released
>>>> from memory:
>>>>
>>>> http://goo.gl/EjEixV
>>>>
>>>> Cheers Klaus
>>>>
>>>
>>
>