We are using the open source one from ArkMQ.

Best Regards
Shiv
From: Clebert Suconic <[email protected]>
Sent: 15 January 2026 08:26 PM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM

>> We are using Artemis 2.37.0 version in K8s and Artemis IO operator version is 1.2.5.

Which one? The commercial version from Red Hat / OpenShift, or the open source one from ArkMQ?

On Thu, Jan 15, 2026 at 9:51 AM Clebert Suconic <[email protected]> wrote:

Those points you described are the reason I am suggesting a max-address-size on every destination: set max-size at, say, 20M for every destination (make it 100K for small destinations if you like, but I think 20M everywhere is mostly okay, unless you have a lot of destinations), and set max-read at 10M. That should optimize your memory usage.
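A minimal broker.xml address-settings sketch of that kind of per-destination cap; the match patterns and the small-queue name below are assumptions, while the 20M / 100K / 10M figures follow the suggestion above:

    <address-settings>
       <!-- assumed catch-all: cap each address at ~20MB in memory before it pages,
            and read back at most ~10MB of paged messages per queue at a time -->
       <address-setting match="#">
          <max-size-bytes>20971520</max-size-bytes>
          <max-read-page-bytes>10485760</max-read-page-bytes>
       </address-setting>
       <!-- assumed entry for a small, rarely used destination -->
       <address-setting match="SOME.SMALL.QUEUE">
          <max-size-bytes>102400</max-size-bytes>
          <max-read-page-bytes>10485760</max-read-page-bytes>
       </address-setting>
    </address-settings>

With per-address caps like this in place, global-max-size is left as the emergency backstop rather than the primary flow-control limit, which is the direction described later in the thread.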
On Thu, Jan 15, 2026 at 6:53 AM Shiv Kumar Dixit <[email protected]> wrote:

Hello Arthur and Clebert,

When our broker pod starts, it first runs 2 init containers, which terminate and release their resources after completing the setup. So our pod effectively runs 2 containers: one for Vault and another for the broker. We verified the memory and CPU usage of the init containers and main containers using top pod, and it shows reasonable data. Yes, we see the Linux OOMKiller is invoked, and we are trying to read its report for any meaningful information.

In the meanwhile, we have noticed the following scenario causing OOMKilling of the broker container:

1. There are a lot of pending messages on a given queue TEST, along with small numbers of pending messages on various other queues. Since we are using global max size, some of the messages are loaded in memory and the rest are in the paging folder.

2. There are 3-4 consumers on the TEST queue, but they are very slow, so the pending message backlog is not cleared. We see the following log in the broker:

AMQ224127: Message dispatch from paging is blocked. Address TEST/Queue TEST will not read any more messages from paging until pending messages are acknowledged. There are currently 5150 messages pending (20972400 bytes) with max reads at maxPageReadMessages(-1) and maxPageReadBytes(20971520). Either increase reading attributes at the address-settings or change your consumers to acknowledge more often.

3. We also see the following log in the broker:

AMQ224108: Stopped paging on address 'TEST'; size=62986496 bytes (96016 messages); maxSize=-1 bytes (-1 messages); globalSize=430581015 bytes (158406 messages); globalMaxSize=4194304000 bytes (-1 messages);

4. Can such a combination of blocked consumers and pending messages cause a broker pod to go OOM when it is running with 30 GB of heap and 40 GB of pod memory?

5. Since the consumers were not consuming messages on time and gave consent to purge the messages, we tried to purge the messages manually via the broker GUI. Sometimes it worked and more messages got loaded from the pages into broker memory, but many times the broker pod went OOM and restarted.

6. This cycle of successful purge or broker restart continued until all messages from the pages were loaded into memory and purged. After the cleanup there were no further broker restarts.

7. Can purging messages via the broker GUI cause OOM even though the broker pod is running with 30 GB of heap and 40 GB of pod memory?

8. What is the best way to optimize the broker configuration in cases like this, where we will always have slow consumers and possibly a lot of pending messages in memory and in the paging folders?

The impacted broker pod A has a network bridge to another independent broker pod B in a hub and spoke model; pod B has very few connections and almost no pending messages. We also noticed that if broker pod A goes OOM due to slow consumers and pending messages as described above, broker pod B, which is connected over the network bridge to pod A, also goes into a restart loop with OOM. Can the restart of source pod A, and the disconnection and reconnection of a small number of bridges, cause target broker pod B to restart? We have seen this side effect as well.

We are using Artemis 2.37.0 version in K8s and Artemis IO operator version is 1.2.5.

Best Regards
Shiv

From: Arthur Naseef <[email protected]>
Sent: 15 January 2026 06:40 AM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM

So 3100 connections is a large number, but that doesn't sound like a good reason for the broker pod to go OOM. Also, at 40 GB, I would say the 50% rule of thumb may be too conservative (i.e. a higher heap percentage could be reasonable), which is contradicted by your outcome.

Are there other containers running in the same pod that might be taking up memory? Maybe sidecars?

Unfortunately, I don't have a working Kubernetes setup available right now. If I did, I could poke around and try to give specific tips on checking the memory use of the pod.

Do you know if the Linux OOM killer is getting invoked? That would be reported by the kernel of the node on which the pod was executing. If you can view that report, it includes a lot of useful information, including all of the processes involved and the amount of memory used by each.

Art
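A rough sketch of where that report can be found, assuming kubectl access and shell access to the node; the namespace and pod names below are placeholders:

    # Confirm from Kubernetes that the container was OOMKilled (placeholder names)
    kubectl -n messaging describe pod broker-ss-0 | grep -A5 "Last State"
    kubectl -n messaging get events --field-selector involvedObject.name=broker-ss-0

    # On the node that ran the pod, the kernel log holds the OOM killer report,
    # including per-process memory usage at the time of the kill
    journalctl -k | grep -i -B1 -A20 "oom-killer"
    # or, if journalctl is not available on the node:
    dmesg -T | grep -i -B1 -A20 "oom"

The report typically lists every process in the affected cgroup with its resident memory, so it shows whether the broker JVM itself or another container in the pod was the one pushing past the limit.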
On Wed, Jan 14, 2026 at 3:52 PM Shiv Kumar Dixit <[email protected]> wrote:

Thanks Clebert and Arthur for the inputs. I will try your suggestions and let you know how it goes.

I have another observation based on an issue happening in live. Based on the input from Arthur, the current setup is configured with 20 GB heap and 40 GB pod memory. As the pod started, we got 3100 connections to the broker, and within minutes the pod got OOMKilled. Is there any relation between the number of connections on the broker and the pod going OOM?

Best Regards
Shiv

-----Original Message-----
From: Clebert Suconic <[email protected]>
Sent: 15 January 2026 04:06 AM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM

So, in summary, what I'm recommending is: use max-size-messages for all the queues; for your large queues use something like 10MB, and for your small queues 100K. Also keep max-read-page-bytes in use; keep it at 20M.

If I could change the past, I would have a max-size on every address we deploy, and keep global-max-size for the utmost emergency case. It's something I'm looking to change in Artemis 3.0 or 4.0. (I can't change that in a minor version, as it could break certain cases... some users that I know use heavy filtering and can't really rely on paging.)

On Wed, Jan 14, 2026 at 5:31 PM Clebert Suconic <[email protected]> wrote:
>
> I would recommend against trusting global-max-size; use max-size for all the addresses.
>
> Also, what are your reading attributes? I would recommend using the new prefetch values.
>
> And also, what operator are you using? arkmq? your own?
>
> On Wed, Jan 14, 2026 at 7:44 AM Shiv Kumar Dixit <[email protected]> wrote:
> >
> > We are hosting the Artemis broker in Kubernetes using an operator-based solution.
> > We deploy the broker as a statefulset with 2 or 4 replicas. We assign, for example,
> > 6 GB for heap and 9 GB for the pod, and 1.2 GB (1/5 of max heap) for global-max-size.
> > All addresses normally use -1 for max-size-bytes, but some less frequently used queues
> > are defined with 100KB for max-size-bytes to allow early paging.
> >
> > We have the following observations:
> >
> > 1. As the broker pod starts, the broker container immediately occupies 6 GB for max
> > heap. This seems expected as both min and max heap are the same.
> >
> > 2. Pod memory usage starts at 6+ GB and, once we have pending messages, good producers
> > and consumers connecting to the broker, invalid SSL attempts, broker GUI access, etc.
> > during normal broker operations, pod memory usage keeps increasing and eventually
> > reaches 9 GB.
> >
> > 3. Once the pod hits the limit of 9 GB, K8s kills the pod with an OOMKilling event and
> > restarts it. Here we don't see the broker container getting killed with OOM; rather the
> > pod is killed and restarted, which forces the broker to restart.
> >
> > 4. We have configured artemis.profile to capture a memory dump in case of an OOM of the
> > broker, but it never happens. So we are assuming the broker process is not going out of
> > memory, but the pod is going out of memory due to increased non-heap usage.
> >
> > 5. The only way to recover here is to increase the heap and pod memory limits from 6 GB
> > and 9 GB to higher values and wait for the next re-occurrence.
> >
> > Questions:
> >
> > 1. Is there any way to analyse what is going wrong with non-heap native memory usage?
> >
> > 2. Is non-heap native memory expected to increase to such an extent due to pending
> > messages, SSL errors, etc.?
> >
> > 3. Is there any parameter we can use to restrict the non-heap native memory usage?
> >
> > 4. Can Netty, which handles the connection side of the broker, create such memory
> > consumption and cause OOM of the pod?
> >
> > 5. Is there any monitoring parameter that can hint that the pod is potentially in danger
> > of getting killed?
> >
> > Thanks
> >
> > Shiv
>
> --
> Clebert Suconic

--
Clebert Suconic
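For the non-heap questions in the quoted message above, one way to get visibility is the JVM's own native memory tracking plus an explicit cap on direct memory. This is a sketch under assumptions: the values are illustrative, and JAVA_ARGS is used only as a stand-in for wherever the broker's JVM options are set (e.g. artemis.profile):

    # Illustrative JVM options; the sizes are assumptions, not recommendations from the thread
    JAVA_ARGS="$JAVA_ARGS -XX:NativeMemoryTracking=summary"   # enable JVM native memory accounting
    JAVA_ARGS="$JAVA_ARGS -XX:MaxDirectMemorySize=2g"         # cap NIO direct buffers
    JAVA_ARGS="$JAVA_ARGS -Dio.netty.maxDirectMemory=0"       # commonly set so Netty's direct allocations
                                                              # are counted against the JVM limit above

    # Then, inside the running broker container, a per-area breakdown of native usage:
    jcmd <broker-pid> VM.native_memory summary

Native memory tracking only covers memory the JVM itself manages (threads, metaspace, GC structures, direct buffers it knows about); memory allocated by native libraries outside the JVM, such as an OpenSSL provider if one is in use, will not appear there, so a gap between the pod's resident memory and the NMT total is itself a useful signal.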
