>> We are using Artemis version 2.37.0 in K8s, and the Artemis IO operator version is 1.2.5.
Which one? The commercial version from Red Hat / OpenShift, or the open-source one from ArkMQ?

On Thu, Jan 15, 2026 at 9:51 AM Clebert Suconic <[email protected]> wrote:

> Those points you described are the reason why I suggested using a
> max-size on every destination:
>
> - have max-size at, say, 20M for every destination (make it 100K for
>   small destinations if you like, but 20M for every destination is mostly
>   okay, unless you have a lot of destinations);
> - have max-read at 10M.
>
> This should then optimize your memory usage.
>
> On Thu, Jan 15, 2026 at 6:53 AM Shiv Kumar Dixit <[email protected]> wrote:
>
>> Hello Arthur and Clebert,
>>
>> When our broker pod starts, it first runs 2 init containers, which
>> terminate and release their resources after completing the setup. So the
>> pod effectively runs 2 containers: one for Vault and one for the broker.
>> We verified the memory and CPU usage of these init containers and the
>> main containers using top pod, and it shows reasonable numbers.
>>
>> Yes, we see the Linux OOM killer being invoked, and we are trying to
>> read its report for any meaningful information.
>>
>> In the meanwhile, we have noticed that the scenario below causes
>> OOM-killing of the broker container.
>>
>> 1. There are a lot of pending messages on a given queue TEST, along with
>> small numbers of pending messages on various other queues. Since we are
>> using global max size, some of the messages are loaded in memory and the
>> rest are in the paging folder.
>>
>> 2. There are 3-4 consumers on the TEST queue, but they are very slow, so
>> the pending-message backlog is not cleared. We see the following log in
>> the broker:
>>
>> AMQ224127: Message dispatch from paging is blocked. Address TEST/Queue
>> TEST will not read any more messages from paging until pending messages
>> are acknowledged. There are currently 5150 messages pending (20972400
>> bytes) with max reads at maxPageReadMessages(-1) and
>> maxPageReadBytes(20971520).
>> Either increase reading attributes at the address-settings or change
>> your consumers to acknowledge more often.
>>
>> 3. We also see the following log in the broker:
>>
>> AMQ224108: Stopped paging on address 'TEST'; size=62986496 bytes (96016
>> messages); maxSize=-1 bytes (-1 messages); globalSize=430581015 bytes
>> (158406 messages); globalMaxSize=4194304000 bytes (-1 messages);
>>
>> 4. Can such a combination of blocked consumers and pending messages
>> cause the broker pod to go OOM, given that it runs with a 30 GB heap and
>> 40 GB of pod memory?
>>
>> 5. Since the consumers were not consuming messages on time and gave
>> consent to purge the messages, we tried to purge the messages manually
>> via the broker GUI. Sometimes it worked and more messages got loaded
>> from pages into broker memory, but many times the broker pod went OOM
>> and restarted.
>>
>> 6. This cycle of successful purges and broker restarts continued until
>> all messages from the pages had been loaded into memory and purged.
>> After the cleanup there were no more broker restarts.
>>
>> 7. Can purging messages via the broker GUI cause OOM even though the
>> broker pod runs with a 30 GB heap and 40 GB of pod memory?
>>
>> 8. What is the best way to optimize the broker configuration in cases
>> like ours, where we will always have slow consumers and possibly a lot
>> of pending messages in memory and in the paging folders?
>>
>> The impacted broker pod A has a network bridge to another independent
>> broker pod B in a hub-and-spoke model; pod B has very few connections
>> and almost no pending messages. We also noticed that when broker pod A
>> goes OOM due to slow consumers and pending messages as described above,
>> broker pod B, which is connected to pod A over the network bridge, also
>> goes into a restart loop with OOM. Can the restart of source pod A, and
>> the disconnection/reconnection of a small number of bridges, cause
>> target broker pod B to restart?
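The "reading attributes at the address-settings" that AMQ224127 refers to, combined with the per-destination max-size suggested earlier in the thread, could be sketched in broker.xml roughly like this. The match pattern and all values are illustrative, not a tested configuration:

```xml
<address-settings>
   <!-- Illustrative defaults: cap each address at ~20 MB in memory and
        page beyond that, instead of relying only on global-max-size. -->
   <address-setting match="#">
      <max-size-bytes>20971520</max-size-bytes>       <!-- ~20 MB per address -->
      <page-size-bytes>1048576</page-size-bytes>      <!-- 1 MB page files -->
      <address-full-policy>PAGE</address-full-policy>
      <!-- Bound how much is read back from paging into memory at once -->
      <max-read-page-bytes>10485760</max-read-page-bytes>  <!-- ~10 MB -->
      <max-read-page-messages>-1</max-read-page-messages>
   </address-setting>
</address-settings>
```

A more specific `match` (e.g. per queue prefix) could give small destinations a lower cap, as suggested in the thread.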
We have seen this side effect as well.

>> We are using Artemis version 2.37.0 in K8s, and the Artemis IO operator
>> version is 1.2.5.
>>
>> Best Regards
>> Shiv
>>
>> From: Arthur Naseef <[email protected]>
>> Sent: 15 January 2026 06:40 AM
>> To: [email protected]
>> Subject: Re: K8s broker pod getting killed with OOM
>>
>> So 3100 connections is a large number, but that doesn't sound like a
>> good reason for the broker pod to go OOM. Also, at 40 GB, I would say
>> the 50% rule of thumb may be too conservative (i.e. a higher heap
>> percentage could be reasonable), which is contradicted by your outcome.
>> Are there other containers running in the same pod that might be taking
>> up memory? Maybe sidecars?
>>
>> Unfortunately, I don't have a working Kubernetes setup available right
>> now. If I did, I could poke around and try to give specific tips on
>> checking the memory use of the pod.
>>
>> Do you know if the Linux OOM killer is getting invoked? That would be
>> reported by the kernel of the node on which the pod was executing. If
>> you can view that report, it includes a lot of useful information,
>> including all of the processes involved and the amount of memory used
>> by each.
>>
>> Art
>>
>> On Wed, Jan 14, 2026 at 3:52 PM Shiv Kumar Dixit <[email protected]> wrote:
>>
>> Thanks Clebert and Arthur for the inputs. I will try your suggestions
>> and let you know how it goes.
>>
>> I have another observation from the issue happening in live. Based on
>> Arthur's input, the current setup is configured with a 20 GB heap and a
>> 40 GB pod. As the pod started, we got 3100 connections to the broker,
>> and within minutes the pod got OOMKilled.
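Checking for the kernel OOM report that Art describes, and for per-container usage, can be done with standard tooling; a sketch, where the pod name is a placeholder:

```shell
# On the worker node (or a node debug shell): the kernel logs a report at
# kill time listing every task in the cgroup and its memory usage.
dmesg -T | grep -i -B 2 -A 20 'out of memory'

# From the cluster side: the container's last terminated state records an
# OOMKilled reason, and `kubectl top` shows current per-container usage.
kubectl describe pod <broker-pod> | grep -B 2 -A 5 'Last State'
kubectl top pod <broker-pod> --containers
```

The `dmesg` report is the authoritative answer to "was the Linux OOM killer invoked", since the `OOMKilled` status in Kubernetes is derived from it.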
>> Is there any relation between the number of connections on the broker
>> and the pod going OOM?
>>
>> Best Regards
>> Shiv
>>
>> -----Original Message-----
>> From: Clebert Suconic <[email protected]>
>> Sent: 15 January 2026 04:06 AM
>> To: [email protected]
>> Subject: Re: K8s broker pod getting killed with OOM
>>
>> So, in summary, what I'm recommending is:
>>
>> - use max-size-messages for all the queues; for your large queues use
>>   something like 10MB, and for your small queues 100K;
>> - also keep max-read-page-bytes in use; keep it at 20M.
>>
>> If I could change the past, I would have a max-size on every address we
>> deploy, and keep global-max-size for the utmost emergency case. It's
>> something I'm looking to change in Artemis 3.0 or 4.0. (I can't change
>> it in a minor version, as it could break certain cases; some users I
>> know of use heavy filtering and can't really rely on paging.)
>>
>> On Wed, Jan 14, 2026 at 5:31 PM Clebert Suconic <[email protected]> wrote:
>> >
>> > I would recommend against trusting global-max-size; use max-size for
>> > all the addresses.
>> >
>> > Also, what are your reading attributes? I would recommend using the
>> > new prefetch values.
>> >
>> > And also, what operator are you using? arkmq? Your own?
>> >
>> > On Wed, Jan 14, 2026 at 7:44 AM Shiv Kumar Dixit <[email protected]> wrote:
>> > >
>> > > We are hosting the Artemis broker in Kubernetes using an
>> operator-based solution. We deploy the broker as a StatefulSet with 2 or
>> 4 replicas. We assign, for example, 6 GB for the heap, 9 GB for the pod,
>> and 1.2 GB (1/5 of the max heap) for global-max-size.
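As a rough sanity check on splits like 6 GB heap / 9 GB pod, a back-of-the-envelope budget shows how little headroom is left for direct (Netty) buffers once ordinary JVM native overheads are counted. All overhead figures below are assumptions for illustration, not measurements from this broker:

```python
# Rough, illustrative pod-memory budget for a JVM broker container.
# All overhead figures are assumed values, not measurements.
GIB = 1024 ** 3
MIB = 1024 ** 2

pod_limit  = 9 * GIB        # container memory limit
heap       = 6 * GIB        # -Xms == -Xmx, committed at startup
metaspace  = 256 * MIB      # assumed Metaspace ceiling
threads    = 300 * MIB      # assumed ~300 threads x 1 MiB stacks
code_cache = 128 * MIB      # assumed JIT code cache
native_misc = 256 * MIB     # assumed allocator/GC/JNI overhead

non_direct = heap + metaspace + threads + code_cache + native_misc
direct_headroom = pod_limit - non_direct
print(f"headroom for direct buffers: {direct_headroom / GIB:.2f} GiB")
# prints: headroom for direct buffers: 2.08 GiB
```

Under these assumed overheads, only about 2 GiB remains for direct buffers; since the JDK's default direct-memory limit tracks the max heap size (6 GiB here), the pod limit can be breached without the heap ever filling.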
>> > > All addresses normally use -1 for max-size-bytes, but some less
>> frequently used queues are defined with 100KB for max-size-bytes to
>> allow early paging.
>> > >
>> > > We have the following observations:
>> > >
>> > > 1. As the broker pod starts, the broker container immediately
>> occupies 6 GB for the max heap. This seems expected, as both min and max
>> heap are the same.
>> > >
>> > > 2. Pod memory usage starts at 6+ GB; then, as pending messages build
>> up, producers and consumers connect to the broker, invalid SSL attempts
>> happen, broker GUI access happens, etc. during normal broker operations,
>> pod memory usage keeps increasing and eventually reaches 9 GB.
>> > >
>> > > 3. Once the pod hits the 9 GB limit, K8s kills the pod with an
>> OOMKilling event and restarts it. We don't see the broker container
>> being killed with OOM; rather, the pod is killed and restarted, which
>> forces the broker to restart.
>> > >
>> > > 4. We have configured artemis.profile to capture a memory dump on
>> broker OOM, but it never triggers. So we assume the broker process is
>> not running out of heap; rather, the pod is running out of memory due to
>> increased non-heap usage.
>> > >
>> > > 5. The only way to recover is to increase the heap and pod memory
>> limits from 6 GB and 9 GB to higher values and wait for the next
>> re-occurrence.
>> > >
>> > > Our questions:
>> > >
>> > > 1. Is there any way to analyse what is going wrong with non-heap
>> native memory usage?
>> > >
>> > > 2. Is non-heap native memory expected to grow to such an extent due
>> to pending messages, SSL errors, etc.?
>> > >
>> > > 3. Is there any parameter we can use to restrict non-heap native
>> memory usage?
>> > >
>> > > 4. Can Netty, which handles the connection side of the broker,
>> create such memory consumption and cause OOM of the pod?
>> > >
>> > > 5. Is there any monitoring parameter that can hint that the pod is
>> in danger of being killed?
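On questions 1-3, the JVM itself can both account for and cap native memory. A sketch of flags that could be appended to JAVA_ARGS in artemis.profile; the cap value is illustrative and should come from measured usage:

```shell
# Track non-heap allocations inside the JVM (modest overhead):
JAVA_ARGS="$JAVA_ARGS -XX:NativeMemoryTracking=summary"

# Cap NIO/Netty direct buffers explicitly; by default the direct-memory
# limit follows the max heap size, which can be far more than the pod
# has left over above the heap:
JAVA_ARGS="$JAVA_ARGS -XX:MaxDirectMemorySize=1g"
```

With tracking enabled, running `jcmd <pid> VM.native_memory summary` against the broker's Java process breaks native usage down by area (thread stacks, code cache, internal, etc.), which addresses question 1 directly. Note it only covers JVM-tracked memory, not everything the OS charges to the container.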
>> > >
>> > > Thanks
>> > > Shiv
>> >
>> > --
>> > Clebert Suconic
>>
>> --
>> Clebert Suconic
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>
> --
> Clebert Suconic

--
Clebert Suconic
