We are using the open source one from ArkMQ.

Best Regards
Shiv
From: Clebert Suconic <[email protected]>
Sent: 15 January 2026 08:26 PM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM

>> We are using Artemis 2.37.0 version in K8s and Artemis IO operator version is 1.2.5.

Which one? The commercial version from Red Hat / OpenShift, or the open source one from ArkMQ?

On Thu, Jan 15, 2026 at 9:51 AM Clebert Suconic <[email protected]> wrote:

Those points you described are the reason I am suggesting a max-address-size on every destination: set max-size at, say, 20M for every destination (make it 100K for small destinations if you like, but I think 20M everywhere is mostly okay, unless you have a lot of destinations), and set max-read at 10M. That should optimize your memory usage.
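A minimal broker.xml address-settings sketch of that kind of per-destination cap; the match patterns and the small-queue name below are assumptions, while the 20M / 100K / 10M figures follow the suggestion above:

    <address-settings>
       <!-- assumed catch-all: cap each address at ~20MB in memory before it pages,
            and read back at most ~10MB of paged messages per queue at a time -->
       <address-setting match="#">
          <max-size-bytes>20971520</max-size-bytes>
          <max-read-page-bytes>10485760</max-read-page-bytes>
       </address-setting>
       <!-- assumed entry for a small, rarely used destination -->
       <address-setting match="SOME.SMALL.QUEUE">
          <max-size-bytes>102400</max-size-bytes>
          <max-read-page-bytes>10485760</max-read-page-bytes>
       </address-setting>
    </address-settings>

With per-address caps like this in place, global-max-size is left as the emergency backstop rather than the primary flow-control limit, which is the direction described later in the thread.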
On Thu, Jan 15, 2026 at 6:53 AM Shiv Kumar Dixit <[email protected]> wrote:

Hello Arthur and Clebert,

When our broker pod starts, it first runs 2 init containers, which terminate and release their resources after completing the setup. So our pod effectively runs 2 containers: one for Vault and another for the broker. We verified the memory and CPU usage of the init containers and main containers using top pod, and it shows reasonable data. Yes, we see the Linux OOMKiller is invoked, and we are trying to read its report for any meaningful information.

In the meanwhile, we have noticed the following scenario causing OOMKilling of the broker container:

1. There are a lot of pending messages on a given queue TEST, along with small numbers of pending messages on various other queues. Since we are using global max size, some of the messages are loaded in memory and the rest are in the paging folder.

2. There are 3-4 consumers on the TEST queue, but they are very slow, so the pending message backlog is not cleared. We see the following log in the broker:

AMQ224127: Message dispatch from paging is blocked. Address TEST/Queue TEST will not read any more messages from paging until pending messages are acknowledged. There are currently 5150 messages pending (20972400 bytes) with max reads at maxPageReadMessages(-1) and maxPageReadBytes(20971520). Either increase reading attributes at the address-settings or change your consumers to acknowledge more often.

3. We also see the following log in the broker:

AMQ224108: Stopped paging on address 'TEST'; size=62986496 bytes (96016 messages); maxSize=-1 bytes (-1 messages); globalSize=430581015 bytes (158406 messages); globalMaxSize=4194304000 bytes (-1 messages);

4. Can such a combination of blocked consumers and pending messages cause a broker pod to go OOM when it is running with 30 GB of heap and 40 GB of pod memory?

5. Since the consumers were not consuming messages on time and gave consent to purge the messages, we tried to purge the messages manually via the broker GUI. Sometimes it worked and more messages got loaded from the pages into broker memory, but many times the broker pod went OOM and restarted.

6. This cycle of successful purge or broker restart continued until all messages from the pages were loaded into memory and purged. After the cleanup there were no further broker restarts.

7. Can purging messages via the broker GUI cause OOM even though the broker pod is running with 30 GB of heap and 40 GB of pod memory?

8. What is the best way to optimize the broker configuration in cases like this, where we will always have slow consumers and possibly a lot of pending messages in memory and in the paging folders?

The impacted broker pod A has a network bridge to another independent broker pod B in a hub and spoke model; pod B has very few connections and almost no pending messages. We also noticed that if broker pod A goes OOM due to slow consumers and pending messages as described above, broker pod B, which is connected over the network bridge to pod A, also goes into a restart loop with OOM. Can the restart of source pod A, and the disconnection and reconnection of a small number of bridges, cause target broker pod B to restart? We have seen this side effect as well.

We are using Artemis 2.37.0 version in K8s and Artemis IO operator version is 1.2.5.

Best Regards
Shiv

From: Arthur Naseef <[email protected]>
Sent: 15 January 2026 06:40 AM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM

So 3100 connections is a large number, but that doesn't sound like a good reason for the broker pod to go OOM. Also, at 40 GB, I would say the 50% rule of thumb may be too conservative (i.e. a higher heap percentage could be reasonable), which is contradicted by your outcome.

Are there other containers running in the same pod that might be taking up memory? Maybe sidecars?

Unfortunately, I don't have a working Kubernetes setup available right now. If I did, I could poke around and try to give specific tips on checking the memory use of the pod.

Do you know if the Linux OOM killer is getting invoked? That would be reported by the kernel of the node on which the pod was executing. If you can view that report, it includes a lot of useful information, including all of the processes involved and the amount of memory used by each.

Art
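A rough sketch of where that report can be found, assuming kubectl access and shell access to the node; the namespace and pod names below are placeholders:

    # Confirm from Kubernetes that the container was OOMKilled (placeholder names)
    kubectl -n messaging describe pod broker-ss-0 | grep -A5 "Last State"
    kubectl -n messaging get events --field-selector involvedObject.name=broker-ss-0

    # On the node that ran the pod, the kernel log holds the OOM killer report,
    # including per-process memory usage at the time of the kill
    journalctl -k | grep -i -B1 -A20 "oom-killer"
    # or, if journalctl is not available on the node:
    dmesg -T | grep -i -B1 -A20 "oom"

The report typically lists every process in the affected cgroup with its resident memory, so it shows whether the broker JVM itself or another container in the pod was the one pushing past the limit.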
On Wed, Jan 14, 2026 at 3:52 PM Shiv Kumar Dixit <[email protected]> wrote:

Thanks Clebert and Arthur for the inputs. I will try your suggestions and let you know how it goes.

I have another observation based on an issue happening in live. Based on the input from Arthur, the current setup is configured with 20 GB heap and 40 GB pod memory. As the pod started, we got 3100 connections to the broker, and within minutes the pod got OOMKilled. Is there any relation between the number of connections on the broker and the pod going OOM?

Best Regards
Shiv

-----Original Message-----
From: Clebert Suconic <[email protected]>
Sent: 15 January 2026 04:06 AM
To: [email protected]
Subject: Re: K8s broker pod getting killed with OOM

So, in summary, what I'm recommending is: use max-size-messages for all the queues; for your large queues use something like 10MB, and for your small queues 100K. Also keep max-read-page-bytes in use; keep it at 20M.

If I could change the past, I would have a max-size on every address we deploy, and keep global-max-size for the utmost emergency case. It's something I'm looking to change in Artemis 3.0 or 4.0. (I can't change that in a minor version, as it could break certain cases... some users that I know use heavy filtering and can't really rely on paging.)

On Wed, Jan 14, 2026 at 5:31 PM Clebert Suconic <[email protected]> wrote:
>
> I would recommend against trusting global-max-size; use max-size for all the addresses.
>
> Also, what are your reading attributes? I would recommend using the new prefetch values.
>
> And also, what operator are you using? arkmq? your own?
>
> On Wed, Jan 14, 2026 at 7:44 AM Shiv Kumar Dixit <[email protected]> wrote:
> >
> > We are hosting the Artemis broker in Kubernetes using an operator-based solution.
> > We deploy the broker as a statefulset with 2 or 4 replicas. We assign, for example,
> > 6 GB for heap and 9 GB for the pod, and 1.2 GB (1/5 of max heap) for global-max-size.
> > All addresses normally use -1 for max-size-bytes, but some less frequently used queues
> > are defined with 100KB for max-size-bytes to allow early paging.
> >
> > We have the following observations:
> >
> > 1. As the broker pod starts, the broker container immediately occupies 6 GB for max
> > heap. This seems expected as both min and max heap are the same.
> >
> > 2. Pod memory usage starts at 6+ GB and, once we have pending messages, good producers
> > and consumers connecting to the broker, invalid SSL attempts, broker GUI access, etc.
> > during normal broker operations, pod memory usage keeps increasing and eventually
> > reaches 9 GB.
> >
> > 3. Once the pod hits the limit of 9 GB, K8s kills the pod with an OOMKilling event and
> > restarts it. Here we don't see the broker container getting killed with OOM; rather the
> > pod is killed and restarted, which forces the broker to restart.
> >
> > 4. We have configured artemis.profile to capture a memory dump in case of an OOM of the
> > broker, but it never happens. So we are assuming the broker process is not going out of
> > memory, but the pod is going out of memory due to increased non-heap usage.
> >
> > 5. The only way to recover here is to increase the heap and pod memory limits from 6 GB
> > and 9 GB to higher values and wait for the next re-occurrence.
> >
> > Questions:
> >
> > 1. Is there any way to analyse what is going wrong with non-heap native memory usage?
> >
> > 2. Is non-heap native memory expected to increase to such an extent due to pending
> > messages, SSL errors, etc.?
> >
> > 3. Is there any parameter we can use to restrict the non-heap native memory usage?
> >
> > 4. Can Netty, which handles the connection side of the broker, create such memory
> > consumption and cause OOM of the pod?
> >
> > 5. Is there any monitoring parameter that can hint that the pod is potentially in danger
> > of getting killed?
> >
> > Thanks
> >
> > Shiv
>
> --
> Clebert Suconic

--
Clebert Suconic
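For the non-heap questions in the quoted message above, one way to get visibility is the JVM's own native memory tracking plus an explicit cap on direct memory. This is a sketch under assumptions: the values are illustrative, and JAVA_ARGS is used only as a stand-in for wherever the broker's JVM options are set (e.g. artemis.profile):

    # Illustrative JVM options; the sizes are assumptions, not recommendations from the thread
    JAVA_ARGS="$JAVA_ARGS -XX:NativeMemoryTracking=summary"   # enable JVM native memory accounting
    JAVA_ARGS="$JAVA_ARGS -XX:MaxDirectMemorySize=2g"         # cap NIO direct buffers
    JAVA_ARGS="$JAVA_ARGS -Dio.netty.maxDirectMemory=0"       # commonly set so Netty's direct allocations
                                                              # are counted against the JVM limit above

    # Then, inside the running broker container, a per-area breakdown of native usage:
    jcmd <broker-pid> VM.native_memory summary

Native memory tracking only covers memory the JVM itself manages (threads, metaspace, GC structures, direct buffers it knows about); memory allocated by native libraries outside the JVM, such as an OpenSSL provider if one is in use, will not appear there, so a gap between the pod's resident memory and the NMT total is itself a useful signal.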
