We are hosting an ActiveMQ Artemis broker in Kubernetes using an operator-based 
solution. We deploy the broker as a StatefulSet with 2 or 4 replicas. As an 
example sizing, we assign 6 GB for the heap, 9 GB for the pod memory limit, and 
1.2 GB (1/5 of max heap) for global-max-size. All addresses normally use -1 for 
max-size-bytes, but some less frequently used queues are defined with 100 KB 
for max-size-bytes to allow early paging.
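
For context, the address-settings described above look roughly like this in 
broker.xml (the queue match pattern is illustrative, not our real naming):

```xml
<address-settings>
   <!-- default: no per-address limit; only global-max-size applies -->
   <address-setting match="#">
      <max-size-bytes>-1</max-size-bytes>
   </address-setting>
   <!-- illustrative low-traffic queue forced to page early at ~100 KB -->
   <address-setting match="infrequent.queue.example">
      <max-size-bytes>102400</max-size-bytes>
      <address-full-policy>PAGE</address-full-policy>
   </address-setting>
</address-settings>
```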

We have the following observations:
1. As the broker pod starts, the broker container immediately occupies 6 GB for 
the max heap. This is expected, since min and max heap are set to the same 
value.
2. Pod memory usage starts at just over 6 GB. During normal broker operations - 
as messages accumulate, legitimate producers and consumers connect, invalid SSL 
handshake attempts occur, the broker web console is accessed, etc. - pod memory 
usage keeps increasing and eventually reaches 9 GB.
3. Once the pod hits the 9 GB limit, Kubernetes kills it with an OOMKilled 
event and restarts it. We do not see the broker process dying with a JVM 
OutOfMemoryError; rather, the pod itself is killed and restarted, which forces 
the broker to restart.
4. We have configured artemis.profile to capture a memory dump on JVM 
OutOfMemoryError, but one is never produced. So we assume the broker's Java 
heap is not being exhausted; instead, the pod is exceeding its memory limit due 
to increased non-heap (native) usage.
5. The only way to recover is to increase the heap and pod memory limits from 
6 GB and 9 GB to higher values and wait for the next occurrence.
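
One thing we are considering to test the "native memory" theory is JVM Native 
Memory Tracking plus an explicit direct-memory cap. Sketch below, assuming the 
standard HotSpot flags are passed via JAVA_ARGS in artemis.profile (the exact 
variable name may differ by broker version, and the 1g cap is just an example):

```shell
# In artemis.profile, append to JAVA_ARGS (NMT adds some overhead):
JAVA_ARGS="$JAVA_ARGS -XX:NativeMemoryTracking=summary"

# Optionally cap NIO/Netty direct buffers explicitly instead of
# letting them default to roughly the max heap size:
JAVA_ARGS="$JAVA_ARGS -XX:MaxDirectMemorySize=1g"

# Then, inside the running pod, compare native usage over time:
jcmd <broker-pid> VM.native_memory baseline
# ... wait for the growth to occur ...
jcmd <broker-pid> VM.native_memory summary.diff
```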

1. Is there any way to analyse what is going wrong with non-heap native memory 
usage?
2. Is non-heap native memory expected to grow to such an extent due to pending 
messages, SSL errors, etc.?
3. Is there any parameter we can use to restrict non-heap native memory usage?
4. Could Netty, which handles the connection layer of the broker, cause such 
memory consumption and lead to the pod OOM?
5. Is there any monitoring metric that can warn us that the pod is in danger of 
being killed?
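
On the monitoring side, a sketch of what we could watch, assuming the standard 
cAdvisor metrics are scraped by Prometheus (metric names are the cAdvisor 
defaults; the 0.9 threshold and pod name are placeholders):

```shell
# Quick manual check of per-container usage against the limit:
kubectl top pod <broker-pod> --containers

# Prometheus alert idea: the kubelet OOM-kills a container when its
# working set exceeds its memory limit, so alert well before that:
#   container_memory_working_set_bytes{pod="<broker-pod>"}
#     / container_spec_memory_limit_bytes{pod="<broker-pod>"} > 0.9
```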

Thanks
Shiv
