We are hosting an Artemis broker in Kubernetes using an operator-based solution. We deploy the broker as a StatefulSet with 2 or 4 replicas. As an example, we assign 6 GB for the max heap, 9 GB as the pod memory limit, and 1.2 GB (1/5 of the max heap) for global-max-size. Most addresses use -1 for max-size-bytes, but some less frequently used queues are configured with 100KB for max-size-bytes to allow early paging.
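For reference, the memory settings described above correspond roughly to the broker.xml fragment below. This is an illustrative sketch, not our exact configuration: the match strings are made up, but the sizes are the real values we use.

    <core xmlns="urn:activemq:core">
      <!-- 1.2 GB (1/5 of the 6 GB max heap) shared by all addresses,
           expressed in bytes -->
      <global-max-size>1288490188</global-max-size>

      <address-settings>
        <!-- default: no per-address limit, only global-max-size applies -->
        <address-setting match="#">
          <max-size-bytes>-1</max-size-bytes>
          <address-full-policy>PAGE</address-full-policy>
        </address-setting>

        <!-- less frequently used queues: page early at 100 KB -->
        <address-setting match="infrequent.#">
          <max-size-bytes>102400</max-size-bytes>
          <address-full-policy>PAGE</address-full-policy>
        </address-setting>
      </address-settings>
    </core>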
We have the following observations:

1. As the broker pod starts, the broker container immediately occupies 6 GB for the heap. This seems expected, since min and max heap are set to the same value.
2. Pod memory usage starts at a little over 6 GB. Then, during normal broker operations (pending messages accumulate, well-behaved producers and consumers connect, invalid SSL connection attempts occur, the broker web console is accessed, etc.), pod memory usage keeps increasing until it reaches 9 GB.
3. Once the pod hits the 9 GB limit, Kubernetes kills it with an OOMKilling event and restarts it. We do not see the broker container's JVM dying with an OutOfMemoryError; the pod itself is killed and restarted, which forces the broker to restart.
4. We have configured artemis.profile to capture a memory dump on broker OOM, but a dump is never produced. So we assume the broker process is not running out of heap; rather, the pod is running out of memory due to increased non-heap usage. (A sketch of the JVM options involved follows after the questions.)
5. The only way to recover is to increase the heap and pod memory limits from 6 GB and 9 GB to higher values and wait for the next re-occurrence.

Our questions:

1. Is there any way to analyse what is going wrong with non-heap native memory usage?
2. Is non-heap native memory expected to increase to such an extent due to pending messages, SSL errors, etc.?
3. Is there any parameter we can use to restrict non-heap native memory usage?
4. Could Netty, which handles the connection layer of the broker, create such memory consumption and cause the pod OOM?
5. Is there any monitoring metric that can hint that the pod is in danger of being killed? (A sketch of what we have in mind is at the end.)
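For context on observation 4 and question 1, the JVM options involved look roughly like this in artemis.profile. The heap and heap-dump flags reflect what we described above (the dump path is illustrative); the NativeMemoryTracking flag and the jcmd calls are not something we run today, only the kind of native-memory accounting we are asking about:

    # artemis.profile (sketch): fixed heap plus dump-on-OOM, as described above
    JAVA_ARGS="-Xms6g -Xmx6g \
      -XX:+HeapDumpOnOutOfMemoryError \
      -XX:HeapDumpPath=/var/lib/artemis/dumps"

    # Candidate addition: HotSpot native memory tracking
    # ('summary' mode has modest overhead; 'detail' costs more)
    JAVA_ARGS="$JAVA_ARGS -XX:NativeMemoryTracking=summary"

    # Then, inside the running container, snapshot native allocations over time:
    jcmd <broker-pid> VM.native_memory baseline
    jcmd <broker-pid> VM.native_memory summary.diff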
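And to make question 5 concrete, this is the kind of check we imagine, assuming a Prometheus stack with cAdvisor and kube-state-metrics; the metric names are standard for that stack, but the container label and the 90% threshold are illustrative:

    # Container working set is the usual proxy for OOM-kill risk.
    # Flag the broker container when it exceeds 90% of its memory limit:
    max by (namespace, pod) (
      container_memory_working_set_bytes{container="broker"}
    )
      /
    max by (namespace, pod) (
      kube_pod_container_resource_limits{container="broker", resource="memory"}
    )
      > 0.9

Thanks
Shiv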
