Hello,

First of all, this is my stack:

- Ubuntu 22.04.3 on x86/64 with 2 GB of physical RAM, which has been enough
for years.
- Java 11.0.20.1+1-post-Ubuntu-0ubuntu122.04 / openjdk 11.0.20.1 2023-08-24
- Tomcat 9.0.58 (JAVA_OPTS="-Djava.awt.headless=true -Xmx900m -Xms16m
......")
- My app, which I developed myself and which has been running without any
OOM crashes for years

Well, a couple of weeks ago my website started crashing about every 5-7
days. Between crashes the RAM usage is fine and very steady (as it has been
for years), and it uses only about 50% of the "Max memory" (according to
what the Tomcat Manager server status shows). The three G1 heap pools are
steady and low, and there are no leaks as far as I can tell. I haven't
made any significant changes to my app in recent months.

When my website crashes, I can see in the Ubuntu log that some process has
invoked the "oom-killer", which then looks for the process using the most
RAM. That process is Tomcat/Java, so it gets killed. This is what I see in
the log from an occasion when it was Nginx that invoked the OOM killer:

Nov 15 15:23:54 ip-172-31-89-211 kernel: [366008.597771]
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=nginx.service,mems_allowed=0,global_oom,task_memcg=/system.slice/tomcat9.service,task=java,pid=470,uid=998
Nov 15 15:23:54 ip-172-31-89-211 kernel: [366008.597932] Out of memory:
Killed process 470 (java) total-vm:4553056kB, anon-rss:1527944kB,
file-rss:2872kB, shmem-rss:0kB, UID:998 pgtables:3628kB oom_score_adj:0
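
If I convert the figures in that log line, the resident memory of the Java
process was well above the 900 MB I allow for the heap with -Xmx900m, so at
least part of it must have been outside the Java heap. Here is a small
sketch of that arithmetic. The class and method names are only mine, for
illustration, and the numbers are copied from the log line above:

```java
// Hypothetical helper, only to convert the kernel's kB figures and compare them with -Xmx.
final class OomFigures {

    /** Converts kilobytes (as printed by the kernel) to whole megabytes. */
    static long kbToMb(long kb) {
        return kb / 1024;
    }

    /** Lower bound on memory that cannot be Java heap, given the -Xmx limit in MB. */
    static long outsideHeapMb(long anonRssKb, long xmxMb) {
        return Math.max(0, kbToMb(anonRssKb) - xmxMb);
    }

    public static void main(String[] args) {
        long anonRssKb = 1527944L; // anon-rss from the oom-kill line
        long totalVmKb = 4553056L; // total-vm from the oom-kill line
        long xmxMb = 900L;         // from -Xmx900m in JAVA_OPTS

        System.out.println("anon-rss MB:           " + kbToMb(anonRssKb));
        System.out.println("total-vm MB:           " + kbToMb(totalVmKb));
        System.out.println("MB outside heap (min): " + outsideHeapMb(anonRssKb, xmxMb));
    }
}
```

So the process had roughly 1492 MB resident, which is at least 592 MB more
than the heap could ever hold. I don't know yet what that extra memory was.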

I would like to be able to know what was happening inside the JVM when it
was using too much RAM and deserved to be killed. Was it a problem in Java
not associated with Tomcat or my app? Was it Tomcat itself that ate too
much RAM? I doubt it. Was it my application? If it was my application (and
I have to assume it was), how/why was it using all that RAM? What were the
objects, threads, etc. that were involved in the crash? What part of the
heap memory was using all that RAM?

This can happen at any time, for example at 4 am, so I cannot run to the
computer to see what is going on at that moment. I need some way to get a
detailed log of what was going on when the crash took place.

So my question is, what tool should I use to investigate these crashes? I
have started trying to make "New Relic" work since it seems that this
service could help me, but I am having some problems making it work and I
still don't know if this would be a solution in the first place. So, while
I struggle with New Relic, I would appreciate your suggestions.

Thanks in advance!
