Thanks, Christian. Today I noticed something that is totally new to me. Prometheus went down, and I found the query that caused it. Strangely, when I checked, the server had not gone OOM: memory usage dropped straight from a constant 77% to zero. Usually, when a query takes a long time, memory usage spikes and Prometheus crashes because of OOM. This time there was no sudden spike in either CPU or memory utilization.
Any thoughts on this?

On Monday, November 9, 2020 at 5:31:18 PM UTC+5:30 Christian Hoffmann wrote:
> Hi,
>
> On 11/9/20 10:56 AM, [email protected] wrote:
> > Hi. I am using Prometheus v 2.20.1 and suddenly my Prometheus crashed
> > because of Memory overshoot. How to pinpoint what caused the Prometheus
> > to go OOM or which query caused the Prometheus go OOM?
>
> Prometheus writes the currently active queries to a file which is read
> upon restart. Prometheus will print all unfinished queries, see here:
>
> https://www.robustperception.io/what-queries-were-running-when-prometheus-died
>
> This should help pin-pointing the relevant queries.
>
> Often it's some combination of querying long timestamps and/or high
> cardinality metrics.
>
> Kind regards,
> Christian
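
For anyone following Christian's pointer, here is a rough sketch of how the unfinished queries could be pulled out of the Prometheus startup log. It assumes the restart message is a logfmt line containing the text "These queries didn't finish in prometheus' last run" with a `queries="..."` field holding an escaped JSON list, as shown in the linked article; the exact format may differ between versions. The function name `unfinished_queries` is made up for this example.

```python
import json
import re

# Assumption: text of the message Prometheus logs on restart when it finds
# queries left over from the previous run. Adjust if your version differs.
MARKER = "These queries didn't finish in prometheus' last run"

# Assumption: the queries are in a quoted logfmt field named "queries".
# The pattern accepts escaped characters (such as \") inside the quotes.
QUERIES_FIELD = re.compile(r'queries="((?:[^"\\]|\\.)*)"')


def unfinished_queries(log_text):
    """Return the query entries found in matching log lines (hypothetical helper)."""
    found = []
    for line in log_text.splitlines():
        if MARKER not in line:
            continue
        match = QUERIES_FIELD.search(line)
        if match is None:
            continue
        # Assumption: the quoted logfmt value uses JSON-style escaping, so it
        # can be decoded as a JSON string first, then parsed as a JSON list.
        inner = json.loads('"' + match.group(1) + '"')
        found.extend(json.loads(inner))
    return found
```

Something like `unfinished_queries(open("prometheus.log").read())` would then return a list of entries, each with the query text and the time it started (`prometheus.log` is a placeholder for wherever your log output goes). Lines that do not contain the marker are ignored.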

