Hi all,

We have a recurring problem with Prometheus repeatedly getting OOMKilled on
startup while replaying the write-ahead log (WAL). I looked through the
GitHub issues but, as far as I could see, there is no solution and no
currently open issue covering this.

We are running on Kubernetes in GKE, deployed with the prometheus-operator
Helm chart, on Google Cloud Preemptible VMs. These VMs are terminated at
least once every 24 hours, so our Prometheus pods also get killed and are
automatically rescheduled by Kubernetes (the data is on a persistent volume,
of course). To avoid losing metrics, we run two identically configured
replicas, each with its own storage, scraping the same set of targets.
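
For reference, our replica setup corresponds roughly to the following
excerpt from our Helm values for the prometheus-operator chart (a sketch
from memory; exact key names and the storage size may differ depending on
your chart version):

```yaml
# Illustrative values for the prometheus-operator Helm chart.
prometheus:
  prometheusSpec:
    replicas: 2                     # two identical replicas, each scraping all targets
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi         # separate persistent volume per replica
```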

We monitor numerous GCE VMs that do batch processing, with jobs running
anywhere from a few minutes to several hours. The workload is bursty,
fluctuating between tens and hundreds of active VMs at any time, so the
Prometheus wal folder sometimes grows to between 10 and 15 GB. Prometheus
usually handles this workload with about half a CPU core and 8 GB of RAM,
and if left to its own devices, the wal folder shrinks again when the load
decreases.

The problem is that when there is a backlog and Prometheus is restarted
(because the preemptible VM goes away), it uses several times more RAM to
replay the wal folder. This often exhausts all the available memory on the
Kubernetes worker, so Prometheus is killed by the OOM killer over and over
again, until I log in and delete the wal folder, losing several hours of
metrics. I have already doubled the size of the VMs just to accommodate
Prometheus and am reluctant to do so again. Running non-preemptible VMs
would triple the cost of these instances, and Prometheus could still be
restarted when we roll out an update -- so that would probably not solve
the issue properly either.

I don't know whether there is something special about our use case, but I
did come across a blog post describing the same high memory usage behaviour
on startup.

I feel that unless there is a fix I can apply, this warrants either a bug
report or a feature request -- Prometheus should be able to recover without
operator intervention or loss of metrics. And for a process running on
Kubernetes, we should be able to set memory "request" and "limit" values
close to the actual expected usage, rather than 3-4 times the steady-state
usage just to accommodate the memory requirements of the startup phase.
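
To illustrate what that means in practice, today we effectively have to
provision something like the following (the numbers here are illustrative,
based on our ~8 GB steady state and the replay spikes we have observed):

```yaml
# Illustrative container resource settings: the limit must cover WAL
# replay on startup, not the steady state the pod runs at afterwards.
resources:
  requests:
    memory: 8Gi    # roughly steady-state usage
  limits:
    memory: 24Gi   # headroom only ever needed during WAL replay
```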

Please let me know what information I should provide, if any. I have some
graph screenshots that would be relevant.

Many thanks,
Vik

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CANx-tGgY3vJ-dzyOjYMAu1dRvdsfO83Ux_Y0g7XAeKzPTmGWLQ%40mail.gmail.com.
