I have been thinking about this problem as well, since we ran into a
similar issue yesterday. In our case, Prometheus had already failed to
write out a TSDB block for a few hours but kept on piling data into the
head block.

Could TSDB write out blocks *during* WAL recovery? Say, for every two
hours' worth of WAL (or even more frequently), it could pause recovery,
write a block, delete the WAL up to that point, and then continue. This
would put something of a bound on memory usage during recovery, and
alleviate the issue that recovering from an out-of-memory crash takes
*even more memory*.
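To make the idea concrete, here is a toy Go sketch of the replay loop I
have in mind. It is purely illustrative: replayChunked, the int "records",
and the chunk size are all stand-ins I made up, not actual TSDB code or
APIs. The point is only that flushing every chunkSize records bounds peak
in-memory state by the chunk size rather than by the full WAL length.

```go
package main

import "fmt"

// replayChunked is a hypothetical sketch, not real TSDB code. It replays
// WAL records in fixed-size chunks, "writing a block" (here: appending to
// blocks) and resetting the in-memory head after each chunk, so peak
// memory is bounded by chunkSize instead of the total WAL length.
func replayChunked(wal []int, chunkSize int) (blocks [][]int, peakInMemory int) {
	head := make([]int, 0, chunkSize)
	for _, rec := range wal {
		head = append(head, rec)
		if len(head) > peakInMemory {
			peakInMemory = len(head)
		}
		if len(head) == chunkSize {
			// "write a block, delete the WAL up to that point"
			blocks = append(blocks, head)
			head = make([]int, 0, chunkSize)
		}
	}
	if len(head) > 0 {
		// flush whatever remains after the last full chunk
		blocks = append(blocks, head)
	}
	return blocks, peakInMemory
}

func main() {
	wal := make([]int, 10000) // stand-in for a large WAL backlog
	blocks, peak := replayChunked(wal, 1000)
	fmt.Println(len(blocks), peak) // prints "10 1000"
}
```

In the real TSDB the equivalent of chunkSize would presumably be a time
range (e.g. one block range of WAL), and "flushing" would mean cutting a
persisted block and truncating the WAL, but the memory bound works the
same way.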

Would this help in your case?

/MR


On Wed, Jul 1, 2020 at 3:06 PM Viktor Radnai <[email protected]>
wrote:

> Hi all,
>
> We have a recurring problem with Prometheus repeatedly getting OOMKilled
> on startup while trying to process the write-ahead log. I looked through
> the GitHub issues, but as far as I could see there is no solution or
> currently open issue for this.
>
> We are running on Kubernetes in GKE using the prometheus-operator Helm
> chart, using Google Cloud's Preemptible VMs. These VMs get killed every 24
> hours maximum, so our Prometheus pods also get killed and automatically
> migrated by Kubernetes (the data is on a persistent volume of course). To
> avoid loss of metrics, we run two identically configured replicas with
> their own storage, scraping all the same targets.
>
> We monitor numerous GCE VMs that do batch processing, running anywhere
> between a few minutes to several hours. This workload is bursty,
> fluctuating between tens and hundreds of VMs active at any time, so
> the Prometheus wal folder sometimes grows to between 10 and 15 GB.
> Prometheus usually handles this workload with about half a CPU core and
> 8 GB of RAM, and if left to its own devices, the wal folder shrinks
> again when the load decreases.
>
> The problem is that when there is a backlog and Prometheus is restarted
> (due to the preemptible VM going away), it uses several times more RAM
> to recover the wal folder. This often exhausts all the available memory
> on the Kubernetes worker, so Prometheus is killed by the OOM killer over
> and over again, until I log in and delete the wal folder, losing several
> hours of metrics. I have already doubled the size of the VMs just to
> accommodate Prometheus, and I am reluctant to do so again. Running
> non-preemptible VMs would triple the cost of these instances, and
> Prometheus might still get restarted when we roll out an update -- so
> this would probably not solve the issue properly anyway.
>
> I don't know if there is something special about our use case, but I
> did come across a blog post describing the same high memory usage
> behaviour on startup.
>
> I feel that unless there is a fix I can apply, this would warrant
> either a bug report or a feature request -- Prometheus should be able
> to recover without operator intervention or loss of metrics. And for a
> process running on Kubernetes, we should be able to set memory
> "request" and "limit" values close to the actual expected usage, rather
> than 3-4 times the steady-state usage just to accommodate the memory
> requirements of the startup phase.
>
> Please let me know what information I should provide, if any. I have some
> graph screenshots that would be relevant.
>
> Many thanks,
> Vik
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/CANx-tGgY3vJ-dzyOjYMAu1dRvdsfO83Ux_Y0g7XAeKzPTmGWLQ%40mail.gmail.com.
>
