Hi Ben, Julien and all,

To follow up on my issue from last week, the OOM loop occurs even with
Prometheus 2.19.2.
This time around the instance has just enough memory to complete WAL replay,
but it OOMs immediately after that -- this could be an improvement or just a
coincidence. The WAL folder is about 16GB and the OOM occurs at around 43GB
(due to the Kubernetes worker running out of memory).

Anything else I could try?

Thanks,
Vik

On Wed, 1 Jul 2020 at 19:10, Viktor Radnai <[email protected]> wrote:

> Hi Julien,
>
> Thanks for clarifying that. In that case I'll see whether the issue recurs
> with 2.19.2 in the next few weeks.
>
> Vik
>
> On Wed, 1 Jul 2020 at 19:08, Julien Pivotto <[email protected]>
> wrote:
>
>> Once 2.19 is running, it will create the mmapped head, which will
>> improve that.
>>
>> I agree that starting 2.19 with a 2.18 WAL won't make a difference.
>>
>> On Wed, 1 Jul 2020 at 19:55, Viktor Radnai <[email protected]>
>> wrote:
>>
>>> Hi again Ben,
>>>
>>> Unfortunately, upgrading to 2.19.2 does not solve the startup issue.
>>> Prometheus gets OOMKilled before even starting to parse the last 25
>>> segments, which represent the last 50 minutes' worth of data. Based on
>>> this, the estimated memory requirement is somewhere between 60 and
>>> 70GB, but the worker node only has 52GB. The other Prometheus pod
>>> currently consumes 7.7GB.
>>>
>>> The left of the graph is 2.18.1, the right is 2.19.2. I inadvertently
>>> reinstated a previously set 40GB memory limit and updated the
>>> replicaset to increase it back to 50GB -- this is the reason for the
>>> second Prometheus restart and the slightly higher plateau for the last
>>> two OOMs.
>>>
>>> Unless there is a way to move some WAL segments out and then restore
>>> them later, I'll try deleting the last 50 minutes' worth of segments to
>>> get the pod to come up.
>>>
>>> Thanks,
>>> Vik
>>>
>>> On Wed, 1 Jul 2020 at 16:39, Viktor Radnai <[email protected]>
>>> wrote:
>>>
>>>> Hi Ben,
>>>>
>>>> We are running 2.18.1 -- I will upgrade to 2.19.2 and see if this
>>>> solves the problem.
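The "move some WAL segments out and restore them later" idea above could be
sketched as follows. This is a demonstration on a scratch directory, not a
tested recovery procedure: the paths, the segment count (25), and the fake
segment names are assumptions; on a real pod, `WAL` would be the data
directory's `wal/` folder and Prometheus must be stopped first.

```shell
# Sketch: park the newest WAL segments instead of deleting them.
# Demonstrated on a scratch directory for safety.
WAL=$(mktemp -d)/wal
PARK=$(dirname "$WAL")/wal-parked
mkdir -p "$WAL" "$PARK"

# Fake segments: Prometheus names WAL segments as zero-padded numbers.
for i in $(seq 1 30); do : > "$WAL/$(printf '%08d' "$i")"; done
mkdir "$WAL/checkpoint.00000005"   # checkpoints are directories; leave them

# Park the 25 newest numbered segments (the grep skips the checkpoint dir).
ls -1 "$WAL" | grep -E '^[0-9]+$' | sort -n | tail -n 25 |
while read -r seg; do mv "$WAL/$seg" "$PARK/"; done

echo "kept=$(ls "$WAL" | grep -cE '^[0-9]+$') parked=$(ls "$PARK" | grep -c .)"
# -> kept=5 parked=25
```

Moving the parked segments back after a clean start is not guaranteed to
replay correctly, so this is at best a way to preserve the raw files for a
later attempt rather than a supported restore path.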
>>>> I currently have one of the two replicas in production crashlooping,
>>>> so I'll try to roll this out in the next few hours and report back.
>>>>
>>>> Thanks,
>>>> Vik
>>>>
>>>> On Wed, 1 Jul 2020 at 16:32, Ben Kochie <[email protected]> wrote:
>>>>
>>>>> What version of Prometheus do you have deployed? We've made several
>>>>> major improvements to WAL handling and startup in the last couple of
>>>>> releases.
>>>>>
>>>>> I would recommend upgrading to 2.19.2 if you haven't already.
>>>>>
>>>>> On Wed, Jul 1, 2020 at 5:06 PM Viktor Radnai <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> We have a recurring problem with Prometheus repeatedly getting
>>>>>> OOMKilled on startup while trying to process the write-ahead log. I
>>>>>> looked through the GitHub issues, but as far as I could see there is
>>>>>> no solution or currently open issue.
>>>>>>
>>>>>> We are running on Kubernetes in GKE using the prometheus-operator
>>>>>> Helm chart, on Google Cloud's preemptible VMs. These VMs live for at
>>>>>> most 24 hours, so our Prometheus pods also get killed and are
>>>>>> automatically migrated by Kubernetes (the data is on a persistent
>>>>>> volume, of course). To avoid loss of metrics, we run two identically
>>>>>> configured replicas with their own storage, scraping all the same
>>>>>> targets.
>>>>>>
>>>>>> We monitor numerous GCE VMs that do batch processing, running
>>>>>> anywhere from a few minutes to several hours. This workload is
>>>>>> bursty, fluctuating between tens and hundreds of VMs active at any
>>>>>> time, so sometimes the Prometheus wal folder grows to between 10 and
>>>>>> 15GB. Prometheus usually handles this workload with about half a CPU
>>>>>> core and 8GB of RAM, and if left to its own devices, the wal folder
>>>>>> will shrink again when the load decreases.
>>>>>>
>>>>>> The problem is that when there is a backlog and Prometheus is
>>>>>> restarted (because the preemptible VM goes away), it uses several
>>>>>> times more RAM to recover the wal folder. This often exhausts all
>>>>>> the available memory on the Kubernetes worker, so Prometheus is
>>>>>> killed by the OOM killer over and over again, until I log in and
>>>>>> delete the wal folder, losing several hours of metrics. I have
>>>>>> already doubled the size of the VMs just to accommodate Prometheus,
>>>>>> and I am reluctant to do so again. Running non-preemptible VMs would
>>>>>> triple the cost of these instances, and Prometheus might still get
>>>>>> restarted when we roll out an update -- so this would probably not
>>>>>> solve the issue properly anyway.
>>>>>>
>>>>>> I don't know whether there is something special about our use case,
>>>>>> but I did come across a blog post describing the same high memory
>>>>>> usage on startup.
>>>>>>
>>>>>> I feel that unless there is a fix I can apply, this warrants either
>>>>>> a bug report or a feature request -- Prometheus should be able to
>>>>>> recover without operator intervention or loss of metrics. And for a
>>>>>> process running on Kubernetes, we should be able to set memory
>>>>>> "request" and "limit" values that are close to actual expected
>>>>>> usage, rather than 3-4 times the steady-state usage just to
>>>>>> accommodate the memory requirements of the startup phase.
>>>>>>
>>>>>> Please let me know what information I should provide, if any. I have
>>>>>> some graph screenshots that would be relevant.
>>>>>>
>>>>>> Many thanks,
>>>>>> Vik
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Prometheus Users" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/prometheus-users/CANx-tGgY3vJ-dzyOjYMAu1dRvdsfO83Ux_Y0g7XAeKzPTmGWLQ%40mail.gmail.com
>>>>>> .
>>>>>
>>>>
>>>> --
>>>> My other sig is hilarious
>>>
>>>
>>> --
>>> My other sig is hilarious
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Prometheus Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/prometheus-users/CANx-tGj6rBmimfUVGwuWD1%3D03fdvkCeYOote1huXBN2Kh2n08A%40mail.gmail.com
>>> .
>>
>
> --
> My other sig is hilarious

--
My other sig is hilarious

--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/CANx-tGh-QORdQ_PiSVzXGfj-9FfFc9tQKx%3D5AwJ%3DCrM6n4pqgw%40mail.gmail.com.
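As a footnote on the request/limit point in the original message: with the
prometheus-operator chart, the memory values end up on the `Prometheus`
custom resource's `spec.resources` field. A sketch of what the thread
describes -- the numbers are illustrative, taken from the figures mentioned
above, not a recommendation:

```yaml
# Fragment of a Prometheus custom resource (prometheus-operator).
# The pain point discussed above: the limit has to cover WAL replay
# on startup, not just steady-state usage.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  replicas: 2
  resources:
    requests:
      memory: 8Gi    # steady-state usage observed in the thread (~8GB)
    limits:
      memory: 50Gi   # 3-4x headroom just for WAL replay, per the thread
```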

