* Get a bigger server
* Reduce the number of metrics you collect
* Shard your server

Probably some combination of all of these.
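For the sharding option, one common pattern (a sketch, not something described in this thread; the job name is made up) is hashmod relabeling, so each of N identically configured servers keeps a disjoint subset of the targets:

```yaml
scrape_configs:
  - job_name: sharded-targets        # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Hash each target's address into one of 2 buckets...
      - source_labels: [__address__]
        modulus: 2
        target_label: __tmp_shard
        action: hashmod
      # ...and keep only bucket 0 on this server; the second
      # shard runs the same config with "regex: 1".
      - source_labels: [__tmp_shard]
        regex: "0"
        action: keep
```

Each shard then has a proportionally smaller WAL to replay on startup.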
On Wed, Jul 8, 2020 at 8:21 PM Viktor Radnai <[email protected]> wrote:

> Hi Ben, Julien and all,
>
> To follow up on my issue from last week, the OOM loop does occur even with
> Prometheus 2.19.2.
>
> This time around the instance has just enough memory to complete WAL
> replay, but it OOMs immediately after that; this could be an improvement or
> just a coincidence. The WAL folder is about 16GB and the OOM occurs at
> around 43GB (due to the Kubernetes worker running out of memory). Anything
> else I could try?
>
> Thanks,
> Vik
>
> On Wed, 1 Jul 2020 at 19:10, Viktor Radnai <[email protected]> wrote:
>
>> Hi Julien,
>>
>> Thanks for clarifying that. In that case I'll see if the issue recurs
>> with 2.19.2 in the next few weeks.
>>
>> Vik
>>
>> On Wed, 1 Jul 2020 at 19:08, Julien Pivotto <[email protected]> wrote:
>>
>>> Once 2.19 is running, it will create the mmapped head chunks, which
>>> will improve that.
>>>
>>> I agree that starting 2.19 with a 2.18 WAL won't make a difference.
>>>
>>> On Wed, 1 Jul 2020 at 19:55, Viktor Radnai <[email protected]> wrote:
>>>
>>>> Hi again Ben,
>>>>
>>>> Unfortunately upgrading to 2.19.2 does not solve the startup issue.
>>>> Prometheus gets OOMKilled before even starting to parse the last 25
>>>> segments, which represent the last 50 minutes' worth of data. Based on
>>>> this, the estimated memory requirement should be somewhere between
>>>> 60-70GB, but the worker node only has 52GB. The other Prometheus pod
>>>> currently consumes 7.7GB.
>>>>
>>>> The left of the graph is 2.18.1, the right is 2.19.2. I inadvertently
>>>> reinstated a previously set 40GB memory limit and updated the replicaset
>>>> to increase it back to 50GB -- this is the reason for the second
>>>> Prometheus restart and the slightly higher plateau for the last two OOMs.
>>>>
>>>> Unless there is a way to move some WAL segments out and then restore
>>>> them later, I'll try to delete the last 50 minutes' worth of segments to
>>>> get the pod to come up.
>>>>
>>>> Thanks,
>>>> Vik
>>>>
>>>> On Wed, 1 Jul 2020 at 16:39, Viktor Radnai <[email protected]> wrote:
>>>>
>>>>> Hi Ben,
>>>>>
>>>>> We are running 2.18.1 -- I will upgrade to 2.19.2 and see if this
>>>>> solves the problem. I currently have one of the two replicas in
>>>>> production crashlooping, so I'll try to roll this out in the next few
>>>>> hours and report back.
>>>>>
>>>>> Thanks,
>>>>> Vik
>>>>>
>>>>> On Wed, 1 Jul 2020 at 16:32, Ben Kochie <[email protected]> wrote:
>>>>>
>>>>>> What version of Prometheus do you have deployed? We've made several
>>>>>> major improvements to WAL handling and startup in the last couple of
>>>>>> releases.
>>>>>>
>>>>>> I would recommend upgrading to 2.19.2 if you haven't.
>>>>>>
>>>>>> On Wed, Jul 1, 2020 at 5:06 PM Viktor Radnai <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> We have a recurring problem with Prometheus repeatedly getting
>>>>>>> OOMKilled on startup while trying to process the write-ahead log. I
>>>>>>> looked through GitHub issues, but as far as I could see there was no
>>>>>>> solution or currently open issue.
>>>>>>>
>>>>>>> We are running on Kubernetes in GKE using the prometheus-operator
>>>>>>> Helm chart, on Google Cloud's preemptible VMs. These VMs get killed at
>>>>>>> most every 24 hours, so our Prometheus pods also get killed and
>>>>>>> automatically migrated by Kubernetes (the data is on a persistent
>>>>>>> volume, of course). To avoid loss of metrics, we run two identically
>>>>>>> configured replicas with their own storage, scraping all the same
>>>>>>> targets.
>>>>>>>
>>>>>>> We monitor numerous GCE VMs that do batch processing, running
>>>>>>> anywhere between a few minutes and several hours. This workload is
>>>>>>> bursty, fluctuating between tens and hundreds of VMs active at any
>>>>>>> time, so sometimes the Prometheus wal folder grows to between 10-15GB
>>>>>>> in size.
>>>>>>> Prometheus usually handles this workload with about half a CPU core
>>>>>>> and 8GB of RAM, and if left to its own devices, the wal folder will
>>>>>>> shrink again when the load decreases.
>>>>>>>
>>>>>>> The problem is that when there is a backlog and Prometheus is
>>>>>>> restarted (due to the preemptible VM going away), it will use several
>>>>>>> times more RAM to recover the wal folder. This often exhausts all the
>>>>>>> available memory on the Kubernetes worker, so Prometheus is killed by
>>>>>>> the OOM killer over and over again, until I log in and delete the wal
>>>>>>> folder, losing several hours of metrics. I have already doubled the
>>>>>>> size of the VMs just to accommodate Prometheus and I am reluctant to
>>>>>>> do this again. Running non-preemptible VMs would triple the cost of
>>>>>>> these instances, and Prometheus might still get restarted when we roll
>>>>>>> out an update -- so this would probably not even solve the issue
>>>>>>> properly.
>>>>>>>
>>>>>>> I don't know if there is something special in our use case, but I
>>>>>>> did come across a blog describing the same high memory usage behaviour
>>>>>>> on startup.
>>>>>>>
>>>>>>> I feel that unless there is a fix I can apply, this would warrant
>>>>>>> either a bug or a feature request -- Prometheus should be able to
>>>>>>> recover without operator intervention or losing metrics. And for a
>>>>>>> process running on Kubernetes, we should be able to set memory
>>>>>>> "request" and "limit" values that are close to actual expected usage,
>>>>>>> rather than 3-4 times the steady-state usage just to accommodate the
>>>>>>> memory requirements of the startup phase.
>>>>>>>
>>>>>>> Please let me know what information I should provide, if any. I have
>>>>>>> some graph screenshots that would be relevant.
>>>>>>>
>>>>>>> Many thanks,
>>>>>>> Vik
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "Prometheus Users" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/prometheus-users/CANx-tGgY3vJ-dzyOjYMAu1dRvdsfO83Ux_Y0g7XAeKzPTmGWLQ%40mail.gmail.com
>>>>>>> .
>>>>>
>>>>> --
>>>>> My other sig is hilarious

--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/CABbyFmr%2BzHBQnjT%3Duw707fJr5F9bA3vM%3DJPMXQ9Y3GGwrdh9Kw%40mail.gmail.com.
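On the "request"/"limit" point raised above: until startup memory is bounded, the pragmatic (if wasteful) pattern is a request near steady state and a limit sized for the replay spike. A hypothetical container-spec fragment, with numbers taken loosely from the figures quoted in this thread:

```yaml
resources:
  requests:
    memory: 10Gi   # a little above the ~8GB steady-state usage
  limits:
    memory: 48Gi   # headroom for the 3-4x WAL-replay spike; tune to the node
```

The gap between the two is exactly the overcommit the original poster objects to, but it lets the scheduler pack nodes on steady-state usage while still surviving replay.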
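On the "move some WAL segments out and restore them later" idea from the thread: the mechanics would roughly be to stop Prometheus, move the highest-numbered segment files aside, start Prometheus with the shorter WAL, and move the files back before a later restart. A sketch of the file shuffle, demonstrated against a throwaway directory with fake segment files (the real WAL path, e.g. /prometheus/wal, and the segment count are hypothetical; there is no guarantee Prometheus cleanly accepts parked segments that are restored after new ones have been written, so treat this as a last resort):

```shell
#!/bin/sh
set -eu

# Demo in a throwaway directory; on a real server WAL would point at the
# wal/ directory on the persistent volume, and Prometheus must be stopped.
BASE=$(mktemp -d)
WAL="$BASE/wal"
PARKED="$BASE/wal-parked"
mkdir -p "$WAL" "$PARKED"

# Fake four numbered WAL segment files for the demo.
for seg in 00000001 00000002 00000003 00000004; do
  : > "$WAL/$seg"
done

# Park the two newest (highest-numbered) segments so replay has less to do.
for seg in $(ls "$WAL" | sort | tail -n 2); do
  mv "$WAL/$seg" "$PARKED/$seg"
done

echo "remaining: $(ls "$WAL" | tr '\n' ' ')"
echo "parked: $(ls "$PARKED" | tr '\n' ' ')"
```

Restoring is the reverse `mv`; unlike deleting the wal folder outright, the parked data is at least recoverable.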

