[prometheus-users] Re: Prometheus using a large amount of memory when managing storage.

2021-10-21 Thread Chad Sesvold
I was just tweaking the TSDB block durations to match prod. I was reading that we might want to reduce the TSDB block durations to help free up memory. We are seeing OOM in the system logs. I am watching memory using the following command. watch "ps ax -o pcpu,rss,ppid,pid,stime,args |

Re: [prometheus-users] Re: Prometheus using a large amount of memory when managing storage.

2021-10-21 Thread Chad Sesvold
So I have been doing a little more testing. I did find that we had some software installed on the non-prod boxes that was causing some issues. We were scaling metrics every 20 seconds. My guess is that the software was slowing down Prometheus writes. I am guessing that I had a race

Re: [prometheus-users] Re: Prometheus using a large amount of memory when managing storage.

2021-10-16 Thread Ben Kochie
On Fri, Oct 15, 2021 at 9:51 PM Chad Sesvold wrote:
> At this point I am the only one running queries. When I have no target
> defined the memory seems to be flat.
>
> When I changed the following in non-prod it seemed to stabilize the memory
> usage.
>
> --storage.tsdb.max-block-duration 15d
>

[prometheus-users] Re: Prometheus using a large amount of memory when managing storage.

2021-10-16 Thread Brian Candler
Is there a specific reason why you're tweaking the TSDB block durations? That is, did you observe some problem with the defaults? Otherwise I'd suggest you just run with defaults. In any case, if the problem you're debugging is discrepancies between prod and non-prod, you should be running

[prometheus-users] Re: Prometheus using a large amount of memory when managing storage.

2021-10-15 Thread Chad Sesvold
At this point I am the only one running queries. When I have no target defined the memory seems to be flat. When I changed the following in non-prod, it seemed to stabilize the memory usage: --storage.tsdb.max-block-duration 15d --storage.tsdb.min-block-duration 1h I will try copying the binaries

[prometheus-users] Re: Prometheus using a large amount of memory when managing storage.

2021-10-15 Thread Brian Candler
Look at Status > TSDB Status from the web interface of both systems. In particular, what does the first entry ("Head Stats") show for each system? Do you have any idea of series churn, i.e. how many new series are being created and deleted per hour? (Although if you're scraping a subset of
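
As a rough illustration of the series churn question above, here is a minimal sketch that turns two readings of a "series created" counter into a per-hour figure. It assumes the readings come from Prometheus's own `prometheus_tsdb_head_series_created_total` metric, sampled on each system; the function name `churn_per_hour` and the sample values are hypothetical and are not from the thread.

```python
def churn_per_hour(created_start, created_end, window_seconds):
    """Estimate new series per hour from two samples of a cumulative counter.

    created_start / created_end are assumed to be readings of a counter such
    as prometheus_tsdb_head_series_created_total, taken window_seconds apart.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    delta = created_end - created_start
    if delta < 0:
        # The counter went backwards, which suggests a restart in between.
        # Count only what has accumulated since the reset.
        delta = created_end
    return delta * 3600.0 / window_seconds


# Hypothetical readings taken 30 minutes apart on one system.
print(churn_per_hour(1000, 1600, 1800))
```

Running the same calculation against both prod and non-prod would give two numbers that can be compared directly, alongside the "Head Stats" values from each TSDB Status page.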