I was just tweaking the TSDB block durations to match prod. I was reading
that we might want to reduce the TSDB block durations to help free up
memory.
We are seeing OOM errors in the system logs. I am watching memory with the
following command:
watch "ps ax -o pcpu,rss,ppid,pid,stime,args |
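
To log memory over time rather than watching it by eye, a small sampler along these lines might help. This is only a sketch and not a completion of the command above: it assumes a Linux host with the procps `ps`, and that the process is named `prometheus`. The function names are made up for illustration.

```python
import subprocess

def parse_rss_kib(ps_output: str) -> int:
    """Sum the RSS values (KiB) from `ps -o rss=` output, one number per line."""
    return sum(int(field) for field in ps_output.split())

def prometheus_rss_kib(name: str = "prometheus") -> int:
    """Return the total RSS in KiB of all processes called `name` (assumed name)."""
    # `-C NAME` selects by command name; `-o rss=` prints RSS with no header.
    out = subprocess.run(
        ["ps", "-C", name, "-o", "rss="],
        capture_output=True, text=True,
    ).stdout
    # No matching process gives empty output, which sums to 0.
    return parse_rss_kib(out)
```

Calling `prometheus_rss_kib()` from a loop with a sleep and appending the result to a file gives a history that can be lined up against the OOM timestamps.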
So I have been doing a little more testing. I found that we had some
software installed on the non-prod boxes that was causing some issues. We
were scaling metrics every 20 seconds. My guess is that the software was
slowing down Prometheus writes. I am guessing that I had a race
On Fri, Oct 15, 2021 at 9:51 PM Chad Sesvold wrote:
> At this point I am the only one running queries. When I have no target
> defined the memory seems to be flat.
>
> When I changed the following in non-prod it seemed to stabilize the memory
> usage.
>
> --storage.tsdb.max-block-duration 15d
>
Is there a specific reason why you're tweaking the TSDB block durations?
That is, did you observe some problem with the defaults? Otherwise I'd
suggest you just run with defaults.
In any case, if the problem you're debugging is discrepancies between prod
and non-prod, you should be running
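
One way to see exactly where prod and non-prod differ is to compare the flags each server reports. The sketch below assumes both servers expose the HTTP API endpoint `/api/v1/status/flags` and are reachable from where the script runs. The base URLs and function names are placeholders, not anything from this thread.

```python
import json
from urllib.request import urlopen

def fetch_flags(base_url: str) -> dict:
    """Fetch the flag name -> value map from a Prometheus server."""
    with urlopen(f"{base_url}/api/v1/status/flags", timeout=10) as resp:
        return json.load(resp)["data"]

def diff_flags(a: dict, b: dict) -> dict:
    """Return {flag: (value_in_a, value_in_b)} for every flag that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# Hypothetical usage; both host names are placeholders:
# print(diff_flags(fetch_flags("http://prod:9090"),
#                  fetch_flags("http://nonprod:9090")))
```

A flag present on only one side shows up with `None` for the other, which also catches version differences between the two binaries.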
At this point I am the only one running queries. When I have no target
defined the memory seems to be flat.
When I changed the following in non-prod it seemed to stabilize the memory
usage.
--storage.tsdb.max-block-duration 15d
--storage.tsdb.min-block-duration 1h
I will try copying the binaries
Look at Status > TSDB Status from the web interface of both systems. In
particular, what does the first entry ("Head Stats") show for each system?
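
The same numbers can also be pulled from the HTTP API if that is easier than comparing screenshots. This is a rough sketch: it assumes a Prometheus version recent enough to include `headStats` in `/api/v1/status/tsdb`, and the field names (`numSeries`, `chunkCount`, `minTime`, `maxTime`) are as I recall them from the API documentation. The function names are invented here.

```python
import json
from urllib.request import urlopen

def fetch_head_stats(base_url: str) -> dict:
    """Return the headStats object from the TSDB status endpoint."""
    with urlopen(f"{base_url}/api/v1/status/tsdb", timeout=10) as resp:
        return json.load(resp)["data"]["headStats"]

def head_window_hours(stats: dict) -> float:
    """Time span covered by the head block, in hours.

    minTime and maxTime are assumed to be millisecond timestamps.
    """
    return (stats["maxTime"] - stats["minTime"]) / 3_600_000

# Hypothetical usage; the host name is a placeholder:
# stats = fetch_head_stats("http://nonprod:9090")
# print(stats["numSeries"], stats["chunkCount"], head_window_hours(stats))
```

Comparing `numSeries` and the head window between the two systems should show whether one is simply holding more data in memory.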
Do you have any idea of series churn, i.e. how many new series are being
created and deleted per hour? (Although if you're scraping a subset of
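
For a rough churn figure, one option is to ask Prometheus about its own head-series counters. The sketch below assumes the server scrapes itself, so that `prometheus_tsdb_head_series_created_total` is available, and that only one series matches the query. The helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Series created in the head over the last hour, from Prometheus's own metrics.
CHURN_QUERY = "increase(prometheus_tsdb_head_series_created_total[1h])"

def first_value(query_json: dict) -> float:
    """Extract the first sample value from an instant-query response."""
    result = query_json["data"]["result"]
    # Each instant-vector sample is [timestamp, "value-as-string"].
    return float(result[0]["value"][1]) if result else 0.0

def series_created_last_hour(base_url: str) -> float:
    """Run CHURN_QUERY against the server at base_url (placeholder URL)."""
    params = urlencode({"query": CHURN_QUERY})
    with urlopen(f"{base_url}/api/v1/query?{params}", timeout=10) as resp:
        return first_value(json.load(resp))
```

Swapping in `prometheus_tsdb_head_series_removed_total` gives the deletion side of the same picture.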