On Thu, Mar 2, 2023 at 4:57 AM Christoph Anton Mitterer <cales...@gmail.com> wrote:
> On Tue, 2023-02-28 at 10:25 +0100, Ben Kochie wrote:
> > Debian release cycles are too slow for the pace of Prometheus
> > development.
>
> It's rather simple to pull the version from Debian unstable, if one
> needs to, and that seems pretty current.
>
> > You'd be better off running Prometheus using podman, or deploying
> > official binaries with Ansible[0].
>
> Well, I guess views on how software should be distributed differ.
>
> The "traditional" system of having distributions has many advantages
> and is IMO a core reason for the success of Linux and Open Source.
>
> All "modern" alternatives like flatpaks, snaps, and similar repos are
> IMO completely inadequate, especially security-wise (in particular,
> there is no trusted intermediary, like the distribution, doing some
> basic maintenance).

And I didn't say to use those. I said to use our official OCI container
image or release binaries.

> It's anyway not possible here because of security policy reasons.

That allows you to pull from unstable? :confused-pikachu:

> > No, but it depends on your queries. Without seeing what you're
> > graphing there's no way to tell. Your queries could be complex or
> > inefficient. Kinda like writing slow SQL queries.
>
> As mentioned already in the other thread, so far I merely do what
> https://grafana.com/grafana/dashboards/1860-node-exporter-full/
> does.
>
> > There are ways to speed up graphs for specific things, for example
> > you can use recording rules to pre-render parts of the queries.
> >
> > For example, if you want to graph node CPU utilization you can have
> > a recording rule like this:
> >
> > groups:
> >   - name: node_exporter
> >     interval: 60s
> >     rules:
> >       - record: instance:node_cpu_utilization:ratio_rate1m
> >         expr: >
> >           avg without (cpu) (
> >             sum without (mode) (
> >               rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[1m])
> >             )
> >           )
> >
> > This will give you a single metric per node that will be faster to
> > render over longer periods of time. It also effectively down-samples
> > by only recording one point per minute.
>
> But will dashboards like Node Exporter Full automatically use such?
> And if so... will they (or rather Prometheus) use the real time series
> (with full resolution) when needed?

Nope. That dashboard is meant to be generic, not efficient. It's a nice
demo, but not something I use or recommend other than to get ideas.

> If so, then the idea would be to create such a rule for every metric
> I'm interested in and that is slow, right?
>
> > Also "Medium sized VM" doesn't give us any indication of how much
> > CPU or memory you have. Prometheus uses page cache for database
> > access. So maybe your system is lacking enough memory to effectively
> > cache the data you're accessing.
>
> Right now it's 2 (virtual) CPUs with 4.5 GB RAM... I'd guess it might
> need more CPU?

Maybe not CPU right now. What do the metrics say? ;-)

> Previously I suspected IO to be the reason, and while in fact IO is
> slow (the backend seems to deliver only ~100 MB/s)... there seems to
> be nearly no IO at all while waiting for the "slow graph" (which is
> Node Exporter Full's "CPU Basic" panel), e.g. when selecting the last
> 30 days.
>
> Kinda surprising... does Prometheus read its TSDB really that
> efficiently?

Without seeing more of what's going on in your system, it's hard to
say. You have adequate CPU and memory for 40 nodes. You'll probably
want about 2x what you have for 300 nodes.
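If you want numbers rather than guesses, Prometheus exports metrics
about itself. A rough check from the shell could look something like
the following, assuming Prometheus scrapes itself under
job="prometheus" (as in the default example config) and listens on
localhost:9090:

# Resident memory of the Prometheus process:
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=process_resident_memory_bytes{job="prometheus"}'

# CPU used by Prometheus, averaged over the last 5 minutes:
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=rate(process_cpu_seconds_total{job="prometheus"}[5m])'

# Active series currently in the TSDB head:
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=prometheus_tsdb_head_series'

Graphing those over time says a lot more about whether the VM is sized
right than any rule of thumb.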
From what I can tell so far, downsampling isn't going to fix your
performance problem. Something else is going on.

> Could it be a problem, when the Grafana runs on another VM? Though
> there didn't seem to be any network bottleneck... and I guess Grafana
> just always accesses Prometheus via TCP, so there should be no further
> positive caching effect when both run on the same node?

No, not likely a problem. I have seen much larger installs running
without problems.

> > No, we've talked about having variable retention times, but nobody
> > has implemented this. It's possible to script this via the DELETE
> > endpoint[1]. It would be easy enough to write a cron job that
> > deletes specific metrics older than X, but I haven't seen this
> > packaged into a simple tool. I would love to see something like
> > this created.
> >
> > [1]: https://prometheus.io/docs/prometheus/latest/querying/api/#delete-series
>
> Does it make sense to open a feature request ticket for that?

There already are tons of issues about this. The problem is nobody
wants to write the code and maintain it. Prometheus is an open source
project, not a company.

> I mean it would solve at least my storage "issue" (well, it's not
> really a showstopper... as was mentioned, one could simply buy a big
> cheap HDD/SSD).

I mean, the kind of space you're talking about isn't expensive. My
laptop has 2T of NVMe storage and my homelab server has 50TiB of N+2
redundant storage.

Again, downsampling isn't going to solve your problems. The actual
samples are not really the bottleneck in the size of setup you're
talking about. Mostly it's series index reads that tend to slow things
down. Say you want to read the full CPU history for a year for a 2-CPU
server: scanning that requires loading the series indexes, but the
samples themselves should only amount to a few megabytes of data read
from disk.

I think the main issue you're running into is that Node Exporter Full
dashboard. I haven't looked at that one in a while, but it's very
poorly written. For example, I just looked at the "CPU Busy" panel. It
has one of the worst queries I've seen in a long time for how to
compute CPU utilization:

(sum by(instance) (irate(node_cpu_seconds_total{instance="$node",job="$job", mode!="idle"}[$__rate_interval]))
  / on(instance) group_left
 sum by (instance)((irate(node_cpu_seconds_total{instance="$node",job="$job"}[$__rate_interval])))) * 100

* It uses irate(), which is not what you want for a graph of
  utilization over time.
* It scans every CPU and mode twice (minus idle in the numerator).

No wonder you are having performance issues.

Replacing that panel query with something like this would make it far
more efficient:

avg without (cpu, mode) (
  1 - rate(node_cpu_seconds_total{instance="$node",job="$job",mode="idle"}[$__rate_interval])
) * 100

This would cut the number of series touched by over 90%.

> And could something be made, via the same way, that downsamples data
> from longer ago?

Downsampling is not your problem. Sorry, Prometheus is not RRD; the
problems you are running into are unrelated. You're optimizing for a
problem that basically doesn't exist in Prometheus.

> Both together would really give quite some flexibility.
>
> For metrics where old data is "boring", one could just delete
> everything older than e.g. 2 weeks, while keeping full details for
> that time.
>
> For metrics where one is interested in larger time ranges, but where
> sample resolution doesn't matter so much, one could downsample it...
> like everything older than 2 weeks, then even more for everything
> older than 6 months, then even more for everything older than 1
> year... and so on.
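For the deletion half of that, the admin API I linked above already
works today. An untested sketch of the kind of cron job I meant,
assuming Prometheus was started with --web.enable-admin-api, GNU date
is available, and your node-exporter scrape job is literally named
"node" (adjust the matcher and age to taste):

#!/bin/sh
# Drop node_exporter series older than 14 days.
END=$(date -d '14 days ago' +%s)
curl -s -X POST http://localhost:9090/api/v1/admin/tsdb/delete_series \
  --data-urlencode 'match[]={job="node"}' \
  --data-urlencode "end=${END}"

# Deletions only write tombstones; this actually reclaims the disk space.
curl -s -X POST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones

The downsampling half is the part nobody has built, which is what I
meant about someone needing to write and maintain the code.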
> For the few metrics where full-resolution data is interesting over a
> really long time span, one could just keep it.
>
> > > Seem at least quite big to me... that would - assuming all days
> > > can be compressed roughly to that (which isn't sure of course) -
> > > mean for one year one needs ~250 GB for those 40 nodes, or about
> > > 6.25 GB per node (just for the data from node exporter with a 15s
> > > interval).
> >
> > Without seeing a full meta.json and the size of the files in one
> > dir, it's hard to say exactly if this is good or bad. It depends a
> > bit on how many series/samples are in each block. Just guessing, it
> > seems like you have about 2000 metrics per node.
>
> Yes... so far each node just runs node-exporter, and that seems to
> have:
>
> $ curl localhost:9100/metrics 2>/dev/null | grep -v ^# | wc -l
> 2144
>
> … metrics in the version of it I'm using.
>
> > Seems reasonable, we're only talking about 2TiB per year for all
> > 300 of your servers. Seems perfectly reasonable to me.
>
> Okay... good... I just wasn't sure whether that's "normal"... but I
> guess I can live with it quite well.
>
> Thanks for your help :-)
> Chris.
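For what it's worth, that ~2TiB/year figure is just back-of-the-envelope
math, assuming roughly 2100 series per node, a 15s scrape interval, and
~1.5 bytes per sample on disk (compression varies with the data, so
treat it as an estimate only):

awk 'BEGIN {
  nodes = 300; series = 2100; interval = 15; bytes_per_sample = 1.5
  samples_per_year = 365 * 24 * 3600 / interval
  total_bytes = nodes * series * samples_per_year * bytes_per_sample
  printf "%.1f TiB/year\n", total_bytes / 2^40   # prints about 1.8 TiB/year
}'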