On Thu, Mar 2, 2023 at 4:57 AM Christoph Anton Mitterer <cales...@gmail.com> wrote:
> On Tue, 2023-02-28 at 10:25 +0100, Ben Kochie wrote:
> > Debian release cycles are too slow for the pace of Prometheus
> > development.
>
> It's rather simple to pull the version from Debian unstable, if one
> needs to, and that seems pretty current.
>
> > You'd be better off running Prometheus using podman, or deploying
> > official binaries with Ansible[0].
>
> Well, I guess views on how software should be distributed differ.
>
> The "traditional" system of having distributions has many advantages
> and is IMO a core reason for the success of Linux and Open Source.
>
> All "modern" alternatives like flatpaks, snaps, and similar repos are
> IMO completely inadequate, especially security-wise (in particular,
> there is no trusted intermediary, like the distribution, doing some
> basic maintenance).

And I didn't say to use those. I said to use our official OCI container
image or release binaries.

> It's anyway not possible here because of security policy reasons.

That allows you to pull from unstable? :confused-pikachu:

> > No, but it depends on your queries. Without seeing what you're
> > graphing there's no way to tell. Your queries could be complex or
> > inefficient. Kinda like writing slow SQL queries.
>
> As mentioned already in the other thread, so far I merely do what
> https://grafana.com/grafana/dashboards/1860-node-exporter-full/
> does.
>
> > There are ways to speed up graphs for specific things, for example
> > you can use recording rules to pre-render parts of the queries.
> >
> > For example, if you want to graph node CPU utilization you can have
> > a recording rule like this:
> >
> > groups:
> >   - name: node_exporter
> >     interval: 60s
> >     rules:
> >       - record: instance:node_cpu_utilization:ratio_rate1m
> >         expr: >
> >           avg without (cpu) (
> >             sum without (mode) (
> >               rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[1m])
> >             )
> >           )
> >
> > This will give you a single metric per node that will be faster to
> > render over longer periods of time. It also effectively down-samples
> > by only recording one point per minute.
>
> But will dashboards like Node Exporter Full automatically use such?
> And if so... will they (or rather Prometheus) use the real time series
> (with full resolution) when needed?

Nope. That dashboard is meant to be generic, not efficient. It's a nice
demo, but not something I use or recommend other than to get ideas.

> If so, then the idea would be to create such a rule for every metric
> I'm interested in and that is slow, right?
>
> > Also "Medium sized VM" doesn't give us any indication of how much
> > CPU or memory you have. Prometheus uses page cache for database
> > access. So maybe your system is lacking enough memory to effectively
> > cache the data you're accessing.
>
> Right now it's 2 (virtual) CPUs with 4.5 GB RAM... I'd guess it might
> need more CPU?

Maybe not CPU right now. What do the metrics say? ;-)

> Previously I suspected IO to be the reason, and while in fact IO is
> slow (the backend seems to deliver only ~100 MB/s)... there seems to
> be nearly no IO at all while waiting for the "slow graph" (which is
> Node Exporter Full's "CPU Basic" panel), e.g. when selecting the last
> 30 days.
>
> Kinda surprising... does Prometheus read its TSDB really that
> efficiently?

Without seeing more of what's going on in your system, it's hard to
say. You have adequate CPU and memory for 40 nodes. You'll probably
want about 2x what you have for 300 nodes.
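If you want numbers rather than guesses, Prometheus exports metrics
about itself. A rough check from the shell could look something like
the following, assuming Prometheus scrapes itself under
job="prometheus" (as in the default example config) and listens on
localhost:9090:

# Resident memory of the Prometheus process:
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=process_resident_memory_bytes{job="prometheus"}'

# CPU used by Prometheus, averaged over the last 5 minutes:
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=rate(process_cpu_seconds_total{job="prometheus"}[5m])'

# Active series currently in the TSDB head:
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=prometheus_tsdb_head_series'

Graphing those over time says a lot more about whether the VM is sized
right than any rule of thumb.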
From what I can tell so far, downsampling isn't going to fix your
performance problem. Something else is going on.

> Could it be a problem, when the Grafana runs on another VM? Though
> there didn't seem to be any network bottleneck... and I guess Grafana
> just always accesses Prometheus via TCP, so there should be no further
> positive caching effect when both run on the same node?

No, not likely a problem. I have seen much larger installs running
without problems.

> > No, we've talked about having variable retention times, but nobody
> > has implemented this. It's possible to script this via the DELETE
> > endpoint[1]. It would be easy enough to write a cron job that
> > deletes specific metrics older than X, but I haven't seen this
> > packaged into a simple tool. I would love to see something like
> > this created.
> >
> > [1]: https://prometheus.io/docs/prometheus/latest/querying/api/#delete-series
>
> Does it make sense to open a feature request ticket for that?

There already are tons of issues about this. The problem is nobody
wants to write the code and maintain it. Prometheus is an open source
project, not a company.

> I mean it would solve at least my storage "issue" (well, it's not
> really a showstopper... as was mentioned, one could simply buy a big
> cheap HDD/SSD).

I mean, the kind of space you're talking about isn't expensive. My
laptop has 2T of NVMe storage and my homelab server has 50TiB of N+2
redundant storage.

Again, downsampling isn't going to solve your problems. The actual
samples are not really the bottleneck in the size of setup you're
talking about. Mostly it's series index reads that tend to slow things
down. Say you want to read the full CPU history for a year for a 2-CPU
server: scanning that requires loading the series indexes, but the
samples themselves should only amount to a few megabytes of data read
from disk.

I think the main issue you're running into is that Node Exporter Full
dashboard. I haven't looked at that one in a while, but it's very
poorly written. For example, I just looked at the "CPU Busy" panel. It
has one of the worst queries I've seen in a long time for how to
compute CPU utilization:

(sum by(instance) (irate(node_cpu_seconds_total{instance="$node",job="$job", mode!="idle"}[$__rate_interval]))
  / on(instance) group_left
 sum by (instance)((irate(node_cpu_seconds_total{instance="$node",job="$job"}[$__rate_interval])))) * 100

* It uses irate(), which is not what you want for a graph of
  utilization over time.
* It scans every CPU and mode twice (minus idle in the numerator).

No wonder you are having performance issues.

Replacing that panel query with something like this would make it far
more efficient:

avg without (cpu, mode) (
  1 - rate(node_cpu_seconds_total{instance="$node",job="$job",mode="idle"}[$__rate_interval])
) * 100

This would cut the number of series touched by over 90%.

> And could something be made, via the same way, that downsamples data
> from longer ago?

Downsampling is not your problem. Sorry, Prometheus is not RRD; the
problems you are running into are unrelated. You're optimizing for a
problem that basically doesn't exist in Prometheus.

> Both together would really give quite some flexibility.
>
> For metrics where old data is "boring", one could just delete
> everything older than e.g. 2 weeks, while keeping full details for
> that time.
>
> For metrics where one is interested in larger time ranges, but where
> sample resolution doesn't matter so much, one could downsample it...
> like everything older than 2 weeks, then even more for everything
> older than 6 months, then even more for everything older than 1
> year... and so on.
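For the deletion half of that, the admin API I linked above already
works today. An untested sketch of the kind of cron job I meant,
assuming Prometheus was started with --web.enable-admin-api, GNU date
is available, and your node-exporter scrape job is literally named
"node" (adjust the matcher and age to taste):

#!/bin/sh
# Drop node_exporter series older than 14 days.
END=$(date -d '14 days ago' +%s)
curl -s -X POST http://localhost:9090/api/v1/admin/tsdb/delete_series \
  --data-urlencode 'match[]={job="node"}' \
  --data-urlencode "end=${END}"

# Deletions only write tombstones; this actually reclaims the disk space.
curl -s -X POST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones

The downsampling half is the part nobody has built, which is what I
meant about someone needing to write and maintain the code.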
> For the few metrics where full-resolution data is interesting over a
> really long time span, one could just keep it.
>
> > > Seem at least quite big to me... that would - assuming all days
> > > can be compressed roughly to that (which isn't sure of course) -
> > > mean for one year one needs ~250 GB for those 40 nodes, or about
> > > 6.25 GB per node (just for the data from node exporter with a 15s
> > > interval).
> >
> > Without seeing a full meta.json and the size of the files in one
> > dir, it's hard to say exactly if this is good or bad. It depends a
> > bit on how many series/samples are in each block. Just guessing, it
> > seems like you have about 2000 metrics per node.
>
> Yes... so far each node just runs node-exporter, and that seems to
> have:
>
> $ curl localhost:9100/metrics 2>/dev/null | grep -v ^# | wc -l
> 2144
>
> … metrics in the version of it I'm using.
>
> > Seems reasonable, we're only talking about 2TiB per year for all
> > 300 of your servers. Seems perfectly reasonable to me.
>
> Okay... good... I just wasn't sure whether that's "normal"... but I
> guess I can live with it quite well.
>
> Thanks for your help :-)
> Chris.
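For what it's worth, that ~2TiB/year figure is just back-of-the-envelope
math, assuming roughly 2100 series per node, a 15s scrape interval, and
~1.5 bytes per sample on disk (compression varies with the data, so
treat it as an estimate only):

awk 'BEGIN {
  nodes = 300; series = 2100; interval = 15; bytes_per_sample = 1.5
  samples_per_year = 365 * 24 * 3600 / interval
  total_bytes = nodes * series * samples_per_year * bytes_per_sample
  printf "%.1f TiB/year\n", total_bytes / 2^40   # prints about 1.8 TiB/year
}'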