> I am planning to store 3 years of data from 300 servers in a single
> prometheus server. The data will primarily consist of default exporter
> metrics and the server has 500G memory and 80 cores.
We currently scrape metrics from 908 different sources (from 'count(up)'), 153 of which are the Prometheus host agent on our Unix/Linux servers here; the rest are a combination of additional agents and Blackbox checks. We're currently running at a typical ingestion rate of 73,000 samples a second (some of those additional agents generate a lot of sample points due to copious histograms) and have around 1.4 million active series (taken from 'prometheus_tsdb_head_series').

Our retention goes back to November of 2018, when we took our Prometheus setup into production. We're doing all of this on a 1U server with a six-core Xeon E-2226G CPU, 64 GB of RAM, and a mirrored pair of 20 TB HDDs. The server is not particularly busy; it runs at about 4% CPU utilization and under 1 Mbyte/sec of both network traffic and disk writes.

Querying (in Prometheus) for the three-year average of node_load15 across all of our servers briefly took the system to 12% CPU usage and almost 80% disk utilization to read the data, with apparently negligible additional memory usage; this will vary with the query.

If you want to make very long historical queries, you will need to raise various internal safety limits in Prometheus (and possibly also query time limits in Grafana), but the server you're describing should be more than able to handle this.

- cks
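
PS: For concreteness, the three-year average query was something along these lines (a sketch; 'node_load15' is node_exporter's 15-minute load average metric, and PromQL accepts 'y' as a duration unit):

    # Per-host average load over the past three years,
    # then averaged across all hosts.
    avg(avg_over_time(node_load15[3y]))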
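
PPS: The internal safety limits I mean are Prometheus command line flags. As a rough illustration (flag names are current as of recent Prometheus versions; pick values to suit yourself):

    # A 3-year range at a 15-second scrape interval is roughly 6.3
    # million samples per series, so a query across a few hundred
    # hosts easily exceeds the 50,000,000-sample default limit.
    --query.max-samples=2000000000
    # Long range queries may also need more than the 2-minute default.
    --query.timeout=5m

On the Grafana side, the equivalent knobs are the datasource's HTTP timeout and 'timeout' in the [dataproxy] section of grafana.ini.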