> I am planning to store 3 years of data from 300 servers in a single
> Prometheus server. The data will primarily consist of default exporter
> metrics, and the server has 500G of memory and 80 cores.

We currently scrape metrics from 908 different sources (from
'count(up)'), 153 of which are the Prometheus (Unix/Linux) host agent on
servers here (the rest are a combination of additional agents and
Blackbox checks). We're currently running at a typical ingestion rate of
73,000 samples a second (some of those additional agents generate a lot
of sample points due to copious histograms) and have around 1.4 million
active series (taken from 'prometheus_tsdb_head_series'). Our current
retention goes back to November of 2018, when we took our Prometheus
setup into production.
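
For anyone who wants to work out the same numbers for their own setup,
the PromQL involved is along these lines (the ingestion rate expression
is my assumption about the usual way to derive it from the standard
TSDB metrics, not something quoted above):

    # number of scrape targets currently known
    count(up)
    # active time series in the TSDB head block
    prometheus_tsdb_head_series
    # assumed: ingestion rate in samples per second
    rate(prometheus_tsdb_head_samples_appended_total[5m])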

We're doing all of this on a 1U server with a six-core Xeon E-2226G CPU,
64 GB of RAM, and a mirrored pair of 20 TB HDDs. The server is not
particularly busy; it runs about 4% CPU utilization and under
1 Mbyte/sec of both network traffic and disk writes. Querying (in
Prometheus) for the three-year average node_load15 across all of the
servers briefly took the system to 12% CPU usage and almost 80% disk
utilization to read data (with apparently negligible additional memory
usage); this will vary with the query.
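
The query in question was roughly the following (a sketch; the exact
form will depend on whether you want per-server or overall figures):

    # per-server average of node_load15 over the past three years
    avg_over_time(node_load15[3y])
    # or the single overall average across all servers
    avg(avg_over_time(node_load15[3y]))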

If you want to make very long historical queries, you will need to
increase various internal safety limits in Prometheus (and possibly also
query time limits in Grafana), but the server you're describing should
be more than able to handle this.
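
The limits I have in mind are Prometheus's --query.max-samples and
--query.timeout command line flags (the defaults are 50,000,000 samples
and 2 minutes), and on the Grafana side the query timeout you can set
on the Prometheus data source. As a sketch, with values that are
illustrative assumptions rather than tested recommendations:

    # Prometheus startup flags for very long range queries
    --query.max-samples=200000000
    --query.timeout=10m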

        - cks
