Maybe you are just collecting a lot of metrics in a single prometheus 
instance.  There's a tool which will give you an estimate of RAM usage here:
https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion

For disk space, I'd start with an estimate of 1.7 bytes per metric sample - 
so that usage depends on your scrape interval.  You say it's growing at 
about 900MB/hour; if you were using a 15-second scrape interval that
implies about 2.2 million active time series, which is quite high for a
single prometheus instance (the recommended maximum is around 2 million).
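The back-of-envelope arithmetic looks like this (assuming the ~1.7
bytes/sample figure and a 15-second scrape interval; plug in your own
numbers):

```python
# Rough Prometheus sizing: work backwards from disk growth to series count.
# Assumes ~1.7 bytes per sample on disk and a uniform scrape interval.
bytes_per_hour = 900e6        # observed disk growth: ~900 MB/hour
bytes_per_sample = 1.7        # rough on-disk cost per sample
scrape_interval = 15          # seconds between scrapes

samples_per_second = bytes_per_hour / bytes_per_sample / 3600
active_series = samples_per_second * scrape_interval
print(f"~{active_series / 1e6:.1f} million active series")
```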

So the first thing to check is how many metrics you're *actually* 
collecting, and also whether you have a high churn rate in time series 
(i.e. lots of pods starting and stopping).  You can get this info from the 
prometheus GUI under "status > runtime & build info".  Look especially at 
"Head Stats".
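If you'd rather query than click around the GUI, the same head
statistics are exposed by prometheus's own instrumentation (these
metric names are from prometheus 2.x's TSDB internals):

```
prometheus_tsdb_head_series                          # currently active series
rate(prometheus_tsdb_head_series_created_total[5m])  # churn: new series per second
```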

Your 30GB RAM usage suggests high series churn.  Beware that if you are 
monitoring pod-level metrics, every pod is unique, so will generate its own 
set of timeseries.  If you have 10 pods destroyed and created per minute, 
and each pod generates 10K metrics, that's 6 million new time series every 
hour.  At any instant not all of these will be active, but the "head" 
block typically carries the last 2 hours' worth of timeseries.  The 
solution is either not to churn pods so much, or to filter the data 
collection so you're collecting much less pod-level data.
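If you go the filtering route, a metric_relabel_configs block on the
scrape job drops series before they're ingested.  A sketch only - the
job name and regex here are hypothetical placeholders for whatever
pod-level metrics you don't actually need:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'        # hypothetical job name
    metric_relabel_configs:
      # Drop any series whose metric name matches the regex,
      # before it reaches the TSDB.
      - source_labels: [__name__]
        regex: 'container_network_.*'  # hypothetical: adjust to your case
        action: drop
```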

If you are sure that the number of series you're collecting is much lower 
than 2m, then there may be a problem.  Please report the stats, the *exact* 
version of prometheus you're running, and also show any logs generated by 
prometheus itself.

If you are in fact collecting millions of timeseries (and wish to keep them 
all rather than dropping some), then as I said before this is more than is 
recommended for a single prometheus instance.  If you have 5 clusters then 
it sounds like you'd be better with a separate prometheus per cluster, 
especially as they are in separate AWS accounts.  You can still have a 
single Grafana instance, which either queries them individually or uses 
something like promxy to combine them; alternatively, use federation to 
collect a subset of metrics into a separate prometheus for a global 
view, or look at higher-performance add-ons like Thanos.
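For the federation option, each per-cluster prometheus exposes its
/federate endpoint and a global instance scrapes a subset of its
series.  A sketch - the match[] selector and target addresses below are
hypothetical and would need to fit your setup:

```yaml
scrape_configs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="kubernetes-nodes"}'   # hypothetical: federate only what you need
    static_configs:
      - targets:
          - 'prometheus-cluster-1.example.com:9090'  # hypothetical addresses
          - 'prometheus-cluster-2.example.com:9090'
```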
