Maybe you are just collecting a lot of metrics in a single prometheus 
instance.  There's a tool which will give you an estimate of RAM usage here:
https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion

For disk space, I'd start with an estimate of 1.7 bytes per metric sample - 
so that usage depends on your scrape interval.  You say it's growing at 
about 900MB/hour; if you were using a 15-second scrape interval that
implies about 2.2 million active time series, which is quite high for a
single prometheus instance (the recommended maximum is around 2 million).
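The back-of-envelope arithmetic looks like this (assuming the ~1.7
bytes/sample figure and a 15-second scrape interval; plug in your own
numbers):

```python
# Rough Prometheus sizing: work backwards from disk growth to series count.
# Assumes ~1.7 bytes per sample on disk and a uniform scrape interval.
bytes_per_hour = 900e6        # observed disk growth: ~900 MB/hour
bytes_per_sample = 1.7        # rough on-disk cost per sample
scrape_interval = 15          # seconds between scrapes

samples_per_second = bytes_per_hour / bytes_per_sample / 3600
active_series = samples_per_second * scrape_interval
print(f"~{active_series / 1e6:.1f} million active series")
```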

So the first thing to check is how many metrics you're *actually* 
collecting, and also whether you have a high churn rate in time series 
(i.e. lots of pods starting and stopping).  You can get this info from the 
prometheus GUI under "status > runtime & build info".  Look especially at 
"Head Stats".
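If you'd rather query than click around the GUI, the same head
statistics are exposed by prometheus's own instrumentation (these
metric names are from prometheus 2.x's TSDB internals):

```
prometheus_tsdb_head_series                          # currently active series
rate(prometheus_tsdb_head_series_created_total[5m])  # churn: new series per second
```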

Your 30GB RAM usage suggests high series churn.  Beware that if you are 
monitoring pod-level metrics, every pod is unique, so will generate its own 
set of timeseries.  If you have 10 pods destroyed and created per minute, 
and each pod generates 10K metrics, that's 6 million new time series every 
hour.  At any instant not all of these will be active, but the "head" 
block typically carries the last 2 hours' worth of timeseries.  The 
solution is either not to churn pods so much, or to filter the data 
collection so you're collecting much less pod-level data.
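If you go the filtering route, a metric_relabel_configs block on the
scrape job drops series before they're ingested.  A sketch only - the
job name and regex here are hypothetical placeholders for whatever
pod-level metrics you don't actually need:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'        # hypothetical job name
    metric_relabel_configs:
      # Drop any series whose metric name matches the regex,
      # before it reaches the TSDB.
      - source_labels: [__name__]
        regex: 'container_network_.*'  # hypothetical: adjust to your case
        action: drop
```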

If you are sure that the number of series you're collecting is much lower 
than 2m, then there may be a problem.  Please report the stats, the *exact* 
version of prometheus you're running, and also show any logs generated by 
prometheus itself.

If you are in fact collecting millions of timeseries (and wish to keep them 
all rather than dropping some), then as I said before this is more than is 
recommended for a single prometheus instance.  If you have 5 clusters then 
it sounds like you'd be better with a separate prometheus per cluster, 
especially as they are in separate AWS accounts.  You can still have a 
single Grafana instance, which either queries them individually or uses 
something like promxy to combine them; alternatively, use federation to 
collect a subset of metrics into a separate prometheus for a global 
view, or look at higher-performance add-ons like Thanos.
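For the federation option, each per-cluster prometheus exposes its
/federate endpoint and a global instance scrapes a subset of its
series.  A sketch - the match[] selector and target addresses below are
hypothetical and would need to fit your setup:

```yaml
scrape_configs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="kubernetes-nodes"}'   # hypothetical: federate only what you need
    static_configs:
      - targets:
          - 'prometheus-cluster-1.example.com:9090'  # hypothetical addresses
          - 'prometheus-cluster-2.example.com:9090'
```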
