What matters is the number of timeseries, not the number of hosts.  Try 
scraping a single host, and then see how many timeseries you get, e.g. 
with this PromQL query:

count({__name__=~".+",instance="xxxxxxx"})

where xxxxxxx is the instance label of the target you scraped; it may be 
ip:port.  That will give you an idea of how many timeseries telegraf 
exposes for a single host.  (I don't use telegraf; personally I'd suggest 
node_exporter instead.)
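
If that count is surprisingly large, a variant of the same query shows 
which metric names contribute the most series for that instance (the 
instance value is a placeholder, as above):

topk(10, count by (__name__) ({__name__=~".+",instance="xxxxxxx"}))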

You can also get useful information by going to the Prometheus web 
interface and going to Status > TSDB Status.  The "Number of series" in 
"Head stats" is what you're looking for.  Try this with, say, 1 target, 
then 101 targets, and see how much it increases.
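
The same number is also exposed as one of Prometheus's own metrics, so 
you can graph how it grows as you add targets:

prometheus_tsdb_head_series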

Once you know roughly how many timeseries you expect to collect, then 
there's a memory estimator here:
https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion
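
As a rough back-of-envelope version of what that estimator does - 
assuming ~8 KiB of RAM per active head series, which is a ballpark I'm 
assuming here, not a figure Prometheus guarantees:

```python
# Back-of-envelope RAM estimate; real usage also depends on label counts,
# label churn and scrape interval -- see the estimator linked above.
BYTES_PER_SERIES = 8 * 1024  # assumed ~8 KiB per active head series


def estimate_ram_gib(metrics_per_host: int, hosts: int) -> float:
    """Estimated resident memory (GiB) for the given number of series."""
    series = metrics_per_host * hosts
    return series * BYTES_PER_SERIES / 2**30


# e.g. 1000 metrics/host across 2000 hosts = 2 million active series
print(f"{estimate_ram_gib(1000, 2000):.1f} GiB")
```

On that assumption, 2 million series lands at roughly 15 GiB before any 
query or compaction overhead - uncomfortably close to a 16GB box.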

Suppose you're generating 1000 metrics per host.  Then 2000 hosts would be 
2 million timeseries, which is fairly high - this is the point at which you 
typically start thinking about splitting into multiple Prometheus servers, 
each scraping a subset of targets.
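
One common way to split targets is hashmod sharding in relabel_configs, 
so each server keeps only the targets that hash to its shard.  A sketch - 
the job name and shard count are made up for illustration:

# Hypothetical prometheus.yml fragment for shard 0 of 2.  Run one
# Prometheus server per shard, each with the same modulus but its own
# shard number in the "keep" regex.
scrape_configs:
  - job_name: telegraf          # illustrative job name
    relabel_configs:
      - source_labels: [__address__]
        modulus: 2              # total number of shards
        target_label: __tmp_shard
        action: hashmod
      - source_labels: [__tmp_shard]
        regex: "0"              # this server's shard number
        action: keep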

16GB may be able to handle this, but as you can see from the estimator, 
it's also quite sensitive to the number of labels per timeseries and the 
number of unique values per label.  Make sure you're running a recent 
version of Prometheus; newer versions are more RAM-efficient.  And make 
sure you're using a block filesystem for storage (e.g. local disk or EBS), 
not a shared filesystem (definitely *not* NFS or SMB).

Also beware: if telegraf is configured to do something stupid, like 
exposing a label with high cardinality whose values keep changing, then 
the number of active timeseries can explode.  It's up to you to manage 
this risk.  node_exporter is a safer bet in my opinion, but only because 
I haven't used telegraf with Prometheus.
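
If you do hit a runaway label, you can strip it at scrape time with 
metric_relabel_configs.  A sketch - the label name "session_id" is 
invented for illustration:

# Hypothetical fragment: drop a high-cardinality label from every
# scraped sample before it is stored in the TSDB.
scrape_configs:
  - job_name: telegraf            # illustrative job name
    metric_relabel_configs:
      - regex: session_id         # made-up label name
        action: labeldrop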

On Friday, 21 January 2022 at 08:44:23 UTC [email protected] wrote:

> Hello Team,
>
> I want to monitor the metrics of 10000 servers via telegraf.  Prometheus 
> is running standalone, not in Docker or Kubernetes.  So what is the ideal 
> size of host for running Prometheus without any failure?
>
> Currently I have 2 CPU cores, 16GB memory and 100GB disk.  But if I add 
> 2000 hosts as targets, memory utilization goes very high and then the 
> Prometheus service goes down.
>
> Please help me to setup Prometheus.
>
> Thanks and Regards
> Ritesh patel 
>
