I think the total number of time series scraped matters more, so you 
also need to know how many targets you'll have.
I had Prometheus servers scraping 20-30M time series in total, and that 
was eating pretty much all the memory on a server with 256GB of RAM.
In general, when doing capacity planning we expect 4KB of memory per 
time series for base Go allocations, and then we double that to account 
for the garbage collector (you can try tweaking the GOGC env variable 
to trade some CPU for less GC memory overhead).
With 25M time series, 4KB per series means 100GB of Go allocations, and 
200GB once the garbage collector is accounted for, which usually fits 
within 256GB.
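To sanity-check those numbers against a running server, Prometheus 
exposes its own series count and memory usage (both metric names below 
are from Prometheus' built-in instrumentation, assuming the server 
scrapes itself):

    # number of series currently held in memory (the head block)
    prometheus_tsdb_head_series
    # resident memory of the Prometheus process
    process_resident_memory_bytes

And GOGC is just an environment variable for the process; the value 
below is only illustrative (the Go default is 100, and lower means more 
frequent GC, so less heap overhead at the cost of CPU):

    GOGC=50 ./prometheus --config.file=prometheus.yml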
But we run a huge number of services, so Prometheus scrapes lots of 
targets and gets a small number of metrics from each.
You want to scrape 2M series from a single target, which means 
Prometheus will have to request, read and parse a huge response body. 
That might require more peak memory and it might be slow, so your 
scrape interval would have to allow for that.
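As a rough sketch of what I mean (the job name, interval and timeout 
here are made-up values, not recommendations), you'd raise the per-job 
timeout alongside the interval in prometheus.yml; note that Prometheus 
requires scrape_timeout to be <= scrape_interval:

    scrape_configs:
      - job_name: 'big-exporter'       # hypothetical job
        scrape_interval: 2m            # leave headroom for a slow scrape
        scrape_timeout: 90s            # must not exceed scrape_interval
        static_configs:
          - targets: ['big-exporter.example.com:9100']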
Another thing to remember is churn: if your time series have labels 
that keep changing all the time, then you might run out of memory, 
since everything Prometheus scrapes (even only once) stays in memory 
until the data is persisted to disk, which by default happens every 2h 
AFAIR. If the set of values of your APN label is not fixed and you keep 
seeing random values over time, those will accumulate in memory, so 
your capacity planning would have to take into account how many unique 
values of APN (and other labels) there are and whether that number is 
going to grow over time.
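One way to keep an eye on churn (a sketch, using Prometheus' own TSDB 
instrumentation, again assuming the server scrapes itself) is to watch 
how fast new series are being created in the head block:

    # rough churn indicator: rate of new series creation in the head
    rate(prometheus_tsdb_head_series_created_total[5m])

If that stays high long after startup, your labels are churning.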
That all assumes you want to stick with a single Prometheus instance; 
if you can shard your scrapes, then you can scale horizontally.
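If you do shard, one common pattern (a sketch; the shard count and job 
name here are made up) is hashmod relabelling, where every instance 
runs the same config but keeps only its own slice of the targets:

    scrape_configs:
      - job_name: 'sharded-scrape'         # hypothetical job
        relabel_configs:
          - source_labels: [__address__]
            modulus: 2                     # total number of shards
            target_label: __tmp_hash
            action: hashmod
          - source_labels: [__tmp_hash]
            regex: '0'                     # this instance's shard id
            action: keep

Each instance gets a different regex value (0, 1, ...), so together 
they cover all targets while each holds only part of the series.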

It's always hard to give a concrete answer to a question like this 
since it all depends, but it's usually a matter of having enough 
memory; CPU is typically (in my environment at least) less important.
On Tuesday, 14 June 2022 at 12:13:24 UTC+1 [email protected] wrote:

> I have a use case where a particular service (that can be horizontally 
> scaled to a desired replica count) exposes 2 million time series. 
> Prometheus might need huge resources to scrape such a service (this is 
> normal). But I'm not sure if there is a recommendation from the community 
> on instrumentation best practices and the maximum count to expose.
>
> Thanks,
> Teja
>
