The total number of time series scraped matters more, I think, so you also need to know how many targets you'll have. I've had Prometheus servers scraping 20-30M time series in total, and that was eating pretty much all of the memory on a server with 256GB of RAM.

In general, when doing capacity planning we assume about 4KB of memory per time series for the base Go allocations, and then we double that to account for the garbage collector (you can try tweaking the GOGC environment variable to trade some CPU for less GC memory overhead). With 25M time series, 4KB per series means 100GB of Go allocations, and 200GB once you account for the garbage collector, which usually fits in 256GB.

But we run a huge number of services, so our Prometheus servers scrape lots of targets and get a small number of metrics from each. You want to scrape 2M series from a single target, which means Prometheus has to request, read and parse a huge response body. That can need more peak memory and it can be slow, so your scrape interval (and scrape timeout) would have to allow for that.

Another thing to remember is churn: if your time series have labels that keep changing all the time, you might run out of memory, since everything Prometheus scrapes (even only once) stays in memory until it persists data to disk, which by default happens every 2h AFAIR. If the set of values of your APN label is not fixed and you keep seeing new values over time, those will accumulate in memory, so your capacity planning has to take into account how many unique values of APN (and your other labels) there are and whether that number will grow over time.

That's all assuming you want to stick with a single Prometheus instance; if you can shard your scrapes then you can scale horizontally. A few example queries and config sketches for the points above follow below.
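To sanity-check the 4KB-per-series rule of thumb in your own environment, you can divide Prometheus' resident memory by the number of series currently in the TSDB head, using metrics Prometheus exports about itself. A rough PromQL sketch (assuming your Prometheus scrapes itself under job="prometheus"):

    # approximate bytes of resident memory per in-memory series
    process_resident_memory_bytes{job="prometheus"}
      / prometheus_tsdb_head_series{job="prometheus"}

The result includes GC headroom and everything else in the process, so treat it as an upper bound rather than a precise per-series cost.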
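If you want to experiment with GOGC, it's just an environment variable on the Prometheus process. A minimal sketch assuming you run Prometheus under systemd (the unit name and drop-in path are assumptions):

    # /etc/systemd/system/prometheus.service.d/gogc.conf (hypothetical path)
    [Service]
    # GOGC below the default of 100 makes the GC run more often:
    # less heap headroom, at the cost of extra CPU.
    Environment=GOGC=50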
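For a single target exposing millions of series, you'd want to give each scrape enough time to download and parse the response. A sketch of the relevant scrape_config (the job name, interval and address are made up for illustration):

    scrape_configs:
      - job_name: 'big-service'            # hypothetical job name
        scrape_interval: 2m                # long enough for a huge response body
        scrape_timeout: 2m                 # must be <= scrape_interval
        static_configs:
          - targets: ['big-service:9100']  # hypothetical address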
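You can also watch churn directly via Prometheus' own TSDB metrics; a persistently high rate of newly created head series is the warning sign that label values keep changing:

    # new series created in the head block per second, averaged over 1h
    rate(prometheus_tsdb_head_series_created_total[1h])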
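And if you do go the horizontal-sharding route, the usual trick is hashmod relabelling, where each Prometheus instance keeps only its own slice of the targets. A sketch for one of, say, 4 shards (the shard count and job name are assumptions):

    scrape_configs:
      - job_name: 'sharded'
        relabel_configs:
          - source_labels: [__address__]
            modulus: 4                 # total number of Prometheus shards
            target_label: __tmp_hash
            action: hashmod
          - source_labels: [__tmp_hash]
            regex: '0'                 # this instance's shard number (0..3)
            action: keep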
It's always hard to give a concrete answer to a question like this since it all depends, but it's usually a matter of having enough memory; CPU is typically (in my environment at least) less important.

On Tuesday, 14 June 2022 at 12:13:24 UTC+1 [email protected] wrote:

> I have a use case where a particular service (that can be horizontally
> scaled to a desired replica count) exposes 2 million time series.
> Prometheus might need huge resources to scrape such a service (this is
> normal). But I'm not sure if there is a recommendation from the community
> on instrumentation best practices and a maximum count to expose.
>
> Thanks,
> Teja

