One place where time series counts of this magnitude from a single target are unfortunately common is kube-state-metrics <https://github.com/kubernetes/kube-state-metrics> (KSM). On a large cluster I see almost 1M metrics. Those are relatively cheap because they are nearly constant and compress well, but I believe quite some work went into that project to make scraping work well from the target side. That includes experimenting with compression: depending on your network it may be faster to stream the response uncompressed than to compress and decompress it.
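If you want to experiment with that on the exporter side, client_golang's promhttp handler can be told to skip gzip. A minimal sketch, assuming a standard client_golang exporter (the port and handler wiring are illustrative):

    package main

    import (
        "log"
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    func main() {
        // Serve the default registry but never gzip the response; whether
        // that is actually faster for a 1-2M series payload depends on
        // your network, so measure both ways.
        h := promhttp.HandlerFor(prometheus.DefaultGatherer, promhttp.HandlerOpts{
            DisableCompression: true,
        })
        http.Handle("/metrics", h)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }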
In summary, 2M time series from a single target is unusual but not without precedent. Look at KSM for the issues they encountered and possible solutions.

/MR

On Tue, Jun 14, 2022 at 2:44 PM [email protected] <[email protected]> wrote:

> The total number of time series scraped would be more important, I think, so you also need to know how many targets you'll have.
>
> I had Prometheus servers scraping 20-30M time series in total, and that was eating pretty much all the memory on a server with 256GB of RAM. In general, when doing capacity planning we expect 4KB of memory per time series for base Go memory, and then we need to double that for the garbage collector (you can try to tweak the GOGC env variable to trade some CPU for less GC memory overhead). With 25M time series, 4KB per series means 100GB of Go allocations, and 200GB to account for the garbage collector, which usually fits in 256GB. But we run a huge number of services, so Prometheus scrapes lots of targets and gets a small number of metrics from each.
>
> You want to scrape 2M from a single target, which means Prometheus will have to request, read and parse a huge response body. This might require more peak memory and it might be slow, so your scrape interval would have to allow for that.
>
> Another thing to remember is churn - if your time series have labels that keep changing all the time, then you might run out of memory, since everything that Prometheus scrapes (even only once) stays in memory until it persists data to disk, which by default is every 2h AFAIR. If the set of values of your APN label is not fixed and you keep seeing random values over time, that will accumulate in memory, so your capacity planning would have to take into account how many unique values of APN (and other labels) there are and whether this is going to grow over time. That's assuming you want to stick with a single Prometheus instance; if you can shard your scrapes, then you can scale horizontally.
>
> It's always hard to give a concrete answer to a question like this since it all depends, but it's usually a matter of having enough memory; CPU is typically (in my environment at least) less important.
>
> On Tuesday, 14 June 2022 at 12:13:24 UTC+1 [email protected] wrote:
>
>> I have a use case where a particular service (that can be horizontally scaled to a desired replica count) exposes 2 million time series. Prometheus might need huge resources to scrape such a service (this is normal). But I'm not sure if there is a recommendation from the community on instrumentation best practices and a maximum count to expose.
>>
>> Thanks,
>> Teja
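To make the rule of thumb in the quoted reply concrete, here is a small back-of-the-envelope sketch; the ~4KB per series and the 2x garbage-collector factor are the numbers quoted above, and the series counts are just placeholders:

    package main

    import "fmt"

    // estimateGB applies the rule of thumb from the thread: roughly 4KB of
    // base Go memory per in-memory series, doubled for GC headroom.
    func estimateGB(series float64) (baseGB, withGCGB float64) {
        const bytesPerSeries = 4_000 // ~4KB per series
        baseGB = series * bytesPerSeries / 1e9
        return baseGB, baseGB * 2
    }

    func main() {
        for _, series := range []float64{2e6, 25e6} { // illustrative counts
            base, withGC := estimateGB(series)
            fmt.Printf("%.0fM series: ~%.0fGB base, ~%.0fGB with GC\n", series/1e6, base, withGC)
        }
    }

The GOGC knob mentioned above is just an environment variable on the Prometheus process (e.g. GOGC=50); lowering it makes the Go garbage collector run more often, so less memory headroom is needed at the cost of extra CPU.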

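If a single instance does not hold up and you shard scrapes across several Prometheus servers as suggested above, the usual mechanism is hashmod relabelling in the scrape config. A rough Go sketch of the underlying idea only (the target names are made up, and this is not Prometheus's exact relabelling hash):

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // shardFor hashes a stable target property (here its address) and takes it
    // modulo the number of Prometheus servers; each server keeps only the
    // targets that map to its own shard number.
    func shardFor(targetAddr string, shards uint32) uint32 {
        h := fnv.New32a()
        h.Write([]byte(targetAddr))
        return h.Sum32() % shards
    }

    func main() {
        targets := []string{"ksm-0:8080", "ksm-1:8080", "ksm-2:8080", "ksm-3:8080"} // hypothetical
        for _, t := range targets {
            fmt.Printf("%s -> shard %d of 2\n", t, shardFor(t, 2))
        }
    }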
