One place where time series counts of this magnitude from a single target are unfortunately common is kube-state-metrics <https://github.com/kubernetes/kube-state-metrics> (KSM). On a large cluster I see almost 1M metrics. Those are relatively cheap because they are nearly constant and compress well, but I believe quite some work went into that project to make scraping work well from the target side. That includes experimenting with compression: depending on your network it may be faster to stream the response uncompressed than to compress and decompress it.
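If you want to experiment with that on the exporter side, client_golang's promhttp handler can be told to skip gzip. A minimal sketch, assuming a standard client_golang exporter (the port and handler wiring are illustrative):

    package main

    import (
        "log"
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    func main() {
        // Serve the default registry but never gzip the response; whether
        // that is actually faster for a 1-2M series payload depends on
        // your network, so measure both ways.
        h := promhttp.HandlerFor(prometheus.DefaultGatherer, promhttp.HandlerOpts{
            DisableCompression: true,
        })
        http.Handle("/metrics", h)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }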
In summary, 2M time series from a single target is unusual but not without precedent. Look at KSM for the issues they encountered and possible solutions.

/MR

On Tue, Jun 14, 2022 at 2:44 PM [email protected] <[email protected]> wrote:

> The total number of time series scraped would be more important, I think, so you also need to know how many targets you'll have.
>
> I had Prometheus servers scraping 20-30M time series in total, and that was eating pretty much all the memory on a server with 256GB of RAM. In general, when doing capacity planning we expect 4KB of memory per time series for base Go memory, and then we need to double that for the garbage collector (you can try to tweak the GOGC env variable to trade some CPU for less GC memory overhead). With 25M time series, 4KB per series means 100GB of Go allocations, and 200GB to account for the garbage collector, which usually fits in 256GB. But we run a huge number of services, so Prometheus scrapes lots of targets and gets a small number of metrics from each.
>
> You want to scrape 2M from a single target, which means Prometheus will have to request, read and parse a huge response body. This might require more peak memory and it might be slow, so your scrape interval would have to allow for that.
>
> Another thing to remember is churn - if your time series have labels that keep changing all the time, then you might run out of memory, since everything that Prometheus scrapes (even only once) stays in memory until it persists data to disk, which by default is every 2h AFAIR. If the set of values of your APN label is not fixed and you keep seeing random values over time, that will accumulate in memory, so your capacity planning would have to take into account how many unique values of APN (and other labels) there are and whether this is going to grow over time. That's assuming you want to stick with a single Prometheus instance; if you can shard your scrapes, then you can scale horizontally.
>
> It's always hard to give a concrete answer to a question like this since it all depends, but it's usually a matter of having enough memory; CPU is typically (in my environment at least) less important.
>
> On Tuesday, 14 June 2022 at 12:13:24 UTC+1 [email protected] wrote:
>
>> I have a use case where a particular service (that can be horizontally scaled to a desired replica count) exposes 2 million time series. Prometheus might need huge resources to scrape such a service (this is normal). But I'm not sure if there is a recommendation from the community on instrumentation best practices and a maximum count to expose.
>>
>> Thanks,
>> Teja
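To make the rule of thumb in the quoted reply concrete, here is a small back-of-the-envelope sketch; the ~4KB per series and the 2x garbage-collector factor are the numbers quoted above, and the series counts are just placeholders:

    package main

    import "fmt"

    // estimateGB applies the rule of thumb from the thread: roughly 4KB of
    // base Go memory per in-memory series, doubled for GC headroom.
    func estimateGB(series float64) (baseGB, withGCGB float64) {
        const bytesPerSeries = 4_000 // ~4KB per series
        baseGB = series * bytesPerSeries / 1e9
        return baseGB, baseGB * 2
    }

    func main() {
        for _, series := range []float64{2e6, 25e6} { // illustrative counts
            base, withGC := estimateGB(series)
            fmt.Printf("%.0fM series: ~%.0fGB base, ~%.0fGB with GC\n", series/1e6, base, withGC)
        }
    }

The GOGC knob mentioned above is just an environment variable on the Prometheus process (e.g. GOGC=50); lowering it makes the Go garbage collector run more often, so less memory headroom is needed at the cost of extra CPU.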

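If a single instance does not hold up and you shard scrapes across several Prometheus servers as suggested above, the usual mechanism is hashmod relabelling in the scrape config. A rough Go sketch of the underlying idea only (the target names are made up, and this is not Prometheus's exact relabelling hash):

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // shardFor hashes a stable target property (here its address) and takes it
    // modulo the number of Prometheus servers; each server keeps only the
    // targets that map to its own shard number.
    func shardFor(targetAddr string, shards uint32) uint32 {
        h := fnv.New32a()
        h.Write([]byte(targetAddr))
        return h.Sum32() % shards
    }

    func main() {
        targets := []string{"ksm-0:8080", "ksm-1:8080", "ksm-2:8080", "ksm-3:8080"} // hypothetical
        for _, t := range targets {
            fmt.Printf("%s -> shard %d of 2\n", t, shardFor(t, 2))
        }
    }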
