Re: [prometheus-users] Scaling Prometheus

[email protected] Mon, 12 Oct 2020 02:29:51 -0700

I found the formula from 
https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion
 
to be pretty accurate for estimating memory usage. It doesn't cover 
querying, but in my experience the memory needed for queries is usually 
absorbed by the extra "double for gc" multiplication.
Try running your numbers there and see whether the result matches what you 
are seeing.
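
As a rough sanity check before plugging numbers into that calculator, here 
is a small sketch that derives a few ratios from the peak stats quoted 
below. It is not the formula from the linked post. The 2 GB per million 
series figure is the rule of thumb mentioned in this thread, GB is assumed 
to mean 10**9 bytes, and the function names are made up for illustration.

```python
# Back-of-the-envelope ratios from the peak stats quoted in this thread.
# Assumptions: GB means 10**9 bytes, and "2 GB per million series" is the
# rule of thumb from the thread, not a measured value.

GB = 10**9


def rule_of_thumb_gb(head_series, gb_per_million=2.0):
    """Memory suggested by the '~2 GB per million series' rule of thumb."""
    return head_series / 1_000_000 * gb_per_million


def bytes_per_series(memory_gb, head_series):
    """Observed memory divided evenly across all head series."""
    return memory_gb * GB / head_series


def mean_scrape_interval_s(head_series, samples_per_second):
    """Average interval implied if every head series were scraped steadily."""
    return head_series / samples_per_second


head_series = 15_000_000
head_chunks = 45_000_000
samples_per_second = 475_000

print(rule_of_thumb_gb(head_series))              # 30.0
print(round(bytes_per_series(150, head_series)))  # 10000
print(round(bytes_per_series(200, head_series)))  # 13333
print(head_chunks / head_series)                  # 3.0
print(round(mean_scrape_interval_s(head_series, samples_per_second), 1))  # 31.6
```

So the rule of thumb suggests about 30 GB for 15M series, while the 
observed 150 to 200 GB works out to roughly 10 to 13 KB per series. The 
implied interval is only an upper bound: during pod churn many head series 
belong to pods that are gone and no longer receive samples, so the live 
series are likely scraped more often than every 31.6 s.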


On Sunday, 11 October 2020 at 21:14:20 UTC+1 kvr wrote:

>
> There are different services and each could scale to 1000+ pods in a given 
> namespace. 
> But even then managing a Prometheus instance pair per set of apps is not 
> tenable. The management overhead would be too great when there are several 
> such apps.
>
> Version-wise, we are keeping up, but not aggressively. 
> We are on 2.18.2 and the instance under test does not have Thanos. It only 
> scrapes and does some rule evaluation (the memory usage is the same even 
> when rule evaluation is disabled).
> We are using the Prometheus Operator to reload config.
>
> Yeah, I read that ~2GB of memory is sufficient per million metrics, so I 
> am surprised that it consumes such a large amount. Would having diverse 
> scrape intervals have such an effect?
>
> Our stats at peak:
> ~15M head series
> ~45M head chunks
> ~475K samples/s ingested
> ~7000 pods scraped
>
> Thanks!
>
> On Sunday, October 11, 2020 at 12:38:27 PM UTC+5:30 [email protected] 
> wrote:
>
>> If all of the 1000s of pods in a namespace are of the same thing, you can 
>> use the hashmod feature to horizontally scale.
>>
>> You can have several Prometheus instances per namespace, each responsible 
>> for a fraction of the pods.
>>
>> Just to be sure, are you keeping up to date on the latest releases? 200G 
>> of memory seems like a lot for 15M series.
>>
>> Are you using Thanos or a remote write service?
>>
>> On Sun, Oct 11, 2020, 07:14 kvr <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> We are hitting some limits with our current setup of Prometheus. I have 
>>> read a lot of posts here as well as blogs and videos but still need some 
>>> guidance.
>>>
>>> Our current setup is at its limit. Head series count regularly reaches 
>>> around 15M during pod churn. Each app exports between 5000 and 8000 metric 
>>> series, so 1000 pods cause up to about 8M new series in the head block. 
>>> Prometheus currently has access to 300 GB of memory, but in reality it 
>>> can't use more than 200 GB. It starts degrading around the 150 GB mark. 
>>> - Scrape time for Prometheus scraping itself is 5+ seconds and config 
>>> reloads fail.
>>> - We verified that this is not due to a cardinality explosion from a 
>>> misbehaving app. So this has naturally degraded due to load.
>>> - We eliminated bad queries as a cause by spinning up an additional 
>>> Prometheus which just scrapes targets and nothing else. So the bottleneck 
>>> is just ingestion. 
>>>
>>> So the next step for us is to shard and use namespace-level Prometheis. 
>>> But I expect a similar level of usage in about a year again at the 
>>> namespace level, with multiple apps in a single namespace scaling to 1000s 
>>> of pods exporting 5K metrics each. And I will not be able to shard again 
>>> because I don't want to go below the NS granularity. 
>>>
>>> How have others dealt with this situation, where the bottleneck is 
>>> going to be ingestion and not queries?
>>>
>>> Thanks for your time,
>>> KVR
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Prometheus Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/prometheus-users/cf15cc42-fe3e-4f4d-8489-3750fac7f81en%40googlegroups.com
>>> .
>>>
>>
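
On the hashmod suggestion quoted above: in Prometheus this is done with a 
relabel rule using `action: hashmod` on a label such as `__address__`, 
followed by a `keep` rule that matches this instance's shard number. The 
sketch below only illustrates the idea in Python. The pod addresses and 
function names are hypothetical, and the hash is a stand-in that is not 
guaranteed to reproduce the shard assignment Prometheus itself would make.

```python
import hashlib


def shard_for(address: str, modulus: int) -> int:
    """Map a target address to a shard number in [0, modulus)."""
    digest = hashlib.md5(address.encode("utf-8")).digest()
    # Interpret the last 8 bytes of the MD5 digest as an unsigned integer.
    return int.from_bytes(digest[8:], "big") % modulus


def targets_for_shard(addresses, shard, modulus):
    """Return the targets that one instance (one shard) would keep."""
    return [a for a in addresses if shard_for(a, modulus) == shard]


# Hypothetical pod addresses for one large app in a namespace.
pods = [f"10.0.{i // 256}.{i % 256}:8080" for i in range(1000)]
modulus = 4  # number of Prometheus instances sharing the load

shards = [targets_for_shard(pods, s, modulus) for s in range(modulus)]
print([len(s) for s in shards])
```

Every pod lands on exactly one shard and the mapping is stable as long as 
the modulus does not change, so each instance ingests roughly 1/modulus of 
the series. Changing the modulus reshuffles most targets between 
instances, which is worth keeping in mind when planning growth.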

