If all of the thousands of pods in a namespace are running the same app, you can use the hashmod relabeling action to scale horizontally.
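Here is a minimal Python sketch of the idea. It assumes hashmod behaves as we understand it. The source label values are joined with ";" (the default separator) and hashed with MD5. The last 8 bytes of the digest are read as a big-endian integer, and that integer is taken modulo the number of shards. Each Prometheus instance then keeps only the targets whose hash equals its own shard number. In a real scrape config this is usually a hashmod rule on __address__ that writes a temporary label such as __tmp_hash, followed by a keep rule matching the shard number. The function names, the pod addresses and the choice of 4 shards are hypothetical. Check exact shard numbers against Prometheus itself rather than this sketch.

```python
import hashlib
from collections import Counter


def hashmod(source_values, modulus, separator=";"):
    """Approximate the hashmod relabel action.

    Assumption: MD5 of the joined label values, last 8 bytes read
    big-endian, reduced modulo `modulus`.
    """
    joined = separator.join(source_values)
    digest = hashlib.md5(joined.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


def targets_for_shard(targets, shard, modulus):
    """Targets one Prometheus shard would keep (like a 'keep' rule on the hash label)."""
    return [t for t in targets if hashmod([t], modulus) == shard]


if __name__ == "__main__":
    # Hypothetical pod addresses standing in for the __address__ label.
    pods = [f"10.0.{i // 250}.{i % 250}:8080" for i in range(1000)]
    shards = 4
    counts = Counter(hashmod([p], shards) for p in pods)
    for shard in range(shards):
        print(f"shard {shard}: {counts[shard]} pods")
```

The assignment depends only on the label values, so every shard computes the same split independently and no coordination between the Prometheus instances is needed. Adding shards changes the modulus, and that reshuffles most targets.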
You can run several Prometheus instances per namespace, each responsible for a fraction of the pods. Just to be sure, are you keeping up to date with the latest releases? 200 GB of memory seems like a lot for 15M series. Are you using Thanos or a remote-write service?

On Sun, Oct 11, 2020, 07:14 kvr <[email protected]> wrote:
> Hello,
>
> We are hitting some limits with our current setup of Prometheus. I have
> read a lot of posts here as well as blogs and videos but still need some
> guidance.
>
> Our current setup is at its limit. Head series count is around 15M during
> pod churn regularly. Each app exports between 5000 and 8000 metrics series.
> So 1000 pods cause about 8M new series in the head block.
> Prometheus currently has access to 300 GB of memory, but it can't use past
> 200GB in reality. It starts degrading around the 150GB mark.
> - Scrape time for Prometheus scraping itself is 5+ seconds and config
> reloads fail.
> - We verified that this is not due to a cardinality explosion from a
> misbehaving app. So this has naturally degraded due to load.
> - We eliminated bad queries as a cause by spinning up an additional
> Prometheus which just scrapes targets and nothing else. So the bottleneck
> is just ingestion.
>
> So the next step for us is to shard and use namespace level Prometheis.
> But I expect a similar level of usage in about a year again at the
> namespace level, with multiple apps in a single namespace scaling to 1000s
> of pods exporting 5K metrics each. And I will not be able to shard again
> because I don't want to go below the NS granularity.
>
> How have others dealt with this situation, where the bottleneck is
> going to be ingestion and not queries?
>
> Thanks for your time,
> KVR
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/cf15cc42-fe3e-4f4d-8489-3750fac7f81en%40googlegroups.com
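The figures in this thread can be put side by side with a quick back-of-the-envelope calculation. Multiplying pods by series per pod gives the new head series. Dividing 200 GB by 15M series gives the implied memory per head series. Dividing the total by a shard count shows what each hashmod shard would carry. The shard counts are illustrative, and the sketch assumes hashmod spreads series roughly evenly across shards. Neither is a measurement from the original setup. The helper names are hypothetical.

```python
def head_series(pods, series_per_pod):
    """New head series created when `pods` targets each export `series_per_pod` series."""
    return pods * series_per_pod


def bytes_per_series(memory_bytes, series):
    """Average memory per head series implied by total memory use."""
    return memory_bytes / series


def series_per_shard(total_series, shards):
    """Series per shard, assuming an even split; rounded up so no shard is undercounted."""
    return -(-total_series // shards)


if __name__ == "__main__":
    new_series = head_series(1000, 8000)  # upper end quoted in the thread
    per_series = bytes_per_series(200e9, 15_000_000)  # 200 GB over 15M series
    print(f"new head series from 1000 pods: {new_series:,}")
    print(f"implied memory per head series: {per_series / 1000:.1f} kB")
    for shards in (2, 4, 8):
        print(f"{shards} shards -> ~{series_per_shard(15_000_000, shards):,} series each")
```

Even splitting only holds when the sharded targets export similar numbers of series, which is the case described here: many pods of the same app.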

