Hey, I'm using Prometheus v2.29.1. My scrape interval is 15 seconds, and I'm measuring RAM using "container_memory_working_set_bytes" (the metric used to check Kubernetes pod usage).
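Concretely, the measurement query looks roughly like this (the namespace and pod selectors are illustrative; the exact label values depend on how the Prometheus pod is deployed):

sum(container_memory_working_set_bytes{namespace="monitoring", pod=~"prometheus-.*", container!="", container!="POD"}) by (pod)

Excluding container="" and container="POD" drops the pod-level aggregate and the pause container, so the sum covers the actual containers running in the pod.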
Using "Status" in the Prometheus web UI, I see the following Head Stats: Number of Series 7644889 Number of Chunks 8266039 Number of Label Pairs 9968 Like I mentioned above, We're getting* the average Metrics Per node as 8257* and we have around 300 targets now, which makes our total metrics around 2,100,000. *Are you monitoring Kubernetes pods by any chance? *I'm not monitoring any pods, I connect to certain nodes that send in custom metrics. Since I'm using a pod and not a node, the resources assigned to this pod are exclusive. On Thursday, 10 February 2022 at 00:20:04 UTC-8 Brian Candler wrote: > What prometheus version? How often are you polling? How are you measuring > the RAM utilisation? > > Let me give you a comparison. I have a prometheus instance here which is > polling 161 node_exporter targets, 38 snmp_exporter targets, 46 > blackbox_exporter targets, and a handful of others, with a 1 minute scrape > interval. It's running inside an lxd container, and uses a grand total of > *2.5GB > RAM* (as reported by "free" inside the container, "used" column). The > entire physical server has 16GB of RAM, and is running a bunch of other > monitoring tools in other containers as well. The physical host has 9GB of > available RAM (as reported by "free" on the host, "available" column). > > This is with prometheus-2.33.0, under Ubuntu 18.04, although I haven't > noticed significantly higher RAM utilisation with older versions of > prometheus. > > Using "Status" in the Prometheus web UI, I see the following Head Stats: > > Number of Series 525141 > Number of Chunks 525141 > Number of Label Pairs 15305 > > I can use a relatively expensive query to count the individual metrics at > the current instance in time (takes a few seconds): > count by (job) ({__name__=~".+"}) > > This shows 391,863 metrics for node(*), 99,175 metrics for snmp, 23,138 > metrics for haproxy (keepalived), and roughly 10,000 other metrics in total. > > (*) Given that there are 161 node targets, that's an average of 2433 > metrics per node (from node_exporter). > > In summary, I find prometheus to be extremely frugal in its use of RAM, > and therefore if you're getting OOM problems then there must be something > different about your system. > > Are you monitoring kubernetes pods by any chance? Is there a lot of churn > in those pods (i.e. pods being created and destroyed)? If you generate > large numbers of short-lived timeseries, then that will require a lot more > memory. The Head Stats figures is the place to start. > > Aside: a week or two ago, there was an isolated incident where this server > started using more CPU and RAM. Memory usage graphs showed the RAM growing > steadily over a period of about 5 hours; at that point, it was under so > much memory pressure I couldn't log in to diagnose, and was forced to > reboot. However since node_exporter is only returning the overall RAM on > the host, not per-container, I can't tell which of the many containers > running on that host was the culprit. > > [image: ram.png] > This server is also running victoriametrics, nfsen, loki, smokeping, > oxidized, netdisco, nagios, and some other bits and bobs - so it could have > been any one of those. In fact, given that Ubuntu does various daily > housecleaning activities at 06:25am, it could have been any of those as > well. > -- You received this message because you are subscribed to the Google Groups "Prometheus Users" group. 