I think I will just bite the bullet and move my Promstack to a dedicated single-node AWS ECS cluster. That way I can use EBS for persistent storage.

Tim Schwenke wrote on Monday, 26 October 2020 at 14:10:49 UTC+1:

> Thanks for your answers, @mierzwa and Ben.
>
> @mierzwa, I have checked the Robust Perception blog post you linked,
> and it really seems like my Prometheus is using too much memory. I have
> only around 20k series, but with quite high cardinality over a long period
> of time. If we just look at a few hours or days, the churn should be very
> low or zero, and it should only increase whenever an API endpoint gets
> called for the first time and so on.
>
> @Ben, Prometheus is using AWS EFS for storage, so the disk size
> should be unlimited. It currently sits at just 13 GB, while resident size is
> at around 2 GB and virtual memory at around 7 GB. The node Prometheus is
> running on has 8 GB of RAM. Now that you have mentioned storage: I read
> quite some time ago that Prometheus does not officially support NFS,
> and I think the AWS EFS volume is mounted into the node via NFS.
>
> ---------------------------------------------------------
>
> Here is the info I get from the Prometheus web interface:
>
> https://trallnag.htmlsave.net/
>
> This links to an HTML document that I saved.
>
> [email protected] wrote on Monday, 26 October 2020 at 12:57:58 UTC+1:
>
>> It looks like the head chunks have been growing without compacting. Is the
>> disk full? What's in the logs? What's in the data directory?
>>
>> On Mon, Oct 26, 2020 at 11:45 AM Tim Schwenke <[email protected]>
>> wrote:
>>
>>> Is it expected that Prometheus will take all the memory it gets over
>>> time? I'm running Prometheus in a very "stable" environment with about 200
>>> containers / targets, in addition to cAdvisor and Node Exporter, across 3 to
>>> 4 nodes. And now, after a few weeks of uptime, Prometheus has reached the
>>> memory limits and I get memory limit hit events all the time.
>>>
>>> Here are screenshots of relevant dashboards:
>>>
>>> https://github.com/trallnag/random-data/issues/1#issuecomment-716463651
>>>
>>> Could something be wrong with my setup, or is it all OK? The performance
>>> of queries in Grafana etc. is not impacted.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Prometheus Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/prometheus-users/f89538c2-c07b-424d-a377-614f0098a337n%40googlegroups.com
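
As a rough sanity check on the figures quoted in this thread (around 20k series, about 2 GB resident), the memory per series can be worked out with simple arithmetic. This is only a sketch: the function name is made up for illustration, "2 GB" is assumed to mean decimal gigabytes, and resident memory also covers things other than series data, so the result is an upper bound on the per-series cost rather than a measurement.

```python
def resident_bytes_per_series(resident_bytes: float, series_count: int) -> float:
    """Return resident memory divided by the number of series (hypothetical helper)."""
    if series_count <= 0:
        raise ValueError("series_count must be positive")
    return resident_bytes / series_count


# Figures taken from the thread; "2 GB" is ASSUMED to be decimal gigabytes.
resident_bytes = 2_000_000_000
series_count = 20_000

per_series = resident_bytes_per_series(resident_bytes, series_count)
print(f"{per_series / 1000:.0f} kB per series")
```

That comes to roughly 100 kB of resident memory per series, which is the kind of number to hold against whatever per-series estimate the linked blog post gives.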

