On 22 Nov 12:15, Tim Schwenke wrote:

> @All, I have switched from EFS to EBS (so no more NFS) and the head
> chunks have stopped growing. So it really seems to have something to do
> with that, even though AWS's EFS NFS is supposed to be completely POSIX
> compliant. Weird.

There may be a large range of issues, such as client-side caching, coming
into play with network storage.
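For anyone hitting the same symptom, Prometheus's own TSDB metrics show
directly whether head compaction is keeping up. A minimal sketch using
standard 2.x self-monitoring metric names; nothing here is specific to
this particular setup:

    # Chunks held in the in-memory head block; this should drop back
    # roughly every two hours, when the head is compacted into a
    # persisted block
    prometheus_tsdb_head_chunks

    # Compaction activity and failures over the last day; failures, or
    # no activity at all, would explain a head that only ever grows
    increase(prometheus_tsdb_compactions_total[1d])
    increase(prometheus_tsdb_compactions_failed_total[1d])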
> @[email protected], I see, thanks. Your post is also helpful for getting
> alert rules right.
>
> [email protected] wrote on Monday, 2 November 2020 at 09:47:39 UTC+1:
>
>> Hi,
>>
>> That's expected with applications built with newer Go versions. On
>> Linux they will "reserve" whatever memory is available on the host,
>> but that memory might not actually be allocated. It may be memory
>> that the kernel can take back whenever it needs it, so those cases
>> will not trigger OOMs. To see what is actually happening, look at the
>> working-set metric instead of RSS.
>>
>> See https://www.bwplotka.dev/2019/golang-memory-monitoring/ for
>> details 🤗
>>
>> On Sunday, 1 November 2020 at 23:29:13 UTC+1, [email protected]
>> wrote:
>>
>>> The currently latest version, v2.22.0.
>>>
>>> [email protected] wrote on Thursday, 29 October 2020 at 21:12:00
>>> UTC+1:
>>>
>>>> We see something similar for recent versions of Prometheus, but
>>>> something else might be causing it. What version are you at?
>>>>
>>>> On Thursday, 29 October 2020, Tim Schwenke <[email protected]>
>>>> wrote:
>>>>
>>>>> I think I will just bite the bullet and move my Promstack to a
>>>>> dedicated single-node AWS ECS cluster; this way I can use EBS for
>>>>> persistent storage.
>>>>>
>>>>> Tim Schwenke wrote on Monday, 26 October 2020 at 14:10:49 UTC+1:
>>>>>
>>>>>> Thanks for your answers, @mierzwa and Ben.
>>>>>>
>>>>>> @mierzwa, I have checked the Robust Perception blog post you
>>>>>> linked, and it really does look like my Prometheus is using too
>>>>>> much memory. I have only around 20k series, but with quite high
>>>>>> cardinality over a long period of time. If we just look at a few
>>>>>> hours or days, the churn should be very low / zero, and it only
>>>>>> increases whenever an API endpoint gets called for the first
>>>>>> time, and so on.
>>>>>>
>>>>>> @Ben, this Prometheus is using AWS EFS for storage, so the disk
>>>>>> size should be unlimited. It currently sits at just 13 GB, while
>>>>>> resident size is around 2 GB and virtual memory around 7 GB. The
>>>>>> node Prometheus is running on has 8 GB of RAM. Now that you have
>>>>>> mentioned storage: I read quite some time ago that Prometheus
>>>>>> does not officially support NFS, and I think the AWS EFS volume
>>>>>> is mounted into the node via NFS.
>>>>>>
>>>>>> ---------------------------------------------------------
>>>>>>
>>>>>> Here is the info I get from the Prometheus web interface:
>>>>>>
>>>>>> https://trallnag.htmlsave.net/
>>>>>>
>>>>>> This links to an HTML document that I saved.
>>>>>>
>>>>>> [email protected] wrote on Monday, 26 October 2020 at 12:57:58
>>>>>> UTC+1:
>>>>>>
>>>>>>> It looks like the head chunks have been growing without
>>>>>>> compacting. Is the disk full? What's in the logs? What's in the
>>>>>>> data directory?
>>>>>>>
>>>>>>> On Mon, Oct 26, 2020 at 11:45 AM Tim Schwenke
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Is it expected that Prometheus will take all the memory it gets
>>>>>>>> over time? I'm running Prometheus in a very "stable" environment
>>>>>>>> with about 200 containers / targets, in addition to cAdvisor
>>>>>>>> and Node Exporter, over 3 to 4 nodes. And now, after a few weeks
>>>>>>>> of uptime, Prometheus has reached the memory limits and I get
>>>>>>>> memory limit hit events all the time.
>>>>>>>>
>>>>>>>> Here are screenshots of relevant dashboards:
>>>>>>>>
>>>>>>>> https://github.com/trallnag/random-data/issues/1#issuecomment-716463651
>>>>>>>>
>>>>>>>> Could something be wrong with my setup, or is this all OK? The
>>>>>>>> performance of queries in Grafana etc. is not impacted.
>>>>
>>>> --
>>>> David J. M. Karlsen - http://www.linkedin.com/in/davidkarlsen

--
Julien Pivotto
@roidelapluie
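Following up on the working-set suggestion in the quoted thread: with
cAdvisor already scraping these nodes, queries along the following lines
separate the memory the kernel would actually have to reclaim from the
much larger RSS / virtual numbers. This is only a sketch; the {name=~...}
matcher is a guess at how the Prometheus container is labelled in this
setup, and the job="prometheus" selector assumes Prometheus scrapes
itself under that job name.

    # The working set is roughly what container limits and the OOM
    # killer act on
    container_memory_working_set_bytes{name=~".*prometheus.*"}

    # RSS can look much larger, because it still counts memory the Go
    # runtime has marked as reclaimable but the kernel has not yet
    # taken back
    container_memory_rss{name=~".*prometheus.*"}

    # Prometheus's own Go runtime metrics tell the same story from the
    # inside: heap actually in use versus resident size
    go_memstats_heap_inuse_bytes{job="prometheus"}
    process_resident_memory_bytes{job="prometheus"}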

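On the cardinality and churn point raised by @mierzwa further up: a
handful of ad-hoc queries makes it easy to confirm that 20k series with
low churn really is the whole story. These use standard TSDB metric
names plus the usual "biggest metrics" query; nothing setup-specific is
assumed.

    # Series currently held in the head block
    prometheus_tsdb_head_series

    # Which metric names contribute the most series
    topk(10, count by (__name__)({__name__=~".+"}))

    # Rough churn: how quickly new series are being created
    rate(prometheus_tsdb_head_series_created_total[1h])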

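And since alert rules came up as well: for the "memory limit hit" events
from the original question, an expression along these lines alerts on
real memory pressure rather than on raw RSS. It is a sketch built on the
cAdvisor metrics already being scraped here; depending on the cAdvisor
version, the two metrics may need an on()/ignoring() clause to join
cleanly.

    # Working set as a fraction of the configured container limit; the
    # "> 0" filter drops containers without a limit, for which cAdvisor
    # reports 0 and the division would otherwise return +Inf
    container_memory_working_set_bytes{name!=""}
      / (container_spec_memory_limit_bytes{name!=""} > 0)
      > 0.9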