Julien, what are the best practices regarding service discovery? We have a large number of jobs with a large number of inactive targets; could this negatively impact our memory usage?
For example:

- 1 (1/728 active targets)
- 2 (143/1797 active targets)
- 3 (1/728 active targets)
- 4 (41/41 active targets)
- 5 (41/41 active targets)
- 6 (13/1237 active targets)
- 7 (9/1243 active targets)
- 8 (5/1261 active targets)
- 9 (5/1261 active targets)
- 10 (1/1261 active targets)
- 11 (41/41 active targets)
- 12 (1/1261 active targets)
- 13 (41/1261 active targets)
- 14 (5/1261 active targets)
- 15 (5/1261 active targets)
- 16 (41/1261 active targets)
- 17 (3/728 active targets)
- 18 (3/728 active targets)

Is the best practice now to use labels or a regex in the relabel configs? Here is an example of the config for envoy-stats: https://gitlab.com/-/snippets/2052337

On Friday, December 18, 2020 at 6:50:07 PM UTC-6 Julien Pivotto wrote:

> On 18 Dec 16:40, Brett Larson wrote:
> > Here is a link to the "snapshot" of the dashboard.
> > https://snapshot.raintank.io/dashboard/snapshot/crxdjU7fhzAhl0x0KWiH1ZHGZXKyhqmF
>
> Thanks, however this does not seem to show any memory issue.
>
> There might be an issue with your configuration, where you could take
> advantage of some tweaks, like reusing the same sd configs + relabeling
> or using selectors: in your kubernetes config.
>
> I would expect that if that is still an issue, a memory profile of
> Prometheus when memory is high would help.
>
> > On Friday, December 18, 2020 at 6:09:13 PM UTC-6 Julien Pivotto wrote:
> >
> > > On 18 Dec 16:03, Brett Larson wrote:
> > > > Hello,
> > > > I am using a Prometheus 2.21 server and I am seeing that the memory
> > > > is growing at an untenable rate. I will start the pod at 4GB and
> > > > eventually it will move to 10GB after a few days and go into a
> > > > crash-loop backoff state.
> > > >
> > > > The pod is configured to only keep around 8 hours of data, and this
> > > > data is stored to emptyDir, not a persistent file system. We are
> > > > doing remote write to postgres for only about 4 metrics.
> > > > We do have some no-nos (high cardinality labels & pod names) but
> > > > unfortunately these are needed.
> > > >
> > > > I don't understand why, after a retention of 8 hours, memory would
> > > > still grow like this, and I'm looking for some guidance on how I
> > > > can troubleshoot it.
> > > >
> > > > Please let me know,
> > > > Thank you!
> > >
> > > Can you share screenshots of the
> > > https://grafana.com/grafana/dashboards/12054 dashboard?
> > >
> > > --
> > > Julien Pivotto
> > > @roidelapluie
>
> --
> Julien Pivotto
> @roidelapluie

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/64ca3851-4f7b-4d61-a047-d5b4c3b58c23n%40googlegroups.com.
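For anyone landing on this thread: Julien's two suggestions (reusing the same sd config with relabeling, and `selectors` in the kubernetes config) could look roughly like the sketch below. This is a hypothetical fragment, not the actual envoy-stats config from the linked snippet — the job name, label value, and port name are made-up placeholders:

```yaml
scrape_configs:
  - job_name: envoy-stats            # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
        # Server-side filtering (available since Prometheus 2.17):
        # the Kubernetes API returns only pods matching these selectors,
        # so the many inactive targets are never discovered at all.
        selectors:
          - role: pod
            label: "app=envoy"       # hypothetical pod label
    relabel_configs:
      # Client-side filtering: of the discovered pods, keep only those
      # exposing a container port named envoy-stats (hypothetical name);
      # everything else is dropped before scraping.
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: envoy-stats
```

With jobs like "1/728 active targets" above, the `selectors` approach is the bigger win, since a `keep` relabel rule still requires Prometheus to hold all 728 discovered targets in memory before dropping them.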

