On Fri, Jun 5, 2020 at 2:55 PM Dinesh N <[email protected]> wrote:
> Hi Aliaksandr,
>
> Thanks for the valuable insights.
>
> I shall take a look at bomb-squad. In the meanwhile, do you foresee any
> generic optimizations using metric_relabel_configs, or can sample_limit
> help to reduce the cardinality?

`sample_limit` won't help here, since it limits the number of samples that can be scraped from a single target; it doesn't limit the number of unique label=value pairs. The generic solution is to identify the labels with the highest number of unique values via the `/api/v1/status/tsdb` page and then remove these labels via `metric_relabel_configs` using `action: labeldrop`.

> Time series:
>
> Currently we have close to 8 million time series per block, which is
> compacted every 2 hours.
>
> Prometheus config:
>
> RAM - 120 GB
> CPU - 32 cores
> Storage - 1 TB
>
> Problem statement:
>
> Once the RSS memory spikes above 110 GB, Prometheus crashes, which makes
> our system very unstable. We can't increase the resources any further, as
> we are already operating with the highest config.
>
> Any directions/approaches/mechanisms are highly appreciated.

Try increasing scrape_interval for all the metrics. This should reduce RAM usage for Prometheus. Another option is to try VictoriaMetrics - it should use a lower amount of RAM compared to Prometheus for this workload.

--
Best Regards,

Aliaksandr Valialkin, CTO VictoriaMetrics

