On Fri, Jun 5, 2020 at 2:55 PM Dinesh N <[email protected]>
wrote:

> Hi Aliaksandr,
>
> Thanks for the valuable insights.
>
> I shall take a look at bomb-squad in the meantime. Do you foresee any
> generic optimizations, e.g. via metric_relabel_configs or
> sample_limit, that can help reduce the cardinality?
>

`sample_limit` won't help here, since it limits the number of samples that
can be scraped from a single target. It doesn't limit the number of unique
label=value pairs.
The generic solution is to identify the labels with the biggest number of
unique values via the `/api/v1/status/tsdb` endpoint and then to remove
these labels via `metric_relabel_configs` using `action: labeldrop`.
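As a sketch, a scrape config that drops a high-cardinality label could look
like the snippet below (the job name and the `request_id` label are made-up
examples; substitute whatever labels `/api/v1/status/tsdb` reports for your
setup):

```yaml
scrape_configs:
  - job_name: my_app            # hypothetical job name
    static_configs:
      - targets: ['localhost:8080']
    metric_relabel_configs:
      # With action: labeldrop, `regex` is matched against label *names*;
      # any matching label is removed from every scraped sample.
      - action: labeldrop
        regex: request_id
```

Note that dropping a label can make previously distinct series collide, so
only drop labels you don't need for querying.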



>
> Time series -
>
> Currently we have close to 8 million time series in a single block, which
> is compacted every 2 hours.
>
> Prometheus config -
>
> RAM - 120 GB
> CPU - 32 core CPU
> Storage - 1 TB
>
> Problem statement -
>
> Once the RSS memory grows above 110 GB it crashes, which makes our system
> very unstable ... We can't increase the resources any further, as we are
> already operating with the highest config available.
>

>
> Any directions/approaches/mechanisms are highly appreciated.
>

Try increasing `scrape_interval` for all the scrape targets. This should
reduce RAM usage for Prometheus.
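For instance, raising the global interval in prometheus.yml (the 60s value
is just an illustration; pick an interval that still satisfies your alerting
and graphing needs):

```yaml
global:
  # Raising the interval from e.g. 15s to 60s reduces the ingestion rate,
  # and with it the amount of recent data Prometheus holds in memory.
  scrape_interval: 60s
```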

Another option is to try VictoriaMetrics - it should use less RAM than
Prometheus for this workload.

-- 
Best Regards,

Aliaksandr Valialkin, CTO VictoriaMetrics

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAPbKnmATa894%3DbrO7UQrfAq%3Dzqw3q%2B-%2BKpdeBtF4pAKEknUkBA%40mail.gmail.com.