Re: [prometheus-users] Re: Single Prometheus for Large Cluster

patricia lee Tue, 07 Sep 2021 05:57:50 -0700

Thank you, Brian, for the reply. Yes, I mean hosts (nodes).
What we have done in the meantime is set the retention time of
Prometheus to 5 minutes (which I am not comfortable with), but we were
advised by seniors to do so just so that we can continue.
Thanks for the information above; I'll check it out and try it on our cluster
environment.

On Tue, Sep 7, 2021 at 4:50 PM Brian Candler <[email protected]> wrote:

> It's not clear what you mean by "No. of Nodes" - whether you mean hosts
> (e.g. which you're scraping using node_exporter), or pods, or something
> else.  But what matters is the total number of metrics, and the amount of
> metric churn,  i.e. the rate at which new timeseries are being created
> dynamically; and also how much querying is going on.
>
> If you go to the Prometheus web interface, Status > TSDB Status, you'll get
> some statistics which may help you.  Consider:
>
> - collecting fewer metrics (by changing what you scrape, and/or using
> metric_relabel_configs to drop some timeseries which are not of interest)
>
> - see if it's possible to reduce timeseries churn.  For example, if you
> have one application which is generating large numbers of short-lived pods
> then you may wish to reduce or suppress the metrics collected for those
> pods.
>
> - have a look at the PromQL queries being executed, and whether any of
> these are using excessive amounts of RAM.  The query log
> <https://prometheus.io/docs/guides/query-log/> may help.  You can also
> limit how much memory queries use with
>       --query.max-concurrency=20  # default
>       --query.max-samples=50000000  # default
> (although that may cause the offending queries to fail)
>
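
For reference, the numbers on the Status > TSDB Status page are also served
by the HTTP API at /api/v1/status/tsdb, so the largest metrics can be listed
from a script. The following is only a minimal sketch: the base URL is an
assumption (a port-forward to localhost:9090), and it relies on the
"seriesCountByMetricName" list that the endpoint returns.

```python
import json
import urllib.request


def fetch_tsdb_status(base_url):
    """Fetch the TSDB status document from a Prometheus server.

    base_url is an assumption, e.g. "http://localhost:9090" when
    port-forwarding to the Prometheus pod.
    """
    url = f"{base_url}/api/v1/status/tsdb"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def top_metrics(status, limit=10):
    """Return (metric_name, series_count) pairs, largest first.

    Reads the "seriesCountByMetricName" list, whose entries look like
    {"name": "...", "value": 123}.
    """
    entries = status.get("data", {}).get("seriesCountByMetricName", [])
    ranked = sorted(entries, key=lambda e: e["value"], reverse=True)
    return [(e["name"], e["value"]) for e in ranked[:limit]]


# Example usage (needs a reachable Prometheus, so it is left commented out):
# status = fetch_tsdb_status("http://localhost:9090")
# for name, count in top_metrics(status):
#     print(f"{count:>10}  {name}")
```

The metrics at the top of that list are the first candidates to check
against the Istio, Kiali and Jaeger dashboards. Any that turn out to be
unused could then be dropped with metric_relabel_configs, as suggested above.
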
> There are also blog posts out there which you can turn up with a search,
> e.g.
> https://source.coveo.com/2021/03/03/prometheus-memory/
>
> On Tuesday, 7 September 2021 at 07:34:51 UTC+1 [email protected] wrote:
>
>> Hi everyone, I am new here.
>>
>> I would like to seek some advice on the design approach we should take.
>> Given the problem below, how can we set up Prometheus cost-effectively
>> for a large cluster?
>>
>> *Variables:*
>> *Installation: *kube-prometheus-stack helm chart.
>> *Autoscale*: yes
>> *No. of Nodes*: 1000 up to 1300
>> *Mesh*: Istio
>> *Memory Usage:* 50GB (Still gets OOM)
>> *Installed: *1 Prometheus, 1 Kiali, 1 Grafana and 1 Jaeger
>>
>> *Issue:*
>> 1. We cannot move Prometheus to a larger node, as 60GB of memory is
>> already expensive (cost not approved by management).
>> 2. Removing unnecessary metrics is not yet advised because we do not know
>> which Istio, Jaeger and Kiali metrics are needed.
>>
>> *Tried solution:*
>> We have federated the single instance of Prometheus with Thanos
>> Receivers.  However, the issue is still there because Kiali queries its
>> data directly from Prometheus, which eventually gets OOM-killed.
>>
>> *Question:*
>> We are thinking of running multiple Prometheus instances for each
>> namespace and adding a thanos-sidecar to each, all with the same scrape
>> config, since Thanos will deduplicate the duplicated metrics. This
>> approach would solve the issue for Grafana queries but not for Kiali.
>>
>> How can we set up multiple low-cost Prometheus instances while still
>> giving Kiali a single Prometheus endpoint that covers the whole cluster?
>>
>> Appreciate any help. Thank you.
>>
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/24a15533-094e-4a4c-9644-5d4375b6aaa2n%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/24a15533-094e-4a4c-9644-5d4375b6aaa2n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

