Thank you! I remember seeing it somewhere in the past, but couldn't remember it.
Regarding system load: even if it triggers an alert at 3 AM, as long as it goes to email and gets checked in the morning, I think it's fine. At least you're not missing out on (potentially) abnormal behavior.

On Thu, Jan 13, 2022 at 10:00 AM Brian Candler <[email protected]> wrote:

> On Thursday, 13 January 2022 at 07:41:33 UTC [email protected] wrote:
>
>> What is the best way to have alerts when metric X passes a threshold for
>> most servers, but for the ones that are already running close to X, set a
>> different rule?
>>
>
> See https://www.robustperception.io/using-time-series-as-alert-thresholds
> for the direct answer to that question.
>
> You can also monitor on trends rather than static thresholds - e.g. for
> disk space you can use predict_linear to detect when a filesystem looks
> like it's going to become full. See this thread
> <https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/edit>
> .
>
> However, I'd also caution you against setting alerts on causes, and
> concentrate your alerting on symptoms instead. You can't avoid all
> cause-based alerts, but you can minimise them.
>
> "CPU load" for example, is not a particularly useful metric to alert on.
> Suppose the CPU load hits 99% at 3am in the morning, *but the service is
> still working fine.* Do you really want to get someone out of bed for
> this? And if you do get them out of bed, what exactly are they going to do
> about it anyway?
>
> This document, which is only a few pages, is well worth reading:
>
> https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/edit
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/43a0cd05-75a9-4a03-af2c-b29cad12435fn%40googlegroups.com
> .

--
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/CAOKso16bqt6tOhdy13AYxwdZVfe2b%2BzmgeX84rnDeaFGiNP_Rw%40mail.gmail.com.
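
For anyone reading this thread later, here is a rough Python sketch of the two ideas from the quoted reply: a per-server threshold that falls back to a default, and a linear trend extrapolation in the spirit of predict_linear. It is only an illustration of the logic, not how Prometheus implements either one. The function names, the server names and the default of 4.0 are all made up for the example, and the extrapolation is measured from the last sample rather than from the rule evaluation time.

```python
from typing import Dict, List, Tuple

# Hypothetical default threshold, used for servers without an override.
DEFAULT_THRESHOLD = 4.0


def breaching_servers(
    values: Dict[str, float],
    overrides: Dict[str, float],
    default: float = DEFAULT_THRESHOLD,
) -> List[str]:
    """Return the servers whose value exceeds their own threshold.

    A server listed in `overrides` is compared against its own limit;
    every other server is compared against `default`.
    """
    return sorted(
        server
        for server, value in values.items()
        if value > overrides.get(server, default)
    )


def extrapolate_linear(
    samples: List[Tuple[float, float]], seconds_ahead: float
) -> float:
    """Fit a least-squares line through (timestamp, value) samples and
    return the fitted value `seconds_ahead` after the last sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + seconds_ahead)


if __name__ == "__main__":
    load = {"web1": 3.0, "web2": 5.0, "batch1": 9.0}
    # batch1 normally runs hot, so it gets its own, higher limit.
    print(breaching_servers(load, {"batch1": 12.0}))

    # Free disk space (GB) sampled once a minute, shrinking steadily.
    free_gb = [(0.0, 100.0), (60.0, 90.0), (120.0, 80.0)]
    # A negative prediction means the filesystem would be full by then.
    print(extrapolate_linear(free_gb, 600.0) < 0)
```

In Prometheus itself the override would live in a separate threshold time series joined to the metric, as the Robust Perception article linked above describes; the dictionary lookup here only stands in for that join.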

