On Fri, 23 Oct 2020 at 22:41, Jimmy the Greek <[email protected]> wrote:
[...]
> As you can see I run prom QL test
> rate(coredns_nodecache_setup_errors_total{}[5m]) that evaluates to 1.666.
> Therefore when I test NodeLocalDNSSetupErrorsHigh which will trigger when
> that value is above 0 for 5 minute period the test only passes if I set
> eval_time to 6m, and fails if I set it to 5m (alert doesn't trigger).
>
> What is the relation between the for time in the alert rule itself and the
> eval_time in the test?
In this case "for: 5m" means that the alert expression has to keep
returning a result for 5 minutes before the alert fires. Because rate()
needs two samples in order to calculate a rate, your rate() expression
only starts to return a value at 1m. The alert is pending from that
point, so it starts firing when your rules are evaluated at 6m.
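To make the timing concrete, here is a rough sketch of a test file that
shows the same behaviour. It assumes a 1m interval and a counter that
grows by 100 every minute; the rule file name (alerts.yml) and the input
values are placeholders I made up, not taken from your setup, and the
labels are the ones that show up in your test output.

```yaml
# Sketch only: file name and sample values are assumptions.
rule_files:
  - alerts.yml            # hypothetical file holding NodeLocalDNSSetupErrorsHigh

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # One sample per minute, starting at 0m. rate() has only one sample
      # at 0m, so it first returns a value at 1m.
      - series: 'coredns_nodecache_setup_errors_total{errortype="configmap", pod="unit-test"}'
        values: '0+100x10'

    alert_rule_test:
      # 5m: the expression has only been true since 1m (4 minutes),
      # so the alert is still pending and nothing is firing.
      - eval_time: 5m
        alertname: NodeLocalDNSSetupErrorsHigh
        exp_alerts: []

      # 6m: the expression has been true for the full 5 minutes,
      # so the alert is firing.
      - eval_time: 6m
        alertname: NodeLocalDNSSetupErrorsHigh
        exp_alerts:
          - exp_labels:
              severity: critical
              errortype: configmap
              pod: unit-test
            # If your rule defines annotations, they would also need to be
            # listed here under exp_annotations.
```

alert_rule_test only compares firing alerts, which is why the 5m case
expects an empty list even though the alert already exists as pending.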
Aside: if you get the promtool from 2.22.0, it's now possible to look
at the ALERTS time series, including pending alerts where the "for"
threshold hasn't been reached. I wouldn't recommend that you actually
test the "for" threshold in rules in most cases (you're kind of testing
Prometheus rather than your rules then), but it is possible to
temporarily add a test for debugging, like:
    - expr: ALERTS{alertstate="pending"}
      eval_time: 5m
The failure output of that test will tell you that your alert is
pending at that point, e.g.:
    expr: "ALERTS{alertstate=\"pending\"}", time: 5m,
        exp:"nil"
        got:"{__name__=\"ALERTS\", alertname=\"NodeLocalDNSSetupErrorsHigh\", alertstate=\"pending\", errortype=\"configmap\", pod=\"unit-test\", severity=\"critical\"} 1E+00"
David