On Fri, 23 Oct 2020 at 22:41, Jimmy the Greek <[email protected]> wrote:
[...]
> As you can see, I run the PromQL test
> rate(coredns_nodecache_setup_errors_total{}[5m]), which evaluates to 1.666.
> When I test NodeLocalDNSSetupErrorsHigh, which triggers when that value
> is above 0 for a 5 minute period, the test only passes if I set
> eval_time to 6m, and fails if I set it to 5m (alert doesn't trigger).
>
> What is the relation between the for time in the alert rule itself and the 
> eval_time in the test?

In this case "for: 5m" means that the alert condition has to hold for
5 minutes, during which the alert is pending, before the alert fires.
Because rate() needs two samples in order to calculate a rate, your
rate() expression only starts to return a value at 1m. The alert
becomes pending at that point, and it starts firing when your rules
are evaluated at 6m.
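
As a sketch of that timeline, a test file along these lines should
show no firing alert at 5m and a firing one at 6m. The rule file
name, the input values and the idea that errortype and pod come from
the series are my assumptions, not taken from your setup:

  rule_files:
    - alerts.yml                # hypothetical file name
  evaluation_interval: 1m
  tests:
    - interval: 1m              # assumed: one sample per minute
      input_series:
        # assumed series and values, chosen so that rate() is above 0
        - series: 'coredns_nodecache_setup_errors_total{errortype="configmap", pod="unit-test"}'
          values: '0+100x10'
      alert_rule_test:
        # pending since 1m, so only 4m of the "for: 5m" have elapsed
        - eval_time: 5m
          alertname: NodeLocalDNSSetupErrorsHigh
          exp_alerts: []
        # 5m after the alert went pending, so it is firing
        - eval_time: 6m
          alertname: NodeLocalDNSSetupErrorsHigh
          exp_alerts:
            - exp_labels:
                severity: critical
                errortype: configmap
                pod: unit-test
              # add exp_annotations here if your rule sets annotations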

Aside: if you get the promtool from 2.22.0, it's now possible to look
at the ALERTS time series, including pending alerts where the "for"
threshold hasn't been reached. I wouldn't recommend you actually test
the "for" threshold in rules in most cases (you're kind of testing
Prometheus rather than your rules then), but it is possible to
temporarily add a test for debugging like:

  - expr: ALERTS{alertstate="pending"}
    eval_time: 5m

The failure output of that test will tell you that your alert is
pending at that point, e.g.:

   expr: "ALERTS{alertstate=\"pending\"}", time: 5m,
        exp:"nil"
        got:"{__name__=\"ALERTS\", alertname=\"NodeLocalDNSSetupErrorsHigh\", alertstate=\"pending\", errortype=\"configmap\", pod=\"unit-test\", severity=\"critical\"} 1E+00"
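
If you'd rather have that temporary check pass than read its failure
output, exp_samples can spell out the pending alert. This is only a
sketch: the labels are copied from the output above, and it assumes
the entry sits under promql_expr_test in your existing test:

  promql_expr_test:
    - expr: ALERTS{alertstate="pending"}
      eval_time: 5m
      exp_samples:
        # one sample with value 1 while the alert is still pending
        - labels: 'ALERTS{alertname="NodeLocalDNSSetupErrorsHigh", alertstate="pending", errortype="configmap", pod="unit-test", severity="critical"}'
          value: 1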

David
