On Thu, Jun 24, 2021 at 2:39 AM dc3o <[email protected]> wrote:

> Using black box exporter for monitoring internal apps. In non production
> environments I would like to set the alerting rule to skip registering the
> alert if monitored endpoint is down for more than a few days. My main
> concern is that alert rule like:
>
> probe_success{job="blackbox"} != 1 and
> avg_over_time(probe_success[3d]) *100 > 10
>
> could miss some issues in prod environments.
>
My first thought when I read this is to use inhibit rules in Alertmanager: define an alert that fires once the endpoint has been down for more than that number of days, and use it as the source of the inhibition.

The bit I'm not sure about is the concern you have. I would expect that you have labels to tell non-prod apart from prod, so you can inhibit the non-prod alerts and leave the prod alerts alone.

Does that make sense?

--
Marcelo Magallón
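
A minimal sketch of the inhibition setup described above, not a tested configuration. The alert names (EndpointDown, EndpointDownLongTerm), the env label and its "prod" value, and the 5m and 3d thresholds are all assumptions for illustration; substitute whatever labels actually distinguish your environments. The source_matchers and target_matchers fields need Alertmanager 0.22 or later.

```yaml
# --- Prometheus rule file (hypothetical alert names and label values) ---
groups:
  - name: blackbox
    rules:
      # Regular "endpoint is down" alert, evaluated for every environment.
      - alert: EndpointDown
        expr: probe_success{job="blackbox"} == 0
        for: 5m

      # Source alert: no probe has succeeded in the last 3 days.
      # Restricted to non-prod through the assumed "env" label.
      # Note: a target with less than 3 days of history can also match
      # if every sample it has so far is a failure.
      - alert: EndpointDownLongTerm
        expr: max_over_time(probe_success{job="blackbox", env!="prod"}[3d]) == 0
---
# --- alertmanager.yml fragment ---
inhibit_rules:
  # While EndpointDownLongTerm fires for a target, mute EndpointDown
  # for the same target. Prod alerts are never inhibited.
  - source_matchers:
      - 'alertname = "EndpointDownLongTerm"'
    target_matchers:
      - 'alertname = "EndpointDown"'
      - 'env != "prod"'
    equal: ['job', 'instance']
```

The source alert is itself an alert, so it would normally be routed to a receiver that sends no notifications. Because the prod series never produce the source alert and the target matcher excludes env="prod", the prod alerts are left alone.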

