On Thu, Jun 24, 2021 at 2:39 AM dc3o <[email protected]> wrote:

> I'm using the blackbox exporter to monitor internal apps. In non-production
> environments I would like the alerting rule to stop raising the alert once
> the monitored endpoint has been down for more than a few days. My main
> concern is that an alert rule like:
>
>      probe_success{job="blackbox"} != 1 and avg_over_time(probe_success[3d]) * 100 > 10
>
> could miss some issues in prod environments.
>

My first thought on reading this is to use inhibit rules in Alertmanager:
define a second alert that fires only once the endpoint has been down for
more than that number of days, and use it as the source alert of the
inhibit rule. The bit I'm not sure about is the concern you have. I would
expect you to have labels that tell non-prod apart from prod, so you can
inhibit the non-prod alerts and leave the prod alerts alone.
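
A rough sketch of what that could look like, not a tested configuration. The
alert names EndpointDown and EndpointDownLongTerm, the env label and its
"prod" value, the 5m "for" duration and the 3d window are all assumptions
for illustration; substitute whatever your setup actually uses. The
matchers syntax needs Alertmanager 0.22 or later.

```yaml
# --- Prometheus rule file (hypothetical names and labels) ---
groups:
  - name: blackbox
    rules:
      # The normal "endpoint is down" alert, for every environment.
      - alert: EndpointDown
        expr: probe_success{job="blackbox"} == 0
        for: 5m
      # Fires only when no probe has succeeded in the last 3 days.
      # Restricted to non-prod through the assumed env label.
      - alert: EndpointDownLongTerm
        expr: max_over_time(probe_success{job="blackbox", env!="prod"}[3d]) == 0
---
# --- Alertmanager configuration fragment ---
inhibit_rules:
  # While EndpointDownLongTerm fires for a target, mute EndpointDown
  # for that same target (matched on the instance and job labels).
  - source_matchers:
      - 'alertname="EndpointDownLongTerm"'
    target_matchers:
      - 'alertname="EndpointDown"'
    equal: ['instance', 'job']
```

Because the source alert's expression excludes prod, nothing is ever
inhibited there. Two caveats: the source alert goes through routing like any
other alert, so you would probably want to route it to a receiver that does
not notify anyone, and max_over_time only looks at samples that exist, so a
newly added target that has never succeeded matches right away.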

Does that make sense?

-- 
Marcelo Magallón
