First of all, thanks for your answer.
Scraping the Alertmanager is an interesting idea. However, it is possible,
though unlikely, that Prometheus can scrape it but still fail to send alerts
to it.
In the meantime, I found another approach online that should be more
reliable:
- alert: PrometheusErrorSendingAlertsToSomeAlertmanagers
  annotations:
    description: '{{ printf "%.1f" $value }}% errors while sending alerts from Prometheus {{$labels.instance}} to Alertmanager {{$labels.alertmanager}}.'
    summary: Prometheus has encountered more than 1% errors sending alerts to a specific Alertmanager.
  expr: |
    (
      rate(prometheus_notifications_errors_total{job="prometheus"}[5m])
    /
      rate(prometheus_notifications_sent_total{job="prometheus"}[5m])
    )
    * 100
    > 1 # This is a percentage.
  for: 15m
  labels:
    severity: critical
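To make the arithmetic of that expression concrete, here is a minimal Python sketch (not part of the rule itself) that evaluates the same ratio from raw counter samples taken over one window. It assumes both counters are sampled at the same timestamps and simplifies PromQL's rate(): counter resets are handled, but the extrapolation to the window edges is skipped. Because both rates are divided by the same window length, that length cancels out of the ratio. The helper names increase, error_percentage and would_fire are hypothetical.

```python
import math

THRESHOLD_PERCENT = 1.0  # same threshold as "> 1" in the rule


def increase(samples: list[float]) -> float:
    """Counter increase across the window, correcting for counter resets.

    Simplification (assumption): unlike PromQL's rate(), no extrapolation
    to the window boundaries is performed.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. Prometheus restarted);
        # in that case the post-reset value is counted as the increase.
        total += cur - prev if cur >= prev else cur
    return total


def error_percentage(errors: list[float], sent: list[float]) -> float:
    """Approximates (rate(errors[5m]) / rate(sent[5m])) * 100 for one window."""
    err = increase(errors)
    snt = increase(sent)
    if snt == 0:
        # Mimic float division in PromQL: 0/0 is NaN, x/0 is +Inf.
        return math.nan if err == 0 else math.inf
    return err / snt * 100


def would_fire(errors: list[float], sent: list[float]) -> bool:
    # NaN > 1 is False, so a window with no notifications does not match.
    return error_percentage(errors, sent) > THRESHOLD_PERCENT


# 9 errors out of 300 notifications in the window -> 3.0 %, above the 1 % threshold
print(error_percentage([0, 3, 6, 9], [0, 100, 200, 300]))
print(would_fire([0, 3, 6, 9], [0, 100, 200, 300]))
```

This only checks a single evaluation; the `for: 15m` clause additionally requires the condition to stay true for 15 minutes before the alert actually fires. Note that exactly 1 % (e.g. 3 errors out of 300) does not match, because the comparison is a strict `> 1`.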