On Saturday, 20 November 2021 at 21:17:05 UTC [email protected] wrote:
> But it is not working properly. Metric
> prometheus_notifications_alertmanagers_discovered starts at 0, and then it
> goes to 1 as expected.
>
> However, when I stop the service, it does not revert to 0:
>
It's unclear to me what that particular metric measures. It could just
be tracking the service discovery of alertmanagers. Given that your
prometheus.yml contains:
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
then the service discovery ("targets") always returns one
alertmanager, regardless of whether that alertmanager is actually reachable.
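If you want to see that for yourself, this query in the Prometheus expression
browser should keep returning 1 for as long as the static target is configured,
even with nothing listening on port 9093 (that's my reading of the metric from
its name; I haven't dug into the source):

  prometheus_notifications_alertmanagers_discovered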
> Or is there a better way to check whether the connection between
> Prometheus and Alertmanager is healthy?
>
I suggest you scrape the alertmanager itself, by adding a new scrape job:
  - job_name: alertmanager
    static_configs:
      - targets: ['localhost:9093']
Then you can check the up{job="alertmanager"} metric to tell if
alertmanager is up or down. In addition, you'll collect extra
alertmanager-specific metrics, such as the number of alerts which have been
sent out over different channels. Use "curl localhost:9093/metrics" to see
them.
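If you then want Prometheus to notify you when that target goes down, a minimal
alerting rule on that metric might look something like the following (the group
name, the 5m duration and the labels are just placeholders; adjust them to
whatever conventions you already use):

groups:
  - name: alertmanager-health        # hypothetical group name
    rules:
      - alert: AlertmanagerDown
        expr: up{job="alertmanager"} == 0
        for: 5m                      # tune to taste
        labels:
          severity: critical
        annotations:
          summary: 'Alertmanager target {{ $labels.instance }} has been down for 5 minutes'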
Of course, if alertmanager is down, it's hard to get alerted on this
condition :-)