On Saturday, 20 November 2021 at 21:17:05 UTC [email protected] wrote:
> But it is not working properly. Metric
> prometheus_notifications_alertmanagers_discovered starts at 0, and then it
> goes to 1 as expected.
>
> However, when I stop the service, it does not revert to 0:
>
It's unclear to me what that particular metric measures. It could just
be tracking the service discovery of alertmanagers. Given that your
prometheus.yml contains:
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
then the service discovery ("targets") always returns one
alertmanager, regardless of whether that alertmanager is actually reachable.
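If you want to see that for yourself, this query in the Prometheus expression
browser should keep returning 1 for as long as the static target is configured,
even with nothing listening on port 9093 (that's my reading of the metric from
its name; I haven't dug into the source):

  prometheus_notifications_alertmanagers_discovered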
> Or is there a better way to check whether the connection between
> Prometheus and Alertmanager is healthy?
>
I suggest you scrape the alertmanager itself, by adding a new scrape job:
  - job_name: alertmanager
    static_configs:
      - targets: ['localhost:9093']
Then you can check the up{job="alertmanager"} metric to tell if
alertmanager is up or down. In addition, you'll collect extra
alertmanager-specific metrics, such as the number of alerts which have been
sent out over different channels. Use "curl localhost:9093/metrics" to see
them.
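If you then want Prometheus to notify you when that target goes down, a minimal
alerting rule on that metric might look something like the following (the group
name, the 5m duration and the labels are just placeholders; adjust them to
whatever conventions you already use):

groups:
  - name: alertmanager-health        # hypothetical group name
    rules:
      - alert: AlertmanagerDown
        expr: up{job="alertmanager"} == 0
        for: 5m                      # tune to taste
        labels:
          severity: critical
        annotations:
          summary: 'Alertmanager target {{ $labels.instance }} has been down for 5 minutes'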
Of course, if alertmanager is down, it's hard to get alerted on this
condition :-)