Re: [prometheus-users] Re: sometimes I just received a resolved email but not firing email

Brian Candler Sun, 16 Feb 2020 03:30:48 -0800

On 16/02/2020 10:09, bryan wrote:

Yes, I'm running an alertmanager cluster, and I have turned on prometheus "debug" level logging, but nothing could be found. For details:


Have you set --log.level=debug on the alertmanager processes as well?

I see the following in my (non-clustered) test environment:

Feb 16 11:04:41 prometheus alertmanager[1772]: level=debug ts=2020-02-16T11:04:41.923Z caller=dispatch.go:135 component=dispatcher msg="Received alert" alert=UpDown[0f48c03][active]
Feb 16 11:05:56 prometheus alertmanager[1772]: level=debug ts=2020-02-16T11:05:56.922Z caller=dispatch.go:135 component=dispatcher msg="Received alert" alert=UpDown[0f48c03][active]
Feb 16 11:06:26 prometheus alertmanager[1772]: level=debug ts=2020-02-16T11:06:26.952Z caller=dispatch.go:465 component=dispatcher aggrGroup="{}:{alertname=\"UpDown\"}" msg=flushing alerts=[UpDown[0f48c03][active]]
Feb 16 11:07:11 prometheus alertmanager[1772]: level=debug ts=2020-02-16T11:07:11.924Z caller=dispatch.go:135 component=dispatcher msg="Received alert" alert=UpDown[0f48c03][active]

This shows the alerts being received from prometheus. However, I don't see any debug logs for the SMTP exchanges when it's sending out mail.


When I resolve the problem, alertmanager logs show:

Feb 16 11:22:11 prometheus alertmanager[1772]: level=debug ts=2020-02-16T11:22:11.922Z caller=dispatch.go:135 component=dispatcher msg="Received alert" alert=UpDown[0f48c03][resolved]
Feb 16 11:23:26 prometheus alertmanager[1772]: level=debug ts=2020-02-16T11:23:26.921Z caller=dispatch.go:135 component=dispatcher msg="Received alert" alert=UpDown[0f48c03][resolved]


So I was wrong: prometheus *does* actively notify resolved alerts.
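If you want to compare how often each alert arrived as active versus resolved on a given node, something like the following might help. It is only a rough sketch: it assumes the debug log lines look like the ones quoted above (msg="Received alert" followed by alert=Name[fingerprint][state]), and the function name tally_received is my own invention, not anything from alertmanager.

```python
import re
from collections import Counter

# Assumption: matches the dispatcher debug lines shown in this message, e.g.
#   msg="Received alert" alert=UpDown[0f48c03][active]
# Other alertmanager versions may log a different format.
ALERT_RE = re.compile(
    r'msg="Received alert" alert=([^\[\s]+)\[([0-9a-f]+)\]\[(active|resolved)\]'
)


def tally_received(log_text):
    """Count 'Received alert' entries per (alert name, fingerprint, state)."""
    counts = Counter()
    for name, fingerprint, state in ALERT_RE.findall(log_text):
        counts[(name, fingerprint, state)] += 1
    return counts


def summarize(counts):
    """Return one readable line per (name, fingerprint, state), sorted."""
    return [
        f"{name}[{fp}] {state}: {n}"
        for (name, fp, state), n in sorted(counts.items())
    ]
```

Feed it the text of each node's log (for example, a saved journal excerpt) and print the result of summarize(). An alert that only ever shows up as resolved on one node would stand out.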

If the SMTP server was down, I didn't get any error logged. But after restarting the SMTP server, the message was delivered - so it appears that alertmanager does its own queueing and retrying.

One thing that might be useful to you is the alertmanager metrics for failed notifications:


$ curl -s localhost:9093/metrics | grep notifications_failed

# HELP alertmanager_notifications_failed_total The total number of failed notifications.

# TYPE alertmanager_notifications_failed_total counter
alertmanager_notifications_failed_total{integration="email"} 0
alertmanager_notifications_failed_total{integration="hipchat"} 0
alertmanager_notifications_failed_total{integration="opsgenie"} 0
alertmanager_notifications_failed_total{integration="pagerduty"} 0
alertmanager_notifications_failed_total{integration="pushover"} 0
alertmanager_notifications_failed_total{integration="slack"} 0
alertmanager_notifications_failed_total{integration="victorops"} 0
alertmanager_notifications_failed_total{integration="webhook"} 0
alertmanager_notifications_failed_total{integration="wechat"} 0

You could try this on all your alertmanager nodes, and see if a particular one has problems with E-mail.
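To save running curl by hand on every node, the same check could be scripted roughly like this. Treat it as a sketch under assumptions: the node URLs are placeholders for your own cluster members, the helper names (parse_failed, fetch_failed, report) are made up for illustration, and the parser only expects the metric to carry an integration label as in the output above (any extra labels are summed per integration).

```python
import re
import urllib.request

# Matches sample lines of the counter shown above, with any label set.
SAMPLE_RE = re.compile(
    r'^alertmanager_notifications_failed_total\{([^}]*)\}\s+(\S+)$'
)
INTEGRATION_RE = re.compile(r'integration="([^"]+)"')


def parse_failed(metrics_text):
    """Return {integration: failed count} parsed from /metrics text."""
    failed = {}
    for line in metrics_text.splitlines():
        sample = SAMPLE_RE.match(line.strip())
        if not sample:
            continue  # skips # HELP / # TYPE lines and other metrics
        label = INTEGRATION_RE.search(sample.group(1))
        if label:
            name = label.group(1)
            failed[name] = failed.get(name, 0.0) + float(sample.group(2))
    return failed


def fetch_failed(base_url, timeout=5.0):
    """Fetch <base_url>/metrics and parse the failed-notification counters."""
    url = base_url.rstrip("/") + "/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_failed(resp.read().decode("utf-8"))


def report(nodes):
    """Print the integrations with a non-zero failure count for each node."""
    for node in nodes:
        try:
            failed = fetch_failed(node)
        except OSError as exc:  # URLError and socket timeouts are OSErrors
            print(f"{node}: could not fetch metrics ({exc})")
            continue
        nonzero = {k: v for k, v in failed.items() if v > 0}
        print(f"{node}: {nonzero or 'no failed notifications'}")
```

You would call it with your own node list, e.g. report(["http://am1.example:9093", "http://am2.example:9093"]), where both hostnames are hypothetical. Since these are counters, they only reset when the process restarts, so a non-zero value tells you a failure happened at some point, not necessarily recently.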


