On 10.03.21 00:31, dc3o wrote:
> A few times we had to bring our database clusters down for maintenance.
> Before that, we create a silence for a limited period of time. The silence
> properly catches all the alerts. The problem is that once the DB host is
> down, Prometheus no longer scrapes its metrics and marks the initial alert
> as resolved. No metrics, no problem. It looks like send_resolved skips the
> silencing pipeline, and we're getting alert fatigue from resolved events.

Yeah, in my understanding, sending resolved notifications currently has
semantics independent of silencing. That is IMHO confusing, because a
silenced alert is not re-sent to the receiver every repeat_interval as
configured. (Some receivers are configured to consider an alert resolved
after a while if they receive no updates.)
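To illustrate the receiver behaviour mentioned in the parenthesis, here is a minimal Python sketch of a hypothetical receiver (not modelled on any particular product) that expires alerts on its own. The class name, the `timeout` parameter and the alert fingerprints are all made up for illustration. The assumption is that Alertmanager re-sends a still-firing alert every repeat_interval, so such a receiver keeps the alert open as long as its timeout exceeds repeat_interval. Once the alert is silenced and the re-sends stop, the receiver already treats it as resolved, which makes a later resolved notification redundant.

```python
from dataclasses import dataclass, field


@dataclass
class TimeoutReceiver:
    """Hypothetical receiver: an alert counts as resolved once no firing
    notification for it has arrived within `timeout` seconds."""

    timeout: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def notify(self, fingerprint: str, now: float) -> None:
        # Every (re-)sent firing notification refreshes the alert's timestamp.
        self.last_seen[fingerprint] = now

    def is_firing(self, fingerprint: str, now: float) -> bool:
        # Alerts never seen, or not refreshed within the timeout, count as resolved.
        seen = self.last_seen.get(fingerprint)
        return seen is not None and now - seen <= self.timeout
```

For example, with `r = TimeoutReceiver(timeout=3 * 3600)` and `r.notify("db-down", now=0)`, `r.is_firing("db-down", now=3600)` is `True`, but once re-sends stop (e.g. because the alert is silenced), `r.is_firing("db-down", now=4 * 3600)` is `False`.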

See the old issue
https://github.com/prometheus/alertmanager/issues/226 for some
considerations on when Alertmanager should send resolved notifications
and when it should not. I expect some movement on this front in the near
future. Reporting your use case and your expectations there might be
helpful.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] [email protected]

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/20210318161132.GF2773%40jahnn.
