On 10.03.21 00:31, dc3o wrote:
> Few times we had to bring our database clusters down due to maintenance.
> Prior to this we create a silence for a limited period of time. The silence
> is properly catching all the alerts. Problem is that once the db host is
> down, Prometheus is no longer scraping metrics and marks the initial alert
> as resolved. No metrics no problem. Looks like send resolved is skipping
> silencing pipeline and we're getting alert fatigue of resolved events.
Yeah, in my understanding, sending resolved notifications right now has semantics independent of silencing. Which is IMHO confusing, because a silenced alert is not repeatedly sent to the receiver as configured with repeat_interval. (Some receivers are configured to consider an alert resolved after a while if they do not receive any updates.)

See the old issue https://github.com/prometheus/alertmanager/issues/226 for some considerations on when Alertmanager should send resolved notifications and when not. I expect some movement on this front in the near future. Reporting your use case and your expectations there might be helpful.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
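
For reference, the time-limited silence described in the quoted message could be built roughly as below. This is only a sketch: it assumes the silence is created through Alertmanager's v2 HTTP API (POST /api/v2/silences), and the helper name `build_silence`, the `cluster` label, and the values are hypothetical, not taken from the original setup. It only constructs the request body and does not change how resolved notifications are handled.

```python
import json
from datetime import datetime, timedelta, timezone


def build_silence(matchers, duration_hours, created_by, comment, now=None):
    """Build a JSON-serialisable body for POST /api/v2/silences.

    `matchers` maps label names to exact values (hypothetical labels here).
    `now` can be injected to make the result deterministic.
    """
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(hours=duration_hours)
    return {
        "matchers": [
            {"name": name, "value": value, "isRegex": False}
            for name, value in matchers.items()
        ],
        # Timezone-aware datetimes serialise to RFC 3339 compatible strings.
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


if __name__ == "__main__":
    # Assumed example: silence everything labelled cluster="db-main" for 2 hours.
    body = build_silence({"cluster": "db-main"}, 2, "dc3o", "planned DB maintenance")
    print(json.dumps(body, indent=2))
```

The printed JSON would then be sent to the Alertmanager instance with any HTTP client; the silence ends at `endsAt`.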

