[prometheus-users] Alerts resolved upon prometheus crash

Julien Pivotto Wed, 04 Mar 2020 03:39:22 -0800

Hello there,

We are running a pair of HA prometheis and HA alertmanagers.


One Prometheus server OOM'd and restarted. While it was down, we
received alert resolution notifications from the Alertmanager:

> resolved (duration: 115h45m0s)

But a few seconds later:

> firing (duration: 115h52m16s)

I would have expected the second Prometheus, which had the alert firing
the whole time and was working as expected, to have prevented the alert
from being resolved.

Note that the alert does NOT have a `for` clause.

There is an entry at 9:44:39, then the server goes down, and the alert is
firing again at 9:53. Note: we received the new "firing" notification at
9:52, with a reported duration of 115h52m16s.

Both Prometheus servers send alerts to both Alertmanagers.


What could have happened here?

Our evaluation_interval is 1m, and resend-delay is default.

-- 
 (o-    Julien Pivotto
 //\    Open-Source Consultant
 V_/_   Inuits - https://www.inuits.eu

