On 04 Mar 12:38, Julien Pivotto wrote:
> Hello there,
> 
> We are running a pair of HA prometheis and HA alertmanagers.
> 
> One Prometheus server OOM'd and restarted. While it was down, we
> received alert resolution notifications from the Alertmanager:
> 
> > resolved (duration: 115h45m0s)
> 
> But a few seconds after:
> 
> > firing (duration: 115h52m16s)
> 
> I would have expected the second Prometheus, which had the alert
> the whole time and was working as expected, to prevent the alert
> from disappearing.
> 
> Note that the alert does NOT have a `for` clause.
> 
> There is an entry at 9:44:39, then the server drops, and the alert is
> firing again at 9:53. Note: we received the new "firing" notification
> at 9:52, and it included a duration of 115h52m16s.
> 
> Both Prometheus servers send alerts to both Alertmanagers.
> 
> 
> What could have happened here?
> 
> Our evaluation_interval is 1m, and resend-delay is default.
> 
> -- 
>  (o-    Julien Pivotto
>  //\    Open-Source Consultant
>  V_/_   Inuits - https://www.inuits.eu
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/prometheus-users/20200304113821.GA19241%40oxygen.

Note: both Alertmanagers are 0.20.0 and both Prometheus servers are
2.16.0, all pulled from the GitHub releases.
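
For reference, a minimal sketch of the setup described above. The
hostnames are hypothetical placeholders, and the rest of the real
configuration (scrape jobs, rule files, Alertmanager clustering) is
omitted. The same file is assumed to be used on both Prometheus servers:

```yaml
# prometheus.yml (sketch; assumed identical on both Prometheus servers)
global:
  evaluation_interval: 1m        # as stated in the message

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Hypothetical hostnames: each Prometheus lists BOTH Alertmanagers
            - alertmanager-1.example.org:9093
            - alertmanager-2.example.org:9093
```

The resend delay is not part of this file: it is set with the
`--rules.alert.resend-delay` command-line flag, which defaults to 1m
and is left unset here.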


-- 
 (o-    Julien Pivotto
 //\    Open-Source Consultant
 V_/_   Inuits - https://www.inuits.eu

