On 04 Mar 12:39, Julien Pivotto wrote:
> On 04 Mar 12:38, Julien Pivotto wrote:
> > Hello there,
> > 
> > We are running an HA pair of Prometheus servers and an HA pair of Alertmanagers.
> > 
> > One Prometheus server OOM'd and restarted. While it was down, we
> > received alert resolution notifications from the Alertmanager:
> > 
> > > resolved (duration: 115h45m0s)
> > 
> > But a few seconds after:
> > 
> > > firing (duration: 115h52m16s)
> > 
> > I would have expected the second Prometheus, which had the alert
> > the whole time and was working as expected, to prevent the alert
> > from being resolved.
> > 
> > Note that the alert does NOT have a `for` clause.
> > 
> > There is an entry at 9:44:39, then the server drops, and the alert is
> > firing again at 9:53. Note: we received the new "firing" notification at
> > 9:52, with a reported duration of 115h52m16s.
> > 
> > Both Prometheus servers send alerts to both Alertmanagers.
> > 
> > 
> > What could have happened here?
> > 
> > Our evaluation_interval is 1m, and resend-delay is default.
> > 
> > -- 
> >  (o-    Julien Pivotto
> >  //\    Open-Source Consultant
> >  V_/_   Inuits - https://www.inuits.eu
> > 
> > -- 
> > You received this message because you are subscribed to the Google Groups 
> > "Prometheus Users" group.
> > To unsubscribe from this group and stop receiving emails from it, send an 
> > email to [email protected].
> > To view this discussion on the web visit 
> > https://groups.google.com/d/msgid/prometheus-users/20200304113821.GA19241%40oxygen.
> 
> Note: both Alertmanagers are 0.20.0 and both Prometheus servers are
> 2.16.0, all pulled from the GitHub releases.


When I look at the metrics, rate(alertmanager_alerts_received_total[5m])
shows a lot of 'resolved' alerts at that time. Is it possible that
Prometheus somehow sends resolved alerts when the TSDB is not yet ready?
And that, because those rules had been firing for a long time, we tried
to restore them?
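
As a quick sanity check on the two notifications quoted above, here is a
minimal sketch (not anything taken from Prometheus or Alertmanager) that
subtracts the reported durations. The helper name parse_duration is
hypothetical, and it only handles the simple "XhYmZs" form seen in these
notifications.

```python
import re
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse a duration such as '115h45m0s' (hours, minutes, whole seconds only)."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if match is None or not text:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Durations copied from the two notifications in this thread.
resolved = parse_duration("115h45m0s")
firing = parse_duration("115h52m16s")

# Difference between the "firing" and the "resolved" durations.
gap = firing - resolved
print(gap)  # 0:07:16
```

The 7m16s gap is roughly the window between the last entry at 9:44:39 and
the new "firing" notification at 9:52. That would be consistent with the
alert keeping its original start time rather than starting over as a new
alert, although it does not prove it.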

regards,


-- 
 (o-    Julien Pivotto
 //\    Open-Source Consultant
 V_/_   Inuits - https://www.inuits.eu

