Re: [prometheus-users] Alerts resolved upon prometheus crash

Julien Pivotto Thu, 05 Mar 2020 01:35:22 -0800

On 05 Mar 01:17, Daniel Swarbrick wrote:
> By default, Alertmanager will consider alerts resolved if 5 minutes or more 
> elapse without the alert firing (resolve_timeout config option).
> 
> If your Prometheus instance crashes and takes more than 5 minutes to 
> restart, it's highly likely that any previously firing alerts will be 
> "resolved". If the alerting rule conditions still exist after the restart, 
> new alerts will be fired.


Except that another prometheus server was still sending the alerts, so
that is not likely the explanation :(

But the server was in pretty bad shape, so maybe the alertmanager on
the same host was foobar too during that time.
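
To make the timing argument concrete, here is a rough sketch of when the
alert should have expired. It rests on assumptions that are not confirmed
anywhere in this thread: that each Prometheus attaches an end time of
multiplier * max(evaluation_interval, resend_delay) after its last send
(multiplier assumed to be 4, resend delay assumed to default to 1m), and that
Alertmanager keeps the alert active until the latest end time received from
any sender. The names prom-a, prom-b, ends_at and resolved_at are made up for
illustration; the timestamps are the ones quoted below.

```python
from datetime import datetime, timedelta


def ends_at(last_sent, evaluation_interval, resend_delay, multiplier=4):
    """End time a sender attaches to a still-firing alert (assumed rule)."""
    return last_sent + multiplier * max(evaluation_interval, resend_delay)


def resolved_at(last_sent_by_sender, evaluation_interval, resend_delay):
    """Simplified model: the alert stays active until the latest end time
    received from any sender has passed."""
    return max(
        ends_at(t, evaluation_interval, resend_delay)
        for t in last_sent_by_sender.values()
    )


minute = timedelta(minutes=1)
crash = datetime(2020, 3, 4, 9, 44, 39)  # last entry before the server dropped

# Case 1: only the crashed server (hypothetical name prom-a) was sending.
alone = resolved_at({"prom-a": crash}, minute, minute)

# Case 2: a healthy peer (hypothetical name prom-b) was still sending at 09:52.
with_peer = resolved_at(
    {"prom-a": crash, "prom-b": datetime(2020, 3, 4, 9, 52, 0)},
    minute,
    minute,
)

print("expires without peer:", alone.time())
print("expires with peer:   ", with_peer.time())
```

Under these assumptions the alert would only lapse a few minutes after the
crash if nothing else refreshed it, which is why a healthy second sender
should have kept it active.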

> On Wednesday, March 4, 2020 at 12:45:11 PM UTC+1, Julien Pivotto wrote:
> >
> > On 04 Mar 12:39, Julien Pivotto wrote: 
> > > On 04 Mar 12:38, Julien Pivotto wrote: 
> > > > Hello there, 
> > > > 
> > > > We are running a pair of HA prometheis and HA alertmanagers. 
> > > > 
> > > > One prometheus server OOM'd and restarted. While it was down, we 
> > > > received alert resolution notifications from the alertmanager: 
> > > > 
> > > > > resolved (duration: 115h45m0s) 
> > > > 
> > > > But a few seconds after: 
> > > > 
> > > > > firing (duration: 115h52m16s) 
> > > > 
> > > > I would have expected that the second prometheus, which had the alert 
> > > > all the time and was working as expected, would have prevented the 
> > > > alert from disappearing. 
> > > > 
> > > > Note that the alert does NOT have a `for` clause. 
> > > > 
> > > > There is an entry at 9:44:39, then the server drops, and the alert is 
> > > > firing again at 9:53. Note: We received the new "firing" at 9:52, with 
> > > > a duration of 115h52m16s included. 
> > > > 
> > > > Both Prometheus servers send alerts to both alertmanagers. 
> > > > 
> > > > 
> > > > What could have happened here? 
> > > > 
> > > > Our evaluation_interval is 1m, and resend-delay is default. 
> > > > 
> > > > -- 
> > > >  (o-    Julien Pivotto 
> > > >  //\    Open-Source Consultant 
> > > >  V_/_   Inuits - https://www.inuits.eu 
> > > > 
> > > > -- 
> > > > You received this message because you are subscribed to the Google 
> > Groups "Prometheus Users" group. 
> > > > To unsubscribe from this group and stop receiving emails from it, send 
> > an email to [email protected]. 
> > > > To view this discussion on the web visit 
> > https://groups.google.com/d/msgid/prometheus-users/20200304113821.GA19241%40oxygen.
> >  
> >
> > > 
> > > Note: alertmanagers are 0.20.0 pulled from GH releases and both 
> > > prometheus are 2.16.0 pulled from GH releases too. 
> >
> >
> > When I look at the metrics, it looks like 
> > rate(alertmanager_alerts_received_total[5m]) is showing a lot of 
> > 'resolved' at that time. Is it possible that Prometheus somehow sends 
> > resolved alerts when TSDB is not yet ready? And because those rules were 
> > running for a long time, we tried to restore them? 
> >
> > regards, 
> >
> >
> > -- 
> >  (o-    Julien Pivotto 
> >  //\    Open-Source Consultant 
> >  V_/_   Inuits - https://www.inuits.eu 
> >
> 


-- 
 (o-    Julien Pivotto
 //\    Open-Source Consultant
 V_/_   Inuits - https://www.inuits.eu

