On 04 Mar 12:38, Julien Pivotto wrote:
> Hello there,
>
> We are running a pair of HA prometheis and HA alertmanagers.
>
> One prometheus server OOM'd and restarted. While it was down, we
> received alert resolution notifications from the alertmanager:
>
> > resolved (duration: 115h45m0s)
>
> But a few seconds later:
>
> > firing (duration: 115h52m16s)
>
> I would have expected the second prometheus, which had the alert
> the whole time and was working as expected, to prevent the alert
> from disappearing.
>
> Note that the alert does NOT have a `for` clause.
>
> There is an entry at 9:44:39, then the server drops, and the alert is
> firing again at 9:53. Note: we received the new "firing" notification
> at 9:52, with a reported duration of 115h52m16s.
>
> Both Prometheus servers send alerts to both alertmanagers.
>
> What can have happened here?
>
> Our evaluation_interval is 1m, and resend-delay is default.
>
> --
>  (o-    Julien Pivotto
>  //\    Open-Source Consultant
>  V_/_   Inuits - https://www.inuits.eu
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/20200304113821.GA19241%40oxygen.
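
For reference, the two durations quoted above differ by 7m16s. A minimal Python sketch of that arithmetic follows; `parse_go_duration` is a hypothetical helper written for this message (it is not part of Prometheus or Alertmanager) and only handles the `h`/`m`/`s` form seen in the notifications. Whether both durations are measured from the same start time is an assumption here, not something the notifications confirm.

```python
import re
from datetime import timedelta

# Map the unit letters used in the notifications to timedelta keyword names.
_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_go_duration(text: str) -> timedelta:
    """Parse a duration such as '115h52m16s' (whole hours/minutes/seconds only).

    Hypothetical helper for illustration; it rejects anything it does not
    fully understand instead of guessing.
    """
    parts = re.findall(r"(\d+)([hms])", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    total = timedelta()
    for number, unit in parts:
        total += timedelta(**{_UNITS[unit]: int(number)})
    return total


resolved = parse_go_duration("115h45m0s")   # from the "resolved" notification
firing = parse_go_duration("115h52m16s")    # from the later "firing" notification
gap = firing - resolved
print(gap)  # 0:07:16
```

If the assumption about a shared start time holds, the gap is roughly the window between the last entry at 9:44:39 and the new "firing" notification at 9:52.
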
Note: the alertmanagers are 0.20.0 and both prometheus servers are 2.16.0, all pulled from the GitHub releases.

--
 (o-    Julien Pivotto
 //\    Open-Source Consultant
 V_/_   Inuits - https://www.inuits.eu

