That's an interesting problem. When one server in the HA pair can take several minutes longer than the other to resolve an alert (because the two don't run service discovery at the same time), I'm not sure what you can do besides sending those alerts down a route with a long-ish "group_interval", i.e. longer than those couple of minutes, which is common anyway. That way you at least shouldn't get a resolved notification followed by a new firing notification, since resolved notifications also obey the "group_interval". Or do you already have a long enough "group_interval" and still see multiple firing/resolved notifications actually sent out by Alertmanager?
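As a rough sketch of the kind of route I mean (the alert name, receiver name, and the 10m value are made up for illustration; pick an interval longer than the worst-case gap between the two servers' SD refreshes):

```yaml
route:
  receiver: default-receiver        # hypothetical receiver name
  group_by: ['alertname']
  routes:
    # Child route for the SD-sensitive alert only.
    # Older Alertmanager releases use "match:" instead of "matchers:".
    - matchers:
        - 'alertname = "MyCoolPodAbsent"'   # hypothetical alert name
      # Assumed value: longer than the couple of minutes by which
      # the two Prometheus servers can disagree.
      group_interval: 10m

receivers:
  - name: default-receiver
```

Other alerts keep whatever timing the parent route defines; only the alerts matched by the child route get the longer interval.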
On Thu, Jul 30, 2020 at 12:43 PM Giedrius Statkevičius <[email protected]> wrote:

> Hello all,
>
> Let's say we have >=2 Prometheus nodes that are scraping the same k8s
> metrics. k8s SD happens every 5 minutes. Then, imagine an alerting rule
> expression such as:
>
> absent({pod="my-cool-pod"}) == 1
>
> Then, what happens in practice is that you will see the alert quickly
> becoming firing -> resolved -> firing -> resolved because AFAICT one
> Prometheus node will send an alert towards AlertManager with the state
> "resolved" and then after some seconds the 2nd will still send an alert
> with the state "firing" because the metric is still not there. Then, it
> sends an alert with the state "resolved" and only then it finally becomes
> actually resolved. Seems like the magic happens here:
> https://github.com/prometheus/prometheus/blob/master/rules/alerting.go#L103-L106.
> I would imagine that in such a scenario we should depend on AlertManager
> resolving the alert automatically for us after some time to get a
> "consistent" state.
>
> Any thoughts on this or perhaps I am missing something?
>
> BR,
> Giedrius
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/b2de936f-a79e-40c0-80be-0452e52980a8o%40googlegroups.com.

--
Julius Volz
PromLabs - promlabs.com
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/CAObpH5z9NybKoyv9c%2BZhxe0CD3uS8j-HM8VUt%2BcKDNn2yV7UXA%40mail.gmail.com.

