Hello all,

Let's say we have two or more Prometheus nodes scraping the same k8s 
metrics, and k8s service discovery runs every 5 minutes. Now imagine an 
alerting rule with an expression such as:

absent({pod="my-cool-pod"}) == 1
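For concreteness, such a rule might look roughly like this in a rules file 
(the alert name, "for" duration, and labels are my own placeholders, not 
from the original setup):

```yaml
groups:
  - name: example
    rules:
      - alert: MyCoolPodAbsent                  # hypothetical alert name
        expr: absent({pod="my-cool-pod"}) == 1
        for: 5m                                 # must hold this long before firing
        labels:
          severity: warning
```

Note that "for" is evaluated independently by each Prometheus replica, so 
by itself it does not fix the cross-replica disagreement described below.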

Then, what happens in practice is that the alert quickly flaps: 
firing -> resolved -> firing -> resolved. AFAICT one Prometheus node 
sends an alert to AlertManager with the state "resolved", and a few 
seconds later the second node still sends an alert with the state 
"firing" because, from its point of view, the metric is still missing. 
Only once that node also sends "resolved" does the alert actually settle. 
Seems like the magic happens here: 
https://github.com/prometheus/prometheus/blob/master/rules/alerting.go#L103-L106.
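To make the interleaving concrete, here is a tiny sketch (hypothetical 
timestamps, not AlertManager's actual code) of how the merged notification 
stream from two replicas produces the flapping a receiver observes:

```python
# Two Prometheus replicas evaluate the same absent() rule, but their SD
# refreshes and scrapes are not synchronized, so replica A sees the pod
# come back before replica B does. Events are (timestamp_seconds, replica,
# state) -- a hypothetical timeline, not real AlertManager internals.
notifications = [
    (0, "A", "firing"),
    (5, "B", "firing"),
    (60, "A", "resolved"),   # A's scrape saw the pod again
    (65, "B", "firing"),     # B still hasn't, so it re-fires the alert
    (120, "B", "resolved"),  # B finally catches up
]

def observed_states(events):
    """Collapse the merged stream into the state transitions a receiver sees."""
    states = []
    for _, _, state in sorted(events):
        if not states or states[-1] != state:
            states.append(state)
    return states

print(observed_states(notifications))
# -> ['firing', 'resolved', 'firing', 'resolved']
```

The same alert identity (fingerprint) is updated by whichever replica 
reported last, hence the firing/resolved ping-pong until both replicas agree.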
 
I would imagine that in such a scenario we have to depend on AlertManager 
resolving the alert automatically for us after some time (i.e. its 
resolve_timeout) to get a "consistent" state.
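If one were to go that route, the relevant knobs (as I understand them) 
would be something like the following; the receiver name and URL are 
placeholders:

```yaml
global:
  resolve_timeout: 5m        # how long before an alert with no explicit end time is considered resolved
route:
  receiver: default
  group_interval: 5m         # larger values also absorb short flaps within a group
receivers:
  - name: default
    webhook_configs:
      - url: http://example.com/hook   # hypothetical receiver
        send_resolved: false           # ignore per-replica "resolved" notifications entirely
```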

Any thoughts on this or perhaps I am missing something?

BR,
Giedrius

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/b2de936f-a79e-40c0-80be-0452e52980a8o%40googlegroups.com.
