I can see two possible issues here.

Firstly, the value of the annotation you see in the resolved message is 
always the value at the time *before* the alert resolved, not the value 
which is now below the threshold.

Let me simplify your expression to:

    foo > 85

This is a PromQL filter.  In general there could be many timeseries for 
metric "foo".  If you have ten timeseries, and two of them have values over 
85, then the result of this expression is those two timeseries, with their 
labels and those two values above 85.  But if all the timeseries are below 
85, then this expression returns no timeseries, and therefore it has no 
values.
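
To illustrate with made-up labels: if two of your ten series are above 
the threshold, the instant query

    foo > 85

returns something like

    foo{instance="a"}  90
    foo{instance="b"}  95

and once every series drops back under 85, it returns nothing at all.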

So: suppose one "foo" timeseries goes up to 90 for long enough to trigger 
the alert (for: 2m).  You will get an alert with annotation:

description: Current value = 90
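
(For concreteness, I'm assuming a rule roughly like the following -- the 
names are invented, adjust to your actual config:

    groups:
    - name: example
      rules:
      - alert: FooHigh
        expr: foo > 85
        for: 2m
        annotations:
          description: 'Current value = {{ $value }}'

i.e. the annotation interpolates $value at the time the rule fires.)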

Maybe then it goes up to 95 for a while.  You don't get a new notification 
except in certain circumstances (group_interval etc).

When the value of foo drops below the threshold, say to 70, then the alert 
ceases to exist.  Alertmanager sends out a "resolved" message with all the 
labels and annotations of the alert as it was *when it last existed*, i.e.

description: Current value = 95

There's nothing else it can do.  The "expr" in the alerting rule returns no 
timeseries, which means no values and no labels.  You can't create an 
annotation for an alert that doesn't exist.

It's for this reason that I removed all my alert annotations which had 
$value in them, since the Resolved messages are confusing.  However you 
could instead change them to something more verbose, e.g.

description: Most recent triggering value = 95
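
In the rule that would be something like (sketch, assuming the usual 
$value templating):

    annotations:
      description: 'Most recent triggering value = {{ $value }}'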

The second issue: is it possible the value dipped below the threshold 
for a single rule evaluation interval?

Prometheus does debouncing in one direction (the alert must be constantly 
active "for: 2m" before it goes from Pending into Firing), but not in the 
other direction. A single dip below the threshold and it will resolve 
immediately, and then it could go into Pending then Firing again.  You 
would see that as a resolved followed by a new alert.

There is a closed issue for alertmanager debouncing / flap detection here:
https://github.com/prometheus/alertmanager/issues/204

Personally I think prometheus itself should have a "Resolving" state 
analogous to "Pending", so a brief trip below the threshold doesn't 
instantly resolve - but like I say, that issue is closed.
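
In the meantime, one workaround (just a sketch, not something from this 
thread) is to smooth the expression so a brief dip doesn't resolve the 
alert, e.g.

    expr: max_over_time(foo[5m]) > 85

This keeps the alert firing as long as foo exceeded 85 at any point in 
the last 5 minutes.  The trade-off is that resolution is also delayed by 
up to the window length, so pick the range to match how long a dip you 
want to ride out.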

HTH,

Brian.
