I know this cannot be called a bug, but I find it a little odd that once an alert has resolved, you cannot see the value it dropped to in the notification.
On Saturday, April 18, 2020 at 7:56:47 PM UTC+5:30, Yagyansh S. Kumar wrote:

> Thanks a lot for the detailed explanation, Brian. I guess I need to monitor the resolved alerts a bit more closely and then take a call.
>
> On Saturday, April 18, 2020 at 3:16:56 PM UTC+5:30, Brian Candler wrote:
>
>> I can see two possible issues here.
>>
>> Firstly, the value of the annotation you see in the resolved message is always the value at the time *before* the alert resolved, not the value which is now below the threshold.
>>
>> Let me simplify your expression to:
>>
>>     foo > 85
>>
>> This is a PromQL filter. In general there could be many timeseries for metric "foo". If you have ten timeseries, and two of them have values over 85, then the result of this expression is those two timeseries, with their labels and those two values above 85. But if all the timeseries are below 85, then this expression returns no timeseries, and therefore it has no values.
>>
>> So: suppose one "foo" timeseries goes up to 90 for long enough to trigger the alert (for: 2m). You will get an alert with annotation:
>>
>>     description: Current value = 90
>>
>> Maybe then it goes up to 95 for a while. You don't get a new notification except in certain circumstances (group_interval etc.).
>>
>> When the value of foo drops below the threshold, say to 70, then the alert ceases to exist. Alertmanager sends out a "resolved" message with all the labels and annotations of the alert as it was *when it last existed*, i.e.
>>
>>     description: Current value = 95
>>
>> There's nothing else it can do. The "expr" in the alerting rule returns no timeseries, which means no values and no labels. You can't create an annotation for an alert that doesn't exist.
>>
>> It's for this reason that I removed all my alert annotations which had $value in them, since the Resolved messages are confusing. However you could instead change them to something more verbose, e.g.
>>
>>     description: Most recent triggering value = 95
>>
>> The second issue is: is it possible the value dipped below the threshold for one rule evaluation interval?
>>
>> Prometheus does debouncing in one direction (the alert must be constantly active "for: 2m" before it goes from Pending into Firing), but not in the other direction. A single dip below the threshold and it will resolve immediately, and then it could go into Pending and then Firing again. You would see that as a resolved notification followed by a new alert.
>>
>> There is a closed issue for Alertmanager debouncing / flap detection here:
>> https://github.com/prometheus/alertmanager/issues/204
>>
>> Personally I think Prometheus itself should have a "Resolving" state analogous to "Pending", so a brief dip below the threshold doesn't instantly resolve - but like I say, that issue is closed.
>>
>> HTH,
>>
>> Brian.
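For what it's worth, Brian's two suggestions (more verbose annotation wording, and debouncing the resolve direction) could be sketched together in a rule like this. The metric name `foo`, the group name, and the 5m lookback window are illustrative assumptions, not anything from the thread:

```yaml
groups:
  - name: example-alerts
    rules:
      - alert: FooHigh
        # max_over_time keeps the expression above the threshold until foo
        # has stayed below 85 for the entire 5m window, so a single brief
        # dip does not instantly resolve the alert - a rough stand-in for
        # the "Resolving" state Brian describes. Note $value then reflects
        # the windowed maximum, not the instantaneous sample.
        expr: max_over_time(foo[5m]) > 85
        for: 2m
        annotations:
          # "Most recent triggering value" is less misleading than
          # "Current value", since in a resolved notification $value is
          # the last value seen while the alert still existed.
          description: 'Most recent triggering value = {{ $value }}'
```

This trades resolution latency (up to 5m extra) for fewer flapping notifications; whether that's acceptable depends on the alert.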

