I know this cannot be called a bug, but I find it a little odd that once an alert has resolved, you cannot see the value it dropped to in the notification.
On Saturday, April 18, 2020 at 7:56:47 PM UTC+5:30, Yagyansh S. Kumar wrote:

> Thanks a lot for the detailed explanation, Brian. I guess I need to monitor the resolved alerts a bit more closely and then take a call.
>
> On Saturday, April 18, 2020 at 3:16:56 PM UTC+5:30, Brian Candler wrote:
>
>> I can see two possible issues here.
>>
>> Firstly, the value of the annotation you see in the resolved message is always the value at the time *before* the alert resolved, not the value which is now below the threshold.
>>
>> Let me simplify your expression to:
>>
>>     foo > 85
>>
>> This is a PromQL filter. In general there could be many timeseries for metric "foo". If you have ten timeseries, and two of them have values over 85, then the result of this expression is those two timeseries, with their labels and those two values above 85. But if all the timeseries are below 85, then this expression returns no timeseries, and therefore it has no values.
>>
>> So: suppose one "foo" timeseries goes up to 90 for long enough to trigger the alert (for: 2m). You will get an alert with annotation:
>>
>>     description: Current value = 90
>>
>> Maybe then it goes up to 95 for a while. You don't get a new notification except in certain circumstances (group_interval etc.).
>>
>> When the value of foo drops below the threshold, say to 70, then the alert ceases to exist. Alertmanager sends out a "resolved" message with all the labels and annotations of the alert as it was *when it last existed*, i.e.
>>
>>     description: Current value = 95
>>
>> There's nothing else it can do. The "expr" in the alerting rule returns no timeseries, which means no values and no labels. You can't create an annotation for an alert that doesn't exist.
>>
>> It's for this reason that I removed all my alert annotations which had $value in them, since the Resolved messages are confusing. However you could instead change them to something more verbose, e.g.
>>
>>     description: Most recent triggering value = 95
>>
>> The second issue is: is it possible the value dipped below the threshold for one rule evaluation interval?
>>
>> Prometheus does debouncing in one direction (the alert must be constantly active "for: 2m" before it goes from Pending into Firing), but not in the other direction. A single dip below the threshold and it will resolve immediately, and then it could go into Pending and then Firing again. You would see that as a resolved notification followed by a new alert.
>>
>> There is a closed issue for Alertmanager debouncing / flap detection here:
>> https://github.com/prometheus/alertmanager/issues/204
>>
>> Personally I think Prometheus itself should have a "Resolving" state analogous to "Pending", so a brief dip below the threshold doesn't instantly resolve - but like I say, that issue is closed.
>>
>> HTH,
>>
>> Brian.
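For what it's worth, Brian's two suggestions (more verbose annotation wording, and debouncing the resolve direction) could be sketched together in a rule like this. The metric name `foo`, the group name, and the 5m lookback window are illustrative assumptions, not anything from the thread:

```yaml
groups:
  - name: example-alerts
    rules:
      - alert: FooHigh
        # max_over_time keeps the expression above the threshold until foo
        # has stayed below 85 for the entire 5m window, so a single brief
        # dip does not instantly resolve the alert - a rough stand-in for
        # the "Resolving" state Brian describes. Note $value then reflects
        # the windowed maximum, not the instantaneous sample.
        expr: max_over_time(foo[5m]) > 85
        for: 2m
        annotations:
          # "Most recent triggering value" is less misleading than
          # "Current value", since in a resolved notification $value is
          # the last value seen while the alert still existed.
          description: 'Most recent triggering value = {{ $value }}'
```

This trades resolution latency (up to 5m extra) for fewer flapping notifications; whether that's acceptable depends on the alert.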

