[prometheus-users] Re: Alarm Refresh

Brian Candler Tue, 31 Aug 2021 00:39:31 -0700

If an alert goes away, even for one rule evaluation cycle, it is immediately 
resolved.  I'm guessing this is what happened here.  You can confirm it 
by entering the alerting expression in the PromQL expression browser in the 
Prometheus web UI, graphing it over the period when this was happening, and 
checking whether the result disappears briefly.
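To illustrate why a single gap matters, here is a minimal Python sketch (not Prometheus code) that turns per-evaluation results into firing episodes. It assumes the rule has no `for:` clause, so the alert fires as soon as the expression returns a result and resolves as soon as it returns nothing. The function name `firing_episodes` is hypothetical.

```python
def firing_episodes(results):
    """Return (start, end) evaluation indices of each firing episode.

    results: hypothetical per-evaluation outcomes; True means the alerting
    expression returned a sample, False means it returned nothing.
    Assumes no `for:` clause, so state changes take effect immediately.
    """
    episodes = []
    start = None
    for i, present in enumerate(results):
        if present and start is None:
            start = i                     # alert starts firing
        elif not present and start is not None:
            episodes.append((start, i))   # resolved at evaluation i
            start = None
    if start is not None:
        episodes.append((start, len(results)))  # still firing at the end
    return episodes


# One missing evaluation (index 5) splits the alert into two episodes.
evals = [True] * 5 + [False] + [True] * 6
print(firing_episodes(evals))  # [(0, 5), (6, 12)]
```

Downstream, two episodes like this would typically show up as a resolved notification followed by a fresh alert, which matches the "interrupted, then re-alerted" pattern described in the question.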

Personally I would love to see alerts go into a "resolving" state so that 
alerts which are mostly "fail" with occasional "success" or "don't know" 
don't keep re-alerting.  There is some discussion here:
https://github.com/prometheus/alertmanager/issues/204
(although if the feature were implemented as I just described, it 
would be implemented in Prometheus rather than Alertmanager)
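The "resolving" idea can be sketched as a small state machine. This is a hypothetical illustration of the proposal only; it is NOT how Prometheus behaves today, and the names `notifications` and `hold` are made up for the example. The alert is only resolved after `hold` consecutive evaluations with no result; if the expression comes back within that time, the alert returns to firing without a new notification.

```python
def notifications(results, hold):
    """Count new-alert notifications under a hypothetical 'resolving' state.

    results: per-evaluation outcomes (True = expression returned a result).
    hold: consecutive empty evaluations needed before the alert resolves.
    Illustration of the proposal only, not current Prometheus behaviour.
    """
    state = "inactive"
    missing = 0
    count = 0
    for present in results:
        if present:
            if state == "inactive":
                count += 1            # genuinely new alert -> notify
            state = "firing"          # firing or resolving -> firing again
            missing = 0
        elif state != "inactive":
            missing += 1
            state = "resolving"
            if missing >= hold:
                state = "inactive"    # finally resolved
                missing = 0
    return count


evals = [True] * 5 + [False] + [True] * 6
print(notifications(evals, hold=1))  # 2: resolves immediately, as today
print(notifications(evals, hold=3))  # 1: the brief gap is absorbed
```

With `hold=1` the sketch reproduces the current behaviour; a larger `hold` trades slower resolution for fewer repeat alerts.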

For now, it's up to you to write more complex alerting rules that look at 
history, using functions such as (avg|sum|count|min|max)_over_time with a 
range vector, so that the alerts stay firing.
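As a rough model of what a rule like `max_over_time(metric[10m]) > threshold` does, the Python sketch below evaluates "max of the recent window above threshold" at each step. Simplifying assumptions: each list element is one sample, `window` counts samples rather than a time range, and `None` stands for a missing sample (for example a failed scrape or federation gap). The function name and sample values are hypothetical.

```python
def over_time_alert(values, threshold, window):
    """Approximate `max_over_time(metric[window]) > threshold` per step.

    values: hypothetical samples, None meaning no sample at that step.
    window: number of most recent samples considered (a simplification of
    PromQL's time-based range vectors).
    """
    firing = []
    for i in range(len(values)):
        recent = [v for v in values[max(0, i - window + 1): i + 1]
                  if v is not None]
        # Fires if any sample in the window exceeds the threshold.
        firing.append(bool(recent) and max(recent) > threshold)
    return firing


values = [95, 96, 97, 80, None, 96, 97, 98]
instant = [v is not None and v > 90 for v in values]
smoothed = over_time_alert(values, threshold=90, window=3)
print(instant)   # [True, True, True, False, False, True, True, True]
print(smoothed)  # [True, True, True, True, True, True, True, True]
```

The instant comparison drops out twice, while the windowed version rides through the dip and the missing sample. The cost is that resolution is delayed by up to the window length; min_over_time goes the other way and only fires if every sample in the window is over the threshold.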

On Tuesday, 31 August 2021 at 03:49:26 UTC+1 [email protected] wrote:

> What did you do?
> I use Prometheus. Due to resource problems, some metrics are constantly 
> at or above the alert threshold.
>
> What did you expect to see?
> I expected these alerts to keep firing without interruption.
>
> What did you see instead? Under which circumstances?
> After a while, some of the alerts were resolved and then fired again a 
> few minutes later, even though the metric stayed above the alert 
> threshold the whole time.
>
> Environment
> A multi-node federated cluster
>
> Prometheus version:
> 2.15  
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/b3e89869-1b46-413f-b80e-34cab93fda8bn%40googlegroups.com.