On Mon, 8 Jun 2020 at 08:47, Brian Candler <[email protected]> wrote:
> On 08/06/2020 08:23, Brian Brazil wrote:
> > As Ben said this is a case for avg_over_time or max_over_time. Looking
> > at just the last point would be too fragile, and once an alert fires
> > adding additional semantics is only rearranging the deckchairs. See
> > https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 and
> > https://www.robustperception.io/running-into-burning-buildings-because-the-fire-alarm-stopped
>
> I understand, however I am still unconvinced by the asymmetry: a rule
> has to be firing "for:" X minutes before an alert is triggered, but if
> it dips below the threshold for one evaluation cycle then it's
> immediately cleared.
>
> If the use of avg_over_time or max_over_time was sufficient, there would
> be no need for the "for:" clause.

Not quite; "for" and *_over_time do different things. For example, consider a brand new target: avg_over_time could fire instantly off a single sample, whereas "for" on top of that gives us time for a bit more history to build up. There are a few other races that "for" helps with, and in general I'd use a "for" of at least 5m just for the sake of reducing false positives. When working with gauges you need both.

-- 
Brian Brazil
www.robustperception.io
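[For readers of the archive: a minimal sketch of a rule combining both mechanisms as described above. The alert name, metric, threshold, and durations are made up for illustration; only the shape of the rule is the point.]

```yaml
groups:
  - name: example
    rules:
      - alert: HighMemoryUsage
        # avg_over_time smooths the gauge over 10m, so a single spike
        # or dip in one scrape doesn't flap the alert...
        expr: avg_over_time(instance_memory_usage_bytes[10m]) > 1e9
        # ...and "for" additionally requires the condition to hold for
        # 5m of evaluations, so e.g. a brand-new target whose first
        # sample is high doesn't fire instantly.
        for: 5m
        labels:
          severity: warning
```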

