On Mon, 8 Jun 2020 at 08:47, Brian Candler <[email protected]> wrote:
> On 08/06/2020 08:23, Brian Brazil wrote:
> > As Ben said this is a case for avg_over_time or max_over_time. Looking
> > at just the last point would be too fragile, and once an alert fires
> > adding additional semantics is only rearranging the deckchairs. See
> > https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 and
> > https://www.robustperception.io/running-into-burning-buildings-because-the-fire-alarm-stopped
>
> I understand, however I am still unconvinced by the asymmetry: a rule
> has to be firing "for:" X minutes before an alert is triggered, but if
> it dips below the threshold for one evaluation cycle then it's
> immediately cleared.
>
> If the use of avg_over_time or max_over_time was sufficient, there would
> be no need for the "for:" clause.

Not quite; "for" and *_over_time do different things. For example, consider a brand new target: avg_over_time could fire instantly off a single sample, whereas "for" on top of that gives us time for a bit more history to build up. There are a few other races that "for" helps with, and in general I'd use a "for" of at least 5m just for the sake of reducing false positives. When working with gauges you need both.

-- 
Brian Brazil
www.robustperception.io
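[For readers of the archive: a minimal sketch of a rule combining both mechanisms as described above. The alert name, metric, threshold, and durations are made up for illustration; only the shape of the rule is the point.]

```yaml
groups:
  - name: example
    rules:
      - alert: HighMemoryUsage
        # avg_over_time smooths the gauge over 10m, so a single spike
        # or dip in one scrape doesn't flap the alert...
        expr: avg_over_time(instance_memory_usage_bytes[10m]) > 1e9
        # ...and "for" additionally requires the condition to hold for
        # 5m of evaluations, so e.g. a brand-new target whose first
        # sample is high doesn't fire instantly.
        for: 5m
        labels:
          severity: warning
```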

