On 08/06/2020 08:23, Brian Brazil wrote:
As Ben said this is a case for avg_over_time or max_over_time. Looking at just the last point would be too fragile, and once an alert fires adding additional semantics is only rearranging the deckchairs. See https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 and https://www.robustperception.io/running-into-burning-buildings-because-the-fire-alarm-stopped
I understand; however, I am still unconvinced by the asymmetry: a rule's condition has to hold for the whole "for:" period of X minutes before an alert is triggered, but if the value dips below the threshold for a single evaluation cycle, the alert is cleared immediately.
If the use of avg_over_time or max_over_time were sufficient, there would be no need for the "for:" clause.
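
The asymmetry can be sketched with a small simulation. This is only a simplified model of the pending/firing behaviour described above, not Prometheus's actual implementation; the function name `alert_states` and the idea of expressing "for:" as a count of evaluation cycles are assumptions made for illustration.

```python
def alert_states(breaches, for_cycles):
    """Simplified model of a rule with a "for:" clause.

    breaches:   one bool per evaluation cycle, True when the rule
                expression is over the threshold in that cycle.
    for_cycles: the "for:" duration expressed as a number of
                evaluation intervals (an assumption of this sketch).
    """
    states = []
    active_cycles = 0  # consecutive cycles over the threshold so far
    for breached in breaches:
        if not breached:
            # A single cycle below the threshold resets everything at once.
            active_cycles = 0
            states.append("inactive")
            continue
        active_cycles += 1
        # The alert fires only after "for:" has elapsed since the first breach.
        states.append("firing" if active_cycles > for_cycles else "pending")
    return states


if __name__ == "__main__":
    # Five cycles over the threshold, one dip, then over the threshold again.
    history = [True, True, True, True, True, False, True, True]
    for cycle, state in enumerate(alert_states(history, for_cycles=3)):
        print(cycle, state)
```

In this model the alert needs several consecutive breaching cycles to reach "firing", yet one non-breaching cycle returns it to "inactive" and the "for:" wait starts again from zero.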

