> Did you try something like "changes(foo[30m]) > 10"?
> That would alert if the value changed 10 times in the last 30 minutes.

Good guess.  If it were a boolean (0/1) metric then your idea would detect flip-flopping (flapping?) over time.
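
For a 0/1 series it would work nicely.  For example (using the built-in
"up" metric purely as an illustration, with a made-up job label), this
fires if the target flipped between up and down more than 10 times in
half an hour:

    changes(up{job="some-job"}[30m]) > 10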

A metric can be healthy while changing constantly, but gently -- a smoothly increasing slope or a long sine wave.  What I'm hoping to detect is a high frequency of large-amplitude changes.  Think of a temperature sensor that rises and falls over the course of a day; when something overheats it spikes, then the device throttles itself until the temperature is back under a threshold, then spikes again as it tries to run at full capacity...  Every cycle it drops back into the "healthy" range, so an alert that merely checks for exceeding a threshold never leaves alertstate="pending".
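
(For reference, the naive rule I mean looks roughly like the following;
the metric name and the numbers are made up.  Every dip back under the
threshold resets the "for:" timer, so the alert sits in pending forever.)

    - alert: DeviceOverheating
      expr: device_temperature_celsius > 85
      for: 10m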

Really wish I could post a picture to illustrate the pattern I'm looking for.  It's obvious to a human eye.

Maybe something like a combination of changes() and rate(), with some *_over_time aggregation?  That seems too complex; I hope I'm overthinking it.
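
Here's a rough sketch of what I mean, using delta() to measure how far the
value moved between samples and a subquery (Prometheus 2.7+) to count the
big jumps; the metric name, the windows, and the 20-unit jump size are all
placeholder guesses:

    sum_over_time(
        (abs(delta(device_temperature_celsius[2m])) > bool 20)[30m:1m]
    ) > 5

That should fire only when the value lurched by more than 20 units in more
than five of the last ~30 one-minute steps.  A simpler alternative could be
stddev_over_time(device_temperature_celsius[30m]) > 25, but a steep, smooth
ramp can also produce a large standard deviation, so that risks the same
false positive as max-minus-min.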

On 2020-02-28 11:26 a.m., Łukasz Mierzwa wrote:

On Friday, 28 February 2020 16:21:53 UTC, Moses Moore wrote:

    (Looks like my previous attempt to ask this question got
    spam-blocked because I included a screenshot.  C'est la vie.)

    I have alerts for when a metric's value passes above or below a
    threshold.  I can ask for the minimum or maximum over a time
    range, and I can ask for a prediction based on the slope of a graph.

    I have some resources that I know will fail soon after their
    metrics fluctuate wildly over a short period of time.  They may
    never exceed an absolute value of 85% during their fluctuations,
    or they may exceed it briefly, but not long enough to cause
    concern if it were a smooth line.  E.g., if the samples over time
    were [30, 30, 31, 70, 5, 69, 6, 71, 5, 69, null, null, null], I
    want to detect the problem before the metric goes absent (because
    the resource crashed).

    Setting the threshold at ">69" doesn't work because the value
    drops below the threshold on the next scrape, closing the alert;
    besides, if it were at a steady 69 that would be healthy.
    Setting the threshold at "avg(metric[interval)" doesn't work
    because the average of an oscillating metric will be well within
    the healthy range.
    I thought of setting an alert for "max_over_time - min_over_time >
    50" but that would trigger on a smooth ascension -- a false positive.

    What question should I ask Prometheus to detect a metric that
    vibrates too much?
