> Did you try something like "changes(foo[30m]) > 10"?
> That would alert if the value changed 10 times in the last 30 minutes.

Good guess.  If it were a boolean (0/1) metric then your idea would detect flip-flopping (flapping?) over time.
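
For a 0/1 series it would work nicely.  For example (using the built-in
"up" metric purely as an illustration, with a made-up job label), this
fires if the target flipped between up and down more than 10 times in
half an hour:

    changes(up{job="some-job"}[30m]) > 10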

A metric can be healthy while changing constantly, but gently -- a smoothly increasing slope or a long sine wave.  What I'm hoping to detect is a high frequency of large-amplitude changes.  Think of a temperature sensor that rises and falls over the course of a day; when something overheats it spikes, then the device throttles itself until the temperature is back under a threshold, then spikes again as it tries to run at full capacity...  Every cycle it drops back into the "healthy" range, so an alert that merely checks for exceeding a threshold never leaves alertstate="pending".
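
(For reference, the naive rule I mean looks roughly like the following;
the metric name and the numbers are made up.  Every dip back under the
threshold resets the "for:" timer, so the alert sits in pending forever.)

    - alert: DeviceOverheating
      expr: device_temperature_celsius > 85
      for: 10m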

Really wish I could post a picture to illustrate the pattern I'm looking for.  It's obvious to a human eye.

Maybe something like a combination of changes() and rate(), with some *_over_time aggregation?  That seems too complex; I hope I'm overthinking it.
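
Here's a rough sketch of what I mean, using delta() to measure how far the
value moved between samples and a subquery (Prometheus 2.7+) to count the
big jumps; the metric name, the windows, and the 20-unit jump size are all
placeholder guesses:

    sum_over_time(
        (abs(delta(device_temperature_celsius[2m])) > bool 20)[30m:1m]
    ) > 5

That should fire only when the value lurched by more than 20 units in more
than five of the last ~30 one-minute steps.  A simpler alternative could be
stddev_over_time(device_temperature_celsius[30m]) > 25, but a steep, smooth
ramp can also produce a large standard deviation, so that risks the same
false positive as max-minus-min.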

On 2020-02-28 11:26 a.m., Łukasz Mierzwa wrote:

On Friday, 28 February 2020 16:21:53 UTC, Moses Moore wrote:

    (Looks like my previous attempt to ask this question got
    spam-blocked because I included a screenshot.  C'est la vie.)

    I have alerts for when a metric's value passes above or below a
    threshold.  I can ask for the minimum or maximum over a time
    range, and I can ask for a prediction based on the slope of a graph.

    I have some resources that I know will fail soon after their
    metrics fluctuate wildly over a short period of time.  They may
    never exceed an absolute value of 85% during their fluctuations,
    or they may exceed it briefly, but not long enough to cause
    concern if it were a smooth line.  E.g., if the samples over time
    were [30, 30, 31, 70, 5, 69, 6, 71, 5, 69, null, null, null], I
    want to detect the problem before the metric goes absent (because
    the resource crashed).

    Setting the threshold at ">69" doesn't work because the value
    drops below the threshold on the next scrape, closing the alert;
    besides, if it were at a steady 69 that would be healthy.
    Setting the threshold at "avg(metric[interval)" doesn't work
    because the average of an oscillating metric will be well within
    the healthy range.
    I thought of setting an alert for "max_over_time - min_over_time >
    50" but that would trigger on a smooth ascension -- a false positive.

    What question should I ask Prometheus to detect a metric that
    vibrates too much?
