> Did you try something like "changes(foo[30m]) > 10"?
> That would alert if the value changed 10 times in the last 30 minutes.
Good guess. If it were a boolean ([0, 1]) then your idea would detect
flip-flopping (flapping?) over time.
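(For what it's worth, I could manufacture that boolean myself with a
recording rule that thresholds the gauge; the rule and metric names
here are invented:

    groups:
      - name: flap_detection
        rules:
          # 1 when the gauge sits above the threshold, 0 otherwise
          - record: foo:above_threshold
            expr: foo > bool 50

Then "changes(foo:above_threshold[30m]) > 10" would count threshold
crossings instead of raw value changes. But that still hinges on
picking one magic threshold, which is part of what I'm trying to
avoid.)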
A metric can be healthy while changing constantly, but gently -- a
smoothly increasing slope or a long sine wave. What I'm hoping to
detect is a high frequency of large-amplitude changes. Think of a
temperature sensor that rises and falls over the course of a day:
when something overheats it spikes, then the device throttles itself
until the temperature is back under a threshold, then spikes again as
it tries to operate at full capacity... Every cycle it drops back
into the "healthy" range, so an alert that merely looks for exceeding
a threshold will never leave alertstate="pending".
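For illustration, the kind of rule that gets stuck in pending looks
like this (metric name and numbers invented):

    - alert: DeviceOverheating
      expr: device_temperature_celsius > 85
      # each dip back under 85 resets this timer,
      # so the alert never actually fires
      for: 10m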
I really wish I could post a picture to illustrate the pattern I'm
looking for; it's obvious to the human eye.
Maybe something like a combination of changes() and rate(), with some
*_over_time aggregation? That seems too complex; I hope I'm
overthinking it.
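For instance, something along these lines, using stddev_over_time and
delta instead; the thresholds are placeholders I'd have to tune:

    # lots of variation inside the window...
    stddev_over_time(foo[30m]) > 20
    # ...but little net movement, to rule out a smooth climb
    and abs(delta(foo[30m])) < 10

Though I suspect that could still misfire when the window happens to
start at a trough and end at a peak.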
On 2020-02-28 11:26 a.m., Łukasz Mierzwa wrote:
On Friday, 28 February 2020 16:21:53 UTC, Moses Moore wrote:
(Looks like my previous attempt to ask this question got spam-blocked
because I included a screenshot. C'est la vie.)
I have alerts for when a metric's value passes above or below a
threshold. I can ask for the minimum or maximum over a time range,
and I can ask for a prediction based on the slope of a graph.
I have some resources that I know will fail soon after their metrics
fluctuate wildly over a short period of time. They may never exceed
85% during their fluctuations, or they may exceed it briefly but not
long enough to cause concern if it were a smooth line. E.g., if the
samples over time were [30, 30, 31, 70, 5, 69, 6, 71, 5, 69, null,
null, null], I want to detect the pattern before the metric goes
absent (because the resource crashed).
Setting the threshold at ">69" doesn't work because the value drops
below the threshold on the next scrape, closing the alert; besides,
a steady 69 would be healthy. Setting the threshold with
"avg_over_time(metric[interval])" doesn't work because the average of
an oscillating metric will be well within the healthy range.
I thought of alerting on "max_over_time - min_over_time > 50", but
that would also trigger on a smooth ascent -- a false positive.
What question should I ask Prometheus to detect a metric that
vibrates too much?