firing

Brian Candler Thu, 07 May 2020 01:20:03 -0700

Firstly, comparison operators
<https://prometheus.io/docs/prometheus/latest/querying/operators/#comparison-binary-operators>
don't work the way you imagine. They are more like filters. The
expression "foo" is a vector of zero or more timeseries all with the metric
name "foo". So for example:

foo >= 80

returns all the timeseries for metric "foo" whose value is >= 80. If none
of the timeseries have this value, it returns nothing. Try it in the
PromQL browser in prometheus, and look at the graph view: you'll see
timeseries values at the times where they are over 80, and gaps where they
are below.

To filter to a range is therefore easy: you filter the results of the
filter.

foo >= 80 < 95
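
If it helps to see the filtering idea outside of PromQL, here is a rough
sketch in Python. It is only a toy model, not how Prometheus is implemented:
an instant vector is represented as a dict from label set to value, and the
"instance" labels and sample values are made up for illustration.

```python
# Toy model of an instant vector: {label set: sample value}.
# The label names and values are invented for this example.
foo = {
    (("instance", "a"),): 70.0,
    (("instance", "b"),): 85.0,
    (("instance", "c"),): 97.0,
}

def ge(vector, threshold):
    """Model of `vector >= threshold`: keep matching series, drop the rest."""
    return {labels: value for labels, value in vector.items() if value >= threshold}

def lt(vector, threshold):
    """Model of `vector < threshold`: keep matching series, drop the rest."""
    return {labels: value for labels, value in vector.items() if value < threshold}

over_80 = ge(foo, 80)            # models: foo >= 80
in_range = lt(ge(foo, 80), 95)   # models: foo >= 80 < 95
nothing = ge(foo, 100)           # no series qualifies, so the result is empty

print(over_80)
print(in_range)
print(nothing)
```

Note that the result is never "true" or "false": it is a (possibly empty) set
of timeseries that still carry their original values.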

Secondly, an alert is generated if the timeseries is present with any
value. If there's no value, there's no alert. In other words, the
presence of any value is treated as "true" from the point of view of
generating an alert.

Thirdly:

expr: avg(...)
for: 5m

does not mean "taking average for 5 min" as you said. What it means is:

- the expression is tested every 1 minute (your "evaluation_interval" for
the rule group - defaults to global evaluation interval if not set)
- if the expression returns a value *every time* over a 5 minute period
(i.e. for 6 evaluations consecutively), the alert is generated
- if there are any gaps, the alert is not generated
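
The three points above can be sketched as a small Python simulation. This is
a simplified model under stated assumptions (a single timeseries, a 1 minute
evaluation interval, and no other rule options in play); the function name
and the state names are mine, chosen for illustration.

```python
from datetime import timedelta

def alert_states(results, interval=timedelta(minutes=1), hold=timedelta(minutes=5)):
    """Simplified model of `for:` on one timeseries.

    results: one bool per rule evaluation, True if the expression
    returned a value for this series at that evaluation.
    Returns one state per evaluation.
    """
    states = []
    active_since = None
    for i, present in enumerate(results):
        now = i * interval
        if not present:
            # A gap resets the clock: the alert goes back to inactive.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        # The alert only fires once the value has been present for `hold`.
        states.append("firing" if now - active_since >= hold else "pending")
    return states

# Value present at every evaluation: fires at the 6th evaluation.
steady = alert_states([True] * 7)

# One gap after 4 evaluations: the 5 minute clock restarts and never completes.
gappy = alert_states([True] * 4 + [False] + [True] * 4)

print(steady)
print(gappy)
```

In this model, nothing is averaged over the 5 minutes: the expression is
simply re-evaluated, and only its presence or absence at each evaluation
matters.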

Fourthly, the AND, OR and UNLESS logical operators don't work how you
imagine either; they are documented here
<https://prometheus.io/docs/prometheus/latest/querying/operators/#logical-set-binary-operators>.

For example:

foo AND bar

returns all the timeseries for metric "foo" for which there is a metric
"bar" with an exactly matching label set (disregarding the value of "bar").
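
As a rough illustration of the label matching, here is a toy Python model of
AND and UNLESS. It is a sketch only: vectors are dicts keyed by label set
(with the metric name left out), and the "job"/"instance" labels and values
are invented for the example.

```python
def labels(**kv):
    """Build a hashable label set, e.g. labels(job="node", instance="a")."""
    return frozenset(kv.items())

# Invented sample data: foo has three series, bar has two.
foo = {
    labels(job="node", instance="a"): 10.0,
    labels(job="node", instance="b"): 20.0,
    labels(job="node", instance="c"): 30.0,
}
bar = {
    labels(job="node", instance="a"): 0.0,    # value is irrelevant to AND
    labels(job="node", instance="c"): 999.0,
}

def and_op(lhs, rhs):
    """Model of `lhs AND rhs`: series of lhs whose label set also exists in rhs."""
    return {ls: v for ls, v in lhs.items() if ls in rhs}

def unless_op(lhs, rhs):
    """Model of `lhs UNLESS rhs`: series of lhs whose label set is absent from rhs."""
    return {ls: v for ls, v in lhs.items() if ls not in rhs}

both = and_op(foo, bar)
only_foo = unless_op(foo, bar)

print(both)
print(only_foo)
```

The values in the result always come from "foo"; "bar" only decides which
label sets are kept, even when its own value is 0.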

Filling in "default" values is not straightforward, because a metric like
"foo" refers to a variable set of timeseries - each combination of labels
is a different timeseries, and these can come and go over time. So what
you need is some other metric which you know is always present with the
same set of labels, and can be used to force the missing value. For
example,

foo OR (up * 0 + 1)

The metric "up" is generated on every scrape, with the value 1 if the scrape
is successful and 0 if it is not, so it reflects all the labels in your
scrape job plus the "job" and "instance" labels added automatically. If
your metric foo has the same set of labels, then the expression above will
fill in gaps with the value 1.
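
Here is the same idea as a toy Python model, in case it makes the mechanics
clearer. It is a sketch under the assumption stated above (that "foo" and
"up" carry the same label sets); the targets and values are made up.

```python
def labels(**kv):
    """Build a hashable label set, e.g. labels(job="node", instance="a")."""
    return frozenset(kv.items())

def or_op(lhs, rhs):
    """Model of `lhs OR rhs`: all of lhs, plus series of rhs whose
    label set does not already appear in lhs."""
    result = dict(lhs)
    for ls, value in rhs.items():
        result.setdefault(ls, value)
    return result

# Invented sample data: three scrape targets, one of them failing (up == 0).
up = {
    labels(job="node", instance="a"): 1.0,
    labels(job="node", instance="b"): 0.0,
    labels(job="node", instance="c"): 1.0,
}
# foo currently exists for only one of the targets.
foo = {labels(job="node", instance="a"): 42.0}

# Models `up * 0 + 1`: same label sets as up, every value forced to 1.
default = {ls: value * 0 + 1 for ls, value in up.items()}

filled = or_op(foo, default)   # models: foo OR (up * 0 + 1)
print(filled)
```

Where "foo" exists its real value wins; everywhere else the default of 1 is
used, including for the target whose scrape failed.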

For more information see:
https://www.robustperception.io/existential-issues-with-metrics
https://www.robustperception.io/left-joins-in-promql

However I *strongly* recommend you play around with this in the PromQL
expression browser - and try not to be distracted by pre-existing ideas
about how booleans work. Prometheus expressions work with vectors (i.e.
multiple timeseries with different labels), not individual values.

Re: [prometheus-users] Re: Alert manager looping in firing -> resolved -> firing
