Glad it makes sense now. It was definitely a bump in the learning curve for 
me :-)

Regards, Brian.

On Friday, 4 March 2022 at 10:00:12 UTC Federico Buti wrote:

> Hi Brian.
>
> Thanks for the super-deep dive into the topic! This is simply awesome. And 
> sorry for the mails mismatch...too many mail accounts! :-D
>
> On Fri, 4 Mar 2022 at 09:46, Brian Candler <b.ca...@pobox.com> wrote:
>
>> > Assuming the second metric goes missing how is the binary expression 
>> evaluated exactly?
>>
>> The same as it always is.  Remember that the left-hand side and the 
>> right-hand side are both vectors, containing zero or more values, each 
>> value having a distinct set of labels. Noting the documentation here 
>> <https://prometheus.io/docs/prometheus/latest/querying/operators/#logical-set-binary-operators>
>> :
>>
>> *vector1 and vector2 results in a vector consisting of the elements 
>> of vector1 for which there are elements in vector2 with exactly matching 
>> label sets. Other elements are dropped. The metric name and values are 
>> carried over from the left-hand side vector.*
>>
>> Therefore, if the RHS of "and" is an empty vector, then the result of the 
>> entire "and" expression is an empty vector - since there is nothing in 
>> vector2 for vector1 to match.
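>>
>> As a minimal sketch of this (nonexistent_metric is a hypothetical name, 
>> assumed to match no series), the behaviour can be checked in the 
>> expression browser:
>>
>>     # The RHS is an empty vector, so nothing on the LHS can match
>>     up and nonexistent_metric
>>
>> This should return an empty result however many "up" series exist.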
>>
>> > In the "normal" case, i.e. "foo and bar" we would not have points but 
>> in the case of "absent(foo) and bar", from my tests, it seems to me the 
>> "bar" filtering is simply ignored.
>>
>> I don't understand what you mean by that. Can you give examples of the LHS 
>> and the RHS vectors, and the combined expression, which don't behave how 
>> you expect?
>>
>
> I was referring to "absent(foo) and bar", which was the source of my 
> original question. On the surface it seemed to me that the LHS was firing 
> even though the RHS was empty. But your detailed explanation below forced me to 
> double-check again in the expression browser and now I see the RHS wasn't 
> really empty as I first (erroneously) reported. Which matches the 
> documentation you mentioned and makes everything click perfectly in my 
> head. Was dumb of me, but I guess stuff happens. Thanks a lot.
>
>
>
>> Note that "foo and bar" and "absent(foo) and bar" will both be empty if 
>> bar is empty, as just described.
>>
>> "absent(foo)" is an unusual function:
>> - if the input vector has one or more values, i.e. any non-empty vector, 
>> its output is an empty vector (no values)
>> - if the input vector is empty, its output is a one-element vector with a 
>> single value "1". The label set of that value depends on the exact form of 
>> the expression inside the parentheses; it tries to do "the right thing" but 
>> at worst you could have value 1 with empty label set {}
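>>
>> As a sketch of how that label set is derived ("nonexistent" is a 
>> placeholder metric name assumed to have no series; the results in the 
>> comments follow the examples given in the Prometheus documentation for 
>> absent()):
>>
>>     # equality matchers are copied into the result
>>     absent(nonexistent{job="myjob"})                   # => {job="myjob"}
>>
>>     # regex matchers are not copied
>>     absent(nonexistent{job="myjob",instance=~".*"})    # => {job="myjob"}
>>
>>     # nothing can be derived from an aggregation
>>     absent(sum(nonexistent{job="myjob"}))              # => {}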
>>
>> In your case,
>>
>>     absent(our_metric{environment="pro",service="bar",stack="foo"})
>>
>> will return
>>     {environment="pro",service="bar",stack="foo"} 1
>>
>> i.e. a single-element vector with empty metric name, those labels, and 
>> the value 1.
>>
>> Going back to the whole original expression:
>>
>>     absent(our_metric{environment="pro",service="bar",stack="foo"}) and 
>>     on(stack, environment) up{service="bar",source="app"} == 1
>>
>> ISTM that is saying you want to generate an alert if 
>> our_metric{environment="pro",service="bar",stack="foo"} is missing, but 
>> only if metric up{service="bar",source="app"} exists *and* has value 1. 
>> That means the alert is suppressed if either:
>> (a) up{service="bar",source="app"} exists but its value is not 1
>> (b) up{service="bar",source="app"} does not exist - i.e. that expression 
>> returns an empty vector. ("up" is a special metric in prometheus; if it 
>> doesn't exist, it means there is no configured scrape job with those labels)
>>
>
> Yes, I was interested in having (a). Then yesterday we experienced (b) 
> because of a provision problem and I wrote to the list to understand that 
> case better. Just to improve my knowledge. We do NOT want disappearance of 
> targets which would lead to (b) ofc, but that is an investigation we are 
> doing on our side to avoid the problem in the future.
>  
>
>
>> If that's not what you want, then think about what you actually want, and 
>> then how to express that.  For example, if you want to suppress the alert 
>> in case (a) but not in case (b), then you can do this:
>>
>>     absent(our_metric{environment="pro",service="bar",stack="foo"}) 
>>     unless on(stack, environment) up{service="bar",source="app"} != 1
>>
>> ------
>>
>
> Cool! I've always struggled a bit with "unless" but I can totally give it 
> a go for this case. As I should have mentioned I want to move away from the 
> absent altogether but that is something that is not going to happen soon due to 
> the way the exporter is written atm, unfortunately.
>  
>
>
>> If you don't mind, I will make an observation about the use of "and 
>> on(...)".  Since the LHS and RHS are vectors, an expression needs to 
>> identify corresponding values in the LHS vector and the RHS vector, to 
>> generate a vector of results. The on(...) part is when the LHS and RHS 
>> vectors don't have exactly the same label sets, and you need to ignore some 
>> when matching them up. I think you know all this already.
>>
>> I find your expression rather confusing, because:
>> - we know that any values in the LHS vector must have labels 
>> {environment="pro",service="bar",stack="foo"}
>> - we know that any values in the RHS vector must have labels 
>> {service="bar",source="app"}
>> - "on(stack,environment)" says to pair up LHS and RHS values where the 
>> "stack" and "environment" labels match
>> - therefore, the RHS vector must also have stack="foo" and 
>> environment="pro"
>> - as this is a one-to-one vector match: it will fail if a particular pair of 
>> (stack,environment) labels returns multiple values for the LHS and one or 
>> more for the RHS, or vice versa. Therefore we know (stack,environment) must 
>> be a unique match for a given service (*)
>>
>> Therefore, implicitly I think all of (environment, service, stack) must 
>> match, i.e. this expression is the same as:
>>
>>     absent(our_metric{environment="pro",service="bar",stack="foo"}) and 
>>     on(environment, service, stack) 
>>     up{environment="pro",service="bar",stack="foo",source="app"} == 1
>>
>> And this can be simplified to:
>>
>>     absent(our_metric{environment="pro",service="bar",stack="foo"}) and 
>>     on(environment, service, stack) up{source="app"} == 1
>>
>> I find the second version easier to read and reason about, because the 
>> environment/service/stack matching is all in one place, but you may 
>> disagree :-)
>>
>
> Not really sure why I should disagree here! :-D
> This is a great insight and a source of reflection for us to improve our 
> rule set. We have a few binary expressions using "and" for which the 
> reasoning applied here could be taken into account. If anything it 
> simplifies/shortens the expression a lot, which is always a plus, imo.
>
> Thanks a lot for your huge help!
> F.
>
>
>
>
>> (*) This does provide another reason why an alert could fail to trigger.  
>> If the "and" expression returns multiple values for the same 
>> (stack,environment) pair on either the LHS or the RHS, with at least one 
>> match on the other side, then the whole expression will generate an error.
>>
>> However, I think it's unlikely in this particular case. We know the LHS 
>> can only possibly return a single-element vector, so this error condition 
>> could only occur if up{service="bar",source="app"} == 1 returns multiple 
>> values with the same pair of (stack,environment) labels. That is, it would 
>> only be a problem if you had something like this:
>>     up{environment="pro",service="bar",stack="foo",source="app",xxx="yyy"} 1
>>     up{environment="pro",service="bar",stack="foo",source="app",xxx="zzz"} 1
>>
>> On Friday, 4 March 2022 at 07:23:16 UTC baca...@gmail.com wrote:
>>
>>> Hi Brian,
>>>
>>> thanks a lot for your reply.
>>>
>>> I re-read my original mail and I recognize I should have probably 
>>> delivered less information and gone straight to the point. That probably 
>>> created a bit of confusion. E.g. I never intended the up metric - or any 
>>> other metric - to be considered a boolean. My bad. I'll try to get straight 
>>> to the point this time.
>>>
>>> > This is *not* boolean.  Rather, it takes the vector of timeseries "foo" 
>>> and matches them up with the vector of timeseries "bar".  All those 
>>> elements of foo which have exactly matching label sets with bar, are 
>>> passed through unchanged.  Anything else is dropped.
>>>
>>> Right, and my question is the following. Mostly to understand the 
>>> underlying behaviour, not because I have any particular problem to resolve.
>>> Assuming the second metric goes missing how is the binary expression 
>>> evaluated exactly? In the "normal" case, i.e. "foo and bar" we would not 
>>> have points but in the case of "absent(foo) and bar", from my tests, it 
>>> seems to me the "bar" filtering is simply ignored.
>>>
>>> I can guess that is because "absent" is not really a metric per se and 
>>> thus we are comparing two empty sets of labels - effectively reducing 
>>> "absent(foo) and bar" to "absent(foo)".
>>> I'd say, it would make sort of sense, right?
>>>
>>> Cheers,
>>> F.
>>>
>>> On Thursday, 3 March 2022 at 17:01:29 UTC+1 Brian Candler wrote:
>>>
>>>> You can use the PromQL browser in the prometheus web UI to debug this, 
>>>> since you can view the value of an expression at any previous point in 
>>>> time.
>>>>
>>>> Try the two halves separately:
>>>>
>>>> absent(our_metric{environment="pro",service="bar",stack="foo"}) 
>>>>
>>>> up{service="bar",source="app"} == 1
>>>>
>>>> Then try the whole expression at that point in time.  Either view the 
>>>> graph, or view the instant query and set the instant time to when there 
>>>> was a problem.
>>>>
>>>> > As the node went missing the second operand of the binary operator 
>>>> could not be evaluated, simply because it was neither `1`, nor `0`
>>>>
>>>> The expression:
>>>>     up{service="bar",source="app"} == 1
>>>> can only ever have the value 1 or be missing.  metric == constant is a 
>>>> filter, not a boolean.  The value it returns is the value of the LHS, or 
>>>> no value if the filter condition is not met.
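>>>>
>>>> For comparison, here is a sketch using the same selector. The "bool" 
>>>> modifier makes the comparison return 0 or 1 for each series instead of 
>>>> filtering:
>>>>
>>>>     # filter: keeps only the series whose value is 1
>>>>     up{service="bar",source="app"} == 1
>>>>
>>>>     # bool: returns 1 or 0 for every series that exists
>>>>     up{service="bar",source="app"} == bool 1
>>>>
>>>> Both forms still return an empty vector when no matching "up" series 
>>>> exists at all.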
>>>>
>>>> Possibly you want to remove the "== 1" entirely:
>>>>
>>>> absent(our_metric{environment="pro",service="bar",stack="foo"}) and 
>>>> on(stack, environment) up{service="bar",source="app"}
>>>>
>>>> "and" expressions behave in a corresponding way:
>>>>
>>>>     foo and bar
>>>>
>>>> This is *not* boolean.  Rather, it takes the vector of timeseries "foo" 
>>>> and matches them up with the vector of timeseries "bar".  All those 
>>>> elements of foo which have exactly matching label sets with bar, are 
>>>> passed through unchanged.  Anything else is dropped.
>>>>
>>>> So it's just a filter: "give me all values of foo, where there is also 
>>>> a value present for bar".  It does not have true/false values either as 
>>>> its input or its output.
>>>>
>>>> > Or, in other words, the following was holding true:
>>>> > 
>>>> > absent(up{service="bar",source="app"}) = 1
>>>>
>>>> How do you know?  The "up" metric is always present for a target, 
>>>> whether or not scraping is successful: it would only not be present if you 
>>>> removed the target from the scrape job.  This could be the case if you are 
>>>> using some dynamic service discovery, and the service went away.  But then 
>>>> your real problem is how to stop services vanishing from service discovery.
>>>>
>>>> Anyway, you can tell for sure by looking at historical values of these 
>>>> queries:
>>>>
>>>> up{service="bar",source="app"}
>>>> absent(up{service="bar",source="app"})
>>>>
>>>>
>>>> On Thursday, 3 March 2022 at 11:12:11 UTC Federico Buti wrote:
>>>>
>>>>> Hi list,
>>>>>
>>>>> For a monitored system we set up a rule as follows:
>>>>>
>>>>> absent(our_metric{environment="pro",service="bar",stack="foo"}) and 
>>>>> on(stack, environment) up{service="bar",source="app"} == 1
>>>>>
>>>>> This is one of the few absence rules we have in our ruleset. This is 
>>>>> also a bit special because the exporter uses the absence of the metric to 
>>>>> indicate a problem - something that is discouraged by the guidelines. But 
>>>>> that goes beyond my question anyway.
>>>>>
>>>>> Using a binary AND operator seems to work fine, cutting out the cases 
>>>>> in which the node is not scrapable. However this morning the node went 
>>>>> missing. We probably had a misconfiguration in our provisioning which we 
>>>>> are currently investigating.
>>>>>
>>>>> As the node went missing the second operand of the binary operator 
>>>>> could not be evaluated, simply because it was neither `1`, nor `0`. Or, 
>>>>> in other words, the following was holding true:
>>>>>
>>>>> absent(up{service="bar",source="app"}) = 1
>>>>>
>>>>> I understand an alert can resolve if the related metric goes stale but 
>>>>> I'm not sure how the logic should translate in this case. On the surface 
>>>>> I would not expect the AND expression to fire as we are not able to say the 
>>>>> "up" metric is really 1.
>>>>>
>>>>> But maybe I'm missing the point here?
>>>>>
>>>>> Thanks in advance,
>>>>> F.
>>>>>
>>>> -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "Prometheus Users" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/prometheus-users/pyTVLNKp3XM/unsubscribe
>> .
>> To unsubscribe from this group and all its topics, send an email to 
>> prometheus-use...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/prometheus-users/f24239ac-aa22-4b1e-bcd9-92861bfa2976n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/prometheus-users/f24239ac-aa22-4b1e-bcd9-92861bfa2976n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
