Re: [prometheus-users] Re: alert rules between regions - to avoid triggering absent metric

2021-10-04 Thread Dan Schanler
Thanks! Very much appreciated


-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAOdST4_4nvtgFA3A2jDa70%3DXnp3K-TdUiP4myORyAWVGhRCaVA%40mail.gmail.com.


Re: [prometheus-users] Re: alert rules between regions - to avoid triggering absent metric

2021-10-04 Thread Brian Candler
On Monday, 4 October 2021 at 19:25:01 UTC+1 Dan S wrote:

> Brian, even better - great.
>
> Now that you mentioned how count() returns no labels - it relates to 
> another alert rule I was trying to implement.
> If I wanted to alert anytime a counter is incremented (and have it self 
> resolve after x time), this seems to do it:
>
> *count(exception_total) - count(exception_total offset 1h) *
> {}   0
>

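At first glance, that expression will always alert: an alert fires whenever 
its expression returns any result, whatever the value, and this one returns 
a result (0 in your example) all the time.  So you'll want to wrap it in 
() > 0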
 
But are you sure you want "count" there?  It implies that you will get 
multiple *timeseries* for exception_total.  If it's a single metric, then 
you want

(metric_total - metric_total offset 1h) > 0

 

> above returns a zero value when it has been incremented but no labels or 
> useful results otherwise, but this other query I happened upon returns 
> labels and I don't understand why
>
> *exception_total unless exception_total offset 1h*
>
> exception_total{pod="x"} 1
> exception_total{pod="y"} 1
> exception_total{pod="z"} 1
>
>
Compare these two expressions separately:

(A) exception_total

(B) exception_total offset 1h

"A unless B" returns the elements of (A), labels included, for which (B) has 
no corresponding timeseries (meaning with exactly the same labels).  So you 
only get a result for series that exist now but did not exist an hour ago.
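
To make the difference concrete, here is a sketch using the exception_total 
series from the question. The last expression is an assumption, one possible 
way to catch both incremented and brand-new series, and is not something 
confirmed in this thread. It also ignores counter resets.

```promql
# (A) current value of each series; every element keeps its labels
exception_total

# (B) value of each series one hour ago
exception_total offset 1h

# Elements of (A) with no element in (B) carrying exactly the same labels,
# i.e. series that exist now but did not exist an hour ago
exception_total unless exception_total offset 1h

# Per-series change over the hour; only series present in both (A) and (B)
# produce a result, so a brand-new series is not caught by this alone
(exception_total - exception_total offset 1h) > 0

# Assumption: the two cases can be joined with "or" to cover both
((exception_total - exception_total offset 1h) > 0)
  or
(exception_total unless exception_total offset 1h)
```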



Re: [prometheus-users] Re: alert rules between regions - to avoid triggering absent metric

2021-10-04 Thread Dan Schanler
Brian, even better - great.

Now that you mentioned how count() returns no labels - it relates to
another alert rule I was trying to implement.
If I wanted to alert anytime a counter is incremented (and have it self
resolve after x time), this seems to do it:

*count(exception_total) - count(exception_total offset 1h) *
{}   0

above returns a zero value when it has been incremented but no labels or
useful results otherwise, but this other query I happened upon returns
labels and I don't understand why

*exception_total unless exception_total offset 1h*

exception_total{pod="x"} 1
exception_total{pod="y"} 1
exception_total{pod="z"} 1



--
Dan






Re: [prometheus-users] Re: alert rules between regions - to avoid triggering absent metric

2021-10-04 Thread Brian Candler
count() returns no labels, and it also returns no timeseries when it has no 
input (rather than a timeseries with value zero, which I had naïvely 
expected).  So this is simpler again:

absent(jenkins_up) and count(up{job="jenkins"})
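
Broken into its parts, the expression above can be read as follows. This is 
a sketch of how each side evaluates, assuming jenkins_up is only ever 
produced by the "jenkins" job.

```promql
# Left-hand side: one element {} with value 1 when no jenkins_up series exists,
# and an empty result otherwise
absent(jenkins_up)

# Right-hand side: one element {} whose value is the number of jenkins targets,
# and an empty result in a region with no "jenkins" scrape job
count(up{job="jenkins"})

# Both sides carry an empty label set, so "and" matches them without "on ()".
# The result is non-empty only when jenkins_up is missing AND the job exists.
absent(jenkins_up) and count(up{job="jenkins"})
```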




Re: [prometheus-users] Re: alert rules between regions - to avoid triggering absent metric

2021-10-04 Thread Dan Schanler
Thanks Brian for the advice! I found the `absent() and absent()` seemed to
work well.

Also Ben - thank you - I did take your advice as well re: making multiple
layers of alerts, and didn't know about prometheus_target_scrape_pool_targets,
which could be useful in other ways as well.

Appreciate it!

Dan





Re: [prometheus-users] Re: alert rules between regions - to avoid triggering absent metric

2021-10-04 Thread Brian Candler
On Sunday, 3 October 2021 at 22:50:59 UTC+1 sup...@gmail.com wrote:

> Trying to manipulate alerts with absent() tends to behave badly.
>

Aside: I found it a bit surprising at first that count() and sum() across 
an empty instant vector give an empty result, rather than 0.  I don't see 
that behaviour explicitly called out here, but I guess it makes sense when 
you think about what "count by", "sum by" or "count_values" would have to 
do, when given no input.

You can of course make it work the other way if required: e.g. "count(foo) 
or vector(0)"
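
A short sketch of that fallback, using up{job="jenkins"} from elsewhere in 
this thread as a stand-in for "foo":

```promql
# Empty result (not 0) when no series matches the selector
count(up{job="jenkins"})

# Falls back to a single element {} with value 0 when the count is empty
count(up{job="jenkins"}) or vector(0)

# The same fallback works for sum()
sum(up{job="jenkins"}) or vector(0)
```

The fallback relies on both sides having an empty label set. With "count by 
(...)" the left-hand elements carry labels, so the {} element from vector(0) 
never matches them and shows up alongside the real results.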



Re: [prometheus-users] Re: alert rules between regions - to avoid triggering absent metric

2021-10-03 Thread Ben Kochie
Rather than use absent(), you can use the Prometheus meta-monitoring
metric prometheus_target_scrape_pool_targets.

Prometheus alerts are meant to be done in layers, where you have separate
alerts on `jenkins_up`, `up`, and `prometheus_target_scrape_pool_targets`.

Trying to manipulate alerts with absent() tends to behave badly.
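
As a rough sketch, the three layers could be written as three separate alert 
expressions. Two assumptions are made here that the thread does not confirm: 
that jenkins_up is a 1/0 gauge, and that the scrape pool metric carries the 
job name in a scrape_job label. The third expression also requires that 
Prometheus scrapes its own metrics.

```promql
# Layer 1: the application-level metric reports a failure
# (assumes jenkins_up is a 1/0 gauge)
jenkins_up == 0

# Layer 2: Prometheus could not scrape a jenkins target
up{job="jenkins"} == 0

# Layer 3: the scrape pool exists but has no targets
# (assumes the job name is carried in the scrape_job label)
prometheus_target_scrape_pool_targets{scrape_job="jenkins"} == 0
```

None of these returns anything in a region where the "jenkins" job is not 
configured, so the same rules can be pushed to every region.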




[prometheus-users] Re: alert rules between regions - to avoid triggering absent metric

2021-10-03 Thread Brian Candler
Or slightly weirder:

absent(jenkins_up) and absent(absent(up{job="jenkins"}))

absent(absent(...)) being a way to get the RHS to have no labels, to match 
the LHS.
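
Step by step, the double absent() can be read like this (a sketch of how 
each stage evaluates):

```promql
# Empty when the job exists; one element {job="jenkins"} with value 1 when it does not
absent(up{job="jenkins"})

# Inverts that: one element {} with value 1 when the job exists, empty otherwise.
# The argument is no longer a plain selector, so no labels are derived from it.
absent(absent(up{job="jenkins"}))

# Both sides are now {} and "and" matches them directly
absent(jenkins_up) and absent(absent(up{job="jenkins"}))
```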




[prometheus-users] Re: alert rules between regions - to avoid triggering absent metric

2021-10-03 Thread Brian Candler
This might be an XY problem, because it is often better to have a defined 
"up/down" metric (with value 1/0), which tells you whether something worked 
or not, rather than alerting on presence or absence of a metric.

However, to answer your question directly, I think you would need to 
include some condition saying whether that metric *should* be there or not 
- which is the presence of some other metric.  The "up" metric added by all 
scrape jobs can be useful for this.  In this case, I expect 
*up{job="jenkins"}* will exist, if and only if you have a 'jenkins' scrape 
job in that region.  Therefore maybe something like this will do what you 
want:

absent(jenkins_up{job="jenkins"}) unless on (job) absent(up{job="jenkins"})

which I think may simplify, if the 'jenkins_up' metric is only scraped by 
the 'jenkins' job, to this (not sure):

absent(jenkins_up) unless on () absent(up{job="jenkins"})
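
For reference, a sketch of what each piece of the two expressions above 
returns, and why the second form needs "on ()":

```promql
# {job="jenkins"} 1 when jenkins_up{job="jenkins"} is missing
absent(jenkins_up{job="jenkins"})

# {job="jenkins"} 1 when the region has no "jenkins" scrape job at all
absent(up{job="jenkins"})

# Matching on "job": fire when the metric is missing, unless the job is missing too
absent(jenkins_up{job="jenkins"}) unless on (job) absent(up{job="jenkins"})

# Left side is {} and right side is {job="jenkins"}; "on ()" matches on no labels,
# so any element on the right suppresses the element on the left
absent(jenkins_up) unless on () absent(up{job="jenkins"})
```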

On Sunday, 3 October 2021 at 19:40:23 UTC+1 Dan S wrote:

> Hi,
>
> Looking for some general advice about shared prom alert rules between 
> regions.  We currently push the same alert rules to all regions, and 
> sometimes we run into situations where we have a specific job in region X 
> but not Y.
>
> This is fine for basic cases, such as *up{job="jenkins"} == 0* which will 
> be ignored in regions where there's no jenkins job present (or could easily 
> specify region="X").
>
> But in some situations I'd like to use absent() on a metric that often has 
> gaps, for example
> *absent(jenkins_up{job="jenkins"})*
> This would trigger in all regions, whether or not there's a job "jenkins" 
> (obviously because it's triggering on the missing metrics) even if I try to 
> be more specific: *absent(jenkins_up{job="jenkins", region="US"}).*
>
> Any suggestions how I can craft an alert query using absent() on 
> metrics that don't appear in all regions?  So that if region="US" has 
> job="jenkins" and I want to catch gaps here, it won't also fire in 
> region="EU", which never has job="jenkins"?
>
> Thanks for any advice.
>
> Dan
>
