[prometheus-users] Re: Monitor number of seconds since metric change as prometheus time series

t1hom7as Wed, 09 Sep 2020 03:41:32 -0700

I am trying to do something very similar, though I can't tell whether it is 
exactly the same case.
I have a metric that reports up/down status: its value is 1 for up and 0 
for down.


I would like to find out when the value last went from 0 to 1, and how long 
ago that was. 
In other words, I want the time between the change to 1 and the current 
timestamp, which would give me the uptime for that metric.  

I'm open to ideas, as I can't get this working. Eventually I would like to 
show this in Grafana as the uptime of that metric.  

On Friday, 3 April 2020 at 10:01:52 UTC+1 [email protected] wrote:

> ANSWERED! 
> From Stackoverflow:
>
> Summing up our discussion: the evaluation interval is too long. After 5 
> minutes without a new sample, a metric becomes [stale][1]. This means that 
> when the expression is evaluated, the right-hand side of your `OR` 
> expression no longer returns anything, so it is always empty.
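>
> As a rough sketch of that fix (not part of the original answer), the 
> evaluation interval can be set per rule group. The group name and the `1m` 
> value are assumptions, and `metric_name` / `last_update` stand in for the 
> thread's `metric-name` / `last-update` placeholders, since PromQL parses a 
> bare hyphenated name as a subtraction:
>
> ```yaml
> groups:
>   - name: last_update_rules    # hypothetical group name
>     interval: 1m               # assumed value, well under the 5m staleness window
>     rules:
>       - record: last_update
>         expr: |
>           timestamp(changes(metric_name[450s]) > 0)
>             or
>           last_update
> ```
>
> With a 1m interval, the previous `last_update` sample is about a minute old 
> at each evaluation, so the right-hand side of the `or` should still resolve.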
>
> Your second issue is that your recording rule adds labels to the original 
> metric and Prometheus complains. This is not because the labels already 
> exist: in [recording rules][3], rule labels overwrite existing labels of 
> the same name.
>
> The issue is your `OR` expression: it should specify an `ignoring()` 
> [matching clause][2] for ignoring the added labels or you will get the 
> labels from both sides of the `OR` expression:
>
> > `vector1 or vector2` results in a vector that contains all original 
> elements (label sets + values) of vector1 and additionally all elements of 
> vector2 ***which do not have matching label sets in vector1***.
>
> Since you get both sides of the `OR`, when Prometheus tries to add the 
> labels to the left-hand side, the result conflicts with the right-hand 
> side, which already has them.
>
> Your expression should be something like:
> ```yaml
>     expr: |
>       timestamp(changes(metric-name[450s]) > 0)
>         or ignoring(stat,monitor)
>       last-update
> ```
> Or use an `ON(label1,label2,...)` clause on a discriminating label set 
> which avoids changing the expression whenever you change the labels.
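>
> A sketch of that `on()` variant, for illustration only: the `instance` and 
> `job` labels are assumed to identify each series uniquely, the rule labels 
> mirror the ones mentioned in this thread, and `metric_name` / `last_update` 
> stand in for the thread's hyphenated placeholders:
>
> ```yaml
> - record: last_update
>   # Match the two sides on the assumed discriminating labels only.
>   expr: |
>     timestamp(changes(metric_name[450s]) > 0)
>       or on(instance, job)
>     last_update
>   labels:
>     stat: "true"       # rule labels from the thread, quoted as strings
>     monitor: "false"
> ```
>
> Because matching only considers `instance` and `job`, adding or renaming 
> rule labels later should not require touching the expression.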
>
>
>   [1]: https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness
>   [2]: https://prometheus.io/docs/prometheus/latest/querying/operators/#one-to-one-vector-matches
>   [3]: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#rule
>
>
> On Wednesday, April 1, 2020 at 5:41:19 AM UTC-4, Weston Greene wrote:
>>
>> In the Stack Overflow post about this same topic, I was encouraged to 
>> shorten my evaluation interval, since `last-update` was likely going stale 
>> under the default staleness period of 5 minutes.
>>
>> Now I can't get past the error `vector contains metrics with the same 
>> labelset after applying rule labels`.
>>
>> I do add labels in the recording rule:
>> ```
>>                   stat: true
>>                   monitor: false
>> ```
>>
>> I believe this is because `last-update` already has all the labels that 
>> `metric-name` has plus the labels that the recording rule adds, so when the 
>> `or` is triggered `last-update` conflicts since it already has the labels.
>>
>> How do I get around this? Thank you again for your creativity!
>>
>>
>> On Monday, March 30, 2020 at 10:23:20 AM UTC-4, Weston Greene wrote:
>>>
>>> This was already partially answered in 
>>> https://stackoverflow.com/questions/54148451
>>>
>>> But not sufficiently, so I'm asking both here and on Stack Overflow: 
>>> https://stackoverflow.com/questions/60928468
>>>
>>> Here is the image of the graph: 
>>>
>>> [image: Screen Shot 2020-03-30 at 06.18.07.png]
>>>
>>>
>>>
>>> On Monday, March 30, 2020 at 10:21:01 AM UTC-4, Weston Greene wrote:
>>>>
>>>>
>>>> I have this recording rule:
>>>> ```yaml
>>>>   - record: last-update
>>>>     expr: |
>>>>       timestamp(changes(metric-name[450s]) > 0)
>>>>         or
>>>>       last-update
>>>> ```
>>>>
>>>> However, that doesn't work. The `or last-update` part doesn't return a 
>>>> value.
>>>>
>>>> I have tried using an offset,
>>>> ` or (last-update offset 450s)`, 
>>>> to no avail.
>>>>
>>>>
>>>> My evaluation interval is 5 minutes (how often Prometheus runs my 
>>>> recording rules). I tried the 7.5-minute (450s) offset because I 
>>>> theorized that the `or` was reading `last-update` at an instant when it 
>>>> had no value yet. If the `or` instead read the value from its previous 
>>>> evaluation, it should find one, but that returned no value either.
>>>>
>>>>
>>>> This is what the metric looks like graphed: 
>>>>
>>>> [choppy rather than a complete staircase][1] (I don't have enough 
>>>> reputation to post pictures...)
>>>>
>>>>
>>>>
>>>> Thank you in advance for your help.
>>>>
>>>> Why I care:
>>>> If a time series plateaus for an extended period, I want to know, as 
>>>> that may mean it has stopped returning accurate data.
>>>>
>>>>
>>>>   [1]: I think the image link is preventing me from posting
>>>>
>>>
