[prometheus-users] Re: Monitor number of seconds since metric change as prometheus time series

Weston Greene Sun, 13 Sep 2020 01:46:16 -0700

I feel like this answer gives you exactly what you need minus one step, so 
forgive me if I'm misunderstanding. The one step it doesn't spell out 
is a second rule that computes `time() - stat__change__timestamp`. 
Here is an example taken directly from my working solution:


```rules.yaml
              - record: stat__change__timestamp
                # timestamp of when the metric last changed
                expr:
                  timestamp(changes({exported_job=~"visor_.*", alertname="", offset="", original_name!="", original_stat=""}[${SCRAPE_INTERVAL_AND_A_HALF}]) > 0)
                    or ignoring(stat, monitor, original_stat)
                  stat__change__timestamp
                labels:
                  stat: true
                  original_stat: stat__change__timestamp  # This keeps the stat__offset of this metric unique from the original

              - record: stat__change__seconds_since
                # number of seconds since the metric value changed
                # this will highlight whether a script is not recording correctly or if a metric is stagnant
                expr:
                  time() - stat__change__timestamp
                labels:
                  stat: true
                  original_stat: stat__change__seconds_since  # This keeps the stat__offset of this metric unique from the original
```
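
One possible way to act on `stat__change__seconds_since` is an alerting rule along the lines below. This is only a sketch: the group name, alert name, one-hour threshold, `for` duration, and `severity` label are all assumptions chosen for illustration, not part of the rules above.

```yaml
groups:
  - name: stagnation_alerts            # hypothetical group name
    rules:
      - alert: MetricStagnant          # hypothetical alert name
        # Fires when a series has gone more than an hour without changing
        # (3600 is an assumed threshold; pick one that suits the metric).
        expr: stat__change__seconds_since > 3600
        for: 5m                        # assumed; avoids firing on a single evaluation
        labels:
          severity: warning            # assumed routing label
        annotations:
          summary: "{{ $labels.exported_job }} has not changed for {{ $value }}s"
```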

An alternative to `changes()` (pulled from a different prometheus server I 
manage, hence the different label criteria):
```rules.yaml
                timestamp(
                  (
                    kafka_consumer_group_lag{topic!~".*verification_id|.*submission_id|.*__leader|.*-changelog|.*_Internal.*", group!="BifrostMonitor_Bifrost_MongoTopicDumper"}
                      -
                    kafka_consumer_group_lag{topic!~".*verification_id|.*submission_id|.*__leader|.*-changelog|.*_Internal.*", group!="BifrostMonitor_Bifrost_MongoTopicDumper"} offset ${SCRAPE_INTERVAL_DOUBLE}
                  ) != 0
                )
```

When I say `SCRAPE_INTERVAL`, I mean 
```prometheus.yaml
  global:
    scrape_interval: ${SCRAPE_INTERVAL} # Default is every minute.
    evaluation_interval: ${EVALUATION_INTERVAL} # default is every minute.
  alerting:
     ...
```
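
For illustration only, here is roughly what that file would look like once the placeholders are filled in, assuming a templating step substitutes them before Prometheus reads the file and assuming a 60s scrape interval. The concrete values are hypothetical.

```yaml
# Rendered prometheus.yaml, assuming SCRAPE_INTERVAL=60s and EVALUATION_INTERVAL=60s
global:
  scrape_interval: 60s
  evaluation_interval: 60s

# With those assumed values, the durations used in the rules would be:
#   SCRAPE_INTERVAL_AND_A_HALF = 90s   (range passed to changes())
#   SCRAPE_INTERVAL_DOUBLE     = 120s  (offset used in the subtraction)
```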

I can't remember why I chose `_AND_A_HALF` for `changes()` and yet 
`_DOUBLE` for subtracting the offset. I don't think it matters much.
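
For the up/down case asked about in the quoted message, the same pattern could be adapted roughly as follows. This is a sketch under assumptions, not a tested solution: `service_up` is a hypothetical name for the 0/1 metric, the recorded names and the `90s`/`1m` durations are made up, and a series that has been 1 since its very first sample will have no recorded timestamp until it goes through a 0 to 1 transition.

```yaml
groups:
  - name: uptime_sketch                # hypothetical group name
    interval: 1m                       # assumed; must stay under the 5m staleness window
    rules:
      - record: service_up:last_up_timestamp
        # Evaluation time at which the metric changed and is now 1;
        # otherwise carry the previously recorded value forward.
        expr: |
          timestamp(changes(service_up[90s]) > 0 and service_up == 1)
            or
          service_up:last_up_timestamp

      - record: service_up:uptime_seconds
        # Seconds since the last 0 -> 1 transition, only while the metric is still 1.
        expr: |
          (time() - service_up:last_up_timestamp) and service_up == 1
```

No `ignoring()` clause is needed here because these rules add no extra labels; if you add labels as in my rules above, they would have to be ignored in the `or`.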

On Wednesday, September 9, 2020 at 6:41:13 AM UTC-4 t1hom7as wrote:

> I am actually trying to do something very similar, but I can't really tell 
> if it is the same or not.
> Basically, I have a metric that reports a status of up or down, as 1 
> or 0 respectively in the value field. 
>
> I would like to find out when the value last went FROM 0 TO 1, and 
> how long ago that was. 
> In other words, the time from the change to 1 until the current timestamp, 
> which should give me the uptime of that metric.  
>
> Open to ideas, as I can't seem to get this working. Eventually I would 
> like to present this in Grafana so I can show the uptime of that metric.  
>
> On Friday, 3 April 2020 at 10:01:52 UTC+1 [email protected] wrote:
>
>> ANSWERED! 
>> From Stackoverflow:
>>
>> Summing up our discussion: the evaluation interval is too big; after 5 
>> minutes, a metric becomes [stale][1]. This means that when the expression 
>> is evaluated, the right hand side of your `OR` expression is no longer 
>> considered by Prometheus and thus is always empty.
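>>
>> As a sketch of that fix, a rule group can set its own `interval` below 
>> the 5-minute staleness window. The group name and the `1m` value are 
>> assumptions, and `metric_name` / `last_update` stand in for the 
>> placeholder names used in this thread:
>> ```yaml
>> groups:
>>   - name: last_update_rules   # hypothetical group name
>>     interval: 1m              # assumed; anything comfortably under 5m
>>     rules:
>>       - record: last_update
>>         expr: |
>>           timestamp(changes(metric_name[450s]) > 0)
>>             or
>>           last_update
>> ```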
>>
>> Your second issue is that your recording rule adds some labels to the 
>> original metric and Prometheus complains about it. This is not 
>> because the labels already exist: in [recording rules][3], labels 
>> overwrite the existing labels.
>>
>> The issue is your `OR` expression: it should specify an `ignoring()` 
>> [matching clause][2] for ignoring the added labels or you will get the 
>> labels from both sides of the `OR` expression:
>>
>> > `vector1 or vector2` results in a vector that contains all original 
>> elements (label sets + values) of vector1 and additionally all elements of 
>> vector2 ***which do not have matching label sets in vector1***.
>>
>> Since you get both sides of the `OR`, when Prometheus tries to add the 
>> labels to the left hand side, it conflicts with the right hand side, which 
>> already exists.
>>
>> Your expression should be something like:
>> ```yaml
>>     expr: |
>>       timestamp(changes(metric-name[450s]) > 0)
>>         or ignoring(stat,monitor)
>>       last-update
>> ```
>> Or use an `ON(label1,label2,...)` clause on a discriminating label set 
>> which avoids changing the expression whenever you change the labels.
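>>
>> A minimal sketch of the `ON` variant, assuming `instance` and `job` are 
>> enough to identify each series uniquely (those two labels, and the names 
>> `metric_name` / `last_update`, are placeholders for illustration):
>> ```yaml
>>     expr: |
>>       timestamp(changes(metric_name[450s]) > 0)
>>         or on(instance, job)
>>       last_update
>> ```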
>>
>>
>>   [1]: https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness
>>   [2]: https://prometheus.io/docs/prometheus/latest/querying/operators/#one-to-one-vector-matches
>>   [3]: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#rule
>>
>>
>> On Wednesday, April 1, 2020 at 5:41:19 AM UTC-4, Weston Greene wrote:
>>>
>>> In the stackoverflow post about this same topic, I was encouraged to 
>>> reduce my evaluation frequency since `last-update` was likely going stale 
>>> by the default TTL (Time To Live) of 5 minutes.
>>>
>>> Now I can't get past the error `vector contains metrics with the same 
>>> labelset after applying rule labels`.
>>>
>>> I do add labels in the recording rule:
>>> ```
>>>                   stat: true
>>>                   monitor: false
>>> ```
>>>
>>> I believe this is because `last-update` already has all the labels that 
>>> `metric-name` has plus the labels that the recording rule adds, so when the 
>>> `or` is triggered `last-update` conflicts since it already has the labels.
>>>
>>> How do I get around this? Thank you again for your creativity!
>>>
>>>
>>> On Monday, March 30, 2020 at 10:23:20 AM UTC-4, Weston Greene wrote:
>>>>
>>>> This was already partially answered in 
>>>> https://stackoverflow.com/questions/54148451
>>>>
>>>> But not sufficiently, so I'm asking here and on Stack Overflow: 
>>>> https://stackoverflow.com/questions/60928468
>>>>
>>>> Here is the image of the graph: 
>>>>
>>>> [image: Screen Shot 2020-03-30 at 06.18.07.png]
>>>>
>>>>
>>>>
>>>> On Monday, March 30, 2020 at 10:21:01 AM UTC-4, Weston Greene wrote:
>>>>>
>>>>>
>>>>> I have the Recording rule pattern:
>>>>> ```yaml
>>>>>   - record: last-update
>>>>>     expr: |
>>>>>       timestamp(changes(metric-name[450s]) > 0)
>>>>>         or
>>>>>       last-update
>>>>> ```
>>>>>
>>>>> However, that doesn't work. The `or last-update` part doesn't return a 
>>>>> value.
>>>>>
>>>>> I have tried using an offset,
>>>>> ` or (last-update offset 450s)`, 
>>>>> to no avail.
>>>>>
>>>>>
>>>>> My evaluation frequency is 5 minutes (the frequency at which Prometheus 
>>>>> runs my recording rules). I tried the 7.5-minute offset because I 
>>>>> theorized that the OR was attempting to write last-update as last-update 
>>>>> but last-update was null in that second; if the OR were to attempt 
>>>>> writing last-update as the value it had during its previous evaluation, 
>>>>> then it should find a value in last-update, but that returned no value 
>>>>> as well.
>>>>>
>>>>>
>>>>> This is what the metric looks like graphed: 
>>>>>
>>>>> [choppy rather than a complete staircase][1] (I don't have enough 
>>>>> reputation to post pictures...)
>>>>>
>>>>>
>>>>>
>>>>> Thank you in advance for your help.
>>>>>
>>>>> Why I care:
>>>>> If a time series plateaus for an extended period of time then I want 
>>>>> to know as that may mean it has begun to fail to return accurate data.
>>>>>
>>>>>
>>>>>   [1]: I think the image link is preventing me from posting
>>>>>
>>>>
