I think this answer (quoted below) gives you almost exactly what you need,
minus one step, so forgive me if I'm misunderstanding. The step it doesn't
spell out is a second rule that computes `time() - stat__change__timestamp`.
Here is an example taken directly from my working solution:
```rules.yaml
- record: stat__change__timestamp
  # timestamp of when the metric last changed
  expr: |
    timestamp(changes({exported_job=~"visor_.*",
      alertname="", offset="", original_name!="",
      original_stat=""}[${SCRAPE_INTERVAL_AND_A_HALF}]) > 0)
    or ignoring(stat, monitor, original_stat)
    stat__change__timestamp
  labels:
    stat: true
    # This keeps the stat__offset of this metric unique from the original
    original_stat: stat__change__timestamp
- record: stat__change__seconds_since
  # number of seconds since the metric value changed; this will highlight
  # whether a script is not recording correctly or if a metric is stagnant
  expr: |
    time() - stat__change__timestamp
  labels:
    stat: true
    # This keeps the stat__offset of this metric unique from the original
    original_stat: stat__change__seconds_since
```
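To act on `stat__change__seconds_since`, an alerting rule along these lines might work. This is only a sketch: the alert name, the one-hour threshold, the `for` duration, and the `severity` label are assumptions of mine, not part of my actual setup, and it assumes the `original_name` label survives on the recorded series.

```rules.yaml
- alert: StatStagnant  # hypothetical alert name
  # fires when a recorded metric has not changed for more than an hour
  expr: stat__change__seconds_since > 3600  # threshold is an assumption
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "{{ $labels.original_name }} has not changed for {{ $value | humanizeDuration }}"
```

Pick the threshold per metric family; some series legitimately stay flat for a long time.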
An alternative to `changes()` (pulled from a different Prometheus server I
manage, hence the different label criteria):
```rules.yaml
timestamp(
  (
    kafka_consumer_group_lag{topic!~".*verification_id|.*submission_id|.*__leader|.*-changelog|.*_Internal.*",
      group!="BifrostMonitor_Bifrost_MongoTopicDumper"}
    -
    kafka_consumer_group_lag{topic!~".*verification_id|.*submission_id|.*__leader|.*-changelog|.*_Internal.*",
      group!="BifrostMonitor_Bifrost_MongoTopicDumper"} offset ${SCRAPE_INTERVAL_DOUBLE}
  ) != 0
)
```
When I say `SCRAPE_INTERVAL`, I mean:
```prometheus.yaml
global:
  scrape_interval: ${SCRAPE_INTERVAL} # Default is every minute.
  evaluation_interval: ${EVALUATION_INTERVAL} # Default is every minute.
alerting:
  ...
```
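I can't remember why I chose `_AND_A_HALF` for `changes()` and yet
`_DOUBLE` for subtracting the offset. I don't think it matters much, as long
as the duration is longer than one scrape interval, so that two different
samples get compared.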
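For the up/down case asked about below (how long since the value went from 0 to 1), the same pattern could be adapted roughly as follows. This is an untested sketch: `service_status` and both recorded metric names are placeholders I made up, and it assumes the evaluation interval is well under the 5-minute staleness window so that the `or` fallback still finds the previous value.

```rules.yaml
# service_status is a hypothetical 0/1 metric; substitute your own.
- record: service_status__up_since__timestamp
  # timestamp of the most recent 0 -> 1 transition, carried forward by the `or`
  expr: |
    timestamp(
      (service_status == 1)
      and
      (service_status offset ${SCRAPE_INTERVAL_DOUBLE} == 0)
    )
    or
    service_status__up_since__timestamp
- record: service_status__uptime_seconds
  # seconds since that transition, only while the metric is currently 1
  expr: |
    (time() - service_status__up_since__timestamp)
    and
    (service_status == 1)
```

No `ignoring()` is needed here because these rules add no labels. Two caveats: with a double-interval offset the transition condition can hold for more than one evaluation, so the recorded timestamp may land up to a scrape interval late, and a series that was already 1 when the rule was first loaded gets no timestamp until its next 0 to 1 transition.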
On Wednesday, September 9, 2020 at 6:41:13 AM UTC-4 t1hom7as wrote:
> I am actually trying to do something very similar, but I can't really tell
> if it is the same or not.
> Basically, I have a metric that gives me the status of up or down, being 1
> or 0 respectively in the value field.
>
> I would like to somehow find out from when the value went FROM 0 TO 1, so
> how long it has been.
> In this case, how long since it changed to 1 to the current timestamp,
> therefore I should be able to measure the uptime value of that metric.
>
> Open to ideas, as I can't seem to get this working, eventually I would
> like to present this into grafana so I can show the uptime of that metric.
>
> On Friday, 3 April 2020 at 10:01:52 UTC+1 [email protected] wrote:
>
>> ANSWERED!
>> From Stackoverflow:
>>
>> Summing up our discussion: the evaluation interval is too big; after 5
>> minutes, a metric becomes [stale][1]. This means that when the expression
>> is evaluated, the right hand side of your `OR` expression is no longer
>> considered by Prometheus and thus is always empty.
>>
>> Your second issue is that your recording rule adds some labels to the
>> original metric and Prometheus complains. This is not because the labels
>> already exist: in [recording rules][3], labels overwrite the existing
>> labels.
>>
>> The issue is your `OR` expression: it should specify an `ignoring()`
>> [matching clause][2] for ignoring the added labels or you will get the
>> labels from both sides of the `OR` expression:
>>
>> > `vector1 or vector2` results in a vector that contains all original
>> > elements (label sets + values) of vector1 and additionally all elements of
>> > vector2 ***which do not have matching label sets in vector1***.
>>
>> Since you get both sides of the `OR`, when Prometheus tries to add the
>> labels to the left-hand side, it conflicts with the right-hand side, which
>> already exists.
>>
>> Your expression should be something like:
>> ```yaml
>> expr: |
>>   timestamp(changes(metric-name[450s]) > 0)
>>   or ignoring(stat,monitor)
>>   last-update
>> ```
>> Or use an `ON(label1,label2,...)` clause on a discriminating label set
>> which avoids changing the expression whenever you change the labels.
>>
>>
>> [1]: https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness
>> [2]: https://prometheus.io/docs/prometheus/latest/querying/operators/#one-to-one-vector-matches
>> [3]: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#rule
>>
>>
>> On Wednesday, April 1, 2020 at 5:41:19 AM UTC-4, Weston Greene wrote:
>>>
>>> In the stackoverflow post about this same topic, I was encouraged to
>>> reduce my evaluation frequency since `last-update` was likely going stale
>>> by the default TTL (Time To Live) of 5 minutes.
>>>
>>> Now I can't get past the `vector contains metrics with the same
>>> labelset after applying rule labels` error.
>>>
>>> I do add labels in the recording rule:
>>> ```
>>> stat: true
>>> monitor: false
>>> ```
>>>
>>> I believe this is because `last-update` already has all the labels that
>>> `metric-name` has plus the labels that the recording rule adds, so when the
>>> `or` is triggered `last-update` conflicts since it already has the labels.
>>>
>>> How do I get around this? Thank you again for your creativity!
>>>
>>>
>>> On Monday, March 30, 2020 at 10:23:20 AM UTC-4, Weston Greene wrote:
>>>>
>>>> This was already partially answered in
>>>> https://stackoverflow.com/questions/54148451
>>>>
>>>> But not sufficiently, so I'm asking here and on Stack Overflow:
>>>> https://stackoverflow.com/questions/60928468
>>>>
>>>> Here is the image of the graph:
>>>>
>>>> [image: Screen Shot 2020-03-30 at 06.18.07.png]
>>>>
>>>>
>>>>
>>>> On Monday, March 30, 2020 at 10:21:01 AM UTC-4, Weston Greene wrote:
>>>>>
>>>>>
>>>>> I have the Recording rule pattern:
>>>>> ```yaml
>>>>> - record: last-update
>>>>>   expr: |
>>>>>     timestamp(changes(metric-name[450s]) > 0)
>>>>>     or
>>>>>     last-update
>>>>> ```
>>>>>
>>>>> However, that doesn't work. The `or last-update` part doesn't return a
>>>>> value.
>>>>>
>>>>> I have tried using an offset,
>>>>> ` or (last-update offset 450s)`,
>>>>> to no avail.
>>>>>
>>>>>
>>>>> My evaluation frequency is 5 minutes (the frequency at which Prometheus
>>>>> runs my recording rules). I tried the 7.5-minute offset because I
>>>>> theorized that the OR was attempting to write last-update as last-update
>>>>> but last-update was null at that instant; if the OR were to write
>>>>> last-update as the value it had during its previous evaluation, then it
>>>>> should find a value in last-update, but that returned no value either.
>>>>>
>>>>>
>>>>> This is what the metric looks like graphed:
>>>>>
>>>>> [choppy rather than a complete staircase][1] (I don't have enough
>>>>> reputation to post pictures...)
>>>>>
>>>>>
>>>>>
>>>>> Thank you in advance for your help.
>>>>>
>>>>> Why I care:
>>>>> If a time series plateaus for an extended period of time then I want
>>>>> to know as that may mean it has begun to fail to return accurate data.
>>>>>
>>>>>
>>>>> [1]: I think the image link is preventing me from posting
>>>>>
>>>>