Thank you so much for pointing this out!

I completely overlooked the fact that including the value in the labels would create distinct alerts for every change in the metric's value. Your explanation about $value causing the old alert to resolve and a new one to fire makes perfect sense. I've now removed the value label and kept it only in the annotations, as you suggested.
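For reference, here is a sketch of the rule after the change. It is the same rule as in my original message below, with value removed from the labels and exposed as an annotation instead (the annotation key name "value" is simply my choice):

groups:
  - name: rail_temp_alerts
    rules:
      - alert: rail_temp_Warning
        expr: rail_temp > 0
        for: 10s
        labels:
          # static labels only -- no $value here, so the alert's identity
          # stays the same on every evaluation
          metric: rail_temp
          severity: warning
          threshold: 0
          threshold_type: global
        annotations:
          summary: rail_temp exceeded warning threshold
          description: rail_temp is above the warning threshold (rail_temp_th_W_G)
          # $value is safe here: annotations are not part of the alert's identity
          value: '{{ $value }}'

With the labels kept static like this, the alert series no longer changes identity when the metric changes, so the for: 10s timer is no longer restarted on each evaluation.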
After testing, the alert behaves exactly as expected and no longer resets on each evaluation cycle. This was an invaluable insight; thank you again for taking the time to help me resolve this issue!

Best regards,
Alexander

On Monday, 20 January 2025 at 17:26:27 UTC+3, Brian Candler wrote:

> I can see from your ALERTS graph that your alerts are all different (they
> have a different combination of labels), which in turn comes from here:
>
>   labels:
>     metric: rail_temp
>     severity: warning
>     threshold: 0
>     threshold_type: global
>     value: '{{ $value }}'   <<< HERE
>
> Just remove that label, and you should be good. You can use $value in
> annotations, but you should not use it in labels, for this very reason.
>
> What's happening is that $value changes, and so the old alert (with
> value="old") resolves, and a new alert fires (with value="new").
>
> On Monday, 20 January 2025 at 10:02:12 UTC, Alexander Diyakov wrote:
>
>> Hello Prometheus Users,
>>
>> I'm facing an issue with my alert rules where the alerts are resetting on
>> every evaluation cycle. I have simplified the setup as much as possible,
>> but the problem persists. Here's the context:
>>
>> 1. *Metric:*
>>
>> The metric rail_temp is continuously increasing or decreasing and is
>> always greater than 0.
>>
>> The metric is exposed via an HTTP server using the start_http_server
>> function from prometheus_client. It updates every second.
>>
>> 2. *Alert Rule:*
>>
>> groups:
>>   - name: rail_temp_alerts
>>     rules:
>>       - alert: rail_temp_Warning
>>         annotations:
>>           description: rail_temp is above the warning threshold (rail_temp_th_W_G)
>>           summary: rail_temp exceeded warning threshold
>>         expr: rail_temp > 0
>>         for: 10s
>>         labels:
>>           metric: rail_temp
>>           severity: warning
>>           threshold: 0
>>           threshold_type: global
>>           value: '{{ $value }}'
>>
>> 3. *Prometheus Global Configuration:*
>>
>> global:
>>   scrape_interval: 7s
>>   evaluation_interval: 4s
>>   # scrape_timeout is set to the global default (10s).
>>
>> rule_files:
>>   - "alert_rules.yml"
>>
>> scrape_configs:
>>   - job_name: "pushgateway"
>>     scrape_interval: 1s
>>     static_configs:
>>       - targets: ["localhost:9091"]  # Pushgateway URL
>>
>> 4. *Observations:*
>>
>> The rail_temp metric has no gaps and updates correctly, as seen in the
>> screenshot.
>>
>> However, the alert constantly resets on each evaluation cycle
>> (evaluation_interval: 4s), even though the for duration is set to 10
>> seconds. The alert never goes to Firing unless for is set to 0.
>>
>> There are two graphs attached: the ALERTS internal Prometheus metric and
>> the Alerts tab.
>>
>> 5. *What I've Tried:*
>>
>> - Verified that the metric updates correctly without any gaps.
>> - Used both push_to_gateway and start_http_server to expose metrics, but
>>   the behavior remains the same.
>> - Increased the for duration and adjusted the scrape_interval and
>>   evaluation_interval, but it didn't help.
>>
>> 6. *Expected Behavior:*
>>
>> The alert should transition to firing after the for duration is met,
>> without resetting on each evaluation cycle.
>>
>> 7. *Current Behavior:*
>>
>> The alert resets to pending every 4 seconds (matching the
>> evaluation_interval) instead of transitioning to firing.
>>
>> I believe this could be a bug or misconfiguration, but I'm not sure how
>> to further debug this. Any insights or suggestions on resolving this
>> would be greatly appreciated.
>>
>> Thank you in advance!
>>
>> Best regards,
>>
>> Alexander
>>
>> [image: Screenshot 2025-01-20 122907.png]