I can see from your ALERTS graph that your alerts are all different (each 
one has a different combination of labels), which in turn comes from here:

    labels:
      metric: rail_temp
      severity: warning
      threshold: 0
      threshold_type: global
      value: '{{ $value }}'     <<< HERE

Just remove that label, and you should be good.  You can use $value in 
annotations, but you should not use it in labels, for this very reason.

What's happening is that $value changes on every evaluation, so the old 
alert (with value="old") resolves and a new alert (with value="new") is 
created, which starts its "for" period again from zero - so it never 
reaches firing.

On Monday, 20 January 2025 at 10:02:12 UTC Alexander Diyakov wrote:

> Hello Prometheus Users,
>
> I'm facing an issue with my alert rules where the alerts are resetting on 
> every evaluation cycle. I have simplified the setup as much as possible, 
> but the problem persists. Here's the context:
>
>    1. *Metric:* 
>
> The rail_temp metric is continuously increasing or decreasing and is 
> always greater than 0.
>
> The metric is exposed via an HTTP server using 
> the start_http_server function from prometheus_client. It updates every 
> second.
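>
> Roughly, the exporter is something like this (a simplified sketch; the 
> port and the update logic here are placeholders, not my real code):
>
> import random
> import time
>
> from prometheus_client import Gauge, start_http_server
>
> # Expose the metrics on an HTTP endpoint for Prometheus to scrape
> start_http_server(8000)
>
> rail_temp = Gauge('rail_temp', 'Rail temperature')
>
> value = 50.0
> while True:
>     # The real value comes from a sensor; a random walk stands in here,
>     # clamped so the metric always stays greater than 0
>     value = max(0.1, value + random.uniform(-0.5, 0.5))
>     rail_temp.set(value)
>     time.sleep(1)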
>
>    2. *Alert Rule:* 
>
> groups:
> - name: rail_temp_alerts
>   rules:
>   - alert: rail_temp_Warning
>     annotations:
>       description: rail_temp is above the warning threshold (rail_temp_th_W_G)
>       summary: rail_temp exceeded warning threshold
>     expr: rail_temp > 0
>     for: 10s
>     labels:
>       metric: rail_temp
>       severity: warning
>       threshold: 0
>       threshold_type: global
>       value: '{{ $value }}'
>
>    3. *Prometheus Global Configuration* 
>
> global:
>   scrape_interval: 7s  
>   evaluation_interval: 4s  
>   # scrape_timeout is set to the global default (10s).
>
> rule_files:
>    - "alert_rules.yml"
>
> scrape_configs:
>
>   - job_name: "pushgateway"
>     scrape_interval: 1s
>     static_configs:
>       - targets: ["localhost:9091"]  # URL Pushgateway
>
>    4. *Observations:* 
>
> The rail_temp metric has no gaps and updates correctly, as seen in the 
> screenshot.
>
> However, the alert constantly resets on each evaluation cycle 
> (evaluation_interval: 4s), even though the for duration is set to 10 
> seconds. The alert never reaches Firing unless for is set to 0.
>
> There are two graphs attached: the ALERTS internal Prometheus metric and 
> the Alerts tab.
>
>  
>
>    5. *What* *I've Tried:* 
>
> - Verified that the metric updates correctly without any gaps.
> - Used both push_to_gateway and start_http_server to expose metrics, but 
>   the behavior remains the same.
> - Increased the for duration and adjusted the scrape_interval and 
>   evaluation_interval, but it didn't help.
>
>
>    6. *Expected Behavior:* 
>
> The alert should transition to firing after the for duration is met 
> without resetting on each evaluation cycle.
>
>    7. *Current Behavior:* 
>
> The alert resets to pending every 4 seconds (matching 
> the evaluation_interval) instead of transitioning to firing.
>
> I believe this could be a bug or misconfiguration, but I'm not sure how to 
> further debug this. Any insights or suggestions on resolving this would be 
> greatly appreciated.
>
> Thank you in advance!
>
> Best regards,
>
> Alexander
>
>  [image: Screenshot 2025-01-20 122907.png]
>
