I think the starting point is to look at your alerting expressions, how
they change over time in the PromQL GUI (graph view), and at the synthetic
metric "ALERTS":
https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#inspecting-alerts-during-runtime

If the alert expression returns no result for even a single rule evaluation
interval, the alert is immediately resolved; it will then re-fire on the
next evaluation cycle (or after the "for:" period, if one is set).

There is a change in prometheus-2.42.0 
<https://github.com/prometheus/prometheus/releases/tag/v2.42.0> which may 
help address this:

   - [FEATURE] Add 'keep_firing_for' field to alerting rules. #11827 
   <https://github.com/prometheus/prometheus/pull/11827>


On Thursday, 9 March 2023 at 21:17:24 UTC Russ Robinson wrote:

>   I have Alertmanager configured to send "critical" alerts to PagerDuty 
> over the Events v2 API.  If the Prometheus rule has an alert that lasts longer 
> than 20 minutes or so, the PagerDuty alert is resolved and then a new event 
> is triggered.
>
>   I have tried disabling grouping (with "group_by [...]").  The PagerDuty 
> alert's log just says: "Resolved through the integration API."
>
>   However, the alert still shows as active in Alertmanager.  In addition, I 
> have messages going to Slack; the alert message shows up there, but never a 
> resolved message either.
>
>   Any ideas why Alertmanager would close/resolve the PagerDuty incident 
> and then re-trigger/open a new one?
>
>
