Thank you for the reply.


Answers to the points above:

1. I checked: the expression "up == 0" fires only rarely, and all my targets are 
being scraped (see the query sketch after this list).

2. To avoid getting alerts every minute, I have now set *evaluation_interval to 
5m*.

3. I have removed keep_firing_for, as it is not suitable for my use case.
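
For reference, this is roughly how I checked point 1, querying the Prometheus 
HTTP API directly (a sketch; it assumes Prometheus listens on localhost:9090, 
adjust the host/port as needed):

# returns only the series where the last scrape failed;
# an empty result means no target is currently down
curl -sG 'http://localhost:9090/api/v1/query' --data-urlencode 'query=up == 0'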


Updated:

I am using Prometheus alerting for RabbitMQ. Below is the configuration I 
am using.


*prometheus.yml file*

global:

  scrape_interval: 15s # Set the scrape interval to every 15 seconds. 
Default is every 1 minute.

  evaluation_interval: 5m # Evaluate rules every 5 minutes. The default is 
every 1 minute.

  # scrape_timeout is set to the global default (10s).


alerting:

   alertmanagers:

       - static_configs:

           - targets:

               - ip:port

rule_files:

- "alerts_rules.yml"

scrape_configs:

- job_name: "prometheus"

  static_configs:

  - targets: ["ip:port"]


*alerts_rules.yml file*

groups:

- name: instance_alerts

  rules:

  - alert: "Instance Down"

    expr: up == 0

    for: 30s

    # keep_firing_for: 30s

    labels:

      severity: "Critical"

    annotations:

      summary: "Endpoint {{ $labels.instance }} down"

      description: "{{ $labels.instance }} of job {{ $labels.job }} has 
been down for more than 30 sec."


- name: rabbitmq_alerts

  rules:

    - alert: "Consumer down for last 1 min"

      expr: rabbitmq_queue_consumers == 0

      for: 30s

      # keep_firing_for: 30s

      labels:

        severity: Critical

      annotations:

        summary: "shortify | '{{ $labels.queue }}' has no consumers"

        description: "The queue '{{ $labels.queue }}' in vhost '{{ 
$labels.vhost }}' has zero consumers for more than 30 sec. Immediate 
attention is required."



    - alert: "Total Messages > 10k in last 1 min"

      expr: rabbitmq_queue_messages > 10000

      for: 30s

      # keep_firing_for: 30s

      labels:

        severity: Critical

      annotations:

        summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages 
for more than 1 min."

        description: |

          Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} 
messages for more than 1 min.
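
As a sanity check, both files can be validated with promtool, which ships in 
the Prometheus release tarball (the paths below are assumptions, adjust them 
to where the files actually live):

./promtool check config prometheus.yml
./promtool check rules alerts_rules.yml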


Even if there is no data in the queue, it still sends me alerts. I have kept 
*evaluation_interval: 5m* (Prometheus evaluates alert rules every 5 minutes) 
and *for: 30s* (the alert should only fire if the condition persists for 30s).

I guess *for* is not working for me.
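
To see what Prometheus itself thinks the state of each rule is (inactive, 
pending or firing) and when an alert became active, the rules endpoint can be 
queried; again a sketch assuming Prometheus is on localhost:9090:

# lists every rule group with per-rule state, last evaluation time and
# any active alerts (including their "activeAt" timestamp)
curl -s 'http://localhost:9090/api/v1/rules'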

By the way, *I am not using Alertmanager*
(https://github.com/prometheus/alertmanager/releases/latest/download/alertmanager-0.28.0.linux-amd64.tar.gz)

I am just using *Prometheus*
(https://github.com/prometheus/prometheus/releases/download/v3.1.0/prometheus-3.1.0.linux-amd64.tar.gz)

https://prometheus.io/download/
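
Since there is no Alertmanager in between, I am looking at the alerts 
Prometheus itself reports as active; a sketch, again assuming localhost:9090:

# lists the alerts that are currently pending or firing in Prometheus
curl -s 'http://localhost:9090/api/v1/alerts'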

How can I solve this? Thank you in advance.

On Saturday, February 15, 2025 at 12:13:01 AM UTC+5:30 Brian Candler wrote:

> > even if application is not down, it sends alerts every 1 min. how to 
> debug this i am using below exp:- alert: "Instance Down" expr: up == 0
>
> You need to show the actual alerts, from the Prometheus web interface 
> and/or the notifications, and then describe how these are different from 
> what you expect.
>
> I very much doubt that the expression "up == 0" is firing unless there is 
> at least one target which is not being scraped, and therefore the "up" 
> metric has a value of 0 for a particular timeseries (metric with a given 
> set of labels).
>
> > if the threshold cross and value changes, it fires multiple alerts 
> having same alert rule thats fine. But with same '{{ $value }}' it should 
> fire alerts after 5 min. same alert rule with same value should not get 
> fire for next 5 min. how to get this ??
>
> I cannot work out what problem you are trying to describe. As long as you 
> only use '{{ $value }}' in annotations, not labels, then the same alert 
> will just continue firing.
>
> Whether you get repeated *notifications* about that ongoing alert is a 
> different matter. With "repeat_interval: 15m" you should get them every 15 
> minutes at least. You may get additional notifications if a new alert is 
> added into the same alert group, or one is resolved from the alert group.
>
> > whats is for, keep_firing_for and evaluation_interval ?
>
> keep_firing_for is debouncing: once the alert condition has gone away, it 
> will continue firing for this period of time. This is so that if the alert 
> condition vanishes briefly but reappears, it doesn't cause the alert to be 
> resolved and then retriggered.
>
> evaluation_interval is how often the alerting expression is evaluated.
>
>
> On Friday, 14 February 2025 at 15:53:24 UTC Amol Nagotkar wrote:
>
>> Hi all,
>> i want same alert(alert rule) to be fire after 5 min, currently i am 
>> getting same alert (alert rule) after every one minute for same '{{ $value 
>> }}'.
>> if the threshold cross and value changes, it fires multiple alerts having 
>> same alert rule thats fine. But with same '{{ $value }}' it should fire 
>> alerts after 5 min. same alert rule with same value should not get fire for 
>> next 5 min. how to get this ??
>> even if application is not down, it sends alerts every 1 min. how to 
>> debug this i am using below exp:- alert: "Instance Down" expr: up == 0
>> whats is for, keep_firing_for and evaluation_interval ?
>> prometheus.yml
>>
>> global:
>> scrape_interval: 15s # Set the scrape interval to every 15 seconds. 
>> Default is every 1 minute.
>> evaluation_interval: 15s # Evaluate rules every 15 seconds. The default 
>> is every 1 minute.
>>
>> alerting:
>> alertmanagers:
>>
>> - static_configs:
>> - targets:
>> - ip:port
>>
>> rule_files:
>>
>> - "alerts_rules.yml"
>>
>> scrape_configs:
>>
>> - job_name: "prometheus"
>>   static_configs:
>>   - targets: ["ip:port"]
>>
>> alertmanager.yml
>> global:
>> resolve_timeout: 5m
>> route:
>> group_wait: 5s
>> group_interval: 5m
>> repeat_interval: 15m
>> receiver: webhook_receiver
>> receivers:
>>
>> - name: webhook_receiver
>>   webhook_configs:
>>   - url: 'http://ip:port'
>>     send_resolved: false
>>
>> alerts_rules.yml
>>
>>
>> groups:
>> - name: instance_alerts
>>   rules:
>>   - alert: "Instance Down"
>>     expr: up == 0
>>     # for: 30s
>>     # keep_firing_for: 30s
>>     labels:
>>       severity: "Critical"
>>     annotations:
>>       summary: "Endpoint {{ $labels.instance }} down"
>>       description: "{{ $labels.instance }} of job {{ $labels.job }} has 
>> been down for more than 30 sec."
>>
>> - name: rabbitmq_alerts
>>   rules:
>>     - alert: "Consumer down for last 1 min"
>>       expr: rabbitmq_queue_consumers == 0
>>       # for: 1m
>>       # keep_firing_for: 30s
>>       labels:
>>         severity: Critical
>>       annotations:
>>         summary: "shortify | '{{ $labels.queue }}' has no consumers"
>>         description: "The queue '{{ $labels.queue }}' in vhost '{{ 
>> $labels.vhost }}' has zero consumers for more than 30 sec. Immediate 
>> attention is required."
>>
>>
>>     - alert: "Total Messages > 10k in last 1 min"
>>       expr: rabbitmq_queue_messages > 10000
>>       # for: 1m
>>       # keep_firing_for: 30s
>>       labels:
>>         severity: Critical
>>       annotations:
>>         summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages 
>> for more than 1 min."
>>         description: |
>>           Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} 
>> messages for more than 1 min.
>>
>>
>> Thank you in advance.
>>
>
