I'm trying to use the group_wait parameter so that Alertmanager waits for
all the alerts received from Prometheus, groups them, and sends a single
notification.
I have the following configuration:
route:
  receiver: default-receiver
  group_by:
  - alertname
  - environment
  continue: false
  group_wait: 5m
  group_interval: 20m
  repeat_interval: 1d
receivers:
- name: default-receiver
  email_configs:
  - send_resolved: true
    to: [email protected]
    from: [email protected]
    hello: localhost
    smarthost: smptserver:25
Although group_wait is set to 5 minutes, Alertmanager flushes the alerts
and sends a notification to the configured receiver as soon as it receives
them from Prometheus. I would expect Alertmanager to delay the notification
and send it after 5 minutes (the group_wait value). Debug log:
ts=2022-11-22T12:37:19.367Z caller=cluster.go:705 level=info
component=cluster msg="gossip not settled" polls=0 before=0 now=1
elapsed=2.000781422s
ts=2022-11-22T12:37:21.368Z caller=cluster.go:702 level=debug
component=cluster msg="gossip looks settled" elapsed=4.001197371s
ts=2022-11-22T12:37:23.368Z caller=cluster.go:702 level=debug
component=cluster msg="gossip looks settled" elapsed=6.001883916s
ts=2022-11-22T12:37:25.369Z caller=cluster.go:702 level=debug
component=cluster msg="gossip looks settled" elapsed=8.00222292s
ts=2022-11-22T12:37:27.369Z caller=cluster.go:697 level=info
component=cluster msg="gossip settled; proceeding" elapsed=10.002782746s
ts=2022-11-22T12:37:42.811Z caller=dispatch.go:165 level=debug
component=dispatcher msg="Received alert"
alert=file_not_processed[c0e2772][active]
ts=2022-11-22T12:37:42.812Z caller=dispatch.go:165 level=debug
component=dispatcher msg="Received alert"
alert=file_not_processed[64605a5][active]
ts=2022-11-22T12:37:42.812Z caller=dispatch.go:165 level=debug
component=dispatcher msg="Received alert"
alert=file_not_processed[e70ae18][active]
ts=2022-11-22T12:37:42.812Z caller=dispatch.go:165 level=debug
component=dispatcher msg="Received alert"
alert=file_not_processed[7325965][active]
ts=2022-11-22T12:37:42.812Z caller=dispatch.go:517 level=debug
component=dispatcher aggrGroup="{}:{alertname=\"file_not_processed\",
environment=\"ACC\"}" msg=flushing
alerts=[file_not_processed[c0e2772][active]]
ts=2022-11-22T12:37:42.812Z caller=dispatch.go:517 level=debug
component=dispatcher aggrGroup="{}:{alertname=\"file_not_processed\",
environment=\"DEV\"}" msg=flushing
alerts="[file_not_processed[64605a5][active]
file_not_processed[e70ae18][active] file_not_processed[7325965][active]]"
ts=2022-11-22T12:37:42.883Z caller=notify.go:743 level=debug
component=dispatcher receiver=default-receiver integration=webhook[0]
msg="Notify success" attempts=1
ts=2022-11-22T12:37:42.914Z caller=notify.go:743 level=debug
component=dispatcher receiver=default-receiver integration=webhook[0]
msg="Notify success" attempts=1
ts=2022-11-22T12:37:43.031Z caller=notify.go:743 level=debug
component=dispatcher receiver=default-receiver integration=email[0]
msg="Notify success" attempts=1
ts=2022-11-22T12:37:43.031Z caller=notify.go:743 level=debug
component=dispatcher receiver=default-receiver integration=email[0]
msg="Notify success" attempts=1
ts=2022-11-22T12:37:43.660Z caller=dispatch.go:165 level=debug
component=dispatcher msg="Received alert"
alert=locked_oracle_accounts[bcc49ac][active]
ts=2022-11-22T12:37:43.660Z caller=dispatch.go:517 level=debug
component=dispatcher aggrGroup="{}:{alertname=\"locked_oracle_accounts\",
environment=\"DEV\"}" msg=flushing
alerts=[locked_oracle_accounts[bcc49ac][active]]
ts=2022-11-22T12:37:43.704Z caller=notify.go:743 level=debug
component=dispatcher receiver=default-receiver integration=webhook[0]
msg="Notify success" attempts=1
ts=2022-11-22T12:37:43.840Z caller=notify.go:743 level=debug
component=dispatcher receiver=default-receiver integration=email[0]
msg="Notify success" attempts=1
ts=2022-11-22T12:37:58.355Z caller=dispatch.go:165 level=debug
component=dispatcher msg="Received alert"
alert=sdl_critical_services_down[7b9c988][active]
ts=2022-11-22T12:37:58.355Z caller=dispatch.go:517 level=debug
component=dispatcher
aggrGroup="{}:{alertname=\"sdl_critical_services_down\",
environment=\"TST\"}" msg=flushing
alerts=[sdl_critical_services_down[7b9c988][active]]
ts=2022-11-22T12:37:58.398Z caller=notify.go:743 level=debug
component=dispatcher receiver=default-receiver integration=webhook[0]
msg="Notify success" attempts=1
ts=2022-11-22T12:37:58.416Z caller=dispatch.go:165 level=debug
component=dispatcher msg="Received alert"
alert=sdl_critical_services_down[7b9c988][active]
ts=2022-11-22T12:37:58.494Z caller=notify.go:743 level=debug
component=dispatcher receiver=default-receiver integration=email[0]
msg="Notify success" attempts=1
ts=2022-11-22T12:38:02.724Z caller=dispatch.go:165 level=debug
component=dispatcher msg="Received alert"
alert=edl_instance_down[49003d1][active]
ts=2022-11-22T12:38:02.724Z caller=dispatch.go:517 level=debug
component=dispatcher aggrGroup="{}:{alertname=\"edl_instance_down\",
environment=\"ACC\"}" msg=flushing
alerts=[edl_instance_down[49003d1][active]]
ts=2022-11-22T12:38:02.765Z caller=notify.go:743 level=debug
component=dispatcher receiver=default-receiver integration=webhook[0]
msg="Notify success" attempts=1
ts=2022-11-22T12:38:02.876Z caller=notify.go:743 level=debug
component=dispatcher receiver=default-receiver integration=email[0]
msg="Notify success" attempts=1
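To put a number on the gap between "Received alert" and "flushing", here is a minimal Python sketch that computes the delay per group. The timestamps are copied from the log above (first "Received alert" line and the "flushing" line for each aggrGroup); the names OBSERVED, parse_ts and flush_delays are made up for this illustration.

```python
from datetime import datetime

GROUP_WAIT_SECONDS = 5 * 60  # group_wait: 5m from the route configuration

# (first "Received alert" timestamp, "flushing" timestamp) per aggrGroup,
# copied by hand from the debug log above.
OBSERVED = {
    "file_not_processed/ACC": ("2022-11-22T12:37:42.811Z", "2022-11-22T12:37:42.812Z"),
    "file_not_processed/DEV": ("2022-11-22T12:37:42.812Z", "2022-11-22T12:37:42.812Z"),
    "locked_oracle_accounts/DEV": ("2022-11-22T12:37:43.660Z", "2022-11-22T12:37:43.660Z"),
    "sdl_critical_services_down/TST": ("2022-11-22T12:37:58.355Z", "2022-11-22T12:37:58.355Z"),
    "edl_instance_down/ACC": ("2022-11-22T12:38:02.724Z", "2022-11-22T12:38:02.724Z"),
}


def parse_ts(ts):
    """Parse the ts=... format used in the Alertmanager log lines."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def flush_delays(observed):
    """Return seconds between the first received alert and the flush, per group."""
    return {
        group: (parse_ts(flushed) - parse_ts(received)).total_seconds()
        for group, (received, flushed) in observed.items()
    }


if __name__ == "__main__":
    for group, delay in flush_delays(OBSERVED).items():
        print(f"{group}: flushed after {delay:.3f}s (group_wait is {GROUP_WAIT_SECONDS}s)")
```

Every group is flushed within about a millisecond of its first alert, far below the configured 300 seconds.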
I expect Alertmanager to group the alerts from Prometheus and, after 5
minutes (the group_wait value), send a single notification that contains
all the grouped alerts. In my case it seems that group_wait is not taken
into account: as soon as an alert is received from Prometheus, a
notification is sent to the receiver. Because of this, Alertmanager has no
time to group all the alerts of the same type (based on my group_by
labels), and I will get multiple notifications for the same alerts at the
next group_interval.
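The expectation described above can be sketched in a few lines of Python. This is only a simplified model of the behavior I expect from group_wait, not Alertmanager's actual dispatcher logic; the function name and the arrival offsets (seconds, rounded from the log timestamps) are assumptions made for the example.

```python
from collections import defaultdict


def expected_first_notifications(alerts, group_wait):
    """Model of the expected behavior: each group waits group_wait seconds
    after its first alert, then sends one notification with everything
    that arrived in the meantime.

    alerts: list of (arrival_seconds, alertname, environment)
    returns: {(alertname, environment): (flush_time, [arrival times included])}
    """
    groups = defaultdict(list)
    for arrival, alertname, environment in alerts:
        # group_by: [alertname, environment]
        groups[(alertname, environment)].append(arrival)

    result = {}
    for key, arrivals in groups.items():
        flush_at = min(arrivals) + group_wait
        included = sorted(a for a in arrivals if a <= flush_at)
        result[key] = (flush_at, included)
    return result


if __name__ == "__main__":
    # Offsets relative to 12:37:42, rounded from the log (illustrative only).
    alerts = [
        (0, "file_not_processed", "ACC"),
        (0, "file_not_processed", "DEV"),
        (0, "file_not_processed", "DEV"),
        (0, "file_not_processed", "DEV"),
        (1, "locked_oracle_accounts", "DEV"),
        (16, "sdl_critical_services_down", "TST"),
        (20, "edl_instance_down", "ACC"),
    ]
    for key, (flush_at, included) in expected_first_notifications(alerts, 300).items():
        print(key, "first notification at", flush_at, "s with", len(included), "alert(s)")
```

Under this model the first notification for each group would go out 300 seconds after its first alert, whereas the log shows the flush happening immediately.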