Hi
I am using Alertmanager to post alerts to Slack. Here is the configuration
of my alert:
expr: <a query that takes 5 seconds>
for: 60m
Here are the settings on my alertmanager:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'cluster']
  group_interval: 5m
  group_wait: 30s
  receiver: "slack"
  repeat_interval: 12h
To improve performance, I created a recording rule so that the 5-second
query now takes 100ms.
I have two issues:
1. I am seeing "toggling" in the Slack channel: the alert is reported as
unresolved, is quickly reported as resolved, and then goes back to
unresolved. The alert was never actually resolved during this time. In
Prometheus the alert shows up, but in Alertmanager it periodically
disappears and then reappears. Why would Alertmanager lose the alert only
to have it reappear seconds later?
2. When does Alertmanager send messages to Slack? I would expect a message
in the following situations:
   1. The alert goes into alarm
   2. The alert goes out of alarm
   3. num_firing for the alert either increases or decreases
When I look at my Slack channel, despite the Alertmanager settings above,
I see messages posted at the following times:
1. 12:02AM
2. 12:08AM
3. 1:02AM
4. 1:08AM
5. 1:52AM
6. 2:53AM
7. 2:58AM
8. 3:18AM
9. 3:38AM
10. 4:23AM
11. 6:23AM
12. 6:43AM
13. 6:48AM
14. 6:53AM
15. 6:59AM
16. 8:39AM
17. 8:54AM
18. 9:04AM
19. 9:19AM
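To make the spacing easier to see, here is a small Python sketch that computes the number of minutes between consecutive messages in the list above. It is not part of my setup; the helper name `to_minutes` is made up for illustration, and it assumes all of the times fall on the same day.

```python
def to_minutes(stamp):
    """Convert a 12-hour clock string such as '12:02AM' to minutes after midnight."""
    clock, suffix = stamp[:-2], stamp[-2:].upper()
    hours, minutes = (int(part) for part in clock.split(":"))
    hours %= 12  # 12AM becomes hour 0; 12PM becomes 0 before the PM offset below
    if suffix == "PM":
        hours += 12
    return hours * 60 + minutes


# Times at which messages appeared in the Slack channel (copied from the list above).
times = [
    "12:02AM", "12:08AM", "1:02AM", "1:08AM", "1:52AM", "2:53AM", "2:58AM",
    "3:18AM", "3:38AM", "4:23AM", "6:23AM", "6:43AM", "6:48AM", "6:53AM",
    "6:59AM", "8:39AM", "8:54AM", "9:04AM", "9:19AM",
]

minutes = [to_minutes(t) for t in times]

# Gap in minutes between each message and the one before it.
gaps = [later - earlier for earlier, later in zip(minutes, minutes[1:])]

print(gaps)
print("shortest gap:", min(gaps), "minutes")
print("longest gap:", max(gaps), "minutes")
```

The gaps range from 5 minutes to 120 minutes, so no gap is shorter than my group_interval of 5m and none comes anywhere near my repeat_interval of 12h.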
In summary, I have two questions:
1. Why would Alertmanager be dropping alerts?
2. Why is Alertmanager sending messages to Slack at unpredictable
times?