alertmanager: 0.21.0
prometheus: 2.30.3

I am trying to get my head around some unexpected Alertmanager behaviour.

I am alerting on the following metrics:

client_disconnect{appenv="testbed",conn="2",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="3",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="4",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="5",compid="CLIENT-A"} 0

and have the rule below defined:

    - alert: Client Disconnect
      expr: client_disconnect == 1
      for: 2s
      labels:
        severity: critical
        notification: slack
      annotations:
        summary: "Appenv {{ $labels.appenv }} on connection {{ $labels.conn }} compid {{ $labels.compid }} down"
        description: "{{ $labels.instance }} disconnect: {{ $labels.appenv }} on connection {{ $labels.conn }} compid {{ $labels.compid }}"
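
(In case it helps reproduce this, the rule can be exercised in isolation with promtool's unit-test format. A minimal sketch follows; the file names test.yml and rules.yml are my own choices, not anything from the setup above:)

```yaml
# test.yml -- hypothetical file name; assumes the rule above lives in rules.yml
rule_files:
  - rules.yml

evaluation_interval: 1s

tests:
  - interval: 1s
    input_series:
      - series: 'client_disconnect{appenv="testbed",conn="2",compid="CLIENT-A"}'
        values: '1 1 1 1 1 1'
    alert_rule_test:
      - eval_time: 5s
        alertname: Client Disconnect
        exp_alerts:
          - exp_labels:
              appenv: testbed
              conn: "2"
              compid: CLIENT-A
              severity: critical
              notification: slack
```

Running "promtool test rules test.yml" confirms the rule itself fires for a series stuck at 1, so the oddity seems to be on the Alertmanager side.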

My alertmanager config is as below:

global:
  slack_api_url: 'https://hooks.slack.com/services/REDACTED'

route:
  group_wait: 5s
  group_interval: 5s
  group_by: ['section','env']
  repeat_interval: 10m
  receiver: 'default_receiver'

  routes:
    - match:
        notification: slack
      receiver: slack_receiver
      group_by: ['appenv','compid']

receivers:
- name: 'slack_receiver'
  slack_configs:
    - channel: 'monitoring'
      send_resolved: true
      title: '{{ template "custom_title" . }}'
      text: '{{ template "custom_slack_message" . }}'

- name: 'default_receiver'
  webhook_configs:
    - url: http://pi4-1.home:5000
      send_resolved: true

templates:
  - /etc/alertmanager/notifications.tmpl
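
To check my own understanding of how these alerts get bucketed, here is a rough Python sketch of the group-key computation for the slack route (my approximation of the behaviour, not Alertmanager's actual code):

```python
# Rough approximation of Alertmanager's grouping: alerts matching the
# slack route are keyed by that route's group_by labels, so all four
# client_disconnect series above fall into a single notification group.

GROUP_BY = ("appenv", "compid")  # group_by of the slack route above

def group_key(labels: dict) -> tuple:
    """Build the group key from just the group_by labels."""
    return tuple(labels.get(name, "") for name in GROUP_BY)

alerts = [
    {"appenv": "testbed", "conn": c, "compid": "CLIENT-A"}
    for c in ("2", "3", "4", "5")
]

keys = {group_key(a) for a in alerts}
print(keys)  # every alert shares the key ('testbed', 'CLIENT-A')
```

So, as far as I can tell, all of these alerts should always travel together in one notification group.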

My custom template results in a message, formatted as below, being displayed
in Slack:

[image: slack1.PNG]
As expected, this repeats every 10 minutes.

If one of these client_disconnects subsequently resolves, such that the 
metric now looks like this:

client_disconnect{appenv="testbed",conn="2",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="3",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="4",compid="CLIENT-A"} 0
client_disconnect{appenv="testbed",conn="5",compid="CLIENT-A"} 0

Then I receive the following messages:
[image: slack2.PNG]
When the repeat interval comes round (10 minutes later), I receive the
following messages:
[image: slack3.PNG]
The second firing line comes in at 22:02 and the third firing line at 22:03
(sorry, the timestamps only show on hover in Slack).

I can't understand this behaviour. I am running single, unclustered
instances of Prometheus and Alertmanager.

Is anyone in a position to explain this behaviour to me? I get a very
similar situation if I simply use the webhook instead of Slack.

The subsequent repeat (after the last message) shows the current state:
[image: slack4.PNG]

Many thanks.

For reference, my slack templates are below:

{{ define "__single_message_title" }}{{ range .Alerts.Firing }}{{ .Labels.alertname }} on {{ .Annotations.identifier }}{{ end }}{{ range .Alerts.Resolved }}{{ .Labels.alertname }} on {{ .Annotations.identifier }}{{ end }}{{ end }}

{{ define "custom_title" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ if or (and (eq (len .Alerts.Firing) 1) (eq (len .Alerts.Resolved) 0)) (and (eq (len .Alerts.Firing) 0) (eq (len .Alerts.Resolved) 1)) }}{{ template "__single_message_title" . }}{{ end }}{{ end }}

{{ define "custom_slack_message" }}
{{ if or (and (eq (len .Alerts.Firing) 1) (eq (len .Alerts.Resolved) 0)) (and (eq (len .Alerts.Firing) 0) (eq (len .Alerts.Resolved) 1)) }}
{{ range .Alerts.Firing }}{{ .Annotations.description }}{{ end }}{{ range .Alerts.Resolved }}{{ .Annotations.description }}{{ end }}
{{ else }}
{{ if gt (len .Alerts.Firing) 0 }}
*Alerts Firing:*
Client disconnect: {{ .CommonLabels.appenv }} for {{ .CommonLabels.compid }}. Connections: {{ range .Alerts.Firing }}{{ .Labels.conn }} {{ end }}have failed.
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
*Alerts Resolved:*
Client disconnect: {{ .CommonLabels.appenv }} for {{ .CommonLabels.compid }}. Connections: {{ range .Alerts.Resolved }}{{ .Labels.conn }} {{ end }}have failed.
{{ end }}
{{ end }}
{{ end }}
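
For anyone skimming the templates above: the intent is that a group with exactly one alert gets that alert's own description, and anything else gets a combined firing/resolved summary. A rough Python equivalent of that branching (my own sketch, not the rendered Go template, and deliberately mirroring the template's wording, including "have failed" in the resolved branch):

```python
# Sketch of the custom_slack_message branching above: one alert in the
# group -> its own description; multiple -> combined firing/resolved lines.

def slack_message(firing: list, resolved: list) -> str:
    # Single-alert case: exactly one firing and none resolved, or vice versa
    if (len(firing), len(resolved)) in ((1, 0), (0, 1)):
        alert = (firing + resolved)[0]
        return alert["description"]
    parts = []
    if firing:
        conns = " ".join(a["conn"] for a in firing)
        parts.append(f"*Alerts Firing:* connections {conns} have failed.")
    if resolved:
        conns = " ".join(a["conn"] for a in resolved)
        # Same wording as the template's resolved branch
        parts.append(f"*Alerts Resolved:* connections {conns} have failed.")
    return "\n".join(parts)

firing = [{"conn": "2", "description": "d2"}, {"conn": "3", "description": "d3"}]
resolved = [{"conn": "4", "description": "d4"}]
print(slack_message(firing, resolved))
```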


-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/34e68aff-831f-4ac0-b278-250bec1987a2n%40googlegroups.com.
