I'm having some trouble setting up Alertmanager. I have set up a rules file in Prometheus (see below) and a configuration file for Alertmanager (also below). In Alertmanager I can see the active alert for the java swap usage:
instance="lsrv0008", 1 alert, since 06:49:37, 2020-04-07 (UTC)
Source: http://lsrv2289.linux.rabobank.nl:9090/graph?g0.expr=swapusage_stats%7Bapplication%3D%22java%22%7D+%3E+500000&g0.tab=1
Silence: http://lsrv2289.linux.rabobank.nl:9093/#/silences/new?filter=%7Balertname%3D%22swap_usage_java_high%22%2C%20application%3D%22java%22%2C%20exportertype%3D%22node_exporter%22%2C%20host%3D%22lsrv0008%22%2C%20instance%3D%22lsrv0008%22%2C%20job%3D%22PROD%22%2C%20monitor%3D%22codelab-monitor%22%2C%20quantity%3D%22kB%22%2C%20severity%3D%22warning%22%7D
Labels: alertname="swap_usage_java_high", application="java", exportertype="node_exporter", host="lsrv0008", instance="lsrv0008", job="PROD", monitor="codelab-monitor", quantity="kB", severity="warning"

But the mail is never sent by Alertmanager. What am I missing?

Prometheus rules file:

groups:
  - name: targets
    rules:
      - alert: monitor_service_down
        expr: up == 0
        for: 40s
        labels:
          severity: critical
        annotations:
          summary: "Monitor service non-operational"
          description: "Service {{ $labels.instance }} is down."
      - alert: server_down
        expr: probe_success == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Server is down (no probes are up)"
          description: "Server {{ $labels.instance }} is down."
      - alert: loadbalancer_down
        expr: loadbalancer_stats < 1
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "A loadbalancer is down"
          description: "Loadbalancer for {{ $labels.instance }} is down."
  - name: host
    rules:
      - alert: high_cpu_load1
        expr: node_load1 > 8.0
        for: 300s
        labels:
          severity: warning
        annotations:
          summary: "Server under high load (load 1m) for 5 minutes"
          description: "Host is under high load, the avg load 1m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
      - alert: high_cpu_load5
        expr: node_load5 > 5.0
        for: 600s
        labels:
          severity: warning
        annotations:
          summary: "Server under high load (load 5m) for 10 minutes."
          description: "Host is under high load, the avg load 5m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
      - alert: high_cpu_load15
        expr: node_load15 > 4.5
        for: 900s
        labels:
          severity: critical
        annotations:
          summary: "Server under high load (load 15m) for 15 minutes."
          description: "Host is under high load, the avg load 15m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
      - alert: high_volume_workers_prod
        expr: sum(apache_workers{job="Apache PROD"}) by (instance) > 325
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: "Number of workers above 325 for 30s"
          description: "The Apache workers are over 325 for 30s. Current value is {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
      - alert: medium_volume_workers_prod
        expr: sum(apache_workers{job="Apache PROD"}) by (instance) > 300
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: "Number of workers above 300 for 30s"
          description: "The Apache workers are over 300 for 30s. Current value is {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
      - alert: swap_usage_java_high
        expr: swapusage_stats{application="java"} > 500000
        for: 300s
        labels:
          severity: warning
        annotations:
          summary: "Swap usage for Java is high for the last 5 minutes"
          description: "The swap usage for the java process is high. Current value is {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
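As a first sanity check on the rules file itself, promtool (shipped with Prometheus) can lint it before Prometheus loads it. A minimal sketch, assuming promtool is on PATH; the file path is a placeholder, substitute your real rules file location:

```shell
#!/bin/sh
# Sketch: validate the alerting rules file before pointing Prometheus at it.
# RULES is a placeholder path, not the poster's actual location.
RULES="${RULES:-/etc/prometheus/rules.yml}"

if command -v promtool >/dev/null 2>&1; then
  # Prints the rule count per group on success, or a parse error on failure.
  promtool check rules "$RULES"
else
  echo "promtool not found on PATH; it ships with the Prometheus release tarball" >&2
fi
```

Since the alert is visibly firing in the Alertmanager UI, the rules side is presumably fine, but this rules out a silent reload failure after edits.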
Alertmanager configuration file:

global:
  resolve_timeout: 5m
  http_config: {}
  smtp_from: [email protected]
  smtp_hello: localhost
  smtp_smarthost: localhost:25
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  hipchat_api_url: https://api.hipchat.com/
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: default
  group_by:
    - instance
  routes:
    - receiver: mail
      match:
        severity: warning
    - receiver: all
      match:
        severity: critical
      group_wait: 1s
      group_interval: 1s
receivers:
  - name: default
  - name: mail
    email_configs:
      - send_resolved: true
        to: [email protected]
        from: [email protected]
        hello: localhost
        smarthost: localhost:25
        headers:
          From: [email protected]
          Subject: '{{ template "email.default.subject" . }}'
          To: [email protected]
        html: '{{ template "email.default.html" . }}'
        require_tls: false
  - name: all
    email_configs:
      - send_resolved: true
        to: [email protected]
        from: [email protected]
        hello: localhost
        smarthost: localhost:25
        headers:
          From: [email protected]
          Subject: '{{ template "email.default.subject" . }}'
          To: [email protected]
        html: '{{ template "email.default.html" . }}'
        require_tls: false
      - send_resolved: true
        to: [email protected]
        from: [email protected]
        hello: localhost
        smarthost: localhost:25
        headers:
          From: [email protected]
          Subject: '{{ template "email.default.subject" . }}'
          To: [email protected]
        html: '{{ template "email.default.html" . }}'
        require_tls: false
  - name: webhook
    webhook_configs:
      - send_resolved: true
        http_config: {}
        url: http://127.0.0.1:9000
templates: []
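Two amtool commands (amtool ships with Alertmanager) can narrow down whether this is a routing problem or an SMTP delivery problem: a config check, and a dry run of the routing tree against the firing alert's labels. A sketch, assuming amtool is on PATH; the config path is a placeholder:

```shell
#!/bin/sh
# Sketch: validate the Alertmanager config and dry-run its routing tree.
# AMCFG is a placeholder path, not the poster's actual location.
AMCFG="${AMCFG:-/etc/alertmanager/alertmanager.yml}"

if command -v amtool >/dev/null 2>&1; then
  # Syntax and semantic check of the whole file (global, route, receivers).
  amtool check-config "$AMCFG"
  # Which receiver would an alert with severity=warning be routed to?
  amtool config routes test --config.file="$AMCFG" severity=warning
else
  echo "amtool not found on PATH; it ships with the Alertmanager release tarball" >&2
fi
```

If the routing test prints the expected receiver, the next place to look would be Alertmanager's own log for SMTP errors at the moment the notification is attempted.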
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/7cbb3a17-bf66-4530-9d2c-344549c5cbb3%40googlegroups.com.

