What do the alertmanager logs say? If you don't see anything, increase verbosity until you can see Alertmanager receiving the alert and trying to send the notification. At sufficient verbosity, you should be able to trace exactly what it is trying and/or failing to do.
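For reference, this is roughly what that looks like on the command line. This is a sketch, not taken from your setup: it assumes a reasonably recent Alertmanager with the standard flag names, that your config file is named alertmanager.yml, and that Alertmanager listens on the default port 9093 — adjust to match your environment:

```shell
# Restart Alertmanager with debug logging so every notification
# attempt (and any SMTP error) shows up in the logs
alertmanager --config.file=alertmanager.yml --log.level=debug

# Sanity-check the configuration file for syntax/routing mistakes
amtool check-config alertmanager.yml

# Confirm the alert actually arrived at Alertmanager
amtool alert --alertmanager.url=http://localhost:9093
```

If `amtool alert` shows the alert but debug logs show no send attempt (or an SMTP error), that narrows it down to routing vs. mail delivery.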
/MR

On Tue, Apr 7, 2020 at 8:52 AM Danny de Waard <[email protected]> wrote:
> I'm having some trouble setting up the alertmanager.
>
> I have set up a rules file in prometheus (see below) and a settings file for
> alertmanager (also below).
> In Alertmanager I see the active alert for swapusage java:
>
> instance="lsrv0008" (1 alert)
>
> 06:49:37, 2020-04-07 (UTC)
> Source:
> <http://lsrv2289.linux.rabobank.nl:9090/graph?g0.expr=swapusage_stats%7Bapplication%3D%22java%22%7D+%3E+500000&g0.tab=1>
> Silence:
> <http://lsrv2289.linux.rabobank.nl:9093/#/silences/new?filter=%7Balertname%3D%22swap_usage_java_high%22%2C%20application%3D%22java%22%2C%20exportertype%3D%22node_exporter%22%2C%20host%3D%22lsrv0008%22%2C%20instance%3D%22lsrv0008%22%2C%20job%3D%22PROD%22%2C%20monitor%3D%22codelab-monitor%22%2C%20quantity%3D%22kB%22%2C%20severity%3D%22warning%22%7D>
>
> alertname="swap_usage_java_high"
> application="java"
> exportertype="node_exporter"
> host="lsrv0008"
> job="PROD"
> monitor="codelab-monitor"
> quantity="kB"
> severity="warning"
>
> But the mail is not sent by Alertmanager… what am I missing?
>
> Prometheus rules file:
>
> groups:
>   - name: targets
>     rules:
>       - alert: monitor_service_down
>         expr: up == 0
>         for: 40s
>         labels:
>           severity: critical
>         annotations:
>           summary: "Monitor service non-operational"
>           description: "Service {{ $labels.instance }} is down."
>       - alert: server_down
>         expr: probe_success == 0
>         for: 30s
>         labels:
>           severity: critical
>         annotations:
>           summary: "Server is down (no probes are up)"
>           description: "Server {{ $labels.instance }} is down."
>       - alert: loadbalancer_down
>         expr: loadbalancer_stats < 1
>         for: 30s
>         labels:
>           severity: critical
>         annotations:
>           summary: "A loadbalancer is down"
>           description: "Loadbalancer for {{ $labels.instance }} is down."
>   - name: host
>     rules:
>       - alert: high_cpu_load1
>         expr: node_load1 > 8.0
>         for: 300s
>         labels:
>           severity: warning
>         annotations:
>           summary: "Server under high load (load 1m) for 5 minutes"
>           description: "Host is under high load, the avg load 1m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>       - alert: high_cpu_load5
>         expr: node_load5 > 5.0
>         for: 600s
>         labels:
>           severity: warning
>         annotations:
>           summary: "Server under high load (load 5m) for 10 minutes."
>           description: "Host is under high load, the avg load 5m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>       - alert: high_cpu_load15
>         expr: node_load15 > 4.5
>         for: 900s
>         labels:
>           severity: critical
>         annotations:
>           summary: "Server under high load (load 15m) for 15 minutes."
>           description: "Host is under high load, the avg load 15m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>       - alert: high_volume_workers_prod
>         expr: sum(apache_workers{job="Apache PROD"}) by (instance) > 325
>         for: 30s
>         labels:
>           severity: warning
>         annotations:
>           summary: "Number of workers above 325 for 30s"
>           description: "The Apache workers are over 325 for 30s. Current value is {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>       - alert: medium_volume_workers_prod
>         expr: sum(apache_workers{job="Apache PROD"}) by (instance) > 300
>         for: 30s
>         labels:
>           severity: warning
>         annotations:
>           summary: "Number of workers above 300 for 30s"
>           description: "The Apache workers are over 300 for 30s. Current value is {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>       - alert: swap_usage_java_high
>         expr: swapusage_stats{application="java"} > 500000
>         for: 300s
>         labels:
>           severity: warning
>         annotations:
>           summary: "Swap usage for Java is high for the last 5 minutes"
>           description: "The swap usage for the java process is high. Current value is {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>
>
> Alertmanager config file:
>
> global:
>   resolve_timeout: 5m
>   http_config: {}
>   smtp_from: [email protected]
>   smtp_hello: localhost
>   smtp_smarthost: localhost:25
>   smtp_require_tls: true
>   pagerduty_url: https://events.pagerduty.com/v2/enqueue
>   hipchat_api_url: https://api.hipchat.com/
>   opsgenie_api_url: https://api.opsgenie.com/
>   wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
>   victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
> route:
>   receiver: default
>   group_by:
>     - instance
>   routes:
>     - receiver: mail
>       match:
>         severity: warning
>     - receiver: all
>       match:
>         severity: critical
>       group_wait: 1s
>       group_interval: 1s
> receivers:
>   - name: default
>   - name: mail
>     email_configs:
>       - send_resolved: true
>         to: [email protected]
>         from: [email protected]
>         hello: localhost
>         smarthost: localhost:25
>         headers:
>           From: [email protected]
>           Subject: '{{ template "email.default.subject" . }}'
>           To: [email protected]
>         html: '{{ template "email.default.html" . }}'
>         require_tls: false
>   - name: all
>     email_configs:
>       - send_resolved: true
>         to: [email protected]
>         from: [email protected]
>         hello: localhost
>         smarthost: localhost:25
>         headers:
>           From: [email protected]
>           Subject: '{{ template "email.default.subject" . }}'
>           To: [email protected]
>         html: '{{ template "email.default.html" . }}'
>         require_tls: false
>       - send_resolved: true
>         to: [email protected]
>         from: [email protected]
>         hello: localhost
>         smarthost: localhost:25
>         headers:
>           From: [email protected]
>           Subject: '{{ template "email.default.subject" . }}'
>           To: [email protected]
>         html: '{{ template "email.default.html" . }}'
>         require_tls: false
>   - name: webhook
>     webhook_configs:
>       - send_resolved: true
>         http_config: {}
>         url: http://127.0.0.1:9000
> templates: []
>
>
> --
> You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/7cbb3a17-bf66-4530-9d2c-344549c5cbb3%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/CAMV%3D_gZJA0d8cDPbqK5rw%2BAvb5rpa%3DzbZP%3DFByr5t4FqSg1M3w%40mail.gmail.com.

