Okay.
I did some digging on the internet, changed my YAML file, and it works now.
As far as I can see ;)
global:
route:
  group_by: [instance, severity]
  receiver: 'default'
  routes:
  - match:
      severity: warning
    receiver: 'mail'
  - match:
      severity: critical
    receiver: 'all'
receivers:
- name: 'default'
  email_configs:
  - to: '[email protected]'   ## fill in your email
    from: '[email protected]'
    smarthost: 'localhost:25'
    require_tls: false
- name: 'mail'
  email_configs:
  - to: '[email protected]'   ## fill in your email
    from: '[email protected]'
    smarthost: 'localhost:25'
    require_tls: false
- name: 'all'
  email_configs:
  - to: '[email protected]'   ## fill in your email
    from: '[email protected]'
    smarthost: 'localhost:25'
    require_tls: false
- name: 'webhook'
  webhook_configs:
  - url: 'http://127.0.0.1:9000'
Now there are some things left that I need to figure out, like sending to
multiple email addresses (or receivers) and using the webhook correctly
(for instance, if a node is down, then call the webhook with parameters).
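A sketch of what I think those two things could look like, based on the docs. Untested: the extra addresses are placeholders, monitor_service_down is the node-down alert from my rules file, and 127.0.0.1:9000 is the webhook listener already in my config.

```yaml
# Untested sketch, not the working config above.
route:
  group_by: [instance, severity]
  receiver: 'default'   ## 'default' receiver as in the config above
  routes:
  # If a node is down, send the alert to the webhook receiver.
  - match:
      alertname: monitor_service_down
    receiver: 'webhook'
  - match:
      severity: warning
    receiver: 'mail'
receivers:
- name: 'mail'
  email_configs:
  # 'to' accepts a comma-separated address list...
  - to: '[email protected], [email protected]'   ## placeholder addresses
    from: '[email protected]'
    smarthost: 'localhost:25'
    require_tls: false
  # ...or add a second email_configs entry to send a separate mail per recipient.
  - to: '[email protected]'   ## placeholder address
    from: '[email protected]'
    smarthost: 'localhost:25'
    require_tls: false
- name: 'webhook'
  webhook_configs:
  # Alertmanager POSTs a JSON body (status, labels, annotations) to this URL,
  # so whatever listens on :9000 gets the alert's parameters automatically.
  - url: 'http://127.0.0.1:9000'
    send_resolved: true
```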
On Wednesday, April 8, 2020 at 07:11:12 UTC+2, Danny de Waard wrote:
> Okay, I think I got some logs. I'm just not sure what they mean....
>
> level=debug ts=2020-04-08T05:08:37.628Z caller=dispatch.go:104 component=dispatcher msg="Received alert" alert=swap_usage_java_high[d346adb][active]
> level=debug ts=2020-04-08T05:08:37.628Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
> level=debug ts=2020-04-08T05:08:38.630Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
> level=debug ts=2020-04-08T05:08:39.630Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
> level=debug ts=2020-04-08T05:08:40.630Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
> and this last line keeps coming.
>
> On Tuesday, April 7, 2020 at 18:00:59 UTC+2, Matthias Rampke wrote:
>>
>> What do the alertmanager logs say? If you don't see anything, increase
>> verbosity until you can see Alertmanager receiving the alert and trying to
>> send the notification. At sufficient verbosity, you should be able to trace
>> exactly what it is trying and/or failing to do.
>>
>> /MR
>>
>> On Tue, Apr 7, 2020 at 8:52 AM Danny de Waard <[email protected]> wrote:
>>
>>> I'm having some trouble setting up Alertmanager.
>>>
>>> I have set up a rules file in Prometheus (see below) and a settings file
>>> for Alertmanager (also below).
>>> In Alertmanager I see the active alert for the Java swap usage:
>>>
>>> instance="lsrv0008"
>>> 1 alert, 06:49:37, 2020-04-07 (UTC)
>>> Source:
>>> <http://lsrv2289.linux.rabobank.nl:9090/graph?g0.expr=swapusage_stats%7Bapplication%3D%22java%22%7D+%3E+500000&g0.tab=1>
>>> Silence:
>>> <http://lsrv2289.linux.rabobank.nl:9093/#/silences/new?filter=%7Balertname%3D%22swap_usage_java_high%22%2C%20application%3D%22java%22%2C%20exportertype%3D%22node_exporter%22%2C%20host%3D%22lsrv0008%22%2C%20instance%3D%22lsrv0008%22%2C%20job%3D%22PROD%22%2C%20monitor%3D%22codelab-monitor%22%2C%20quantity%3D%22kB%22%2C%20severity%3D%22warning%22%7D>
>>> alertname="swap_usage_java_high"
>>> application="java"
>>> exportertype="node_exporter"
>>> host="lsrv0008"
>>> job="PROD"
>>> monitor="codelab-monitor"
>>> quantity="kB"
>>> severity="warning"
>>>
>>> But the mail is not sent by Alertmanager…. what am I missing?
>>>
>>> Prometheus rules file
>>> groups:
>>> - name: targets
>>>   rules:
>>>   - alert: monitor_service_down
>>>     expr: up == 0
>>>     for: 40s
>>>     labels:
>>>       severity: critical
>>>     annotations:
>>>       summary: "Monitor service non-operational"
>>>       description: "Service {{ $labels.instance }} is down."
>>>   - alert: server_down
>>>     expr: probe_success == 0
>>>     for: 30s
>>>     labels:
>>>       severity: critical
>>>     annotations:
>>>       summary: "Server is down (no probes are up)"
>>>       description: "Server {{ $labels.instance }} is down."
>>>   - alert: loadbalancer_down
>>>     expr: loadbalancer_stats < 1
>>>     for: 30s
>>>     labels:
>>>       severity: critical
>>>     annotations:
>>>       summary: "A loadbalancer is down"
>>>       description: "Loadbalancer for {{ $labels.instance }} is down."
>>> - name: host
>>>   rules:
>>>   - alert: high_cpu_load1
>>>     expr: node_load1 > 8.0
>>>     for: 300s
>>>     labels:
>>>       severity: warning
>>>     annotations:
>>>       summary: "Server under high load (load 1m) for 5 minutes"
>>>       description: "Host is under high load, the avg load 1m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>>   - alert: high_cpu_load5
>>>     expr: node_load5 > 5.0
>>>     for: 600s
>>>     labels:
>>>       severity: warning
>>>     annotations:
>>>       summary: "Server under high load (load 5m) for 10 minutes."
>>>       description: "Host is under high load, the avg load 5m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>>   - alert: high_cpu_load15
>>>     expr: node_load15 > 4.5
>>>     for: 900s
>>>     labels:
>>>       severity: critical
>>>     annotations:
>>>       summary: "Server under high load (load 15m) for 15 minutes."
>>>       description: "Host is under high load, the avg load 15m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>>   - alert: high_volume_workers_prod
>>>     expr: sum(apache_workers{job="Apache PROD"}) by (instance) > 325
>>>     for: 30s
>>>     labels:
>>>       severity: warning
>>>     annotations:
>>>       summary: "Number of workers above 325 for 30s"
>>>       description: "The Apache workers are over 325 for 30s. Current value is {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>>   - alert: medium_volume_workers_prod
>>>     expr: sum(apache_workers{job="Apache PROD"}) by (instance) > 300
>>>     for: 30s
>>>     labels:
>>>       severity: warning
>>>     annotations:
>>>       summary: "Number of workers above 300 for 30s"
>>>       description: "The Apache workers are over 300 for 30s. Current value is {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>>   - alert: swap_usage_java_high
>>>     expr: swapusage_stats{application="java"} > 500000
>>>     for: 300s
>>>     labels:
>>>       severity: warning
>>>     annotations:
>>>       summary: "Swap usage for Java is high for the last 5 minutes"
>>>       description: "The swap usage for the java process is high. Current value is {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>>
>>>
>>>
>>> Alertmanager setup file
>>> global:
>>>   resolve_timeout: 5m
>>>   http_config: {}
>>>   smtp_from: [email protected]
>>>   smtp_hello: localhost
>>>   smtp_smarthost: localhost:25
>>>   smtp_require_tls: true
>>>   pagerduty_url: https://events.pagerduty.com/v2/enqueue
>>>   hipchat_api_url: https://api.hipchat.com/
>>>   opsgenie_api_url: https://api.opsgenie.com/
>>>   wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
>>>   victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
>>> route:
>>>   receiver: default
>>>   group_by:
>>>   - instance
>>>   routes:
>>>   - receiver: mail
>>>     match:
>>>       severity: warning
>>>   - receiver: all
>>>     match:
>>>       severity: critical
>>>   group_wait: 1s
>>>   group_interval: 1s
>>> receivers:
>>> - name: default
>>> - name: mail
>>>   email_configs:
>>>   - send_resolved: true
>>>     to: [email protected]
>>>     from: [email protected]
>>>     hello: localhost
>>>     smarthost: localhost:25
>>>     headers:
>>>       From: [email protected]
>>>       Subject: '{{ template "email.default.subject" . }}'
>>>       To: [email protected]
>>>     html: '{{ template "email.default.html" . }}'
>>>     require_tls: false
>>> - name: all
>>>   email_configs:
>>>   - send_resolved: true
>>>     to: [email protected]
>>>     from: [email protected]
>>>     hello: localhost
>>>     smarthost: localhost:25
>>>     headers:
>>>       From: [email protected]
>>>       Subject: '{{ template "email.default.subject" . }}'
>>>       To: [email protected]
>>>     html: '{{ template "email.default.html" . }}'
>>>     require_tls: false
>>>   - send_resolved: true
>>>     to: [email protected]
>>>     from: [email protected]
>>>     hello: localhost
>>>     smarthost: localhost:25
>>>     headers:
>>>       From: [email protected]
>>>       Subject: '{{ template "email.default.subject" . }}'
>>>       To: [email protected]
>>>     html: '{{ template "email.default.html" . }}'
>>>     require_tls: false
>>> - name: webhook
>>>   webhook_configs:
>>>   - send_resolved: true
>>>     http_config: {}
>>>     url: http://127.0.0.1:9000
>>> templates: []
>>>
>>>
>>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/6ae61aac-db17-4a5f-adee-ece8a0733af5%40googlegroups.com.