Okay.
I did some digging on the internet, changed my YAML file, and it works now.
As far as I can see ;)
global:
route:
  group_by: [instance, severity]
  receiver: 'default'
  routes:
  - match:
      severity: warning
    receiver: 'mail'
  - match:
      severity: critical
    receiver: 'all'
receivers:
- name: 'default'
  email_configs:
  - to: '[email protected]'   ## fill in your email
    from: '[email protected]'
    smarthost: 'localhost:25'
    require_tls: false
- name: 'mail'
  email_configs:
  - to: '[email protected]'   ## fill in your email
    from: '[email protected]'
    smarthost: 'localhost:25'
    require_tls: false
- name: 'all'
  email_configs:
  - to: '[email protected]'   ## fill in your email
    from: '[email protected]'
    smarthost: 'localhost:25'
    require_tls: false
- name: 'webhook'
  webhook_configs:
  - url: 'http://127.0.0.1:9000'
Now there are some things left that I need to figure out, like sending to
multiple email addresses (or receivers) and using the webhook correctly
(for instance, if a node is down, then call the webhook with parameters).
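A sketch of what I think those two things could look like, based on the docs. Untested: the extra addresses are placeholders, monitor_service_down is the node-down alert from my rules file, and 127.0.0.1:9000 is the webhook listener already in my config.

```yaml
# Untested sketch, not the working config above.
route:
  group_by: [instance, severity]
  receiver: 'default'   ## 'default' receiver as in the config above
  routes:
  # If a node is down, send the alert to the webhook receiver.
  - match:
      alertname: monitor_service_down
    receiver: 'webhook'
  - match:
      severity: warning
    receiver: 'mail'
receivers:
- name: 'mail'
  email_configs:
  # 'to' accepts a comma-separated address list...
  - to: '[email protected], [email protected]'   ## placeholder addresses
    from: '[email protected]'
    smarthost: 'localhost:25'
    require_tls: false
  # ...or add a second email_configs entry to send a separate mail per recipient.
  - to: '[email protected]'   ## placeholder address
    from: '[email protected]'
    smarthost: 'localhost:25'
    require_tls: false
- name: 'webhook'
  webhook_configs:
  # Alertmanager POSTs a JSON body (status, labels, annotations) to this URL,
  # so whatever listens on :9000 gets the alert's parameters automatically.
  - url: 'http://127.0.0.1:9000'
    send_resolved: true
```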
On Wednesday, April 8, 2020 at 07:11:12 UTC+2, Danny de Waard wrote:
> Okay, I think I got some logs. I'm just not sure what they mean....
>
> level=debug ts=2020-04-08T05:08:37.628Z caller=dispatch.go:104 component=dispatcher msg="Received alert" alert=swap_usage_java_high[d346adb][active]
> level=debug ts=2020-04-08T05:08:37.628Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
> level=debug ts=2020-04-08T05:08:38.630Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
> level=debug ts=2020-04-08T05:08:39.630Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
> level=debug ts=2020-04-08T05:08:40.630Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
> and this last line keeps coming.
>
> On Tuesday, April 7, 2020 at 18:00:59 UTC+2, Matthias Rampke wrote:
>>
>> What do the alertmanager logs say? If you don't see anything, increase
>> verbosity until you can see Alertmanager receiving the alert and trying to
>> send the notification. At sufficient verbosity, you should be able to trace
>> exactly what it is trying and/or failing to do.
>>
>> /MR
>>
>> On Tue, Apr 7, 2020 at 8:52 AM Danny de Waard <[email protected]> wrote:
>>
>>> I'm having some trouble setting up Alertmanager.
>>>
>>> I have set up a rules file in Prometheus (see below) and a settings file
>>> for Alertmanager (also below).
>>> In Alertmanager I see the active alert for the Java swap usage:
>>>
>>> instance="lsrv0008"
>>> 1 alert, 06:49:37, 2020-04-07 (UTC)
>>> Source:
>>> <http://lsrv2289.linux.rabobank.nl:9090/graph?g0.expr=swapusage_stats%7Bapplication%3D%22java%22%7D+%3E+500000&g0.tab=1>
>>> Silence:
>>> <http://lsrv2289.linux.rabobank.nl:9093/#/silences/new?filter=%7Balertname%3D%22swap_usage_java_high%22%2C%20application%3D%22java%22%2C%20exportertype%3D%22node_exporter%22%2C%20host%3D%22lsrv0008%22%2C%20instance%3D%22lsrv0008%22%2C%20job%3D%22PROD%22%2C%20monitor%3D%22codelab-monitor%22%2C%20quantity%3D%22kB%22%2C%20severity%3D%22warning%22%7D>
>>> alertname="swap_usage_java_high"
>>> application="java"
>>> exportertype="node_exporter"
>>> host="lsrv0008"
>>> job="PROD"
>>> monitor="codelab-monitor"
>>> quantity="kB"
>>> severity="warning"
>>>
>>> But the mail is not sent by Alertmanager…. what am I missing?
>>>
>>> Prometheus rules file
>>> groups:
>>> - name: targets
>>>   rules:
>>>   - alert: monitor_service_down
>>>     expr: up == 0
>>>     for: 40s
>>>     labels:
>>>       severity: critical
>>>     annotations:
>>>       summary: "Monitor service non-operational"
>>>       description: "Service {{ $labels.instance }} is down."
>>>   - alert: server_down
>>>     expr: probe_success == 0
>>>     for: 30s
>>>     labels:
>>>       severity: critical
>>>     annotations:
>>>       summary: "Server is down (no probes are up)"
>>>       description: "Server {{ $labels.instance }} is down."
>>>   - alert: loadbalancer_down
>>>     expr: loadbalancer_stats < 1
>>>     for: 30s
>>>     labels:
>>>       severity: critical
>>>     annotations:
>>>       summary: "A loadbalancer is down"
>>>       description: "Loadbalancer for {{ $labels.instance }} is down."
>>> - name: host
>>>   rules:
>>>   - alert: high_cpu_load1
>>>     expr: node_load1 > 8.0
>>>     for: 300s
>>>     labels:
>>>       severity: warning
>>>     annotations:
>>>       summary: "Server under high load (load 1m) for 5 minutes"
>>>       description: "Host is under high load, the avg load 1m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>>   - alert: high_cpu_load5
>>>     expr: node_load5 > 5.0
>>>     for: 600s
>>>     labels:
>>>       severity: warning
>>>     annotations:
>>>       summary: "Server under high load (load 5m) for 10 minutes."
>>>       description: "Host is under high load, the avg load 5m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>>   - alert: high_cpu_load15
>>>     expr: node_load15 > 4.5
>>>     for: 900s
>>>     labels:
>>>       severity: critical
>>>     annotations:
>>>       summary: "Server under high load (load 15m) for 15 minutes."
>>>       description: "Host is under high load, the avg load 15m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>>   - alert: high_volume_workers_prod
>>>     expr: sum(apache_workers{job="Apache PROD"}) by (instance) > 325
>>>     for: 30s
>>>     labels:
>>>       severity: warning
>>>     annotations:
>>>       summary: "Number of workers above 325 for 30s"
>>>       description: "The Apache workers are over 325 for 30s. Current value is {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>>   - alert: medium_volume_workers_prod
>>>     expr: sum(apache_workers{job="Apache PROD"}) by (instance) > 300
>>>     for: 30s
>>>     labels:
>>>       severity: warning
>>>     annotations:
>>>       summary: "Number of workers above 300 for 30s"
>>>       description: "The Apache workers are over 300 for 30s. Current value is {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>>   - alert: swap_usage_java_high
>>>     expr: swapusage_stats{application="java"} > 500000
>>>     for: 300s
>>>     labels:
>>>       severity: warning
>>>     annotations:
>>>       summary: "Swap usage for Java is high for the last 5 minutes"
>>>       description: "The swap usage for the java process is high. Current value is {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>>
>>>
>>>
>>> Alertmanager setup file
>>> global:
>>>   resolve_timeout: 5m
>>>   http_config: {}
>>>   smtp_from: [email protected]
>>>   smtp_hello: localhost
>>>   smtp_smarthost: localhost:25
>>>   smtp_require_tls: true
>>>   pagerduty_url: https://events.pagerduty.com/v2/enqueue
>>>   hipchat_api_url: https://api.hipchat.com/
>>>   opsgenie_api_url: https://api.opsgenie.com/
>>>   wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
>>>   victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
>>> route:
>>>   receiver: default
>>>   group_by:
>>>   - instance
>>>   routes:
>>>   - receiver: mail
>>>     match:
>>>       severity: warning
>>>   - receiver: all
>>>     match:
>>>       severity: critical
>>>   group_wait: 1s
>>>   group_interval: 1s
>>> receivers:
>>> - name: default
>>> - name: mail
>>>   email_configs:
>>>   - send_resolved: true
>>>     to: [email protected]
>>>     from: [email protected]
>>>     hello: localhost
>>>     smarthost: localhost:25
>>>     headers:
>>>       From: [email protected]
>>>       Subject: '{{ template "email.default.subject" . }}'
>>>       To: [email protected]
>>>     html: '{{ template "email.default.html" . }}'
>>>     require_tls: false
>>> - name: all
>>>   email_configs:
>>>   - send_resolved: true
>>>     to: [email protected]
>>>     from: [email protected]
>>>     hello: localhost
>>>     smarthost: localhost:25
>>>     headers:
>>>       From: [email protected]
>>>       Subject: '{{ template "email.default.subject" . }}'
>>>       To: [email protected]
>>>     html: '{{ template "email.default.html" . }}'
>>>     require_tls: false
>>>   - send_resolved: true
>>>     to: [email protected]
>>>     from: [email protected]
>>>     hello: localhost
>>>     smarthost: localhost:25
>>>     headers:
>>>       From: [email protected]
>>>       Subject: '{{ template "email.default.subject" . }}'
>>>       To: [email protected]
>>>     html: '{{ template "email.default.html" . }}'
>>>     require_tls: false
>>> - name: webhook
>>>   webhook_configs:
>>>   - send_resolved: true
>>>     http_config: {}
>>>     url: http://127.0.0.1:9000
>>> templates: []
>>>
>>>
>>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/6ae61aac-db17-4a5f-adee-ece8a0733af5%40googlegroups.com.