Okay, I think I got some logs. I'm just not sure what they mean...
level=debug ts=2020-04-08T05:08:37.628Z caller=dispatch.go:104 component=dispatcher msg="Received alert" alert=swap_usage_java_high[d346adb][active]
level=debug ts=2020-04-08T05:08:37.628Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
level=debug ts=2020-04-08T05:08:38.630Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
level=debug ts=2020-04-08T05:08:39.630Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
level=debug ts=2020-04-08T05:08:40.630Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
and that last "flushing" line keeps coming, once per second.
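One way to confirm that the alert really is sitting inside Alertmanager (a sketch assuming Alertmanager listens on its default address, localhost:9093; adjust host and port as needed) is to query the v2 HTTP API directly:

```shell
# List the alerts Alertmanager currently holds, filtered to the one in question.
# Requires curl; the filter expression matches on the alertname label.
curl -s 'http://localhost:9093/api/v2/alerts?filter=alertname="swap_usage_java_high"'
```

If the alert shows up here but no mail arrives, the problem is in the notification stage (routing or SMTP), not in Prometheus delivering the alert.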
On Tuesday, April 7, 2020 at 18:00:59 UTC+2, Matthias Rampke wrote:
>
> What do the alertmanager logs say? If you don't see anything, increase
> verbosity until you can see Alertmanager receiving the alert and trying to
> send the notification. At sufficient verbosity, you should be able to trace
> exactly what it is trying and/or failing to do.
>
> /MR
>
> On Tue, Apr 7, 2020 at 8:52 AM Danny de Waard <[email protected]> wrote:
>
>> I'm having some trouble setting up the alertmanager.
>>
>> I have set up a rules file in Prometheus (see below) and a config file
>> for Alertmanager (also below).
>> In Alertmanager I see the active alert for swap usage java:
>>
>> instance="lsrv0008"
>> 1 alert, active since 06:49:37, 2020-04-07 (UTC)
>>
>> alertname="swap_usage_java_high"
>> application="java"
>> exportertype="node_exporter"
>> host="lsrv0008"
>> job="PROD"
>> monitor="codelab-monitor"
>> quantity="kB"
>> severity="warning"
>>
>> But the mail is not sent by Alertmanager… What am I missing?
>>
>> Prometheus rules file:
>> groups:
>>   - name: targets
>>     rules:
>>       - alert: monitor_service_down
>>         expr: up == 0
>>         for: 40s
>>         labels:
>>           severity: critical
>>         annotations:
>>           summary: "Monitor service non-operational"
>>           description: "Service {{ $labels.instance }} is down."
>>       - alert: server_down
>>         expr: probe_success == 0
>>         for: 30s
>>         labels:
>>           severity: critical
>>         annotations:
>>           summary: "Server is down (no probes are up)"
>>           description: "Server {{ $labels.instance }} is down."
>>       - alert: loadbalancer_down
>>         expr: loadbalancer_stats < 1
>>         for: 30s
>>         labels:
>>           severity: critical
>>         annotations:
>>           summary: "A loadbalancer is down"
>>           description: "Loadbalancer for {{ $labels.instance }} is down."
>>   - name: host
>>     rules:
>>       - alert: high_cpu_load1
>>         expr: node_load1 > 8.0
>>         for: 300s
>>         labels:
>>           severity: warning
>>         annotations:
>>           summary: "Server under high load (load 1m) for 5 minutes"
>>           description: "Host is under high load, the avg load 1m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>       - alert: high_cpu_load5
>>         expr: node_load5 > 5.0
>>         for: 600s
>>         labels:
>>           severity: warning
>>         annotations:
>>           summary: "Server under high load (load 5m) for 10 minutes."
>>           description: "Host is under high load, the avg load 5m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>       - alert: high_cpu_load15
>>         expr: node_load15 > 4.5
>>         for: 900s
>>         labels:
>>           severity: critical
>>         annotations:
>>           summary: "Server under high load (load 15m) for 15 minutes."
>>           description: "Host is under high load, the avg load 15m is at {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>       - alert: high_volume_workers_prod
>>         expr: sum(apache_workers{job="Apache PROD"}) by (instance) > 325
>>         for: 30s
>>         labels:
>>           severity: warning
>>         annotations:
>>           summary: "Number of workers above 325 for 30s"
>>           description: "The Apache workers are over 325 for 30s. Current value is {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>       - alert: medium_volume_workers_prod
>>         expr: sum(apache_workers{job="Apache PROD"}) by (instance) > 300
>>         for: 30s
>>         labels:
>>           severity: warning
>>         annotations:
>>           summary: "Number of workers above 300 for 30s"
>>           description: "The Apache workers are over 300 for 30s. Current value is {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>       - alert: swap_usage_java_high
>>         expr: swapusage_stats{application="java"} > 500000
>>         for: 300s
>>         labels:
>>           severity: warning
>>         annotations:
>>           summary: "Swap usage for Java is high for the last 5 minutes"
>>           description: "The swap usage for the java process is high. Current value is {{ $value }}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
>>
>>
>>
>> Alertmanager config file:
>> global:
>>   resolve_timeout: 5m
>>   http_config: {}
>>   smtp_from: [email protected]
>>   smtp_hello: localhost
>>   smtp_smarthost: localhost:25
>>   smtp_require_tls: true
>>   pagerduty_url: https://events.pagerduty.com/v2/enqueue
>>   hipchat_api_url: https://api.hipchat.com/
>>   opsgenie_api_url: https://api.opsgenie.com/
>>   wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
>>   victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
>> route:
>>   receiver: default
>>   group_by:
>>     - instance
>>   routes:
>>     - receiver: mail
>>       match:
>>         severity: warning
>>     - receiver: all
>>       match:
>>         severity: critical
>>   group_wait: 1s
>>   group_interval: 1s
>> receivers:
>>   - name: default
>>   - name: mail
>>     email_configs:
>>       - send_resolved: true
>>         to: [email protected]
>>         from: [email protected]
>>         hello: localhost
>>         smarthost: localhost:25
>>         headers:
>>           From: [email protected]
>>           Subject: '{{ template "email.default.subject" . }}'
>>           To: [email protected]
>>         html: '{{ template "email.default.html" . }}'
>>         require_tls: false
>>   - name: all
>>     email_configs:
>>       - send_resolved: true
>>         to: [email protected]
>>         from: [email protected]
>>         hello: localhost
>>         smarthost: localhost:25
>>         headers:
>>           From: [email protected]
>>           Subject: '{{ template "email.default.subject" . }}'
>>           To: [email protected]
>>         html: '{{ template "email.default.html" . }}'
>>         require_tls: false
>>       - send_resolved: true
>>         to: [email protected]
>>         from: [email protected]
>>         hello: localhost
>>         smarthost: localhost:25
>>         headers:
>>           From: [email protected]
>>           Subject: '{{ template "email.default.subject" . }}'
>>           To: [email protected]
>>         html: '{{ template "email.default.html" . }}'
>>         require_tls: false
>>   - name: webhook
>>     webhook_configs:
>>       - send_resolved: true
>>         http_config: {}
>>         url: http://127.0.0.1:9000
>> templates: []
>>
>>
>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/890fe817-5cdb-44ef-8446-70b9a0e93e76%40googlegroups.com.