Hi,
I am new to Prometheus and looking for some guidance on how to get my
Prometheus rules to work for the requirements below.
In my environment, all Linux servers are connected to the Router-1 group and
all Windows servers are connected to the Router-2 group. I have configured
the Prometheus rules based on the following requirements.
1. When there is a complete outage at a site, the alert should report only
the site numbers where all targets are down. For this I configured the rule
"*PowerOutageAlert*", and it works as expected.
2. When the Linux servers at a site are down, the alert should show at which
site the Linux servers are down. For this I configured the rule
"*LinuxGroup*", and it also works as expected.
3. When the Windows servers at a site are down, the alert should show at
which site the Windows servers are down. For this I configured the rule
"*WindowsGroup*", and it also works as expected.
*prometheus_rules.yml:*
groups:
  - name: PowerOutageAlert
    rules:
      # Fires when every probed target at a site (servers and routers) is down.
      - alert: PowerOutageAlert
        expr: |
          sum(probe_success{job="blackbox_linux"} or
              probe_success{job="blackbox_windows"} or
              probe_success{job="blackbox_router-1"} or
              probe_success{job="blackbox_router-2"}) by (Site) == 0
        for: 1m
  - name: LinuxGroup
    rules:
      # Fires when the Linux servers and Router-1 at a site are all down.
      - alert: Linux Servers Down
        expr: |
          sum(probe_success{job="blackbox_linux"} or
              probe_success{job="blackbox_router-1"}) by (Site) == 0
        for: 1m
  - name: WindowsGroup
    rules:
      # Fires when the Windows servers and Router-2 at a site are all down.
      - alert: Windows Servers Down
        expr: |
          sum(probe_success{job="blackbox_windows"} or
              probe_success{job="blackbox_router-2"}) by (Site) == 0
        for: 1m
*Alertmanager.yml:*
route:
  group_by: ['alertname']
  receiver: ms-teams
  group_wait: 1m
  group_interval: 1m
  repeat_interval: 1m
receivers:
  - name: ms-teams
    webhook_configs:
      - url: 'http://xx.xx.xx.xx:2000/alertmanager'
        send_resolved: false
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['Site','instance']
The issues I am facing now are:
1. When there is a complete outage at a site, I get 3 alerts
(*PowerOutageAlert*/*LinuxGroup*/*WindowsGroup*) for the same targets with
the above configuration. Is there a way to exclude the targets already
matched by "*PowerOutageAlert*" from the "*LinuxGroup*/*WindowsGroup*"
alerts?
2. With the above setup, "*LinuxGroup*/*WindowsGroup*" only alert when both
members of a pair are down, i.e. "blackbox_router-1" together with
"blackbox_linux", or "blackbox_router-2" together with "blackbox_windows".
They do not alert if only the Linux/Windows servers are down. How can I get
these alerts even when the routers are up?
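A possible direction for both points, as an untested sketch rather than a confirmed fix: drop the router job from the per-OS expressions so that they fire whenever all Linux (or Windows) probes at a site fail, and use PromQL's `unless` operator to exclude sites where the full-outage condition already holds. The group name `PerOsAlerts` is a made-up placeholder; the job names, the `Site` label and the alert names are taken from the configuration above.

```yaml
groups:
  - name: PerOsAlerts   # hypothetical group name, not from the original config
    rules:
      - alert: Linux Servers Down
        expr: |
          # All Linux probes at the site fail (router state is ignored) ...
          (sum(probe_success{job="blackbox_linux"}) by (Site) == 0)
          unless
          # ... except where every target at the site is down (full outage).
          (sum(probe_success{job=~"blackbox_(linux|windows|router-1|router-2)"}) by (Site) == 0)
        for: 1m
      - alert: Windows Servers Down
        expr: |
          (sum(probe_success{job="blackbox_windows"}) by (Site) == 0)
          unless
          (sum(probe_success{job=~"blackbox_(linux|windows|router-1|router-2)"}) by (Site) == 0)
        for: 1m
```

Both sides of `unless` are aggregated by `Site`, so they match on that label alone. If a per-server alert is wanted instead, the left-hand side could be `probe_success{job="blackbox_linux"} == 0`, but the operator would then need to be written as `unless on (Site)` because the two sides no longer share the same label set.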
In a shell script I could achieve this with "if else" conditions, but I am
not sure how to express the same logic in Prometheus. Any help is much
appreciated.
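Prometheus and Alertmanager configuration is declarative, so there is no direct if/else; for suppressing notifications, the nearest equivalent is probably an Alertmanager inhibit rule. The sketch below is untested and assumes the alert names from the rules file above and that all three alerts carry the same `Site` label value. It would mute the Linux and Windows alerts for a site while `PowerOutageAlert` is firing for that site.

```yaml
inhibit_rules:
  # While PowerOutageAlert fires for a site (the source alert) ...
  - source_match:
      alertname: 'PowerOutageAlert'
    # ... suppress the per-OS alerts (the target alerts) ...
    target_match_re:
      alertname: 'Linux Servers Down|Windows Servers Down'
    # ... but only when source and target have the same Site value.
    equal: ['Site']
```

The existing inhibit rule keyed on `severity` would not match as posted, since none of the alerting rules shown attach a `severity` label, unless such a label is added elsewhere.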
Thanks
Sandosh