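We are running the blackbox exporter at a remote location to monitor gateway
routers, hypervisors and virtual machines. The dependency chain is
router -> hypervisor -> virtual machines. We would like alerts for dependent
hosts to be suppressed while the host they depend on is down, as in the
examples below.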
*Example 1:*
If a gateway router is down and its alert is firing, Alertmanager should stop
alerting on the hypervisor hosts and servers behind it.
*Example 2:*
If a hypervisor is down, Alertmanager should not alert on the virtual machines
running on it.
In Prometheus, we group the routers into one job, the hypervisors into
another, and the virtual machines into a third.
*Example*
job_name: 'blackbox_icmp-routers'
job_name: 'blackbox_icmp-hypervisors'
job_name: 'blackbox_icmp-virtualmachines'
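For context, one of these jobs could look roughly like the sketch below. It follows the usual blackbox exporter relabeling pattern. The target name, the exporter address, the `icmp` module name and the `site` and `hypervisor` labels are all placeholders for illustration, not our real values. The two extra labels are hypothetical and only show one way the router -> hypervisor -> virtual machine relationship might be recorded on each target.

```yaml
scrape_configs:
  - job_name: 'blackbox_icmp-virtualmachines'
    metrics_path: /probe
    params:
      module: [icmp]                      # assumes a module named "icmp" in blackbox.yml
    static_configs:
      - targets: ['vm1.example.org']      # hypothetical virtual machine
        labels:
          site: 'remote-1'                # hypothetical: the site/router this VM sits behind
          hypervisor: 'hv1.example.org'   # hypothetical: the hypervisor hosting this VM
    relabel_configs:
      # Pass the target to the exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed host as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the blackbox exporter itself (hypothetical address).
      - target_label: __address__
        replacement: 'blackbox:9115'
```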
Alerting rules are defined in Prometheus for each job:
- name: RouterDown
  rules:
  - alert: R-InstanceDown
    expr: probe_success{job="blackbox_icmp-routers"} == 0
    for: 1m
- name: HypervisorDown
  rules:
  - alert: H-InstanceDown
    expr: probe_success{job="blackbox_icmp-hypervisors"} == 0
    for: 1m
- name: VirtualMachinesDown
  rules:
  - alert: V-InstanceDown
    expr: probe_success{job="blackbox_icmp-virtualmachines"} == 0
    for: 1m
Alertmanager config is below:
route:
  group_by: ['alertname']
  receiver: ms-teams
  repeat_interval: 5m
receivers:
- name: ms-teams
  webhook_configs:
  - url: 'http://monitoring:2000/alertmanager'
    send_resolved: false
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
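To make the two examples concrete, here is a rough sketch of the kind of inhibit rules we imagine. It is untested and rests on an assumption: every target would need to carry a shared label for `equal` to compare. The `site` and `hypervisor` labels below are hypothetical and do not exist in our setup today. For the second rule, the hypervisor's own alert would also need a `hypervisor` label set to its own name.

```yaml
inhibit_rules:
# Example 1: a firing router alert mutes hypervisor and VM alerts at the same site.
- source_match:
    alertname: 'R-InstanceDown'
  target_match_re:
    alertname: 'H-InstanceDown|V-InstanceDown'
  equal: ['site']         # hypothetical label shared by everything behind one router
# Example 2: a firing hypervisor alert mutes alerts for the VMs it hosts.
- source_match:
    alertname: 'H-InstanceDown'
  target_match:
    alertname: 'V-InstanceDown'
  equal: ['hypervisor']   # hypothetical label naming the host hypervisor
```

We are not sure whether this is the right approach, or whether matching on the alert name rather than on severity is advisable here.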
Any help is much appreciated.
Thanks
Sandosh