We are running blackbox exporter at a remote location to monitor gateway 
routers, hypervisors and virtual machines (router -> hypervisor -> virtual 
machines). We would like alerts to be suppressed along that dependency 
chain, as in the examples below.


*Example 1:*

If a gateway router is down and its alert is firing, Alertmanager should 
stop alerting on the hypervisor hosts and servers behind it.

*Example 2:*

If a hypervisor is down, Alertmanager should not alert on the virtual 
machines running on it.


In Prometheus, we put the routers in one scrape job, the hypervisors in 
another, and the virtual machines in a third.

*Example*

job_name: 'blackbox_icmp-routers'
job_name: 'blackbox_icmp-hypervisors'
job_name: 'blackbox_icmp-virtualmachines'
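
For context, each job looks roughly like the sketch below. This is a minimal illustration rather than our exact config: it assumes the blackbox exporter listens on `localhost:9115` and has an `icmp` module defined, the target address is a placeholder, and the `site` label is hypothetical (we do not set such a label today, but something like it would be needed to tie a router to the hosts behind it).

```yaml
scrape_configs:
  - job_name: 'blackbox_icmp-routers'
    metrics_path: /probe
    params:
      module: [icmp]                  # assumes an "icmp" module in blackbox.yml
    static_configs:
      - targets: ['192.0.2.1']        # placeholder router address
        labels:
          site: 'remote-1'            # hypothetical label shared by everything behind this router
    relabel_configs:
      # Pass the target to the exporter as ?target=... and keep it as the instance label.
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the blackbox exporter itself instead of the target.
      - target_label: __address__
        replacement: 'localhost:9115' # assumed blackbox exporter address
```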


Prometheus alerting rules are defined per job:

- name: RouterDown
  rules:
  - alert: R-InstanceDown
    expr: probe_success{job="blackbox_icmp-routers"} == 0
    for: 1m

- name: HypervisorDown
  rules:
  - alert: H-InstanceDown
    expr: probe_success{job="blackbox_icmp-hypervisors"} == 0
    for: 1m

- name: VirtualMachinesDown
  rules:
  - alert: V-InstanceDown
    expr: probe_success{job="blackbox_icmp-virtualmachines"} == 0
    for: 1m


Alertmanager config is below:

route:
  group_by: ['alertname']
  receiver: ms-teams
  repeat_interval: 5m

receivers:
- name: ms-teams
  webhook_configs:
    - url: 'http://monitoring:2000/alertmanager'
      send_resolved: false

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
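
What we have in mind is roughly the following. It is only an untested sketch, and it rests on two assumptions that are not part of our current setup: every target at a location carries a hypothetical `site` label, and every virtual machine target, as well as the hypervisor itself, carries a hypothetical `hypervisor` label naming the host. Since an inhibit rule only applies when the labels listed in `equal` have the same value on both the source and the target alert, those labels would have to be present on both sides.

```yaml
inhibit_rules:
  # Router down: mute hypervisor and VM alerts at the same site.
  - source_match:
      alertname: 'R-InstanceDown'
    target_match_re:
      alertname: 'H-InstanceDown|V-InstanceDown'
    equal: ['site']          # hypothetical label, must be set on routers, hypervisors and VMs

  # Hypervisor down: mute alerts for the VMs hosted on it.
  - source_match:
      alertname: 'H-InstanceDown'
    target_match:
      alertname: 'V-InstanceDown'
    equal: ['hypervisor']    # hypothetical label, must be set on the hypervisor and its VMs
```

Matching on `alertname` rather than `severity` is a deliberate choice in this sketch, because our rules above do not set a `severity` label at all.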


Any help is much appreciated.


Thanks

Sandosh
