[prometheus-users] Re: Query on Inhibit rules

Brian Candler Wed, 24 Aug 2022 08:48:59 -0700

You'll need to set some common labels - and if they are target labels, make 
sure they propagate through to the alert (i.e. don't write your alerting 
'expr' in such a way that it aggregates these labels away).
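
As a rough sketch of how such a target label could be attached, assuming 
static targets and the usual blackbox exporter relabelling: the target 
address, the 'icmp' module name, the exporter address 127.0.0.1:9115 and 
the label value are placeholders, not taken from your setup.

```yaml
scrape_configs:
  - job_name: 'blackbox_icmp-routers'
    metrics_path: /probe
    params:
      module: [icmp]            # assumed module name in blackbox.yml
    static_configs:
      - targets: ['192.0.2.1']  # placeholder gateway address
        labels:
          site: site123         # shared by every target at this site
    relabel_configs:
      # Pass the target address to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed address as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the blackbox exporter itself (placeholder address)
      - target_label: __address__
        replacement: 127.0.0.1:9115
```

The hypervisor and VM jobs would carry the same 'site' label on their 
targets.  An expr such as probe_success{job="blackbox_icmp-routers"} == 0 
keeps that label on the alert, whereas wrapping it in something like 
min by (job) (...) would drop it.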


For example: your gateway and all the servers in a particular site can have 
{site="site123"}.  Then you can write an inhibit rule to suppress alerts 
for 'device down' (target alert) if there's an active alert for 'gateway 
down' (source alert) and the 'site' label is the same (equal).  You may 
need additional labels to identify "device down" versus "gateway down" 
alerts, or to distinguish the gateway from a non-gateway device.
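
A minimal sketch of such a rule, assuming the alert names from your rules 
below and a 'site' label present on all three kinds of target.  The 
'matchers' syntax needs Alertmanager 0.22 or later; older versions would 
use source_match and target_match_re instead.

```yaml
inhibit_rules:
  # While a gateway at a site is down, mute hypervisor and VM alerts
  # carrying the same 'site' label value.
  - source_matchers:
      - alertname = "R-InstanceDown"
    target_matchers:
      - alertname =~ "H-InstanceDown|V-InstanceDown"
    equal: ['site']
```

Note that a label listed in 'equal' which is missing from both the source 
and the target alert counts as equal, so the rule only separates sites 
properly if every alert really carries the 'site' label.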

Similarly, your VMs and your hypervisor can have some shared label like 
{cluster="vm123"}.  Then you can suppress alerts for 'VM down' if there's 
an alert for 'hypervisor down' with an equal 'cluster' label.
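
Again only a sketch, assuming the alert names from your rules below and a 
'cluster' label set on each hypervisor and on the VMs it hosts.  It would 
go in the Alertmanager config as one more entry in the inhibit_rules list.

```yaml
inhibit_rules:
  # While a hypervisor is down, mute alerts for VMs in the same cluster.
  - source_matchers:
      - alertname = "H-InstanceDown"
    target_matchers:
      - alertname = "V-InstanceDown"
    equal: ['cluster']
```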

For more info:
https://prometheus.io/docs/alerting/latest/configuration/#inhibit_rule

On Wednesday, 24 August 2022 at 14:04:46 UTC+1 [email protected] wrote:

> We are using blackbox exporter on a remote location to monitor gateway 
> routers, hypervisors and virtual machines (router —> hypervisor —> virtual 
> machines). We are looking for something like below.
>
>
> *Example 1:*
>
> If a gateway router is down and alertmanager is firing, it should stop 
> alerting on hypervisor hosts and servers
>
> *Example 2:*
>
> If a hypervisor is down, it should not alert on the virtual machines
>
>
> On Prometheus, we group routers in one group, hypervisors in another group 
> and virtual machines in a third group.
>
> *Example*
>
> job_name: 'blackbox_icmp-routers'
>
> job_name: 'blackbox_icmp-hypervisors'
>
> job_name: 'blackbox_icmp-virtualmachines'
>
>
> Alertmanager rules are defined based on each job
>
> - name: RouterDown
>
>   rules:
>
>   - alert: R-InstanceDown
>
>     expr: probe_success{job="blackbox_icmp-routers"} == 0
>
>     for: 1m
>
>
> - name: HypervisorDown
>
>   rules:
>
>   - alert: H-InstanceDown
>
>     expr: probe_success{job="blackbox_icmp-hypervisors"} == 0
>
>     for: 1m
>
>
> - name: VirtualMachinesDown
>
>   rules:
>
>   - alert: V-InstanceDown
>
>     expr: probe_success{job="blackbox_icmp-virtualmachines"} == 0
>
>     for: 1m
>
>
> Alertmanager config is below:
>
> route:
>
>   group_by: ['alertname']
>
>   receiver: ms-teams
>
>   repeat_interval: 5m
>
> receivers:
>
> - name: ms-teams
>
>   webhook_configs:
>
>     - url: 'http://monitoring:2000/alertmanager'
>
>       send_resolved: false
>
>
> inhibit_rules:
>
>   - source_match:
>
>       severity: 'critical'
>
>     target_match:
>
>       severity: 'warning'
>
>     equal: ['alertname', 'dev', 'instance']
>
>
> Any help is much appreciated.
>
>
> Thanks
>
> Sandosh
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/042b43ea-c753-4c4e-aff1-6a7474e417e4n%40googlegroups.com.
