Hi Brian,
Thanks for your response. I created common labels for each category,
something like below, and I now see 3 groupings in Alertmanager.
Since our targets have unique names per cluster (for example: router111,
router112, hypervisor111, hypervisor112, instance111, instance112), is
there a way to group them based on their naming? For instance, all nodes
with 111 grouped together, all nodes with 112 grouped together, and so
on. Please let me know.
With the configuration below, we are seeing only Router Down alerts when
anything is added to the Router group, and it is suppressing even valid
alerts. Not sure what we are missing.
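In case a config fragment helps frame the question, this is roughly what we had in mind (a sketch only, assuming targets are named letters-then-digits; the `site` label name and the exporter address are our own inventions, not anything we run today):

```yaml
scrape_configs:
  - job_name: 'blackbox_icmp-routers'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['router111', 'router112']
    relabel_configs:
      # Standard blackbox-exporter indirection.
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 'blackbox-exporter:9115'  # hypothetical exporter address
      # Sketch: pull the trailing digits ("111", "112") out of the target
      # name into a shared "site" label; the label name is our assumption.
      - source_labels: [__param_target]
        regex: '.*?([0-9]+)'
        target_label: site
```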
*Rules:*

- name: RouterDown
  rules:
  - alert: R-InstanceDown
    expr: probe_success{job="blackbox_icmp-routers"} == 0
    for: 1m
    labels:
      Category: 'Site'
      Type: 'Router'
- name: HypervisorDown
  rules:
  - alert: H-InstanceDown
    expr: probe_success{job="blackbox_icmp-hypervisors"} == 0
    for: 1m
    labels:
      Category: 'Site'
      Type: 'Hypervisor'
- name: VirtualMachinesDown
  rules:
  - alert: V-InstanceDown
    expr: probe_success{job="blackbox_icmp-virtualmachines"} == 0
    for: 1m
    labels:
      Category: 'Site'
      Type: 'Instance'
*Alertmanager conf:*

route:
  group_by: ['Type']
  receiver: ms-teams
  repeat_interval: 5m
  routes:
  - match:
      alertname: "R-InstanceDown"
    receiver: ms-teams
    routes:
    - match:
        alertname: "H-InstanceDown"
      receiver: ms-teams
    - match:
        alertname: "V-InstanceDown"
      receiver: ms-teams

receivers:
- name: ms-teams
  webhook_configs:
  - url: 'http://monitoring:2000/alertmanager'
    send_resolved: false

inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
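For what it's worth, once a shared label exists on all three alert types, the inhibition we are after might look roughly like this (a sketch only; the `site` label is hypothetical and would still have to be added to the targets via relabelling):

```yaml
inhibit_rules:
  # Sketch: while a site's router is down, mute that site's hypervisor alerts.
  - source_match:
      Type: 'Router'
    target_match:
      Type: 'Hypervisor'
    equal: ['site']        # hypothetical label shared by both alerts
  # Sketch: while a hypervisor is down, mute the alerts for its VMs.
  - source_match:
      Type: 'Hypervisor'
    target_match:
      Type: 'Instance'
    equal: ['site']
```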
Thanks
Sandosh
On Wednesday, August 24, 2022 at 10:48:54 AM UTC-5 Brian Candler wrote:
> You'll need to set some common labels - and if they are target labels,
> make sure they propagate through to the alert (i.e. don't write your
> alerting 'expr' in such a way that it aggregates these labels away).
>
> For example: your gateway and all the servers in a particular site can
> have {site="site123"}. Then you can write an inhibit rule to suppress
> alerts for 'device down' (target alert) if there's an active alert for
> 'gateway down' (source alert) and the 'network' label is the same (equal).
> You may need additional labels to identify "device down" versus "gateway
> down" alerts, or to distinguish the gateway from a non-gateway device.
>
> Similarly, your VMs and your hypervisor can have some shared label like
> {cluster="vm123"}. Then you can suppress alerts for 'VM down' if there's
> an alert for 'hypervisor down' with an equal 'cluster' label.
>
> For more info:
> https://prometheus.io/docs/alerting/latest/configuration/#inhibit_rule
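[As a concrete illustration of the advice above — this fragment is not from Brian's message, and the target and label names are made up — the shared label could simply be attached statically:]

```yaml
static_configs:
  - targets: ['gateway123', 'server123a', 'server123b']
    labels:
      site: 'site123'   # shared by the gateway and every server behind it
```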
>
> On Wednesday, 24 August 2022 at 14:04:46 UTC+1 [email protected] wrote:
>
>> We are using the blackbox exporter at a remote location to monitor
>> gateway routers, hypervisors and virtual machines (router —> hypervisor
>> —> virtual machines). We are looking for something like the below.
>>
>>
>> *Example 1:*
>>
>> If a gateway router is down and alertmanager is firing, it should stop
>> alerting on hypervisor hosts and servers
>>
>> *Example2:*
>>
>> If a hypervisor is down, it should not alert on the virtual machines
>>
>>
>> In Prometheus, we put the routers in one group, the hypervisors in
>> another, and the virtual machines in a third.
>>
>> *Example*
>>
>> job_name: 'blackbox_icmp-routers'
>>
>> job_name: 'blackbox_icmp-hypervisors'
>>
>> job_name: 'blackbox_icmp-virtualmachines'
>>
>>
>> Alerting rules are defined per job:
>>
>> - name: RouterDown
>>   rules:
>>   - alert: R-InstanceDown
>>     expr: probe_success{job="blackbox_icmp-routers"} == 0
>>     for: 1m
>>
>> - name: HypervisorDown
>>   rules:
>>   - alert: H-InstanceDown
>>     expr: probe_success{job="blackbox_icmp-hypervisors"} == 0
>>     for: 1m
>>
>> - name: VirtualMachinesDown
>>   rules:
>>   - alert: V-InstanceDown
>>     expr: probe_success{job="blackbox_icmp-virtualmachines"} == 0
>>     for: 1m
>>
>>
>> The Alertmanager config is below:
>>
>> route:
>>   group_by: ['alertname']
>>   receiver: ms-teams
>>   repeat_interval: 5m
>>
>> receivers:
>> - name: ms-teams
>>   webhook_configs:
>>   - url: 'http://monitoring:2000/alertmanager'
>>     send_resolved: false
>>
>> inhibit_rules:
>> - source_match:
>>     severity: 'critical'
>>   target_match:
>>     severity: 'warning'
>>   equal: ['alertname', 'dev', 'instance']
>>
>>
>> Any help is much appreciated.
>>
>>
>> Thanks
>>
>> Sandosh
>>
>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/162242b2-b28a-4f05-8267-33a8670b2346n%40googlegroups.com.