It might be possible to simplify this a bit, if:

1. An active but "inhibited" alert is still able to inhibit another alert 
(I don't know if this is true, I have not tested it)
2. All devices have unique instance names
3. You know for sure that whenever a device fails at depth N, the failure 
*will* cascade and cause all its children and their descendants to alert.

In this case, a device's alert only needs to be inhibited by its immediate 
parent, which means you only need the *lowest two levels*, where level(N-1) 
is the parent and level(N) is the device itself, and (N) is the depth in 
the tree.

up{instance="coresw1",level1="coresw1"}
up{instance="host1",level1="coresw1",level2="host1"}
up{instance="host2",level1="coresw1",level2="host2"}
up{instance="l1sw1",level1="coresw1",level2="l1sw1"}
up{instance="host3",level2="l1sw1",level3="host3"}
up{instance="host4",level2="l1sw1",level3="host4"}
up{instance="l2sw1",level2="l1sw1",level3="l2sw1"}
up{instance="host5",level3="l2sw1",level4="host5"}
up{instance="l1sw2",level1="coresw1",level2="l1sw2"}
up{instance="host6",level2="l1sw2",level3="host6"}

inhibit_rules:
  - source_matchers:
      - level1=~'.+'
      - level2=''
    target_matchers:
      - level2=~'.+'
    equal: ['level1']
  - source_matchers:
      - level2=~'.+'
      - level3=''
    target_matchers:
      - level3=~'.+'
    equal: ['level2']
  - source_matchers:
      - level3=~'.+'
      - level4=''
    target_matchers:
      - level4=~'.+'
    equal: ['level3']
... etc
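
The "... etc" follows a regular pattern, so the rules can be generated to 
whatever depth the tree needs.  A rough Python sketch (the function name 
and the max_depth parameter are my own invention, nothing Alertmanager 
provides; it only builds the YAML text):

```python
def simple_inhibit_rules(max_depth):
    """Return inhibit_rules YAML text for the two-label scheme.

    Rule N lets an alert at depth N (levelN set, level(N+1) empty)
    inhibit alerts at depth N+1 that carry the same levelN value.
    """
    lines = ["inhibit_rules:"]
    for n in range(1, max_depth):
        lines += [
            "  - source_matchers:",
            f"      - level{n}=~'.+'",
            f"      - level{n + 1}=''",
            "    target_matchers:",
            f"      - level{n + 1}=~'.+'",
            f"    equal: ['level{n}']",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # A tree four levels deep needs three rules.
    print(simple_inhibit_rules(4))
```

With max_depth=4 this should reproduce the three rules written out above.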

Now consider what happens if coresw1 fails.  All the other devices will 
raise alerts, due to the way they are interconnected.
The alert for coresw1 will inhibit the alerts for host1, host2, l1sw1 and l1sw2.
The alert for l1sw1 will inhibit the alerts for host3, host4 and l2sw1.
The alert for l2sw1 will inhibit the alert for host5.
The alert for l1sw2 will inhibit the alert for host6.
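
The cascade can be checked offline by modelling the rules as plain label 
comparisons.  This is only a sketch: it is not how Alertmanager is 
implemented, it treats every firing alert as a possible source (i.e. it 
assumes point 1 above holds), and the function name is made up.

```python
# Labels from the example above (two-label scheme).
ALERTS = {
    "coresw1": {"level1": "coresw1"},
    "host1": {"level1": "coresw1", "level2": "host1"},
    "host2": {"level1": "coresw1", "level2": "host2"},
    "l1sw1": {"level1": "coresw1", "level2": "l1sw1"},
    "host3": {"level2": "l1sw1", "level3": "host3"},
    "host4": {"level2": "l1sw1", "level3": "host4"},
    "l2sw1": {"level2": "l1sw1", "level3": "l2sw1"},
    "host5": {"level3": "l2sw1", "level4": "host5"},
    "l1sw2": {"level1": "coresw1", "level2": "l1sw2"},
    "host6": {"level2": "l1sw2", "level3": "host6"},
}


def inhibitors(alerts, max_depth=4):
    """Map each instance to the instances whose alerts would inhibit it."""
    result = {name: [] for name in alerts}
    for n in range(1, max_depth):
        src_key, tgt_key = f"level{n}", f"level{n + 1}"
        for src, s in alerts.items():
            # source_matchers: levelN is set and level(N+1) is empty
            if not s.get(src_key) or s.get(tgt_key):
                continue
            for tgt, t in alerts.items():
                # target_matchers: level(N+1) is set; equal: levelN
                if t.get(tgt_key) and t.get(src_key) == s[src_key]:
                    result[tgt].append(src)
    return result


if __name__ == "__main__":
    for instance, sources in inhibitors(ALERTS).items():
        print(instance, "inhibited by", sources or "nothing")
```

Every device except coresw1 should come out with exactly one inhibitor, 
its immediate parent.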

This isn't as robust as the previous example, where the alert for coresw1 
will *directly* inhibit all the other dependent resources: here, if the 
alert for an intermediate device doesn't fire, everything below it will 
notify.  But it is slightly easier to create the labels.  You just need to 
know what depth the device is at in the tree, and add level(N-1)=parent and 
level(N)=self.
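
As a sketch of how those two labels could be derived, assuming you have a 
mapping of instance to parent (None for the root) and that it really is a 
tree.  The PARENTS table and the function name are hypothetical:

```python
# Hypothetical input: instance -> parent instance (None for the root).
PARENTS = {
    "coresw1": None,
    "host1": "coresw1",
    "host2": "coresw1",
    "l1sw1": "coresw1",
    "host3": "l1sw1",
    "host4": "l1sw1",
    "l2sw1": "l1sw1",
    "host5": "l2sw1",
    "l1sw2": "coresw1",
    "host6": "l1sw2",
}


def two_level_labels(parents):
    """Return {instance: {level(N-1): parent, levelN: instance}}."""
    labels = {}
    for instance, parent in parents.items():
        # Depth = number of hops up to the root, plus one.
        depth, node = 1, instance
        while parents[node] is not None:
            node = parents[node]
            depth += 1
            if depth > len(parents):
                raise ValueError(f"cycle in parent chain of {instance}")
        labels[instance] = {f"level{depth}": instance}
        if parent is not None:
            labels[instance][f"level{depth - 1}"] = parent
    return labels


if __name__ == "__main__":
    for instance, lab in two_level_labels(PARENTS).items():
        print(instance, lab)
```

The resulting dicts could then be written out as target labels, for 
example in a file_sd targets file.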

On Tuesday, 6 September 2022 at 11:20:56 UTC+1 Brian Candler wrote:

> On Tuesday, 19 February 2019 at 09:58:40 UTC [email protected] wrote:
>
>>   I'm new to prometheus and alertmanager. I'm trying to find a way how
>> to setup alertmanager to suppress (inhibit) alerts in network tree 
>> structure
>> at any level. Something like
>>
>>    - root switch
>>       - host 1
>>       - host 2
>>       - level 1 switch 1
>>          - host 3
>>          - host 4
>>          - level 2 switch
>>             - host 5
>>       - level 1 switch 2
>>          - host 6
>>       
>> I want to receive only notification about root switch if it fails (no 
>> other host/switch).
>> I want to receive only notification about level 1 switch 1 (and no host 
>> 3-5 or level 2 switch).
>> and so on.
>>
>> What is the best way? I was thinking about using some prefix form in label
>> net, e.g.
>> net: root
>> net: root_host1
>> net: root_lev2sw
>> net: root_lev2sw_host5
>> but I find no way how to use source label in target match. I do not want 
>> to write
>> static inhibit rule for every switch node.
>>
>
> I think you're on the right lines.
>
> Since the inhibit rules can do nothing more sophisticated than "equal" 
> matching, I would go with multiple labels to represent levels 1/2/3 etc of 
> the hierarchy. The slightly tricky part is to determine the difference 
> between parent and child (remembering that one node can be both).
>
> This is what I came up with:
>
> up{instance="coresw1",level1="coresw1"}
> up{instance="host1",level1="coresw1",level2="host1"}
> up{instance="host2",level1="coresw1",level2="host2"}
> up{instance="l1sw1",level1="coresw1",level2="l1sw1"}
> up{instance="host3",level1="coresw1",level2="l1sw1",level3="host3"}
> up{instance="host4",level1="coresw1",level2="l1sw1",level3="host4"}
> up{instance="l2sw1",level1="coresw1",level2="l1sw1",level3="l2sw1"}
> up{instance="host5",level1="coresw1",level2="l1sw1",level3="l2sw1",level4="host5"}
> up{instance="l1sw2",level1="coresw1",level2="l1sw2"}
> up{instance="host6",level1="coresw1",level2="l1sw2",level3="host6"}
>
> The rule is simply that the deepest (highest-numbered) "level" label is 
> equal to the "instance" label, and the "depth" in the tree is equal to the 
> number of "level" labels.
>
> Then inhibit rules something like this:
>
> inhibit_rules:
>   - source_matchers:
>       - level1=~'.+'
>       - level2=''
>     target_matchers:
>       - level2=~'.+'
>     equal: ['level1']
>   - source_matchers:
>       - level1=~'.+'
>       - level2=~'.+'
>       - level3=''
>     target_matchers:
>       - level3=~'.+'
>     equal: ['level1','level2']
>   - source_matchers:
>       - level1=~'.+'
>       - level2=~'.+'
>       - level3=~'.+'
>       - level4=''
>     target_matchers:
>       - level4=~'.+'
>     equal: ['level1','level2','level3']
> ... etc
>
> This means that:
> * An alert with level1="foo" (but no level2, i.e. it's at depth 1 in the 
> tree) will suppress any alert for something with depth>1 and level1="foo"
> * An alert with level1="foo",level2="bar" (but no level3, i.e. it's at 
> depth 2 in the tree)  will suppress any alert for something with depth>2, 
> level1="foo" and level2="bar"
> * etc
>
> Untested, but you get the idea.  Let me know if something like this works 
> for you.
>
> Generating those labels by hand is tedious, but you could write a script 
> which reads in a set of targets with "instance" and "parent" attributes, 
> and rewrites them to depth/level1/level2 etc.
>
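
For the full-path labels in the quoted message, the script mentioned there 
could look roughly like this.  It is a sketch under the same assumptions: 
a hypothetical instance-to-parent table, unique instance names, and no 
loops in the tree.

```python
# Hypothetical input: instance -> parent instance (None for the root).
PARENTS = {
    "coresw1": None,
    "host1": "coresw1",
    "host2": "coresw1",
    "l1sw1": "coresw1",
    "host3": "l1sw1",
    "host4": "l1sw1",
    "l2sw1": "l1sw1",
    "host5": "l2sw1",
    "l1sw2": "coresw1",
    "host6": "l1sw2",
}


def path_labels(parents):
    """Return {instance: {level1: root, ..., levelN: instance}}."""
    labels = {}
    for instance in parents:
        # Walk up to the root, collecting the chain of ancestors.
        path, node = [], instance
        while node is not None:
            path.append(node)
            if len(path) > len(parents):
                raise ValueError(f"cycle in parent chain of {instance}")
            node = parents[node]
        path.reverse()  # root first, the instance itself last
        labels[instance] = {
            f"level{i}": name for i, name in enumerate(path, start=1)
        }
    return labels


if __name__ == "__main__":
    for instance, lab in path_labels(PARENTS).items():
        pairs = ",".join(f'{k}="{v}"' for k, v in lab.items())
        print(f'up{{instance="{instance}",{pairs}}}')
```

The number of labels on each instance is its depth, and the last label is 
always the instance itself.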
