In my case there are multiple sites in different locations, and each 
site has a unique number that is added to the names of its hypervisor, 
router and instance targets. When I create an additional label in the rules 
file, as in the configuration I shared previously, it groups all the sites' 
routers together, all hypervisors together and all instances together. 

What I am trying to achieve is to group all the targets with the same site 
number together and then, on top of that, separate the targets by type 
(hypervisor, router and instance). Since I am new to Prometheus, I am 
stuck on how to separate them by the unique number first and then by 
type. 

As for the inhibit rules, I will definitely make the changes you 
recommended. Please let me know how I can achieve the above. 


On Thursday, August 25, 2022 at 10:25:52 AM UTC-5 Brian Candler wrote:

> On Thursday, 25 August 2022 at 14:39:57 UTC+1 [email protected] wrote:
>
>>
>> Since our targets have unique naming per cluster (e.g. router111, 
>> router112, hypervisor111, hypervisor112, instance111, instance112), is 
>> there a way to group them based on their naming? For example, all nodes 
>> with 111 grouped together, all nodes with 112 grouped together, etc. 
>> Please let me know. 
>>
>>
> You can use the label_replace 
> <https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace>
> function to extract the substring of interest into a new label.
>
> However I don't really understand what you're trying to do, because 
> presumably these are N:1 relationships (i.e. N VMs sharing one hypervisor; 
> and N hypervisors sharing one gateway router). If you have router111, it 
> won't be serving just a single hypervisor111 running a single instance111.
>
>  
>
>> As per the below configuration, we are seeing only Router Down alerts if 
>> anything is added to Router group and it is suppressing even the valid 
>> alerts. Not sure what we are missing. 
>>
>> *...*
>>
>  
>
>> inhibit_rules:
>>   - source_match:
>>       severity: 'critical'
>>     target_match:
>>       severity: 'warning'
>>     equal: ['alertname', 'dev', 'instance']
>>
>>
> The problem is that you haven't thought about your inhibit rules.
>
> All that you've written says: suppress any alert with label 
> severity="warning", if there is any active alert with label 
> severity="critical" and matching values of alertname, dev and instance 
> labels.
>
> What you want is something different: e.g. suppress any alert with label 
> alertname="H-InstanceDown", if there is any active alert with label 
> alertname="R-InstanceDown" and matching values of whatever label you have 
> set to identify the "site" that both the router and the hypervisor are in.  
> It's up to you to write that in the form of an inhibit rule.
>
> Note that you can set additional labels on an alert, in the alerting rule 
> itself, if you need extra labels to be available to alertmanager.
>
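
A minimal sketch of the label_replace suggestion quoted above, written as 
alerting rules. It assumes the instance label holds the host name (for 
example router111:9100) and that the up metric is what signals a target 
being down. The group name, the "site" and "type" label names, and the 
"for" duration are hypothetical choices, not something taken from the 
configuration discussed in this thread.

```yaml
groups:
  - name: site-alerts            # hypothetical group name
    rules:
      - alert: R-InstanceDown
        # label_replace(v, dst_label, replacement, src_label, regex)
        # The regex is fully anchored, so the trailing .* absorbs an
        # optional ":port" suffix. $1 is the captured site number.
        expr: |
          label_replace(
            up{instance=~"router[0-9]+.*"} == 0,
            "site", "$1", "instance", "router([0-9]+).*"
          )
        for: 5m                  # assumed duration
        labels:
          severity: critical
          type: router           # assumed label for the target type
      - alert: H-InstanceDown
        expr: |
          label_replace(
            up{instance=~"hypervisor[0-9]+.*"} == 0,
            "site", "$1", "instance", "hypervisor([0-9]+).*"
          )
        for: 5m                  # assumed duration
        labels:
          severity: critical
          type: hypervisor       # assumed label for the target type
```

With rules shaped like this, an alert for router111 would carry site="111" 
and type="router", so alerts can be grouped by site first and by type 
second. The same labels could instead be attached at scrape time with 
relabel_configs, which would make them available on every series rather 
than only on these alerts.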

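The inhibit rule described in the quoted reply could look roughly like the 
sketch below. It assumes both alerts carry a label identifying the site, 
here hypothetically called "site", set either in the alerting rule or by 
relabeling. The alert names are the ones mentioned in the quoted reply.

```yaml
inhibit_rules:
  # Suppress the hypervisor alert while the router alert for the same
  # site is firing.
  - source_match:
      alertname: 'R-InstanceDown'
    target_match:
      alertname: 'H-InstanceDown'
    # "site" is an assumed label name; it must be present on both alerts.
    equal: ['site']
```

Two things to keep in mind. Alertmanager treats a missing label and an 
empty label as the same value, so if neither alert has a "site" label the 
rule still matches and every H-InstanceDown alert is suppressed whenever 
any R-InstanceDown alert is active. Also, newer Alertmanager releases 
deprecate source_match and target_match in favour of source_matchers and 
target_matchers; the older form is kept here to match the configuration 
quoted above.
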