Hi Brian,

This is what the target files look like.

*Hypervisor.yml:*
hyper.111.com
hyper.211.com
hyper.311.com

*Instance.yml:*
linux.111.com
win.111.com
linux.211.com
win.211.com
linux.311.com
win.311.com

*Router.yml:*
Router.111.com
Router.211.com
Router.311.com


Based on the above targets, it needs to group them by site number, like 
below:
*Group 111:*
hyper.111.com
linux.111.com
win.111.com
Router.111.com

*Group 211:*
hyper.211.com
linux.211.com
win.211.com
Router.211.com

*Group 311:*
hyper.311.com
linux.311.com
win.311.com
Router.311.com
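
A minimal sketch of how this grouping could be expressed at scrape time, assuming the targets really follow the <type>.<site>.com pattern shown above. The job name, the `site` and `category` label names, and the assumption that Hypervisor.yml is a valid file_sd target file are all illustrative and not taken from the actual configuration:

```yaml
scrape_configs:
  - job_name: hypervisor            # assumed job name
    file_sd_configs:
      - files: ['Hypervisor.yml']   # assumed to be a valid file_sd target file
    relabel_configs:
      # Copy the site number out of the address,
      # e.g. hyper.111.com:9100 -> site="111".
      # Relabel regexes are fully anchored; the optional group allows a port.
      - source_labels: [__address__]
        regex: '[^.]+\.([0-9]+)\.com(:[0-9]+)?'
        target_label: site
        replacement: '$1'
      # Tag every target scraped by this job with its category.
      - target_label: category
        replacement: hypervisor
```

The same pattern would repeat for the router and instance jobs, changing only the file name and the `category` value. Targets whose address does not match the regex simply get no `site` label.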


And then it needs to alert based on their category 
(hypervisor/router/instance).

   - If the router is down in Group 111, then it needs to suppress the 
   hypervisor and instance alerts.
   - If the hypervisor is down in Group 111, then it needs to suppress the 
   instance alerts.
   - If the routers of more than one group are down, then it needs to 
   consolidate them all and send one alert for those groups.
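
The three rules above could be sketched in Alertmanager roughly as follows. This assumes every alert already carries a `site` label (for example "111") and a `category` label (router, hypervisor or instance); both label names and the receiver name `ops-team` are assumptions. It uses the `matchers` / `source_matchers` / `target_matchers` syntax, which needs Alertmanager 0.22 or later:

```yaml
route:
  receiver: ops-team                  # hypothetical receiver
  group_by: ['alertname', 'site']     # default: one notification per alert name per site
  routes:
    # Router alerts are grouped across sites, so several sites losing
    # their router produce a single consolidated notification.
    - matchers: ['category="router"']
      group_by: ['alertname']

inhibit_rules:
  # Router alert firing for a site: mute hypervisor and instance alerts there.
  - source_matchers: ['category="router"']
    target_matchers: ['category=~"hypervisor|instance"']
    equal: ['site']
  # Hypervisor alert firing for a site: mute instance alerts there.
  - source_matchers: ['category="hypervisor"']
    target_matchers: ['category="instance"']
    equal: ['site']

receivers:
  - name: ops-team
```

As written, any firing router alert acts as a source, not only a "down" alert; adding an `alertname` matcher to `source_matchers` would narrow that. The `equal: ['site']` line is what keeps a router failure at one site from muting alerts at another.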


I am going through the documentation to understand label_replace and other 
functions, but I am not finding examples or use cases that fit my scenario.
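
For reference, a sketch of what label_replace could look like for names of this shape, written as an alerting rule. It assumes the `instance` label holds the host name shown above (optionally followed by a port) and that the scrape job is called `router`; the group name, the alert name and the `site` / `category` label names are assumptions as well:

```yaml
groups:
  - name: site-availability          # hypothetical group name
    rules:
      - alert: RouterDown            # hypothetical alert name
        # label_replace(v, dst_label, replacement, src_label, regex)
        # copies the site number out of "instance" into a new "site" label.
        # The regex must match the whole label value; if it does not match,
        # the series is returned unchanged, without a "site" label.
        expr: |
          label_replace(
            up{job="router"} == 0,
            "site", "$1",
            "instance", "[^.]+[.]([0-9]+)[.]com(:[0-9]+)?"
          )
        for: 5m
        labels:
          category: router
```

The expression can be pasted into the expression browser on its own to check that the extra `site` label appears before it goes into a rules file.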


Thanks
Sandosh
On Monday, August 29, 2022 at 3:53:36 PM UTC-5 Brian Candler wrote:

> What do you mean by "added to the targets"?  Can you give some examples?
>
> If the instance label contains both the instance name and the site name 
> and the structure is clearly demarcated, then you can use the function 
> label_replace(), as I said before, to extract the part of interest.
>
> e.g. if the hypervisor's instance label is "hyper3-site1" then you can use 
> label_replace to match the pattern "-site<N>" and return just the "site<N>" 
> part.  But the exact details of how to do this depend on exactly what 
> you're doing.
>
> See the documentation here: 
> https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace
> The example given matches a label like *service="xxx:yyy"* and adds a new 
> label *foo="xxx"*.  That's pretty much exactly what you're trying to do, 
> if I understand you correctly.
>
> Use the PromQL browser in the Prometheus web interface to test your 
> expressions as you write them.
>
> On Monday, 29 August 2022 at 17:41:33 UTC+1 [email protected] wrote:
>
>> In my case there are multiple sites in different locations, and each 
>> site has a unique number that is added to the targets of its 
>> hypervisor, router and instances. When I create an additional label in the 
>> rules files, as in the previous configuration I shared, it groups all 
>> the sites' routers together, all hypervisors together and all 
>> instances together. 
>>
>> What I am trying to achieve is to group all the targets with the same 
>> site number together, and then on top of that I need to separate targets 
>> by type: hypervisor, router and instance. Since I am new to Prometheus I am 
>> getting stuck on how to separate them by the unique number first and 
>> then by type. 
>>
>> As for the inhibit rules, I will definitely make the changes you 
>> recommended. Let me know how I can achieve the above. 
>>
>>
>> On Thursday, August 25, 2022 at 10:25:52 AM UTC-5 Brian Candler wrote:
>>
>>> On Thursday, 25 August 2022 at 14:39:57 UTC+1 [email protected] wrote:
>>>
>>>>
>>>> Since our targets have unique naming per cluster (e.g. router111, 
>>>> router112, hypervisor111, hypervisor112, instance111, instance112), is 
>>>> there a way to group them based on their naming? For example, all nodes 
>>>> that have 111 grouped together, all with 112 grouped together, etc. 
>>>> Please let me know. 
>>>>
>>>>
>>> You can use the label_replace 
>>> <https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace> 
>>> function to extract the substring of interest into a new label.
>>>
>>> However I don't really understand what you're trying to do, because 
>>> presumably these are N:1 relationships (i.e. N VMs sharing one hypervisor; 
>>> and N hypervisors sharing one gateway router). If you have router111, it 
>>> won't be serving just a single hypervisor111 running a single instance111.
>>>
>>>  
>>>
>>>> As per the below configuration, we are seeing only Router Down alerts 
>>>> if anything is added to Router group and it is suppressing even the valid 
>>>> alerts. Not sure what we are missing. 
>>>>
>>>> *...*
>>>>
>>>  
>>>
>>>> inhibit_rules:
>>>>   - source_match:
>>>>       severity: 'critical'
>>>>     target_match:
>>>>       severity: 'warning'
>>>>     equal: ['alertname', 'dev', 'instance']
>>>>
>>>>
>>> The problem is that you haven't thought about your inhibit rules.
>>>
>>> All that you've written says: suppress any alert with label 
>>> severity="warning", if there is any active alert with label 
>>> severity="critical" and matching values of alertname, dev and instance 
>>> labels.
>>>
>>> What you want is something different: e.g. suppress any alert with label 
>>> alertname="H-InstanceDown", if there is any active alert with label 
>>> alertname="R-InstanceDown" and matching values of whatever label you have 
>>> set to identify the "site" that both the router and the hypervisor are in.  
>>> It's up to you to write that in the form of an inhibit rule.
>>>
>>> Note that you can set additional labels on an alert, in the alerting 
>>> rule itself, if you need extra labels to be available to alertmanager.
>>>
>>

