In my case there are multiple sites in different locations, and each site has a unique number that is added to the names of its hypervisor, router, and instance targets. When I create an additional label in the rules files, as in the configuration I shared previously, it groups all the sites' routers together, all hypervisors together, and all instances together.
What I am trying to achieve is to group all targets with the same site number together and then, within each site, separate the targets by type (hypervisor, router, and instance). Since I am new to Prometheus, I am stuck on how to group them by the unique number first and then by type. As for the inhibit rules, I will definitely make the changes you recommended. Please let me know how I can achieve the above.

On Thursday, August 25, 2022 at 10:25:52 AM UTC-5 Brian Candler wrote:

> On Thursday, 25 August 2022 at 14:39:57 UTC+1 [email protected] wrote:
>
>> Since our targets has unique naming per cluster (For eg: router111,
>> router 112, hypervisor111, hypervisor112, instance111, instance112), is
>> there a way to group them based on their naming? Like all nodes which has
>> 111 grouped together and 112 grouped together etc... Please let me know.
>
> You can use the label_replace
> <https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace>
> function to extract the substring of interest into a new label.
>
> However I don't really understand what you're trying to do, because
> presumably these are N:1 relationships (i.e. N VMs sharing one hypervisor;
> and N hypervisors sharing one gateway router). If you have router111, it
> won't be serving just a single hypervisor111 running a single instance111.
>
>> As per the below configuration, we are seeing only Router Down alerts if
>> anything is added to Router group and it is suppressing even the valid
>> alerts. Not sure what we are missing.
>>
>> *...*
>>
>> inhibit_rules:
>>   - source_match:
>>       severity: 'critical'
>>     target_match:
>>       severity: 'warning'
>>     equal: ['alertname', 'dev', 'instance']
>
> The problem is that you haven't thought about your inhibit rules.
>
> All that you've written says: suppress any alert with label
> severity="warning", if there is any active alert with label
> severity="critical" and matching values of alertname, dev and instance
> labels.
>
> What you want is something different: e.g. suppress any alert with label
> alertname="H-InstanceDown", if there is any active alert with label
> alertname="R-InstanceDown" and matching values of whatever label you have
> set to identify the "site" that both the router and the hypervisor are in.
> It's up to you to write that in the form of an inhibit rule.
>
> Note that you can set additional labels on an alert, in the alerting rule
> itself, if you need extra labels to be available to alertmanager.
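
If I am reading the two suggestions above correctly (extract the site number with label_replace, then inhibit on that label), a rough sketch might look like the following. This is only my guess at the shape, not a tested configuration. The job names (router, hypervisor), the label names site and type, and the assumption that the instance label begins with the host name (e.g. router111:9100) are placeholders for illustration.

```yaml
# --- Prometheus alerting rules file (sketch, hypothetical names) ---
# label_replace copies the first run of digits found in the "instance"
# label into a new "site" label. The regex must match the whole value;
# if it does not match, the series is passed through without "site".
groups:
  - name: site-down
    rules:
      - alert: R-InstanceDown
        expr: |
          label_replace(
            up{job="router"} == 0,
            "site", "$1", "instance", "[^0-9]*([0-9]+).*"
          )
        labels:
          severity: critical
          type: router        # assumed label to separate targets by type
      - alert: H-InstanceDown
        expr: |
          label_replace(
            up{job="hypervisor"} == 0,
            "site", "$1", "instance", "[^0-9]*([0-9]+).*"
          )
        labels:
          severity: critical
          type: hypervisor    # assumed label to separate targets by type
---
# --- Alertmanager configuration fragment (separate file, sketch) ---
# Suppress a hypervisor-down alert only while a router-down alert
# with the same "site" value is firing.
inhibit_rules:
  - source_match:
      alertname: 'R-InstanceDown'
    target_match:
      alertname: 'H-InstanceDown'
    equal: ['site']
```

With something like this, an H-InstanceDown alert carrying site="111" would be suppressed only while an R-InstanceDown alert with site="111" is active, and alerts from site 112 would be unaffected. If the instance label holds an IP address rather than a host name, the regex would pick up the wrong digits, so the site would need to come from another label.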

