Hi Brian,

This is what the target files look like:

*Hypervisor.yml:*
hyper.111.com
hyper.211.com
hyper.311.com

*Instance.yml:*
linux.111.com
win.111.com
linux.211.com
win.211.com
linux.311.com
win.311.com

*Router.yml:*
Router.111.com
Router.211.com
Router.311.com

Based on the above targets, it needs to group them by site number, like this:

*Group 111:*
hyper.111.com
linux.111.com
win.111.com
Router.111.com

*Group 211:*
hyper.211.com
linux.211.com
win.211.com
Router.211.com

*Group 311:*
hyper.311.com
linux.311.com
win.311.com
Router.311.com

It then needs to alert based on category (hypervisor/router/instance):

- If the router is down in Group 111, it needs to suppress the hypervisor and instance alerts.
- If the hypervisor is down in Group 111, it needs to suppress the instance alerts.
- If routers in more than one group are down, it needs to consolidate them and send one alert for those groups.

I am going through the documentation to understand label_replace and related functions, but I am not finding examples or use cases that fit my scenario.

Thanks
Sandosh

On Monday, August 29, 2022 at 3:53:36 PM UTC-5 Brian Candler wrote:

> What do you mean by "added to the targets"? Can you give some examples?
>
> If the instance label contains both the instance name and the site name
> and the structure is clearly demarked, then you can use the function
> label_replace(), as I said before, to extract the part of interest.
>
> e.g. if the hypervisor's instance label is "hyper3-site1" then you can use
> label_replace to match the pattern "-site<N>" and return just the "site<N>"
> part. But the exact details of how to do this depend on exactly what
> you're doing.
>
> See the documentation here:
> https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace
> The example given matches a label like *service="xxx:yyy"* and adds a new
> label *foo="xxx"*. That's pretty much exactly what you're trying to do,
> if I understand you correctly.
>
> Use the PromQL browser in the Prometheus web interface to test your
> expressions as you write them.
>
> On Monday, 29 August 2022 at 17:41:33 UTC+1 [email protected] wrote:
>
>> In my case there are multiple sites located in different locations and
>> each site has a unique number per that site added to the targets of
>> hypervisor, router and instances. When I create an additional label in the
>> rules files like in the previous configuration I have shared, it is
>> grouping all the sites routers together, hypervisor together and all
>> instances together.
>>
>> What I am trying to achieve is to group all the targets with the same
>> site numbers together and then on top of that I need to separate targets
>> based on hypervisor, router & instances. Since I am new to prometheus I am
>> getting stuck on how to separate them based on the unique number first and
>> then later by the type.
>>
>> And for the inhibit rules, I will definitely make the said changes based
>> on your recommendations. Let me know how can I achieve the above.
>>
>> On Thursday, August 25, 2022 at 10:25:52 AM UTC-5 Brian Candler wrote:
>>
>>> On Thursday, 25 August 2022 at 14:39:57 UTC+1 [email protected] wrote:
>>>
>>>> Since our targets has unique naming per cluster (For eg: router111,
>>>> router 112, hypervisor111, hypervisor112, instance111, instance112), is
>>>> there a way to group them based on their naming? Like all nodes which has
>>>> 111 grouped together and 112 grouped together etc... Please let me know.
>>>
>>> You can use the label_replace
>>> <https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace>
>>> function to extract the substring of interest into a new label.
>>>
>>> However I don't really understand what you're trying to do, because
>>> presumably these are N:1 relationships (i.e. N VMs sharing one hypervisor;
>>> and N hypervisors sharing one gateway router).
>>> If you have router111, it won't be serving just a single hypervisor111
>>> running a single instance111.
>>>
>>>> As per the below configuration, we are seeing only Router Down alerts
>>>> if anything is added to Router group and it is suppressing even the valid
>>>> alerts. Not sure what we are missing.
>>>>
>>>> *...*
>>>>
>>>> inhibit_rules:
>>>>   - source_match:
>>>>       severity: 'critical'
>>>>     target_match:
>>>>       severity: 'warning'
>>>>     equal: ['alertname', 'dev', 'instance']
>>>
>>> The problem is that you haven't thought about your inhibit rules.
>>>
>>> All that you've written says: suppress any alert with label
>>> severity="warning", if there is any active alert with label
>>> severity="critical" and matching values of alertname, dev and instance
>>> labels.
>>>
>>> What you want is something different: e.g. suppress any alert with label
>>> alertname="H-InstanceDown", if there is any active alert with label
>>> alertname="R-InstanceDown" and matching values of whatever label you have
>>> set to identify the "site" that both the router and the hypervisor are in.
>>> It's up to you to write that in the form of an inhibit rule.
>>>
>>> Note that you can set additional labels on an alert, in the alerting
>>> rule itself, if you need extra labels to be available to alertmanager.
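
For reference, here is a rough sketch of how the advice in this thread could map onto the three requirements. It is untested and rests on several assumptions: the job names (router, hypervisor, instance), the alert names, the "site" and "category" labels and the "default" receiver are all made-up placeholders, and the regex assumes instance labels shaped like name.<site>.com, optionally followed by :port. The alerting rules use label_replace to copy the site number into a "site" label and tag each alert with its category:

```yaml
# Prometheus rules file (sketch; job, alert and label names are assumptions)
groups:
  - name: site-availability
    rules:
      - alert: RouterDown
        # label_replace copies the site number from "instance" into "site".
        # The regex must match the whole instance label value.
        expr: |
          label_replace(up{job="router"} == 0,
            "site", "$1", "instance", "[^.]+[.]([0-9]+)[.]com(:[0-9]+)?")
        for: 5m
        labels:
          category: router
      - alert: HypervisorDown
        expr: |
          label_replace(up{job="hypervisor"} == 0,
            "site", "$1", "instance", "[^.]+[.]([0-9]+)[.]com(:[0-9]+)?")
        for: 5m
        labels:
          category: hypervisor
      - alert: InstanceDown
        expr: |
          label_replace(up{job="instance"} == 0,
            "site", "$1", "instance", "[^.]+[.]([0-9]+)[.]com(:[0-9]+)?")
        for: 5m
        labels:
          category: instance
```

On the Alertmanager side, the inhibit rules can then compare the "site" label instead of alertname/dev/instance. This uses the same source_match / target_match keys as the configuration quoted above; newer Alertmanager releases also accept source_matchers / target_matchers:

```yaml
# alertmanager.yml (sketch; the receiver name is an assumption)
route:
  receiver: default
  # "site" is deliberately left out of group_by, so RouterDown alerts from
  # several sites are batched into a single notification.
  group_by: ['alertname']

inhibit_rules:
  # Router down at a site: mute hypervisor and instance alerts for that site.
  - source_match:
      category: 'router'
    target_match_re:
      category: 'hypervisor|instance'
    equal: ['site']
  # Hypervisor down at a site: mute instance alerts for that site.
  - source_match:
      category: 'hypervisor'
    target_match:
      category: 'instance'
    equal: ['site']

receivers:
  - name: default
```

Note that the second inhibit rule mutes every instance alert at the site, which only fits if each site has a single hypervisor, as in the sample targets. With several hypervisors per site, an extra label identifying the hypervisor would be needed in "equal". The label_replace expressions can be tried in the PromQL browser first to confirm that "site" comes out as expected.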

