Thanks Brian. "label_replace" did the magic: I am able to separate the
site numbers from the targets using the configuration below, and in Alertmanager I
grouped the alerts by "SNumber". But I am not sure how to use inhibit rules to
suppress the alerts. Can you help?
I want to use the "SNumber" grouping to suppress alerts only when a
Router/Hypervisor is down; in all other cases the alerts should be
consolidated across all "SNumber" values. Is that something that can be achieved?
Please let me know.
- source_labels: [__param_target]
  target_label: SNumber
  regex: 'hyper.(.*)com'
  replacement: '${1}'
- source_labels: [__param_target]
  target_label: SNumber
  regex: 'linux.(.*)com'
  replacement: '${1}'
- source_labels: [__param_target]
  target_label: SNumber
  regex: 'win.(.*)com'
  replacement: '${1}'
- source_labels: [__param_target]
  target_label: SNumber
  regex: 'Router.(.*)com'
  replacement: '${1}'
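
A minimal sketch of how the four rules above might be collapsed into one, assuming they live under a scrape job's relabel_configs and that every target follows the <type>.<number>.com pattern shown in this thread. Prometheus anchors relabel regexes at both ends, so with 'hyper.(.*)com' the capture group also takes the dot before "com", and SNumber would come out as "111." rather than "111". Matching only digits avoids that.

```yaml
relabel_configs:
  # One rule covering all four target types seen in this thread.
  # (?:...) is a non-capturing group, so ${1} is still the site number.
  # The dots are escaped so they match a literal "." only.
  - source_labels: [__param_target]
    regex: '(?:hyper|linux|win|Router)\.(\d+)\.com'
    target_label: SNumber
    replacement: '${1}'
```

Targets that do not match the pattern are left without an SNumber label, since a relabel rule whose regex does not match makes no change.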
Thanks
Sandosh
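
One way the suppression and consolidation described above could look in alertmanager.yml, as a sketch rather than a tested configuration. It assumes the SNumber label is present on the alerts, and that the router and hypervisor alerts are named R-InstanceDown and H-InstanceDown (the names used earlier in this thread). The instance alert name I-InstanceDown and the receiver name "default" are made up for illustration. The matchers syntax needs Alertmanager 0.22 or later.

```yaml
route:
  receiver: default                    # hypothetical receiver name
  group_by: ['alertname', 'SNumber']   # one notification per alert type per site
  routes:
    # Router alerts are grouped without SNumber, so routers that are
    # down at several sites end up in a single notification.
    - matchers:
        - 'alertname = "R-InstanceDown"'
      group_by: ['alertname']

inhibit_rules:
  # Router down at a site: mute hypervisor and instance alerts for that site.
  - source_matchers:
      - 'alertname = "R-InstanceDown"'
    target_matchers:
      - 'alertname =~ "H-InstanceDown|I-InstanceDown"'   # I-InstanceDown is an assumed name
    equal: ['SNumber']
  # Hypervisor down at a site: mute instance alerts for that site.
  - source_matchers:
      - 'alertname = "H-InstanceDown"'
    target_matchers:
      - 'alertname = "I-InstanceDown"'
    equal: ['SNumber']
```

This is only a fragment: a real configuration also needs a receivers section that defines the receiver named in the route.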
On Monday, August 29, 2022 at 4:46:33 PM UTC-5 Sandosh Kumar P wrote:
> Hi Brian,
>
> This is what the target files look like.
>
> *Hypervisor.yml:*
> hyper.111.com
> hyper.211.com
> hyper.311.com
>
> *Instance.yml:*
> linux.111.com
> win.111.com
> linux.211.com
> win.211.com
> linux.311.com
> win.311.com
>
> *Router.yml:*
> Router.111.com
> Router.211.com
> Router.311.com
>
>
> Based on the above, the targets need to be grouped by site number, like
> this:
> *Group 111:*
> hyper.111.com
> linux.111.com
> win.111.com
> Router.111.com
>
> *Group 211:*
> hyper.211.com
> linux.211.com
> win.211.com
> Router.211.com
>
> *Group 311:*
> hyper.311.com
> linux.311.com
> win.311.com
> Router.311.com
>
>
> And then it needs to alert based on category
> (hypervisor/router/instances):
>
> - If the router is down in Group 111, then the hypervisor and instance
> alerts for that group need to be suppressed.
> - If the hypervisor is down in Group 111, then the instance alerts
> for that group need to be suppressed.
> - If routers are down in more than one group, then all of them need to
> be consolidated into one alert for those groups.
>
>
> I am going through the documentation to understand label_replace and other
> features, but I am not finding examples or use cases that fit my scenario.
>
>
> Thanks
> Sandosh
> On Monday, August 29, 2022 at 3:53:36 PM UTC-5 Brian Candler wrote:
>
>> What do you mean by "added to the targets"? Can you give some examples?
>>
>> If the instance label contains both the instance name and the site name
>> and the structure is clearly demarcated, then you can use the function
>> label_replace(), as I said before, to extract the part of interest.
>>
>> e.g. if the hypervisor's instance label is "hyper3-site1" then you can
>> use label_replace to match the pattern "-site<N>" and return just the
>> "site<N>" part. But the exact details of how to do this depend on exactly
>> what you're doing.
>>
>> See the documentation here:
>> https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace
>> The example given matches a label like *service="xxx:yyy"* and adds a
>> new label *foo="xxx"*. That's pretty much exactly what you're trying to
>> do, if I understand you correctly.
>>
>> Use the PromQL browser in the Prometheus web interface to test your
>> expressions as you write them.
>>
>> On Monday, 29 August 2022 at 17:41:33 UTC+1 [email protected] wrote:
>>
>>> In my case there are multiple sites in different locations, and
>>> each site has a unique number that is added to the targets of its
>>> hypervisor, router and instances. When I create an additional label in the
>>> rules files, as in the previous configuration I shared, it
>>> groups all the sites' routers together, all hypervisors together and all
>>> instances together.
>>>
>>> What I am trying to achieve is to group all the targets with the same
>>> site number together and then, on top of that, separate the targets
>>> by type (hypervisor, router and instances). Since I am new to Prometheus I am
>>> getting stuck on how to separate them by the unique number first and
>>> then by type.
>>>
>>> As for the inhibit rules, I will definitely make the changes
>>> you recommended. Let me know how I can achieve the above.
>>>
>>>
>>> On Thursday, August 25, 2022 at 10:25:52 AM UTC-5 Brian Candler wrote:
>>>
>>>> On Thursday, 25 August 2022 at 14:39:57 UTC+1 [email protected] wrote:
>>>>
>>>>>
>>>>> Since our targets have unique naming per cluster (e.g. router111,
>>>>> router112, hypervisor111, hypervisor112, instance111, instance112), is
>>>>> there a way to group them based on their naming? For example, all nodes with
>>>>> 111 grouped together, all with 112 grouped together, etc. Please let me know.
>>>>>
>>>>>
>>>> You can use the label_replace
>>>> <https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace>
>>>> function to extract the substring of interest into a new label.
>>>>
>>>> However I don't really understand what you're trying to do, because
>>>> presumably these are N:1 relationships (i.e. N VMs sharing one hypervisor;
>>>> and N hypervisors sharing one gateway router). If you have router111, it
>>>> won't be serving just a single hypervisor111 running a single instance111.
>>>>
>>>>
>>>>
>>>>> With the configuration below, we are seeing only Router Down alerts
>>>>> if anything is added to the Router group, and it is suppressing even the valid
>>>>> alerts. Not sure what we are missing.
>>>>>
>>>>> *...*
>>>>>
>>>>
>>>>
>>>>> inhibit_rules:
>>>>>   - source_match:
>>>>>       severity: 'critical'
>>>>>     target_match:
>>>>>       severity: 'warning'
>>>>>     equal: ['alertname', 'dev', 'instance']
>>>>>
>>>>>
>>>> The problem is that you haven't thought about your inhibit rules.
>>>>
>>>> All that you've written says: suppress any alert with label
>>>> severity="warning", if there is any active alert with label
>>>> severity="critical" and matching values of alertname, dev and instance
>>>> labels.
>>>>
>>>> What you want is something different: e.g. suppress any alert with
>>>> label alertname="H-InstanceDown", if there is any active alert with label
>>>> alertname="R-InstanceDown" and matching values of whatever label you have
>>>> set to identify the "site" that both the router and the hypervisor are in.
>>>>
>>>> It's up to you to write that in the form of an inhibit rule.
>>>>
>>>> Note that you can set additional labels on an alert, in the alerting
>>>> rule itself, if you need extra labels to be available to alertmanager.
>>>>
>>>