Hi Brian,
Thanks for your response. I created common labels for each category,
something like below, and I now see 3 groupings in Alertmanager.
Since our targets have unique names per cluster (for example: router111,
router112, hypervisor111, hypervisor112, instance111, instance112), is
there a way to group them based on their naming? For instance, all nodes
with 111 grouped together, all nodes with 112 grouped together, and so
on. Please let me know.
With the configuration below, we are seeing only Router Down alerts when
anything is added to the Router group, and it is suppressing even valid
alerts. Not sure what we are missing.
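In case a config fragment helps frame the question, this is roughly what we had in mind (a sketch only, assuming targets are named letters-then-digits; the `site` label name and the exporter address are our own inventions, not anything we run today):

```yaml
scrape_configs:
  - job_name: 'blackbox_icmp-routers'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['router111', 'router112']
    relabel_configs:
      # Standard blackbox-exporter indirection.
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 'blackbox-exporter:9115'  # hypothetical exporter address
      # Sketch: pull the trailing digits ("111", "112") out of the target
      # name into a shared "site" label; the label name is our assumption.
      - source_labels: [__param_target]
        regex: '.*?([0-9]+)'
        target_label: site
```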
*Rules:*

- name: RouterDown
  rules:
  - alert: R-InstanceDown
    expr: probe_success{job="blackbox_icmp-routers"} == 0
    for: 1m
    labels:
      Category: 'Site'
      Type: 'Router'
- name: HypervisorDown
  rules:
  - alert: H-InstanceDown
    expr: probe_success{job="blackbox_icmp-hypervisors"} == 0
    for: 1m
    labels:
      Category: 'Site'
      Type: 'Hypervisor'
- name: VirtualMachinesDown
  rules:
  - alert: V-InstanceDown
    expr: probe_success{job="blackbox_icmp-virtualmachines"} == 0
    for: 1m
    labels:
      Category: 'Site'
      Type: 'Instance'
*Alertmanager conf:*

route:
  group_by: ['Type']
  receiver: ms-teams
  repeat_interval: 5m
  routes:
  - match:
      alertname: "R-InstanceDown"
    receiver: ms-teams
    routes:
    - match:
        alertname: "H-InstanceDown"
      receiver: ms-teams
    - match:
        alertname: "V-InstanceDown"
      receiver: ms-teams

receivers:
- name: ms-teams
  webhook_configs:
  - url: 'http://monitoring:2000/alertmanager'
    send_resolved: false

inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
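For what it's worth, once a shared label exists on all three alert types, the inhibition we are after might look roughly like this (a sketch only; the `site` label is hypothetical and would still have to be added to the targets via relabelling):

```yaml
inhibit_rules:
  # Sketch: while a site's router is down, mute that site's hypervisor alerts.
  - source_match:
      Type: 'Router'
    target_match:
      Type: 'Hypervisor'
    equal: ['site']        # hypothetical label shared by both alerts
  # Sketch: while a hypervisor is down, mute the alerts for its VMs.
  - source_match:
      Type: 'Hypervisor'
    target_match:
      Type: 'Instance'
    equal: ['site']
```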
Thanks
Sandosh
On Wednesday, August 24, 2022 at 10:48:54 AM UTC-5 Brian Candler wrote:
> You'll need to set some common labels - and if they are target labels,
> make sure they propagate through to the alert (i.e. don't write your
> alerting 'expr' in such a way that it aggregates these labels away).
>
> For example: your gateway and all the servers in a particular site can
> have {site="site123"}. Then you can write an inhibit rule to suppress
> alerts for 'device down' (target alert) if there's an active alert for
> 'gateway down' (source alert) and the 'network' label is the same (equal).
> You may need additional labels to identify "device down" versus "gateway
> down" alerts, or to distinguish the gateway from a non-gateway device.
>
> Similarly, your VMs and your hypervisor can have some shared label like
> {cluster="vm123"}. Then you can suppress alerts for 'VM down' if there's
> an alert for 'hypervisor down' with an equal 'cluster' label.
>
> For more info:
> https://prometheus.io/docs/alerting/latest/configuration/#inhibit_rule
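[As a concrete illustration of the advice above — this fragment is not from Brian's message, and the target and label names are made up — the shared label could simply be attached statically:]

```yaml
static_configs:
  - targets: ['gateway123', 'server123a', 'server123b']
    labels:
      site: 'site123'   # shared by the gateway and every server behind it
```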
>
> On Wednesday, 24 August 2022 at 14:04:46 UTC+1 [email protected] wrote:
>
>> We are using the blackbox exporter at a remote location to monitor
>> gateway routers, hypervisors and virtual machines (router —> hypervisor
>> —> virtual machines). We are looking for something like the below.
>>
>>
>> *Example 1:*
>>
>> If a gateway router is down and alertmanager is firing, it should stop
>> alerting on hypervisor hosts and servers
>>
>> *Example2:*
>>
>> If a hypervisor is down, it should not alert on the virtual machines
>>
>>
>> In Prometheus, we put the routers in one group, the hypervisors in
>> another, and the virtual machines in a third.
>>
>> *Example*
>>
>> job_name: 'blackbox_icmp-routers'
>>
>> job_name: 'blackbox_icmp-hypervisors'
>>
>> job_name: 'blackbox_icmp-virtualmachines'
>>
>>
>> Alerting rules are defined per job:
>>
>> - name: RouterDown
>>   rules:
>>   - alert: R-InstanceDown
>>     expr: probe_success{job="blackbox_icmp-routers"} == 0
>>     for: 1m
>>
>> - name: HypervisorDown
>>   rules:
>>   - alert: H-InstanceDown
>>     expr: probe_success{job="blackbox_icmp-hypervisors"} == 0
>>     for: 1m
>>
>> - name: VirtualMachinesDown
>>   rules:
>>   - alert: V-InstanceDown
>>     expr: probe_success{job="blackbox_icmp-virtualmachines"} == 0
>>     for: 1m
>>
>>
>> The Alertmanager config is below:
>>
>> route:
>>   group_by: ['alertname']
>>   receiver: ms-teams
>>   repeat_interval: 5m
>>
>> receivers:
>> - name: ms-teams
>>   webhook_configs:
>>   - url: 'http://monitoring:2000/alertmanager'
>>     send_resolved: false
>>
>> inhibit_rules:
>> - source_match:
>>     severity: 'critical'
>>   target_match:
>>     severity: 'warning'
>>   equal: ['alertname', 'dev', 'instance']
>>
>>
>> Any help is much appreciated.
>>
>>
>> Thanks
>>
>> Sandosh
>>
>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/162242b2-b28a-4f05-8267-33a8670b2346n%40googlegroups.com.