[prometheus-users] Re: Alertmanager configuration: routes

Thomas Schneider Fri, 10 Sep 2021 04:53:52 -0700

Thanks for this information.

If my understanding is correct, the alert name, specified in file
*rules.yml* with parameter -alert: <alertname>, must be used in file
alertmanager.yml with parameter -service=<alertname>. Optionally one could
add labels in rules.yml, e.g. team: oncall, and then use this with
-service="team: oncall".


Is this correct?

Brian Candler wrote on Friday, 3 September 2021 at 20:05:37 UTC+2:

> And I forgot to say: given an alerting rule like
>
>   - alert: UpDown
>     expr: up == 0
>     for: 3m
>
> then the label alertname="UpDown" is also added automatically (similar to 
> how "job" and "instance" labels are added automatically at scrape time).
>
> So at the end, you have a mixture of labels from the exporter, plus 
> system-generated labels like "job" and "instance" and "alertname", plus any 
> labels you've chosen to add yourself.  The "matchers" in alertmanager can 
> match any of these.
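>
> A minimal sketch of such matchers, assuming an Alertmanager version that
> supports the "matchers" syntax and hypothetical receiver names "default"
> and "node-down"; the job name "node" is taken from the scrape config
> quoted further down. All matchers listed on one route must match for that
> route to apply:
>
> ```yaml
> route:
>   receiver: default            # fallback when no child route matches
>   routes:
>     # alertname is set automatically from "alert: UpDown";
>     # job is set automatically at scrape time
>     - receiver: node-down      # hypothetical receiver name
>       matchers:
>         - alertname="UpDown"
>         - job="node"
>
> receivers:
>   - name: default              # add email_configs etc. as needed
>   - name: node-down
> ```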
>
> On Friday, 3 September 2021 at 18:47:22 UTC+1 Brian Candler wrote:
>
>> No, definitely not. There is no such thing as "service" in Prometheus - 
>> Alertmanager config.
>>
>> But if you wish, you can have a *label* on your timeseries called 
>> "service", or called "environment", or anything you like.  You can add 
>> labels at scrape time:
>>
>>   - job_name: node
>>     scrape_interval: 1m
>>     static_configs:
>>       - targets:
>>           - bar:9100
>>           - baz:9100
>>         # these labels are added to every timeseries scraped from those targets
>>         labels:
>>           environment: prod
>>
>> (note that "job" and "instance" labels are also added automatically as 
>> part of the scrape; the remaining labels come from the exporter).
>>
>> Or you can add a label in your alerting rule:
>>
>> groups:
>> - name: UpDown
>>   rules:
>>   - alert: UpDown
>>     expr: up == 0
>>     for: 3m
>>     # these labels are added to every alert generated from this rule
>>     labels:
>>       environment: prod
>>
>> Note: it would be unusual to add label "environment: prod" in an alerting 
>> rule, but adding a label like "severity: critical" or "team: oncall" is 
>> more common - something which is specific to that alert, rather than the 
>> server.
>>
>> In either of these cases, the alert which arrives at alertmanager will 
>> have the given labels on it.  Hence you can match on it in alertmanager, to 
>> decide how to route the alert.
>>
>> On Friday, 3 September 2021 at 09:26:35 UTC+1 [email protected] wrote:
>>
>>> This means
>>> alert in Prometheus - Rules config
>>> is equal to
>>> service in Prometheus - Alertmanager config
>>> ?
>>>
>>> Brian Candler wrote on Friday, 3 September 2021 at 10:13:24 UTC+2:
>>>
>>>> Note that an "alertname" label is added automatically, so you could 
>>>> match on alertname="TargetDown" if you want.  Doesn't scale very well, but 
>>>> with a small number of rules that approach will get you started.
>>>>
>>>> If you go to your prometheus web interface, at prometheus:9090, and 
>>>> click on the "Alerts" tab at the top, then you can see firing alerts, 
>>>> including all the labels on them.
>>>>
>>>> [image: img1.png]
>>>>
>>>> On Friday, 3 September 2021 at 09:09:56 UTC+1 Brian Candler wrote:
>>>>
>>>>> The only labels you can match on from that rule are "severity: 
>>>>> warning", and the "job" and "instance" labels.
>>>>>
>>>>> > What must the alertmanager config be for this rule?
>>>>>
>>>>> You don't need *any* matching rules in alertmanager.  At simplest, you 
>>>>> can just have
>>>>>
>>>>> route:
>>>>>   receiver: default
>>>>>
>>>>> receivers:
>>>>> - name: default
>>>>>   email_configs:
>>>>>   - to: [email protected]
>>>>>     send_resolved: true
>>>>>   - to: [email protected]
>>>>>     send_resolved: true
>>>>>
>>>>> Any more than that, and it depends on your business requirements.  Do
>>>>> you want all alerts with severity "warning" to be treated differently?
>>>>> Use a routing rule (in the "routes" section under "route").  Do you want
>>>>> a certain subset of targets to be handled by a particular team? Then
>>>>> either add a label in the alerting rules themselves, or ensure that
>>>>> those targets already have a particular label in their scrape config,
>>>>> and match that label in the "routes" section.
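>>>>>
>>>>> A sketch of both cases, extending the minimal config above. The
>>>>> receiver names "warnings" and "db-team" and the label team="database"
>>>>> are hypothetical; that label would have to be added in an alerting
>>>>> rule or scrape config first:
>>>>>
>>>>> ```yaml
>>>>> route:
>>>>>   receiver: default          # used when no child route matches
>>>>>   routes:
>>>>>     # alerts carrying the (hypothetical) team label go to that team
>>>>>     - receiver: db-team
>>>>>       matchers:
>>>>>         - team="database"
>>>>>     # remaining alerts with severity "warning" are handled separately
>>>>>     - receiver: warnings
>>>>>       matchers:
>>>>>         - severity="warning"
>>>>>
>>>>> receivers:
>>>>>   - name: default            # add email_configs etc. as needed
>>>>>   - name: db-team
>>>>>   - name: warnings
>>>>> ```
>>>>>
>>>>> Sibling routes are tried in order and the first match wins, unless
>>>>> that route sets "continue: true".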
>>>>>
>>>>> On Friday, 3 September 2021 at 08:20:49 UTC+1 [email protected] wrote:
>>>>>
>>>>>> It's clear that the config
>>>>>> - service=~"mysql|cassandra"
>>>>>> does not match the rule.
>>>>>> This was just an example.
>>>>>>
>>>>>> But this question is still open:
>>>>>> What must the alertmanager config be for this rule?
>>>>>> groups:
>>>>>> - name: general.rules
>>>>>>   rules:
>>>>>>   - alert: TargetDown
>>>>>>     annotations:
>>>>>>       message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.instance }} instances are down.'
>>>>>>     expr: 100 * (count(up == 0) BY (job, instance) / count(up) BY (job, instance)) > 10
>>>>>>     for: 10m
>>>>>>     labels:
>>>>>>       severity: warning
>>>>>>
>>>>>> Brian Candler wrote on Thursday, 2 September 2021 at 19:18:37 UTC+2:
>>>>>>
>>>>>>> Remove the match on service=~"mysql|cassandra" in your routing rule.
>>>>>>>
>>>>>>> I'm not saying with 100% certainty that your alert *doesn't* have a
>>>>>>> service=xxx label; it's possible that it was added via other means,
>>>>>>> such as external_labels or alert_relabel_configs.  If you go into the
>>>>>>> prometheus or alertmanager web interface, you can see active alerts
>>>>>>> and their labels, so you'll know what you have.
>>>>>>>
>>>>>>> There was a nice web-based interface for testing alerting rules here:
>>>>>>> https://prometheus.io/webtools/alerting/routing-tree-editor/
>>>>>>> but it doesn't seem to work properly any more.
>>>>>>>
>>>>>>> On Thursday, 2 September 2021 at 15:48:57 UTC+1 [email protected] 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> What should be the configuration in alertmanager.yml to match to 
>>>>>>>> the rule?
>>>>>>>>
>>>>>>>> Brian Candler wrote on Thursday, 2 September 2021 at 15:22:55 UTC+2:
>>>>>>>>
>>>>>>>>> Correct, that expression will only give "job" and "instance" 
>>>>>>>>> labels.
>>>>>>>>>
>>>>>>>>> I don't think your alertmanager rule will ever match on this alert.
>>>>>>>>>
>>>>>>>>> On Thursday, 2 September 2021 at 14:05:22 UTC+1 [email protected] 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I have defined several rule files, e.g. this general.rules.yml:
>>>>>>>>>> groups:
>>>>>>>>>> - name: general.rules
>>>>>>>>>>   rules:
>>>>>>>>>>   - alert: TargetDown
>>>>>>>>>>     annotations:
>>>>>>>>>>       message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.instance }} instances are down.'
>>>>>>>>>>     expr: 100 * (count(up == 0) BY (job, instance) / count(up) BY (job, instance)) > 10
>>>>>>>>>>     for: 10m
>>>>>>>>>>     labels:
>>>>>>>>>>       severity: warning
>>>>>>>>>>
>>>>>>>>>> However, I don't see the correlation to service.
>>>>>>>>>>
>>>>>>>>>> Brian Candler wrote on Thursday, 2 September 2021 at 13:58:11 UTC+2:
>>>>>>>>>>
>>>>>>>>>>> It looks like "service" is a label that you have set in the 
>>>>>>>>>>> prometheus alerting rule.
>>>>>>>>>>>
>>>>>>>>>>> On Thursday, 2 September 2021 at 11:52:20 UTC+1 
>>>>>>>>>>> [email protected] wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> can you please advise what is represented by a service in 
>>>>>>>>>>>> alertmanager configuration, e.g.
>>>>>>>>>>>> routes:
>>>>>>>>>>>> # All alerts with service=mysql or service=cassandra
>>>>>>>>>>>> # are dispatched to the database pager.
>>>>>>>>>>>> - receiver: 'database-pager'
>>>>>>>>>>>>   group_wait: 10s
>>>>>>>>>>>>   matchers:
>>>>>>>>>>>>   - service=~"mysql|cassandra"
>>>>>>>>>>>>
>>>>>>>>>>>> Where do I find the service in the rules or in Prometheus -> 
>>>>>>>>>>>> Alerts?
>>>>>>>>>>>>
>>>>>>>>>>>> THX
>>>>>>>>>>>>
>>>>>>>>>>>
