[prometheus-users] Re: Alertmanager configuration: routes

Brian Candler Fri, 03 Sep 2021 11:05:42 -0700

And I forgot to say: given an alerting rule like

  - alert: UpDown
    expr: up == 0
    for: 3m


then the label alertname="UpDown" is also added automatically (similar to 
how "job" and "instance" labels are added automatically at scrape time).

So at the end, you have a mixture of labels from the exporter, plus 
system-generated labels like "job" and "instance" and "alertname", plus any 
labels you've chosen to add yourself.  The "matchers" in alertmanager can 
match any of these.
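
As a rough sketch of what that could look like (not a recommended layout), 
here is an alertmanager.yml that routes on one label from each of those 
sources.  The receiver names and example.org addresses are made up for 
illustration, the "severity" label is assumed to have been added in an 
alerting rule, and the global SMTP settings that email_configs needs are 
left out.  Child routes are tried in order and the first match wins unless 
you set "continue: true".

```yaml
# alertmanager.yml (illustrative sketch; receiver names and addresses are hypothetical)
route:
  receiver: default              # fallback when no child route matches
  routes:
    # "alertname" is added automatically from the rule's "alert:" field
    - matchers:
        - alertname="UpDown"
      receiver: oncall
    # "severity" is assumed to be set under "labels:" in an alerting rule
    - matchers:
        - severity="critical"
      receiver: oncall
    # "environment" is assumed to be set at scrape time in static_configs
    - matchers:
        - environment="prod"
      receiver: prod-team

receivers:
  - name: default
    email_configs:
      - to: default@example.org
  - name: oncall
    email_configs:
      - to: oncall@example.org
  - name: prod-team
    email_configs:
      - to: prod-team@example.org
```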

On Friday, 3 September 2021 at 18:47:22 UTC+1 Brian Candler wrote:

> No, definitely not. There is no such thing as "service" in Prometheus - 
> Alertmanager config.
>
> But if you wish, you can have a *label* on your timeseries called 
> "service", or called "environment", or anything you like.  You can add 
> labels at scrape time:
>
>   - job_name: node
>     scrape_interval: 1m
>     static_configs:
>       - targets:
>           - bar:9100
>           - baz:9100
>         # these labels are added to every timeseries scraped from those targets
>         labels:
>           environment: prod
>
> (note that "job" and "instance" labels are also added automatically as 
> part of the scrape; the remaining labels come from the exporter).
>
> Or you can add a label in your alerting rule:
>
> groups:
> - name: UpDown
>   rules:
>   - alert: UpDown
>     expr: up == 0
>     for: 3m
>     # these labels are added to every alert generated from this rule
>     labels:
>       environment: prod
>
> Note: it would be unusual to add label "environment: prod" in an alerting 
> rule, but adding a label like "severity: critical" or "team: oncall" is 
> more common - something which is specific to that alert, rather than the 
> server.
>
> In either of these cases, the alert which arrives at alertmanager will 
> have the given labels on it.  Hence you can match on it in alertmanager, to 
> decide how to route the alert.
>
> On Friday, 3 September 2021 at 09:26:35 UTC+1 [email protected] wrote:
>
>> This means
>> alert in Prometheus - Rules config
>> is equal to
>> service in Prometheus - Alertmanager config
>> ?
>>
>> On Friday, 3 September 2021 at 10:13:24 UTC+2 Brian Candler wrote:
>>
>>> Note that an "alertname" label is added automatically, so you could 
>>> match on alertname="TargetDown" if you want.  Doesn't scale very well, but 
>>> with a small number of rules that approach will get you started.
>>>
>>> If you go to your prometheus web interface, at prometheus:9090, and 
>>> click on the "Alerts" tab at the top, then you can see firing alerts, 
>>> including all the labels on them.
>>>
>>> [image: img1.png]
>>>
>>> On Friday, 3 September 2021 at 09:09:56 UTC+1 Brian Candler wrote:
>>>
>>>> The only labels you can match on from that rule are "severity: 
>>>> warning", and the "job" and "instance" labels.
>>>>
>>>> > What must the alertmanager config be for this rule?
>>>>
>>>> You don't need *any* matching rules in alertmanager.  At simplest, you 
>>>> can just have
>>>>
>>>> route:
>>>>   receiver: default
>>>>
>>>> receivers:
>>>> - name: default
>>>>   email_configs:
>>>>   - to: [email protected]
>>>>     send_resolved: true
>>>>   - to: [email protected]
>>>>     send_resolved: true
>>>>
>>>> Any more than that, and it depends on your business requirements.  Do 
>>>> you want all alerts with severity "warning" to be treated differently?
>>>> Use a routing rule (in the "routes" section under "route").  Do you want a 
>>>> certain subset of targets to be handled by a particular team? Then either 
>>>> add a label in the alerting rules themselves, or ensure that those targets 
>>>> already have a particular label in their scrape config, and match that 
>>>> label in the "routes" section.
>>>>
>>>> On Friday, 3 September 2021 at 08:20:49 UTC+1 [email protected] wrote:
>>>>
>>>>> It's clear that the config
>>>>> - service=~"mysql|cassandra"
>>>>> does not match the rule.
>>>>> This was just an example.
>>>>>
>>>>> But this question is still open:
>>>>> What must the alertmanager config be for this rule?
>>>>> groups:
>>>>> - name: general.rules
>>>>>   rules:
>>>>>   - alert: TargetDown
>>>>>     annotations:
>>>>>       message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.instance
>>>>>         }} instances are down.'
>>>>>     expr: 100 * (count(up == 0) BY (job, instance) / count(up) BY (job,
>>>>>       instance)) > 10
>>>>>     for: 10m
>>>>>     labels:
>>>>>       severity: warning
>>>>>
>>>>> On Thursday, 2 September 2021 at 19:18:37 UTC+2 Brian Candler wrote:
>>>>>
>>>>>> Remove the match on service=~"mysql|cassandra" in your routing rule.
>>>>>>
>>>>>> I'm not saying with 100% certainty that your alert *doesn't* have a
>>>>>> service=xxx label; it's possible that it was added via other means, such
>>>>>> as external_labels or alert_relabel_configs.  If you go into the prometheus
>>>>>> or alertmanager web interface, you can see active alerts and their labels,
>>>>>> so you'll know what you have.
>>>>>>
>>>>>> There was a nice web-based interface for testing alertmanager routing trees here:
>>>>>> https://prometheus.io/webtools/alerting/routing-tree-editor/
>>>>>> but it doesn't seem to work properly any more.
>>>>>>
>>>>>> On Thursday, 2 September 2021 at 15:48:57 UTC+1 [email protected] 
>>>>>> wrote:
>>>>>>
>>>>>>> What should be the configuration in alertmanager.yml to match to the 
>>>>>>> rule?
>>>>>>>
>>>>>>> On Thursday, 2 September 2021 at 15:22:55 UTC+2 Brian Candler wrote:
>>>>>>>
>>>>>>>> Correct, that expression will only give "job" and "instance" labels.
>>>>>>>>
>>>>>>>> I don't think your alertmanager rule will ever match on this alert.
>>>>>>>>
>>>>>>>> On Thursday, 2 September 2021 at 14:05:22 UTC+1 [email protected] 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I have defined several rule files, e.g. this general.rules.yml:
>>>>>>>>> groups:
>>>>>>>>> - name: general.rules
>>>>>>>>>   rules:
>>>>>>>>>   - alert: TargetDown
>>>>>>>>>     annotations:
>>>>>>>>>       message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.instance
>>>>>>>>>         }} instances are down.'
>>>>>>>>>     expr: 100 * (count(up == 0) BY (job, instance) / count(up) BY (job,
>>>>>>>>>       instance)) > 10
>>>>>>>>>     for: 10m
>>>>>>>>>     labels:
>>>>>>>>>       severity: warning
>>>>>>>>>
>>>>>>>>> However, I don't see the correlation to service.
>>>>>>>>>
>>>>>>>>> On Thursday, 2 September 2021 at 13:58:11 UTC+2 Brian Candler wrote:
>>>>>>>>>
>>>>>>>>>> It looks like "service" is a label that you have set in the 
>>>>>>>>>> prometheus alerting rule.
>>>>>>>>>>
>>>>>>>>>> On Thursday, 2 September 2021 at 11:52:20 UTC+1 [email protected] 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> can you please advise what is represented by a service in 
>>>>>>>>>>> alertmanager configuration, e.g.
>>>>>>>>>>> routes:
>>>>>>>>>>>   # All alerts with service=mysql or service=cassandra
>>>>>>>>>>>   # are dispatched to the database pager.
>>>>>>>>>>>   - receiver: 'database-pager'
>>>>>>>>>>>     group_wait: 10s
>>>>>>>>>>>     matchers:
>>>>>>>>>>>     - service=~"mysql|cassandra"
>>>>>>>>>>>
>>>>>>>>>>> Where do I find the service in the rules or in Prometheus -> 
>>>>>>>>>>> Alerts?
>>>>>>>>>>>
>>>>>>>>>>> THX
>>>>>>>>>>>
>>>>>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/35384490-71ed-4872-bd16-dea5d17ff450n%40googlegroups.com.
