Yes, you've got it. It's easy to test your hypothesis: simply paste the
alert rule expression
100 - (avg by(instance,cluster)
       (rate(node_cpu_seconds_total{mode="idle"}[2m]))
       * 100) > 95
into the PromQL query browser in the Prometheus web interface, and you'll
see all the results, including their labels.
I believe you'll get results like
{instance="foo",cluster="bar"} 98.4
There won't be any "env" label there because you've aggregated it away.
Try using avg by(instance,cluster,env) instead.
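For example, a sketch of your rule with env kept in the aggregation (the rest
left unchanged):

  - alert: High_Cpu_Load
    expr: 100 - (avg by(instance,cluster,env) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 95
    for: 0m
    labels:
      severity: warning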
Or you could have separate alerting rules per environment, and re-apply the
label in your rule:
  expr: 100 - (avg by(instance,cluster) (rate(node_cpu_seconds_total{env="dev",mode="idle"}[2m])) * 100) > 98
  labels:
    env: dev
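As a further check (assuming you have amtool available; exact flags may vary
by version), you can ask the routing tree which receiver a given label set
would hit, something like:

  amtool config routes test --config.file=alertmanager.yml env=dev
  amtool config routes test --config.file=alertmanager.yml env=prod

The first should print 'staging' and the second 'production'; if not, the
routing config itself is the problem.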
On Monday, 22 August 2022 at 21:21:51 UTC+1 rs wrote:
> Thanks Brian, I am in the midst of setting up a slack receiver (to weed
> out the alerts going to the wrong channel). One thing I have noticed is,
> the alerts being routed incorrectly may actually have to do with a rule:
>
> - alert: High_Cpu_Load
>   expr: 100 - (avg by(instance,cluster) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 95
>   for: 0m
>   labels:
>     severity: warning
>   annotations:
>     summary: Host high CPU load (instance {{ $labels.instance }})
>     description: "CPU load is > 95%\n INSTANCE = {{ $labels.instance }}\n VALUE = %{{ $value | humanize }}\n LABELS = {{ $labels }}"
>
> I believe the issue may be that I'm not passing in 'env' into the
> expression and that is causing an issue with the alerts. Just a hunch, but
> I appreciate you pointing me in the right direction!
>
> On Monday, August 22, 2022 at 3:06:47 PM UTC-4 Brian Candler wrote:
>
>> "Looks correct but still doesn't work how I expect"
>>
>> What you've shown is a target configuration, not an alert arriving at
>> alertmanager.
>>
>> Therefore, I'm suggesting you take a divide-and-conquer approach. First,
>> work out which of your receiver routing rules is being triggered (is it the
>> 'production' receiver, or is it the 'slack' receiver?) by making them
>> different. This will point to which routing rule is or isn't being
>> triggered. And then you can work out why.
>>
>> There are all sorts of reasons it might not work, other than the config
>> you've shown. For example, if you have any target rewriting or metric
>> rewriting rules which set the env; if the exporter itself sets "env" and
>> you have honor_labels set; and so on.
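>> For illustration (a hypothetical snippet, not taken from your config), a
>> relabel rule like this attached to that scrape job would silently overwrite
>> the env label:
>>
>>   relabel_configs:
>>     - target_label: env
>>       replacement: prod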
>>
>> Hence the first part is to find out from real alert events: is the alert
>> being generated without the "dev" label? In that case alert routing is just
>> fine, and you need to work out why that label is wrong (and you're looking
>> at the prometheus side). Or is the alert actually arriving at alertmanager
>> with the "dev" label, in which case you're looking at the alertmanager side
>> to find out why it's not being routed as expected.
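>> One quick way to see exactly what alertmanager received is to query its API
>> directly (adjust host and port for your setup), e.g.:
>>
>>   curl -s http://localhost:9093/api/v2/alerts | jq '.[].labels'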
>>
>> On Monday, 22 August 2022 at 18:45:25 UTC+1 rs wrote:
>>
>>> I checked the json file and the tagging was correct. Here's an example:
>>>
>>>
>>> {
>>>   "labels": {
>>>     "cluster": "X Stage Servers",
>>>     "env": "dev"
>>>   },
>>>   "targets": [
>>>     "x:9100",
>>>     "y:9100",
>>>     "z:9100"
>>>   ]
>>> },
>>> This is being sent to the production/default channel.
>>>
>>> On Friday, August 12, 2022 at 11:29:34 AM UTC-4 Brian Candler wrote:
>>>
>>>> Firstly, I'd drop the "continue: true" lines. They are not required,
>>>> and are just going to cause confusion.
>>>>
>>>> The 'slack' and 'production' receivers are both sending to
>>>> #prod-channel. So you'll hit this if the env is not exactly "dev". I
>>>> suggest you look in detail at the alerts themselves: maybe they're tagging
>>>> with "Dev" or "dev " (with a hidden space).
>>>>
>>>> If you change the default 'slack' receiver to go to a different
>>>> channel, or use a different title/text template, it will be easier to see
>>>> if this is the problem or not.
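>>>> For example, a stripped-down route block like this, with the fallback
>>>> 'slack' receiver pointed at a separate debug channel of your choosing
>>>> (the channel name here is only a placeholder), would make any misrouted
>>>> alert obvious:
>>>>
>>>>   route:
>>>>     receiver: 'slack'   # fallback; point its channel at e.g. '#alert-debug'
>>>>     routes:
>>>>       - receiver: 'production'
>>>>         match:
>>>>           env: 'prod'
>>>>       - receiver: 'staging'
>>>>         match:
>>>>           env: 'dev'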
>>>>
>>>>
>>>> On Friday, 12 August 2022 at 09:36:22 UTC+1 rs wrote:
>>>>
>>>>> Hi everyone! I am configuring alertmanager to send outputs to a prod
>>>>> slack channel and dev slack channel. I have checked with the routing tree
>>>>> editor and everything should be working correctly.
>>>>> However, I am seeing some (not all) alerts that are tagged with 'env:
>>>>> dev' being sent to the prod slack channel. Is there some sort of old
>>>>> configuration caching happening? Is there a way to flush this out?
>>>>>
>>>>> --- Alertmanager.yml ---
>>>>> global:
>>>>>   http_config:
>>>>>     proxy_url: 'xyz'
>>>>> templates:
>>>>>   - templates/*.tmpl
>>>>> route:
>>>>>   group_by: [cluster,alertname]
>>>>>   group_wait: 10s
>>>>>   group_interval: 30m
>>>>>   repeat_interval: 24h
>>>>>   receiver: 'slack'
>>>>>   routes:
>>>>>     - receiver: 'production'
>>>>>       match:
>>>>>         env: 'prod'
>>>>>       continue: true
>>>>>     - receiver: 'staging'
>>>>>       match:
>>>>>         env: 'dev'
>>>>>       continue: true
>>>>> receivers:
>>>>>   # Fallback option - default set to production server
>>>>>   - name: 'slack'
>>>>>     slack_configs:
>>>>>       - api_url: 'api url'
>>>>>         channel: '#prod-channel'
>>>>>         send_resolved: true
>>>>>         color: '{{ template "slack.color" . }}'
>>>>>         title: '{{ template "slack.title" . }}'
>>>>>         text: '{{ template "slack.text" . }}'
>>>>>         actions:
>>>>>           - type: button
>>>>>             text: 'Query :mag:'
>>>>>             url: '{{ (index .Alerts 0).GeneratorURL }}'
>>>>>           - type: button
>>>>>             text: 'Silence :no_bell:'
>>>>>             url: '{{ template "__alert_silence_link" . }}'
>>>>>           - type: button
>>>>>             text: 'Dashboard :grafana:'
>>>>>             url: '{{ (index .Alerts 0).Annotations.dashboard }}'
>>>>>   - name: 'staging'
>>>>>     slack_configs:
>>>>>       - api_url: 'api url'
>>>>>         channel: '#staging-channel'
>>>>>         send_resolved: true
>>>>>         color: '{{ template "slack.color" . }}'
>>>>>         title: '{{ template "slack.title" . }}'
>>>>>         text: '{{ template "slack.text" . }}'
>>>>>         actions:
>>>>>           - type: button
>>>>>             text: 'Query :mag:'
>>>>>             url: '{{ (index .Alerts 0).GeneratorURL }}'
>>>>>           - type: button
>>>>>             text: 'Silence :no_bell:'
>>>>>             url: '{{ template "__alert_silence_link" . }}'
>>>>>           - type: button
>>>>>             text: 'Dashboard :grafana:'
>>>>>             url: '{{ (index .Alerts 0).Annotations.dashboard }}'
>>>>>   - name: 'production'
>>>>>     slack_configs:
>>>>>       - api_url: 'api url'
>>>>>         channel: '#prod-channel'
>>>>>         send_resolved: true
>>>>>         color: '{{ template "slack.color" . }}'
>>>>>         title: '{{ template "slack.title" . }}'
>>>>>         text: '{{ template "slack.text" . }}'
>>>>>         actions:
>>>>>           - type: button
>>>>>             text: 'Query :mag:'
>>>>>             url: '{{ (index .Alerts 0).GeneratorURL }}'
>>>>>           - type: button
>>>>>             text: 'Silence :no_bell:'
>>>>>             url: '{{ template "__alert_silence_link" . }}'
>>>>>           - type: button
>>>>>             text: 'Dashboard :grafana:'
>>>>>             url: '{{ (index .Alerts 0).Annotations.dashboard }}'
>>>>>
>>>>