Hello, did you find a solution? I am facing the same issue: the new ALERTS
labels don't match, even though they are completely static, so templating
should not be a factor.
tags: missing labels, missing prometheus alert labels, missing static
labels, prometheus labels don't match
On Friday, February 8, 2019 at 7:19:07 PM UTC+1 Moses Moore wrote:
> - can you share your rule files ?
> 'fraid not, this is a production machine. If I diff the before/after of
> the alerts2.yml file, the only change is the "rulefile: alerts2.yml" bit in
> the "labels:" block of that one "procMissing" alert rule.
>
> - anything special in the logs?
>
> Nope, and we're running with log.level=debug. I need to look again, because:
>
> It's a moot point now:
> `ALERTS{alertname="procMissing",alertstate="firing",rulefile=""}` returns
> zero results, versus 62 results yesterday. I made the alert rule change a
> week ago, and Prometheus hasn't been restarted in the last five days. Maybe
> it takes 7d for ALERTS{alertstate="firing"} series to die of old age if they
> aren't regenerated?
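>
> One way to check whether those series are still being written, as opposed
> to merely being surfaced by the default 5-minute query lookback window,
> might be a query along these lines:
>
>     timestamp(ALERTS{alertname="procMissing", alertstate="firing", rulefile=""})
>
> which returns the raw timestamp of the most recent sample for each
> matching series, so stale-but-visible series should show timestamps that
> stop advancing.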
>
>
> Thanks for asking. If I can reproduce it in a cleanroom, I'll mention it
> again on the list.
>
> On Monday, 4 February 2019 11:43:14 UTC-5, Moses Moore wrote:
>
>> I'm using Prometheus 2.6.0 and Go 1.11.
>>
>> After changing an alert rule to have more labels and restarting
>> Prometheus, I have ALERTS{} metrics that match the older alert rule that
>> are still firing days later.
>>
>> old alert rule:
>>
>>   alert: procMissing
>>   expr: namedprocess_namegroup_num_procs{environment!="alpha",groupname!~"(cron|master|rsyslogd|snmpd|sshd)"} < 1
>>   for: 2m
>>   labels: { env: '{{$labels.environment}}', region: '{{$labels.region}}', severity: critical }
>>   annotations:
>>     summary: "{{$labels.groupname}} not running on {{$labels.node}}.{{$labels.ip}}"
>>     description: "num_procs for groupname {{$labels.groupname}} < 1 for 2 minutes"
>>
>>
>> old alert metric:
>>
>> ALERTS{alertname="procMissing",alertstate="firing",env="beta",groupname="BusinessServer",ip="[redacted]",job="process-exporter",node="[redacted]",region="us-east-1",severity="critical"}
>>
>> new alert rule:
>>
>>   alert: procMissing
>>   expr: namedprocess_namegroup_num_procs{environment!="alpha",groupname!~"(cron|master|rsyslogd|snmpd|sshd)",groupname!~"Business.*"} < 1
>>   for: 2m
>>   labels: { env: '{{$labels.environment}}', region: '{{$labels.region}}', rulefile: alerts2.yml, severity: critical }
>>   annotations:
>>     summary: "{{$labels.groupname}} not running on {{$labels.node}}.{{$labels.ip}}"
>>     description: "num_procs for groupname {{$labels.groupname}} < 1 for 2 minutes"
>>
>>
>> After updating alerts2.yml and restarting Prometheus, the old alert
>> metrics that match "Business.*" and lack the rulefile label still appear,
>> with recent timestamps dated after I made the change and restarted.
>>
>> On one hand, http://prometheus:9090/alerts says the "procMissing" alert
>> is neither firing nor pending, and http://prometheus:9090/rules shows the
>> "rulefile" label in the "procMissing" alert rule definition.
>> On the other hand,
>> http://prometheus:9090/graph?g0.expr=ALERTS%7Balertname%3D%22procMissing%22%2Calertstate%3D%22firing%22%2Crulefile%3D%22%22%7D
>> (i.e. ALERTS{alertname="procMissing",alertstate="firing",rulefile=""})
>> gives me fifty current metrics that are missing the 'rulefile' label.
>>
>> Shouldn't firing alerts match the alert rules? I'd understand if the
>> pre-change ALERTS{} series had timestamps predating the change, but these
>> have timestamps after the change.
>> Why are there current ALERTS{alertstate="firing"} metrics that aren't on
>> the /alerts page? Maybe I misunderstand the nature of ALERTS{} metrics.
>>
>>
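For what it's worth, my understanding of why the old series linger: Prometheus
identifies a time series by its complete label set, so adding a label such as
rulefile creates a brand-new series rather than relabelling the old one. A
rough sketch of that identity comparison (label values here are made up for
illustration):

```python
# A Prometheus series identity is the full, exact set of label pairs.
# Model each series as a frozenset of (name, value) pairs.
old_series = frozenset({
    "__name__": "ALERTS",
    "alertname": "procMissing",
    "alertstate": "firing",
    "env": "beta",
    "severity": "critical",
}.items())

new_series = frozenset({
    "__name__": "ALERTS",
    "alertname": "procMissing",
    "alertstate": "firing",
    "env": "beta",
    "severity": "critical",
    "rulefile": "alerts2.yml",  # the label added by the rule change
}.items())

# The identities differ, so the updated rule writes to a brand-new series;
# the old one simply stops receiving samples.
print(old_series == new_series)       # False
print(dict(new_series - old_series))  # {'rulefile': 'alerts2.yml'}
```

If that's right, the old series should stop receiving new samples after the
change; how long they stay visible in queries would then depend on staleness
handling, the lookback window, and ultimately retention.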
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/ffc0999b-125b-48af-8599-6d3a31b237a1n%40googlegroups.com.