[prometheus-users] Re: False Positive Alerts with wmi_cpu_time_total

'Vitalii Ludanenkov' via Prometheus Users Sun, 20 Feb 2022 11:42:17 -0800

I already attached screenshots of the rule and of the actual query results 
(that screenshot omits the "> 20", because with it the query doesn't show 
anything). The threshold is 20%. The graph never reaches it, yet the alert 
still fires. 
On Sunday, February 20, 2022 at 6:56:44 PM UTC+2 Brian Candler wrote:


> As far as I can see, you haven't shown your actual alerting rule.
>
> However, it's straightforward to debug this: paste your entire alerting 
> "expr" into the PromQL query interface.  Wherever the line is present, it 
> means an alert will fire.  You can then work backwards from that to find 
> the problem with your expr.
>
> For example, say you have this rule:
>     expr: avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) < 0.8
>
> Paste exactly
>     avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) < 0.8
> into the PromQL browser to see if and when it fires.
>
> In PromQL, the expression "foo" generates a vector: the set of all 
> timeseries whose metric name is "foo".  Then "foo < 0.8" is a filter, not a 
> boolean.  It filters the vector to only those whose value is less than 
> 0.8.  When used as an alerting expression, you get an alert if the vector 
> is not empty.
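>
> A minimal sketch of the difference, reusing the example expression from 
> above (it is only illustrative, not the rule from the screenshots). The 
> "bool" modifier is standard PromQL and turns the comparison into a 0/1 
> result instead of a filter:
>
>     # Filter: returns only the instances whose value is below 0.8.
>     # Empty result = no alert; each returned series is one alert.
>     avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) < 0.8
>
>     # Boolean: returns every instance, with value 1 where the comparison
>     # holds and 0 where it does not. Useful for graphing, but as an
>     # alerting expr it would fire for every instance, since the vector
>     # is never empty.
>     avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) < bool 0.8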
>
> On Sunday, 20 February 2022 at 16:38:10 UTC [email protected] 
> wrote:
>
>> Hello everybody. 
>> We are facing some issues with CPU monitoring.
>> Our graphs never show the threshold being reached even once, let alone 
>> for 3m.
>> All info and screenshots are below.
>> The alert is configured to fire at 20%. It relates only to the blue graph.
>>
>> [image: Screenshot 2022-02-18 133538.png]
>>
>> [image: Screenshot 2022-02-18 133642.png]
>>
>> Prometheus creates a massive number of alerts in our Opsgenie. There are 
>> no issues with other alerts, or even with a threshold of 60%.
>> [image: Screenshot 2022-02-18 133820.png]
>>
>> Alert query:
>>
>> [image: Screenshot 2022-02-18 134142.png]
>>
>> Maybe you have some suggestions on what could cause the flapping and 
>> trigger the alert? 
>> We have already checked the graphs at 1, 2, 5, and 10 minute intervals, 
>> by the hour, and so on; nothing there should result in an alert.
>> Also, there are no such alerts from CloudWatch monitoring.
>>
>>
>>

