I have the same issue with CPU usage on one node: Prometheus is firing a
false positive.
I checked in GCP Monitoring, and CPU usage there is not over 80%.
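
One thing that may be worth checking (a sketch, not a confirmed diagnosis): the
recording rule quoted below uses irate(), which only looks at the last two
samples in the 5m window, so a single short spike can push cpu:usage over a
threshold. With no "for:" clause, the alert then fires on the first evaluation
that sees it. A variant using rate() plus a "for:" duration could look like the
following. The group names and the 5m "for:" value are my assumptions, not
taken from the original rule files.

```yaml
groups:
  - name: cpu_usage_recording   # hypothetical group name
    rules:
      - record: cpu:usage
        # rate() averages over the whole 5m window;
        # irate() uses only the last two samples in it.
        expr: 100 * (1 - avg by (instance, job) (rate(node_cpu_seconds_total{mode="idle"}[5m])))

  - name: cpu_usage_alerts      # hypothetical group name
    rules:
      - alert: Cpu_Usage_Greater_Than_90_Pct
        expr: cpu:usage > 90
        for: 5m                 # assumed value; condition must hold this long before firing
        labels:
          severity: danger
        annotations:
          summary: 'DANGER: CPU Usage is greater than 90 pct'
```

To see when the alert was actually active, it may help to graph
ALERTS{alertname="Cpu_Usage_Greater_Than_90_Pct"} next to cpu:usage over the
same time range. A short spike can also fall between graph steps, so zooming
in on the time of the alert is probably worthwhile.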

On Tuesday, October 2, 2018 at 10:14:38 AM UTC-4 [email protected] wrote:

> Hi,
>
> I have 2 cpu usage alerts set up in prometheus:
>
> alert: Cpu_Usage_Greater_Than_70_Pct
> expr: cpu:usage > 70
> labels:
>   severity: warning
> annotations:
>   description: CPU Usage on these nodes is greater than 70 pct (over 5m)
>   severity: warning
>   summary: 'WARNING: CPU Usage is greater than 70 pct'
>
>
> alert: Cpu_Usage_Greater_Than_90_Pct
> expr: cpu:usage > 90
> labels:
>   severity: danger
> annotations:
>   description: CPU Usage on these nodes is greater than 90 pct (over 5m)
>   severity: danger
>   summary: 'DANGER: CPU Usage is greater than 90 pct'
>
>
>
> Where cpu:usage is defined as:
>
> File: recording_rules.yml; Group name: Cpu Usage Percentage (over 5m)
> -------
> record: cpu:usage
> expr: 100 * (1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) BY (instance, job)))
>
>
>
>
> This morning, the "cpu usage greater than 90 pct" alert fired (and was
> sent to Alertmanager, which emailed several people), but the 70% one did not
> fire. On further investigation of the Prometheus DB (via the /graph GUI), I
> see that CPU usage was never greater than even 40% on any node for several
> days. This seems to be a false positive alarm.
>
> Is there a way for me to debug the root cause? 
>
>
>

