I have the same issue with CPU usage on one node: Prometheus is firing a false positive. I checked in GCP Monitoring, and CPU usage there is not over 80%.
On Tuesday, October 2, 2018 at 10:14:38 AM UTC-4 [email protected] wrote:
> Hi,
>
> I have 2 CPU usage alerts set up in Prometheus:
>
> alert: Cpu_Usage_Greater_Than_70_Pct
> expr: cpu:usage > 70
> labels:
>   severity: warning
> annotations:
>   description: CPU Usage on these nodes is greater than 70 pct (over 5m)
>   severity: warning
>   summary: 'WARNING: CPU Usage is greater than 70 pct'
>
> alert: Cpu_Usage_Greater_Than_90_Pct
> expr: cpu:usage > 90
> labels:
>   severity: danger
> annotations:
>   description: CPU Usage on these nodes is greater than 90 pct (over 5m)
>   severity: danger
>   summary: 'DANGER: CPU Usage is greater than 90 pct'
>
> Where cpu:usage is defined as:
>
> File: recording_rules.yml; Group name: Cpu Usage Percentage (over 5m)
> -------
> record: cpu:usage
> expr: 100 * (1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) BY (instance, job)))
>
> This morning, the "cpu usage greater than 90 pct" alert fired (and was
> sent to AlertManager, which emailed several people), but the 70% one did
> not fire. Upon further investigation of the Prometheus DB (via the /graph
> GUI), I see that CPU% was never greater than even 40% on any node for
> several days. This seems to be a false positive alarm.
>
> Is there a way for me to debug the root cause?
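
One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: irate() only looks at the last two samples inside the [5m] window, so a single odd scrape can produce a brief spike, and neither alert has a "for:" clause, so one evaluation above the threshold is enough to fire. If the two alerts are evaluated at different moments, that could also explain why the 90% alert fired without the 70% one. Graphing ALERTS{alertname="Cpu_Usage_Greater_Than_90_Pct"} next to cpu:usage around the time of the email should show whether a short spike was recorded. Below is a minimal sketch of the rules with rate() and a hold time. The group name, the single-file layout, and the 5m hold time are assumptions for illustration, not taken from the original setup.

```yaml
groups:
  - name: cpu_usage_percentage   # hypothetical group name
    rules:
      - record: cpu:usage
        # rate() averages over the whole 5m window;
        # irate() would use only the last two samples in it.
        expr: 100 * (1 - avg by (instance, job) (rate(node_cpu_seconds_total{mode="idle"}[5m])))

      - alert: Cpu_Usage_Greater_Than_90_Pct
        expr: cpu:usage > 90
        # Assumed hold time: the condition must stay true for 5m before firing.
        for: 5m
        labels:
          severity: danger
        annotations:
          description: CPU Usage on these nodes is greater than 90 pct (over 5m)
          summary: 'DANGER: CPU Usage is greater than 90 pct'
```

With a "for:" clause the alert sits in the pending state first, so a one-off spike shows up in the ALERTS series as alertstate="pending" without sending a notification.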

