I don't know how GCP calculates its CPU metrics, but node_cpu_seconds_total appears to contain separate statistics for the user, kernel, interrupt, and other modes. Maybe you could graph each of those modes separately and see whether one is much higher than the rest (https://www.robustperception.io/understanding-machine-cpu-usage).
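Something along these lines might work as a starting point for the per-mode graph. It is only a sketch: I'm assuming the job label is "node_exporter" as in your recording rule and that a 5m window is acceptable, so adjust both to match your setup.

```promql
# Fraction of CPU time spent in each non-idle mode, averaged over all cores of each instance.
# Assumes job="node_exporter" (taken from the rule quoted below) and a 5m rate window.
avg by (instance, mode) (
  rate(node_cpu_seconds_total{job="node_exporter", mode!="idle"}[5m])
)
```

If one mode (for example iowait or steal) dominates on the affected node only, that could explain why the idle-based rule fires while the GCP graph looks calm.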
On Thu, Jul 8, 2021 at 2:54 PM James S <[email protected]> wrote:
>
> I have changed the query to
> sum(rate(node_cpu_seconds_total{mode!="idle"} [5m])) by (node) /
> sum(kube_node_status_capacity_cpu_cores) by node
>
> But the result is the same. my problem is not fixed
>
> On Thursday, July 8, 2021 at 12:49:54 PM UTC-4 James S wrote:
>
>> GCP monitoring CPU usage for the node
>> [image: 78903591-5A30-42EC-934D-ED4065F3B46B.png]
>> On Thursday, July 8, 2021 at 12:48:05 PM UTC-4 James S wrote:
>>
>>> It is 4 CPU machine
>>> the Grafana graph:
>>> [image: DAD8018C-0081-4856-8D89-C4700BB65F23.png]
>>> GCP monitoring:
>>> [image: 78903591-5A30-42EC-934D-ED4065F3B46B.png]
>>> On Thursday, July 8, 2021 at 12:13:23 PM UTC-4 Stuart Clark wrote:
>>>
>>>> On 2021-07-08 16:07, James S wrote:
>>>> > We do not see any stress on the cluster and we do not see this in GCP
>>>> > cloud monitoring this behavior.
>>>> >
>>>>
>>>> What does the graph of the metric look like?
>>>>
>>>> Is this a single or multiple CPU machine?
>>>>
>>>> > On Thursday, July 8, 2021 at 9:50:37 AM UTC-4 Stuart Clark wrote:
>>>> >
>>>> >> On 08/07/2021 14:31, James S wrote:
>>>> >>> We are getting False positive for only one node all the time. we do
>>>> >>> not have this issue with other nodes
>>>> >>>
>>>> >>> we have the rule configured for the CPU usage was
>>>> >>>
>>>> >>> alert:NodeCPUUtilWar
>>>> >>> expr: instance:node_cpu_utilisation:rate1m > 0.8
>>>> >>> for: 5m
>>>> >>>
>>>> >>> record: instance:node_cpu_utilisation:rate1m
>>>> >>> - expr:
>>>> >>> 1 - avg without (cpu, mode)
>>>> >>> (rate(node_cpu-seconds_total{job="node_exporter", mode ="idle"} [1m]))
>>>> >>>
>>>> >> What makes you say it is a false positive? What does the graph of
>>>> >> that metric show?
>>>> >>
>>>>
>>>> --
>>>> Stuart Clark
>>>>
>>> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/7c0120dd-6e8c-4e55-9f46-a97d0d176229n%40googlegroups.com.
>

--
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/CAOAKi8wDygYLbp-vyHRTq3a6A1OVxA5D0vd6eGJakUSd__N3Gg%40mail.gmail.com.

