I am using a Prometheus recording rule to capture CPU usage averaged over 
5 minutes, and I later want to use it to get the maximum 5-minute CPU usage 
over the last 30 days.

Recording Rule

- expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
  record: instance:node_cpu_usage:rate5m


Maximum 5-minute CPU usage over the last 30 days

max_over_time(instance:node_cpu_usage:rate5m[30d])

 
Now, imagine that Prometheus starts pulling data at 00:00:00 with an 
interval of 15 seconds, so:

   1. 1st pull was at 00:00:00. Here, instance:node_cpu_usage:rate5m is 
   not calculated, since the rate function needs a minimum of 2 data points.
   2. 2nd pull was at 00:00:15. *Here, instance:node_cpu_usage:rate5m 
   is calculated as (2nd pull - 1st pull) divided by 300 seconds. This is 
   the issue: why does the rate function divide by 300 instead of 15?*
   3. 3rd pull was at 00:00:30. The same thing happens again.
   4. Wrong data keeps being saved in instance:node_cpu_usage:rate5m until 
   the 5th minute.
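
To make the arithmetic in the steps above concrete, here is a rough Python 
sketch of the scenario. It is a simplified model of what I am describing, not 
Prometheus's actual rate() implementation (which, as far as I understand, also 
extrapolates towards the window edges). The 90% idle figure and the helper 
names naive_usage, span_usage and window_at are made up for illustration.

```python
# Simplified model of the scenario above. This is NOT Prometheus's real
# rate() code; it only reproduces the "divide by 300 vs. divide by 15" effect.

WINDOW_SECONDS = 300   # the [5m] range used in the recording rule
SCRAPE_INTERVAL = 15   # pull interval from the scenario
IDLE_FRACTION = 0.9    # assumption: the CPU is really 90% idle (10% used)

# Idle counter pulled every 15 s starting at 00:00:00: (timestamp, idle seconds)
samples = [(t, IDLE_FRACTION * t) for t in range(0, 601, SCRAPE_INTERVAL)]


def window_at(now):
    """Samples that fall inside the 5-minute window ending at `now`."""
    return [s for s in samples if now - WINDOW_SECONDS <= s[0] <= now]


def naive_usage(window):
    """1 - (last - first) / 300: always divides by the full window length."""
    if len(window) < 2:
        return None  # a rate needs at least two data points
    return 1 - (window[-1][1] - window[0][1]) / WINDOW_SECONDS


def span_usage(window):
    """1 - (last - first) / (time actually covered by the samples)."""
    if len(window) < 2:
        return None
    elapsed = window[-1][0] - window[0][0]
    return 1 - (window[-1][1] - window[0][1]) / elapsed


second_pull = window_at(15)
print(naive_usage(second_pull))     # roughly 0.955, far above the real 10%
print(span_usage(second_pull))      # roughly 0.1, the real usage
print(naive_usage(window_at(300)))  # roughly 0.1 once the window is full

# A maximum over all recorded values is dominated by the early outlier.
recorded = [naive_usage(window_at(t)) for t in range(15, 601, SCRAPE_INTERVAL)]
print(max(recorded))                # roughly 0.955
```

In this model the recorded value only matches the real usage once the window 
holds a full 5 minutes of samples, so every value before that is inflated.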

[image: prometheus_rate_less_data.png]


*So max_over_time(instance:node_cpu_usage:rate5m[30d]) will almost 
certainly return one of these first values, which are wrong.*
Also, if a server crashes or there is a network issue, the rate gives wrong 
data in the same way.

How can I avoid these outliers?
