I am using a Prometheus recording rule to capture CPU usage over 5 minutes,
and later want to use it to get the maximum CPU usage for any 5-minute window
in the last 30 days.
Recording Rule
- expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
  record: instance:node_cpu_usage:rate5m
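The rule computes, per instance, one minus the average idle rate across that instance's CPUs. A minimal Python sketch of that arithmetic, assuming the per-CPU idle rates (idle seconds per second) have already been computed; the function name `cpu_usage` and the sample numbers are hypothetical:

```python
from statistics import mean


def cpu_usage(idle_rates_per_cpu: list[float]) -> float:
    """Single-instance version of
    1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])).

    Each entry is one CPU's idle seconds per second (0.0 to 1.0).
    """
    return 1 - mean(idle_rates_per_cpu)


# Hypothetical 4-CPU instance whose CPUs are idle 90%, 80%, 70% and 60% of the time.
print(cpu_usage([0.9, 0.8, 0.7, 0.6]))  # roughly 0.25, i.e. about 25% busy
```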
Maximum 5-minute CPU usage in the last 30 days
max_over_time(instance:node_cpu_usage:rate5m[30d])
Now, imagine Prometheus starts scraping with a 15-second interval, beginning
at 00:00:00:
1. The 1st scrape was at 00:00:00. Here, instance:node_cpu_usage:rate5m is
not calculated, since the rate function needs at least 2 data points.
2. The 2nd scrape was at 00:00:15. *Here, instance:node_cpu_usage:rate5m
is calculated as (2nd scrape value - 1st scrape value) divided by 300
seconds. This is the issue: why does the rate function divide by 300
instead of 15?*
3. The 3rd scrape was at 00:00:30. Again, the same thing happens.
4. Incorrect values keep being recorded in instance:node_cpu_usage:rate5m
until the 5th minute.
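The arithmetic in step 2 can be reproduced in a few lines. This is a minimal Python sketch of the simplified model described above (raw difference between two scrapes divided by the full 300 s window); real Prometheus rate() also extrapolates toward the edges of the range, which this sketch deliberately ignores. The helper `naive_rate` and the counter values are hypothetical: one nearly idle CPU whose idle counter grows by 14.7 s during the 15 s between scrapes.

```python
WINDOW_SECONDS = 300   # the [5m] range in the recording rule
SCRAPE_INTERVAL = 15   # seconds between the 1st and 2nd scrape


def naive_rate(first: float, last: float, divisor: float) -> float:
    """Counter increase between two samples, divided by `divisor` seconds."""
    return (last - first) / divisor


# Hypothetical idle-seconds counter at 00:00:00 and 00:00:15 for one nearly idle CPU.
idle_t0, idle_t15 = 1000.0, 1014.7

over_window = naive_rate(idle_t0, idle_t15, WINDOW_SECONDS)    # model from step 2
over_elapsed = naive_rate(idle_t0, idle_t15, SCRAPE_INTERVAL)  # what was expected

print(f"CPU usage, divided by 300 s: {1 - over_window:.3f}")   # 0.951
print(f"CPU usage, divided by 15 s:  {1 - over_elapsed:.3f}")  # 0.020
```

In this model the 300 s divisor makes the idle rate look tiny, so `1 - idle` comes out near 1 (about 95% busy) even though the CPU is almost idle.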
[image: prometheus_rate_less_data.png]
*So max_over_time(instance:node_cpu_usage:rate5m[30d]) will certainly
return that first value, which is wrong.*
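A minimal Python sketch of why one early sample dominates: max_over_time simply returns the largest stored value in the range. The recorded samples below are hypothetical, and the filtered call (dropping samples taken before the first full 5-minute window) only illustrates how much the start-up value skews the result; it is not a PromQL recipe.

```python
WINDOW_SECONDS = 300  # samples before this are computed from an incomplete [5m] window

# Hypothetical recorded instance:node_cpu_usage:rate5m samples as
# (seconds since the first scrape, value) pairs.
recorded: list[tuple[int, float]] = [
    (15, 0.951),     # computed from only two scrapes
    (30, 0.90),
    (300, 0.12),
    (43_200, 0.35),  # the genuinely busiest 5 minutes
    (86_385, 0.08),
]


def max_value(samples: list[tuple[int, float]]) -> float:
    """Largest value in the range, which is what max_over_time returns."""
    return max(value for _, value in samples)


print(max_value(recorded))  # 0.951, the start-up outlier
print(max_value([s for s in recorded if s[0] >= WINDOW_SECONDS]))  # 0.35
```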
Also, if there is a server crash or a network issue, rate gives wrong data
as well.
How can I overcome this outlier?
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/995498ad-5ac2-4b97-a53f-0ae3f1e97c18%40googlegroups.com.