[prometheus-users] Re: Monotonously increasing cost calculation over a selected duration of time in grafan/prometheus

Brian Candler Tue, 02 Jun 2020 08:40:27 -0700

sum() works across multiple timeseries at a given instant.

However, the problem seems to be that your metrics are different.  
"container_cpu_usage_seconds_total" is almost certainly a counter which 
increases monotonically, but "container_memory_usage_bytes" is almost 
certainly a gauge which goes up and down.

I suggest you start by graphing the two separately, to get a feel for how
they look.

The question most people want to answer is "how much resource did I use
over the previous 3 months"?

For the counter, to get the increase over 3 months you can use the
increase() function with a range vector:

increase(container_cpu_usage_seconds_total[90d])

Or more simply, just subtract the values now and 90 days previously: e.g.

container_cpu_usage_seconds_total - container_cpu_usage_seconds_total offset 90d

- but that will give wrong results if the counter has reset to zero during
that time, so increase() is strongly recommended.
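
If the period should follow the time range picked in a Grafana dashboard
rather than a fixed 90d, Grafana's built-in $__range variable can stand in
for the window. This is only a sketch: it assumes a Prometheus data source
and a panel query set to run as an instant query, so that a single value is
returned for the selected range.

increase(container_cpu_usage_seconds_total[$__range])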

In the Prometheus UI, the "Console" view (the table you get after clicking
"Execute") shows the value as of right now, i.e. the increase between 90
days ago and now. If you graph the expression, the same 90-day window is
evaluated at every point along the time axis. So the value shown for 7 days
ago won't be how much you were using 7 days ago, but how much you used over
the period from 97 days ago to 7 days ago.

If you want the graph to show how much resource you were using *at that
instant*, then you use rate() on the counter.
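
As a sketch, with an assumed 5-minute window (pick one that covers at least
a few scrape intervals):

rate(container_cpu_usage_seconds_total[5m])

The result is in CPU-seconds per second, i.e. the average number of CPU
cores in use over each 5-minute window.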

For the container memory usage you probably want to use avg_over_time()
with a range vector, e.g.

avg_over_time(container_memory_usage_bytes[90d])

Again, at a given point in time, this will show the average memory usage
over the previous 90-day period. If you want the instantaneous usage, then
the bare metric (container_memory_usage_bytes) is what you want.

If you want the graph to show the *cumulative* usage of resource up to and
including that time, then container_cpu_usage_seconds_total is already
cumulative - although it starts from an arbitrary offset, and it may reset
to zero at inopportune times.

*Cumulative* usage of memory is more awkward - are you saying you want a
metered usage measured in GB-seconds? You would need to integrate the
value, i.e. the inverse of deriv(), and I don't know how to do that with
Prometheus.
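
One rough approximation, offered as a sketch rather than a proper integral:
the average over a window multiplied by the length of the window in seconds
gives byte-seconds, so for 90 days:

avg_over_time(container_memory_usage_bytes[90d]) * 90 * 24 * 3600

Divide by 1e9 for GB-seconds. This assumes the series was scraped at regular
intervals and existed for the whole 90 days; for a container that only lived
for part of the window, the average covers only the samples that exist, so
the product overstates its usage.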

Note: I have not bothered summing across series - with the examples above
you'll get as many result timeseries as you have input timeseries. You can
add sum(...) or sum by (labels) (...) across those expressions as
required. e.g.

sum by (namespace,pod) (...)
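
For example, the total CPU seconds used per pod over the last 90 days
(assuming your series carry namespace and pod labels, as the cAdvisor
metrics under Kubernetes normally do):

sum by (namespace,pod) (increase(container_cpu_usage_seconds_total[90d]))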

Note that for cpu_usage_seconds_total you may want to filter out "idle" CPU
seconds if they are given as a separate metric, otherwise summing across
all the dimensions will always add up to 100%. Ditto for memory and "free"
or "buffer/cache".
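
As an illustration with a different metric: if you also run node_exporter,
its node_cpu_seconds_total counter has a "mode" label which includes "idle",
and a label matcher can drop it before summing. The metric and label names
here come from node_exporter, not from your container metrics, so treat this
as a sketch:

sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))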
