sum() works across multiple timeseries at a given instant.

However, the problem seems to be that your two metrics are of different
types. "container_cpu_usage_seconds_total" is almost certainly a counter,
which increases monotonically, but "container_memory_usage_bytes" is almost
certainly a gauge, which goes up and down.

I suggest you start by graphing the two separately, to get a feel for how
they look.

The question most people want to answer is "how much resource did I use
over the previous 3 months?"

For the counter, to get the increase over 3 months you can use the 
increase() function with a range vector:

    increase(container_cpu_usage_seconds_total[90d])

Or more simply, just subtract the value 90 days ago from the value now, e.g.

    container_cpu_usage_seconds_total - container_cpu_usage_seconds_total offset 90d

but that will give wrong results if the counter has reset to zero during
that time, so increase() is strongly recommended.
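
If you want to check whether the plain subtraction would be safe for your
data, one option is the resets() function, which counts the counter resets
seen within a range. This is only a sketch; the 90d window is carried over
from the examples above:

    # number of counter resets per series over the last 90 days;
    # any value above 0 means the plain subtraction will be wrong
    resets(container_cpu_usage_seconds_total[90d])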

In the Prometheus UI, the "Console" tab (named "Table" in newer versions)
shows the value for right now (i.e. the increase between 90 days ago and
now). If you graph the expression, the 90-day window is swept over time.
So the value shown for 7 days ago won't be how much you were using 7 days
ago, but how much you used over the period from 97 days ago to 7 days ago.

If you want the graph to show how much resource you were using *at that
instant*, then use rate() on the counter over a short range vector.
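
As a sketch, assuming a 5-minute window (my choice, not something from your
setup; it should span at least a few scrape intervals):

    # per-second CPU usage, i.e. the number of cores in use,
    # averaged over the trailing 5 minutes
    rate(container_cpu_usage_seconds_total[5m])

Since the counter is in CPU-seconds, the per-second rate comes out as
"cores in use": 0.5 means half a core.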

For the container memory usage you probably want to use avg_over_time() 
with a range vector, e.g.

    avg_over_time(container_memory_usage_bytes[90d])

Again, at a given point in time, this will show the average memory usage 
over the previous 90 day period.  If you want the instantaneous usage, then 
the bare metric (container_memory_usage_bytes) is what you want.

If you want the graph to show the *cumulative* usage of resource up to and
including that time, then container_cpu_usage_seconds_total is already
cumulative, although it starts from an arbitrary offset and may reset
to zero at inopportune times.

*Cumulative* usage of memory is more awkward: are you saying you want a
metered usage measured in GB-seconds?  You would need to integrate the
value, i.e. the inverse of deriv(), and I don't know how to do that with
prometheus.
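
One rough approximation, offered as a sketch rather than a real integral:
the mean of a value over a window multiplied by the window length equals
its integral over that window, so avg_over_time() can stand in. The
division by 1e9 assumes decimal gigabytes.

    # approximate GB-seconds over the last 90 days:
    # mean bytes * seconds in 90 days (90 * 86400), scaled to GB
    avg_over_time(container_memory_usage_bytes[90d]) * 90 * 86400 / 1e9

This assumes the series was scraped at regular intervals for the whole
window. If a container only existed for part of the 90 days, the average
covers only the samples present, so multiplying by the full window will
overstate the usage.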

Note: I have not bothered summing across series - with the examples above 
you'll get as many result timeseries as you have input timeseries.  You can 
add sum(...) or sum by (labels) (...) across those expressions as 
required.  e.g.

    sum by (namespace,pod) (...)
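
Putting that together with the expressions above, a per-pod summary might
look like this (assuming your series carry "namespace" and "pod" labels):

    # total CPU seconds consumed per pod over the last 90 days
    sum by (namespace, pod) (increase(container_cpu_usage_seconds_total[90d]))

    # average memory per pod over the last 90 days, summed across containers
    sum by (namespace, pod) (avg_over_time(container_memory_usage_bytes[90d]))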

Note that for cpu_usage_seconds_total you may want to filter out "idle" CPU 
seconds if they are given as a separate metric, otherwise summing across 
all the dimensions will always add up to 100%.  Ditto for memory and "free" 
or "buffer/cache".
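
As an illustration of that kind of filtering, here is a sketch using node
exporter's node_cpu_seconds_total, which is a different metric from the
ones in your question and which I am assuming you also collect. It splits
CPU time by a "mode" label that includes "idle":

    # busy CPU seconds per host over the last 90 days, excluding idle time
    sum by (instance) (increase(node_cpu_seconds_total{mode!="idle"}[90d]))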
