Presumably you are using the PromQL query browser built into Prometheus? 
(Not some third-party tool like Grafana etc.?)

When you draw a graph from time T1 to T2, you send the Prometheus API a range 
query 
<https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries> 
to repeatedly evaluate an instant vector query over a time range from T1 to 
T2 with some step S.  The step S is chosen by the client so that a suitable 
number of points fit in the display, e.g. if it wants 200 data points then 
it could choose step = (T2 - T1) / 200.  In the Prometheus graph view you 
can see this by moving your mouse left and right over the graph; a pop-up 
shows you each data point, and you can see it switch from point to point as 
you move left to right.
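
As a concrete sketch (assuming a Prometheus at localhost:9090, and made-up 
timestamps), a 12-hour graph with 200 points corresponds to a request 
roughly like this:

curl 'http://localhost:9090/api/v1/query_range' \
  --data-urlencode 'query=ALERTS{alertname="CPUUtilization"}' \
  --data-urlencode 'start=2022-08-17T00:00:00Z' \
  --data-urlencode 'end=2022-08-17T12:00:00Z' \
  --data-urlencode 'step=216s'

where step = 43200s / 200 = 216s.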

Therefore, it's showing the values of the timeseries at the instants T1, 
T1+S, T1+2S, ..., T2-S, T2.

When evaluating a timeseries at a given instant in time, it finds the 
closest value *at or before* that time (up to a maximum lookback interval, 
which by default is 5 minutes).
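
(That lookback window is configurable; if I remember correctly it's the 
--query.lookback-delta command-line flag, e.g.

prometheus --query.lookback-delta=10m

but the 5-minute default is what matters here.)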

Therefore, your graph is showing *samples* of the data in the TSDB.  If you 
zoom out too far, you may be missing "interesting" values.  For example:

TSDB :  0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0  ...
Graph:       0         0         1         0         0 ...

Counters make this less of a problem: you can get your graph to show how 
the counter has *increased* between two adjacent points (usually then 
divided by the step time, to get a rate).
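
For example, with a typical counter such as http_requests_total (name 
assumed for illustration):

rate(http_requests_total[5m])

gives the per-second rate of increase averaged over the preceding 5 
minutes, so an interesting spike inside the window still shows up even if 
the graph step skips over the individual samples.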

However, the problem with a metric like ALERTS is that it's not a counter, 
and it doesn't even switch between 0 and 1: the whole timeseries appears 
and disappears.  (In fact, it creates separate timeseries for when the 
alert is in state "pending" and "firing".)  If your graph step is more than 
5 minutes, you may not catch the alert's presence at all.
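
You can see the two states as separate series with a selector like:

ALERTS{alertname="CPUUtilization", alertstate="firing"}

(the alertstate label is either "pending" or "firing").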

What you could try is a query like this:

max_over_time(ALERTS{alertname="CPUUtilization"}[1h])

The inner query is a range vector: it returns all data points within a 1 
hour window, from 1 hour before the evaluation time up to the evaluation 
time.  Then if *any* data points exist in that window, the highest one is 
returned, forming an instant vector again.  When your graph sweeps this 
expression over a time period from T1 to T2, each data point will cover one 
hour.  That should catch the "missing" samples.

Of course, the time window is fixed to 1h in that query, and you may need 
to adjust it depending on your graph zoom level, to match the time period 
between adjacent points on the graph.  If you're using Grafana, there's a 
magic variable $__interval 
<https://grafana.com/docs/grafana/latest/variables/variable-types/global-variables/#__interval> 
you can use.  I vaguely remember seeing a proposal for PromQL to have a way 
of referring to "the current step interval" in a range vector expression, 
but I don't know what happened to that.
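
For example:

max_over_time(ALERTS{alertname="CPUUtilization"}[$__interval])

Grafana substitutes $__interval with the current graph step before sending 
the query to Prometheus, so the window always matches the spacing between 
adjacent points.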

HTH,

Brian.

On Wednesday, 17 August 2022 at 23:21:03 UTC+1 [email protected] wrote:

> I am currently looking for all CPU alerts using a query of 
> ALERTS{alertname="CPUUtilization"}
>
> I am stepping through the graph time frame one click at a time.  
>
> At the 12h time, I get one entry.  At 1d I get zero entries.  At 2d, I get 
> 4 entries but not the one I found at 12h.  I would expect to get everything 
> from 2d to now.
>
> At 1w, I get 8 entries but at 2w, I only get 5 entries.  I would expect to 
> get everything from 2w to now.
>
> Last week I ran this same query and found the alert I was looking for back 
> in April.  Today I ran the same query and I cannot find that alert from 
> April.
>
> I see this behavior in multiple Prometheus environments.
>
> Is this a problem or the way the graphing works in Prometheus?
>
> Prometheus version is 2.29.1
> Prometheus retention period is 1y
> DB is currently 1.2TB.  There are DBs as large as 5TB in other Prometheus 
> environments.
>
>
>
