[prometheus-users] Re: Why don't I see gaps in instance vectors if Prometheus itself is down by < 5 Mins

Brian Candler Thu, 20 Jan 2022 08:45:55 -0800

On Thursday, 20 January 2022 at 11:45:46 UTC [email protected] wrote:

> Thanks for the explanation, I thought staleness is applicable only to 
> Prometheus Targets, haven't imagined this concept to Prometheus restarts 
> and unavailability. So, you say 'staleness' is also applied to 
> Prometheus availability.

No, I'm saying the opposite.

If prometheus fails to scrape a metric which it scraped before in the same
scrape job, it inserts a staleness marker. However if you stop and start
prometheus, then there is no staleness marker to write.

Prometheus therefore falls back to its normal default behaviour, which is
to look back up to 5 minutes for the previous valid data point.
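
To make the difference concrete, here is a rough Python sketch of the two
cases. It is not Prometheus code: the value_at helper and the STALE
sentinel are made up for illustration (real staleness markers are a
special NaN value stored in the series), and the exact handling of the
lookback boundary is an assumption.

```python
import bisect

LOOKBACK = 300    # seconds; the default 5 minute lookback
STALE = object()  # hypothetical stand-in for a staleness marker

def value_at(samples, t, lookback=LOOKBACK):
    """Return the value of a series at time t, or None for "no data".

    samples is a list of (timestamp, value) pairs sorted by timestamp.
    """
    times = [ts for ts, _ in samples]
    i = bisect.bisect_right(times, t) - 1  # most recent sample at or before t
    if i < 0:
        return None                        # nothing at or before t
    ts, v = samples[i]
    if v is STALE or t - ts > lookback:
        return None                        # series ended, or sample too old
    return v

# Target disappears: last good scrape at t=60, marker written at t=120.
target_gone = [(0, 1.0), (60, 2.0), (120, STALE)]
# Prometheus itself stopped after t=60: nothing at all is written.
prometheus_down = [(0, 1.0), (60, 2.0)]

print(value_at(target_gone, 180))      # None: the marker ends the series
print(value_at(prometheus_down, 180))  # 2.0: lookback finds the old point
print(value_at(prometheus_down, 400))  # None: older than 5 minutes
```

In the second case the series only disappears from query results once the
last point is more than the lookback period old, which is why a short
outage of prometheus itself leaves no visible gap.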

> With this approach, how do the users know the truth? Why did Prometheus
> invoke query look-back? Is it due to Prometheus Target
> unavailability/unreachability or Prometheus unavailability?

None of those. It's quite simply because time series consist of values at
particular points in time, e.g. X1 at T1, X2 at T2, X3 at T3, where Tn are
the exact times they were scraped.

When you ask for the value of a timeseries at some arbitrary time T, there
is almost certainly not going to be any data point which exists at exactly
time T (it would be extremely unlikely). Therefore, Prometheus defines the
value of a timeseries at time T to be the value of the *most recent data
point* at or before time T. But it also constrains itself to looking back
no more than 5 minutes (this is tunable) so as not to expend an unlimited
amount of effort looking for a data point hours or even years earlier.

Think about what happens when prometheus draws a graph. It samples the
timeseries at a series of steps across a time window: say at time 01:00,
01:30, 02:00, 02:30, 03:00 etc. The start/end times and the size of the
steps will be determined by your graphing software and your screen
resolution.

Now say you are scraping data points at 1 minute intervals, and points were
read in as X1 at 01:17, X2 at 02:18, X3 at 03:17.

The graph will show:
01:00 - no data (no value within the previous 5 minutes)
01:30 - value is X1
02:00 - value is X1
02:30 - value is X2
03:00 - value is X2
03:30 - value is X3
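
The table above can be reproduced with a small Python sketch. This is only
an illustration of the "most recent point within the lookback window"
rule, not how Prometheus is implemented; the range_query and mmss helpers
are hypothetical, and I'm assuming the times above are minutes:seconds.

```python
import bisect

def mmss(s):
    """Convert a "mm:ss" string to seconds."""
    m, sec = s.split(":")
    return int(m) * 60 + int(sec)

def range_query(samples, start, end, step, lookback=300):
    """Evaluate a series at start, start+step, ... up to end.

    samples is a list of (timestamp, value) pairs sorted by timestamp.
    Returns a list of (t, value), with value None where there is no data.
    """
    times = [ts for ts, _ in samples]
    out = []
    t = start
    while t <= end:
        i = bisect.bisect_right(times, t) - 1  # latest sample at or before t
        if i >= 0 and t - times[i] <= lookback:
            out.append((t, samples[i][1]))
        else:
            out.append((t, None))
        t += step
    return out

samples = [(mmss("01:17"), "X1"), (mmss("02:18"), "X2"), (mmss("03:17"), "X3")]
result = range_query(samples, mmss("01:00"), mmss("03:30"), step=30)

for t, v in result:
    label = v if v is not None else "no data"
    print(f"{t // 60:02d}:{t % 60:02d} - {label}")
```

Note that the evaluation times are chosen by the query (start, end, step)
and have nothing to do with when the points were scraped.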

Note that a timeseries has no idea of what its "scrape interval" is,
because there isn't one. Although *normally* they are scraped at *roughly*
regular intervals, nothing enforces this. You could have a scrape job
running at 1m intervals, and then switch it to 15s intervals for a while,
and then switch it back to 1m intervals. All the points will be saved in
the timeseries. But if you shut down prometheus, well, there's no way of
knowing this has occurred. There will be a larger interval between scrapes
than "normal", but as far as prometheus knows, you might just have missed a
couple of scrapes, or increased the scrape interval for a little while.
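
If you export the raw sample timestamps of a series, you can at least see
where the spacing is unusually large. A minimal sketch, assuming you
already have the timestamps as a sorted list of seconds; the find_gaps
helper and the 2.5x threshold are arbitrary choices for illustration, and
the result says nothing about *why* a gap occurred:

```python
from statistics import median

def find_gaps(timestamps, factor=2.5):
    """Return (before, after) pairs where the spacing between consecutive
    samples is more than factor times the median spacing."""
    pairs = list(zip(timestamps, timestamps[1:]))
    if not pairs:
        return []
    typical = median(b - a for a, b in pairs)
    return [(a, b) for a, b in pairs if b - a > factor * typical]

# Scrapes every 60s, with nothing recorded between t=180 and t=420.
ts = [0, 60, 120, 180, 420, 480, 540]
gaps = find_gaps(ts)
print(gaps)  # [(180, 420)]
```

A gap like this looks the same whether prometheus was down, scrapes
failed, or the interval was temporarily changed.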

--
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/41a7960e-6b98-4023-a3e5-845fd8c76024n%40googlegroups.com.
