[prometheus-users] Re: Why don't I see gaps in instance vectors if Prometheus itself is down by < 5 Mins

[email protected] Thu, 20 Jan 2022 18:01:43 -0800

Thank you very much for the detailed explanation!

Here is what I understood; please shout if I am wrong:


   1. If Prometheus has a self-monitoring job and Prometheus itself restarts:
      a. if the restart takes more than 5 minutes, we see gaps, and no staleness
   markers are applied, because the Prometheus process itself was restarted.
      b. if the restart takes 5 minutes or less, there are no gaps in the
   graphs; Prometheus auto-fills them with the best known (last scraped) values.
   2. If a series is marked stale, Prometheus writes a NaN value (the
   staleness marker) into the TSDB for that series.
   3. Gaps in graphs mean that the target is unavailable or unreachable.

A few more questions on this subject:

   1. Is there a metric that hints at the number of stale series?
   2. How do we know whether a series has been marked stale?
   3. Is it a good idea to adjust the query look-back delta CLI flag
   (--query.lookback-delta)?
   4. Can I set a job's scrape interval to 20 minutes, given that the query
   look-back delta currently cannot be adjusted per scrape job?

On Thursday, January 20, 2022 at 5:45:49 PM UTC+1 Brian Candler wrote:

> On Thursday, 20 January 2022 at 11:45:46 UTC [email protected] wrote:
>
>> Thanks for the explanation. I thought staleness applied only to
>> Prometheus targets; I hadn't imagined the concept applying to Prometheus
>> restarts and unavailability. So you're saying that 'staleness' also
>> applies to Prometheus availability.
>
>
> No, I'm saying the opposite.
>
> If prometheus fails to scrape a metric which it scraped before in the same 
> scrape job, it inserts a staleness marker.  However if you stop and start 
> prometheus, then there is no staleness marker to write.
>
> Prometheus therefore falls back to its normal default behaviour, which is 
> to look back up to 5 minutes for the previous valid data point.
>
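A minimal Python sketch of the contrast described above, under a simplified model: samples are (timestamp_seconds, value) pairs sorted by time, the hypothetical `STALE` sentinel stands in for Prometheus's staleness marker (internally a special NaN sample), the look-back is the default 5 minutes, and window-boundary handling is simplified. It is an illustration of the idea, not Prometheus's own code.

```python
# Hypothetical sentinel standing in for Prometheus's staleness marker
# (in real Prometheus the marker is a special NaN value stored as a sample).
STALE = object()
LOOKBACK_S = 300  # default look-back of 5 minutes


def value_at(samples, t, lookback=LOOKBACK_S):
    """Value of the most recent sample at or before t, within the look-back.

    `samples` is a list of (timestamp_seconds, value) sorted by timestamp.
    Returns None if there is no such sample, or if the most recent one is a
    staleness marker (the series has explicitly ended).
    """
    latest = None
    for ts, v in samples:
        if ts > t:
            break
        latest = (ts, v)
    if latest is None or t - latest[0] > lookback:
        return None
    return None if latest[1] is STALE else latest[1]


# Target vanished: the scrape at t=180 s failed, so a staleness marker was written.
target_gone = [(0, 1.0), (60, 1.0), (120, 1.0), (180, STALE)]
# Prometheus was stopped after t=120 s and restarted at t=300 s:
# nothing was written in between, not even a staleness marker.
prom_restarted = [(0, 1.0), (60, 1.0), (120, 1.0), (300, 1.0)]

print(value_at(target_gone, 200))     # None: the marker ends the series
print(value_at(prom_restarted, 200))  # 1.0: falls back to the sample at t=120 s
```

In the restart case there is simply nothing in the stored data that says "this series ended", so the normal look-back applies.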
>> With this approach, how do users know the truth? Why did Prometheus
>> invoke the query look-back? Is it due to Prometheus target
>> unavailability/unreachability, or to Prometheus unavailability?
>
> None of those.  It's quite simply because time series consist of values at 
> particular points in time, e.g. X1 at T1, X2 at T2, X3 at T3, where Tn are
> the exact times they were scraped.
>
> When you ask for the value of a timeseries at some arbitrary time T, there 
> is almost certainly not going to be any data point which exists at exactly 
> time T (it would be extremely unlikely).  Therefore, Prometheus defines the 
> value of a timeseries at time T to be the value of the *most recent data 
> point* at or before time T.  But it also constrains itself to looking back
> no more than 5 minutes (tunable via the --query.lookback-delta flag) so as
> not to expend an unlimited amount of effort looking for a data point hours
> or even years earlier.
>
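The rule "most recent point at or before T, but no further back than the look-back" can be sketched in Python with a binary search over sorted timestamps, so the cost of a lookup stays small however much older data exists. This is an illustrative model only; `instant_value` is a hypothetical name, not part of any Prometheus API, and it does not reflect how the TSDB actually stores or searches chunks.

```python
import bisect


def instant_value(timestamps, values, t, lookback=300.0):
    """Return the value of the latest sample with timestamp <= t, provided it
    is at most `lookback` seconds old; otherwise None.

    `timestamps` must be sorted ascending and aligned with `values`.
    """
    # Index of the first timestamp strictly greater than t.
    i = bisect.bisect_right(timestamps, t)
    if i == 0:
        return None  # no sample at or before t
    if t - timestamps[i - 1] > lookback:
        return None  # the newest candidate is outside the look-back window
    return values[i - 1]


ts = [100.0, 160.0, 220.0]
vals = [5.0, 6.0, 7.0]
print(instant_value(ts, vals, 190.0))  # 6.0: latest sample at or before 190 is at 160
print(instant_value(ts, vals, 600.0))  # None: the newest sample is 380 s old
```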
> Think about what happens when prometheus draws a graph.  It samples the 
> timeseries at a series of steps across a time window: say at time 01:00, 
> 01:30, 02:00, 02:30, 03:00 etc.  The start/end times and the size of the 
> steps will be determined by your graphing software and your screen 
> resolution.
>
> Now say you are scraping data points at 1-minute intervals, and points
> were read in as X1 at 01:17, X2 at 02:18 and X3 at 03:17 (times here are
> minutes:seconds).
>
> The graph will show:
> 01:00 - no data (no value within the previous 5 minutes)
> 01:30 - value is X1
> 02:00 - value is X1
> 02:30 - value is X2
> 03:00 - value is X2
> 03:30 - value is X3
>
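The table above can be reproduced with a short Python sketch, reading all times as minutes:seconds (the only reading consistent with the 1-minute scrape interval and the 5-minute look-back). The helper names `mmss` and `value_at` are hypothetical, chosen for illustration.

```python
LOOKBACK_S = 300  # default 5-minute look-back


def mmss(s):
    """Convert a "mm:ss" string into seconds."""
    minutes, seconds = s.split(":")
    return int(minutes) * 60 + int(seconds)


# Scraped samples from the example, sorted by time.
samples = [(mmss("01:17"), "X1"), (mmss("02:18"), "X2"), (mmss("03:17"), "X3")]


def value_at(t):
    """Most recent sample at or before t within the look-back, else None."""
    best = None
    for ts, v in samples:
        if ts <= t and t - ts <= LOOKBACK_S:
            best = v  # samples are sorted, so the last match is the newest
    return best


steps = ["01:00", "01:30", "02:00", "02:30", "03:00", "03:30"]
graph = {step: value_at(mmss(step)) for step in steps}
for step, v in graph.items():
    print(step, "-", v if v is not None else "no data")
```

Each graph step picks up whichever sample was most recently scraped before it, which is why X1 appears twice and X2 appears twice.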
> Note that a timeseries has no idea of what its "scrape interval" is, 
> because there isn't one.  Although *normally* they are scraped at *roughly* 
> regular intervals, nothing enforces this.  You could have a scrape job 
> running at 1m intervals, and then switch it to 15s intervals for a while, 
> and then switch it back to 1m intervals.  All the points will be saved in 
> the timeseries.  But if you shut down prometheus, well, there's no way of
> knowing this has occurred.  There will be a larger interval between scrapes
> than "normal", but as far as prometheus knows, you might just have missed a 
> couple of scrapes, or increased the scrape interval for a little while.
>
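To tie this back to the original question, here is a small Python simulation of a Prometheus outage, under assumed parameters: one scrape per minute at 15 s past the minute, a 30 s graph step, the default 5-minute look-back, and no samples stored while Prometheus is down. All names are hypothetical. It reports which graph steps find no sample within the look-back, i.e. where a gap would be drawn.

```python
LOOKBACK_S = 300        # default 5-minute look-back
SCRAPE_INTERVAL_S = 60  # target scraped once a minute...
SCRAPE_OFFSET_S = 15    # ...at 15 s past each minute
STEP_S = 30             # graph resolution
END_S = 1800            # simulate 30 minutes


def scrape_times(down_from, down_until):
    """Timestamps of stored scrapes; none while Prometheus is down."""
    return [t for t in range(SCRAPE_OFFSET_S, END_S + 1, SCRAPE_INTERVAL_S)
            if not (down_from <= t < down_until)]


def gap_steps(timestamps):
    """Graph steps that find no sample at or before them within the look-back."""
    return [t for t in range(STEP_S, END_S + 1, STEP_S)
            if not any(0 <= t - ts <= LOOKBACK_S for ts in timestamps)]


short_gaps = gap_steps(scrape_times(down_from=600, down_until=780))   # 3-minute outage
long_gaps = gap_steps(scrape_times(down_from=600, down_until=1080))   # 8-minute outage
print(short_gaps)  # []
print(long_gaps)   # [870, 900, 930, 960, 990, 1020, 1050, 1080]
```

The 3-minute outage leaves no visible gap because every step still finds a sample less than 5 minutes old; the 8-minute outage shows a gap only for the part that extends beyond the look-back. In both cases the stored data look exactly like a few missed scrapes.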

To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/30ca6497-da0c-466c-b1d6-361dae2ee755n%40googlegroups.com.
