On Saturday, 13 May 2023 at 03:26:18 UTC+1 Christoph Anton Mitterer wrote:

  (If there is jitter in the sampling time, then occasionally it might look 
at 4 or 6 samples)


Jitter in the sense that the samples are taken at slightly different times?


Yes. Each sample is timestamped with the time the scrape took place.

Consider a 5 minute window which generally contains 5 samples at 1 
minute intervals:

   |...*......*......*......*......*....|...*....

Now consider what happens when one of those samples is right on the 
boundary of the window:

   |*......*......*......*......*.......|*.......

Depending on the exact timings that the scrape takes place, it's possible 
that the first sample could fall outside:

   *|......*......*......*......*.......|*.......

Or the next sample could fall inside:

   |*......*......*......*......*......*|.......

 

Do you think that could affect the desired behaviour?


In my experience, the scraping regularity of Prometheus is very good (just 
try putting "up[5m]" into the PromQL browser and looking at the timestamps 
of the samples; they seem to increment at exact intervals).  So it's 
unlikely to happen much, though it might when the system is under high 
load, I guess.  Or it might never happen, if Prometheus records the 
timestamp of the time it *wanted* to make the scrape rather than when the 
scrape actually occurred.  Determining that would require looking at the 
source code.
 

Another point I basically don't understand... how does all that relate to 
the scrape intervals?
The plain up == 0 simply looks at the most recent sample (going back up to 
5m as you've said in the other thread).

The series up[Ns] looks back N seconds, giving whichever samples fall 
between then and now. AFAIU, it doesn't go "automatically" back any further 
(like the 5m above), right?


That's correct.
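In other words (a summary sketch; the comments are mine):

   up       # instant selector: the most recent sample per series,
            # looking back up to 5m (Prometheus's default lookback delta)
   up[2m]   # range selector: only samples whose timestamps fall within
            # the last 2 minutes; no extra lookback is applied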

So if you're trying to make mutually exclusive expressions which fire in 
case A but not B, and in case B but not A, then you'd probably be better 
off writing them both to use up[5m].

min_over_time(up[5m]) == 0    # for the main alert, instead of "up == 0" with "for: 5m"
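
As a sketch of how such a mutually exclusive pair could look as alerting 
rules (the rule names and the exact case split are my own illustration, not 
a prescription):

   groups:
     - name: instance-down        # hypothetical group name
       rules:
         - alert: DownForFullWindow
           # every sample of "up" in the last 5m was 0
           expr: max_over_time(up[5m]) == 0
         - alert: DownWithinWindow
           # "up" dipped to 0 at some point in the last 5m,
           # but was 1 at some other point in the same window
           expr: min_over_time(up[5m]) == 0 and max_over_time(up[5m]) == 1

Since both expressions read the same up[5m] range, a given evaluation sees 
one consistent set of samples, so at most one of the two alerts can fire.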

 


In order for the for: to work I need at least two samples


No, you just need two rule evaluations. The rule evaluation interval 
doesn't have to be the same as the scrape interval, and even if they are 
the same, they are not synchronized.
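
Both intervals are set independently in prometheus.yml; a minimal sketch 
(values purely illustrative):

   global:
     scrape_interval: 1m         # how often targets are scraped
     evaluation_interval: 15s    # how often rule groups are evaluated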


If what I've written above is correct (and it may well not be!), then

expr: up == 0
for: 5m

will fire if "up" is zero for 6 cycles, whereas


(*rule evaluation* cycles, if your rule evaluation interval is 1m)
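
To make the counting concrete, assuming a 1m evaluation interval and "up" 
staying at 0 throughout:

   evaluation:   t+0m     t+1m     t+2m     t+3m     t+4m     t+5m
   alert state:  pending  pending  pending  pending  pending  firing

The alert enters "pending" at the first evaluation where the expression is 
true, and only moves to "firing" once the condition has held for the full 
5m: six consecutive evaluations.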
 


As far as I understand you... 6 cycles of the rule evaluation interval... 
with at least two samples within that interval, right?


No.  The expression "up" is evaluated at each rule evaluation time, and it 
gives the most recent value of "up", looking back up to 5 minutes.

So if you had a scrape interval of 2 minutes, with a rule evaluation 
interval of 1 minute it could be that two rule evaluations of "up" see the 
same scraped value.

(This can also happen in real life with a 1 minute scrape interval, if you 
have a failed scrape)
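
As a picture, in the same style as before (S = scrape every 2m, E = rule 
evaluation every 1m):

   scrapes:      S...........S...........S...........
   evaluations:  E.....E.....E.....E.....E.....E.....

Each E reads the most recent S to its left (within the 5m lookback), so 
the two evaluations between consecutive scrapes return the same sample.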

 

Once an alert fires (in prometheus), even if just for one evaluation 
interval cycle... and there is no inhibition rule or the like in 
alertmanager... is it expected that a notification is sent out for sure... 
regardless of alertmanager's grouping settings?


There is group_wait. If the alert were to trigger and clear within the 
group_wait interval, I'd expect no notification to be sent. But I've not 
tested that.
 

Like when the alert fires for one short 15s evaluation interval and clears 
again afterwards... but group_wait: is set to some 7d... is it expected 
to send that single firing event after 7d, even if it has already resolved 
once the 7d are over and there was e.g. no further firing in between?


You'll need to test it, but my expectation would be that it wouldn't send 
*anything* for 7 days (while it waits for other similar alerts to appear), 
and if all alerts have disappeared within that period, that nothing would 
be sent.  However, I don't know if the 7 day clock resets as soon as all 
alerts go away, or it continues to tick.  If this matters to you, then test 
it.

Nobody in their right mind would use 7d for group_wait, of course.  
Typically you might set it to around a minute, so that if a bunch of 
similar alerts fire within that 1 minute period, they are gathered together 
into a single notification rather than a slew of separate notifications.
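
For reference, that kind of setup might look like this in alertmanager.yml 
(the receiver name and grouping labels are placeholders):

   route:
     receiver: default
     group_by: ['alertname', 'job']
     group_wait: 1m        # gather similar alerts for up to 1m before the first notification
     group_interval: 5m    # wait before notifying about alerts added to an existing group
     repeat_interval: 4h   # wait before re-sending a still-firing notification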

HTH,

Brian.
