[prometheus-users] Re: Measuring occurrences where a certain threshold was exceeded

Brian Candler Thu, 26 Mar 2020 11:08:21 -0700

On Thursday, 26 March 2020 17:47:04 UTC, Chris Featherstone wrote:
>
> I have this query
>
> quantile_over_time(0.90, 
> replication_read_duration_seconds{job="heartbeat-read"}[5m]) < .005 != bool 
> 1
>


Any value which passes the "< .005" filter is necessarily not equal to 1, 
so this will always return 1 or nothing.

It sounds like what you're trying to do here is:

quantile_over_time(0.90, 
replication_read_duration_seconds{job="heartbeat-read"}[5m]) < bool .005
 
which will return 0 or 1.

But I don't think this will solve your problem very well:


> I am trying to measure the times when my duration is greater than 5ms and 
> then report a percentage. Something like: Over the last 5 minutes, 99.9% of 
> requests were below 5ms.
>

replication_read_duration_seconds is a gauge? How often does it change?

If you want to report that 999 in 1000 requests were below 5ms, then you 
need at least 1000 samples, and if that's over a 5 minute period you must 
be scraping more than 3 times per second (1000 samples / 300 seconds is 
about 3.3 per second).  That's not really how Prometheus is supposed to be 
used.

It sounds like what you really want is to collect these events in a 
histogram <https://prometheus.io/docs/concepts/metric_types/#histogram>, 
then report on the histogram.  But that means changing how you collect the 
data in the first place.

As a simple way to think about a histogram, imagine you have two counters:
- A counts the total events
- B counts only the events with latency < 5ms

If you take the increase of B over 5 minutes, divided by the increase in A 
over 5 minutes, that gives you the fraction you're looking for.
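
As a rough sketch of that ratio in PromQL: this assumes the metric has been 
re-instrumented as a histogram under the same name, so that the _bucket and 
_count series exist, and that one of its bucket boundaries is exactly 0.005. 
Neither is true of the current setup, so treat the series names and the "le" 
value as placeholders.  Note also that histogram buckets count observations 
less than or equal to the boundary, not strictly less than.

```promql
# Fraction of events in the last 5 minutes that took 5ms or less.
# B: increase of the cumulative bucket with upper bound 0.005s (assumed bucket)
# A: increase of the total observation count
  sum(increase(replication_read_duration_seconds_bucket{job="heartbeat-read", le="0.005"}[5m]))
/
  sum(increase(replication_read_duration_seconds_count{job="heartbeat-read"}[5m]))
```

The result is a fraction between 0 and 1, so multiply by 100 for a 
percentage.  If there were no events at all in the window, the division is 
0/0 and the result is NaN.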

