On Thursday, 26 March 2020 17:47:04 UTC, Chris Featherstone wrote:
>
> I have this query
>
> quantile_over_time(0.90,
> replication_read_duration_seconds{job="heartbeat-read"}[5m]) < .005 != bool
> 1
>
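The "< .005" comparison acts as a filter: it keeps only values below 0.005.
Any value that survives is certainly not equal to 1, so "!= bool 1" will
always return 1, or nothing at all when the value was filtered out.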
It sounds like what you're trying to do here is:
quantile_over_time(0.90,
replication_read_duration_seconds{job="heartbeat-read"}[5m]) < bool .005
which will return 1 when the value is below 0.005 and 0 otherwise.
But I don't think this will solve your problem very well:
> I am trying to measure the times when my duration is greater than 5ms and
> then report a percentage. Something like: Over the last 5 minutes, 99.9% of
> requests were below 5ms.
>
Is replication_read_duration_seconds a gauge? How often does it change?
If you want to report that 999 in 1000 requests were below 5ms, then you
need at least 1000 samples, and if that's over a 5 minute (300 second)
period you must be scraping more than 3 times per second (1000 / 300 is
about 3.3). That's not really how Prometheus is supposed to be used.
It sounds like what you really want is to collect these events in a
histogram <https://prometheus.io/docs/concepts/metric_types/#histogram>,
then report on the histogram. But that means changing how you collect the
data in the first place.
As a simple way to think about a histogram, imagine you have two counters:
- A counts the total events
- B counts only the events with latency < 5ms
If you take the increase of B over 5 minutes, divided by the increase in A
over 5 minutes, that gives you the fraction you're looking for.
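In PromQL, that ratio might look roughly like the sketch below. It assumes
you have re-instrumented the metric as a histogram that keeps the name
replication_read_duration_seconds and has a bucket boundary at exactly 0.005;
the _bucket and _count series do not exist in your current setup, so treat
those names as hypothetical. Note that histogram buckets are "less than or
equal" (le), so this counts requests that took at most 5ms.

```promql
# Assumption: replication_read_duration_seconds is now a histogram
# with a bucket whose upper bound is 0.005 seconds (hypothetical).

# "B": requests that completed within 5ms over the last 5 minutes
sum(increase(replication_read_duration_seconds_bucket{job="heartbeat-read", le="0.005"}[5m]))
/
# "A": all requests over the last 5 minutes
sum(increase(replication_read_duration_seconds_count{job="heartbeat-read"}[5m]))
```

The result is a fraction between 0 and 1; multiply by 100 for a percentage.
If there were no requests at all in the window, the division is 0/0 and the
result is NaN.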