Thanks Brian,

'replication_read_duration_seconds' is a gauge, and it updates every time 
this exporter is scraped by Prometheus (every 30s right now). I was trying to 
see if I could somehow make our current metrics work, but it's pretty clear 
that I need the total count (and a histogram).

The current metric only reports the duration measured at each individual 
scrape. To get anything meaningful I need to know the result of every 
attempt, kind of like two buckets: 0-5ms and 5ms-Inf.
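A minimal sketch of the two-bucket idea, written in Python purely for illustration (the exporter's actual language isn't stated here). The class name, the default 5ms threshold and the sample durations are hypothetical. The point is that observe() runs once per read attempt and the counters only ever go up, so Prometheus can scrape them at any interval without losing attempts the way a gauge does.

```python
import math


class TwoBucketHistogram:
    """Hypothetical in-process histogram with two cumulative buckets.

    Mirrors Prometheus histogram semantics: each bucket counts observations
    less than or equal to its upper bound ("le"), so the +Inf bucket always
    equals the total number of observations.
    """

    def __init__(self, threshold_seconds: float = 0.005):
        self.bounds = (threshold_seconds, math.inf)
        self.buckets = {bound: 0 for bound in self.bounds}
        self.sum = 0.0  # running total of all observed durations

    def observe(self, seconds: float) -> None:
        # Call once per read attempt, not once per scrape.
        for bound in self.bounds:
            if seconds <= bound:
                self.buckets[bound] += 1
        self.sum += seconds

    @property
    def count(self) -> int:
        return self.buckets[math.inf]


if __name__ == "__main__":
    hist = TwoBucketHistogram()
    for duration in (0.001, 0.003, 0.004, 0.007, 0.002):
        hist.observe(duration)
    print(hist.buckets[0.005], hist.count)  # 4 5
```

If the exporter happens to be written in Python, the official prometheus_client library's Histogram type accepts a buckets argument (e.g. buckets=(0.005, float("inf"))) and exposes the same cumulative _bucket, _count and _sum series.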

On Thursday, March 26, 2020 at 12:08:09 PM UTC-6, Brian Candler wrote:
>
> On Thursday, 26 March 2020 17:47:04 UTC, Chris Featherstone wrote:
>>
>> I have this query
>>
>> quantile_over_time(0.90, replication_read_duration_seconds{job="heartbeat-read"}[5m]) < .005 != bool 1
>>
>
> Obviously any value which is less than 0.005 is not equal to 1, so this 
> will always return 1 or nothing.
>
> It sounds like what you're trying to do here is:
>
> quantile_over_time(0.90, replication_read_duration_seconds{job="heartbeat-read"}[5m]) < bool .005
>
> which will return 0 or 1.
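A rough Python model of the two behaviours described above, treating an instant vector as a dict from a simplified label string to a sample value. It only illustrates the semantics; it is not how Prometheus evaluates queries, and the label strings and values are made up. It relies on PromQL's comparison operators sharing one precedence level and being left-associative, so the original query evaluates as (quantile < .005) != bool 1.

```python
from typing import Dict

# Simplified instant vector: a label-set string mapped to its sample value.
Vector = Dict[str, float]


def lt_filter(vec: Vector, threshold: float) -> Vector:
    """Like `expr < threshold`: drops elements where the comparison is false."""
    return {labels: v for labels, v in vec.items() if v < threshold}


def lt_bool(vec: Vector, threshold: float) -> Vector:
    """Like `expr < bool threshold`: keeps every element, value becomes 1 or 0."""
    return {labels: 1.0 if v < threshold else 0.0 for labels, v in vec.items()}


def ne_bool(vec: Vector, value: float) -> Vector:
    """Like `expr != bool value`: keeps every element, value becomes 1 or 0."""
    return {labels: 1.0 if v != value else 0.0 for labels, v in vec.items()}


if __name__ == "__main__":
    # Hypothetical 90th-percentile values for two instances.
    p90 = {'instance="a"': 0.003, 'instance="b"': 0.008}
    # Original query, evaluated left to right: (p90 < .005) != bool 1
    print(ne_bool(lt_filter(p90, 0.005), 1.0))  # {'instance="a"': 1.0}
    # Suggested query: p90 < bool .005
    print(lt_bool(p90, 0.005))  # {'instance="a"': 1.0, 'instance="b"': 0.0}
```

The filtered form makes instance "b" vanish instead of reporting 0, which is why the original query can only ever return 1 or nothing.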
>
> But I don't think this will solve your problem very well:
>
>
>> I am trying to measure the times when my duration is greater than 5ms and 
>> then report a percentage. Something like: Over the last 5 minutes, 99.9% of 
>> requests were below 5ms.
>>
>
> replication_read_duration_seconds is a gauge? How often does it change?
>
> If you want to report that 999 in 1000 requests were below 5ms, then you 
> need at least 1000 samples, and if that's over a 5 minute period you must 
> be scraping more than 3 times per second. That's not really how Prometheus 
> is supposed to be used.
>
> It sounds like what you really want is to collect these events in a 
> histogram <https://prometheus.io/docs/concepts/metric_types/#histogram>, 
> then report on the histogram.  But that means changing how you collect the 
> data in the first place.
>
> As a simple way to think about a histogram, imagine you have two counters:
> - A counts the total events
> - B counts only the events with latency < 5ms
>
> If you take the increase of B over 5 minutes, divided by the increase in A 
> over 5 minutes, that gives you the fraction you're looking for.
>
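The two-counter ratio described above, as plain arithmetic over two samples of each counter taken 5 minutes apart. The function name and the counter values are made up for illustration, and the sketch deliberately ignores counter resets and the window-edge extrapolation that PromQL's increase() performs.

```python
from typing import Optional


def fraction_below_threshold(
    a_start: float, a_end: float, b_start: float, b_end: float
) -> Optional[float]:
    """Return increase(B) / increase(A) between two samples, or None.

    A counts every event; B counts only events under the latency threshold.
    Assumes neither counter was reset between the two samples.
    """
    total_events = a_end - a_start
    fast_events = b_end - b_start
    if total_events <= 0:
        # No events in the window: the fraction is undefined, not 0 or 1.
        return None
    return fast_events / total_events


if __name__ == "__main__":
    # Counter A went from 12000 to 13000, counter B from 11990 to 12989.
    print(fraction_below_threshold(12000, 13000, 11990, 12989))  # 0.999
```

Assuming the gauge were replaced by a histogram named replication_read_duration_seconds (hypothetical), the PromQL counterpart would be something like sum(increase(replication_read_duration_seconds_bucket{le="0.005"}[5m])) / sum(increase(replication_read_duration_seconds_count[5m])).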

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/f6529079-531b-4827-9ad3-fc84b509c12b%40googlegroups.com.
