Thanks. While I understand the limitations of a gauge, the objective here 
is to backport existing reports to the new backend first, and to integrate 
and optimize later. We need to maintain backward compatibility for a period 
of time because of the high barrier to change in clients. The time window 
used to calculate percentiles is biweekly or spans months, so taking the 
last or average value within a 1-minute (or, in some cases, few-second) 
window is not too far-fetched, and is accepted by users. In light of this, 
is there a reasonable approach to recreate histograms/summaries from 
existing metrics within Prometheus?
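
For reference, the calculation I have in mind is a quantile taken directly over the sampled gauge values in the reporting window. Below is a minimal Python sketch of it, assuming linear interpolation between neighbouring ranks. The function name `quantile_over_samples` and the latency values are hypothetical and only for illustration.

```python
import math


def quantile_over_samples(q, samples):
    """Return the q-quantile (0 <= q <= 1) of a list of sampled gauge values.

    Uses linear interpolation between the two nearest ranks.
    Returns NaN for an empty window.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not samples:
        return math.nan
    ordered = sorted(samples)
    # Fractional rank of the requested quantile within the sorted samples.
    rank = q * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


# Hypothetical latency samples (ms), one per scrape interval.
latencies_ms = [12.0, 15.0, 11.0, 40.0, 13.0, 14.0, 90.0, 12.5, 13.5, 16.0]
p50 = quantile_over_samples(0.5, latencies_ms)
p90 = quantile_over_samples(0.9, latencies_ms)
print(p50, p90)
```

Within Prometheus, the PromQL function `quantile_over_time` applies the same idea to a range vector, for example `quantile_over_time(0.95, some_latency_gauge[14d])`, where `some_latency_gauge` is a placeholder metric name. Either way the result describes the sampled values only, not every underlying event.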



On Sunday, August 7, 2022 at 2:18:42 PM UTC-4 Stuart Clark wrote:

> On 07/08/2022 18:14, Johny wrote:
> > Gauge contains most recent values of a metric, sampled every 1 min or 
> > so, and exported by a user application, e.g. some latency sampled at 1 
> > minute intervals by a client application. Let's presume this time 
> > series (scraped by Prometheus or sent via remote write) is absolute, 
> > containing all the information we need for calculating derived 
> > statistics. In its rawest form, you can fetch the data points, sort 
> > them and calculate the percentile. Incidentally, the legacy backend has 
> > efficient mechanisms to calculate percentiles by scanning and reducing 
> > data using map-reduce.
>
> I'm presuming there is more than one request/event every minute or so?
>
> If that is the case it would mean that you can't make a histogram that 
> shows what you actually want to know. While in theory you could look at 
> the 60 samples per hour and plot those on a histogram it would be pretty 
> meaningless. If we assumed 1 request per second, sampling the latest 
> latency value every minute would mean that 59/60 events are being 
> discarded - so you have no idea what is actually happening from looking 
> at that single sampled latency. Your samples could all be returning 
> "low" values, which makes you believe that everything is working fine, 
> but in actual fact the other 59 events per minute are "high" and you 
> would never know.
>
> This is the reason why histograms exist, and why more generally counters 
> are more useful than gauges. A gauge can only tell you about "now" which 
> may or may not be representative of what has actually been happening 
> since the last scrape. A counter however will tell you the absolute 
> change since the last scrape (e.g. the total number of requests since 
> the previous scrape, or the sum of the latencies of all events since the 
> scrape) meaning you never lose information (a counter that represents 
> total latency won't let you know if there was one spike or everything 
> was slow, but it will give you an average since the last scrape instead 
> of losing data).
>
> It would be worth understanding why you aren't able to produce a 
> histogram in the application (or externally via processing an event 
> feed, such as logs)? By design a simple histogram is pretty low impact, 
> being a set of counters for each bucket.
>
> -- 
> Stuart Clark
>
>
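
The contrast described in the quoted reply can be sketched in a few lines of Python, assuming one event per second and made-up bucket bounds. The sketch keeps a histogram as a set of cumulative counters, one per bucket, and compares it with a gauge that retains only the last value seen in each minute. The class name `SimpleHistogram`, the bounds, and the synthetic latencies are illustrative assumptions, not any client library's API.

```python
import bisect


class SimpleHistogram:
    """Cumulative-bucket histogram: each bucket counts values <= its bound."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One counter per finite bound, plus a final +Inf bucket.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.total = 0.0  # running sum of observed values
        self.count = 0    # running number of observations

    def observe(self, value):
        # Index of the first bucket whose upper bound is >= value.
        first = bisect.bisect_left(self.upper_bounds, value)
        # Buckets are cumulative, so every wider bucket is incremented too.
        for i in range(first, len(self.bucket_counts)):
            self.bucket_counts[i] += 1
        self.total += value
        self.count += 1


hist = SimpleHistogram([50.0, 250.0, 1000.0])  # hypothetical bounds in ms
gauge_samples = []

# Synthetic traffic: 3 minutes, 1 event per second. Only the last event
# of each minute is fast (10 ms); the other 59 are slow (500 ms).
for minute in range(3):
    last_value = None
    for second in range(60):
        latency_ms = 10.0 if second == 59 else 500.0
        hist.observe(latency_ms)
        last_value = latency_ms
    # The gauge exposes only the most recent value at scrape time.
    gauge_samples.append(last_value)

print(gauge_samples)       # what the sampled gauge reports
print(hist.bucket_counts)  # cumulative counts for <=50, <=250, <=1000, +Inf
```

In this contrived run every gauge sample is the fast value, while the bucket counters show that nearly all events were slow. The cost on the application side is one counter increment per bucket per event.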
