Thanks to both of you. If we can implement the operators (> and <) on native histograms, that would be great. It would solve the problem and possibly open up even more possibilities than just a trimmed mean.
Any pointers as to how this can or should be implemented? I’ll file a feature request at the link provided.

On Wed, Aug 7, 2024 at 3:34 AM Bjoern Rabenstein <bjo...@rabenste.in> wrote:

> On 02.08.24 21:21, Jacques Bernier wrote:
> > Tl;Dr; I'd like to implement histogram_trimmed_mean
> >
> > "A truncated mean or trimmed mean is a statistical measure of central
> > tendency, much like the mean and median. It involves the calculation of the
> > mean after discarding given parts of a probability distribution or sample
> > at the high and low end, and typically discarding an equal amount of both.
> > This number of points to be discarded is usually given as a percentage of
> > the total number of points, but may also be given as a fixed number of
> > points." https://en.wikipedia.org/wiki/Truncated_mean
>
> I vaguely remember that we discussed this before, but I cannot find
> any reference to it right now.
>
> Given that native histograms are so much better, I would focus on
> implementing this for native histograms. (Trimming will almost always
> involve interpolation, and that just gets horribly wrong with the low
> resolution usually provided by classic histograms.)
>
> My memory from the previous discussion was to "simply" implement the
> `>` and `<` operator between native histograms and scalars/floats. It
> would return a new histogram with all the observations below or above
> the threshold given by the scalar removed. (This will be an estimate
> in most cases, but given the generally high resolution of native
> histograms, that's OK.)
>
> In that way, you can already "trim" a histogram at a given
> threshold. This will return a histogram containing only requests that
> lasted longer than 100ms:
>
>     request_duration_seconds > 0.1
>
> You can then combine this with other PromQL expressions to implement
> trimming at percentages.
> The following will exclude the 25% shortest requests:
>
>     request_duration_seconds > histogram_quantile(0.25, request_duration_seconds)
>
> From there, you can use all the other tools to do something with the
> returned histogram, e.g. calculating a mean or a median or whatever
> you want.
>
> --
> Björn Rabenstein
> [PGP-ID] 0x851C3DA17D748D03
> [email] bjo...@rabenste.in
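
As a rough illustration of the trimming idea discussed in this thread, the following Python sketch estimates a trimmed mean from a bucketed histogram. It is not Prometheus code and not a proposed API: the `Bucket` tuple layout and the names `keep_above`, `keep_below`, `quantile`, `mean`, and `trimmed_mean` are all hypothetical. It also assumes that observations are spread uniformly within each bucket (plain linear interpolation) and that the mean can be estimated from bucket midpoints, both of which are simplifications.

```python
from typing import List, Tuple

# Hypothetical layout: (lower bound, upper bound, observation count).
# Buckets are assumed sorted and non-overlapping.
Bucket = Tuple[float, float, float]


def keep_above(buckets: List[Bucket], threshold: float) -> List[Bucket]:
    """Estimate the histogram of observations above `threshold`."""
    kept = []
    for lower, upper, count in buckets:
        if upper <= threshold:
            continue  # bucket lies entirely below the threshold
        if lower >= threshold:
            kept.append((lower, upper, count))  # bucket lies entirely above
        else:
            # Threshold falls inside the bucket: keep the upper share,
            # assuming a uniform spread of observations in the bucket.
            share = (upper - threshold) / (upper - lower)
            kept.append((threshold, upper, count * share))
    return kept


def keep_below(buckets: List[Bucket], threshold: float) -> List[Bucket]:
    """Estimate the histogram of observations below `threshold`."""
    kept = []
    for lower, upper, count in buckets:
        if lower >= threshold:
            continue  # bucket lies entirely above the threshold
        if upper <= threshold:
            kept.append((lower, upper, count))  # bucket lies entirely below
        else:
            share = (threshold - lower) / (upper - lower)
            kept.append((lower, threshold, count * share))
    return kept


def quantile(buckets: List[Bucket], q: float) -> float:
    """Estimate the q-quantile by linear interpolation inside a bucket."""
    total = sum(count for _, _, count in buckets)
    rank = q * total
    seen = 0.0
    for lower, upper, count in buckets:
        if count > 0 and seen + count >= rank:
            return lower + (upper - lower) * (rank - seen) / count
        seen += count
    return buckets[-1][1]


def mean(buckets: List[Bucket]) -> float:
    """Estimate the mean, placing each bucket's mass at its midpoint."""
    total = sum(count for _, _, count in buckets)
    return sum(count * (lower + upper) / 2 for lower, upper, count in buckets) / total


def trimmed_mean(buckets: List[Bucket], trim: float) -> float:
    """Drop the lowest and highest `trim` share of observations, then average."""
    low = quantile(buckets, trim)
    high = quantile(buckets, 1 - trim)
    return mean(keep_below(keep_above(buckets, low), high))


if __name__ == "__main__":
    example = [(0.0, 1.0, 10.0), (1.0, 2.0, 20.0), (2.0, 4.0, 10.0)]
    print(trimmed_mean(example, 0.1))
```

Here `trimmed_mean` mirrors the shape of the PromQL idea in the quoted message: both thresholds come from quantiles of the untrimmed histogram, and the two comparison steps are then chained before averaging.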