On Wed, 24 Nov 2021 at 14:52, Darshan Chaudhary <[email protected]> wrote:
> During query evaluation, Prometheus tracks the current samples held in
> memory at evaluator.currentSamples
> <https://github.com/prometheus/prometheus/blob/f0003bc0ba77fca5ed4c1fe30337beea85dd95d1/promql/engine.go#L871>.
> This might be a good proxy for the "work" that Prometheus had to do to get
> the query result?

That's memory usage, not work done. There was
https://github.com/prometheus/prometheus/pull/6890 to track samples
touched, which should be a good proxy (I use 10M/s as my rule of thumb);
it's waiting on confirmation that the performance hit is negligible.

Brian

> On Wednesday, 24 November 2021 at 18:25:46 UTC+5:30 [email protected] wrote:
>
>> Hello all,
>>
>> *TL;DR:* measuring `http_request_duration_seconds` on the query path is
>> a bad proxy for query latency, as it does not account for data
>> distribution or the number of samples/series touched by a query (both of
>> which have significant implications for the performance of a query).
>>
>> ---
>>
>> I'm exploring more granular performance metrics for prom queries
>> <https://github.com/thanos-io/thanos/issues/4895> downstream in Thanos
>> (inspired by this discussion from Ian Billet
>> <https://github.com/thanos-io/thanos/discussions/4674>) and wanted to
>> reach out to the Prometheus developer community for ideas on how people
>> are measuring and tracking query performance systematically.
>>
>> The aim is to create a new metric that captures these additional
>> dimensions, to better understand/quantify query-performance SLIs with
>> respect to the number of samples/series touched *before* a query is
>> executed.
>>
>> The current solution I have arrived at is a crude n-dimensional
>> histogram, where query duration is observed/bucketed with labels
>> representing some scale (simplified to t-shirt sizes) of samples touched
>> and series queried. This would allow me to query for query-duration
>> quantiles over given ranges of sample/series sizes (e.g. 90% of queries
>> touching up to 1,000,000 samples and up to 10 series complete in less
>> than 2s).
>>
>> I would love to hear about other approaches members of the community
>> have taken for capturing this level of performance granularity in a
>> metric (as well as to stir the pot wrt the Thanos proposal).
>>
>> Thanks,
>>
>> Moad.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-developers/a392a3c9-21b0-4174-9219-53cda79de0f1n%40googlegroups.com

--
Brian Brazil
www.robustperception.io
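[Editor's note: for illustration only, not code from the thread. A minimal Go sketch of the "crude n-dimensional histogram" idea Moad describes: query durations bucketed under coarse t-shirt-size labels for samples touched and series queried. The function names and scale thresholds are assumptions; a real implementation would likely use a client_golang HistogramVec with these labels rather than the plain map used here.]

```go
package main

import "fmt"

// scaleLabel maps a raw count (samples touched or series queried) to a
// coarse t-shirt-size label. The thresholds are illustrative assumptions,
// not values from the thread.
func scaleLabel(n int64) string {
	switch {
	case n <= 10_000:
		return "S"
	case n <= 1_000_000:
		return "M"
	case n <= 100_000_000:
		return "L"
	default:
		return "XL"
	}
}

// queryDurations stands in for an n-dimensional histogram: it records query
// durations keyed by the (samples-scale, series-scale) label pair, so
// duration quantiles can later be computed per bucket.
type queryDurations map[[2]string][]float64

func (q queryDurations) observe(samples, series int64, seconds float64) {
	key := [2]string{scaleLabel(samples), scaleLabel(series)}
	q[key] = append(q[key], seconds)
}

func main() {
	q := queryDurations{}
	// Two queries that each touched ~500k samples across 8 series land in
	// the same ("M", "S") bucket.
	q.observe(500_000, 8, 1.2)
	q.observe(500_000, 8, 0.4)
	fmt.Println(len(q[[2]string{"M", "S"}])) // prints 2
}
```

With a label set this small (four sizes per dimension), the cardinality added to the metric stays bounded at 16 combinations, which is what makes the t-shirt-size simplification attractive compared to labelling with raw counts.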

