Re: [prometheus-users] Beginner Question to clarify some core concepts

'vivapolonium' via Prometheus Users Sun, 19 Apr 2020 04:16:21 -0700

Hey Stuart,

thanks for your help; the link you provided helped solve the issue. My 
metrics now make more sense. Thank you!


On Saturday, 18 April 2020 14:45:00 UTC+2, Stuart Clark wrote:
>
> On 18/04/2020 13:18, 'vivapolonium' via Prometheus Users wrote: 
> > Hey everyone, 
> > 
> > I'm fairly new to Prometheus and trying to wrap my head around some 
> > concepts which are not really clear to me. 
> > 
> > I'm running a Scala application with the official Prometheus Java 
> > client. I'm trying to measure the performance of HTTP endpoints and 
> > use a `Summary` for that. I expose the metrics on an internal 
> > endpoint by calling the `TextFormat.write004` method and serving the 
> > output myself (not via the included HTTPServlet). 
> > 
> > I've set up a Prometheus instance that scrapes that endpoint every 15s 
> > and set the maxAge of the Summary to 15s as well. Now I have a PromQL 
> > query like this: 
> > 
> > `sum by(route)(requests_latency_seconds_sum/requests_latency_seconds_count)*1000` 
> > 
> > which should give me the average response time of an endpoint in 
> > milliseconds for each scrape interval. 
> > 
> > When rendering the data though, I get weirdly aggregated data points, 
> > which is probably the result of bad settings combined with a 
> > misunderstanding on my part. Take this metric for example: 
> > 
> > ``` 
> > requests_latency_seconds_count{route="library.get",} 83.0 
> > requests_latency_seconds_sum{route="library.get",} 949.2774687769999 
> > ``` 
> > 
> > This summary does not reset after 15s; instead it keeps accumulating 
> > all the data, which makes it useless for pinpointing time-based 
> > anomalies in my application. 
> > 
> That isn't a summary (that would have quantile labels), or at least the 
> bit you are showing doesn't cover that. 
>
> Normal counters don't reset except when the application restarts. Within 
> PromQL there is the rate() function which allows you to see spikes in 
> latency over time. 
>
> So try to add rate() as described at 
> https://www.robustperception.io/rate-then-sum-never-sum-then-rate 
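
A minimal sketch of the rate-then-sum form of the query quoted above. The metric names come from the thread; the `[1m]` window is an assumption (roughly four scrape intervals at 15s) and may need adjusting:

```
# Average latency per route in milliseconds over the last minute:
# per-second increase of the latency sum divided by
# per-second increase of the request count.
1000 *
  sum by (route) (rate(requests_latency_seconds_sum[1m]))
/
  sum by (route) (rate(requests_latency_seconds_count[1m]))
```

Because `rate()` only looks at how much each counter grew inside the window, the ever-growing `_sum` and `_count` values no longer need to be reset by the application.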
>
> Generally I don't use summaries and instead use histograms. Summaries 
> aren't aggregatable (for example if you run multiple instances) or 
> adjustable within Prometheus. With histograms you can aggregate and 
> calculate percentiles over any range. 
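
A sketch of the histogram approach, assuming the metric were re-declared as a `Histogram` under the same name so that `requests_latency_seconds_bucket` series exist. The 0.95 quantile and the `[5m]` window are illustrative choices, not values from the thread:

```
# Estimated 95th percentile latency per route, in seconds,
# aggregated across all instances. The `le` label must be kept
# in the aggregation for histogram_quantile() to work.
histogram_quantile(
  0.95,
  sum by (route, le) (rate(requests_latency_seconds_bucket[5m]))
)
```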
>
> > I dug into the source code of the Java library and did not find a 
> > way to reset the values to zero or remove them after they are scraped. 
> > Is this intentional? Did I miss something in my configuration? Also, as 
> > I understood it, the summary is supposed to reset itself? 
> > 
> > I hope someone can give me some hints on how to solve this. 
> > 
>
> -- 
> Stuart Clark 
>
>

