Strange! Have you tried a more recent Prometheus version, btw? Just to rule that part out, since 2.13.1 is pretty old...
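
In case it helps to compare the two outside the graph view, here is a rough Python sketch (standard library only) that pulls the recorded series and the raw expression over the same window at a 15s step and lists the timestamps where they disagree most. It assumes the standard /api/v1/query_range endpoint of the Prometheus HTTP API; the helper names and the http://localhost:9090 address in the note below are placeholders of mine, not anything from your setup.

```python
import json
import math
import urllib.parse
import urllib.request

# The two expressions discussed in this thread.
RULE = "global:hbx_controller_action_seconds:histogram_quantile_99p_rate_1m"
EXPR = ("histogram_quantile(0.99, sum by (le)"
        "(rate(hbx_controller_action_seconds_bucket[1m])))")


def range_query_url(base_url, query, start, end, step="15s"):
    """Build a /api/v1/query_range URL (start/end as Unix timestamps)."""
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def fetch_series(url):
    """Return {timestamp: value} for the first series in the response."""
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    result = body["data"]["result"]
    if not result:
        return {}
    # Each entry in "values" is [unix_timestamp, "value as string"].
    return {float(ts): float(v) for ts, v in result[0]["values"]}


def largest_gaps(recorded, adhoc, top=5):
    """Shared timestamps, largest absolute difference first (NaNs skipped)."""
    rows = []
    for ts in recorded.keys() & adhoc.keys():
        a, b = recorded[ts], adhoc[ts]
        if math.isnan(a) or math.isnan(b):
            continue  # histogram_quantile yields NaN without observations
        rows.append((ts, a, b))
    rows.sort(key=lambda r: abs(r[1] - r[2]), reverse=True)
    return rows[:top]
```

To use it, call fetch_series(range_query_url("http://localhost:9090", RULE, start, end)) and the same with EXPR, then pass both results to largest_gaps. Using the same start, end and step for both calls keeps the timestamps aligned, so any remaining difference comes from the data rather than from the graph resolution.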
On Thu, Apr 23, 2020 at 3:02 PM Per Lundberg <[email protected]> wrote:

> With global:hbx_controller_action_seconds:histogram_quantile_99p_rate_1m,
> there are more 60s spikes shown if I change to a 15s or 5s interval. With
> the other query (histogram_quantile(0.99, sum by
> (le)(rate(hbx_controller_action_seconds_bucket[1m])))), it still doesn't go
> above 1.2s, oddly enough.
>
> On 2020-04-23 15:38, Julius Volz wrote:
>
> Odd. Depending on time window alignment it can always be that some spikes
> might appear in one graph and not another, but such a big difference is
> strange. Just to make sure, what happens when you bring down the resolution
> on both queries to 15s (which is your rule evaluation interval) or lower?
>
> On Thu, Apr 23, 2020 at 12:59 PM Per Lundberg <[email protected]>
> wrote:
>
>> Hi,
>>
>> We have been using Prometheus (2.13.1) with one of our larger customer
>> installations for a while; thus far, it's been working great and we are
>> very thankful for the nice piece of software that it is. (We are a software
>> company ourselves, using Prometheus to monitor the health of both our own
>> application and many other relevant parts of the services involved.)
>> Because of the volume of data for some of our metrics, we have a number
>> of recording rules set up, to make querying of this data reasonable from
>> e.g. Grafana.
>>
>> However, today we started seeing some really strange behavior after a
>> planned restart on one of the Tomcat-based application services we are
>> monitoring. Some requests *seem* to be peaking at 60s (indicating a
>> problem in our application backend), but the strange thing here is that
>> our recording rules produce very different results than just running the
>> same queries in the Prometheus console.
>>
>> Here is how the recording rules have been defined in a
>> custom_recording_rules.yml file:
>>
>>   - name: hbx_controller_action_global
>>     rules:
>>     - record: global:hbx_controller_action_seconds:histogram_quantile_50p_rate_1m
>>       expr: histogram_quantile(0.5, sum by (le)(rate(hbx_controller_action_seconds_bucket[1m])))
>>     - record: global:hbx_controller_action_seconds:histogram_quantile_75p_rate_1m
>>       expr: histogram_quantile(0.75, sum by (le)(rate(hbx_controller_action_seconds_bucket[1m])))
>>     - record: global:hbx_controller_action_seconds:histogram_quantile_95p_rate_1m
>>       expr: histogram_quantile(0.95, sum by (le)(rate(hbx_controller_action_seconds_bucket[1m])))
>>     - record: global:hbx_controller_action_seconds:histogram_quantile_99p_rate_1m
>>       expr: histogram_quantile(0.99, sum by (le)(rate(hbx_controller_action_seconds_bucket[1m])))
>>
>> Querying
>> global:hbx_controller_action_seconds:histogram_quantile_99p_rate_1m
>> yields an output like this:
>>
>>
>> However, running the individual query gives a completely different view
>> of this data. Note how the 60-second peaks are completely gone in this
>> screenshot:
>>
>>
>> I don't really know what to make of this. Are we doing something
>> fundamentally wrong here in how our recording rules are set up, or could
>> this be a bug in Prometheus (unlikely)? Btw, we have the
>> evaluation_interval set to 15s globally.
>>
>> Thanks in advance.
>>
>> Best regards,
>> Per
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/prometheus-users/2d75ca0f-a24f-42e4-beb8-2ee88e04acdf%40googlegroups.com

