Re: [prometheus-users] Recording rule displaying different results than ad-hoc querying

Julius Volz Thu, 23 Apr 2020 06:25:14 -0700

Strange! Have you tried a more recent Prometheus version, btw? Just to
rule that part out, since 2.13.1 is pretty old...


On Thu, Apr 23, 2020 at 3:02 PM Per Lundberg <[email protected]> wrote:

> With global:hbx_controller_action_seconds:histogram_quantile_99p_rate_1m,
> there are more 60s spikes shown if I change to a 15s or 5s interval. With
> the other query (histogram_quantile(0.99, sum by
> (le)(rate(hbx_controller_action_seconds_bucket[1m])))), it still doesn't go
> above 1.2s, oddly enough.
> On 2020-04-23 15:38, Julius Volz wrote:
>
> Odd. Depending on time window alignment it can always be that some spikes
> might appear in one graph and not another, but such a big difference is
> strange. Just to make sure, what happens when you bring down the resolution
> on both queries to 15s (which is your rule evaluation interval) or lower?
>
> On Thu, Apr 23, 2020 at 12:59 PM Per Lundberg <[email protected]>
> wrote:
>
>> Hi,
>>
>> We have been using Prometheus (2.13.1) with one of our larger customer
>> installations for a while; thus far, it's been working great and we are
>> very thankful for the nice piece of software that it is. (We are a software
>> company ourselves, using Prometheus to monitor the health of both our own
>> application and many other relevant parts of the services involved.)
>> Because of the volume of data for some of our metrics, we have a number
>> of recording rules set up, to make querying of this data reasonable from
>> e.g. Grafana.
>>
>> However, today we started seeing some really strange behavior after a
>> planned restart of one of the Tomcat-based application services we are
>> monitoring. Some requests *seem* to be peaking at 60s (indicating a problem in our
>> application backend), but the strange thing here is that our recording
>> rules produce very different results than just running the same queries in
>> the Prometheus console.
>>
>> Here is how the recording rule has been defined in a
>> custom_recording_rules.yml file:
>>
>>   - name: hbx_controller_action_global
>>     rules:
>>       - record: global:hbx_controller_action_seconds:histogram_quantile_50p_rate_1m
>>         expr: histogram_quantile(0.5, sum by (le)(rate(hbx_controller_action_seconds_bucket[1m])))
>>       - record: global:hbx_controller_action_seconds:histogram_quantile_75p_rate_1m
>>         expr: histogram_quantile(0.75, sum by (le)(rate(hbx_controller_action_seconds_bucket[1m])))
>>       - record: global:hbx_controller_action_seconds:histogram_quantile_95p_rate_1m
>>         expr: histogram_quantile(0.95, sum by (le)(rate(hbx_controller_action_seconds_bucket[1m])))
>>       - record: global:hbx_controller_action_seconds:histogram_quantile_99p_rate_1m
>>         expr: histogram_quantile(0.99, sum by (le)(rate(hbx_controller_action_seconds_bucket[1m])))
>>
>> Querying
>> global:hbx_controller_action_seconds:histogram_quantile_99p_rate_1m
>> yields an output like this:
>>
>>
>> However, running the individual query gives a completely different view
>> of this data. Note how the 60-second peaks are completely gone in this
>> screenshot:
>>
>>
>> I don't really know what to make of this. Are we doing something
>> fundamentally wrong here in how our recording rules are set up, or could
>> this be a bug in Prometheus (unlikely)? Btw, we have the
>> evaluation_interval set to 15s globally.
>>
>> Thanks in advance.
>>
>> Best regards,
>> Per
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/prometheus-users/2d75ca0f-a24f-42e4-beb8-2ee88e04acdf%40googlegroups.com
>> <https://groups.google.com/d/msgid/prometheus-users/2d75ca0f-a24f-42e4-beb8-2ee88e04acdf%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
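
The window-alignment effect mentioned in the thread can be sketched in
Python. This is a simplified model under stated assumptions, not
Prometheus code: each graph step is assumed to pick the latest stored
sample at or before the step timestamp, within a lookback window. The
function name evaluate_range, the 300s lookback, and the sample values
are all hypothetical, not data from the installation discussed here. The
sketch only shows why one stored series can show or hide a short spike
depending on the graph step and start time; it does not account for the
full discrepancy reported in the thread.

```python
from bisect import bisect_right


def evaluate_range(samples, start, end, step, lookback=300):
    """Simplified range evaluation (an assumption, not Prometheus code).

    For each step timestamp t in [start, end], return the latest sample
    at or before t, provided it is no older than `lookback` seconds.
    `samples` is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in samples]
    result = []
    t = start
    while t <= end:
        i = bisect_right(timestamps, t) - 1  # index of latest sample <= t
        if i >= 0 and t - timestamps[i] <= lookback:
            result.append((t, samples[i][1]))
        t += step
    return result


# Hypothetical recorded series: one sample every 15s (the rule evaluation
# interval from the thread), with a single 60.0 spike stored at t=45.
recorded = [(ts, 60.0 if ts == 45 else 1.2) for ts in range(0, 301, 15)]

# Same series, three different graph settings.
coarse = evaluate_range(recorded, start=0, end=300, step=60)    # steps 0, 60, ...
shifted = evaluate_range(recorded, start=45, end=300, step=60)  # steps 45, 105, ...
fine = evaluate_range(recorded, start=0, end=300, step=15)      # every sample

coarse_max = max(v for _, v in coarse)
shifted_max = max(v for _, v in shifted)
fine_max = max(v for _, v in fine)

print("step=60s, start=0 :", coarse_max)
print("step=60s, start=45:", shifted_max)
print("step=15s, start=0 :", fine_max)
```

In this model the 60s-step graph starting at 0 never lands on the spike,
while the same step shifted by 45s does, and a 15s step always does.
That is the reason for comparing both queries at a step no larger than
the rule evaluation interval.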

