Re: [prometheus-users] Recording rule displaying different results than ad-hoc querying

Per Lundberg Thu, 23 Apr 2020 06:03:15 -0700

With global:hbx_controller_action_seconds:histogram_quantile_99p_rate_1m, there are more 60s spikes shown if I change to a 15s or 5s interval. With the other query (histogram_quantile(0.99, sum by(le)(rate(hbx_controller_action_seconds_bucket[1m])))), it still doesn't go above 1.2s, oddly enough.


On 2020-04-23 15:38, Julius Volz wrote:

Odd. Depending on time window alignment it can always be that some spikes might appear in one graph and not another, but such a big difference is strange. Just to make sure, what happens when you bring down the resolution on both queries to 15s (which is your rule evaluation interval) or lower?
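
The alignment effect mentioned above can be illustrated with a small Python sketch. It is not a diagnosis of the problem in this thread: the series, the spike position, and the helper names (recorded_series, graph_points, max_shown) are all made up for illustration. The only assumptions taken from Prometheus itself are that a graph evaluates the expression once per step and that a plain series selector returns the most recent sample within the lookback window (5 minutes by default).

```python
RULE_INTERVAL = 15   # seconds, matching the evaluation_interval in the thread
LOOKBACK = 300       # seconds; assumed default lookback window of 5 minutes


def recorded_series(duration=600, spike_at=315):
    """Hypothetical recorded series: one sample per rule evaluation,
    1.0 everywhere except a single 60.0 sample at `spike_at`."""
    return {t: (60.0 if t == spike_at else 1.0)
            for t in range(0, duration, RULE_INTERVAL)}


def graph_points(series, start, end, step):
    """Mimic a range query over a recorded series: at each step, take the
    most recent sample at or before that timestamp, within the lookback."""
    points = []
    for t in range(start, end + 1, step):
        visible = [ts for ts in series if t - LOOKBACK < ts <= t]
        if visible:
            points.append((t, series[max(visible)]))
    return points


def max_shown(series, step, offset=0):
    """Largest value that a graph with this step and start offset displays."""
    end = max(series)
    return max(v for _, v in graph_points(series, offset, end, step))


if __name__ == "__main__":
    series = recorded_series()
    for step, offset in [(60, 0), (60, 15), (15, 0)]:
        print(step, offset, max_shown(series, step, offset))
```

With a 60s step, whether the one-sample spike is drawn depends only on where the steps happen to land; with a step equal to the rule interval, every recorded sample is drawn.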

On Thu, Apr 23, 2020 at 12:59 PM Per Lundberg <[email protected]> wrote:


    Hi,

    We have been using Prometheus (2.13.1) with one of our larger
    customer installations for a while; thus far, it's been working
    great and we are very thankful for the nice piece of software that
    it is. (We are a software company ourselves, using Prometheus to
    monitor the health of our own application as well as many
    other relevant parts of the services involved.) Because of the
    volume of data for some of our metrics, we have a number of
    recording rules set up, to make querying of this data reasonable
    from e.g. Grafana.

    However, today we started seeing some really strange behavior after
    a planned restart of one of the Tomcat-based application services we
    are monitoring. Some requests /seem/ to be peaking at 60s
    (indicating a problem in our application backend), but the strange
    thing here is that our recording rules produce very different
    results than just running the same queries in the Prometheus console.

    Here is how the recording rule has been defined in a
    custom_recording_rules.yml file:

      - name: hbx_controller_action_global
        rules:
          - record: global:hbx_controller_action_seconds:histogram_quantile_50p_rate_1m
            expr: histogram_quantile(0.5, sum by (le)(rate(hbx_controller_action_seconds_bucket[1m])))
          - record: global:hbx_controller_action_seconds:histogram_quantile_75p_rate_1m
            expr: histogram_quantile(0.75, sum by (le)(rate(hbx_controller_action_seconds_bucket[1m])))
          - record: global:hbx_controller_action_seconds:histogram_quantile_95p_rate_1m
            expr: histogram_quantile(0.95, sum by (le)(rate(hbx_controller_action_seconds_bucket[1m])))
          - record: global:hbx_controller_action_seconds:histogram_quantile_99p_rate_1m
            expr: histogram_quantile(0.99, sum by (le)(rate(hbx_controller_action_seconds_bucket[1m])))
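
    For reference, the following is a minimal Python sketch of the
    bucket interpolation that histogram_quantile() is documented to
    perform on the summed per-bucket rates. It is a simplification, not
    Prometheus' actual code, and the bucket boundaries and rates used
    with it are hypothetical; the real boundaries of
    hbx_controller_action_seconds_bucket are not given in this thread.

```python
import math


def histogram_quantile(q, buckets):
    """Simplified sketch of quantile estimation from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_rate) pairs and must
    include the (math.inf, total) bucket. Edge cases that Prometheus
    handles (q outside [0, 1], non-positive first bound) are ignored.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative value reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The quantile falls into the +Inf bucket: the best available
        # answer is the highest finite bucket boundary.
        return buckets[-2][0]
    lower, prev = (0.0, 0.0) if i == 0 else buckets[i - 1]
    if count == prev:
        return upper
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - prev) / (count - prev)


if __name__ == "__main__":
    # Hypothetical boundaries (seconds) and cumulative rates.
    inside = [(0.5, 90), (1, 97), (5, 98.5), (60, 100), (math.inf, 100)]
    beyond = [(0.5, 90), (1, 97), (5, 98), (60, 98.5), (math.inf, 100)]
    print(histogram_quantile(0.99, inside))
    print(histogram_quantile(0.99, beyond))
```

    The point of the sketch is only that the result is an estimate
    bounded by the bucket boundaries, so it is sensitive to exactly
    which samples fall inside the rate() window at evaluation time.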

    Querying
    global:hbx_controller_action_seconds:histogram_quantile_99p_rate_1m
    yields an output like this:


    However, running the individual query gives a completely different
    view of this data. Note how the 60-second peaks are completely
    gone in this screenshot:


    I don't really know what to make of this. Are we doing
    something fundamentally wrong here in how our recording rules are
    set up, or could this be a bug in Prometheus (unlikely)? Btw, we
    have the evaluation_interval set to 15s globally.

    Thanks in advance.

    Best regards,
    Per

    --
    You received this message because you are subscribed to the Google
    Groups "Prometheus Users" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected].
    To view this discussion on the web visit
    https://groups.google.com/d/msgid/prometheus-users/2d75ca0f-a24f-42e4-beb8-2ee88e04acdf%40googlegroups.com.


