*Sum of envoy_cluster_upstream_rq* is not a useful query, because when 
time series come and go, the sum jumps down and up (as you can see). You 
can't do anything with this.

Instead you need to sum(increase(...)), but that's what you're already 
doing.

If you select a time range that doesn't include a spike, do the two graphs 
look the same? If they do, then maybe there's some odd timing issue, e.g. 
your Grafana/Thanos graphs are at such a coarse resolution that you're 
skipping over the spikes (if this were the problem, I'd suggest refreshing 
the page every 10 seconds for 5 or 10 minutes, to see whether any spikes 
come and go).
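
If you want to rule that out without staring at the page, you can also hit 
the query_range HTTP API directly with a step equal to your 30s evaluation 
interval, so no samples get skipped. A sketch (the hostname, time range 
and omitted label matchers are placeholders):

    curl -s 'http://<thanos-query>:9090/api/v1/query_range' \
      --data-urlencode 'query=sum(increase(envoy_cluster_upstream_rq[3m])/3)' \
      --data-urlencode 'start=2025-03-12T10:00:00Z' \
      --data-urlencode 'end=2025-03-12T11:00:00Z' \
      --data-urlencode 'step=30s'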

Otherwise, you could look separately at the graphs of
increase(envoy_cluster_upstream_rq[3m])
sum(increase(envoy_cluster_upstream_rq[3m]))
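
If those line up, you could also graph the difference between the rule's 
output and the same expression evaluated ad hoc. A sketch (it assumes the 
recorded series keeps exactly the labels in the rule's "by" clause, and 
that Thanos isn't adding extra external labels to one side):

    rest_server_recording_rule
      -
    sum(increase(envoy_cluster_upstream_rq{<same matchers as the rule>}[3m])/3)
      by (kubernetes_namespace, kubernetes_container_name, envoy_cluster_name)

Anything non-zero pinpoints the timestamps where the recorded data 
diverges from the ad-hoc evaluation.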

Or maybe it's something to do with Thanos and recording rules.

Sorry, I can't think of anything more than that.

On Friday, 14 March 2025 at 14:45:01 UTC Kishore Kumar wrote:

> Hi Brian,
>       I hope you are having a good day. I humbly request that you take a 
> look at the graphs attached above and reply regarding them.
> Apologies, and thank you,
> Kishore.
>
> On Wednesday, March 12, 2025 at 7:31:46 PM UTC+5:30 Kishore Kumar wrote:
>
>> Hi Brian,
>>         We have used the Thanos Query UI to run the query, and we observe 
>> the same graph that we saw in Grafana. We use the following recording 
>> rule (shown here with a different name and with some values hidden).
>> - record: rest_server_recording_rule
>>   expr: sum(increase(envoy_cluster_upstream_rq{kubernetes_namespace=~".*<hidden>.*", kubernetes_pod_name=~"rest-.*", envoy_cluster_name=~"<hidden>"}[3m])/3) by (kubernetes_namespace, kubernetes_container_name, envoy_cluster_name)
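>>
>> (For reference, increase(...[3m])/3 here is just the per-minute rate, so 
>> an equivalent and arguably clearer form of the inner expression is 
>> rate(envoy_cluster_upstream_rq{<same matchers>}[3m]) * 60.)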
>>
>> The source metric here, envoy_cluster_upstream_rq, is not actually 
>> monotonically increasing, and there are counter resets happening. 
>> Attaching the images below.
>> *envoy_cluster_upstream_rq:*
>> [image: 2.png]
>>
>> *Sum of envoy_cluster_upstream_rq:*
>> [image: 1.png]
>>
>> *Actual Query:*
>> [image: 3.png]
>>
>> *Recording Rule:*
>> [image: 4.png]
>>
>> Even though it is not monotonically increasing, new spikes should not 
>> have been created by the recording rule, as we don't see them in the 
>> result of the actual query.
>>
>> We would like to know whether we should change any parameters related to 
>> the recording rule to make the two results match as closely as possible.
>>
>> Thanks for the response,
>> Have a nice day.
>> Kishore
>>
>>
>> On Wednesday, March 12, 2025 at 12:33:26 AM UTC+5:30 Brian Candler wrote:
>>
>>> To more easily debug your issue, please take Grafana out of the 
>>> equation, as it has its own foibles. To do this, formulate the query 
>>> directly in the Prometheus (or Thanos Query) web interface.
>>>
>>> Then, show whether there's a difference between the results: if there 
>>> is, show the exact query you're running and the exact definition of the 
>>> recording rule. Show both graphs, and highlight the differences.
>>>
>>> My *guess* is that it's something to do with detected counter resets, 
>>> i.e. example_metric is not increasing monotonically. You can formulate 
>>> queries to detect this, as in the sketch below.
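>>>
>>> For example, something like this (substitute your real metric name, 
>>> matchers, and window) returns only the series whose counter was seen to 
>>> reset within the last five minutes:
>>>
>>>     resets(example_metric[5m]) > 0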
>>>
>>> On Tuesday, 11 March 2025 at 13:48:54 UTC Kishore Kumar wrote:
>>>
>>>> Hi Prometheus users,
>>>>           We have a PromQL query and a recording rule that records its 
>>>> result, like the example given below.
>>>>
>>>> - record: rest_server_recording_rule
>>>>   expr: sum(increase(example_metric[1m])) by (kubernetes_container_name)
>>>>
>>>> The scrape interval and the rule evaluation interval are both 30 
>>>> seconds, set in the Prometheus configuration, roughly as sketched below.
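>>>>
>>>> A sketch of the rule file (the group name and file layout here are 
>>>> illustrative, not our real configuration):
>>>>
>>>> groups:
>>>>   - name: rest-server-rules
>>>>     interval: 30s
>>>>     rules:
>>>>       - record: rest_server_recording_rule
>>>>         expr: sum(increase(example_metric[1m])) by (kubernetes_container_name)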
>>>>
>>>> We are seeing unexpected spikes in the data recorded by the recording 
>>>> rule, while the same *unexpected spike* is not present when we run the 
>>>> source expression directly, as shown in the graph below (Grafana was 
>>>> used for the comparison).
>>>>
>>>> Can we know why this unexpected spike is being created by the recording 
>>>> rule? We would also like an explanation of how a recording rule 
>>>> captures the data of a query.
>>>>
>>>> Thanks for reading this message, have a great day.
>>>>
>>>> *Sum(increase) RawQuery* - data produced when we query the raw 
>>>> expression directly.
>>>> *Recording Rules* - data captured by the recording rule.
>>>> [image: image-2025-3-10_18-52-48.png]