The other way to solve this is to use recording rules to pre-summarize the
data.

For example:

groups:
- name: User Egress
  interval: 5m
  rules:
  - record: category_user:user_egress_total:increase5m
    expr: sum by (category, user) (increase(user_egress_total{job="test"}[5m]))

With this, you can now summarize with fewer samples over longer periods of
time.

sum_over_time(category_user:user_egress_total:increase5m{user=~"$user",
category=~"$category"}[$__range])
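
To see why the pre-aggregation helps, here is a rough back-of-the-envelope sketch of the sample counts involved. The 15s scrape interval is an assumption (it is not stated in the thread); the ~1200 series figure comes from the earlier message, and 50,000,000 is Prometheus's default for --query.max-samples:

```python
# Rough sample-count arithmetic: raw samples vs. 5m pre-aggregated
# recording-rule samples over a 15-day Grafana range.
# Assumed: 15s scrape interval; ~1200 series from the earlier message.

RANGE_SECONDS = 15 * 24 * 3600  # 15-day range
SERIES = 1200

raw = RANGE_SECONDS // 15 * SERIES    # one sample per 15s scrape, per series
pre = RANGE_SECONDS // 300 * SERIES   # one sample per 5m recorded point

print(f"raw samples:            {raw:,}")  # 103,680,000 -> over the 50M default
print(f"pre-aggregated samples: {pre:,}")  # 5,184,000  -> comfortably under it
```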

On Fri, May 1, 2020 at 10:16 AM O <[email protected]> wrote:

> Thanks for your responses. I am using grok exporter for parsing logs to
> convert them into prometheus metrics.
> grok exporter service won't restart very often but it could still happen.
>
> Here's the query that I am using to calculate the egress data by different
> users in the selected time range in Grafana:
> sum by (category, user)(increase(user_egress_total{job="test",
> user=~"$user", category=~"$category"}[$__range]))
>
> Cardinality is too high for the metric, so I end up getting ~1200
> timeseries. After applying the step and changing [$__range] to
> [$__range:1m], I was able to make it work for ~15 days.
> I understand that high cardinality metrics are not recommended for
> Prometheus. But, I am wondering if there is a better way of implementing it
> in Prometheus either using a different exporter or by rewriting the query.
>
> Appreciate your inputs. Thanks!
>
>
> On Thursday, April 30, 2020 at 11:51:43 PM UTC-7, Brian Brazil wrote:
>>
>> On Fri, 1 May 2020 at 07:46, Christian Hoffmann <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> On 5/1/20 3:57 AM, O wrote:
>>> > I am using increase() function to calculate the increase in counter
>>> > value over a time period. These values are being displayed in a table
>>> in
>>> > Grafana. But, for a duration of 15 days or so, it errors out because
>>> the
>>> > number of samples that are being pulled is too high and the limit
>>> > for |--query.max-samples| flag is crossed.
>>> >
>>> > So, my question is if there is a better way to calculate the increase
>>> in
>>> > counter and display it in the Grafana table without pulling so many
>>> > labels from Prometheus.
>>>
>>> increase() tries to detect counter resets. In order for this to work,
>>> each data point has to be considered (at least I assume that this is the
>>> case). I don't see a way around this.
>>>
>>
>> You're correct.
>>
>>
>>>
>>> If you know for sure that your counter does not reset (at least in the
>>> timeframe you are interested in), you might achieve what you want with a
>>> simple subtraction, which should be less resource-intensive:
>>>
>>> your_metric - your_metric offset 14d
>>>
>>> Of course, you can also increase the max-samples value. It is primarily
>>> there as a safeguard against high resource usage (i.e. you might need
>>> more RAM and longer processing times).
>>>
>>
>> Even with 15d of data at a 1s interval, that's only 1.3M samples that
>> need to be in memory at a time to calculate the rate() - so it's not the
>> rate() function that's the issue here.
>>
>> --
>> Brian Brazil
>> www.robustperception.io
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/4d076cee-9ac9-4b0d-8247-83c2bceb0ff3%40googlegroups.com.
>
