Thanks for your responses. I am using grok exporter to parse logs and
convert them into Prometheus metrics.
The grok exporter service won't restart very often, but it could still happen
(and a restart would reset the counters).
Here's the query that I am using to calculate the egress data by different
users in the selected time range in Grafana:

  sum by (category, user)(
    increase(user_egress_total{job="test", user=~"$user", category=~"$category"}[$__range])
  )
The metric's cardinality is quite high, so I end up with ~1200 time
series. After adding a resolution step, i.e. changing [$__range] to the
subquery [$__range:1m], I was able to make it work for a ~15-day range.
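Concretely, the adjusted query with the 1m subquery resolution looks like
this (the fixed 1m step is what keeps the sample count bounded):

  sum by (category, user)(
    increase(user_egress_total{job="test",
      user=~"$user", category=~"$category"}[$__range:1m])
  )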
I understand that high-cardinality metrics are not recommended for
Prometheus, but I am wondering if there is a better way of implementing
this, either with a different exporter or by rewriting the query.
I appreciate your input. Thanks!
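One option I have been considering is a recording rule that pre-aggregates
the per-(category, user) rate at evaluation time, so the dashboard query
reads one pre-aggregated series per pair instead of every raw sample. A
sketch (the rule name and the 5m window below are just placeholders I made
up, not anything I have in production):

  groups:
    - name: user_egress_aggregation
      rules:
        # Pre-aggregate the per-(category, user) rate so dashboard
        # queries read the recorded series instead of raw samples.
        - record: category_user:user_egress:rate5m
          expr: sum by (category, user) (rate(user_egress_total{job="test"}[5m]))

The Grafana table could then approximate the increase over the dashboard
range as

  avg_over_time(category_user:user_egress:rate5m{user=~"$user", category=~"$category"}[$__range]) * $__range_s

which touches one sample per rule-evaluation interval per series rather
than one per scrape. Would that be a reasonable approach here?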
On Thursday, April 30, 2020 at 11:51:43 PM UTC-7, Brian Brazil wrote:
>
> On Fri, 1 May 2020 at 07:46, Christian Hoffmann <
> [email protected]> wrote:
>
>> Hi,
>>
>> On 5/1/20 3:57 AM, O wrote:
>> > I am using increase() function to calculate the increase in counter
>> > value over a time period. These values are being displayed in a table in
>> > Grafana. But, for a duration of 15 days or so, it errors out because the
>> > number of samples that are being pulled is too high and the limit
>> > for |--query.max-samples| flag is crossed.
>> >
>> > So, my question is if there is a better way to calculate the increase in
>> > counter and display it in the Grafana table without pulling so many
>> > labels from Prometheus.
>>
>> increase() tries to detect counter resets. In order for this to work,
>> each data point has to be considered (at least I assume that this is the
>> case). I don't see a way around this.
>>
>
> You're correct.
>
>
>>
>> If you know for sure that your counter does not reset (at least in the
>> timeframe you are interested in), you might achieve what you want by a
>> simple subtraction, which should be less resource-intensive:
>>
>> your_metric - your_metric offset 14d
>>
>> Of course, you can also increase the max-samples value. It is primarily
>> there as a safeguard against high resource usage (i.e. you might need
>> more RAM and longer processing times).
>>
>
> Even with 15d of data at a 1s interval, that's only 1.3M samples that need
> to be in memory at a time to calculate the rate() - so it's not the rate()
> function that's the issue here.
>
> --
> Brian Brazil
> www.robustperception.io
>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/4d076cee-9ac9-4b0d-8247-83c2bceb0ff3%40googlegroups.com.