Thanks for your responses. I am using grok_exporter to parse logs and 
convert them into Prometheus metrics.
The grok_exporter service won't restart very often, but it could still happen. 

Here's the query I am using to calculate the egress data by different 
users in the selected time range in Grafana:

  sum by (category, user) (increase(user_egress_total{job="test", user=~"$user", category=~"$category"}[$__range]))

Cardinality is too high for the metric, so I end up getting ~1200 time 
series. After adding a resolution step, i.e. changing [$__range] to 
[$__range:1m] (a subquery), I was able to make it work for ~15 days.
I understand that high-cardinality metrics are not recommended for 
Prometheus. But I am wondering whether there is a better way of implementing 
this in Prometheus, either by using a different exporter or by rewriting the 
query. 
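
For reference, the offset-subtraction idea suggested below could be applied 
directly to the metric above. A rough sketch (this assumes the counter never 
resets within the window, and relies on Grafana interpolating $__range into a 
literal duration before the query is sent to Prometheus):

  sum by (category, user) (
      user_egress_total{job="test", user=~"$user", category=~"$category"}
    - user_egress_total{job="test", user=~"$user", category=~"$category"} offset $__range
  )

This needs only the samples at the two ends of the window per series, rather 
than every sample in between, at the cost of ignoring counter resets and 
dropping series that did not yet exist $__range ago.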

Appreciate your inputs. Thanks!


On Thursday, April 30, 2020 at 11:51:43 PM UTC-7, Brian Brazil wrote:
>
> On Fri, 1 May 2020 at 07:46, Christian Hoffmann <
> [email protected]> wrote:
>
>> Hi,
>>
>> On 5/1/20 3:57 AM, O wrote:
>> > I am using increase() function to calculate the increase in counter
>> > value over a time period. These values are being displayed in a table in
>> > Grafana. But, for a duration of 15 days or so, it errors out because the
>> > number of samples that are being pulled is too high and the limit
>> > for |--query.max-samples| flag is crossed. 
>> > 
>> > So, my question is if there is a better way to calculate the increase in
>> > counter and display it in the Grafana table without pulling so many
>> > labels from Prometheus.
>>
>> increase() tries to detect counter resets. In order for this to work,
>> each data point has to be considered (at least I assume that this is the
>> case). I don't see a way around this.
>>
>
> You're correct.
>  
>
>>
>> If you know for sure that your counter does not reset (at least in the
>> timeframe you are interested in), you might achieve what you want by a
>> simple subtraction, which should be less resource-intensive:
>>
>> your_metric - your_metric offset 14d
>>
>> Of course, you can also increase the max-samples value. It is primarily
>> there as a safeguard against high resource usage (i.e. raising it may
>> mean more RAM usage and longer processing times).
>>
>
> Even with 15d of data at a 1s interval, that's only 1.3M samples that need 
> to be in memory at a time to calculate the rate() - so it's not the rate() 
> function that's the issue here.
>
> -- 
> Brian Brazil
> www.robustperception.io
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/4d076cee-9ac9-4b0d-8247-83c2bceb0ff3%40googlegroups.com.
