The length of the label doesn't really matter in this discussion: you
should not be putting a log message in a label at all. *Any* label which
varies from request to request is a serious problem, because each unique
value of that label will generate a new timeseries in Prometheus, and
you'll get a cardinality explosion.
Internally, Prometheus maintains a mapping of
{bag of labels} => timeseries
Whether the labels themselves are short or long makes very little
difference. It's the number of distinct values of that label which is
important, because that defines the number of timeseries. Each timeseries
has a cost in RAM usage and chunk storage.
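To illustrate, here is a toy Python sketch of the idea (a simplification, not Prometheus's actual implementation):

```python
# Toy model: each distinct combination of label key/value pairs
# becomes its own timeseries (here, a dict entry).
series = {}

def inc(labels):
    key = frozenset(labels.items())
    series[key] = series.get(key, 0) + 1

# Bounded label values: the series count stays at 2, however many events.
for i in range(1000):
    inc({"category": "auth" if i % 2 else "db"})
print(len(series))  # -> 2

# Unbounded label values (e.g. a log message): one new series per event.
for i in range(1000):
    inc({"message": "error #%d" % i})
print(len(series))  # -> 1002
```

The number of *events* is irrelevant; only the number of distinct label combinations grows the map.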
If you have a limited set of log *categories* - say a few dozen values -
then using that as a label is fine. The problem is a label whose value
varies from event to event, e.g. it contains a timestamp or an IP address
or any varying value. You will cause yourself great pain if you use such
things as labels.
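For example, a bounded category label in the exposition format looks something like this (the metric and category names here are invented for illustration):

```
# HELP app_log_events_total Log events by category.
# TYPE app_log_events_total counter
app_log_events_total{category="auth"} 17
app_log_events_total{category="db"} 4
app_log_events_total{category="net"} 9
```

A `message="..."` label with a different value on every event, by contrast, would add a new line here for each event, and the set would never stop growing.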
But don't take my word for it - please read
https://prometheus.io/docs/practices/naming/#labels
https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels
"CAUTION: Remember that every unique combination of key-value label pairs
represents a new time series, which can dramatically increase the amount of
data stored. Do not use labels to store dimensions with high cardinality
(many different label values), such as user IDs, email addresses, or other
unbounded sets of values."
I completely understand your desire to get specific log messages in alerts.
If you need to do that, then as I said before, use Loki instead of
Prometheus. Loki stores the entire log message, as well as labels. It has
its own LogQL query language inspired by PromQL, and integrates with
Grafana and alerting. It's what you need for handling logs, rather than
metrics.
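As a sketch only (the `job` label and threshold below are assumptions, not taken from your setup): a Loki alerting rule in the ruler looks much like a Prometheus rule, except that the expression is LogQL over the raw log lines:

```yaml
groups:
  - name: app-errors
    rules:
      - alert: AppErrorLogged
        # Fire if any log line containing "error" was seen in the last
        # 5 minutes. {job="myapp"} and the filter string are examples.
        expr: sum(rate({job="myapp"} |= "error" [5m])) > 0
        for: 1m
```

Because Loki keeps the full log line, the alert can surface the actual message without it ever becoming a label.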
(If you still want to do this with Prometheus, it would be an interesting
project to see if you can get exemplars into an alert. But I suspect this
would involve hacking mtail, Alertmanager and even Prometheus itself. This
is something to be attempted only by a serious Go coder)
On Thursday, 23 June 2022 at 08:13:40 UTC+1 Loïc wrote:
> Hi,
>
> If I use a label to store the message field, do you know the
> maximum length of string that should not be exceeded?
> Is there a recommendation on the maximum size?
>
> Thanks
> Loïc
>
> On Wednesday, 22 June 2022 at 16:44:37 UTC+2, Loïc wrote:
>
>> Thanks for your reply Brian :)
>>
>> On Wednesday, 22 June 2022 at 15:24:19 UTC+2, Brian Candler wrote:
>>
>>> > if I want to send the error log in the generated alarm, I should add
>>> the error message as a label of my metric.
>>>
>>> That gives you a high cardinality label, which is not what Prometheus is
>>> designed for. Every distinct combination of labels defines a new
>>> timeseries.
>>>
>>> I can see two solutions here:
>>>
>>> 1. Use a log storage system like Loki or ElasticSearch/OpenSearch,
>>> rather than Prometheus
>>>
>>> 2. Include the error message as an "exemplar". When you have multiple
>>> events in the same timeseries and time window, then you'll only get one
>>> exemplar. But it may be good enough to give you an "example" of the type
>>> of error you're seeing, and it keeps the cardinality of your counters low.
>>> (Exemplars are experimental and need to be turned on with a feature flag,
>>> and I don't know if mtail supports them)
>>>
>>>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/875f8088-9dd5-4e3e-98a7-dff47cc74fe5n%40googlegroups.com.