The length of the label doesn't really matter in this discussion: you
should not be putting a log message in a label at all. *Any* label which
varies from request to request is a serious problem, because each unique
value of that label will generate a new timeseries in Prometheus, and
you'll get a cardinality explosion.
Internally, Prometheus maintains a mapping of
{bag of labels} => timeseries
Whether the labels themselves are short or long makes very little
difference. It's the number of distinct values of that label which is
important, because that defines the number of timeseries. Each timeseries
has a cost in RAM usage and chunk storage.
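To illustrate, here is a toy Python sketch of the idea (a simplification, not Prometheus's actual implementation):

```python
# Toy model: each distinct combination of label key/value pairs
# becomes its own timeseries (here, a dict entry).
series = {}

def inc(labels):
    key = frozenset(labels.items())
    series[key] = series.get(key, 0) + 1

# Bounded label values: the series count stays at 2, however many events.
for i in range(1000):
    inc({"category": "auth" if i % 2 else "db"})
print(len(series))  # -> 2

# Unbounded label values (e.g. a log message): one new series per event.
for i in range(1000):
    inc({"message": "error #%d" % i})
print(len(series))  # -> 1002
```

The number of *events* is irrelevant; only the number of distinct label combinations grows the map.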
If you have a limited set of log *categories* - say a few dozen values -
then using that as a label is fine. The problem is a label whose value
varies from event to event, e.g. it contains a timestamp or an IP address
or any varying value. You will cause yourself great pain if you use such
things as labels.
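For example, a bounded category label in the exposition format looks something like this (the metric and category names here are invented for illustration):

```
# HELP app_log_events_total Log events by category.
# TYPE app_log_events_total counter
app_log_events_total{category="auth"} 17
app_log_events_total{category="db"} 4
app_log_events_total{category="net"} 9
```

A `message="..."` label with a different value on every event, by contrast, would add a new line here for each event, and the set would never stop growing.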
But don't take my word for it - please read
https://prometheus.io/docs/practices/naming/#labels
https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels
"CAUTION: Remember that every unique combination of key-value label pairs
represents a new time series, which can dramatically increase the amount of
data stored. Do not use labels to store dimensions with high cardinality
(many different label values), such as user IDs, email addresses, or other
unbounded sets of values."
I completely understand your desire to get specific log messages in alerts.
If you need to do that, then as I said before, use Loki instead of
Prometheus. Loki stores the entire log message, as well as labels. It has
its own LogQL query language inspired by PromQL, and integrates with
Grafana and alerting. It's what you need for handling logs, rather than
metrics.
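As a sketch only (the `job` label and threshold below are assumptions, not taken from your setup): a Loki alerting rule in the ruler looks much like a Prometheus rule, except that the expression is LogQL over the raw log lines:

```yaml
groups:
  - name: app-errors
    rules:
      - alert: AppErrorLogged
        # Fire if any log line containing "error" was seen in the last
        # 5 minutes. {job="myapp"} and the filter string are examples.
        expr: sum(rate({job="myapp"} |= "error" [5m])) > 0
        for: 1m
```

Because Loki keeps the full log line, the alert can surface the actual message without it ever becoming a label.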
(If you still want to do this with Prometheus, it would be an interesting
project to see if you can get exemplars into an alert. But I suspect this
would involve hacking mtail, Alertmanager and even Prometheus itself. This
is something to be attempted only by a serious Go coder)
On Thursday, 23 June 2022 at 08:13:40 UTC+1 Loïc wrote:
> Hi,
>
> If I use a label to store the message field, do you know the
> maximum length of string that should not be exceeded?
> Is there a recommendation on the maximum size?
>
> Thanks
> Loïc
>
> On Wednesday, 22 June 2022 at 16:44:37 UTC+2, Loïc wrote:
>
>> Thanks for your reply Brian :)
>>
>> On Wednesday, 22 June 2022 at 15:24:19 UTC+2, Brian Candler wrote:
>>
>>> > if I want to send the error log in the generated alarm, I should add
>>> the error message as a label of my metric.
>>>
>>> That gives you a high cardinality label, which is not what Prometheus is
>>> designed for. Every distinct combination of labels defines a new
>>> timeseries.
>>>
>>> I can see two solutions here:
>>>
>>> 1. Use a log storage system like Loki or ElasticSearch/OpenSearch,
>>> rather than Prometheus
>>>
>>> 2. Include the error message as an "exemplar". When you have multiple
>>> events in the same timeseries and time window, then you'll only get one
>>> exemplar. But it may be good enough to give you an "example" of the type
>>> of error you're seeing, and it keeps the cardinality of your counters low.
>>> (Exemplars are experimental and need to be turned on with a feature flag,
>>> and I don't know if mtail supports them)
>>>
>>>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/875f8088-9dd5-4e3e-98a7-dff47cc74fe5n%40googlegroups.com.