On Fri, 7 Aug 2020 at 08:57, Pau Freixes <[email protected]> wrote:

> Hi,
>
> By reading this [1] and this [2], I have the feeling that the reader,
> or maybe only me, can have trouble understanding when this rule
> applies, under what circumstances, and how.
>
> From my understanding, correct me if I'm wrong, Prometheus is
> encouraging the use of labels for slicing your metrics [2], for
> example to identify which service owns a time series. Considering
> the following HTTP metric http_api_requests, it would be fine to have
> different time series for the same metric name, identified by the
> following label values:
>
> http_api_requests service_name=foo, status_code=200
> http_api_requests service_name=foo, status_code=500
> http_api_requests service_name=bar, status_code=200
> http_api_requests service_name=bar, status_code=500
>
> And in the case of having not 2 services but 1K different services,
> this would still be fine, since the total number of time series would
> still be manageable.
>

1K services is high cardinality, and then that's also broken out by
status_code.
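
As a back-of-the-envelope sketch of why this adds up (the figures below are illustrative, not from any real setup):

```python
# Rough series count for one metric, http_api_requests, assuming
# (hypothetically) 1000 services and 5 distinct status codes seen.
services = 1000
status_codes = 5  # e.g. 200, 400, 404, 500, 503

series = services * status_codes
print(series)  # 5000 time series for this single metric name
```

Every extra label multiplies that figure by its number of values, which is why cardinality grows so quickly.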


>
> From what can be read in [1], this could be misunderstood:
>
> > As a general guideline, try to keep the cardinality of your metrics
> below 10, and for metrics that exceed that, aim to limit them to a handful
> across your whole system. The vast majority of your metrics should have no
> labels.
>
> Looking at the previous example and the general guideline, someone
> could conclude that adding service_name as a label name is breaking
> that rule.
>
> From my understanding, correct me if I'm wrong, this general
> guideline should be read as a warning about the side effect of adding
> a label with a large cardinality, or of adding one that, though not
> of large cardinality by itself, implies an explosion in the number of
> time series once combined with another label.
>

That's the basic idea.


>
> For example, taking the previous http_api_requests example, what
> would happen if we added the resource path as a label? We would have
> something like this:
>
> http_api_requests service_name=foo, status_code=200, resource_path="/a"
> http_api_requests service_name=foo, status_code=500, resource_path="/b"
> http_api_requests service_name=bar, status_code=200, resource_path="/c"
> http_api_requests service_name=bar, status_code=500, resource_path="/d"
>
> Would this become an issue? I have the feeling that it depends on how
> the query is written. If the query also narrows by service name, this
> should not be a problem, since the total number of time series
> involved would still be manageable, while an unfiltered query across
> all services would most likely touch an unmanageable number of time
> series.
>
> If this is true, and the second query most likely wouldn't make any
> sense, why not prefix the metric name with the service name, to avoid
> future queries that could break the system by mistake?
>

That doesn't change the cardinality, it just makes querying harder for
users and is an anti-pattern.


>
> Another example: let's consider adding the pod id as a label. It can
> have thousands of different values, but they are somewhat stable
> during a time window. The metric would look like this:
>
> http_api_requests service_name=foo, status_code=200,
> resource_path="/a", pod_name="1ef"
> http_api_requests service_name=foo, status_code=500,
> resource_path="/b", pod_name="2ef"
> http_api_requests service_name=bar, status_code=200,
> resource_path="/c", pod_name="3ef"
> http_api_requests service_name=bar, status_code=500,
> resource_path="/d", pod_name="4ef"
>
> The queries we will be running typically won't slice by pod, but we
> will still narrow by service name. In a scenario where we have a more
> or less stable number of 500 pods during a time window, would the
> query still be manageable by Prometheus?
>

That will end very poorly, as we're now talking about a cardinality of 4
million (presuming just 2 status codes and 4 paths) from each individual
target.

> Looking at the example that you provide about node_exporter, it seems
> fine to me, since we will always narrow the query to one specific
> service, which dramatically reduces the number of time series involved
> in the query.
>
>

It's not just about querying, it's also about how much data Prometheus has
to store. Prometheus can practically hold somewhere in the low tens of
millions of active time series.


>
> Am I missing something in my rationale? If not, would it make sense
> to reword the following message:
>
> >> As a general guideline, try to keep the cardinality of your metrics
> below 10, and for metrics that exceed that, aim to limit them to a handful
> across your whole system. The vast majority of your metrics should have no
> labels.
>
> Should the rule of thumb instead be the number of time series
> involved in a query, where this number should be < X?
>

Less than 100K is a good guideline there I think. You can do more, but
things start to get problematic by the time you're at 1M.
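
A minimal sketch of applying that rule of thumb to the earlier figures (the helper and its numbers are illustrative):

```python
# Rough count of series a query touches, to compare against the
# ~100K guideline above (purely a rule of thumb, not a hard limit).
def query_series(services: int, status_codes: int, paths: int, pods: int = 1) -> int:
    """Series matched = product of the label value counts the query spans."""
    return services * status_codes * paths * pods

# Narrowed to one service: comfortably under 100K.
print(query_series(1, 2, 4, pods=500))     # 4000
# Unfiltered across 1000 services: past 1M, where things get problematic.
print(query_series(1000, 2, 4, pods=500))  # 4000000
```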

Brian


>
> Thanks!
>
>
> [1]
> https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels
> [2] https://www.robustperception.io/target-labels-not-metric-name-prefixes
>
> --
> --pau
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/CA%2BULCcF3h%3DvEvsVPZs-2zC2xNrd60tz6vZMMN4aN-6LwEdz75A%40mail.gmail.com
> .
>


-- 
Brian Brazil
www.robustperception.io
