On Fri, 7 Aug 2020 at 08:57, Pau Freixes <[email protected]> wrote:

> Hi,
>
> Reading this [1] and this [2], I have the feeling that the reader (or
> maybe only me) can have some trouble understanding when this rule can
> be applied, under what circumstances, and how.
>
> From my understanding, correct me if I'm wrong, Prometheus encourages
> the use of labels for slicing your metrics [2], for example to
> identify which service owns a time series. Considering the following
> HTTP metric http_api_requests, would it be fine to have different time
> series for the same metric name, identified by the following label
> values?
>
> http_api_requests service_name=foo, status_code=200
> http_api_requests service_name=foo, status_code=500
> http_api_requests service_name=bar, status_code=200
> http_api_requests service_name=bar, status_code=500
>
> And in the case of having not 2 services but 1K different services,
> would this still be fine, since the total number of metrics would
> still be manageable?
1K services is high cardinality, and then that's also broken out by
status_code.

> From what can be read in [1], this could be misunderstood:
>
> > As a general guideline, try to keep the cardinality of your metrics
> > below 10, and for metrics that exceed that, aim to limit them to a
> > handful across your whole system. The vast majority of your metrics
> > should have no labels.
>
> Looking at the previous example and the general guideline, someone
> could understand that adding service_name as a label name breaks that
> rule.
>
> From my understanding, correct me if I'm wrong, this general guideline
> should be circumscribed to the side effect of adding a label with
> large cardinality, or of adding one that, though not having large
> cardinality on its own, causes an explosion in the number of time
> series once combined with another label.

That's the basic idea.

> For example, considering the previous http_api_requests example, what
> would happen if we added the resource path as a label? We would have
> something like this:
>
> http_api_requests service_name=foo, status_code=200, resource_path="/a"
> http_api_requests service_name=foo, status_code=500, resource_path="/b"
> http_api_requests service_name=bar, status_code=200, resource_path="/c"
> http_api_requests service_name=bar, status_code=500, resource_path="/d"
>
> Will this become an issue? I have the feeling that it depends on how
> the query is done. If the query also narrows by service name, this
> should not be a problem, since the total number of time series
> involved should still be manageable, while the number of time series
> for a query not filtered by service name will most likely be
> unmanageable.
>
> If this is true, and the second query most likely wouldn't make any
> sense, why not prefix the metric name with the service name, to avoid
> future queries that could break the system by mistake?
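As a side note on the arithmetic behind this exchange: each distinct combination of label values is its own time series, so adding a label multiplies the series count by that label's cardinality. A minimal self-contained Python sketch (no Prometheus client involved; the service names are hypothetical placeholders, scaled up to the 1K figure from the thread):

```python
from itertools import product

# Hypothetical label values, scaled up from the example in the thread.
services = [f"svc-{i}" for i in range(1000)]   # "1K different services"
status_codes = ["200", "500"]

# Each distinct label combination is its own time series.
series = set(product(services, status_codes))
print(len(series))  # 1000 services * 2 status codes = 2000 series

# Adding a resource_path label multiplies the count again.
paths = ["/a", "/b", "/c", "/d"]
series_with_path = set(product(services, status_codes, paths))
print(len(series_with_path))  # 2000 * 4 paths = 8000 series
```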
That doesn't change the cardinality, it just makes querying harder for
users and is an anti-pattern.

> Another example: let's consider adding the pod id as a label, which
> can have thousands of different values, though they are somewhat
> stable during a time window. The metric will look like this:
>
> http_api_requests service_name=foo, status_code=200, resource_path="/a", pod_name="1ef"
> http_api_requests service_name=foo, status_code=500, resource_path="/b", pod_name="2ef"
> http_api_requests service_name=bar, status_code=200, resource_path="/c", pod_name="3ef"
> http_api_requests service_name=bar, status_code=500, resource_path="/d", pod_name="4ef"
>
> The queries we will typically be running won't slice by pod, but we
> will still narrow by service name. Let's consider a scenario where we
> have a more or less stable number of 500 pods in a time window; would
> the query still be manageable by Prometheus?

That will end very poorly, as we're now talking a cardinality of 4
million (presuming just 2 status codes and 4 paths) from each
individual target.

> Looking at the example that you provide about node_exporter, it seems
> fine to me, since we will still always narrow the query to one
> specific service, which will dramatically reduce the number of time
> series involved in the query.

It's not just about querying, it's also about how much data Prometheus
has to store. Prometheus can practically hold somewhere in the low tens
of millions of active time series.

> Am I missing something in my rationale? If not, would it make sense to
> reword the following message a bit?
>
> > As a general guideline, try to keep the cardinality of your metrics
> > below 10, and for metrics that exceed that, aim to limit them to a
> > handful across your whole system. The vast majority of your metrics
> > should have no labels.
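The 4 million figure works out as straight multiplication of the label cardinalities, using the per-label counts assumed in the thread:

```python
# Cardinality of each label, as assumed in the thread.
services = 1000      # "1K different services"
status_codes = 2     # "presuming just 2 status codes"
resource_paths = 4   # "and 4 paths"
pods = 500           # "a stable number of 500 pods"

total_series = services * status_codes * resource_paths * pods
print(total_series)  # 4,000,000 time series
```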
> Should the number of time series involved in a query be used as a
> rule of thumb, where this number should be < X?

Less than 100K is a good guideline there I think. You can do more, but
things start to get problematic by the time you're at 1M.

Brian

> Thanks!
>
> [1] https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels
> [2] https://www.robustperception.io/target-labels-not-metric-name-prefixes
>
> --
> --pau
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/CA%2BULCcF3h%3DvEvsVPZs-2zC2xNrd60tz6vZMMN4aN-6LwEdz75A%40mail.gmail.com
> .

--
Brian Brazil
www.robustperception.io
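The "< 100K series per query" rule of thumb discussed above can be sanity-checked mechanically: count how many series a query's label matchers would select. A sketch under the thread's hypothetical numbers (1K services, 2 status codes, 4 paths), with made-up service names:

```python
from itertools import product

# Hypothetical series set: every label combination from the thread's example.
services = [f"svc-{i}" for i in range(1000)]
status_codes = ["200", "500"]
paths = ["/a", "/b", "/c", "/d"]

all_series = [
    {"service_name": s, "status_code": c, "resource_path": p}
    for s, c, p in product(services, status_codes, paths)
]

def series_selected(matchers):
    """Count series whose labels satisfy all the given equality matchers."""
    return sum(
        1 for labels in all_series
        if all(labels.get(k) == v for k, v in matchers.items())
    )

# Narrowed by service name: well under the ~100K guideline.
print(series_selected({"service_name": "svc-0"}))  # 8 series

# Unfiltered: touches every series of the metric.
print(series_selected({}))  # 8000 series
```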

