On Sun, May 17, 2020 at 7:57 PM Tom Lee <[email protected]> wrote:
>> Yes, I'm interested in what Tom's intent is behind the question. From a
>> Prometheus perspective, the total time-series load is most important. But
>> it might be different for his use case.
>>
>
> Ah yep, really great question. I'm going to absolutely butcher the
> terminology here, but the idea is we're sort of trying to differentiate
> between "number of unique metric names" and "label/dimensional cardinality
> within those metrics". The reason for us differentiating is something of an
> implementation detail with respect to our own systems, but I think it also
> applies somewhat to Prometheus and/or Grafana too: when you run a
> non-aggregating query for a metric *x*, you might expect to see one
> timeseries charted -- or you might see hundreds or even thousands. In our
> own test setup we have JMX metrics for 15 Kafka servers reporting in.
> Executing a "query" like *kafka_cluster_Partition_Value* (a metric
> reported by the JMX exporter on behalf of Kafka) yields something like
> 20,000-30,000 distinct timeseries charted by Prometheus. It takes a
> surprisingly long time to execute that simple little query as a result.
> This sort of cardinality "explosion" has big implications for system
> architecture and scalability in our own systems, too.
>
Sorry for the delay! Yeah, makes sense, metric names that have many series
can be problematic in UIs when doing queries without filters or
aggregations. On the other hand, we know that having at least *some* of
those is very common (almost every user has a couple huge ones), so we
probably don't need a survey to tell us that :) Perhaps more importantly, to
see how many metrics are too "overloaded", just having the total number of
metric names vs. the total number of series doesn't answer the question
fully: you don't know whether the series are evenly split across your
metric names, or whether they're all clustered in a few names. It's also a
bit challenging to get users to compile a list of distinct metric names
across Prometheus servers without some command-line fu or similar. We
could ask something along the lines of "How many series do your largest N
metric names contain?", and then give them a query like 'topk(3, count
by(__name__) ({__name__!=""}))' to determine that per server. It would
still require some manual work to combine results between servers though,
hmmm...
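
For what it's worth, here's a rough sketch of what that combining step
could look like, assuming the standard /api/v1/query HTTP API and
placeholder server URLs (and note that naively summing the counts would
double-count series scraped by more than one server, e.g. HA pairs):

#!/usr/bin/env python3
# Sketch: collect per-metric series counts from several Prometheus servers
# via the HTTP API and merge them. The server URLs are placeholders.
import collections
import requests

SERVERS = [
    "http://prometheus-1:9090",
    "http://prometheus-2:9090",
]

QUERY = 'count by(__name__) ({__name__!=""})'
TOP_N = 3

totals = collections.Counter()
for server in SERVERS:
    resp = requests.get(f"{server}/api/v1/query", params={"query": QUERY})
    resp.raise_for_status()
    for sample in resp.json()["data"]["result"]:
        name = sample["metric"].get("__name__", "")
        # Instant vector samples come back as [<unix timestamp>, "<value>"].
        totals[name] += int(float(sample["value"][1]))

for name, count in totals.most_common(TOP_N):
    print(f"{name}\t{count}")

Still manual in the sense that someone has to supply the list of servers,
but at least it avoids copy-pasting results around by hand.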