Yeah, agree. I really like the "largest N metric names" idea. I think both
total series and "top N metrics" are interesting for different reasons, but
also agree getting "real" numbers is a challenge whatever we decide to do
here. :)

On Wed, May 20, 2020 at 6:38 AM Julius Volz wrote:

> On Sun, May 17, 2020 at 7:57 PM Tom Lee wrote:
>
>>> Yes, I'm interested in what Tom's intent is behind the question. From a
>>> Prometheus perspective, the total time-series load is most important. But
>>> it might be different for his use case.
>>>
>>
>> Ah yep, really great question. I'm going to absolutely butcher the
>> terminology here, but the idea is we're sort of trying to differentiate
>> between "number of unique metric names" and "label/dimensional cardinality
>> within those metrics". The reason for us differentiating is something of an
>> implementation detail with respect to our own systems, but I think it also
>> applies somewhat to Prometheus and/or Grafana too: when you run a
>> non-aggregating query for a metric *x*, you might expect to see one
>> timeseries charted -- or you might see hundreds or even thousands. In our
>> own test setup we have JMX metrics for 15 Kafka servers reporting in.
>> Executing a "query" like *kafka_cluster_Partition_Value* (a metric
>> reported by the JMX exporter on behalf of Kafka) yields something like
>> 20,000-30,000 distinct timeseries charted by Prometheus. It takes a
>> surprising amount of time to execute that simple little query as a result.
>> This sort of cardinality "explosion" has big implications for system
>> architecture and scalability in our own systems, too.
>>
>
> Sorry for the delay! Yeah, makes sense, metric names that have many series
> can be problematic in UIs when doing queries without filters or
> aggregations. On the other hand, we know that having at least *some* of
> those is very common (almost every user has a couple huge ones), so we
> probably don't need a survey to tell us that :) More importantly maybe, to
> see how many metrics are too "overloaded", just having the total number of
> metric names vs. the total number of series doesn't answer the question
> fully: you don't know whether the series are evenly split up across your
> metric names, or whether they're all clustered in a few names. It's also a
> bit challenging to get users to compile a list of distinct metric names
> across Prometheus servers, without some command-line foo or similar. We
> could ask something along the lines of "How many series do your largest N
> metric names contain?", and then give them a query like 'topk(3, count
> by(__name__) ({__name__!=""}))' to determine that per server. It would
> still require some manual work to combine results between servers though,
> hmmm...
>
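
For the "manual work to combine results between servers" part, something
like the Python sketch below might do. It is only a sketch under a few
assumptions: each server's full 'count by(__name__) ({__name__!=""})' result
has already been exported into a dict of metric name to series count (the
full result rather than just the topk(3), since a name that is modest on
every single server could still be large in total); the servers scrape
disjoint targets, so summing does not double-count HA replicas; and the
function name and sample numbers are made up for illustration.

```python
from collections import Counter


def merge_series_counts(per_server_counts, n=3):
    """Sum per-metric-name series counts across servers, return the n largest.

    per_server_counts: iterable of dicts mapping metric name -> series count,
    one dict per Prometheus server (hypothetical input format).
    """
    total = Counter()
    for counts in per_server_counts:
        # Counter.update adds the counts for names that are already present.
        total.update(counts)
    # most_common returns (name, count) pairs sorted by count, descending.
    return total.most_common(n)


# Made-up example numbers, not real measurements.
server_a = {
    "kafka_cluster_Partition_Value": 12000,
    "http_requests_total": 900,
    "up": 40,
}
server_b = {
    "kafka_cluster_Partition_Value": 9000,
    "node_cpu_seconds_total": 1500,
    "up": 35,
}

for name, series in merge_series_counts([server_a, server_b], n=3):
    print(f"{name}: {series}")
```

Comparing the summed top-N counts against the sum over all names would also
show whether the series are clustered in a few names or spread evenly.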

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAMUmz5g%2B-sUFSdeY4%3D%3D3366KfoyE9ibDLgh6iYmrXy5v1dPxag%40mail.gmail.com.
