Hi Tom,

Just posted the final survey here:
https://groups.google.com/forum/#!topic/prometheus-users/XU7tbVn23co
https://groups.google.com/forum/#!topic/prometheus-developers/ToCQNP2mODQ

Let's see what the results look like - hope it's helpful, although not all questions made it this time :)

Regards,
Julius

On Fri, May 22, 2020 at 10:49 AM Julius Volz <[email protected]> wrote:

> Yeah, I think as interesting as this could be, the survey is growing quite large already, and this would be one of the more complicated questions in terms of explaining it clearly enough and then getting users to compile the results. So I'm tending towards leaving it out this time around.
>
> But from experience you can safely assume that most large Prometheus deployments have a few metric names that are huge in their number of series (like a couple of 10k), and that would blow up any graph or other UI display without aggregation / filtering.
>
> On Wed, May 20, 2020 at 7:00 PM Tom Lee <[email protected]> wrote:
>
>> Yeah, agree. I really like the "largest N metric names" idea. I think both total series and "top N metrics" are interesting for different reasons, but also agree that getting "real" numbers is a challenge whatever we decide to do here. :)
>>
>> On Wed, May 20, 2020 at 6:38 AM Julius Volz <[email protected]> wrote:
>>
>>> On Sun, May 17, 2020 at 7:57 PM Tom Lee <[email protected]> wrote:
>>>
>>>>> Yes, I'm interested in what Tom's intent is behind the question. From a Prometheus perspective, the total time-series load is most important. But it might be different for his use case.
>>>>
>>>> Ah yep, really great question. I'm going to absolutely butcher the terminology here, but the idea is we're sort of trying to differentiate between "number of unique metric names" and "label/dimensional cardinality within those metrics". The reason for us differentiating is something of an implementation detail with respect to our own systems, but I think it also applies somewhat to Prometheus and/or Grafana too: when you run a non-aggregating query for a metric *x*, you might expect to see one timeseries charted -- or you might see hundreds or even thousands. In our own test setup we have JMX metrics for 15 Kafka servers reporting in. Executing a "query" like *kafka_cluster_Partition_Value* (a metric reported by the JMX exporter on behalf of Kafka) yields something like 20,000-30,000 distinct timeseries charted by Prometheus. It takes a surprising amount of time to execute that simple little query as a result. This sort of cardinality "explosion" has big implications for system architecture and scalability in our own systems, too.
>>>
>>> Sorry for the delay! Yeah, makes sense, metric names that have many series can be problematic in UIs when doing queries without filters or aggregations. On the other hand, we know that having at least *some* of those is very common (almost every user has a couple of huge ones), so we probably don't need a survey to tell us that :) More importantly maybe, to see how many metrics are too "overloaded", just having the total number of metric names vs. the total number of series doesn't answer the question fully: you don't know whether the series are evenly split up across your metric names, or whether they're all clustered in a few names. It's also a bit challenging to get users to compile a list of distinct metric names across Prometheus servers without some command-line foo or similar.
>>> We could ask something along the lines of "How many series do your largest N metric names contain?", and then give them a query like 'topk(3, count by(__name__) ({__name__!=""}))' to determine that per server. It would still require some manual work to combine results between servers though, hmmm...
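As a rough illustration of that "combine results between servers" step, here is a small Python sketch that runs the same topk query against each server's standard /api/v1/query endpoint and ranks the answers side by side. The server URLs below are placeholders, and the query and cutoff simply mirror the example from the thread:

import json
import urllib.parse
import urllib.request

# Placeholder base URLs - replace with your real Prometheus servers.
SERVERS = ["http://prometheus-1:9090", "http://prometheus-2:9090"]

# The per-server question from the thread: series counts of the 3 largest metric names.
QUERY = 'topk(3, count by(__name__) ({__name__!=""}))'

def query_instant(base_url, promql):
    # Standard Prometheus HTTP API instant query: GET /api/v1/query?query=...
    url = base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["data"]["result"]

results = []
for server in SERVERS:
    for sample in query_instant(server, QUERY):
        name = sample["metric"].get("__name__", "(unknown)")
        series_count = int(float(sample["value"][1]))  # instant-vector values come back as strings
        results.append((series_count, name, server))

# "Combining" here just means ranking the per-server counts together; the same
# metric name on two servers stays as two entries, matching the per-server framing.
for series_count, name, server in sorted(results, reverse=True):
    print(f"{series_count} series: {name} on {server}")

It only uses instant queries, so there is nothing extra to install on the servers themselves; whether you would also want to sum counts for the same metric name across servers depends on what the survey answer is supposed to mean.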

