Hey Ben,

Wow, look at all this activity! Thanks so much for jumping on this stuff.
Just answering a question that was directed at me below:

On Sat, May 16, 2020 at 2:18 AM Ben Kochie <[email protected]> wrote:

> On Sat, May 16, 2020 at 11:13 AM Julius Volz <[email protected]>
> wrote:
>
>> On Sat, May 16, 2020 at 10:25 AM Ben Kochie <[email protected]> wrote:
>>
>>> Thanks for the link to the other survey. That's pretty good.
>>>
>>> On Fri, May 15, 2020 at 7:27 PM 'Tom Lee' via Prometheus Users <
>>> [email protected]> wrote:
>>>
>>>> Hi Richard,
>>>>
>>>> Reading between the lines it sounds like we're *potentially* talking
>>>> about a broader/larger "State of Clojure
>>>> <https://clojure.org/news/2020/02/20/state-of-clojure-2020>"
>>>> type thing for Prometheus. Is that accurate?
>>>>
>>>> Certainly don't mind the results being public. My only real concern is
>>>> timelines: we were hoping to use some of the raw data to help advise some
>>>> load testing on our end, and things are already looking pretty aggressive.
>>>> If we're looking at something that's going to take weeks or more to start
>>>> seeing results rolling in we probably won't quite get the data we were
>>>> hoping to get in time. From a purely selfish perspective we'd be pretty
>>>> disappointed to go forward without data from "the source", so to speak. Of
>>>> course, I totally understand the team's actions here. I'm just whining to
>>>> myself.
>>>>
>>>> Timelines aside, we'd be excited to see something "official" in the
>>>> longer term. It would be useful for engineers like myself, and I know there
>>>> are product managers and research folks lurking our virtual halls who would
>>>> love such readily available data for future efforts.
>>>>
>>>> The questions from our survey:
>>>>
>>>> - Roughly how many Prometheus *servers* are you operationally
>>>> responsible for?
>>>> - Of all the Prometheus servers that you are responsible for, which
>>>> version would you say is the most widely deployed?
>>>>
>>> The first two questions are good. I might modify the first one to
>>> clarify with/without HA. For example, we have 21 Prometheus servers, but 7
>>> of those are duplicates for HA.
>>>
>>>> - How many unique metrics are reporting across all of your
>>>> Prometheus servers?
>>>> - How many unique *timeseries* are reporting across all of your
>>>> Prometheus servers?
>>>>
>>> These two need to be clarified for Prometheus. We tend to use the terms
>>> metrics and time-series interchangeably. Are you asking about unique metric
>>> names?
>>>
>>
>> I guess the first question is about unique metric names. The problem is
>> that there's no easy way to get the number of unique metric names across
>> multiple servers, as there might be anything between 0 - 100% overlap of
>> metric names between Prometheus servers, and getting users to calculate a
>> set union might be too much work. Also, time series are more relevant than
>> number of metrics in Prometheus, so maybe we should only keep the second
>> question?
>>
>
> Yes, I'm interested in what Tom's intent is behind the question. From a
> Prometheus perspective, the total time-series load is most important. But
> it might be different for his use case.
>

Ah yep, really great question. I'm going to absolutely butcher the terminology here, but the idea is we're sort of trying to differentiate between "number of unique metric names" and "label/dimensional cardinality within those metrics".

The reason for us differentiating is something of an implementation detail with respect to our own systems, but I think it also applies somewhat to Prometheus and/or Grafana too: when you run a non-aggregating query for a metric *x*, you might expect to see one timeseries charted -- or you might see hundreds or even thousands.

In our own test setup we have JMX metrics for 15 Kafka servers reporting in. Executing a "query" like *kafka_cluster_Partition_Value* (a metric reported by the JMX exporter on behalf of Kafka) yields something like 20,000-30,000 distinct timeseries charted by Prometheus. As a result, that simple little query takes a surprising amount of time to execute. This sort of cardinality "explosion" has big implications for system architecture and scalability in our own systems, too.

Please let me know if that's still not clear!

> We should probably include some specific PromQL queries to make the
> results easy to gather for survey participants.
>

Yeah this is a great idea. My own PromQL skills are pretty lame or I probably would've done something like this myself. :)

>>>> - If you use Grafana to visualize your Prometheus data, what
>>>> version of Grafana do you typically use?
>>>> - What value do you typically use for the "scrape_interval" config
>>>> setting in your Prometheus servers?
>>>> - Is there anything else you would like to tell us about your
>>>> Prometheus deployment(s)? For example, interesting challenges, pain
>>>> points, or quirks of your configuration?
>>>>
>>> I have a few additional questions that could be added to the list.
>>>
>>> * How many unique exporter/target types do you have?
>>> * What is your samples/second ingestion rate across all Prometheus
>>> servers?
>>> * What is your general metric retention time?
>>> * Do you use external storage (Federation/remote_write/etc)
>>> * If yes, which external storage system(s)?
>>>
>>>> Look forward to whatever might eventuate here, those big community
>>>> surveys are always a lot of fun to read through.
>>>>
>>>> Cheers,
>>>> Tom
>>>>
>>>> On Fri, May 15, 2020 at 9:25 AM Richard Hartmann <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Tom,
>>>>>
>>>>> after some internal deliberation, we think it would be unfair to give
>>>>> any single survey our official blessing, and running more than once
>>>>> every, say, year seems to be too much for users, too.
>>>>> On the other hand, user surveys make sense for everyone.
>>>>>
>>>>> Would you be OK with sending your questions to
>>>>> [email protected] or as a reply in this thread? We
>>>>> would then publish them for comments/feedback and run the survey under
>>>>> the Prometheus umbrella, sharing replies publicly.
>>>>>
>>>>> Best,
>>>>> Richard
>>>>>
>>>>> On Wed, May 13, 2020 at 9:44 PM 'Tom Lee' via Prometheus Users
>>>>> <[email protected]> wrote:
>>>>> >
>>>>> > Understood Julius, appreciate the transparency. Thank you!
>>>>> >
>>>>> > On Wed, May 13, 2020 at 12:37 PM Julius Volz <
>>>>> > [email protected]> wrote:
>>>>> >>
>>>>> >> Hi Tom,
>>>>> >>
>>>>> >> Thanks for checking in first! We're currently discussing within the
>>>>> >> Prometheus Team how we would prefer to handle such requests in general (so
>>>>> >> that things remain fair between companies, etc.) and will get back to you
>>>>> >> as soon.
>>>>> >>
>>>>> >> Regards,
>>>>> >> Julius
>>>>> >>
>>>>> >> On Wed, May 13, 2020 at 7:21 PM 'Tom Lee' via Prometheus Users <
>>>>> >> [email protected]> wrote:
>>>>> >>>
>>>>> >>> Hi folks,
>>>>> >>>
>>>>> >>> Full disclosure: I'm an engineer from New Relic
>>>>> >>> (https://newrelic.com/). We've been looking into improving our open
>>>>> >>> source monitoring story and Prometheus is a key piece of that. Right now,
>>>>> >>> though, there are some pieces of the puzzle that we can't easily dig into
>>>>> >>> without more input from the Prometheus community at large.
>>>>> >>>
>>>>> >>> Is this mailing list an okay place to send a Google Forms type
>>>>> >>> survey with maybe half a dozen questions? And if not, can folks suggest
>>>>> >>> somewhere that might be more appropriate?
>>>>> >>>
>>>>> >>> Cheers,
>>>>> >>> Tom
>>>>> >>
>>>>> >> --
>>>>> >> Julius Volz
>>>>> >> PromLabs - promlabs.com
>>>>>
>>>>> --
>>>>> Richard

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/CAMUmz5hNMbX45pzwkvEErcNnckC%2Bp3PuwjfWnbLPqUjROjih-Q%40mail.gmail.com.
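
For anyone gathering numbers for the survey, here is a rough sketch of the distinction discussed in the thread: unique metric names (a set union across servers) versus total time series (label cardinality within those names). The server names and sample series are made up for illustration, and `summarize` is a hypothetical helper, not part of any Prometheus tooling. The two PromQL strings are commonly used counting queries; they match every series, so they may be slow on large servers and should be treated as a starting point only.

```python
from collections import Counter

# PromQL a participant could run on each server (illustrative only;
# both selectors match every series and can be expensive).
UNIQUE_METRIC_NAMES = 'count(count by (__name__) ({__name__=~".+"}))'
TOTAL_TIME_SERIES = 'count({__name__=~".+"})'


def summarize(servers):
    """Summarize metric names and series across servers.

    `servers` maps a hypothetical server name to a list of series, where
    each series is a dict of labels that includes "__name__".
    """
    names_per_server = {
        server: {s["__name__"] for s in series}
        for server, series in servers.items()
    }
    # Set union: a metric name reported by several servers counts once.
    unique_names = set().union(*names_per_server.values())
    # Every label set is its own series, so series are simply summed.
    total_series = sum(len(series) for series in servers.values())
    # Series per metric name shows where cardinality "explodes".
    series_per_name = Counter(
        s["__name__"] for series in servers.values() for s in series
    )
    return {
        "unique_metric_names": len(unique_names),
        "total_series": total_series,
        "series_per_name": dict(series_per_name),
    }


# Made-up sample data: two servers with one overlapping metric name.
servers = {
    "prom-a": [
        {"__name__": "up", "job": "kafka", "instance": "k1"},
        {"__name__": "kafka_cluster_Partition_Value", "topic": "t1", "partition": "0"},
        {"__name__": "kafka_cluster_Partition_Value", "topic": "t1", "partition": "1"},
    ],
    "prom-b": [
        {"__name__": "up", "job": "node", "instance": "n1"},
        {"__name__": "node_load1", "instance": "n1"},
    ],
}
summary = summarize(servers)
print(summary)
```

Note that this sketch does not deduplicate HA pairs; two replicas scraping the same targets would be counted twice in the series total, which is the with/without HA ambiguity Ben raised.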

