To add a bit more detail to that example: I was actually using a fairly tuned remote write queue config that sent large batches, since the batch send deadline was set longer, to 1 minute, with a max samples per send of 5,000. Here's that config:

```
remote_write:
  - url: http://localhost:3030/remote/write
    remote_timeout: 30s
    queue_config:
      capacity: 10000
      max_shards: 10
      min_shards: 3
      max_samples_per_send: 5000
      batch_send_deadline: 1m
      min_backoff: 50ms
      max_backoff: 1s
```
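For reference, here is roughly what each of those queue_config knobs controls (paraphrased from the Prometheus remote_write configuration docs; exact semantics can vary between versions, so double-check against the one you run):

```
queue_config:
  capacity: 10000            # samples buffered per shard before reads from the WAL block
  max_shards: 10             # upper bound on concurrent sender shards
  min_shards: 3              # lower bound on shards, also the number used at startup
  max_samples_per_send: 5000 # max samples batched into a single remote write request
  batch_send_deadline: 1m    # max time a partial batch waits before being sent anyway
  min_backoff: 50ms          # initial retry backoff on failed sends
  max_backoff: 1s            # ceiling for the doubling retry backoff
```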
Using the default config we get worse utilization for both before/after numbers, but the delta is smaller:
- steady state ~177kb/sec without this change
- steady state ~210kb/sec with this change
- roughly 20% increase

Using config:

```
remote_write:
  - url: http://localhost:3030/remote/write
    remote_timeout: 30s
```

Implicitly, the values for this config are:
- min shards 1
- max shards 1000
- max samples per send 100
- capacity 500
- batch send deadline 5s
- min backoff 30ms
- max backoff 100ms

On Wed, Aug 19, 2020 at 4:26 AM Brian Brazil <[email protected]> wrote:

> On Wed, 19 Aug 2020 at 09:20, Rob Skillington <[email protected]> wrote:
>
>> Here's the results from testing:
>> - node_exporter exporting 309 metrics each by turning on a lot of optional
>> collectors, all have help set, very few have unit set
>> - running 8 on the host at 1s scrape interval, each with unique instance
>> label
>> - steady state ~137kb/sec without this change
>> - steady state ~172kb/sec with this change
>> - roughly 30% increase
>>
>> Graph here:
>> https://github.com/prometheus/prometheus/pull/7771#issuecomment-675923976
>>
>> How do we want to proceed? This could be fairly close to the higher end of
>> the spectrum in terms of expected increase given the node_exporter metrics
>> density and fairly verbose metadata.
>>
>> Even having said that, however, 30% is a fairly big increase and a
>> relatively large egress cost to have to swallow without any way to back
>> out of this behavior.
>>
>> What do folks think of next steps?
>
> It is on the high end, however this is going to be among the worst cases
> as there's not going to be a lot of per-metric cardinality from the node
> exporter. I bet if you greatly increased the number of targets (and reduced
> the scrape interval to compensate) it'd be more reasonable. I think this is
> just about okay.
>
> Brian
>
>> On Tue, Aug 11, 2020 at 11:55 AM Rob Skillington <[email protected]> wrote:
>>
>>> Agreed - I'll see what I can do about getting some numbers for a workload
>>> collecting cAdvisor metrics; it seems to have a significant amount of
>>> HELP set:
>>> https://github.com/google/cadvisor/blob/8450c56c21bc5406e2df79a2162806b9a23ebd34/metrics/testdata/prometheus_metrics
>>>
>>> On Tue, Aug 11, 2020 at 6:15 AM Brian Brazil <[email protected]> wrote:
>>>
>>>> On Tue, 11 Aug 2020 at 11:07, Julien Pivotto <[email protected]> wrote:
>>>>
>>>>> On 11 Aug 11:05, Brian Brazil wrote:
>>>>> > On Tue, 11 Aug 2020 at 04:09, Callum Styan <[email protected]> wrote:
>>>>> >
>>>>> > > I'm hesitant to add anything that significantly increases the network
>>>>> > > bandwidth usage of remote write while at the same time not giving
>>>>> > > users a way to tune the usage to their needs.
>>>>> > >
>>>>> > > I agree with Brian that we don't want the protocol itself to become
>>>>> > > stateful by introducing something like negotiation. I'd also prefer
>>>>> > > not to introduce multiple ways to do things, though I'm hoping we can
>>>>> > > find a way to accommodate your use case while not ballooning the
>>>>> > > average user's network egress bill.
>>>>> > >
>>>>> > > I am fine with forcing the consuming end to be somewhat stateful,
>>>>> > > like in the case of Josh's PR where all metadata is sent periodically
>>>>> > > and must be stored by the remote storage system.
>>>>> > >
>>>>> > > Overall I'd like to see some numbers regarding current network
>>>>> > > bandwidth of remote write, remote write with metadata via Josh's PR,
>>>>> > > and remote write with sending metadata for every series in a remote
>>>>> > > write payload.
>>>>> >
>>>>> > I agree, I noticed that in Rob's PR and had the same thought.
>>>>>
>>>>> Remote bandwidth is likely to affect only people using remote write.
>>>>>
>>>>> Getting a view on the on-disk size of the WAL would be great too, as
>>>>> that will affect everyone.
>>>>
>>>> I'm not worried about that, it's only really on series creation so it
>>>> won't be noticed unless you have really high levels of churn.
>>>>
>>>> Brian
>>>>
>>>>> > Brian
>>>>> >
>>>>> > > Rob, I'll review your PR tomorrow but it looks like Julien and Brian
>>>>> > > may already have that covered.
>>>>> > >
>>>>> > > On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington <[email protected]> wrote:
>>>>> > >
>>>>> > >> Update: The PR now sends the fields over remote write from the WAL,
>>>>> > >> and metadata is also updated in the WAL when any field changes.
>>>>> > >>
>>>>> > >> Now opened the PR against the primary repo:
>>>>> > >> https://github.com/prometheus/prometheus/pull/7771
>>>>> > >>
>>>>> > >> I have tested this end-to-end with a modified M3 branch:
>>>>> > >> https://github.com/m3db/m3/compare/r/test-prometheus-metadata
>>>>> > >> > {... "msg":"received series","labels":"{__name__="prometheus_rule_group_...
>>>>> > >> > iterations_total",instance="localhost:9090",job="prometheus01",role=...
>>>>> > >> > "remote"}","type":"counter","unit":"","help":"The total number of
>>>>> > >> > scheduled... rule group evaluations, whether executed or missed."}
>>>>> > >>
>>>>> > >> Tests still haven't been updated. Any feedback on the approach /
>>>>> > >> data structures would be greatly appreciated.
>>>>> > >>
>>>>> > >> Would be good to know what others' thoughts are on next steps.
>>>>> > >>
>>>>> > >> On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington <[email protected]> wrote:
>>>>> > >>
>>>>> > >>> Here's a draft PR that propagates metadata to the WAL, and the WAL
>>>>> > >>> reader can read it back:
>>>>> > >>> https://github.com/robskillington/prometheus/pull/1/files
>>>>> > >>>
>>>>> > >>> Would like a little bit of feedback on the datatypes and structure
>>>>> > >>> before going further, if folks are open to that.
>>>>> > >>>
>>>>> > >>> There's a few things not happening:
>>>>> > >>> - Remote write queue manager does not use or send these extra fields yet.
>>>>> > >>> - Head does not reset the "metadata" slice (not sure where the "series"
>>>>> > >>> slice is reset in the head for pending series writes to WAL; want to do
>>>>> > >>> it in the same place).
>>>>> > >>> - Metadata is not re-written on change yet.
>>>>> > >>> - Tests.
>>>>> > >>>
>>>>> > >>> On Sat, Aug 8, 2020 at 9:37 AM Rob Skillington <[email protected]> wrote:
>>>>> > >>>
>>>>> > >>>> Sounds good, I've updated the proposal with details on the places in
>>>>> > >>>> which changes are required given the new approach:
>>>>> > >>>> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#
>>>>> > >>>>
>>>>> > >>>> On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <[email protected]> wrote:
>>>>> > >>>>
>>>>> > >>>>> On Fri, 7 Aug 2020 at 15:48, Rob Skillington <[email protected]> wrote:
>>>>> > >>>>>
>>>>> > >>>>>> True - I mean this could also be a blacklist by config perhaps, so
>>>>> > >>>>>> if you really don't want to have increased egress you can optionally
>>>>> > >>>>>> turn off sending the TYPE, HELP, UNIT or send them at different
>>>>> > >>>>>> frequencies via config.
>>>>> > >>>>>> We could package some sensible defaults so folks don't need to
>>>>> > >>>>>> update their config.
>>>>> > >>>>>>
>>>>> > >>>>>> The main intention is to enable these added features and make it
>>>>> > >>>>>> possible for various consumers to adjust some of these parameters if
>>>>> > >>>>>> required, since backends can be so different in their implementation.
>>>>> > >>>>>> For M3 I would be totally fine with the extra egress, which should be
>>>>> > >>>>>> mitigated fairly considerably by Snappy and the fact that HELP is
>>>>> > >>>>>> common across certain metric families when receiving it on every
>>>>> > >>>>>> single Remote Write request.
>>>>> > >>>>>
>>>>> > >>>>> That's really a micro-optimisation. If you are that worried about
>>>>> > >>>>> bandwidth you'd run a sidecar specific to your remote backend that was
>>>>> > >>>>> stateful and far more efficient overall. Sending the full label names
>>>>> > >>>>> and values on every request is going to be far more than the overhead
>>>>> > >>>>> of metadata on top of that, so I don't see a need as it stands for any
>>>>> > >>>>> of this to be configurable.
>>>>> > >>>>>
>>>>> > >>>>> Brian
>>>>> > >>>>>
>>>>> > >>>>>> On Fri, Aug 7, 2020 at 3:56 AM Brian Brazil <[email protected]> wrote:
>>>>> > >>>>>>
>>>>> > >>>>>>> On Thu, 6 Aug 2020 at 22:58, Rob Skillington <[email protected]> wrote:
>>>>> > >>>>>>>
>>>>> > >>>>>>>> Hey Björn,
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> Thanks for the detailed response. I've had a few back and forths
>>>>> > >>>>>>>> on this with Brian and Chris over IRC and CNCF Slack now too.
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> I agree that fundamentally it seems naive to idealistically model
>>>>> > >>>>>>>> this around per metric name. It needs to be per series given what
>>>>> > >>>>>>>> may happen w.r.t. collision across targets, etc.
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> Perhaps we can separate these discussions into two considerations:
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> 1) Modeling of the data such that it is kept around for
>>>>> > >>>>>>>> transmission (primarily we're focused on the WAL here).
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> 2) Transmission (which, as you allude to, has many areas for
>>>>> > >>>>>>>> improvement).
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> For (1) - it seems like this needs to be done per time series;
>>>>> > >>>>>>>> thankfully we have already modeled this so that per-series data is
>>>>> > >>>>>>>> stored just once in a single WAL file. I will write up my proposal
>>>>> > >>>>>>>> here, but it will amount to essentially encoding the HELP, UNIT and
>>>>> > >>>>>>>> TYPE to the WAL per series, similar to how labels for a series are
>>>>> > >>>>>>>> encoded once per series in the WAL. Since this optimization is in
>>>>> > >>>>>>>> place, there's already a huge dampening effect on how expensive it
>>>>> > >>>>>>>> is to write out data about a series (e.g. labels). We can always go
>>>>> > >>>>>>>> and collect a sample WAL file and measure how much extra size
>>>>> > >>>>>>>> with/without HELP, UNIT and TYPE this would add, but it seems like
>>>>> > >>>>>>>> it won't fundamentally change the order of magnitude of "information
>>>>> > >>>>>>>> about a timeseries storage size" vs "datapoints about a timeseries
>>>>> > >>>>>>>> storage size".
>>>>> > >>>>>>>> One extra change would be re-encoding the series into the WAL if
>>>>> > >>>>>>>> the HELP changed for that series, just so that when HELP does change
>>>>> > >>>>>>>> it can be up to date from the view of whoever is reading the WAL
>>>>> > >>>>>>>> (i.e. the Remote Write loop). Since this entry needs to be loaded
>>>>> > >>>>>>>> into memory for Remote Write today anyway, with string interning as
>>>>> > >>>>>>>> suggested by Chris, it won't change the memory profile
>>>>> > >>>>>>>> algorithmically of a Prometheus with Remote Write enabled. There
>>>>> > >>>>>>>> will be some overhead that at most would likely be similar to the
>>>>> > >>>>>>>> label data, but we aren't altering data structures (so won't change
>>>>> > >>>>>>>> the big-O magnitude of memory being used); we're adding fields to
>>>>> > >>>>>>>> existing data structures, and string interning should actually make
>>>>> > >>>>>>>> it much less onerous since there is a large duplicative effect with
>>>>> > >>>>>>>> HELP among time series.
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> For (2) - now we have basically TYPE, HELP and UNIT all available
>>>>> > >>>>>>>> for transmission if we wanted to send it with every single
>>>>> > >>>>>>>> datapoint. While I think we should definitely examine HPACK-like
>>>>> > >>>>>>>> compression features as you mentioned, Björn, I think we should
>>>>> > >>>>>>>> separate that kind of work into a Milestone 2 where this is
>>>>> > >>>>>>>> considered.
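A minimal sketch of the bookkeeping being described here (hypothetical names and structure, in Python rather than Prometheus's Go, purely to illustrate the point): per-series metadata is recorded once, re-encoded only when it changes, and HELP strings are interned so duplicates across a metric family cost a single string in memory.

```python
class MetadataStore:
    """Hypothetical sketch (not the PR's actual types): per-series metadata
    kept alongside labels, appended to the WAL once per series and
    re-encoded only on change, with HELP strings interned so duplicates
    across a metric family are stored once in memory."""

    def __init__(self):
        self._interned = {}    # help text -> single shared str instance
        self.series = {}       # series ref -> (type, unit, help)
        self.wal_records = []  # stand-in for appended WAL metadata records

    def _intern(self, s):
        return self._interned.setdefault(s, s)

    def set(self, ref, mtype, unit, help_text):
        rec = (mtype, unit, self._intern(help_text))
        if self.series.get(ref) != rec:  # re-encode only on change
            self.series[ref] = rec
            self.wal_records.append((ref, rec))

store = MetadataStore()
help_text = "The total number of scheduled rule group evaluations."
for ref in range(3):                    # three series of one metric family
    store.set(ref, "counter", "", help_text)
store.set(0, "counter", "", help_text)  # unchanged -> no new WAL record
store.set(0, "counter", "", help_text + " Updated.")  # changed -> re-encoded

print(len(store.wal_records))           # 4 records: 3 initial + 1 re-encode
shared = {id(rec[2]) for ref, rec in store.series.items() if rec[2] == help_text}
print(len(shared))                      # 1: refs 1 and 2 share one HELP str
```

The interning map is the load-bearing piece: every series of a family holds a reference to the same HELP object, which is why adding metadata fields to the existing per-series structures shouldn't change the big-O memory profile.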
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> For the time being it's very plausible we could do some negotiation
>>>>> > >>>>>>>> with the receiving Remote Write endpoint by sending a "GET" to the
>>>>> > >>>>>>>> remote write endpoint and seeing if it responds with a "capabilities
>>>>> > >>>>>>>> + preferences" response: whether the endpoint would like to receive
>>>>> > >>>>>>>> metadata all the time on every single request and let Snappy take
>>>>> > >>>>>>>> care of keeping the size from ballooning too much, or whether it
>>>>> > >>>>>>>> would like TYPE on every single datapoint, and HELP and UNIT every
>>>>> > >>>>>>>> DESIRED_SECONDS or so. To enable a "send HELP every 10 minutes"
>>>>> > >>>>>>>> feature we would have to add to the data structure that holds the
>>>>> > >>>>>>>> LABELS, TYPE, HELP and UNIT for each series a "last sent" timestamp
>>>>> > >>>>>>>> to know when to resend to that backend, but that seems entirely
>>>>> > >>>>>>>> plausible and would not use more than 4 extra bytes.
>>>>> > >>>>>>>
>>>>> > >>>>>>> Negotiation is fundamentally stateful, as the process that receives
>>>>> > >>>>>>> the first request may be a very different one from the one that
>>>>> > >>>>>>> receives the second - such as if an upgrade is in progress. Remote
>>>>> > >>>>>>> write is intended to be a very simple thing that's easy to implement
>>>>> > >>>>>>> on the receiver end and is a send-only request-based protocol, so
>>>>> > >>>>>>> request-time negotiation is basically out. Any negotiation needs to
>>>>> > >>>>>>> happen via the config file, and even then it'd be better if nothing
>>>>> > >>>>>>> ever needed to be configured.
>>>>> > >>>>>>> Getting all the users of a remote write to change their config file
>>>>> > >>>>>>> or restart all their Prometheus servers is not an easy task after
>>>>> > >>>>>>> all.
>>>>> > >>>>>>>
>>>>> > >>>>>>> Brian
>>>>> > >>>>>>>
>>>>> > >>>>>>>> These thoughts are based on the discussion I've had and the
>>>>> > >>>>>>>> thoughts on this thread. What's the feedback on this before I go
>>>>> > >>>>>>>> ahead and re-iterate the design to more closely map to what I'm
>>>>> > >>>>>>>> suggesting here?
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> Best,
>>>>> > >>>>>>>> Rob
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> On Thu, Aug 6, 2020 at 2:01 PM Bjoern Rabenstein <[email protected]> wrote:
>>>>> > >>>>>>>>
>>>>> > >>>>>>>>> On 03.08.20 03:04, Rob Skillington wrote:
>>>>> > >>>>>>>>> > Ok - I have a proposal which could be broken up into two pieces:
>>>>> > >>>>>>>>> > first, delivering TYPE per datapoint; second, consistently and
>>>>> > >>>>>>>>> > reliably delivering HELP and UNIT once per unique metric name:
>>>>> > >>>>>>>>> > https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#heading=h.bik9uwphqy3g
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Thanks for the doc. I have commented on it, but while doing so I
>>>>> > >>>>>>>>> felt the urge to comment more generally, which would not fit well
>>>>> > >>>>>>>>> into the margin of a Google doc. My thoughts are also a bit out of
>>>>> > >>>>>>>>> scope of Rob's design doc and more about the general topic of
>>>>> > >>>>>>>>> remote write and the equally general topic of metadata (about which
>>>>> > >>>>>>>>> we have an ongoing discussion among the Prometheus developers).
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Disclaimer: I don't know the remote-write protocol very well. My
>>>>> > >>>>>>>>> hope here is that my somewhat distant perspective is of some value
>>>>> > >>>>>>>>> as it allows me to take a step back. However, I might just miss
>>>>> > >>>>>>>>> crucial details that completely invalidate my thoughts. We'll
>>>>> > >>>>>>>>> see...
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> I do care a lot about metadata, though. (And ironically, the
>>>>> > >>>>>>>>> reason why I declared remote write "somebody else's problem" is
>>>>> > >>>>>>>>> that I've always disliked how it fundamentally ignores metadata.)
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Rob's document embraces the fact that metadata can change over
>>>>> > >>>>>>>>> time, but it assumes that at any given time there is only one set
>>>>> > >>>>>>>>> of metadata per unique metric name. It takes into account that
>>>>> > >>>>>>>>> there can be drift, but it considers drift an irregularity that
>>>>> > >>>>>>>>> will only happen occasionally and iron out over time.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> In practice, however, metadata can be legitimately and
>>>>> > >>>>>>>>> deliberately different for different time series of the same name.
>>>>> > >>>>>>>>> Instrumentation libraries and even the exposition format inherently
>>>>> > >>>>>>>>> require one set of metadata per metric name, but this is all only
>>>>> > >>>>>>>>> enforced (and meant to be enforced) _per target_. Once the samples
>>>>> > >>>>>>>>> are ingested (or even sent onwards via remote write), they have no
>>>>> > >>>>>>>>> notion of what target they came from. Furthermore, samples created
>>>>> > >>>>>>>>> by rule evaluation don't have an originating target in the first
>>>>> > >>>>>>>>> place.
>>>>> > >>>>>>>>> (Which raises the question of metadata for recording rules, which
>>>>> > >>>>>>>>> is another can of worms I'd like to open eventually...)
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> (There is also the technical difficulty that the WAL has no notion
>>>>> > >>>>>>>>> of bundling or referencing all the series with the same metric
>>>>> > >>>>>>>>> name. That was commented about in the doc but is not my focus
>>>>> > >>>>>>>>> here.)
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Rob's doc sees TYPE as special because it is so cheap to just add
>>>>> > >>>>>>>>> to every data point. That's correct, but it's giving me an itch:
>>>>> > >>>>>>>>> Should we really create different ways of handling metadata,
>>>>> > >>>>>>>>> depending on its expected size?
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Compare this with labels. There is no upper limit to their number
>>>>> > >>>>>>>>> or size. Still, we have no plan of treating "large" labels
>>>>> > >>>>>>>>> differently from "short" labels.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> On top of that, we have by now gained the insight that metadata is
>>>>> > >>>>>>>>> changing over time and essentially has to be tracked per series.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Or in other words: From a pure storage perspective, metadata
>>>>> > >>>>>>>>> behaves exactly the same as labels! (There are certainly huge
>>>>> > >>>>>>>>> differences semantically, but those only manifest themselves on the
>>>>> > >>>>>>>>> query level, i.e. how you treat it in PromQL etc.)
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> (This is not exactly a new insight. This is more or less what I
>>>>> > >>>>>>>>> said during the 2016 dev summit, when we first discussed remote
>>>>> > >>>>>>>>> write.
>>>>> > >>>>>>>>> But I don't want to dwell on "told you so" moments... :o)
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> There is a good reason why we don't just add metadata as "pseudo
>>>>> > >>>>>>>>> labels": As discussed a lot in the various design docs, including
>>>>> > >>>>>>>>> Rob's, it would blow up the data size significantly because HELP
>>>>> > >>>>>>>>> strings tend to be relatively long.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> And that's the point where I would like to take a step back: We
>>>>> > >>>>>>>>> are discussing essentially treating something that is structurally
>>>>> > >>>>>>>>> the same thing in three different ways: Way 1 for labels as we know
>>>>> > >>>>>>>>> them. Way 2 for "small" metadata. Way 3 for "big" metadata.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> However, while labels tend to be shorter than HELP strings, there
>>>>> > >>>>>>>>> is the occasional use case with long or many labels. (Infamously,
>>>>> > >>>>>>>>> at SoundCloud, a binary accidentally put a whole HTML page into a
>>>>> > >>>>>>>>> label. That wasn't a use case, it was a bug, but the Prometheus
>>>>> > >>>>>>>>> server ingesting it was just chugging along as if nothing special
>>>>> > >>>>>>>>> had happened. It looked weird in the expression browser, though...)
>>>>> > >>>>>>>>> I'm sure any vendor offering Prometheus remote storage as a service
>>>>> > >>>>>>>>> will have a customer or two that use excessively long label names.
>>>>> > >>>>>>>>> If we have to deal with that, why not bite the bullet and treat
>>>>> > >>>>>>>>> metadata in the same way as labels in general?
>>>>> > >>>>>>>>> Or to phrase it another way: Any solution for "big" metadata
>>>>> > >>>>>>>>> could be used for labels, too, to alleviate the pain with
>>>>> > >>>>>>>>> excessively long label names.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Or most succinctly: A robust and really good solution for "big"
>>>>> > >>>>>>>>> metadata in remote write will make remote write much more efficient
>>>>> > >>>>>>>>> if applied to labels, too.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Imagine an NALSD tech interview question that boils down to
>>>>> > >>>>>>>>> "design Prometheus remote write". I bet that most of the better
>>>>> > >>>>>>>>> candidates will recognize that most of the payload will consist of
>>>>> > >>>>>>>>> series identifiers (call them labels or whatever) and they will
>>>>> > >>>>>>>>> suggest to first transmit some kind of index and from then on only
>>>>> > >>>>>>>>> transmit short series IDs. The best candidates will then find out
>>>>> > >>>>>>>>> about all the problems with that: How to keep the protocol
>>>>> > >>>>>>>>> stateless, how to re-sync the index, how to update it if new series
>>>>> > >>>>>>>>> arrive, etc. Those are certainly all good reasons why remote write
>>>>> > >>>>>>>>> as we know it does not transfer an index of series IDs.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> However, my point here is that we are now discussing exactly
>>>>> > >>>>>>>>> those problems when we talk about metadata transmission. Let's
>>>>> > >>>>>>>>> solve those problems and apply them to remote write in general!
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Some thoughts about that:
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Current remote write essentially transfers all labels for _every_
>>>>> > >>>>>>>>> sample. This works reasonably well.
>>>>> > >>>>>>>>> Even if metadata blows up the data size by 5x or 10x,
>>>>> > >>>>>>>>> transferring the whole index of metadata and labels should remain
>>>>> > >>>>>>>>> feasible as long as we do it less frequently than once every 10
>>>>> > >>>>>>>>> samples. It's something that could be done each time a remote-write
>>>>> > >>>>>>>>> receiver connects. From then on, we "only" have to track when new
>>>>> > >>>>>>>>> series (or series with new metadata) show up and transfer those. (I
>>>>> > >>>>>>>>> know it's not trivial, but we are already discussing possible
>>>>> > >>>>>>>>> solutions in the various design docs.) Whenever a remote-write
>>>>> > >>>>>>>>> receiver gets out of sync for some reason, it can simply cut the
>>>>> > >>>>>>>>> connection and start with a complete re-sync again. As long as that
>>>>> > >>>>>>>>> doesn't happen more often than once every 10 samples, we still have
>>>>> > >>>>>>>>> a net gain. Combining this with sharding is another challenge, but
>>>>> > >>>>>>>>> it doesn't appear unsolvable.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> --
>>>>> > >>>>>>>>> Björn Rabenstein
>>>>> > >>>>>>>>> [PGP-ID] 0x851C3DA17D748D03
>>>>> > >>>>>>>>> [email] [email protected]
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> --
>>>>> > >>>>>>>> You received this message because you are subscribed to the Google
>>>>> > >>>>>>>> Groups "Prometheus Developers" group.
>>>>> > >>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>> > >>>>>>>> send an email to [email protected].
>>>>> > >>>>>>>> To view this discussion on the web visit
>>>>> > >>>>>>>> https://groups.google.com/d/msgid/prometheus-developers/CABakzZaQGfVK5OAfKRP2nxBnp168GML5r_ok_f%3DyVeUdC6e2EQ%40mail.gmail.com.
>>>>> > >>>>>>>
>>>>> > >>>>>>> --
>>>>> > >>>>>>> Brian Brazil
>>>>> > >>>>>>> www.robustperception.io
>>>>> > >>>>>
>>>>> > >>>>> --
>>>>> > >>>>> Brian Brazil
>>>>> > >>>>> www.robustperception.io
>>>>> >
>>>>> > --
>>>>> > Brian Brazil
>>>>> > www.robustperception.io
>>>>>
>>>>> --
>>>>> Julien Pivotto
>>>>> @roidelapluie
>>>>
>>>> --
>>>> Brian Brazil
>>>> www.robustperception.io

>
> --
> Brian Brazil
> www.robustperception.io

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/CABakzZYuBL-LXQ1swOnTTq7Sfuvmo1mosyX1%3DWV1fc3PxdV36w%40mail.gmail.com.

