On Wed, 19 Aug 2020 at 09:20, Rob Skillington <[email protected]> wrote:
> Here's the results from testing:
> - node_exporter exporting 309 metrics each by turning on a lot of optional collectors; all have help set, very few have unit set
> - running 8 on the host at a 1s scrape interval, each with a unique instance label
> - steady state ~137kb/sec without this change
> - steady state ~172kb/sec with this change
> - roughly a 30% increase
>
> Graph here:
> https://github.com/prometheus/prometheus/pull/7771#issuecomment-675923976
>
> How do we want to proceed? This could be fairly close to the higher end of the spectrum in terms of expected increase, given the node_exporter metrics density and fairly verbose metadata.
>
> Even having said that, however, 30% is a fairly big increase and a relatively large egress cost to have to swallow without any way to back out of this behavior.
>
> What do folks think of next steps?

It is on the high end; however, this is going to be among the worst cases, as there's not going to be a lot of per-metric cardinality from the node exporter. I bet if you greatly increased the number of targets (and reduced the scrape interval to compensate) it'd be more reasonable. I think this is just about okay.
Brian

> On Tue, Aug 11, 2020 at 11:55 AM Rob Skillington <[email protected]> wrote:
>
>> Agreed - I'll see what I can do in getting some numbers for a workload collecting cAdvisor metrics; it seems to have a significant amount of HELP set:
>>
>> https://github.com/google/cadvisor/blob/8450c56c21bc5406e2df79a2162806b9a23ebd34/metrics/testdata/prometheus_metrics
>>
>> On Tue, Aug 11, 2020 at 6:15 AM Brian Brazil <[email protected]> wrote:
>>
>>> On Tue, 11 Aug 2020 at 11:07, Julien Pivotto <[email protected]> wrote:
>>>
>>>> On 11 Aug 11:05, Brian Brazil wrote:
>>>> > On Tue, 11 Aug 2020 at 04:09, Callum Styan <[email protected]> wrote:
>>>> >
>>>> > > I'm hesitant to add anything that significantly increases the network bandwidth usage of remote write while at the same time not giving users a way to tune the usage to their needs.
>>>> > >
>>>> > > I agree with Brian that we don't want the protocol itself to become stateful by introducing something like negotiation. I'd also prefer not to introduce multiple ways to do things, though I'm hoping we can find a way to accommodate your use case while not ballooning the average user's network egress bill.
>>>> > >
>>>> > > I am fine with forcing the consuming end to be somewhat stateful, as in the case of Josh's PR where all metadata is sent periodically and must be stored by the remote storage system.
>>>> > >
>>>> > > Overall I'd like to see some numbers regarding current network bandwidth of remote write, remote write with metadata via Josh's PR, and remote write with sending metadata for every series in a remote write payload.
>>>> >
>>>> > I agree, I noticed that in Rob's PR and had the same thought.
>>>>
>>>> Remote bandwidth is likely to affect only people using remote write.
>>>> Getting a view on the on-disk size of the WAL would be great too, as that will affect everyone.
>>>
>>> I'm not worried about that; it's only really on series creation, so it won't be noticed unless you have really high levels of churn.
>>>
>>> Brian
>>>
>>>> > Brian
>>>> >
>>>> > > Rob, I'll review your PR tomorrow, but it looks like Julien and Brian may already have that covered.
>>>> > >
>>>> > > On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington <[email protected]> wrote:
>>>> > >
>>>> > >> Update: The PR now sends the fields over remote write from the WAL, and metadata is also updated in the WAL when any field changes.
>>>> > >>
>>>> > >> Now opened the PR against the primary repo:
>>>> > >> https://github.com/prometheus/prometheus/pull/7771
>>>> > >>
>>>> > >> I have tested this end-to-end with a modified M3 branch:
>>>> > >> https://github.com/m3db/m3/compare/r/test-prometheus-metadata
>>>> > >> > {... "msg":"received series","labels":"{__name__="prometheus_rule_group_...
>>>> > >> > iterations_total",instance="localhost:9090",job="prometheus01",role=...
>>>> > >> > "remote"}","type":"counter","unit":"","help":"The total number of scheduled...
>>>> > >> > rule group evaluations, whether executed or missed."}
>>>> > >>
>>>> > >> Tests still haven't been updated. Any feedback on the approach / data structures would be greatly appreciated.
>>>> > >>
>>>> > >> Would be good to know what others' thoughts are on next steps.
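For readers following the WAL discussion above, a per-series metadata record of the kind being proposed can be sketched roughly as below. This is an illustration only: the layout, field order, and length prefixes are invented here and are not the actual record format from the PR.

```python
import struct

# Hypothetical WAL record layout (illustration only, not the PR's format):
# [series_ref: 8 bytes big-endian], then length-prefixed type, unit, help.

def encode_metadata_record(series_ref, metric_type, unit, help_text):
    """Pack one per-series metadata entry into bytes."""
    out = struct.pack(">Q", series_ref)
    for field in (metric_type, unit, help_text):
        raw = field.encode("utf-8")
        out += struct.pack(">I", len(raw)) + raw
    return out

def decode_metadata_record(buf):
    """Unpack a record produced by encode_metadata_record."""
    (series_ref,) = struct.unpack_from(">Q", buf, 0)
    offset, fields = 8, []
    for _ in range(3):
        (n,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        fields.append(buf[offset:offset + n].decode("utf-8"))
        offset += n
    return (series_ref, *fields)

record = encode_metadata_record(
    42, "counter", "",
    "The total number of scheduled rule group evaluations, whether executed or missed.")
assert decode_metadata_record(record) == (
    42, "counter", "",
    "The total number of scheduled rule group evaluations, whether executed or missed.")
```

The key property the thread relies on is that this record, like the series/labels record, is written once per series rather than once per sample, which is why the on-disk cost is incurred mainly at series creation.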
>>>> > >> On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington <[email protected]> wrote:
>>>> > >>
>>>> > >>> Here's a draft PR that propagates metadata to the WAL so that the WAL reader can read it back:
>>>> > >>> https://github.com/robskillington/prometheus/pull/1/files
>>>> > >>>
>>>> > >>> Would like a little bit of feedback on the datatypes and structure before going further, if folks are open to that.
>>>> > >>>
>>>> > >>> There's a few things not happening yet:
>>>> > >>> - Remote write queue manager does not use or send these extra fields yet.
>>>> > >>> - Head does not reset the "metadata" slice (not sure where the "series" slice is reset in the head for pending series writes to the WAL; want to do it in the same place).
>>>> > >>> - Metadata is not re-written on change yet.
>>>> > >>> - Tests.
>>>> > >>>
>>>> > >>> On Sat, Aug 8, 2020 at 9:37 AM Rob Skillington <[email protected]> wrote:
>>>> > >>>
>>>> > >>>> Sounds good, I've updated the proposal with details on the places in which changes are required given the new approach:
>>>> > >>>>
>>>> > >>>> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#
>>>> > >>>>
>>>> > >>>> On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <[email protected]> wrote:
>>>> > >>>>
>>>> > >>>>> On Fri, 7 Aug 2020 at 15:48, Rob Skillington <[email protected]> wrote:
>>>> > >>>>>
>>>> > >>>>>> True - I mean this could also be a blacklist by config perhaps, so if you really don't want to have increased egress you can optionally turn off sending the TYPE, HELP, UNIT or send them at different frequencies via config.
>>>> > >>>>>> We could package some sensible defaults so folks don't need to update their config.
>>>> > >>>>>>
>>>> > >>>>>> The main intention is to enable these added features and make it possible for various consumers to adjust some of these parameters if required, since backends can be so different in their implementation. For M3 I would be totally fine with the extra egress, which should be mitigated fairly considerably by Snappy and the fact that HELP is common across certain metric families even when received with every single Remote Write request.
>>>> > >>>>>
>>>> > >>>>> That's really a micro-optimisation. If you are that worried about bandwidth you'd run a sidecar specific to your remote backend that was stateful and far more efficient overall. Sending the full label names and values on every request is going to be far more than the overhead of metadata on top of that, so I don't see a need as it stands for any of this to be configurable.
>>>> > >>>>>
>>>> > >>>>> Brian
>>>> > >>>>>
>>>> > >>>>>> On Fri, Aug 7, 2020 at 3:56 AM Brian Brazil <[email protected]> wrote:
>>>> > >>>>>>
>>>> > >>>>>>> On Thu, 6 Aug 2020 at 22:58, Rob Skillington <[email protected]> wrote:
>>>> > >>>>>>>
>>>> > >>>>>>>> Hey Björn,
>>>> > >>>>>>>>
>>>> > >>>>>>>> Thanks for the detailed response. I've had a few back and forths on this with Brian and Chris over IRC and CNCF Slack now too.
>>>> > >>>>>>>>
>>>> > >>>>>>>> I agree that fundamentally it seems naive to idealistically model this around per metric name.
>>>> > >>>>>>>> It needs to be per series given what may happen w.r.t. collisions across targets, etc.
>>>> > >>>>>>>>
>>>> > >>>>>>>> Perhaps we can separate this discussion into two considerations:
>>>> > >>>>>>>>
>>>> > >>>>>>>> 1) Modeling of the data such that it is kept around for transmission (primarily we're focused on the WAL here).
>>>> > >>>>>>>>
>>>> > >>>>>>>> 2) Transmission (which, as you allude to, has many areas for improvement).
>>>> > >>>>>>>>
>>>> > >>>>>>>> For (1) - it seems like this needs to be done per time series; thankfully, we have actually already modeled this so per-series data is stored just once in a single WAL file. I will write up my proposal here, but it will amount to essentially encoding the HELP, UNIT and TYPE to the WAL per series, similar to how labels for a series are encoded once per series in the WAL. Since this optimization is in place, there's already a huge dampening effect on how expensive it is to write out data about a series (e.g. labels). We can always go and collect a sample WAL file and measure how much extra size this would add with/without HELP, UNIT and TYPE, but it seems like it won't fundamentally change the order of magnitude of "information about a timeseries" storage size vs "datapoints about a timeseries" storage size.
>>>> > >>>>>>>> One extra change would be re-encoding the series into the WAL if the HELP changed for that series, just so that when HELP does change it is up to date from the view of whoever is reading the WAL (i.e. the Remote Write loop). Since this entry needs to be loaded into memory for Remote Write today anyway, with string interning as suggested by Chris it won't algorithmically change the memory profile of a Prometheus with Remote Write enabled. There will be some overhead that at most would likely be similar to the label data, but we aren't altering data structures (so we won't change the big-O magnitude of memory being used); we're adding fields to existing data structures, and string interning should actually make it much less onerous since there is a large duplicative effect with HELP among time series.
>>>> > >>>>>>>>
>>>> > >>>>>>>> For (2) - now we have basically TYPE, HELP and UNIT all available for transmission if we wanted to send them with every single datapoint. While I think we should definitely examine HPACK-like compression features as you mentioned, Björn, I think we should separate that kind of work into a Milestone 2 where it is considered.
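The string-interning point above is easy to demonstrate: because many series of the same metric family carry an identical HELP string, keeping one canonical copy bounds the extra memory to roughly one string per metric family plus a pointer per series. A minimal sketch (this `Interner` is a toy for illustration, not Prometheus's actual interning code):

```python
class Interner:
    """Toy string interner: one canonical copy per distinct string."""

    def __init__(self):
        self._pool = {}

    def intern(self, s):
        # Returns the first-seen copy of an equal string.
        return self._pool.setdefault(s, s)

interner = Interner()
prefix = "Total number of "
# Simulate 1000 series of the same metric family, each arriving with its
# own freshly allocated copy of the HELP text.
series_help = [interner.intern(prefix + "scrapes.") for _ in range(1000)]

# Every per-series field now references the same canonical object, so the
# 1000 entries cost one string plus 1000 references.
assert all(h is series_help[0] for h in series_help)
assert len(interner._pool) == 1
```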
>>>> > >>>>>>>> For the time being, it's very plausible we could do some negotiation with the receiving Remote Write endpoint by sending a "GET" to the remote write endpoint and seeing if it responds with a "capabilities + preferences" response - whether the endpoint specifies that it would like to receive metadata all the time on every single request (letting Snappy take care of keeping the size from ballooning too much), or that it would like TYPE on every single datapoint, and HELP and UNIT every DESIRED_SECONDS or so. To enable a "send HELP every 10 minutes" feature we would have to add a "last sent" timestamp to the datastructure that holds the LABELS, TYPE, HELP and UNIT for each series, to know when to resend to that backend, but that seems entirely plausible and would not use more than 4 extra bytes.
>>>> > >>>>>>>
>>>> > >>>>>>> Negotiation is fundamentally stateful, as the process that receives the first request may be a very different one from the one that receives the second - such as if an upgrade is in progress. Remote write is intended to be a very simple thing that's easy to implement on the receiver end and is a send-only request-based protocol, so request-time negotiation is basically out. Any negotiation needs to happen via the config file, and even then it'd be better if nothing ever needed to be configured.
>>>> > >>>>>>> Getting all the users of a remote write to change their config file or restart all their Prometheus servers is not an easy task after all.
>>>> > >>>>>>>
>>>> > >>>>>>> Brian
>>>> > >>>>>>>
>>>> > >>>>>>>> These thoughts are based on the discussion I've had and the thoughts on this thread. What's the feedback on this before I go ahead and re-iterate the design to more closely map to what I'm suggesting here?
>>>> > >>>>>>>>
>>>> > >>>>>>>> Best,
>>>> > >>>>>>>> Rob
>>>> > >>>>>>>>
>>>> > >>>>>>>> On Thu, Aug 6, 2020 at 2:01 PM Bjoern Rabenstein <[email protected]> wrote:
>>>> > >>>>>>>>
>>>> > >>>>>>>>> On 03.08.20 03:04, Rob Skillington wrote:
>>>> > >>>>>>>>> > Ok - I have a proposal which could be broken up into two pieces, the first delivering TYPE per datapoint, the second consistently and reliably delivering HELP and UNIT once per unique metric name:
>>>> > >>>>>>>>> >
>>>> > >>>>>>>>> > https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#heading=h.bik9uwphqy3g
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Thanks for the doc. I have commented on it, but while doing so I felt the urge to comment more generally, which would not fit well into the margin of a Google doc. My thoughts are also a bit out of scope of Rob's design doc and more about the general topic of remote write and the equally general topic of metadata (about which we have an ongoing discussion among the Prometheus developers).
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Disclaimer: I don't know the remote-write protocol very well.
>>>> > >>>>>>>>> My hope here is that my somewhat distant perspective is of some value, as it allows me to take a step back. However, I might just miss crucial details that completely invalidate my thoughts. We'll see...
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> I do care a lot about metadata, though. (And ironically, the reason why I declared remote write "somebody else's problem" is that I've always disliked how it fundamentally ignores metadata.)
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Rob's document embraces the fact that metadata can change over time, but it assumes that at any given time there is only one set of metadata per unique metric name. It takes into account that there can be drift, but it considers that an irregularity that will only happen occasionally and iron out over time.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> In practice, however, metadata can be legitimately and deliberately different for different time series of the same name. Instrumentation libraries and even the exposition format inherently require one set of metadata per metric name, but this is all only enforced (and meant to be enforced) _per target_. Once the samples are ingested (or even sent onwards via remote write), they have no notion of what target they came from. Furthermore, samples created by rule evaluation don't have an originating target in the first place.
>>>> > >>>>>>>>> (Which raises the question of metadata for recording rules, which is another can of worms I'd like to open eventually...)
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> (There is also the technical difficulty that the WAL has no notion of bundling or referencing all the series with the same metric name. That was commented about in the doc but is not my focus here.)
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Rob's doc sees TYPE as special because it is so cheap to just add to every data point. That's correct, but it's giving me an itch: should we really create different ways of handling metadata depending on its expected size?
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Compare this with labels. There is no upper limit to their number or size. Still, we have no plan of treating "large" labels differently from "short" labels.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> On top of that, we have by now gained the insight that metadata is changing over time and essentially has to be tracked per series.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Or in other words: from a pure storage perspective, metadata behaves exactly the same as labels! (There are certainly huge differences semantically, but those only manifest themselves on the query level, i.e. how you treat it in PromQL etc.)
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> (This is not exactly a new insight. This is more or less what I said during the 2016 dev summit, when we first discussed remote write. But I don't want to dwell on "told you so" moments... :o)
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> There is a good reason why we don't just add metadata as "pseudo labels": as discussed a lot in the various design docs, including Rob's, it would blow up the data size significantly because HELP strings tend to be relatively long.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> And that's the point where I would like to take a step back: we are discussing essentially treating something that is structurally the same thing in three different ways: way 1 for labels as we know them, way 2 for "small" metadata, way 3 for "big" metadata.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> However, while labels tend to be shorter than HELP strings, there is the occasional use case with long or many labels. (Infamously, at SoundCloud, a binary accidentally put a whole HTML page into a label. That wasn't a use case, it was a bug, but the Prometheus server ingesting it just chugged along as if nothing special had happened. It looked weird in the expression browser, though...) I'm sure any vendor offering Prometheus remote storage as a service will have a customer or two that use excessively long label names. If we have to deal with that, why not bite the bullet and treat metadata in the same way as labels in general? Or to phrase it another way: any solution for "big" metadata could be used for labels, too, to alleviate the pain of excessively long label names.
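The "pseudo labels" framing above can be made concrete with a small sketch: metadata carried in the label set under reserved names, stripped back out at the query layer. This is purely illustrative of the structural argument; Prometheus does not do this, and the `__help__`/`__type__`/`__unit__` names are invented here.

```python
# Reserved pseudo-label names (invented for this illustration).
METADATA_KEYS = {"__help__", "__type__", "__unit__"}

def attach_metadata(labels, metric_type, unit, help_text):
    """Storage side: metadata rides along structurally as labels."""
    merged = dict(labels)
    merged.update({"__type__": metric_type, "__unit__": unit, "__help__": help_text})
    return merged

def split_metadata(merged):
    """Query side: the only place that needs to know the reserved names."""
    labels = {k: v for k, v in merged.items() if k not in METADATA_KEYS}
    metadata = {k: v for k, v in merged.items() if k in METADATA_KEYS}
    return labels, metadata

series = {"__name__": "http_requests_total", "job": "api", "instance": "localhost:9090"}
merged = attach_metadata(series, "counter", "", "Total HTTP requests.")
labels, metadata = split_metadata(merged)
assert labels == series
assert metadata["__type__"] == "counter"
```

Under this framing, the WAL, interning, and remote-write paths would handle metadata with exactly the machinery that already exists for labels, which is the storage-level symmetry the paragraph above argues for.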
>>>> > >>>>>>>>> Or most succinctly: a robust and really good solution for "big" metadata in remote write will make remote write much more efficient if applied to labels, too.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Imagine an NALSD tech interview question that boils down to "design Prometheus remote write". I bet that most of the better candidates will recognize that most of the payload will consist of series identifiers (call them labels or whatever) and will suggest first transmitting some kind of index and from then on only transmitting short series IDs. The best candidates will then find out about all the problems with that: how to keep the protocol stateless, how to re-sync the index, how to update it if new series arrive, etc. Those are certainly all good reasons why remote write as we know it does not transfer an index of series IDs.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> However, my point here is that we are now discussing exactly those problems when we talk about metadata transmission. Let's solve those problems and apply them to remote write in general!
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Some thoughts about that:
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Current remote write essentially transfers all labels for _every_ sample. This works reasonably well. Even if metadata blows up the data size by 5x or 10x, transferring the whole index of metadata and labels should remain feasible as long as we do it less frequently than once every 10 samples.
>>>> > >>>>>>>>> It's something that could be done each time a remote-write receiver connects. From then on, we "only" have to track when new series (or series with new metadata) show up and transfer those. (I know it's not trivial, but we are already discussing possible solutions in the various design docs.) Whenever a remote-write receiver gets out of sync for some reason, it can simply cut the connection and start with a complete re-sync again. As long as that doesn't happen more often than once every 10 samples, we still have a net gain. Combining this with sharding is another challenge, but it doesn't appear unsolvable.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> --
>>>> > >>>>>>>>> Björn Rabenstein
>>>> > >>>>>>>>> [PGP-ID] 0x851C3DA17D748D03
>>>> > >>>>>>>>> [email] [email protected]
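The "last sent" timestamp idea from earlier in the thread, combined with the re-sync-on-reconnect idea here, can be sketched in a few lines. The scheduler below is a toy, not the PR's implementation; the interval and reset semantics are assumptions for illustration.

```python
import time

RESEND_INTERVAL_SECONDS = 600  # e.g. the "send HELP every 10 minutes" idea

class MetadataScheduler:
    """Toy per-series scheduler: metadata is included only on first
    transmission or after the interval elapses; a receiver reconnect
    clears the state, forcing a complete re-sync."""

    def __init__(self, interval=RESEND_INTERVAL_SECONDS):
        self.interval = interval
        self.last_sent = {}  # series ref -> unix timestamp (the "4 extra bytes")

    def should_send_metadata(self, series_ref, now=None):
        now = time.time() if now is None else now
        last = self.last_sent.get(series_ref)
        if last is None or now - last >= self.interval:
            self.last_sent[series_ref] = now
            return True
        return False

    def reset(self):
        """Complete re-sync, e.g. after the receiver reconnects."""
        self.last_sent.clear()

sched = MetadataScheduler()
assert sched.should_send_metadata(1, now=0)       # first send carries metadata
assert not sched.should_send_metadata(1, now=30)  # within interval: samples only
assert sched.should_send_metadata(1, now=700)     # interval elapsed: resend
```

Note this keeps the wire protocol itself stateless in Brian's sense: all the state lives on the sender, and a confused receiver recovers by dropping the connection and taking the full index again.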
>>>> > >>>>>>>
>>>> > >>>>>>> --
>>>> > >>>>>>> Brian Brazil
>>>> > >>>>>>> www.robustperception.io
>>>> > >>>>>
>>>> > >>>>> --
>>>> > >>>>> Brian Brazil
>>>> > >>>>> www.robustperception.io
>>>> >
>>>> > --
>>>> > Brian Brazil
>>>> > www.robustperception.io
>>>>
>>>> --
>>>> Julien Pivotto
>>>> @roidelapluie
>>>
>>> --
>>> Brian Brazil
>>> www.robustperception.io

--
Brian Brazil
www.robustperception.io

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/CAHJKeLp2EVK2rBkJytUAaSbqC02cBJv_Crjter8eRx76pZUM_Q%40mail.gmail.com.
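The bandwidth effect measured at the top of the thread (~137 vs ~172 kB/sec) can be roughed out offline with a sketch like the one below. It builds a toy label-only payload and a payload with TYPE/HELP/UNIT appended, then compresses both. zlib stands in for Snappy (which is not in the Python standard library), and the metric names, HELP strings, and series counts are fabricated, so only the qualitative effect should be read from it: repeated HELP strings compress well across targets, but metadata still adds compressed bytes.

```python
import zlib

def build_payload(with_metadata):
    """Fabricated exposition-style payload: 300 metric families x 8 targets."""
    parts = []
    for family in range(300):
        name = f"node_example_metric_{family}_total"
        help_text = f"Help text describing example metric {family} in some detail."
        for target in range(8):
            entry = f'{name}{{instance="host:{9100 + target}",job="node"}} 1.0'
            if with_metadata:
                # HELP is identical across the 8 targets of a family, so it
                # largely compresses away, mirroring the Snappy argument above.
                entry += f' TYPE=counter UNIT= HELP="{help_text}"'
            parts.append(entry)
    return "\n".join(parts).encode("utf-8")

plain = zlib.compress(build_payload(False))
with_meta = zlib.compress(build_payload(True))

# Metadata costs something even after compression, but far less than its
# uncompressed size would suggest.
assert len(with_meta) > len(plain)
print(f"compressed without metadata: {len(plain)} bytes")
print(f"compressed with metadata:    {len(with_meta)} bytes")
```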

