On Wed, 19 Aug 2020 at 09:20, Rob Skillington <[email protected]> wrote:
> Here's the results from testing:
> - node_exporter exporting 309 metrics each by turning on a lot of optional collectors; all have help set, very few have unit set
> - running 8 on the host at a 1s scrape interval, each with a unique instance label
> - steady state ~137kb/sec without this change
> - steady state ~172kb/sec with this change
> - roughly a 30% increase
>
> Graph here:
> https://github.com/prometheus/prometheus/pull/7771#issuecomment-675923976
>
> How do we want to proceed? This could be fairly close to the higher end of the spectrum in terms of expected increase, given the node_exporter metrics density and fairly verbose metadata.
>
> Even having said that, however, 30% is a fairly big increase and a relatively large egress cost to have to swallow without any way to back out of this behavior.
>
> What do folks think of next steps?

It is on the high end; however, this is going to be among the worst cases, as there's not going to be a lot of per-metric cardinality from the node exporter. I bet if you greatly increased the number of targets (and reduced the scrape interval to compensate) it'd be more reasonable. I think this is just about okay.
Brian

> On Tue, Aug 11, 2020 at 11:55 AM Rob Skillington <[email protected]> wrote:
>
>> Agreed - I'll see what I can do in getting some numbers for a workload collecting cAdvisor metrics; it seems to have a significant amount of HELP set:
>>
>> https://github.com/google/cadvisor/blob/8450c56c21bc5406e2df79a2162806b9a23ebd34/metrics/testdata/prometheus_metrics
>>
>> On Tue, Aug 11, 2020 at 6:15 AM Brian Brazil <[email protected]> wrote:
>>
>>> On Tue, 11 Aug 2020 at 11:07, Julien Pivotto <[email protected]> wrote:
>>>
>>>> On 11 Aug 11:05, Brian Brazil wrote:
>>>> > On Tue, 11 Aug 2020 at 04:09, Callum Styan <[email protected]> wrote:
>>>> >
>>>> > > I'm hesitant to add anything that significantly increases the network bandwidth usage of remote write while at the same time not giving users a way to tune the usage to their needs.
>>>> > >
>>>> > > I agree with Brian that we don't want the protocol itself to become stateful by introducing something like negotiation. I'd also prefer not to introduce multiple ways to do things, though I'm hoping we can find a way to accommodate your use case while not ballooning the average user's network egress bill.
>>>> > >
>>>> > > I am fine with forcing the consuming end to be somewhat stateful, as in the case of Josh's PR where all metadata is sent periodically and must be stored by the remote storage system.
>>>> > >
>>>> > > Overall I'd like to see some numbers regarding current network bandwidth of remote write, remote write with metadata via Josh's PR, and remote write with sending metadata for every series in a remote write payload.
>>>> >
>>>> > I agree, I noticed that in Rob's PR and had the same thought.
>>>>
>>>> Remote bandwidth is likely to affect only people using remote write.
>>>> Getting a view on the on-disk size of the WAL would be great too, as that will affect everyone.
>>>
>>> I'm not worried about that; it's only really on series creation, so it won't be noticed unless you have really high levels of churn.
>>>
>>> Brian
>>>
>>>> > Brian
>>>> >
>>>> > > Rob, I'll review your PR tomorrow, but it looks like Julien and Brian may already have that covered.
>>>> > >
>>>> > > On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington <[email protected]> wrote:
>>>> > >
>>>> > >> Update: The PR now sends the fields over remote write from the WAL, and metadata is also updated in the WAL when any field changes.
>>>> > >>
>>>> > >> Now opened the PR against the primary repo:
>>>> > >> https://github.com/prometheus/prometheus/pull/7771
>>>> > >>
>>>> > >> I have tested this end-to-end with a modified M3 branch:
>>>> > >> https://github.com/m3db/m3/compare/r/test-prometheus-metadata
>>>> > >> > {... "msg":"received series","labels":"{__name__="prometheus_rule_group_...
>>>> > >> > iterations_total",instance="localhost:9090",job="prometheus01",role=...
>>>> > >> > "remote"}","type":"counter","unit":"","help":"The total number of scheduled...
>>>> > >> > rule group evaluations, whether executed or missed."}
>>>> > >>
>>>> > >> Tests still haven't been updated. Any feedback on the approach / data structures would be greatly appreciated.
>>>> > >>
>>>> > >> Would be good to know what others' thoughts are on next steps.
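For readers following the WAL discussion above, a per-series metadata record of the kind being proposed can be sketched roughly as below. This is an illustration only: the layout, field order, and length prefixes are invented here and are not the actual record format from the PR.

```python
import struct

# Hypothetical WAL record layout (illustration only, not the PR's format):
# [series_ref: 8 bytes big-endian], then length-prefixed type, unit, help.

def encode_metadata_record(series_ref, metric_type, unit, help_text):
    """Pack one per-series metadata entry into bytes."""
    out = struct.pack(">Q", series_ref)
    for field in (metric_type, unit, help_text):
        raw = field.encode("utf-8")
        out += struct.pack(">I", len(raw)) + raw
    return out

def decode_metadata_record(buf):
    """Unpack a record produced by encode_metadata_record."""
    (series_ref,) = struct.unpack_from(">Q", buf, 0)
    offset, fields = 8, []
    for _ in range(3):
        (n,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        fields.append(buf[offset:offset + n].decode("utf-8"))
        offset += n
    return (series_ref, *fields)

record = encode_metadata_record(
    42, "counter", "",
    "The total number of scheduled rule group evaluations, whether executed or missed.")
assert decode_metadata_record(record) == (
    42, "counter", "",
    "The total number of scheduled rule group evaluations, whether executed or missed.")
```

The key property the thread relies on is that this record, like the series/labels record, is written once per series rather than once per sample, which is why the on-disk cost is incurred mainly at series creation.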
>>>> > >> On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington <[email protected]> wrote:
>>>> > >>
>>>> > >>> Here's a draft PR that propagates metadata to the WAL so that the WAL reader can read it back:
>>>> > >>> https://github.com/robskillington/prometheus/pull/1/files
>>>> > >>>
>>>> > >>> Would like a little bit of feedback on the datatypes and structure before going further, if folks are open to that.
>>>> > >>>
>>>> > >>> There's a few things not happening yet:
>>>> > >>> - Remote write queue manager does not use or send these extra fields yet.
>>>> > >>> - Head does not reset the "metadata" slice (not sure where the "series" slice is reset in the head for pending series writes to the WAL; want to do it in the same place).
>>>> > >>> - Metadata is not re-written on change yet.
>>>> > >>> - Tests.
>>>> > >>>
>>>> > >>> On Sat, Aug 8, 2020 at 9:37 AM Rob Skillington <[email protected]> wrote:
>>>> > >>>
>>>> > >>>> Sounds good, I've updated the proposal with details on the places in which changes are required given the new approach:
>>>> > >>>>
>>>> > >>>> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#
>>>> > >>>>
>>>> > >>>> On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <[email protected]> wrote:
>>>> > >>>>
>>>> > >>>>> On Fri, 7 Aug 2020 at 15:48, Rob Skillington <[email protected]> wrote:
>>>> > >>>>>
>>>> > >>>>>> True - I mean this could also be a blacklist by config perhaps, so if you really don't want to have increased egress you can optionally turn off sending the TYPE, HELP, UNIT or send them at different frequencies via config.
>>>> > >>>>>> We could package some sensible defaults so folks don't need to update their config.
>>>> > >>>>>>
>>>> > >>>>>> The main intention is to enable these added features and make it possible for various consumers to adjust some of these parameters if required, since backends can be so different in their implementation. For M3 I would be totally fine with the extra egress, which should be mitigated fairly considerably by Snappy and the fact that HELP is common across certain metric families even when received with every single Remote Write request.
>>>> > >>>>>
>>>> > >>>>> That's really a micro-optimisation. If you are that worried about bandwidth you'd run a sidecar specific to your remote backend that was stateful and far more efficient overall. Sending the full label names and values on every request is going to be far more than the overhead of metadata on top of that, so I don't see a need as it stands for any of this to be configurable.
>>>> > >>>>>
>>>> > >>>>> Brian
>>>> > >>>>>
>>>> > >>>>>> On Fri, Aug 7, 2020 at 3:56 AM Brian Brazil <[email protected]> wrote:
>>>> > >>>>>>
>>>> > >>>>>>> On Thu, 6 Aug 2020 at 22:58, Rob Skillington <[email protected]> wrote:
>>>> > >>>>>>>
>>>> > >>>>>>>> Hey Björn,
>>>> > >>>>>>>>
>>>> > >>>>>>>> Thanks for the detailed response. I've had a few back and forths on this with Brian and Chris over IRC and CNCF Slack now too.
>>>> > >>>>>>>>
>>>> > >>>>>>>> I agree that fundamentally it seems naive to idealistically model this around per metric name.
>>>> > >>>>>>>> It needs to be per series given what may happen w.r.t. collisions across targets, etc.
>>>> > >>>>>>>>
>>>> > >>>>>>>> Perhaps we can separate this discussion into two considerations:
>>>> > >>>>>>>>
>>>> > >>>>>>>> 1) Modeling of the data such that it is kept around for transmission (primarily we're focused on the WAL here).
>>>> > >>>>>>>>
>>>> > >>>>>>>> 2) Transmission (which, as you allude to, has many areas for improvement).
>>>> > >>>>>>>>
>>>> > >>>>>>>> For (1) - it seems like this needs to be done per time series; thankfully, we have actually already modeled this so per-series data is stored just once in a single WAL file. I will write up my proposal here, but it will amount to essentially encoding the HELP, UNIT and TYPE to the WAL per series, similar to how labels for a series are encoded once per series in the WAL. Since this optimization is in place, there's already a huge dampening effect on how expensive it is to write out data about a series (e.g. labels). We can always go and collect a sample WAL file and measure how much extra size this would add with/without HELP, UNIT and TYPE, but it seems like it won't fundamentally change the order of magnitude of "information about a timeseries" storage size vs "datapoints about a timeseries" storage size.
>>>> > >>>>>>>> One extra change would be re-encoding the series into the WAL if the HELP changed for that series, just so that when HELP does change it is up to date from the view of whoever is reading the WAL (i.e. the Remote Write loop). Since this entry needs to be loaded into memory for Remote Write today anyway, with string interning as suggested by Chris it won't algorithmically change the memory profile of a Prometheus with Remote Write enabled. There will be some overhead that at most would likely be similar to the label data, but we aren't altering data structures (so we won't change the big-O magnitude of memory being used); we're adding fields to existing data structures, and string interning should actually make it much less onerous since there is a large duplicative effect with HELP among time series.
>>>> > >>>>>>>>
>>>> > >>>>>>>> For (2) - now we have basically TYPE, HELP and UNIT all available for transmission if we wanted to send them with every single datapoint. While I think we should definitely examine HPACK-like compression features as you mentioned, Björn, I think we should separate that kind of work into a Milestone 2 where it is considered.
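The string-interning point above is easy to demonstrate: because many series of the same metric family carry an identical HELP string, keeping one canonical copy bounds the extra memory to roughly one string per metric family plus a pointer per series. A minimal sketch (this `Interner` is a toy for illustration, not Prometheus's actual interning code):

```python
class Interner:
    """Toy string interner: one canonical copy per distinct string."""

    def __init__(self):
        self._pool = {}

    def intern(self, s):
        # Returns the first-seen copy of an equal string.
        return self._pool.setdefault(s, s)

interner = Interner()
prefix = "Total number of "
# Simulate 1000 series of the same metric family, each arriving with its
# own freshly allocated copy of the HELP text.
series_help = [interner.intern(prefix + "scrapes.") for _ in range(1000)]

# Every per-series field now references the same canonical object, so the
# 1000 entries cost one string plus 1000 references.
assert all(h is series_help[0] for h in series_help)
assert len(interner._pool) == 1
```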
>>>> > >>>>>>>> For the time being, it's very plausible we could do some negotiation with the receiving Remote Write endpoint by sending a "GET" to the remote write endpoint and seeing if it responds with a "capabilities + preferences" response - whether the endpoint specifies that it would like to receive metadata all the time on every single request (letting Snappy take care of keeping the size from ballooning too much), or that it would like TYPE on every single datapoint, and HELP and UNIT every DESIRED_SECONDS or so. To enable a "send HELP every 10 minutes" feature we would have to add a "last sent" timestamp to the datastructure that holds the LABELS, TYPE, HELP and UNIT for each series, to know when to resend to that backend, but that seems entirely plausible and would not use more than 4 extra bytes.
>>>> > >>>>>>>
>>>> > >>>>>>> Negotiation is fundamentally stateful, as the process that receives the first request may be a very different one from the one that receives the second - such as if an upgrade is in progress. Remote write is intended to be a very simple thing that's easy to implement on the receiver end and is a send-only request-based protocol, so request-time negotiation is basically out. Any negotiation needs to happen via the config file, and even then it'd be better if nothing ever needed to be configured.
>>>> > >>>>>>> Getting all the users of a remote write to change their config file or restart all their Prometheus servers is not an easy task after all.
>>>> > >>>>>>>
>>>> > >>>>>>> Brian
>>>> > >>>>>>>
>>>> > >>>>>>>> These thoughts are based on the discussion I've had and the thoughts on this thread. What's the feedback on this before I go ahead and re-iterate the design to more closely map to what I'm suggesting here?
>>>> > >>>>>>>>
>>>> > >>>>>>>> Best,
>>>> > >>>>>>>> Rob
>>>> > >>>>>>>>
>>>> > >>>>>>>> On Thu, Aug 6, 2020 at 2:01 PM Bjoern Rabenstein <[email protected]> wrote:
>>>> > >>>>>>>>
>>>> > >>>>>>>>> On 03.08.20 03:04, Rob Skillington wrote:
>>>> > >>>>>>>>> > Ok - I have a proposal which could be broken up into two pieces, the first delivering TYPE per datapoint, the second consistently and reliably delivering HELP and UNIT once per unique metric name:
>>>> > >>>>>>>>> >
>>>> > >>>>>>>>> > https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#heading=h.bik9uwphqy3g
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Thanks for the doc. I have commented on it, but while doing so I felt the urge to comment more generally, which would not fit well into the margin of a Google doc. My thoughts are also a bit out of scope of Rob's design doc and more about the general topic of remote write and the equally general topic of metadata (about which we have an ongoing discussion among the Prometheus developers).
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Disclaimer: I don't know the remote-write protocol very well.
>>>> > >>>>>>>>> My hope here is that my somewhat distant perspective is of some value, as it allows me to take a step back. However, I might just miss crucial details that completely invalidate my thoughts. We'll see...
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> I do care a lot about metadata, though. (And ironically, the reason why I declared remote write "somebody else's problem" is that I've always disliked how it fundamentally ignores metadata.)
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Rob's document embraces the fact that metadata can change over time, but it assumes that at any given time there is only one set of metadata per unique metric name. It takes into account that there can be drift, but it considers that an irregularity that will only happen occasionally and iron out over time.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> In practice, however, metadata can be legitimately and deliberately different for different time series of the same name. Instrumentation libraries and even the exposition format inherently require one set of metadata per metric name, but this is all only enforced (and meant to be enforced) _per target_. Once the samples are ingested (or even sent onwards via remote write), they have no notion of what target they came from. Furthermore, samples created by rule evaluation don't have an originating target in the first place.
>>>> > >>>>>>>>> (Which raises the question of metadata for recording rules, which is another can of worms I'd like to open eventually...)
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> (There is also the technical difficulty that the WAL has no notion of bundling or referencing all the series with the same metric name. That was commented about in the doc but is not my focus here.)
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Rob's doc sees TYPE as special because it is so cheap to just add to every data point. That's correct, but it's giving me an itch: should we really create different ways of handling metadata depending on its expected size?
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Compare this with labels. There is no upper limit to their number or size. Still, we have no plan of treating "large" labels differently from "short" labels.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> On top of that, we have by now gained the insight that metadata is changing over time and essentially has to be tracked per series.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Or in other words: from a pure storage perspective, metadata behaves exactly the same as labels! (There are certainly huge differences semantically, but those only manifest themselves on the query level, i.e. how you treat it in PromQL etc.)
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> (This is not exactly a new insight. This is more or less what I said during the 2016 dev summit, when we first discussed remote write. But I don't want to dwell on "told you so" moments... :o)
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> There is a good reason why we don't just add metadata as "pseudo labels": as discussed a lot in the various design docs, including Rob's, it would blow up the data size significantly because HELP strings tend to be relatively long.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> And that's the point where I would like to take a step back: we are discussing essentially treating something that is structurally the same thing in three different ways: way 1 for labels as we know them, way 2 for "small" metadata, way 3 for "big" metadata.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> However, while labels tend to be shorter than HELP strings, there is the occasional use case with long or many labels. (Infamously, at SoundCloud, a binary accidentally put a whole HTML page into a label. That wasn't a use case, it was a bug, but the Prometheus server ingesting it just chugged along as if nothing special had happened. It looked weird in the expression browser, though...) I'm sure any vendor offering Prometheus remote storage as a service will have a customer or two that use excessively long label names. If we have to deal with that, why not bite the bullet and treat metadata in the same way as labels in general? Or to phrase it another way: any solution for "big" metadata could be used for labels, too, to alleviate the pain of excessively long label names.
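The "pseudo labels" framing above can be made concrete with a small sketch: metadata carried in the label set under reserved names, stripped back out at the query layer. This is purely illustrative of the structural argument; Prometheus does not do this, and the `__help__`/`__type__`/`__unit__` names are invented here.

```python
# Reserved pseudo-label names (invented for this illustration).
METADATA_KEYS = {"__help__", "__type__", "__unit__"}

def attach_metadata(labels, metric_type, unit, help_text):
    """Storage side: metadata rides along structurally as labels."""
    merged = dict(labels)
    merged.update({"__type__": metric_type, "__unit__": unit, "__help__": help_text})
    return merged

def split_metadata(merged):
    """Query side: the only place that needs to know the reserved names."""
    labels = {k: v for k, v in merged.items() if k not in METADATA_KEYS}
    metadata = {k: v for k, v in merged.items() if k in METADATA_KEYS}
    return labels, metadata

series = {"__name__": "http_requests_total", "job": "api", "instance": "localhost:9090"}
merged = attach_metadata(series, "counter", "", "Total HTTP requests.")
labels, metadata = split_metadata(merged)
assert labels == series
assert metadata["__type__"] == "counter"
```

Under this framing, the WAL, interning, and remote-write paths would handle metadata with exactly the machinery that already exists for labels, which is the storage-level symmetry the paragraph above argues for.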
>>>> > >>>>>>>>> Or most succinctly: a robust and really good solution for "big" metadata in remote write will make remote write much more efficient if applied to labels, too.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Imagine an NALSD tech interview question that boils down to "design Prometheus remote write". I bet that most of the better candidates will recognize that most of the payload will consist of series identifiers (call them labels or whatever) and will suggest first transmitting some kind of index and from then on only transmitting short series IDs. The best candidates will then find out about all the problems with that: how to keep the protocol stateless, how to re-sync the index, how to update it if new series arrive, etc. Those are certainly all good reasons why remote write as we know it does not transfer an index of series IDs.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> However, my point here is that we are now discussing exactly those problems when we talk about metadata transmission. Let's solve those problems and apply them to remote write in general!
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Some thoughts about that:
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Current remote write essentially transfers all labels for _every_ sample. This works reasonably well. Even if metadata blows up the data size by 5x or 10x, transferring the whole index of metadata and labels should remain feasible as long as we do it less frequently than once every 10 samples.
>>>> > >>>>>>>>> It's something that could be done each time a remote-write receiver connects. From then on, we "only" have to track when new series (or series with new metadata) show up and transfer those. (I know it's not trivial, but we are already discussing possible solutions in the various design docs.) Whenever a remote-write receiver gets out of sync for some reason, it can simply cut the connection and start with a complete re-sync again. As long as that doesn't happen more often than once every 10 samples, we still have a net gain. Combining this with sharding is another challenge, but it doesn't appear unsolvable.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> --
>>>> > >>>>>>>>> Björn Rabenstein
>>>> > >>>>>>>>> [PGP-ID] 0x851C3DA17D748D03
>>>> > >>>>>>>>> [email] [email protected]
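The "last sent" timestamp idea from earlier in the thread, combined with the re-sync-on-reconnect idea here, can be sketched in a few lines. The scheduler below is a toy, not the PR's implementation; the interval and reset semantics are assumptions for illustration.

```python
import time

RESEND_INTERVAL_SECONDS = 600  # e.g. the "send HELP every 10 minutes" idea

class MetadataScheduler:
    """Toy per-series scheduler: metadata is included only on first
    transmission or after the interval elapses; a receiver reconnect
    clears the state, forcing a complete re-sync."""

    def __init__(self, interval=RESEND_INTERVAL_SECONDS):
        self.interval = interval
        self.last_sent = {}  # series ref -> unix timestamp (the "4 extra bytes")

    def should_send_metadata(self, series_ref, now=None):
        now = time.time() if now is None else now
        last = self.last_sent.get(series_ref)
        if last is None or now - last >= self.interval:
            self.last_sent[series_ref] = now
            return True
        return False

    def reset(self):
        """Complete re-sync, e.g. after the receiver reconnects."""
        self.last_sent.clear()

sched = MetadataScheduler()
assert sched.should_send_metadata(1, now=0)       # first send carries metadata
assert not sched.should_send_metadata(1, now=30)  # within interval: samples only
assert sched.should_send_metadata(1, now=700)     # interval elapsed: resend
```

Note this keeps the wire protocol itself stateless in Brian's sense: all the state lives on the sender, and a confused receiver recovers by dropping the connection and taking the full index again.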
>>>> > >>>>>>>
>>>> > >>>>>>> --
>>>> > >>>>>>> Brian Brazil
>>>> > >>>>>>> www.robustperception.io
>>>> > >>>>>
>>>> > >>>>> --
>>>> > >>>>> Brian Brazil
>>>> > >>>>> www.robustperception.io
>>>> >
>>>> > --
>>>> > Brian Brazil
>>>> > www.robustperception.io
>>>>
>>>> --
>>>> Julien Pivotto
>>>> @roidelapluie
>>>
>>> --
>>> Brian Brazil
>>> www.robustperception.io

--
Brian Brazil
www.robustperception.io

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/CAHJKeLp2EVK2rBkJytUAaSbqC02cBJv_Crjter8eRx76pZUM_Q%40mail.gmail.com.
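The bandwidth effect measured at the top of the thread (~137 vs ~172 kB/sec) can be roughed out offline with a sketch like the one below. It builds a toy label-only payload and a payload with TYPE/HELP/UNIT appended, then compresses both. zlib stands in for Snappy (which is not in the Python standard library), and the metric names, HELP strings, and series counts are fabricated, so only the qualitative effect should be read from it: repeated HELP strings compress well across targets, but metadata still adds compressed bytes.

```python
import zlib

def build_payload(with_metadata):
    """Fabricated exposition-style payload: 300 metric families x 8 targets."""
    parts = []
    for family in range(300):
        name = f"node_example_metric_{family}_total"
        help_text = f"Help text describing example metric {family} in some detail."
        for target in range(8):
            entry = f'{name}{{instance="host:{9100 + target}",job="node"}} 1.0'
            if with_metadata:
                # HELP is identical across the 8 targets of a family, so it
                # largely compresses away, mirroring the Snappy argument above.
                entry += f' TYPE=counter UNIT= HELP="{help_text}"'
            parts.append(entry)
    return "\n".join(parts).encode("utf-8")

plain = zlib.compress(build_payload(False))
with_meta = zlib.compress(build_payload(True))

# Metadata costs something even after compression, but far less than its
# uncompressed size would suggest.
assert len(with_meta) > len(plain)
print(f"compressed without metadata: {len(plain)} bytes")
print(f"compressed with metadata:    {len(with_meta)} bytes")
```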

