To add a bit more detail to that example: I was actually using a fairly tuned remote write queue config that sent large batches, since the batch send deadline was set longer, to 1 minute, with a max samples per send of 5,000. Here's that config:

```
remote_write:
  - url: http://localhost:3030/remote/write
    remote_timeout: 30s
    queue_config:
      capacity: 10000
      max_shards: 10
      min_shards: 3
      max_samples_per_send: 5000
      batch_send_deadline: 1m
      min_backoff: 50ms
      max_backoff: 1s
```
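For reference, here is roughly what each of those queue_config knobs controls (paraphrased from the Prometheus remote_write configuration docs; exact semantics can vary between versions, so double-check against the one you run):

```
queue_config:
  capacity: 10000            # samples buffered per shard before reads from the WAL block
  max_shards: 10             # upper bound on concurrent sender shards
  min_shards: 3              # lower bound on shards, also the number used at startup
  max_samples_per_send: 5000 # max samples batched into a single remote write request
  batch_send_deadline: 1m    # max time a partial batch waits before being sent anyway
  min_backoff: 50ms          # initial retry backoff on failed sends
  max_backoff: 1s            # ceiling for the doubling retry backoff
```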
Using the default config we get worse utilization for both before/after numbers, but the delta is smaller:
- steady state ~177kb/sec without this change
- steady state ~210kb/sec with this change
- roughly 20% increase

Using config:

```
remote_write:
  - url: http://localhost:3030/remote/write
    remote_timeout: 30s
```

Implicitly, the values for this config are:
- min shards 1
- max shards 1000
- max samples per send 100
- capacity 500
- batch send deadline 5s
- min backoff 30ms
- max backoff 100ms

On Wed, Aug 19, 2020 at 4:26 AM Brian Brazil <[email protected]> wrote:

> On Wed, 19 Aug 2020 at 09:20, Rob Skillington <[email protected]> wrote:
>
>> Here's the results from testing:
>> - node_exporter exporting 309 metrics each by turning on a lot of optional
>> collectors, all have help set, very few have unit set
>> - running 8 on the host at 1s scrape interval, each with unique instance
>> label
>> - steady state ~137kb/sec without this change
>> - steady state ~172kb/sec with this change
>> - roughly 30% increase
>>
>> Graph here:
>> https://github.com/prometheus/prometheus/pull/7771#issuecomment-675923976
>>
>> How do we want to proceed? This could be fairly close to the higher end of
>> the spectrum in terms of expected increase given the node_exporter metrics
>> density and fairly verbose metadata.
>>
>> Even having said that, however, 30% is a fairly big increase and a
>> relatively large egress cost to have to swallow without any way to back
>> out of this behavior.
>>
>> What do folks think of next steps?
>
> It is on the high end, however this is going to be among the worst cases
> as there's not going to be a lot of per-metric cardinality from the node
> exporter. I bet if you greatly increased the number of targets (and reduced
> the scrape interval to compensate) it'd be more reasonable. I think this is
> just about okay.
>
> Brian
>
>> On Tue, Aug 11, 2020 at 11:55 AM Rob Skillington <[email protected]> wrote:
>>
>>> Agreed - I'll see what I can do about getting some numbers for a workload
>>> collecting cAdvisor metrics; it seems to have a significant amount of
>>> HELP set:
>>> https://github.com/google/cadvisor/blob/8450c56c21bc5406e2df79a2162806b9a23ebd34/metrics/testdata/prometheus_metrics
>>>
>>> On Tue, Aug 11, 2020 at 6:15 AM Brian Brazil <[email protected]> wrote:
>>>
>>>> On Tue, 11 Aug 2020 at 11:07, Julien Pivotto <[email protected]> wrote:
>>>>
>>>>> On 11 Aug 11:05, Brian Brazil wrote:
>>>>> > On Tue, 11 Aug 2020 at 04:09, Callum Styan <[email protected]> wrote:
>>>>> >
>>>>> > > I'm hesitant to add anything that significantly increases the network
>>>>> > > bandwidth usage of remote write while at the same time not giving
>>>>> > > users a way to tune the usage to their needs.
>>>>> > >
>>>>> > > I agree with Brian that we don't want the protocol itself to become
>>>>> > > stateful by introducing something like negotiation. I'd also prefer
>>>>> > > not to introduce multiple ways to do things, though I'm hoping we can
>>>>> > > find a way to accommodate your use case while not ballooning the
>>>>> > > average user's network egress bill.
>>>>> > >
>>>>> > > I am fine with forcing the consuming end to be somewhat stateful,
>>>>> > > like in the case of Josh's PR where all metadata is sent periodically
>>>>> > > and must be stored by the remote storage system.
>>>>> > >
>>>>> > > Overall I'd like to see some numbers regarding current network
>>>>> > > bandwidth of remote write, remote write with metadata via Josh's PR,
>>>>> > > and remote write with sending metadata for every series in a remote
>>>>> > > write payload.
>>>>> >
>>>>> > I agree, I noticed that in Rob's PR and had the same thought.
>>>>>
>>>>> Remote bandwidth is likely to affect only people using remote write.
>>>>>
>>>>> Getting a view on the on-disk size of the WAL would be great too, as
>>>>> that will affect everyone.
>>>>
>>>> I'm not worried about that, it's only really on series creation so it
>>>> won't be noticed unless you have really high levels of churn.
>>>>
>>>> Brian
>>>>
>>>>> > Brian
>>>>> >
>>>>> > > Rob, I'll review your PR tomorrow but it looks like Julien and Brian
>>>>> > > may already have that covered.
>>>>> > >
>>>>> > > On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington <[email protected]> wrote:
>>>>> > >
>>>>> > >> Update: The PR now sends the fields over remote write from the WAL,
>>>>> > >> and metadata is also updated in the WAL when any field changes.
>>>>> > >>
>>>>> > >> Now opened the PR against the primary repo:
>>>>> > >> https://github.com/prometheus/prometheus/pull/7771
>>>>> > >>
>>>>> > >> I have tested this end-to-end with a modified M3 branch:
>>>>> > >> https://github.com/m3db/m3/compare/r/test-prometheus-metadata
>>>>> > >> > {... "msg":"received series","labels":"{__name__="prometheus_rule_group_...
>>>>> > >> > iterations_total",instance="localhost:9090",job="prometheus01",role=...
>>>>> > >> > "remote"}","type":"counter","unit":"","help":"The total number of
>>>>> > >> > scheduled... rule group evaluations, whether executed or missed."}
>>>>> > >>
>>>>> > >> Tests still haven't been updated. Any feedback on the approach /
>>>>> > >> data structures would be greatly appreciated.
>>>>> > >>
>>>>> > >> Would be good to know what others' thoughts are on next steps.
>>>>> > >>
>>>>> > >> On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington <[email protected]> wrote:
>>>>> > >>
>>>>> > >>> Here's a draft PR that propagates metadata to the WAL, and the WAL
>>>>> > >>> reader can read it back:
>>>>> > >>> https://github.com/robskillington/prometheus/pull/1/files
>>>>> > >>>
>>>>> > >>> Would like a little bit of feedback on the datatypes and structure
>>>>> > >>> before going further, if folks are open to that.
>>>>> > >>>
>>>>> > >>> There's a few things not happening:
>>>>> > >>> - Remote write queue manager does not use or send these extra fields yet.
>>>>> > >>> - Head does not reset the "metadata" slice (not sure where the "series"
>>>>> > >>> slice is reset in the head for pending series writes to WAL; want to do
>>>>> > >>> it in the same place).
>>>>> > >>> - Metadata is not re-written on change yet.
>>>>> > >>> - Tests.
>>>>> > >>>
>>>>> > >>> On Sat, Aug 8, 2020 at 9:37 AM Rob Skillington <[email protected]> wrote:
>>>>> > >>>
>>>>> > >>>> Sounds good, I've updated the proposal with details on the places in
>>>>> > >>>> which changes are required given the new approach:
>>>>> > >>>> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#
>>>>> > >>>>
>>>>> > >>>> On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <[email protected]> wrote:
>>>>> > >>>>
>>>>> > >>>>> On Fri, 7 Aug 2020 at 15:48, Rob Skillington <[email protected]> wrote:
>>>>> > >>>>>
>>>>> > >>>>>> True - I mean this could also be a blacklist by config perhaps, so
>>>>> > >>>>>> if you really don't want to have increased egress you can optionally
>>>>> > >>>>>> turn off sending the TYPE, HELP, UNIT or send them at different
>>>>> > >>>>>> frequencies via config.
>>>>> > >>>>>> We could package some sensible defaults so folks don't need to
>>>>> > >>>>>> update their config.
>>>>> > >>>>>>
>>>>> > >>>>>> The main intention is to enable these added features and make it
>>>>> > >>>>>> possible for various consumers to adjust some of these parameters if
>>>>> > >>>>>> required, since backends can be so different in their implementation.
>>>>> > >>>>>> For M3 I would be totally fine with the extra egress, which should be
>>>>> > >>>>>> mitigated fairly considerably by Snappy and the fact that HELP is
>>>>> > >>>>>> common across certain metric families when receiving it on every
>>>>> > >>>>>> single Remote Write request.
>>>>> > >>>>>
>>>>> > >>>>> That's really a micro-optimisation. If you are that worried about
>>>>> > >>>>> bandwidth you'd run a sidecar specific to your remote backend that was
>>>>> > >>>>> stateful and far more efficient overall. Sending the full label names
>>>>> > >>>>> and values on every request is going to be far more than the overhead
>>>>> > >>>>> of metadata on top of that, so I don't see a need as it stands for any
>>>>> > >>>>> of this to be configurable.
>>>>> > >>>>>
>>>>> > >>>>> Brian
>>>>> > >>>>>
>>>>> > >>>>>> On Fri, Aug 7, 2020 at 3:56 AM Brian Brazil <[email protected]> wrote:
>>>>> > >>>>>>
>>>>> > >>>>>>> On Thu, 6 Aug 2020 at 22:58, Rob Skillington <[email protected]> wrote:
>>>>> > >>>>>>>
>>>>> > >>>>>>>> Hey Björn,
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> Thanks for the detailed response. I've had a few back and forths
>>>>> > >>>>>>>> on this with Brian and Chris over IRC and CNCF Slack now too.
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> I agree that fundamentally it seems naive to idealistically model
>>>>> > >>>>>>>> this around per metric name. It needs to be per series given what
>>>>> > >>>>>>>> may happen w.r.t. collision across targets, etc.
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> Perhaps we can separate these discussions into two considerations:
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> 1) Modeling of the data such that it is kept around for
>>>>> > >>>>>>>> transmission (primarily we're focused on the WAL here).
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> 2) Transmission (which, as you allude to, has many areas for
>>>>> > >>>>>>>> improvement).
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> For (1) - it seems like this needs to be done per time series;
>>>>> > >>>>>>>> thankfully we have already modeled this so that per-series data is
>>>>> > >>>>>>>> stored just once in a single WAL file. I will write up my proposal
>>>>> > >>>>>>>> here, but it will amount to essentially encoding the HELP, UNIT and
>>>>> > >>>>>>>> TYPE to the WAL per series, similar to how labels for a series are
>>>>> > >>>>>>>> encoded once per series in the WAL. Since this optimization is in
>>>>> > >>>>>>>> place, there's already a huge dampening effect on how expensive it
>>>>> > >>>>>>>> is to write out data about a series (e.g. labels). We can always go
>>>>> > >>>>>>>> and collect a sample WAL file and measure how much extra size
>>>>> > >>>>>>>> with/without HELP, UNIT and TYPE this would add, but it seems like
>>>>> > >>>>>>>> it won't fundamentally change the order of magnitude of "information
>>>>> > >>>>>>>> about a timeseries storage size" vs "datapoints about a timeseries
>>>>> > >>>>>>>> storage size".
>>>>> > >>>>>>>> One extra change would be re-encoding the series into the WAL if
>>>>> > >>>>>>>> the HELP changed for that series, just so that when HELP does change
>>>>> > >>>>>>>> it can be up to date from the view of whoever is reading the WAL
>>>>> > >>>>>>>> (i.e. the Remote Write loop). Since this entry needs to be loaded
>>>>> > >>>>>>>> into memory for Remote Write today anyway, with string interning as
>>>>> > >>>>>>>> suggested by Chris, it won't change the memory profile
>>>>> > >>>>>>>> algorithmically of a Prometheus with Remote Write enabled. There
>>>>> > >>>>>>>> will be some overhead that at most would likely be similar to the
>>>>> > >>>>>>>> label data, but we aren't altering data structures (so won't change
>>>>> > >>>>>>>> the big-O magnitude of memory being used); we're adding fields to
>>>>> > >>>>>>>> existing data structures, and string interning should actually make
>>>>> > >>>>>>>> it much less onerous since there is a large duplicative effect with
>>>>> > >>>>>>>> HELP among time series.
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> For (2) - now we have basically TYPE, HELP and UNIT all available
>>>>> > >>>>>>>> for transmission if we wanted to send it with every single
>>>>> > >>>>>>>> datapoint. While I think we should definitely examine HPACK-like
>>>>> > >>>>>>>> compression features as you mentioned, Björn, I think we should
>>>>> > >>>>>>>> separate that kind of work into a Milestone 2 where this is
>>>>> > >>>>>>>> considered.
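A minimal sketch of the bookkeeping being described here (hypothetical names and structure, in Python rather than Prometheus's Go, purely to illustrate the point): per-series metadata is recorded once, re-encoded only when it changes, and HELP strings are interned so duplicates across a metric family cost a single string in memory.

```python
class MetadataStore:
    """Hypothetical sketch (not the PR's actual types): per-series metadata
    kept alongside labels, appended to the WAL once per series and
    re-encoded only on change, with HELP strings interned so duplicates
    across a metric family are stored once in memory."""

    def __init__(self):
        self._interned = {}    # help text -> single shared str instance
        self.series = {}       # series ref -> (type, unit, help)
        self.wal_records = []  # stand-in for appended WAL metadata records

    def _intern(self, s):
        return self._interned.setdefault(s, s)

    def set(self, ref, mtype, unit, help_text):
        rec = (mtype, unit, self._intern(help_text))
        if self.series.get(ref) != rec:  # re-encode only on change
            self.series[ref] = rec
            self.wal_records.append((ref, rec))

store = MetadataStore()
help_text = "The total number of scheduled rule group evaluations."
for ref in range(3):                    # three series of one metric family
    store.set(ref, "counter", "", help_text)
store.set(0, "counter", "", help_text)  # unchanged -> no new WAL record
store.set(0, "counter", "", help_text + " Updated.")  # changed -> re-encoded

print(len(store.wal_records))           # 4 records: 3 initial + 1 re-encode
shared = {id(rec[2]) for ref, rec in store.series.items() if rec[2] == help_text}
print(len(shared))                      # 1: refs 1 and 2 share one HELP str
```

The interning map is the load-bearing piece: every series of a family holds a reference to the same HELP object, which is why adding metadata fields to the existing per-series structures shouldn't change the big-O memory profile.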
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> For the time being it's very plausible we could do some negotiation
>>>>> > >>>>>>>> with the receiving Remote Write endpoint by sending a "GET" to the
>>>>> > >>>>>>>> remote write endpoint and seeing if it responds with a "capabilities
>>>>> > >>>>>>>> + preferences" response: whether the endpoint would like to receive
>>>>> > >>>>>>>> metadata all the time on every single request and let Snappy take
>>>>> > >>>>>>>> care of keeping the size from ballooning too much, or whether it
>>>>> > >>>>>>>> would like TYPE on every single datapoint, and HELP and UNIT every
>>>>> > >>>>>>>> DESIRED_SECONDS or so. To enable a "send HELP every 10 minutes"
>>>>> > >>>>>>>> feature we would have to add to the data structure that holds the
>>>>> > >>>>>>>> LABELS, TYPE, HELP and UNIT for each series a "last sent" timestamp
>>>>> > >>>>>>>> to know when to resend to that backend, but that seems entirely
>>>>> > >>>>>>>> plausible and would not use more than 4 extra bytes.
>>>>> > >>>>>>>
>>>>> > >>>>>>> Negotiation is fundamentally stateful, as the process that receives
>>>>> > >>>>>>> the first request may be a very different one from the one that
>>>>> > >>>>>>> receives the second - such as if an upgrade is in progress. Remote
>>>>> > >>>>>>> write is intended to be a very simple thing that's easy to implement
>>>>> > >>>>>>> on the receiver end and is a send-only request-based protocol, so
>>>>> > >>>>>>> request-time negotiation is basically out. Any negotiation needs to
>>>>> > >>>>>>> happen via the config file, and even then it'd be better if nothing
>>>>> > >>>>>>> ever needed to be configured.
>>>>> > >>>>>>> Getting all the users of a remote write to change their config file
>>>>> > >>>>>>> or restart all their Prometheus servers is not an easy task after
>>>>> > >>>>>>> all.
>>>>> > >>>>>>>
>>>>> > >>>>>>> Brian
>>>>> > >>>>>>>
>>>>> > >>>>>>>> These thoughts are based on the discussion I've had and the
>>>>> > >>>>>>>> thoughts on this thread. What's the feedback on this before I go
>>>>> > >>>>>>>> ahead and re-iterate the design to more closely map to what I'm
>>>>> > >>>>>>>> suggesting here?
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> Best,
>>>>> > >>>>>>>> Rob
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> On Thu, Aug 6, 2020 at 2:01 PM Bjoern Rabenstein <[email protected]> wrote:
>>>>> > >>>>>>>>
>>>>> > >>>>>>>>> On 03.08.20 03:04, Rob Skillington wrote:
>>>>> > >>>>>>>>> > Ok - I have a proposal which could be broken up into two pieces:
>>>>> > >>>>>>>>> > first, delivering TYPE per datapoint; second, consistently and
>>>>> > >>>>>>>>> > reliably delivering HELP and UNIT once per unique metric name:
>>>>> > >>>>>>>>> > https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#heading=h.bik9uwphqy3g
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Thanks for the doc. I have commented on it, but while doing so I
>>>>> > >>>>>>>>> felt the urge to comment more generally, which would not fit well
>>>>> > >>>>>>>>> into the margin of a Google doc. My thoughts are also a bit out of
>>>>> > >>>>>>>>> scope of Rob's design doc and more about the general topic of
>>>>> > >>>>>>>>> remote write and the equally general topic of metadata (about which
>>>>> > >>>>>>>>> we have an ongoing discussion among the Prometheus developers).
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Disclaimer: I don't know the remote-write protocol very well. My
>>>>> > >>>>>>>>> hope here is that my somewhat distant perspective is of some value
>>>>> > >>>>>>>>> as it allows me to take a step back. However, I might just miss
>>>>> > >>>>>>>>> crucial details that completely invalidate my thoughts. We'll
>>>>> > >>>>>>>>> see...
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> I do care a lot about metadata, though. (And ironically, the
>>>>> > >>>>>>>>> reason why I declared remote write "somebody else's problem" is
>>>>> > >>>>>>>>> that I've always disliked how it fundamentally ignores metadata.)
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Rob's document embraces the fact that metadata can change over
>>>>> > >>>>>>>>> time, but it assumes that at any given time there is only one set
>>>>> > >>>>>>>>> of metadata per unique metric name. It takes into account that
>>>>> > >>>>>>>>> there can be drift, but it considers drift an irregularity that
>>>>> > >>>>>>>>> will only happen occasionally and iron out over time.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> In practice, however, metadata can be legitimately and
>>>>> > >>>>>>>>> deliberately different for different time series of the same name.
>>>>> > >>>>>>>>> Instrumentation libraries and even the exposition format inherently
>>>>> > >>>>>>>>> require one set of metadata per metric name, but this is all only
>>>>> > >>>>>>>>> enforced (and meant to be enforced) _per target_. Once the samples
>>>>> > >>>>>>>>> are ingested (or even sent onwards via remote write), they have no
>>>>> > >>>>>>>>> notion of what target they came from. Furthermore, samples created
>>>>> > >>>>>>>>> by rule evaluation don't have an originating target in the first
>>>>> > >>>>>>>>> place.
>>>>> > >>>>>>>>> (Which raises the question of metadata for recording rules, which
>>>>> > >>>>>>>>> is another can of worms I'd like to open eventually...)
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> (There is also the technical difficulty that the WAL has no notion
>>>>> > >>>>>>>>> of bundling or referencing all the series with the same metric
>>>>> > >>>>>>>>> name. That was commented about in the doc but is not my focus
>>>>> > >>>>>>>>> here.)
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Rob's doc sees TYPE as special because it is so cheap to just add
>>>>> > >>>>>>>>> to every data point. That's correct, but it's giving me an itch:
>>>>> > >>>>>>>>> Should we really create different ways of handling metadata,
>>>>> > >>>>>>>>> depending on its expected size?
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Compare this with labels. There is no upper limit to their number
>>>>> > >>>>>>>>> or size. Still, we have no plan of treating "large" labels
>>>>> > >>>>>>>>> differently from "short" labels.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> On top of that, we have by now gained the insight that metadata is
>>>>> > >>>>>>>>> changing over time and essentially has to be tracked per series.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Or in other words: From a pure storage perspective, metadata
>>>>> > >>>>>>>>> behaves exactly the same as labels! (There are certainly huge
>>>>> > >>>>>>>>> differences semantically, but those only manifest themselves on the
>>>>> > >>>>>>>>> query level, i.e. how you treat it in PromQL etc.)
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> (This is not exactly a new insight. This is more or less what I
>>>>> > >>>>>>>>> said during the 2016 dev summit, when we first discussed remote
>>>>> > >>>>>>>>> write.
>>>>> > >>>>>>>>> But I don't want to dwell on "told you so" moments... :o)
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> There is a good reason why we don't just add metadata as "pseudo
>>>>> > >>>>>>>>> labels": As discussed a lot in the various design docs, including
>>>>> > >>>>>>>>> Rob's, it would blow up the data size significantly because HELP
>>>>> > >>>>>>>>> strings tend to be relatively long.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> And that's the point where I would like to take a step back: We
>>>>> > >>>>>>>>> are discussing essentially treating something that is structurally
>>>>> > >>>>>>>>> the same thing in three different ways: Way 1 for labels as we know
>>>>> > >>>>>>>>> them. Way 2 for "small" metadata. Way 3 for "big" metadata.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> However, while labels tend to be shorter than HELP strings, there
>>>>> > >>>>>>>>> is the occasional use case with long or many labels. (Infamously,
>>>>> > >>>>>>>>> at SoundCloud, a binary accidentally put a whole HTML page into a
>>>>> > >>>>>>>>> label. That wasn't a use case, it was a bug, but the Prometheus
>>>>> > >>>>>>>>> server ingesting it was just chugging along as if nothing special
>>>>> > >>>>>>>>> had happened. It looked weird in the expression browser, though...)
>>>>> > >>>>>>>>> I'm sure any vendor offering Prometheus remote storage as a service
>>>>> > >>>>>>>>> will have a customer or two that use excessively long label names.
>>>>> > >>>>>>>>> If we have to deal with that, why not bite the bullet and treat
>>>>> > >>>>>>>>> metadata in the same way as labels in general?
>>>>> > >>>>>>>>> Or to phrase it another way: Any solution for "big" metadata
>>>>> > >>>>>>>>> could be used for labels, too, to alleviate the pain with
>>>>> > >>>>>>>>> excessively long label names.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Or most succinctly: A robust and really good solution for "big"
>>>>> > >>>>>>>>> metadata in remote write will make remote write much more efficient
>>>>> > >>>>>>>>> if applied to labels, too.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Imagine an NALSD tech interview question that boils down to
>>>>> > >>>>>>>>> "design Prometheus remote write". I bet that most of the better
>>>>> > >>>>>>>>> candidates will recognize that most of the payload will consist of
>>>>> > >>>>>>>>> series identifiers (call them labels or whatever) and they will
>>>>> > >>>>>>>>> suggest to first transmit some kind of index and from then on only
>>>>> > >>>>>>>>> transmit short series IDs. The best candidates will then find out
>>>>> > >>>>>>>>> about all the problems with that: How to keep the protocol
>>>>> > >>>>>>>>> stateless, how to re-sync the index, how to update it if new series
>>>>> > >>>>>>>>> arrive, etc. Those are certainly all good reasons why remote write
>>>>> > >>>>>>>>> as we know it does not transfer an index of series IDs.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> However, my point here is that we are now discussing exactly
>>>>> > >>>>>>>>> those problems when we talk about metadata transmission. Let's
>>>>> > >>>>>>>>> solve those problems and apply them to remote write in general!
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Some thoughts about that:
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> Current remote write essentially transfers all labels for _every_
>>>>> > >>>>>>>>> sample. This works reasonably well.
>>>>> > >>>>>>>>> Even if metadata blows up the data size by 5x or 10x,
>>>>> > >>>>>>>>> transferring the whole index of metadata and labels should remain
>>>>> > >>>>>>>>> feasible as long as we do it less frequently than once every 10
>>>>> > >>>>>>>>> samples. It's something that could be done each time a remote-write
>>>>> > >>>>>>>>> receiver connects. From then on, we "only" have to track when new
>>>>> > >>>>>>>>> series (or series with new metadata) show up and transfer those. (I
>>>>> > >>>>>>>>> know it's not trivial, but we are already discussing possible
>>>>> > >>>>>>>>> solutions in the various design docs.) Whenever a remote-write
>>>>> > >>>>>>>>> receiver gets out of sync for some reason, it can simply cut the
>>>>> > >>>>>>>>> connection and start with a complete re-sync again. As long as that
>>>>> > >>>>>>>>> doesn't happen more often than once every 10 samples, we still have
>>>>> > >>>>>>>>> a net gain. Combining this with sharding is another challenge, but
>>>>> > >>>>>>>>> it doesn't appear unsolvable.
>>>>> > >>>>>>>>>
>>>>> > >>>>>>>>> --
>>>>> > >>>>>>>>> Björn Rabenstein
>>>>> > >>>>>>>>> [PGP-ID] 0x851C3DA17D748D03
>>>>> > >>>>>>>>> [email] [email protected]
>>>>> > >>>>>>>>
>>>>> > >>>>>>>> --
>>>>> > >>>>>>>> You received this message because you are subscribed to the Google
>>>>> > >>>>>>>> Groups "Prometheus Developers" group.
>>>>> > >>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>> > >>>>>>>> send an email to [email protected].
>>>>> > >>>>>>>> To view this discussion on the web visit
>>>>> > >>>>>>>> https://groups.google.com/d/msgid/prometheus-developers/CABakzZaQGfVK5OAfKRP2nxBnp168GML5r_ok_f%3DyVeUdC6e2EQ%40mail.gmail.com.
>>>>> > >>>>>>>
>>>>> > >>>>>>> --
>>>>> > >>>>>>> Brian Brazil
>>>>> > >>>>>>> www.robustperception.io
>>>>> > >>>>>
>>>>> > >>>>> --
>>>>> > >>>>> Brian Brazil
>>>>> > >>>>> www.robustperception.io
>>>>> >
>>>>> > --
>>>>> > Brian Brazil
>>>>> > www.robustperception.io
>>>>>
>>>>> --
>>>>> Julien Pivotto
>>>>> @roidelapluie
>>>>
>>>> --
>>>> Brian Brazil
>>>> www.robustperception.io

>
> --
> Brian Brazil
> www.robustperception.io

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/CABakzZYuBL-LXQ1swOnTTq7Sfuvmo1mosyX1%3DWV1fc3PxdV36w%40mail.gmail.com.

