On Wed, 19 Aug 2020 at 09:47, Rob Skillington <[email protected]> wrote:
> To add a bit more detail to that example, I was actually using a fairly tuned
> remote write queue config that sent large batches, since the batch send
> deadline was set to a longer 1 minute with a max samples per send of 5,000.
> Here's that config:
>
> ```
> remote_write:
>   - url: http://localhost:3030/remote/write
>     remote_timeout: 30s
>     queue_config:
>       capacity: 10000
>       max_shards: 10
>       min_shards: 3
>       max_samples_per_send: 5000
>       batch_send_deadline: 1m
>       min_backoff: 50ms
>       max_backoff: 1s
> ```
>
> Using the default config we get worse utilization for both before/after
> numbers, but the delta/difference is less:
> - steady state ~177kb/sec without this change
> - steady state ~210kb/sec with this change
> - roughly 20% increase

I think 20% is okay all things considered.

Brian

> Using config:
>
> ```
> remote_write:
>   - url: http://localhost:3030/remote/write
>     remote_timeout: 30s
> ```
>
> Implicitly the values for this config are:
> - min shards 1
> - max shards 1000
> - max samples per send 100
> - capacity 500
> - batch send deadline 5s
> - min backoff 30ms
> - max backoff 100ms
>
> On Wed, Aug 19, 2020 at 4:26 AM Brian Brazil <[email protected]> wrote:
>
>> On Wed, 19 Aug 2020 at 09:20, Rob Skillington <[email protected]> wrote:
>>
>>> Here's the results from testing:
>>> - node_exporter exporting 309 metrics each by turning on a lot of optional
>>>   collectors, all have help set, very few have unit set
>>> - running 8 on the host at 1s scrape interval, each with a unique instance
>>>   label
>>> - steady state ~137kb/sec without this change
>>> - steady state ~172kb/sec with this change
>>> - roughly 30% increase
>>>
>>> Graph here:
>>> https://github.com/prometheus/prometheus/pull/7771#issuecomment-675923976
>>>
>>> How do we want to proceed? This could be fairly close to the higher end of
>>> the spectrum in terms of expected increase, given the node_exporter
>>> metrics density and fairly verbose metadata.
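Spelled out explicitly, the implicit defaults Rob lists above would correspond to a queue_config like this (a sketch; the values are the ones he enumerates, written with the same field names as the tuned config):

```yaml
remote_write:
  - url: http://localhost:3030/remote/write
    remote_timeout: 30s
    queue_config:               # defaults made explicit
      min_shards: 1
      max_shards: 1000
      max_samples_per_send: 100
      capacity: 500
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 100ms
```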
>>>
>>> Even having said that, however, 30% is a fairly big increase and a
>>> relatively large egress cost to have to swallow without any way to back
>>> out of this behavior.
>>>
>>> What do folks think of next steps?
>>
>> It is on the high end, however this is going to be among the worst cases
>> as there's not going to be a lot of per-metric cardinality from the node
>> exporter. I bet if you greatly increased the number of targets (and
>> reduced the scrape interval to compensate) it'd be more reasonable. I
>> think this is just about okay.
>>
>> Brian
>>
>>> On Tue, Aug 11, 2020 at 11:55 AM Rob Skillington <[email protected]> wrote:
>>>
>>>> Agreed - I'll see what I can do in getting some numbers for a workload
>>>> collecting cAdvisor metrics; it seems to have a significant amount of
>>>> HELP set:
>>>>
>>>> https://github.com/google/cadvisor/blob/8450c56c21bc5406e2df79a2162806b9a23ebd34/metrics/testdata/prometheus_metrics
>>>>
>>>> On Tue, Aug 11, 2020 at 6:15 AM Brian Brazil <[email protected]> wrote:
>>>>
>>>>> On Tue, 11 Aug 2020 at 11:07, Julien Pivotto <[email protected]> wrote:
>>>>>
>>>>>> On 11 Aug 11:05, Brian Brazil wrote:
>>>>>> > On Tue, 11 Aug 2020 at 04:09, Callum Styan <[email protected]> wrote:
>>>>>> >
>>>>>> > > I'm hesitant to add anything that significantly increases the
>>>>>> > > network bandwidth usage of remote write while at the same time not
>>>>>> > > giving users a way to tune the usage to their needs.
>>>>>> > >
>>>>>> > > I agree with Brian that we don't want the protocol itself to become
>>>>>> > > stateful by introducing something like negotiation. I'd also prefer
>>>>>> > > not to introduce multiple ways to do things, though I'm hoping we
>>>>>> > > can find a way to accommodate your use case while not ballooning
>>>>>> > > average users' network egress bills.
>>>>>> > >
>>>>>> > > I am fine with forcing the consuming end to be somewhat stateful,
>>>>>> > > like in the case of Josh's PR where all metadata is sent
>>>>>> > > periodically and must be stored by the remote storage system.
>>>>>> > >
>>>>>> > > Overall I'd like to see some numbers regarding current network
>>>>>> > > bandwidth of remote write, remote write with metadata via Josh's
>>>>>> > > PR, and remote write with sending metadata for every series in a
>>>>>> > > remote write payload.
>>>>>> >
>>>>>> > I agree, I noticed that in Rob's PR and had the same thought.
>>>>>>
>>>>>> Remote bandwidth is likely to affect only people using remote write.
>>>>>>
>>>>>> Getting a view on the on-disk size of the WAL would be great too, as
>>>>>> that will affect everyone.
>>>>>
>>>>> I'm not worried about that, it's only really on series creation so it
>>>>> won't be noticed unless you have really high levels of churn.
>>>>>
>>>>> Brian
>>>>>
>>>>>> > Brian
>>>>>> >
>>>>>> > > Rob, I'll review your PR tomorrow but it looks like Julien and
>>>>>> > > Brian may already have that covered.
>>>>>> > >
>>>>>> > > On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington <[email protected]> wrote:
>>>>>> > >
>>>>>> > >> Update: The PR now sends the fields over remote write from the
>>>>>> > >> WAL, and metadata is also updated in the WAL when any field
>>>>>> > >> changes.
>>>>>> > >>
>>>>>> > >> Now opened the PR against the primary repo:
>>>>>> > >> https://github.com/prometheus/prometheus/pull/7771
>>>>>> > >>
>>>>>> > >> I have tested this end-to-end with a modified M3 branch:
>>>>>> > >> https://github.com/m3db/m3/compare/r/test-prometheus-metadata
>>>>>> > >> > {... "msg":"received series","labels":"{__name__="prometheus_rule_group_iterations_total",instance="localhost:9090",job="prometheus01",role=
>>>>>> > >> > "remote"}","type":"counter","unit":"","help":"The total number
>>>>>> > >> > of scheduled rule group evaluations, whether executed or
>>>>>> > >> > missed."}
>>>>>> > >>
>>>>>> > >> Tests still haven't been updated. Any feedback on the approach /
>>>>>> > >> data structures would be greatly appreciated.
>>>>>> > >>
>>>>>> > >> Would be good to know what others' thoughts are on next steps.
>>>>>> > >>
>>>>>> > >> On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington <[email protected]> wrote:
>>>>>> > >>
>>>>>> > >>> Here's a draft PR that propagates metadata to the WAL, and the
>>>>>> > >>> WAL reader can read it back:
>>>>>> > >>> https://github.com/robskillington/prometheus/pull/1/files
>>>>>> > >>>
>>>>>> > >>> Would like a little bit of feedback on the datatypes and
>>>>>> > >>> structure before going further, if folks are open to that.
>>>>>> > >>>
>>>>>> > >>> There are a few things not happening yet:
>>>>>> > >>> - Remote write queue manager does not use or send these extra
>>>>>> > >>>   fields yet.
>>>>>> > >>> - Head does not reset the "metadata" slice (not sure where the
>>>>>> > >>>   "series" slice is reset in the head for pending series writes
>>>>>> > >>>   to the WAL; want to do it in the same place).
>>>>>> > >>> - Metadata is not re-written on change yet.
>>>>>> > >>> - Tests.
>>>>>> > >>>
>>>>>> > >>> On Sat, Aug 8, 2020 at 9:37 AM Rob Skillington <[email protected]> wrote:
>>>>>> > >>>
>>>>>> > >>>> Sounds good, I've updated the proposal with details on the
>>>>>> > >>>> places in which changes are required given the new approach:
>>>>>> > >>>>
>>>>>> > >>>> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#
>>>>>> > >>>>
>>>>>> > >>>> On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <[email protected]> wrote:
>>>>>> > >>>>
>>>>>> > >>>>> On Fri, 7 Aug 2020 at 15:48, Rob Skillington <[email protected]> wrote:
>>>>>> > >>>>>
>>>>>> > >>>>>> True - this could also be a blacklist by config perhaps, so
>>>>>> > >>>>>> if you really don't want to have increased egress you can
>>>>>> > >>>>>> optionally turn off sending TYPE, HELP and UNIT, or send them
>>>>>> > >>>>>> at different frequencies, via config. We could package some
>>>>>> > >>>>>> sensible defaults so folks don't need to update their config.
>>>>>> > >>>>>>
>>>>>> > >>>>>> The main intention is to enable these added features and make
>>>>>> > >>>>>> it possible for various consumers to adjust some of these
>>>>>> > >>>>>> parameters if required, since backends can be so different in
>>>>>> > >>>>>> their implementation. For M3 I would be totally fine with the
>>>>>> > >>>>>> extra egress, which should be mitigated fairly considerably
>>>>>> > >>>>>> by Snappy and the fact that HELP is common across certain
>>>>>> > >>>>>> metric families, even when receiving it with every single
>>>>>> > >>>>>> Remote Write request.
>>>>>> > >>>>>
>>>>>> > >>>>> That's really a micro-optimisation.
>>>>>> > >>>>> If you are that worried about bandwidth you'd run a sidecar
>>>>>> > >>>>> specific to your remote backend that was stateful and far more
>>>>>> > >>>>> efficient overall. Sending the full label names and values on
>>>>>> > >>>>> every request is going to be far more than the overhead of
>>>>>> > >>>>> metadata on top of that, so I don't see a need as it stands
>>>>>> > >>>>> for any of this to be configurable.
>>>>>> > >>>>>
>>>>>> > >>>>> Brian
>>>>>> > >>>>>
>>>>>> > >>>>>> On Fri, Aug 7, 2020 at 3:56 AM Brian Brazil <[email protected]> wrote:
>>>>>> > >>>>>>
>>>>>> > >>>>>>> On Thu, 6 Aug 2020 at 22:58, Rob Skillington <[email protected]> wrote:
>>>>>> > >>>>>>>
>>>>>> > >>>>>>>> Hey Björn,
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>> Thanks for the detailed response. I've had a few back and
>>>>>> > >>>>>>>> forths on this with Brian and Chris over IRC and CNCF Slack
>>>>>> > >>>>>>>> now too.
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>> I agree that fundamentally it seems naive to idealistically
>>>>>> > >>>>>>>> model this around per metric name. It needs to be per
>>>>>> > >>>>>>>> series given what may happen w.r.t. collision across
>>>>>> > >>>>>>>> targets, etc.
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>> Perhaps we can separate these discussions into two
>>>>>> > >>>>>>>> considerations:
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>> 1) Modeling of the data such that it is kept around for
>>>>>> > >>>>>>>> transmission (primarily we're focused on the WAL here).
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>> 2) Transmission (which, as you allude to, has many areas
>>>>>> > >>>>>>>> for improvement).
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>> For (1) - it seems like this needs to be done per time
>>>>>> > >>>>>>>> series; thankfully we have actually already modeled this so
>>>>>> > >>>>>>>> that per-series data is stored just once in a single WAL
>>>>>> > >>>>>>>> file. I will write up my proposal here, but it will amount
>>>>>> > >>>>>>>> to essentially encoding the HELP, UNIT and TYPE to the WAL
>>>>>> > >>>>>>>> per series, similar to how labels for a series are encoded
>>>>>> > >>>>>>>> once per series in the WAL. Since this optimization is in
>>>>>> > >>>>>>>> place, there's already a huge dampening effect on how
>>>>>> > >>>>>>>> expensive it is to write out data about a series (e.g.
>>>>>> > >>>>>>>> labels). We can always go and collect a sample WAL file and
>>>>>> > >>>>>>>> measure how much extra size with/without HELP, UNIT and
>>>>>> > >>>>>>>> TYPE this would add, but it seems like it won't
>>>>>> > >>>>>>>> fundamentally change the order of magnitude in terms of
>>>>>> > >>>>>>>> "information about a timeseries storage size" vs
>>>>>> > >>>>>>>> "datapoints about a timeseries storage size". One extra
>>>>>> > >>>>>>>> change would be re-encoding the series into the WAL if the
>>>>>> > >>>>>>>> HELP changed for that series, just so that when HELP does
>>>>>> > >>>>>>>> change it can be up to date from the view of whoever is
>>>>>> > >>>>>>>> reading the WAL (i.e. the Remote Write loop). Since this
>>>>>> > >>>>>>>> entry needs to be loaded into memory for Remote Write today
>>>>>> > >>>>>>>> anyway, with string interning as suggested by Chris, it
>>>>>> > >>>>>>>> won't change the memory profile algorithmically of a
>>>>>> > >>>>>>>> Prometheus with Remote Write enabled.
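The per-series metadata plus string interning that Rob describes could be sketched roughly as follows (illustrative Go; the type and function names are hypothetical, not the actual Prometheus code):

```go
package main

import "fmt"

// SeriesMetadata is a hypothetical sketch of the HELP/UNIT/TYPE fields
// kept alongside labels for each series loaded from the WAL.
type SeriesMetadata struct {
	Type string // e.g. "counter"
	Unit string
	Help string
}

// interner stores one canonical copy of each string so identical HELP
// strings shared by many series in a metric family are held once.
type interner struct{ pool map[string]string }

func newInterner() *interner { return &interner{pool: map[string]string{}} }

func (i *interner) intern(s string) string {
	if c, ok := i.pool[s]; ok {
		return c
	}
	i.pool[s] = s
	return s
}

func main() {
	in := newInterner()
	a := SeriesMetadata{Type: "counter",
		Help: in.intern("The total number of scheduled rule group evaluations.")}
	b := SeriesMetadata{Type: "counter",
		Help: in.intern("The total number of scheduled rule group evaluations.")}
	// Both series share one backing copy of the HELP string.
	fmt.Println(a.Help == b.Help, len(in.pool))
}
```

The point of the sketch is that adding these fields only widens an existing per-series structure; the duplicative effect of HELP across a metric family collapses into a single interned string.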
>>>>>> > >>>>>>>> There will be some overhead that at most would likely be
>>>>>> > >>>>>>>> similar to the label data, but we aren't altering data
>>>>>> > >>>>>>>> structures (so won't change the big-O magnitude of memory
>>>>>> > >>>>>>>> being used); we're adding fields to existing data
>>>>>> > >>>>>>>> structures, and string interning should actually make it
>>>>>> > >>>>>>>> much less onerous since there is a large duplicative effect
>>>>>> > >>>>>>>> with HELP among time series.
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>> For (2) - now we have basically TYPE, HELP and UNIT all
>>>>>> > >>>>>>>> available for transmission if we wanted to send them with
>>>>>> > >>>>>>>> every single datapoint. While I think we should definitely
>>>>>> > >>>>>>>> examine HPACK-like compression features as you mentioned,
>>>>>> > >>>>>>>> Björn, I think we should separate that kind of work into a
>>>>>> > >>>>>>>> Milestone 2 where it is considered.
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>> For the time being it's very plausible we could do some
>>>>>> > >>>>>>>> negotiation with the receiving Remote Write endpoint by
>>>>>> > >>>>>>>> sending a "GET" to the remote write endpoint and seeing if
>>>>>> > >>>>>>>> it responds with a "capabilities + preferences" response:
>>>>>> > >>>>>>>> whether the endpoint would like to receive metadata all the
>>>>>> > >>>>>>>> time on every single request and let Snappy take care of
>>>>>> > >>>>>>>> keeping the size from ballooning too much, or whether it
>>>>>> > >>>>>>>> would like TYPE on every single datapoint, and HELP and
>>>>>> > >>>>>>>> UNIT every DESIRED_SECONDS or so.
>>>>>> > >>>>>>>> To enable a "send HELP every 10 minutes" feature we would
>>>>>> > >>>>>>>> have to add to the data structure that holds the LABELS,
>>>>>> > >>>>>>>> TYPE, HELP and UNIT for each series a "last sent" timestamp
>>>>>> > >>>>>>>> to know when to resend to that backend, but that seems
>>>>>> > >>>>>>>> entirely plausible and would not use more than 4 extra
>>>>>> > >>>>>>>> bytes.
>>>>>> > >>>>>>>
>>>>>> > >>>>>>> Negotiation is fundamentally stateful, as the process that
>>>>>> > >>>>>>> receives the first request may be a very different one from
>>>>>> > >>>>>>> the one that receives the second - such as if an upgrade is
>>>>>> > >>>>>>> in progress. Remote write is intended to be a very simple
>>>>>> > >>>>>>> thing that's easy to implement on the receiver end and is a
>>>>>> > >>>>>>> send-only request-based protocol, so request-time
>>>>>> > >>>>>>> negotiation is basically out. Any negotiation needs to
>>>>>> > >>>>>>> happen via the config file, and even then it'd be better if
>>>>>> > >>>>>>> nothing ever needed to be configured. Getting all the users
>>>>>> > >>>>>>> of a remote write to change their config file or restart all
>>>>>> > >>>>>>> their Prometheus servers is not an easy task after all.
>>>>>> > >>>>>>>
>>>>>> > >>>>>>> Brian
>>>>>> > >>>>>>>
>>>>>> > >>>>>>>> These thoughts are based on the discussion I've had and the
>>>>>> > >>>>>>>> thoughts on this thread. What's the feedback on this before
>>>>>> > >>>>>>>> I go ahead and re-iterate the design to more closely map to
>>>>>> > >>>>>>>> what I'm suggesting here?
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>> Best,
>>>>>> > >>>>>>>> Rob
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>> On Thu, Aug 6, 2020 at 2:01 PM Bjoern Rabenstein <[email protected]> wrote:
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>>> On 03.08.20 03:04, Rob Skillington wrote:
>>>>>> > >>>>>>>>> > Ok - I have a proposal which could be broken up into two
>>>>>> > >>>>>>>>> > pieces, first delivering TYPE per datapoint, the second
>>>>>> > >>>>>>>>> > consistently and reliably HELP and UNIT once per unique
>>>>>> > >>>>>>>>> > metric name:
>>>>>> > >>>>>>>>> >
>>>>>> > >>>>>>>>> > https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#heading=h.bik9uwphqy3g
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> Thanks for the doc. I have commented on it, but while
>>>>>> > >>>>>>>>> doing so, I felt the urge to comment more generally, which
>>>>>> > >>>>>>>>> would not fit well into the margin of a Google doc. My
>>>>>> > >>>>>>>>> thoughts are also a bit out of scope of Rob's design doc
>>>>>> > >>>>>>>>> and more about the general topic of remote write and the
>>>>>> > >>>>>>>>> equally general topic of metadata (about which we have an
>>>>>> > >>>>>>>>> ongoing discussion among the Prometheus developers).
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> Disclaimer: I don't know the remote-write protocol very
>>>>>> > >>>>>>>>> well. My hope here is that my somewhat distant perspective
>>>>>> > >>>>>>>>> is of some value as it allows me to take a step back.
>>>>>> > >>>>>>>>> However, I might just miss crucial details that completely
>>>>>> > >>>>>>>>> invalidate my thoughts. We'll see...
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> I do care a lot about metadata, though.
>>>>>> > >>>>>>>>> (And ironically, the reason why I declared remote write
>>>>>> > >>>>>>>>> "somebody else's problem" is that I've always disliked how
>>>>>> > >>>>>>>>> it fundamentally ignores metadata.)
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> Rob's document embraces the fact that metadata can change
>>>>>> > >>>>>>>>> over time, but it assumes that at any given time, there is
>>>>>> > >>>>>>>>> only one set of metadata per unique metric name. It takes
>>>>>> > >>>>>>>>> into account that there can be drift, but it considers it
>>>>>> > >>>>>>>>> an irregularity that will only happen occasionally and
>>>>>> > >>>>>>>>> iron out over time.
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> In practice, however, metadata can be legitimately and
>>>>>> > >>>>>>>>> deliberately different for different time series of the
>>>>>> > >>>>>>>>> same name. Instrumentation libraries and even the
>>>>>> > >>>>>>>>> exposition format inherently require one set of metadata
>>>>>> > >>>>>>>>> per metric name, but this is all only enforced (and meant
>>>>>> > >>>>>>>>> to be enforced) _per target_. Once the samples are
>>>>>> > >>>>>>>>> ingested (or even sent onwards via remote write), they
>>>>>> > >>>>>>>>> have no notion of what target they came from. Furthermore,
>>>>>> > >>>>>>>>> samples created by rule evaluation don't have an
>>>>>> > >>>>>>>>> originating target in the first place. (Which raises the
>>>>>> > >>>>>>>>> question of metadata for recording rules, which is another
>>>>>> > >>>>>>>>> can of worms I'd like to open eventually...)
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> (There is also the technical difficulty that the WAL has
>>>>>> > >>>>>>>>> no notion of bundling or referencing all the series with
>>>>>> > >>>>>>>>> the same metric name.
>>>>>> > >>>>>>>>> That was commented about in the doc but is not my focus
>>>>>> > >>>>>>>>> here.)
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> Rob's doc sees TYPE as special because it is so cheap to
>>>>>> > >>>>>>>>> just add to every data point. That's correct, but it's
>>>>>> > >>>>>>>>> giving me an itch: Should we really create different ways
>>>>>> > >>>>>>>>> of handling metadata, depending on its expected size?
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> Compare this with labels. There is no upper limit to their
>>>>>> > >>>>>>>>> number or size. Still, we have no plan of treating "large"
>>>>>> > >>>>>>>>> labels differently from "short" labels.
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> On top of that, we have by now gained the insight that
>>>>>> > >>>>>>>>> metadata is changing over time and essentially has to be
>>>>>> > >>>>>>>>> tracked per series.
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> Or in other words: From a pure storage perspective,
>>>>>> > >>>>>>>>> metadata behaves exactly the same as labels! (There are
>>>>>> > >>>>>>>>> certainly huge differences semantically, but those only
>>>>>> > >>>>>>>>> manifest themselves at the query level, i.e. how you treat
>>>>>> > >>>>>>>>> it in PromQL etc.)
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> (This is not exactly a new insight. This is more or less
>>>>>> > >>>>>>>>> what I said during the 2016 dev summit, when we first
>>>>>> > >>>>>>>>> discussed remote write. But I don't want to dwell on "told
>>>>>> > >>>>>>>>> you so" moments...
>>>>>> > >>>>>>>>> :o)
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> There is a good reason why we don't just add metadata as
>>>>>> > >>>>>>>>> "pseudo labels": As discussed a lot in the various design
>>>>>> > >>>>>>>>> docs including Rob's, it would blow up the data size
>>>>>> > >>>>>>>>> significantly because HELP strings tend to be relatively
>>>>>> > >>>>>>>>> long.
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> And that's the point where I would like to take a step
>>>>>> > >>>>>>>>> back: We are discussing essentially treating something
>>>>>> > >>>>>>>>> that is structurally the same thing in three different
>>>>>> > >>>>>>>>> ways: Way 1 for labels as we know them. Way 2 for "small"
>>>>>> > >>>>>>>>> metadata. Way 3 for "big" metadata.
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> However, while labels tend to be shorter than HELP
>>>>>> > >>>>>>>>> strings, there is the occasional use case with long or
>>>>>> > >>>>>>>>> many labels. (Infamously, at SoundCloud, a binary
>>>>>> > >>>>>>>>> accidentally put a whole HTML page into a label. That
>>>>>> > >>>>>>>>> wasn't a use case, it was a bug, but the Prometheus server
>>>>>> > >>>>>>>>> ingesting it was just chugging along as if nothing special
>>>>>> > >>>>>>>>> had happened. It looked weird in the expression browser,
>>>>>> > >>>>>>>>> though...) I'm sure any vendor offering Prometheus remote
>>>>>> > >>>>>>>>> storage as a service will have a customer or two that use
>>>>>> > >>>>>>>>> excessively long label names. If we have to deal with
>>>>>> > >>>>>>>>> that, why not bite the bullet and treat metadata in the
>>>>>> > >>>>>>>>> same way as labels in general? Or to phrase it another
>>>>>> > >>>>>>>>> way: Any solution for "big" metadata could be used for
>>>>>> > >>>>>>>>> labels, too, to alleviate the pain with excessively long
>>>>>> > >>>>>>>>> label names.
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> Or most succinctly: A robust and really good solution for
>>>>>> > >>>>>>>>> "big" metadata in remote write will make remote write much
>>>>>> > >>>>>>>>> more efficient if applied to labels, too.
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> Imagine an NALSD tech interview question that boils down
>>>>>> > >>>>>>>>> to "design Prometheus remote write". I bet that most of
>>>>>> > >>>>>>>>> the better candidates will recognize that most of the
>>>>>> > >>>>>>>>> payload will consist of series identifiers (call them
>>>>>> > >>>>>>>>> labels or whatever) and they will suggest first
>>>>>> > >>>>>>>>> transmitting some kind of index and from then on only
>>>>>> > >>>>>>>>> transmitting short series IDs. The best candidates will
>>>>>> > >>>>>>>>> then find out about all the problems with that: How to
>>>>>> > >>>>>>>>> keep the protocol stateless, how to re-sync the index, how
>>>>>> > >>>>>>>>> to update it if new series arrive, etc. Those are
>>>>>> > >>>>>>>>> certainly all good reasons why remote write as we know it
>>>>>> > >>>>>>>>> does not transfer an index of series IDs.
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> However, my point here is that we are now discussing
>>>>>> > >>>>>>>>> exactly those problems when we talk about metadata
>>>>>> > >>>>>>>>> transmission. Let's solve those problems and apply the
>>>>>> > >>>>>>>>> solutions to remote write in general!
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> Some thoughts about that:
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> Current remote write essentially transfers all labels for
>>>>>> > >>>>>>>>> _every_ sample. This works reasonably well.
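The sender side of the scheme Björn sketches (transmit an index of series identifiers once, then only short IDs, with a full re-sync whenever the receiver loses state) could look something like this in hypothetical Go; none of these names or wire details are part of the actual remote write protocol:

```go
package main

import "fmt"

// indexedSender assigns a short numeric ID to each label set and
// remembers which IDs the receiver has already been told about.
type indexedSender struct {
	nextID uint64
	index  map[string]uint64 // serialized labels -> series ID
}

// ref returns the series ID for a label set, and whether the full
// labels must accompany this sample because the receiver has not seen
// this series since the last (re-)sync.
func (s *indexedSender) ref(labels string) (id uint64, mustSendLabels bool) {
	if id, ok := s.index[labels]; ok {
		return id, false
	}
	s.nextID++
	s.index[labels] = s.nextID
	return s.nextID, true
}

// resync drops all sender-side state, e.g. after the receiver cut the
// connection; every series is then retransmitted in full once.
func (s *indexedSender) resync() { s.index = map[string]uint64{} }

func main() {
	snd := &indexedSender{index: map[string]uint64{}}
	id1, full1 := snd.ref(`{__name__="up",job="node"}`)
	id2, full2 := snd.ref(`{__name__="up",job="node"}`)
	// Same ID both times; full labels go over the wire only once.
	fmt.Println(id1 == id2, full1, full2)
}
```

This is exactly where the statefulness objection bites: the map above is shared state between sender and receiver, so the net gain only holds if re-syncs (and therefore full retransmissions) are rarer than roughly one in ten samples, as the surrounding message argues.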
>>>>>> > >>>>>>>>> Even if metadata blows up the data size by 5x or 10x,
>>>>>> > >>>>>>>>> transferring the whole index of metadata and labels should
>>>>>> > >>>>>>>>> remain feasible as long as we do it less frequently than
>>>>>> > >>>>>>>>> once every 10 samples. It's something that could be done
>>>>>> > >>>>>>>>> each time a remote-write receiver connects. From then on,
>>>>>> > >>>>>>>>> we "only" have to track when new series (or series with
>>>>>> > >>>>>>>>> new metadata) show up and transfer those. (I know it's not
>>>>>> > >>>>>>>>> trivial, but we are already discussing possible solutions
>>>>>> > >>>>>>>>> in the various design docs.) Whenever a remote-write
>>>>>> > >>>>>>>>> receiver gets out of sync for some reason, it can simply
>>>>>> > >>>>>>>>> cut the connection and start with a complete re-sync
>>>>>> > >>>>>>>>> again. As long as that doesn't happen more often than once
>>>>>> > >>>>>>>>> every 10 samples, we still have a net gain. Combining this
>>>>>> > >>>>>>>>> with sharding is another challenge, but it doesn't appear
>>>>>> > >>>>>>>>> unsolvable.
>>>>>> > >>>>>>>>>
>>>>>> > >>>>>>>>> --
>>>>>> > >>>>>>>>> Björn Rabenstein
>>>>>> > >>>>>>>>> [PGP-ID] 0x851C3DA17D748D03
>>>>>> > >>>>>>>>> [email] [email protected]
>>>>>> > >>>>>>>>
>>>>>> > >>>>>>>> --
>>>>>> > >>>>>>>> You received this message because you are subscribed to the
>>>>>> > >>>>>>>> Google Groups "Prometheus Developers" group.
>>>>>> > >>>>>>>> To unsubscribe from this group and stop receiving emails
>>>>>> > >>>>>>>> from it, send an email to
>>>>>> > >>>>>>>> [email protected].
>>>>>> > >>>>>>>> To view this discussion on the web visit
>>>>>> > >>>>>>>> https://groups.google.com/d/msgid/prometheus-developers/CABakzZaQGfVK5OAfKRP2nxBnp168GML5r_ok_f%3DyVeUdC6e2EQ%40mail.gmail.com.
>>>>>> > >>>>>>>
>>>>>> > >>>>>>> --
>>>>>> > >>>>>>> Brian Brazil
>>>>>> > >>>>>>> www.robustperception.io
>>>>>> > >>>>>
>>>>>> > >>>>> --
>>>>>> > >>>>> Brian Brazil
>>>>>> > >>>>> www.robustperception.io
>>>>>> >
>>>>>> > --
>>>>>> > Brian Brazil
>>>>>> > www.robustperception.io
>>>>>>
>>>>>> --
>>>>>> Julien Pivotto
>>>>>> @roidelapluie
>>>>>
>>>>> --
>>>>> Brian Brazil
>>>>> www.robustperception.io
>>
>> --
>> Brian Brazil
>> www.robustperception.io

--
Brian Brazil
www.robustperception.io

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/CAHJKeLoWTwM%2B1a-M%2BxPEyihxtYSvyna9m5F%3DXW_Sihs2zoLpgg%40mail.gmail.com.

