OK - I have a proposal that could be broken up into two pieces: the first delivering TYPE per datapoint, the second delivering HELP and UNIT consistently and reliably once per unique metric name: https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#heading=h.bik9uwphqy3g
Would love to get some feedback on it. Thanks for the consideration. Is there anyone in particular I should reach out to directly for feedback?

Best,
Rob

On Tue, Jul 21, 2020 at 5:55 PM Rob Skillington <[email protected]> wrote:

> Also want to point out that with just TYPE you can do things such as know
> it's a histogram and then suggest using "sum(rate(...)) by (le)" with a
> one-click button in a UI, which again is significantly harder without that
> information.
>
> The reason this becomes important is that some systems (e.g. StackDriver)
> require this schema/metric information the first time you record a sample.
> So you really want at least the very basics of it (i.e. TYPE) the first
> time you receive that sample:
>
> Defines a metric type and its schema. Once a metric descriptor is created,
>> deleting or altering it stops data collection and makes the metric type's
>> existing data unusable.
>> The following are specific rules for service defined Monitoring metric
>> descriptors:
>> type, metricKind, valueType and description fields are all required. The
>> unit field must be specified if the valueType is any of DOUBLE, INT64,
>> DISTRIBUTION.
>> Maximum of default 500 metric descriptors per service is allowed.
>> Maximum of default 10 labels per metric descriptor is allowed.
>
> https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.metricDescriptors
>
> Just an example, but other systems, and definitely systems that want to do
> processing of metrics on the way in, would prefer that at the very least
> TYPE, and ideally UNIT too, are specified.
>
> On Tue, Jul 21, 2020 at 5:49 PM Rob Skillington <[email protected]>
> wrote:
>
>> Hey Chris,
>>
>> Apologies for the delay in responding.
>>
>> Yes, I think that even just TYPE would be a great first step. I am
>> working on a very small one-pager that outlines how we might get from
>> here to that future you talk about.
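To make the one-click UI suggestion above concrete, here is a small Go sketch of how a frontend might map a metric's TYPE (as received per datapoint) to a suggested starting PromQL query. The helper name and the type strings are illustrative assumptions, not an actual Prometheus or StackDriver API.

```go
package main

import "fmt"

// suggestQuery maps a metric name and its TYPE (as it would arrive with each
// datapoint over remote write) to a starting PromQL expression a UI could
// offer behind a one-click button. Hypothetical helper, illustrative only.
func suggestQuery(name, metricType string) string {
	switch metricType {
	case "counter":
		return fmt.Sprintf("sum(rate(%s[5m]))", name)
	case "histogram":
		// Aggregate bucket series by "le" so histogram_quantile can follow.
		return fmt.Sprintf("sum(rate(%s_bucket[5m])) by (le)", name)
	case "gauge":
		return fmt.Sprintf("avg(%s)", name)
	default:
		// Without TYPE we are back to guessing from the raw samples.
		return name
	}
}

func main() {
	fmt.Println(suggestQuery("http_request_duration_seconds", "histogram"))
	// Prints: sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
}
```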
>>
>> In terms of downstream processing, just having the TYPE on every single
>> sample would be a huge step forward, as it enables stateless processing
>> of the metric (e.g. downsampling, and working out whether counter resets
>> need to be detected while downsampling a single individual sample).
>>
>> You can also imagine this enabling the ability to suggest certain
>> functions that can be applied, e.g. auto-suggest that rate(...) should
>> be applied, without needing to analyze or use best-effort heuristics on
>> the actual values of a time series.
>>
>> Completely agreed that solving this for UNIT and HELP is more difficult,
>> and that information would likely be much nicer to send/store per metric
>> name rather than per time-series sample.
>>
>> I'll send out the Google doc for some comments shortly.
>>
>> The transactional approach is interesting. It could be difficult given
>> that this information can flap (i.e. start with some value for
>> HELP/UNIT, but a different target of the same application has a
>> different value), which means ordering is important, and dealing with
>> transactional ordering could be a hard problem. I agree that making this
>> deterministic, if possible, would be great. Maybe it could be something
>> like a token that is sent alongside the first remote write payload; if
>> the continuation token the receiver sees indicates it missed some part
>> of the stream, it can go and do a full sync, and from then on receive
>> updates/additions in a transactional way from the stream over remote
>> write. Just a random thought though, and it requires more exploration
>> and different solutions being listed to weigh up
>> pros/cons/complexity/etc.
>>
>> Best,
>> Rob
>>
>> On Thu, Jul 16, 2020 at 4:39 PM Chris Marchbanks <[email protected]>
>> wrote:
>>
>>> Hi Rob,
>>>
>>> I would also like metadata to become stateless, and view 6815
>>> <https://github.com/prometheus/prometheus/pull/6815> only as a first
>>> step and the start of an output format. Currently, there is a
>>> work-in-progress design doc, and another topic for an upcoming dev
>>> summit, for allowing use cases where metadata needs to be in the same
>>> request as the samples.
>>>
>>> Generally, I (and some others I have talked to) don't want to send all
>>> the metadata with every sample, as that is very repetitive,
>>> specifically for histograms and metrics with many series. Instead, I
>>> would like remote write requests to become transaction based, at which
>>> point all the metadata from that scrape/transaction can be added to the
>>> metadata field introduced to the proto in 6815
>>> <https://github.com/prometheus/prometheus/pull/6815>, and then each
>>> sample can be linked to a metadata entry without as much duplication.
>>> That is very broad strokes, and I am sure it will be refined or changed
>>> completely with more usage.
>>>
>>> That said, TYPE and UNIT are much smaller than metric name and help
>>> text, and I would support adding those to a linked metadata entry
>>> before remote write becomes transactional. Would that satisfy your use
>>> cases?
>>>
>>> Chris
>>>
>>> On Thu, Jul 16, 2020 at 1:43 PM Rob Skillington <[email protected]>
>>> wrote:
>>>
>>>> Typo: "community request" should be: "community contribution that
>>>> duplicates some of PR 6815"
>>>>
>>>> On Thu, Jul 16, 2020 at 3:27 PM Rob Skillington <[email protected]>
>>>> wrote:
>>>>
>>>>> Firstly: Thanks a lot for sharing the dev summit notes, they are
>>>>> greatly appreciated. Also thank you for a great PromCon!
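The "linked metadata entry" idea in broad strokes: metadata carried once per metric family in the request, with each series pointing at its entry by index instead of repeating TYPE/HELP/UNIT per sample. The struct and field names below are invented for illustration and are not the actual prompb definitions from PR 6815.

```go
package main

import "fmt"

// Invented types sketching a transaction-scoped remote write payload in
// which metadata is deduplicated: sent once per metric family and
// referenced by index from each series. Not the real prompb messages.
type Metadata struct {
	MetricFamily string
	Type         string
	Unit         string
	Help         string
}

type Series struct {
	Labels      map[string]string
	Samples     []float64
	MetadataRef int // index into WriteRequest.Metadata
}

type WriteRequest struct {
	Metadata []Metadata
	Series   []Series
}

func main() {
	// Many histogram bucket series share a single metadata entry.
	req := WriteRequest{
		Metadata: []Metadata{{
			MetricFamily: "http_request_duration_seconds",
			Type:         "histogram",
			Unit:         "seconds",
			Help:         "Latency of HTTP requests.",
		}},
		Series: []Series{
			{Labels: map[string]string{"le": "0.1"}, Samples: []float64{4}, MetadataRef: 0},
			{Labels: map[string]string{"le": "+Inf"}, Samples: []float64{9}, MetadataRef: 0},
		},
	}
	fmt.Println(req.Metadata[req.Series[1].MetadataRef].Type)
	// Prints: histogram
}
```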
>>>>>
>>>>> Regarding the Prometheus remote write metadata propagation consensus,
>>>>> are there any plans/projects/collaborations for work on a protocol
>>>>> that might help others in the ecosystem offer the same benefits to
>>>>> Prometheus ecosystem projects that operate on a per-write-request
>>>>> basis (i.e. stateless processing of a write request)?
>>>>>
>>>>> I understand https://github.com/prometheus/prometheus/pull/6815
>>>>> unblocks feature development on top of Prometheus for users with
>>>>> specific architectures; however, it is a non-starter for a lot of
>>>>> other projects, especially for third-party exporters to systems that
>>>>> are unowned by end users (e.g. a remote write endpoint that targeted
>>>>> StackDriver: the community is unable to change the implementation of
>>>>> StackDriver itself to cache/statefully make metrics metadata
>>>>> available at ingestion time).
>>>>>
>>>>> Obviously I have a vested interest: as a remote write target, M3 has
>>>>> several stateless components before TSDB ingestion, and flowing the
>>>>> entire metadata set to a distributed set of DB nodes, each owning a
>>>>> different part of the metrics space, has implications for M3 itself
>>>>> too (i.e. it is non-trivial to map metric name -> DB node without
>>>>> some messy stateful cache sitting somewhere in the architecture,
>>>>> which adds operational burdens for end users).
>>>>>
>>>>> I suppose what I'm asking is: are maintainers open to a community
>>>>> request that duplicates some of
>>>>> https://github.com/prometheus/prometheus/pull/6815 but sends just
>>>>> metric TYPE and UNIT per datapoint (which would need to be captured
>>>>> by the WAL if the feature is enabled) to a backend, so it can
>>>>> statefully be processed correctly without needing a sync of a global
>>>>> set of metadata to a backend?
>>>>>
>>>>> And if not, what are the plans here, and how can we collaborate to
>>>>> make this data useful to other consumers in the Prometheus ecosystem?
>>>>>
>>>>> Best intentions,
>>>>> Rob
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Prometheus Developers" group.
>>>> To unsubscribe from this group and stop receiving emails from it,
>>>> send an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/prometheus-developers/CABakzZbvZeyKLXfK08aiXgGcZso%3D8A0H1JBT9jwBzf6rCiUmVw%40mail.gmail.com
>>>> .

