Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Hello, I have rewritten most of the proposal, though I think some more research is still needed to get the metric specification right. I plan to do that research, and would like to ask you all for more help to make this proposal better. In particular, now that the default metrics format is designed to let metrics pass through to monitoring collection systems such as Dropwizard and Stackdriver, the metrics need to be complete enough to be compatible with those systems. I think some changes will be needed to get there, but I wanted to send out this document, which contains the general idea, and continue refining it. Please take a look and let me know what you think.

https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit

Major Revision: April 17, 2018. The design has been reworked to use a metric format which resembles the Dropwizard and Stackdriver formats, allowing metrics to be passed through. The generic bytes-payload style of metrics is still available, but is reserved for complex use cases which do not fit into these typical metrics collection systems.

Note: this document isn't 100% complete; there are a few areas which need to be improved, and through our discussion and more research I want to complete these details. Please share any thoughts that you have.

1. The metric specification and Metric proto schemas may need revisions:
   1. The distribution format needs to be refined so that it's compatible with Stackdriver and Dropwizard; a second distribution format may be needed.
   2. Annotations need to be examined in detail, to determine whether there are first-class annotations which should be supported so that they pass through properly to Dropwizard and Stackdriver.
   3. Aggregation functions may need parameters. For example, Top(n) may need to be parameterized. How should this best be supported?
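To make the distribution-compatibility question above concrete, here is a minimal sketch (all names are illustrative, not from the proposal) of a distribution kept as count/sum/min/max: the common denominator from which both a Dropwizard-style snapshot summary and a Stackdriver Distribution value can be derived, and which can be merged across bundles.

```python
from dataclasses import dataclass

@dataclass
class DistributionData:
    """Running summary of a distribution: count, sum, min, max.

    Hypothetical sketch only; the proposal's actual distribution
    format is still being refined.
    """
    count: int = 0
    sum: int = 0
    min: float = float("inf")
    max: float = float("-inf")

    def update(self, value):
        # Fold one reported value into the summary.
        self.count += 1
        self.sum += value
        self.min = min(self.min, value)
        self.max = max(self.max, value)

    def combine(self, other):
        # Merge two partial summaries, e.g. from two bundles.
        return DistributionData(
            count=self.count + other.count,
            sum=self.sum + other.sum,
            min=min(self.min, other.min),
            max=max(self.max, other.max),
        )

    @property
    def mean(self):
        return self.sum / self.count if self.count else 0.0

d = DistributionData()
for v in (3, 7, 5):
    d.update(v)
```

A runner could forward such a summary directly; percentile-style views (which Dropwizard snapshots also expose) would need the richer second format discussed above.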
On Tue, Apr 17, 2018 at 11:10 AM Ben Chambers wrote:

> That sounds like a very reasonable choice -- given the discussion seemed to be focusing on the differences between these two categories, separating them will allow the proposal (and implementation) to address each category in the best way possible without needing to make compromises.
>
> Looking forward to the updated proposal.
>
> On Tue, Apr 17, 2018 at 10:53 AM Alex Amato wrote:
>
>> Hello,
>>
>> I just wanted to give an update.
>>
>> After some discussion, I've realized that it's best to break up the two concepts, with two separate ways of reporting monitoring data. These two categories are:
>>
>> 1. Metrics - Counters, Gauges, Distributions. These are well defined concepts for monitoring information and need to integrate with existing metrics collection systems such as Dropwizard and Stackdriver. Most metrics will go through this model, which will allow runners to process new metrics without adding extra code to support them, forwarding them to metric collection systems.
>> 2. Monitoring State - This supports general monitoring data which may not fit into the standard model for Metrics. For example, an I/O source may provide a table of filenames+metadata for files which are old and blocking the system. I will propose a general approach, similar to the URN+payload approach used in the doc right now.
>
> One thing to keep in mind -- even though it makes sense to allow each I/O source to define their own monitoring state, this then shifts responsibility for collecting that information to each runner and displaying that information to every consumer. It would be reasonable to see if there could be a set of 10 or so that covered most of the cases, which could become the "standard" set (e.g., watermark information, performance information, etc.).
>> I will rewrite most of the doc and propose separating these two very different use cases: one which optimizes for integration with existing monitoring systems, the other which optimizes for flexibility, allowing more complex and custom metrics formats for other debugging scenarios.
>>
>> I just wanted to give a brief update on the direction of this change, before writing it up in full detail.
>>
>> On Mon, Apr 16, 2018 at 10:36 AM Robert Bradshaw wrote:
>>
>>> I agree that the user/system dichotomy is false; the real question is how counters can be scoped to avoid accidental (or even intentional) interference. A system that entirely controls the interaction between the "user" (from its perspective) and the underlying system can do this by prefixing all requested "user" counters with a prefix it will not use itself. Of course this breaks down whenever the wrapping isn't complete (either on the production or consumption side), but may be worth doing for some components (like the SDKs that
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
That sounds like a very reasonable choice -- given the discussion seemed to be focusing on the differences between these two categories, separating them will allow the proposal (and implementation) to address each category in the best way possible without needing to make compromises.

Looking forward to the updated proposal.

On Tue, Apr 17, 2018 at 10:53 AM Alex Amato wrote:

> Hello,
>
> I just wanted to give an update.
>
> After some discussion, I've realized that it's best to break up the two concepts, with two separate ways of reporting monitoring data. These two categories are:
>
> 1. Metrics - Counters, Gauges, Distributions. These are well defined concepts for monitoring information and need to integrate with existing metrics collection systems such as Dropwizard and Stackdriver. Most metrics will go through this model, which will allow runners to process new metrics without adding extra code to support them, forwarding them to metric collection systems.
> 2. Monitoring State - This supports general monitoring data which may not fit into the standard model for Metrics. For example, an I/O source may provide a table of filenames+metadata for files which are old and blocking the system. I will propose a general approach, similar to the URN+payload approach used in the doc right now.

One thing to keep in mind -- even though it makes sense to allow each I/O source to define their own monitoring state, this then shifts responsibility for collecting that information to each runner and displaying that information to every consumer. It would be reasonable to see if there could be a set of 10 or so that covered most of the cases, which could become the "standard" set (e.g., watermark information, performance information, etc.).

> I will rewrite most of the doc and propose separating these two very different use cases, one which optimizes for integration with existing monitoring systems.
> The other optimizes for flexibility, allowing more complex and custom metrics formats for other debugging scenarios.
>
> I just wanted to give a brief update on the direction of this change, before writing it up in full detail.
>
> On Mon, Apr 16, 2018 at 10:36 AM Robert Bradshaw wrote:
>
>> I agree that the user/system dichotomy is false; the real question is how counters can be scoped to avoid accidental (or even intentional) interference. A system that entirely controls the interaction between the "user" (from its perspective) and the underlying system can do this by prefixing all requested "user" counters with a prefix it will not use itself. Of course this breaks down whenever the wrapping isn't complete (either on the production or consumption side), but may be worth doing for some components (like the SDKs that value being able to provide this isolation for better behavior). Actual (human) end users are likely to be much less careful about avoiding conflicts than library authors, who in turn are generally less careful than authors of the system itself.
>>
>> We could alternatively allow for specifying fully qualified URNs for counter names in the SDK APIs, and let "normal" user counters be in the empty namespace rather than something like beam:metrics:{user,other,...}, perhaps with SDKs prohibiting certain conflicting prefixes (which is less than ideal). A layer above the SDK that has similar absolute control over its "users" would have a similar decision to make.
>>
>> On Sat, Apr 14, 2018 at 4:00 PM Kenneth Knowles wrote:
>>
>>> One reason I resist the user/system distinction is that Beam is a multi-party system with at least SDK, runner, and pipeline. Often there may be a DSL like SQL or Scio, or similarly someone may be building a platform for their company where there is no user authoring the pipeline. Should Scio, SQL, or MyCompanyFramework metrics end up in "user"?
>>> Who decides to tack on the prefix? It looks like it is the SDK harness? Are there just three namespaces, "runner", "sdk", and "user"? Most of what you'd think of as "user" versus "system" should simply be the difference between dynamically defined & typed metrics and fields in control plane protos. If that layer of the namespaces is not finite and limited, who can make a valid extension? Just some questions that I think would flesh out the meaning of the "user" prefix.
>>>
>>> Kenn
>>>
>>> On Fri, Apr 13, 2018 at 5:26 PM Andrea Foegler wrote:
>>>
>>>> On Fri, Apr 13, 2018 at 5:00 PM Robert Bradshaw wrote:
>>>>
>>>>> On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler wrote:
>>>>>
>>>>>> Thanks, Robert!
>>>>>>
>>>>>> I think my lack of clarity is around the MetricSpec. Maybe what's in my head and what's being proposed are the
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Hello,

I just wanted to give an update.

After some discussion, I've realized that it's best to break up the two concepts, with two separate ways of reporting monitoring data. These two categories are:

1. Metrics - Counters, Gauges, Distributions. These are well defined concepts for monitoring information and need to integrate with existing metrics collection systems such as Dropwizard and Stackdriver. Most metrics will go through this model, which will allow runners to process new metrics without adding extra code to support them, forwarding them to metric collection systems.
2. Monitoring State - This supports general monitoring data which may not fit into the standard model for Metrics. For example, an I/O source may provide a table of filenames+metadata for files which are old and blocking the system. I will propose a general approach, similar to the URN+payload approach used in the doc right now.

I will rewrite most of the doc and propose separating these two very different use cases: one which optimizes for integration with existing monitoring systems, the other which optimizes for flexibility, allowing more complex and custom metrics formats for other debugging scenarios.

I just wanted to give a brief update on the direction of this change, before writing it up in full detail.

On Mon, Apr 16, 2018 at 10:36 AM Robert Bradshaw wrote:

> I agree that the user/system dichotomy is false; the real question is how counters can be scoped to avoid accidental (or even intentional) interference. A system that entirely controls the interaction between the "user" (from its perspective) and the underlying system can do this by prefixing all requested "user" counters with a prefix it will not use itself. Of course this breaks down whenever the wrapping isn't complete (either on the production or consumption side), but may be worth doing for some components (like the SDKs that value being able to provide this isolation for better behavior).
> Actual (human) end users are likely to be much less careful about avoiding conflicts than library authors, who in turn are generally less careful than authors of the system itself.
>
> We could alternatively allow for specifying fully qualified URNs for counter names in the SDK APIs, and let "normal" user counters be in the empty namespace rather than something like beam:metrics:{user,other,...}, perhaps with SDKs prohibiting certain conflicting prefixes (which is less than ideal). A layer above the SDK that has similar absolute control over its "users" would have a similar decision to make.
>
> On Sat, Apr 14, 2018 at 4:00 PM Kenneth Knowles wrote:
>
>> One reason I resist the user/system distinction is that Beam is a multi-party system with at least SDK, runner, and pipeline. Often there may be a DSL like SQL or Scio, or similarly someone may be building a platform for their company where there is no user authoring the pipeline. Should Scio, SQL, or MyCompanyFramework metrics end up in "user"? Who decides to tack on the prefix? It looks like it is the SDK harness? Are there just three namespaces, "runner", "sdk", and "user"? Most of what you'd think of as "user" versus "system" should simply be the difference between dynamically defined & typed metrics and fields in control plane protos. If that layer of the namespaces is not finite and limited, who can make a valid extension? Just some questions that I think would flesh out the meaning of the "user" prefix.
>>
>> Kenn
>>
>> On Fri, Apr 13, 2018 at 5:26 PM Andrea Foegler wrote:
>>
>>> On Fri, Apr 13, 2018 at 5:00 PM Robert Bradshaw wrote:
>>>
>>>> On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler wrote:
>>>>
>>>>> Thanks, Robert!
>>>>>
>>>>> I think my lack of clarity is around the MetricSpec. Maybe what's in my head and what's being proposed are the same thing.
>>>>> When I read that the MetricSpec describes the proto structure, that sounds kind of complicated to me. But I may be misinterpreting it. What I picture is something like a MetricSpec that looks like (note: my picture looks a lot like Stackdriver :):
>>>>>
>>>>> {
>>>>>   name: "my_timer"
>>>>
>>>> name: "beam:metric:user:my_namespace:my_timer" (assuming we want to keep requiring namespaces). Or "beam:metric:[some non-user designation]"
>>>
>>> Sure. Looks good.
>>>
>>>>>   labels: { "ptransform" }
>>>>
>>>> How does an SDK act on this information?
>>>
>>> The SDK is obligated to submit any metric values for that spec with a "ptransform" -> "transformName" entry in the labels field. Autogenerating code from the spec to avoid typos should be easy.
>>>
>>>>>   type: GAUGE
>>>>>   value_type: int64
>>>>
>>>> I was lumping type and value_type into the
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
I agree that the user/system dichotomy is false; the real question is how counters can be scoped to avoid accidental (or even intentional) interference. A system that entirely controls the interaction between the "user" (from its perspective) and the underlying system can do this by prefixing all requested "user" counters with a prefix it will not use itself. Of course this breaks down whenever the wrapping isn't complete (either on the production or consumption side), but may be worth doing for some components (like the SDKs that value being able to provide this isolation for better behavior). Actual (human) end users are likely to be much less careful about avoiding conflicts than library authors, who in turn are generally less careful than authors of the system itself.

We could alternatively allow for specifying fully qualified URNs for counter names in the SDK APIs, and let "normal" user counters be in the empty namespace rather than something like beam:metrics:{user,other,...}, perhaps with SDKs prohibiting certain conflicting prefixes (which is less than ideal). A layer above the SDK that has similar absolute control over its "users" would have a similar decision to make.

On Sat, Apr 14, 2018 at 4:00 PM Kenneth Knowles wrote:

> One reason I resist the user/system distinction is that Beam is a multi-party system with at least SDK, runner, and pipeline. Often there may be a DSL like SQL or Scio, or similarly someone may be building a platform for their company where there is no user authoring the pipeline. Should Scio, SQL, or MyCompanyFramework metrics end up in "user"? Who decides to tack on the prefix? It looks like it is the SDK harness? Are there just three namespaces, "runner", "sdk", and "user"? Most of what you'd think of as "user" versus "system" should simply be the difference between dynamically defined & typed metrics and fields in control plane protos.
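The prefixing scheme Robert describes above can be sketched in a few lines. This is purely illustrative: the prefix string and helper names are assumptions, not anything the proposal has settled on.

```python
# Illustrative sketch (not proposal API): an SDK that fully controls its
# "users" can isolate their counters by qualifying every requested name
# under a namespace prefix the system itself never uses.
USER_PREFIX = "beam:metric:user:"  # assumed prefix, for illustration only

def qualify_user_counter(namespace, name):
    """Build a fully qualified URN for a user counter."""
    return f"{USER_PREFIX}{namespace}:{name}"

def is_user_counter(urn):
    """True iff a URN falls in the isolated user namespace."""
    return urn.startswith(USER_PREFIX)
```

As the thread notes, this isolation only holds where the wrapping is complete; a layer that bypasses the helper can still collide with system names.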
> If that layer of the namespaces is not finite and limited, who can make a valid extension? Just some questions that I think would flesh out the meaning of the "user" prefix.
>
> Kenn
>
> On Fri, Apr 13, 2018 at 5:26 PM Andrea Foegler wrote:
>
>> On Fri, Apr 13, 2018 at 5:00 PM Robert Bradshaw wrote:
>>
>>> On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler wrote:
>>>
>>>> Thanks, Robert!
>>>>
>>>> I think my lack of clarity is around the MetricSpec. Maybe what's in my head and what's being proposed are the same thing. When I read that the MetricSpec describes the proto structure, that sounds kind of complicated to me. But I may be misinterpreting it. What I picture is something like a MetricSpec that looks like (note: my picture looks a lot like Stackdriver :):
>>>>
>>>> {
>>>>   name: "my_timer"
>>>
>>> name: "beam:metric:user:my_namespace:my_timer" (assuming we want to keep requiring namespaces). Or "beam:metric:[some non-user designation]"
>>
>> Sure. Looks good.
>>
>>>>   labels: { "ptransform" }
>>>
>>> How does an SDK act on this information?
>>
>> The SDK is obligated to submit any metric values for that spec with a "ptransform" -> "transformName" entry in the labels field. Autogenerating code from the spec to avoid typos should be easy.
>>
>>>>   type: GAUGE
>>>>   value_type: int64
>>>
>>> I was lumping type and value_type into the same field, as a URN, for possible extensibility, as they're tightly coupled (e.g. quantiles, distributions).
>>
>> My inclination is that keeping this set relatively small and fixed to a set that can be readily exported to external monitoring systems is more useful than the added indirection to support extensibility. Lumping together seems reasonable.
>>
>>>>   units: SECONDS
>>>>   description: "Times my stuff"
>>>
>>> Are both of these optional metadata, in the form of a key-value field, or flattened into the field itself (along with every other kind of metadata you may want to attach)?
>> Optional metadata in the form of fixed fields. Is there a use case for arbitrary metadata? What would you do with it when exporting?
>>
>>>> }
>>>>
>>>> Then metrics submitted would look like:
>>>>
>>>> {
>>>>   name: "my_timer"
>>>>   labels: {"ptransform": "MyTransform"}
>>>>   int_value: 100
>>>> }
>>>
>>> Yes, or value could be a bytes field that is encoded according to [value_]type above, if we want that extensibility (e.g. if we want to bundle the pardo sub-timings together, we'd need a proto for the value, but that seems too specific to hard-code into the basic structure).
>>
>>>> The simplicity comes from the fact that there's only one proto format for the spec and for the value. The only things that vary are the entries in the map and the value field set. It's pretty easy to establish contracts around this type of spec and even
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
One reason I resist the user/system distinction is that Beam is a multi-party system with at least SDK, runner, and pipeline. Often there may be a DSL like SQL or Scio, or similarly someone may be building a platform for their company where there is no user authoring the pipeline. Should Scio, SQL, or MyCompanyFramework metrics end up in "user"? Who decides to tack on the prefix? It looks like it is the SDK harness? Are there just three namespaces, "runner", "sdk", and "user"? Most of what you'd think of as "user" versus "system" should simply be the difference between dynamically defined & typed metrics and fields in control plane protos. If that layer of the namespaces is not finite and limited, who can make a valid extension? Just some questions that I think would flesh out the meaning of the "user" prefix.

Kenn

On Fri, Apr 13, 2018 at 5:26 PM Andrea Foegler wrote:

> On Fri, Apr 13, 2018 at 5:00 PM Robert Bradshaw wrote:
>
>> On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler wrote:
>>
>>> Thanks, Robert!
>>>
>>> I think my lack of clarity is around the MetricSpec. Maybe what's in my head and what's being proposed are the same thing. When I read that the MetricSpec describes the proto structure, that sounds kind of complicated to me. But I may be misinterpreting it. What I picture is something like a MetricSpec that looks like (note: my picture looks a lot like Stackdriver :):
>>>
>>> {
>>>   name: "my_timer"
>>
>> name: "beam:metric:user:my_namespace:my_timer" (assuming we want to keep requiring namespaces). Or "beam:metric:[some non-user designation]"
>
> Sure. Looks good.
>
>>>   labels: { "ptransform" }
>>
>> How does an SDK act on this information?
>
> The SDK is obligated to submit any metric values for that spec with a "ptransform" -> "transformName" entry in the labels field. Autogenerating code from the spec to avoid typos should be easy.
>>>   type: GAUGE
>>>   value_type: int64
>>
>> I was lumping type and value_type into the same field, as a URN, for possible extensibility, as they're tightly coupled (e.g. quantiles, distributions).
>
> My inclination is that keeping this set relatively small and fixed to a set that can be readily exported to external monitoring systems is more useful than the added indirection to support extensibility. Lumping together seems reasonable.
>
>>>   units: SECONDS
>>>   description: "Times my stuff"
>>
>> Are both of these optional metadata, in the form of a key-value field, or flattened into the field itself (along with every other kind of metadata you may want to attach)?
>
> Optional metadata in the form of fixed fields. Is there a use case for arbitrary metadata? What would you do with it when exporting?
>
>>> }
>>>
>>> Then metrics submitted would look like:
>>>
>>> {
>>>   name: "my_timer"
>>>   labels: {"ptransform": "MyTransform"}
>>>   int_value: 100
>>> }
>>
>> Yes, or value could be a bytes field that is encoded according to [value_]type above, if we want that extensibility (e.g. if we want to bundle the pardo sub-timings together, we'd need a proto for the value, but that seems too specific to hard-code into the basic structure).
>
>>> The simplicity comes from the fact that there's only one proto format for the spec and for the value. The only things that vary are the entries in the map and the value field set. It's pretty easy to establish contracts around this type of spec and even generate protos for use in the SDK that make the expectations explicit.
>>>
>>> On Fri, Apr 13, 2018 at 2:23 PM Robert Bradshaw wrote:
>>>
>>>> On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles wrote:
>>>>
>>>>> Or just "beam:counter::" or even "beam:metric::" since metrics have a type separate from their name.
>>>>
>>>> I proposed keeping the "user" in there to avoid possible clashes with the system namespaces.
(No preference on counter vs. metric, I wasn't trying to imply counter = SumInts) On Fri, Apr 13, 2018 at 2:02 PM Andrea Foegler wrote: > I like the generalization from entity -> labels. I view the purpose > of those fields to provide context. And labels feel like they supports a > richer set of contexts. > If we think such a generalization provides value, I'm fine with doing that now, as sets or key-value maps, if we have good enough examples to justify this. > The URN concept gets a little tricky. I totally agree that the > context fields should not be embedded in the name. > There's a "name" which is the identifier that can be used to > communicate what context values are supported / allowed for metrics with > that name (for example, element_count expects a ptransform ID).
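The spec-as-contract idea discussed in this exchange can be made concrete with a small sketch. All class and field names here are illustrative stand-ins for the eventual protos: the spec declares which context labels a metric must carry, and a submitted value can be checked against that contract.

```python
from dataclasses import dataclass

@dataclass
class MetricSpec:
    """Illustrative stand-in for the proposed spec proto (not the real schema)."""
    name: str          # e.g. a URN like "beam:metric:user:my_namespace:my_timer"
    labels: tuple      # label keys the SDK is obligated to populate
    type: str = "GAUGE"
    value_type: str = "int64"
    units: str = ""
    description: str = ""

def validate(spec, labels):
    """True iff a submitted value's labels supply every key the spec requires."""
    return set(spec.labels) <= set(labels)

spec = MetricSpec(
    name="beam:metric:user:my_namespace:my_timer",
    labels=("ptransform",),
    units="SECONDS",
    description="Times my stuff",
)
```

A submitted value like {name: "my_timer", labels: {"ptransform": "MyTransform"}, int_value: 100} would pass this check; one missing the "ptransform" entry would not. Autogenerating such checks from the spec is the "avoid typos" point made above.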
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
On Fri, Apr 13, 2018 at 4:30 PM Alex Amato wrote:

> There are a few more confusing concepts in this thread.
>
> *Name*
>
> - Name can mean a *"string name"* used to refer to a metric in a metrics system such as Stackdriver, i.e. "ElementCount", "ExecutionTime".
> - Name can mean a set of *context* fields added to a counter, either embedded in a complex string or in a structured name, typically referring to *aggregation entities*, which define how the metric updates get aggregated into final metric values, i.e. all metric updates with the same field are aggregated together.
>   - e.g. my_ptransform_id-ElementCount
>   - e.g. { name : 'ElementCount', 'ptransform_name' : 'my_ptransform_id' }
> - The *URN* of a Metric, which identifies a proto to use in a payload field for the Metric and MetricSpec. Note: the string name can literally be the URN value in most cases, except for metrics which can specify a separate name (i.e. user counters).
>
> @Robert,
> You have proposed that metrics should contain the following parts; I still don't fully understand what you mean by each one.
>
> - Name - Why is a name a URN + bytes payload? What type of name are you referring to: *string name*? *context*? *URN*? Or something else?

As you say above, the URN can literally be the string name. I see no reason why this can't be the case for user counters as well (the user counter name becoming part of the URN). The payload, should we decide to keep it, is "part" of the name because it helps identify what exactly we're counting. I.e. {urnX, payload1} would be distinct from {urnX, payload2}. The only reason to have a payload is to avoid sticking stuff that would be ugly to parse into the URN.

> - Entity - This is how the metric is aggregated together, if I understand you correctly. And you correctly point out that a singular entity is not sufficient; a set of labels may be more appropriate.
Alternatively, the entity/labels specify possible sub-partitions of the metric identified by its name (as above).

> - Value - *Are you saying this is just the metric value, not including any fields related to entity or name?*

Exactly. Like "5077." For some types it would be composite. The type also indicates how it's encoded (e.g. as bytes, or which field of a oneof should be populated).

> - Type - I am not clear at all on what this is or what it would look like. Are you referring to units, like milliseconds/seconds? Why wouldn't it be part of the value payload? Is this some sort of reason to separate it out from the value? What if the value has multiple fields, for example?

Type would be "beam:metric_type:sum:ints" or "beam:metric_type:distribution:doubles." We could separate "data type" from "aggregation type" if desired, though of course the full cross-product doesn't make sense. We could put the unit in the type (e.g. sum_durations != sum_ints), but, preferably, I'd put this as metadata on the counter spec. It is often fully determined by the URN, but provided so one can reason about the metric without having to interpret the URN. It also means we don't have to have a separate URN for each user metric type. (In fact, any metric the runner doesn't understand would be treated as a user metric, and aggregated as such if it understands the type.)

> Some pros and cons as I see them:
>
> Pros:
>
> - More separation and flexibility for an SDK to specify labels separately from the value/type. Though, maybe I don't understand enough, and I am not so sure this is an improvement over just having the URN payload contain everything in itself.

We can't interpret a URN payload unless we know the URN. Separating things out allows us to act on metrics without interpreting the URN (both for unknown URNs, and simplifying the logic by not having to do lookups on the URN everywhere).
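Robert's point that a runner can act on a metric it has never seen, using only the type URN, can be sketched as follows. The type-string shape ("beam:metric_type:<aggregation>:<data type>") is taken from the examples above; the helper names and the update tuple layout are assumptions for illustration.

```python
# Illustrative sketch: a runner aggregates metric updates keyed by
# (name, labels), deciding how to combine values purely from the type
# URN, without interpreting the metric's own URN.
def combine(type_urn, a, b):
    """Combine two values per the aggregation named in the type URN."""
    _, _, aggregation, _ = type_urn.split(":")  # beam:metric_type:sum:ints
    if aggregation == "sum":
        return a + b
    if aggregation == "max":
        return max(a, b)
    raise ValueError(f"unknown aggregation in {type_urn}")

def aggregate(updates):
    """updates: iterable of (name, labels-as-tuple, type_urn, value)."""
    totals = {}
    for name, labels, type_urn, value in updates:
        key = (name, labels)  # all updates with the same key fold together
        totals[key] = combine(type_urn, totals[key], value) if key in totals else value
    return totals
```

An unknown metric with a known type ("sum:ints") folds in exactly like a user counter, which is the behavior Robert describes in the parenthetical above.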
> Cons:
>
> - I think this means that the SDK must properly pick two separate payloads and populate them correctly. We can run into issues here.
>   - Having one URN which specifies all the fields you would need to populate for a specific metric avoids this; this was a concern brought up by Luke. The runner would then be responsible for packaging metrics up to send to external monitoring systems.

I'm not following you here. We'd return exactly what Andrea suggested.

> @Andrea, please correct me if I misunderstand.
> Thank you for the metric spec example in your last response; I think that makes the idea much more clear.
>
> Using your approach I see the following pros and cons.
>
> Pros:
>
> - Runners have a cleaner, more reusable codepath for forwarding metrics to external monitoring systems. This will mean less work on the runner side to support each metric (perhaps none in many cases).
> - SDKs may need less code as well to package up new
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler wrote:

> Thanks, Robert!
>
> I think my lack of clarity is around the MetricSpec. Maybe what's in my head and what's being proposed are the same thing. When I read that the MetricSpec describes the proto structure, that sounds kind of complicated to me. But I may be misinterpreting it. What I picture is something like a MetricSpec that looks like (note: my picture looks a lot like Stackdriver :):
>
> {
>   name: "my_timer"

name: "beam:metric:user:my_namespace:my_timer" (assuming we want to keep requiring namespaces). Or "beam:metric:[some non-user designation]"

>   labels: { "ptransform" }

How does an SDK act on this information?

>   type: GAUGE
>   value_type: int64

I was lumping type and value_type into the same field, as a URN, for possible extensibility, as they're tightly coupled (e.g. quantiles, distributions).

>   units: SECONDS
>   description: "Times my stuff"

Are both of these optional metadata, in the form of a key-value field, or flattened into the field itself (along with every other kind of metadata you may want to attach)?

> }
>
> Then metrics submitted would look like:
>
> {
>   name: "my_timer"
>   labels: {"ptransform": "MyTransform"}
>   int_value: 100
> }

Yes, or value could be a bytes field that is encoded according to [value_]type above, if we want that extensibility (e.g. if we want to bundle the pardo sub-timings together, we'd need a proto for the value, but that seems too specific to hard-code into the basic structure).

> The simplicity comes from the fact that there's only one proto format for the spec and for the value. The only things that vary are the entries in the map and the value field set. It's pretty easy to establish contracts around this type of spec and even generate protos for use in the SDK that make the expectations explicit.
> On Fri, Apr 13, 2018 at 2:23 PM Robert Bradshaw wrote:
>
>> On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles wrote:
>>
>>> Or just "beam:counter::" or even "beam:metric::" since metrics have a type separate from their name.
>>
>> I proposed keeping the "user" in there to avoid possible clashes with the system namespaces. (No preference on counter vs. metric, I wasn't trying to imply counter = SumInts)
>>
>> On Fri, Apr 13, 2018 at 2:02 PM Andrea Foegler wrote:
>>
>>> I like the generalization from entity -> labels. I view the purpose of those fields as providing context, and labels feel like they support a richer set of contexts.
>>
>> If we think such a generalization provides value, I'm fine with doing that now, as sets or key-value maps, if we have good enough examples to justify this.
>>
>>> The URN concept gets a little tricky. I totally agree that the context fields should not be embedded in the name. There's a "name" which is the identifier that can be used to communicate what context values are supported / allowed for metrics with that name (for example, element_count expects a ptransform ID). But then there's the context. In Stackdriver, this context is a map of key-value pairs; the type is considered metadata associated with the name, but not communicated with the value.
>>
>> I'm not quite following you here. If context contains a ptransform id, then it cannot be associated with a single name.
>>
>>> Could the URN be "beam:namespace:name" and every metric have a map of key-value pairs for context?
>>
>> The URN is the name. Something like "beam:metric:ptransform_execution_times:v1."
>> >> >>> Not sure where this fits in the discussion or if this is handled >>> somewhere, but allowing for a metric configuration that's provided >>> independently of the value allows for configuring "type", "units", etc in a >>> uniform way without having to encode them in the metric name / value. >>> Stackdriver expects each metric type has been configured ahead of time with >>> these annotations / metadata. Then values are reported separately. For >>> system metrics, the definitions can be packaged with the SDK. For user >>> metrics, they'd be defined at runtime. >>> >> >> This feels like the metrics spec, that specifies that the metric with >> name/URN X has this type plus a bunch of other metadata (e.g. units, if >> they're not implicit in the type? This gets into whether the type should be >> Duration{Sum,Max,Distribution,...} vs. Int{Sum,Max,Distribution,...} + >> units metadata). >> >
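To make the spec-plus-values idea above concrete, here is a rough sketch in Python. The dataclass names and fields are hypothetical, mirroring the example quoted in this thread rather than any actual Beam proto:

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical sketch of the MetricSpec / metric-update split discussed in
# this thread; field names mirror the example above, not a real Beam proto.

@dataclass
class MetricSpec:
    name: str              # e.g. "beam:metric:user:my_namespace:my_timer"
    label_keys: List[str]  # label keys the metric expects, e.g. ["ptransform"]
    type: str              # "GAUGE", "COUNTER", ...
    value_type: str        # "int64", "double", ...
    units: str = ""        # e.g. "SECONDS"
    description: str = ""

@dataclass
class MetricUpdate:
    name: str
    labels: Dict[str, str]  # concrete label values, e.g. {"ptransform": ...}
    int_value: int = 0

def conforms(update: MetricUpdate, spec: MetricSpec) -> bool:
    # A runner can validate any update against its spec generically,
    # without metric-specific code.
    return update.name == spec.name and set(update.labels) == set(spec.label_keys)

spec = MetricSpec(name="my_timer", label_keys=["ptransform"],
                  type="GAUGE", value_type="int64",
                  units="SECONDS", description="Times my stuff")
update = MetricUpdate(name="my_timer",
                      labels={"ptransform": "MyTransform"},
                      int_value=100)
```

Note that only one generic record shape is needed for every metric; what varies is the label map and which value field is set, which is the simplicity being argued for above.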
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
That's a great summary Alex, thanks! This doesn't address all your questions, but in terms of how I see the MetricSpec being specified / shared, it's something like this: SDKs just share the same MetricSpec file which defines all the system metrics guaranteed by Beam. SDK-specific additions can be handled with an addendum. That spec can be read by the SDK and by the Runner. The SDK is responsible for populating the metric values according to the spec for all system metrics. The runner doesn't really need the spec for user-defined metrics, since there's really nothing to do with them but forward them along. I think this should eliminate any concerns around misspellings and such. It would even be pretty simple to automatically generate protos for each system MetricSpec and the code to convert from the proto to the MetricSpec. I do think runners should treat any metrics they don't know about just like user metrics - metrics to be forwarded to downstream monitoring tools. I think I'm unconvinced this Metrics API should handle cases like the I/O files case. On Fri, Apr 13, 2018 at 4:30 PM Alex Amato wrote: > There are a few more confusing concepts in this thread > *Name* > >- Name can mean a *"string name"* used to refer to a metric in a >metrics system such as stackdriver, i.e. "ElementCount", "ExecutionTime" >- Name can mean a set of *context* fields added to a counter, either >embedded in a complex string, or in a structured name. Typically referring >to *aggregation entities, *which define how the metric updates get >aggregated into final metric values, i.e. all Metric updates with the same >field are aggregated together. > - e.g. my_ptransform_id-ElementCount > - e.g. { name : 'ElementCount', 'ptransform_name' : > 'my_ptransform_id' } >- The *URN* of a Metric, which identifies a proto to use in a payload >field for the Metric and MetricSpec. Note: The string name can literally >be the URN value in most cases, except for metrics which can specify a >separate name (i.e. 
user counters). > > > > @Robert, > You have proposed that metrics should contain the following parts, I still > don't fully understand what you mean by each one. > >- Name - Why is a name a URN + bytes payload? What type of name are >you referring to, *string name*? *context*? *URN*? Or something else. >- Entity - This is how the metric is aggregated together. If I >understand you correctly. And you correctly point out that a singular >entity is not sufficient, a set of labels may be more appropriate. >- Value - *Are you saying this is just the metric value, not including >any fields related to entity or name?* >- Type - I am not clear at all on what this is or what it would look >like. Are you referring to units, like milliseconds/seconds? Why wouldn't it >be part of the value payload? Is this some sort of reason to >separate it out from the value? What if the value has multiple fields for >example. > > Some pros and cons as I see them > Pros: > >- More separation and flexibility for an SDK to specify labels >separately from the value/type. Though, maybe I don't understand enough, >and I am not so sure this is a con over just having the URN payload contain >everything in itself. > > Cons: > >- I think this means that the SDK must properly pick two separate >payloads and populate them correctly. We can run into issues with this. > - Having one URN which specifies all the fields you would need to > populate for a specific metric avoids this; this was a concern brought > up > by Luke. The runner would then be responsible for packaging metrics up > to > send to external monitoring systems. > > > @Andrea, please correct me if I misunderstand > Thank you for the metric spec example in your last response, I think that > makes the idea much more clear. > > Using your approach, I see the following pros and cons > Pros: > >- Runners have a cleaner, more reusable codepath for forwarding metrics >to external monitoring systems. 
This will mean less work on the runner side >to support each metric (perhaps none in many cases). >- SDKs may need less code as well to package up new metrics. >- As long as we expect SDKs to only send cataloged/requested metrics, >we can avoid the issues of SDKs creating too many metrics, metrics the >runner/engine don't understand, etc. > > Cons: > >- Luke's concern with this approach was that this spec ends up boiling >down to just the name, in this case "my_timer". His concern is that with >many SDK implementations, we can have bugs using the wrong string name for >counters, or populating them with the wrong values. > - Note how the ParDoExecution time example in the doc lets you > build the group of metrics together, rather than reporting three > different > ones from SDK->Runner.
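As a concrete picture of the URN-as-name idea debated in this thread, user metric names could be folded into a single URN string of the shape "beam:metric:user:&lt;namespace&gt;:&lt;name&gt;". The helpers below are a hypothetical Python sketch of that scheme, not settled Beam behavior:

```python
# Hypothetical helpers for URN-based user metric names, following the
# "beam:metric:user:<namespace>:<name>" shape quoted in this thread.

def user_metric_urn(namespace: str, name: str) -> str:
    # Reject the separator so the resulting URN stays unambiguous to parse.
    for part in (namespace, name):
        if ":" in part:
            raise ValueError("namespace/name must not contain ':'")
    return "beam:metric:user:%s:%s" % (namespace, name)

def parse_user_metric_urn(urn: str):
    prefix, _, rest = urn.partition("beam:metric:user:")
    if prefix != "" or rest == "":
        return None  # not a user metric; treat as a system URN
    namespace, _, name = rest.partition(":")
    return namespace, name
```

This avoids a separate payload field for user counter names, at the cost of packing (and parsing) the namespace into the name, which is exactly the trade-off discussed above.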
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
There are a few more confusing concepts in this thread *Name* - Name can mean a *"string name"* used to refer to a metric in a metrics system such as stackdriver, i.e. "ElementCount", "ExecutionTime" - Name can mean a set of *context* fields added to a counter, either embedded in a complex string, or in a structured name. Typically referring to *aggregation entities, *which define how the metric updates get aggregated into final metric values, i.e. all Metric updates with the same field are aggregated together. - e.g. my_ptransform_id-ElementCount - e.g. { name : 'ElementCount', 'ptransform_name' : 'my_ptransform_id' } - The *URN* of a Metric, which identifies a proto to use in a payload field for the Metric and MetricSpec. Note: The string name can literally be the URN value in most cases, except for metrics which can specify a separate name (i.e. user counters). @Robert, You have proposed that metrics should contain the following parts, I still don't fully understand what you mean by each one. - Name - Why is a name a URN + bytes payload? What type of name are you referring to, *string name*? *context*? *URN*? Or something else. - Entity - This is how the metric is aggregated together. If I understand you correctly. And you correctly point out that a singular entity is not sufficient, a set of labels may be more appropriate. - Value - *Are you saying this is just the metric value, not including any fields related to entity or name?* - Type - I am not clear at all on what this is or what it would look like. Are you referring to units, like milliseconds/seconds? Why wouldn't it be part of the value payload? Is this some sort of reason to separate it out from the value? What if the value has multiple fields for example. Some pros and cons as I see them Pros: - More separation and flexibility for an SDK to specify labels separately from the value/type. 
Though, maybe I don't understand enough, and I am not so sure this is a con over just having the URN payload contain everything in itself. Cons: - I think this means that the SDK must properly pick two separate payloads and populate them correctly. We can run into issues with this. - Having one URN which specifies all the fields you would need to populate for a specific metric avoids this; this was a concern brought up by Luke. The runner would then be responsible for packaging metrics up to send to external monitoring systems. @Andrea, please correct me if I misunderstand Thank you for the metric spec example in your last response, I think that makes the idea much more clear. Using your approach, I see the following pros and cons Pros: - Runners have a cleaner, more reusable codepath for forwarding metrics to external monitoring systems. This will mean less work on the runner side to support each metric (perhaps none in many cases). - SDKs may need less code as well to package up new metrics. - As long as we expect SDKs to only send cataloged/requested metrics, we can avoid the issues of SDKs creating too many metrics, metrics the runner/engine don't understand, etc. Cons: - Luke's concern with this approach was that this spec ends up boiling down to just the name, in this case "my_timer". His concern is that with many SDK implementations, we can have bugs using the wrong string name for counters, or populating them with the wrong values. - Note how the ParDoExecution time example in the doc lets you build the group of metrics together, rather than reporting three different ones from SDK->Runner. This sort of thing can make it more clear how to fill in metrics in the SDK side. Then the RunnerHarness is responsible for packaging the metrics up for monitoring systems, not the SDK side. - Ruling out URNs+payloads altogether (Though, I don't think you are suggesting this) is less extensible for custom runners+sdks+engines. I.e. the table of I/O files example. 
It also rules out sending parameters for a metric from the runner->SDK. - Populating each metric spec in code in each Runner could be similarly error-prone. Instead of just stating "urn:namespace:my_timer", you must specify this and each runner must get it correct: - { name: "my_timer" labels: { "ptransform" } type: GAUGE value_type: int64 units: SECONDS description: "Times my stuff" } - Would the MetricSpec be passed like that from the RunnerHarness to the SDK? This part I am not so clear on. - Do we want runners to accept and forward metrics they don't know about? Another concern was to not accept them until both the SDK and Runner have been updated to accept them. Consider the performance implications of an SDK sending a noisy metric. This being said, I think some of this can be mitigated. 1. Could a
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Thanks, Robert! I think my lack of clarity is around the MetricSpec. Maybe what's in my head and what's being proposed are the same thing. When I read that the MetricSpec describes the proto structure, that sounds kind of complicated to me. But I may be misinterpreting it. What I picture is something like a MetricSpec that looks like (note: my picture looks a lot like Stackdriver :): { name: "my_timer" labels: { "ptransform" } type: GAUGE value_type: int64 units: SECONDS description: "Times my stuff" } Then metrics submitted would look like: { name: "my_timer" labels: {"ptransform": "MyTransform"} int_value: 100 } The simplicity comes from the fact that there's only one proto format for the spec and for the value. The only thing that varies are the entries in the map and the value field set. It's pretty easy to establish contracts around this type of spec and even generate protos for use in the SDK that make the expectations explicit. On Fri, Apr 13, 2018 at 2:23 PM Robert Bradshaw wrote: > On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles wrote: > >> >> Or just "beam:counter::" or even >> "beam:metric::" since metrics have a type separate from >> their name. >> > > I proposed keeping the "user" in there to avoid possible clashes with the > system namespaces. (No preference on counter vs. metric, I wasn't trying to > imply counter = SumInts) > > > On Fri, Apr 13, 2018 at 2:02 PM Andrea Foegler wrote: > >> I like the generalization from entity -> labels. I view the purpose of >> those fields to provide context. And labels feel like they support a >> richer set of contexts. >> > > If we think such a generalization provides value, I'm fine with doing that > now, as sets or key-value maps, if we have good enough examples to justify > this. > > >> The URN concept gets a little tricky. I totally agree that the context >> fields should not be embedded in the name. 
>> There's a "name" which is the identifier that can be used to communicate >> what context values are supported / allowed for metrics with that name (for >> example, element_count expects a ptransform ID). But then there's the >> context. In Stackdriver, this context is a map of key-value pairs; the >> type is considered metadata associated with the name, but not communicated >> with the value. >> > > I'm not quite following you here. If context contains a ptransform id, > then it cannot be associated with a single name. > > >> Could the URN be "beam:namespace:name" and every metric have a map of >> key-value pairs for context? >> > > The URN is the name. Something like > "beam:metric:ptransform_execution_times:v1." > > >> Not sure where this fits in the discussion or if this is handled >> somewhere, but allowing for a metric configuration that's provided >> independently of the value allows for configuring "type", "units", etc in a >> uniform way without having to encode them in the metric name / value. >> Stackdriver expects each metric type has been configured ahead of time with >> these annotations / metadata. Then values are reported separately. For >> system metrics, the definitions can be packaged with the SDK. For user >> metrics, they'd be defined at runtime. >> > > This feels like the metrics spec, that specifies that the metric with > name/URN X has this type plus a bunch of other metadata (e.g. units, if > they're not implicit in the type? This gets into whether the type should be > Duration{Sum,Max,Distribution,...} vs. Int{Sum,Max,Distribution,...} + > units metadata). 
> > >> >> >> On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles wrote: >> >>> >>> >>> On Fri, Apr 13, 2018 at 1:27 PM Robert Bradshaw >>> wrote: >>> On Fri, Apr 13, 2018 at 1:19 PM Kenneth Knowles wrote: > On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw > wrote: > >> Also, the only use for payloads is because "User Counter" is >> currently a single URN, rather than using the namespacing characteristics >> of URNs to map user names onto distinct metric names. >> > > Can they be URNs? I don't see value in having a "user metric" URN > where you then have to look elsewhere for what the real name is. > Yes, that was my point with the parenthetical statement. I would rather have "beam:counter:user:use_provide_namespace:user_provide_name" than use the payload field for this. So if we're going to keep the payload field, we need more compelling usecases. >>> >>> Or just "beam:counter::" or even >>> "beam:metric::" since metrics have a type separate from >>> their name. >>> >>> Kenn >>> >>> A payload avoids the messiness of having to pack (and parse) arbitrary > parameters into a name though.) If we're going to choose names that the > system and sdks agree to have specific meanings, and to avoid accidental > collisions, making
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles wrote: > > Or just "beam:counter::" or even > "beam:metric::" since metrics have a type separate from > their name. > I proposed keeping the "user" in there to avoid possible clashes with the system namespaces. (No preference on counter vs. metric, I wasn't trying to imply counter = SumInts) On Fri, Apr 13, 2018 at 2:02 PM Andrea Foegler wrote: > I like the generalization from entity -> labels. I view the purpose of > those fields to provide context. And labels feel like they support a > richer set of contexts. > If we think such a generalization provides value, I'm fine with doing that now, as sets or key-value maps, if we have good enough examples to justify this. > The URN concept gets a little tricky. I totally agree that the context > fields should not be embedded in the name. > There's a "name" which is the identifier that can be used to communicate > what context values are supported / allowed for metrics with that name (for > example, element_count expects a ptransform ID). But then there's the > context. In Stackdriver, this context is a map of key-value pairs; the > type is considered metadata associated with the name, but not communicated > with the value. > I'm not quite following you here. If context contains a ptransform id, then it cannot be associated with a single name. > Could the URN be "beam:namespace:name" and every metric have a map of > key-value pairs for context? > The URN is the name. Something like "beam:metric:ptransform_execution_times:v1." > Not sure where this fits in the discussion or if this is handled > somewhere, but allowing for a metric configuration that's provided > independently of the value allows for configuring "type", "units", etc in a > uniform way without having to encode them in the metric name / value. > Stackdriver expects each metric type has been configured ahead of time with > these annotations / metadata. Then values are reported separately. 
For > system metrics, the definitions can be packaged with the SDK. For user > metrics, they'd be defined at runtime. > This feels like the metrics spec, that specifies that the metric with name/URN X has this type plus a bunch of other metadata (e.g. units, if they're not implicit in the type? This gets into whether the type should be Duration{Sum,Max,Distribution,...} vs. Int{Sum,Max,Distribution,...} + units metadata). > > > On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles wrote: > >> >> >> On Fri, Apr 13, 2018 at 1:27 PM Robert Bradshaw >> wrote: >> >>> On Fri, Apr 13, 2018 at 1:19 PM Kenneth Knowles wrote: >>> On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw wrote: > Also, the only use for payloads is because "User Counter" is currently > a single URN, rather than using the namespacing characteristics of URNs to > map user names onto distinct metric names. > Can they be URNs? I don't see value in having a "user metric" URN where you then have to look elsewhere for what the real name is. >>> >>> Yes, that was my point with the parenthetical statement. I would rather >>> have "beam:counter:user:use_provide_namespace:user_provide_name" than use >>> the payload field for this. So if we're going to keep the payload field, we >>> need more compelling usecases. >>> >> >> Or just "beam:counter::" or even >> "beam:metric::" since metrics have a type separate from >> their name. >> >> Kenn >> >> >>> A payload avoids the messiness of having to pack (and parse) arbitrary parameters into a name though.) If we're going to choose names that the system and sdks agree to have specific meanings, and to avoid accidental collisions, making them full-fledged documented URNs has value. >>> Value is the "payload". Likely worth changing the name to avoid confusion with the payload above. It's bytes because it depends on the type. I would try to avoid nesting it too deeply (e.g. a payload within a payload). 
If we think the types are generally limited, another option would be a oneof field (with a bytes option just in case) for transparency. There are pros and cons going this route. Type is what I proposed we add, instead of it being implicit in the name (and unknowable if one does not recognize the name). This makes things more open-ended and easier to evolve and work with. Entity could be generalized to Label, or LabelSet if desired. But as mentioned I think it makes sense to pull this out as a separate field, especially when it makes sense to aggregate a single named counter across labels as well as for a single label (e.g. execution time of composite transforms). - Robert On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler wrote: > Hi
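Robert's option of a bytes value "encoded according to the type" can be sketched with a small encoder registry standing in for a proto oneof. The type URN below is invented purely for illustration:

```python
import struct

# Hypothetical registry mapping a type URN to (encode, decode) functions.
# A runner that recognizes the type URN can decode the bytes; one that
# doesn't can still forward the opaque payload unchanged.
ENCODERS = {
    "beam:metric_type:int64_sum:v1": (
        lambda v: struct.pack(">q", v),      # big-endian signed 64-bit int
        lambda b: struct.unpack(">q", b)[0],
    ),
}

def encode_value(type_urn, value):
    encode, _ = ENCODERS[type_urn]
    return encode(value)

def decode_value(type_urn, payload):
    _, decode = ENCODERS[type_urn]
    return decode(payload)

payload = encode_value("beam:metric_type:int64_sum:v1", 100)
```

A oneof field with a bytes fallback would make the common types transparent on the wire while keeping this escape hatch for complex values, which is the trade-off named above.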
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
I like the generalization from entity -> labels. I view the purpose of those fields to provide context. And labels feel like they support a richer set of contexts. The URN concept gets a little tricky. I totally agree that the context fields should not be embedded in the name. There's a "name" which is the identifier that can be used to communicate what context values are supported / allowed for metrics with that name (for example, element_count expects a ptransform ID). But then there's the context. In Stackdriver, this context is a map of key-value pairs; the type is considered metadata associated with the name, but not communicated with the value. Could the URN be "beam:namespace:name" and every metric have a map of key-value pairs for context? Not sure where this fits in the discussion or if this is handled somewhere, but allowing for a metric configuration that's provided independently of the value allows for configuring "type", "units", etc in a uniform way without having to encode them in the metric name / value. Stackdriver expects each metric type has been configured ahead of time with these annotations / metadata. Then values are reported separately. For system metrics, the definitions can be packaged with the SDK. For user metrics, they'd be defined at runtime. On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles wrote: > > > On Fri, Apr 13, 2018 at 1:27 PM Robert Bradshaw > wrote: > >> On Fri, Apr 13, 2018 at 1:19 PM Kenneth Knowles wrote: >> >>> On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw >>> wrote: >>> Also, the only use for payloads is because "User Counter" is currently a single URN, rather than using the namespacing characteristics of URNs to map user names onto distinct metric names. >>> >>> Can they be URNs? I don't see value in having a "user metric" URN where >>> you then have to look elsewhere for what the real name is. >>> >> >> Yes, that was my point with the parenthetical statement. 
I would rather >> have "beam:counter:user:use_provide_namespace:user_provide_name" than use >> the payload field for this. So if we're going to keep the payload field, we >> need more compelling usecases. >> > > Or just "beam:counter::" or even > "beam:metric::" since metrics have a type separate from > their name. > > Kenn > > >> A payload avoids the messiness of having to pack (and parse) arbitrary >>> parameters into a name though.) If we're going to choose names that the >>> system and sdks agree to have specific meanings, and to avoid accidental >>> collisions, making them full-fledged documented URNs has value. >>> >> >>> Value is the "payload". Likely worth changing the name to avoid >>> confusion with the payload above. It's bytes because it depends on the >>> type. I would try to avoid nesting it too deeply (e.g. a payload within a >>> payload). If we think the types are generally limited, another option would >>> be a oneof field (with a bytes option just in case) for transparency. There >>> are pros and cons going this route. >>> >>> Type is what I proposed we add, instead of it being implicit in the name >>> (and unknowable if one does not recognize the name). This makes things more >>> open-ended and easier to evolve and work with. >>> >>> Entity could be generalized to Label, or LabelSet if desired. But as >>> mentioned I think it makes sense to pull this out as a separate field, >>> especially when it makes sense to aggregate a single named counter across >>> labels as well as for a single label (e.g. execution time of composite >>> transforms). >>> >>> - Robert >>> >>> >>> >>> On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler >>> wrote: >>> Hi folks - Before we totally go down the path of highly structured metric protos, I'd like to propose considering a simple metrics interface between the SDK and the runner. Something more generic and closer to what most monitoring systems would use. 
To use Spark as an example, the Metric system uses a simple metric format of name, value and type to report all metrics in a single structure, regardless of the source or context of the metric. https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MetricsSystem.html The subsystems have contracts for what metrics they will expose and how they are calculated: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskmetrics-ShuffleWriteMetrics.html Codifying the system metrics in the SDK seems perfectly reasonable - no reason to make the notion of metric generic at that level. But at the point the metric is leaving the SDK and going to the runner, a simpler, generic encoding of the metrics might make it easier to adapt and maintain the system. The generic format can include information about downstream consumers, if that's
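Andrea's "simple named values" model might reduce to something like the following Python sketch: the runner receives a flat list of (name, type, value) tuples and forwards them to any number of sinks without per-metric code. The sink API here is invented, loosely modeled on Spark's metric sinks:

```python
# Hypothetical sketch of a generic metric-forwarding path: the runner does
# no per-metric branching, so unknown metrics flow through like any other.

class ListSink:
    """A trivial sink that records everything it receives."""
    def __init__(self):
        self.seen = []

    def report(self, name, metric_type, value):
        self.seen.append((name, metric_type, value))

def forward(metrics, sinks):
    # metrics: iterable of (name, type, value) tuples from the SDK harness.
    for name, metric_type, value in metrics:
        for sink in sinks:
            sink.report(name, metric_type, value)

sink = ListSink()
forward([("ElementCount", "COUNTER", 42),
         ("some:new:metric", "GAUGE", 7)], [sink])
```

Implementing a new external monitoring sink would then be a one-time job per sink, independent of which metrics exist, which is the maintainability argument being made here.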
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
On Fri, Apr 13, 2018 at 1:27 PM Robert Bradshaw wrote: > On Fri, Apr 13, 2018 at 1:19 PM Kenneth Knowles wrote: > >> On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw >> wrote: >> >>> Also, the only use for payloads is because "User Counter" is currently a >>> single URN, rather than using the namespacing characteristics of URNs to >>> map user names onto distinct metric names. >>> >> >> Can they be URNs? I don't see value in having a "user metric" URN where >> you then have to look elsewhere for what the real name is. >> > > Yes, that was my point with the parenthetical statement. I would rather > have "beam:counter:user:use_provide_namespace:user_provide_name" than use > the payload field for this. So if we're going to keep the payload field, we > need more compelling usecases. > Or just "beam:counter::" or even "beam:metric::" since metrics have a type separate from their name. Kenn > A payload avoids the messiness of having to pack (and parse) arbitrary >> parameters into a name though.) If we're going to choose names that the >> system and sdks agree to have specific meanings, and to avoid accidental >> collisions, making them full-fledged documented URNs has value. >> > >> Value is the "payload". Likely worth changing the name to avoid confusion >> with the payload above. It's bytes because it depends on the type. I would >> try to avoid nesting it too deeply (e.g. a payload within a payload). If we >> think the types are generally limited, another option would be a oneof >> field (with a bytes option just in case) for transparency. There are pros >> and cons going this route. >> >> Type is what I proposed we add, instead of it being implicit in the name >> (and unknowable if one does not recognize the name). This makes things more >> open-ended and easier to evolve and work with. >> >> Entity could be generalized to Label, or LabelSet if desired. 
But as >> mentioned I think it makes sense to pull this out as a separate field, >> especially when it makes sense to aggregate a single named counter across >> labels as well as for a single label (e.g. execution time of composite >> transforms). >> >> - Robert >> >> >> >> On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler >> wrote: >> >>> Hi folks - >>> >>> Before we totally go down the path of highly structured metric protos, >>> I'd like to propose considering a simple metrics interface between the SDK >>> and the runner. Something more generic and closer to what most monitoring >>> systems would use. >>> >>> To use Spark as an example, the Metric system uses a simple metric >>> format of name, value and type to report all metrics in a single structure, >>> regardless of the source or context of the metric. >>> >>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MetricsSystem.html >>> >>> The subsystems have contracts for what metrics they will expose and how >>> they are calculated: >>> >>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskmetrics-ShuffleWriteMetrics.html >>> >>> Codifying the system metrics in the SDK seems perfectly reasonable - no >>> reason to make the notion of metric generic at that level. But at the >>> point the metric is leaving the SDK and going to the runner, a simpler, >>> generic encoding of the metrics might make it easier to adapt and maintain the >>> system. The generic format can include information about downstream >>> consumers, if that's useful. >>> >>> Spark supports a number of Metric Sinks - external monitoring systems. >>> If runners receive a simple list of metrics, implementing any number of >>> Sinks for Beam would be straightforward and would generally be a one time >>> implementation. If instead all system metrics are sent embedded in a >>> highly structured, semantically meaningful structure, runner code would >>> need to be updated to support exporting the new metric. 
We seem to be >>> heading in the direction of "if you don't understand this metric, you can't >>> use it / export it". But most systems seem to assume metrics are really >>> simple named values that can be handled a priori. >>> >>> So I guess my primary question is: Is it necessary for Beam to treat >>> metrics as highly semantic, arbitrarily complex data? Or could they >>> possibly be the sort of simple named values as they are in most monitoring >>> systems and in Spark? With the SDK potentially providing scaffolding to >>> add meaning and structure, but simplifying that out before leaving SDK >>> code. Is the coupling to a semantically meaningful structure between the >>> SDK and runner a necessary complexity? >>> >>> Andrea >>> >>> >>> >>> On Fri, Apr 13, 2018 at 10:20 AM Robert Bradshaw >>> wrote: >>> On Fri, Apr 13, 2018 at 10:10 AM Alex Amato wrote: > > *Thank you for this clarification. I think the table of files fits > into the
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
On Fri, Apr 13, 2018 at 1:19 PM Kenneth Knowles wrote: > On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw > wrote: > >> Also, the only use for payloads is because "User Counter" is currently a >> single URN, rather than using the namespacing characteristics of URNs to >> map user names onto distinct metric names. >> > > Can they be URNs? I don't see value in having a "user metric" URN where > you then have to look elsewhere for what the real name is. > Yes, that was my point with the parenthetical statement. I would rather have "beam:counter:user:use_provide_namespace:user_provide_name" than use the payload field for this. So if we're going to keep the payload field, we need more compelling usecases. > A payload avoids the messiness of having to pack (and parse) arbitrary > parameters into a name though.) If we're going to choose names that the > system and sdks agree to have specific meanings, and to avoid accidental > collisions, making them full-fledged documented URNs has value. > > Value is the "payload". Likely worth changing the name to avoid confusion > with the payload above. It's bytes because it depends on the type. I would > try to avoid nesting it too deeply (e.g. a payload within a payload). If we > think the types are generally limited, another option would be a oneof > field (with a bytes option just in case) for transparency. There are pros > and cons going this route. > > Type is what I proposed we add, instead of it being implicit in the name > (and unknowable if one does not recognize the name). This makes things more > open-ended and easier to evolve and work with. > > Entity could be generalized to Label, or LabelSet if desired. But as > mentioned I think it makes sense to pull this out as a separate field, > especially when it makes sense to aggregate a single named counter across > labels as well as for a single label (e.g. execution time of composite > transforms). 
> > - Robert > > > > On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler > wrote: > >> Hi folks - >> >> Before we totally go down the path of highly structured metric protos, >> I'd like to propose considering a simple metrics interface between the SDK >> and the runner. Something more generic and closer to what most monitoring >> systems would use. >> >> To use Spark as an example, the Metric system uses a simple metric format >> of name, value and type to report all metrics in a single structure, >> regardless of the source or context of the metric. >> >> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MetricsSystem.html >> >> The subsystems have contracts for what metrics they will expose and how >> they are calculated: >> >> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskmetrics-ShuffleWriteMetrics.html >> >> Codifying the system metrics in the SDK seems perfectly reasonable - no >> reason to make the notion of metric generic at that level. But at the >> point the metric is leaving the SDK and going to the runner, a simpler, >> generic encoding of the metrics might make it easier to adapt and maintain the >> system. The generic format can include information about downstream >> consumers, if that's useful. >> >> Spark supports a number of Metric Sinks - external monitoring systems. >> If runners receive a simple list of metrics, implementing any number of >> Sinks for Beam would be straightforward and would generally be a one time >> implementation. If instead all system metrics are sent embedded in a >> highly structured, semantically meaningful structure, runner code would >> need to be updated to support exporting the new metric. We seem to be >> heading in the direction of "if you don't understand this metric, you can't >> use it / export it". But most systems seem to assume metrics are really >> simple named values that can be handled a priori. 
>> >> So I guess my primary question is: Is it necessary for Beam to treat >> metrics as highly semantic, arbitrarily complex data? Or could they >> possibly be the sort of simple named values as they are in most monitoring >> systems and in Spark? With the SDK potentially providing scaffolding to >> add meaning and structure, but simplifying that out before leaving SDK >> code. Is the coupling to a semantically meaningful structure between the >> SDK and runner a necessary complexity? >> >> Andrea >> >> >> >> On Fri, Apr 13, 2018 at 10:20 AM Robert Bradshaw >> wrote: >> >>> On Fri, Apr 13, 2018 at 10:10 AM Alex Amato wrote: >>> *Thank you for this clarification. I think the table of files fits into the model as one of type string-set (with union as aggregation). * It's not a list of files, it's a list of metadata for each file, several pieces of data per file. Are you proposing that there would be separate URNs as well for each entity being measured then, so that the URN
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw wrote: > Also, the only use for payloads is because "User Counter" is currently a > single URN, rather than using the namespacing characteristics of URNs to > map user names onto distinct metric names. > Can they be URNs? I don't see value in having a "user metric" URN where you then have to look elsewhere for what the real name is. Kenn. A payload avoids the messiness of having to pack (and parse) arbitrary > parameters into a name though.) If we're going to choose names that the > system and sdks agree to have specific meanings, and to avoid accidental > collisions, making them full-fledged documented URNs has value. > > Value is the "payload". Likely worth changing the name to avoid confusion > with the payload above. It's bytes because it depends on the type. I would > try to avoid nesting it too deeply (e.g. a payload within a payload). If we > think the types are generally limited, another option would be a oneof > field (with a bytes option just in case) for transparency. There are pros > and cons going this route. > > Type is what I proposed we add, instead of it being implicit in the name > (and unknowable if one does not recognize the name). This makes things more > open-ended and easier to evolve and work with. > > Entity could be generalized to Label, or LabelSet if desired. But as > mentioned I think it makes sense to pull this out as a separate field, > especially when it makes sense to aggregate a single named counter across > labels as well as for a single label (e.g. execution time of composite > transforms). > > - Robert > > > > On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler > wrote: > >> Hi folks - >> >> Before we totally go down the path of highly structured metric protos, >> I'd like to propose considering a simple metrics interface between the SDK >> and the runner. Something more generic and closer to what most monitoring >> systems would use. 
>> >> To use Spark as an example, the Metric system uses a simple metric format >> of name, value and type to report all metrics in a single structure, >> regardless of the source or context of the metric. >> >> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MetricsSystem.html >> >> The subsystems have contracts for what metrics they will expose and how >> they are calculated: >> >> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskmetrics-ShuffleWriteMetrics.html >> >> Codifying the system metrics in the SDK seems perfectly reasonable - no >> reason to make the notion of metric generic at that level. But at the >> point the metric is leaving the SDK and going to the runner, a simpler, >> generic encoding of the metrics might make it easier to adapt and maintain >> the system. The generic format can include information about downstream >> consumers, if that's useful. >> >> Spark supports a number of Metric Sinks - external monitoring systems. >> If runners receive a simple list of metrics, implementing any number of >> Sinks for Beam would be straightforward and would generally be a one time >> implementation. If instead all system metrics are sent embedded in a >> highly structured, semantically meaningful structure, runner code would >> need to be updated to support exporting the new metric. We seem to be >> heading in the direction of "if you don't understand this metric, you can't >> use it / export it". But most systems seem to assume metrics are really >> simple named values that can be handled a priori. >> >> So I guess my primary question is: Is it necessary for Beam to treat >> metrics as highly semantic, arbitrarily complex data? Or could they >> possibly be the sort of simple named values as they are in most monitoring >> systems and in Spark? With the SDK potentially providing scaffolding to >> add meaning and structure, but simplifying that out before leaving SDK >> code. 
Is the coupling to a semantically meaningful structure between the >> SDK and runner a necessary complexity? >> >> Andrea >> >> >> >> On Fri, Apr 13, 2018 at 10:20 AM Robert Bradshaw >> wrote: >> >>> On Fri, Apr 13, 2018 at 10:10 AM Alex Amato wrote: >>> *Thank you for this clarification. I think the table of files fits into the model as one of type string-set (with union as aggregation). * It's not a list of files, it's a list of metadata for each file, several pieces of data per file. Are you proposing that there would be separate URNs as well for each entity being measured then, so that the URN defines the type of entity being measured? "urn.beam.metrics.PCollectionByteCount" is a URN always for PCollection entities "urn.beam.metrics.PTransformExecutionTime" is a URN always for PTransform entities >>> >>> Yes. FWIW, it may not even be needed to put this in the name, e.g. >>> execution times are
+1 to keeping things simple, both in code and the model to understand. I like thinking of things as (name, value, type) triples. Historically, we've packed the entity name (e.g. PTransform name) into the string name field and parsed it out in various places; I think it's worth pulling this out and making it explicit instead, so metrics would be (name, entity, value, type) tuples. In the current proposal: Name is the URN + a possible bytes payload. (Actually, it's a bit unclear if there's any relationship between counters with the same name and different payloads. Also, the only use for payloads is because "User Counter" is currently a single URN, rather than using the namespacing characteristics of URNs to map user names onto distinct metric names. A payload avoids the messiness of having to pack (and parse) arbitrary parameters into a name though.) If we're going to choose names that the system and sdks agree to have specific meanings, and to avoid accidental collisions, making them full-fledged documented URNs has value. Value is the "payload". Likely worth changing the name to avoid confusion with the payload above. It's bytes because it depends on the type. I would try to avoid nesting it too deeply (e.g. a payload within a payload). If we think the types are generally limited, another option would be a oneof field (with a bytes option just in case) for transparency. There are pros and cons going this route. Type is what I proposed we add, instead of it being implicit in the name (and unknowable if one does not recognize the name). This makes things more open-ended and easier to evolve and work with. Entity could be generalized to Label, or LabelSet if desired. But as mentioned I think it makes sense to pull this out as a separate field, especially when it makes sense to aggregate a single named counter across labels as well as for a single label (e.g. execution time of composite transforms). 
- Robert On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler wrote: > Hi folks - > > Before we totally go down the path of highly structured metric protos, I'd > like to propose considering a simple metrics interface between the SDK and > the runner. Something more generic and closer to what most monitoring > systems would use. > > To use Spark as an example, the Metric system uses a simple metric format > of name, value and type to report all metrics in a single structure, > regardless of the source or context of the metric. > > https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MetricsSystem.html > > The subsystems have contracts for what metrics they will expose and how > they are calculated: > > https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskmetrics-ShuffleWriteMetrics.html > > Codifying the system metrics in the SDK seems perfectly reasonable - no > reason to make the notion of metric generic at that level. But at the > point the metric is leaving the SDK and going to the runner, a simpler, > generic encoding of the metrics might make it easier to adapt and maintain > the system. The generic format can include information about downstream > consumers, if that's useful. > > Spark supports a number of Metric Sinks - external monitoring systems. If > runners receive a simple list of metrics, implementing any number of Sinks > for Beam would be straightforward and would generally be a one time > implementation. If instead all system metrics are sent embedded in a > highly structured, semantically meaningful structure, runner code would > need to be updated to support exporting the new metric. We seem to be > heading in the direction of "if you don't understand this metric, you can't > use it / export it". But most systems seem to assume metrics are really > simple named values that can be handled a priori. 
> > So I guess my primary question is: Is it necessary for Beam to treat > metrics as highly semantic, arbitrarily complex data? Or could they > possibly be the sort of simple named values as they are in most monitoring > systems and in Spark? With the SDK potentially providing scaffolding to > add meaning and structure, but simplifying that out before leaving SDK > code. Is the coupling to a semantically meaningful structure between the > SDK and runner a necessary complexity? > > Andrea > > > > On Fri, Apr 13, 2018 at 10:20 AM Robert Bradshaw > wrote: > >> On Fri, Apr 13, 2018 at 10:10 AM Alex Amato wrote: >> >>> >>> *Thank you for this clarification. I think the table of files fits into >>> the model as one of type string-set (with union as aggregation). * >>> It's not a list of files, it's a list of metadata for each file, several >>> pieces of data per file. >>> >>> Are you proposing that there would be separate URNs as well for each >>> entity being measured then, so that the URN defines the type of entity being >>> measured? >>> "urn.beam.metrics.PCollectionByteCount" is a URN for
On Fri, Apr 13, 2018 at 10:10 AM Alex Amato wrote: > > *Thank you for this clarification. I think the table of files fits into > the model as one of type string-set (with union as aggregation). * > It's not a list of files, it's a list of metadata for each file, several > pieces of data per file. > > Are you proposing that there would be separate URNs as well for each > entity being measured then, so that the URN defines the type of entity being > measured? > "urn.beam.metrics.PCollectionByteCount" is a URN always for > PCollection entities > "urn.beam.metrics.PTransformExecutionTime" is a URN always for > PTransform entities > Yes. FWIW, it may not even be needed to put this in the name, e.g. execution times are never for PCollections, and even if they were it'd be semantically a very different beast (which should not re-use the same URN). *message MetricSpec {* > * // (Required) A URN that describes the accompanying payload.* > * // For any URN that is not recognized (by whomever is inspecting* > * // it) the parameter payload should be treated as opaque and* > * // passed as-is.* > * string urn = 1;* > > * // (Optional) The data specifying any parameters to the URN. If* > * // the URN does not require any arguments, this may be omitted.* > * bytes parameters_payload = 2;* > > * // (Required) A URN that describes the type of values this metric* > * // records (e.g. durations that should be summed).* > *}* > > *message Metric[Values] {* > * // (Required) The original requesting MetricSpec.* > * MetricSpec metric_spec = 1;* > > * // A mapping of entities to (encoded) values.* > * map<string, bytes> values;* > This ignores the non-uniqueness of entity identifiers. This is why in my > doc, I have specified the entity type and its string identifier > @Kenn, I believe you have pointed this out in the past, that uniqueness is > only guaranteed within a type of entity (all PCollections), but not between > entities (A PCollection and PTransform may have the same identifier). 
> See above for why this is not an issue. The extra complexity (in protos and code), the inability to use them as map keys, and the fact that they'll be 100% redundant for all entities for a given metric convinces me that it's not worth creating and tracking an enum for the type alongside the id. > *}* > > On Fri, Apr 13, 2018 at 9:14 AM Robert Bradshaw > wrote: > >> On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles wrote: >> >>> >>> To Robert's proto: >>> >>> // A mapping of entities to (encoded) values. map<string, bytes> values; >>> >>> Are the keys here the names of the metrics, aka what is used for URNs in >>> the doc? >>> >> They're the entities to which a metric is attached, e.g. a PTransform, a >> PCollection, or perhaps a process/worker. >> >> >>> } On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles wrote: > >> Agree with all of this. It echoes a thread on the doc that I was >> going to bring here. Let's keep it simple and use concrete use cases to >> drive additional abstraction if/when it becomes compelling. >> >> Kenn >> >> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers >> wrote: >> >>> Sounds perfect. Just wanted to make sure that "custom metrics of >>> supported type" didn't include new ways of aggregating ints. As long as >>> that means we have a fixed set of aggregations (that align with what >>> users want and metrics back end support) it seems like we are doing user >>> metrics right. >>> >>> - Ben >>> >>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau < >>> rmannibu...@gmail.com> wrote: >>> Maybe leave it out until proven it is needed. ATM counters are used a lot but others are less mainstream so being too fine from the start can just add complexity and bugs in impls IMHO. On 12 Apr 2018 at 08:06, "Robert Bradshaw" wrote: > By "type" of metric, I mean both the data types (including their > encoding) and accumulator strategy. So sumint would be a type, as > would > double-distribution. 
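As an illustrative sketch of the entities-to-values map discussed above — plain string entity ids as keys, with per-entity merging of a sum-aggregated counter (the sum semantics and entity id strings here are assumptions for the example, not from the proposal):

```python
def merge_counter_updates(a: dict, b: dict) -> dict:
    # Illustrative sketch: merge two {entity_id: value} maps for a
    # sum-aggregated counter, keying on plain string entity ids as
    # argued above (no separate entity-type enum alongside the id).
    merged = dict(a)
    for entity, value in b.items():
        merged[entity] = merged.get(entity, 0) + value
    return merged
```

Because the keys are plain strings, they work directly as proto map keys, which is part of the argument against a structured (type, id) key.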
> > On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers < > bjchamb...@gmail.com> wrote: > >> When you say type do you mean accumulator type, result type, or >> accumulator strategy? Specifically, what is the "type" of sumint, >> sumlong, >> meanlong, etc? >> >> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw < >> rober...@google.com> wrote: >> >>> Fully custom metric types is the "more speculative and >>> difficult" feature that I was proposing we kick down the road (and >>> may >>> never get to). What I'm suggesting is that we support custom >>>
On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles wrote: > > To Robert's proto: > > // A mapping of entities to (encoded) values. >> map<string, bytes> values; >> > > Are the keys here the names of the metrics, aka what is used for URNs in > the doc? > >> They're the entities to which a metric is attached, e.g. a PTransform, a PCollection, or perhaps a process/worker. > } >> >> On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles wrote: >>> Agree with all of this. It echoes a thread on the doc that I was going to bring here. Let's keep it simple and use concrete use cases to drive additional abstraction if/when it becomes compelling. Kenn On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers wrote: > Sounds perfect. Just wanted to make sure that "custom metrics of > supported type" didn't include new ways of aggregating ints. As long as > that means we have a fixed set of aggregations (that align with what > users want and metrics back end support) it seems like we are doing user > metrics right. > > - Ben > > On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau < > rmannibu...@gmail.com> wrote: > >> Maybe leave it out until proven it is needed. ATM counters are used a >> lot but others are less mainstream so being too fine from the start can >> just add complexity and bugs in impls IMHO. >> >> On 12 Apr 2018 at 08:06, "Robert Bradshaw" >> wrote: >> >>> By "type" of metric, I mean both the data types (including their >>> encoding) and accumulator strategy. So sumint would be a type, as would >>> double-distribution. >>> >>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers >>> wrote: >>> When you say type do you mean accumulator type, result type, or accumulator strategy? Specifically, what is the "type" of sumint, sumlong, meanlong, etc? On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw wrote: > Fully custom metric types is the "more speculative and difficult" > feature that I was proposing we kick down the road (and may never get > to). 
> What I'm suggesting is that we support custom metrics of standard > type. > > On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers > wrote: > >> The metric api is designed to prevent user defined metric types >> based on the fact they just weren't used enough to justify support. >> >> Is there a reason we are bringing that complexity back? Shouldn't >> we just need the ability for the standard set plus any special system >> metrics? >> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw < >> rober...@google.com> wrote: >> >>> Thanks. I think this has simplified things. >>> >>> One thing that has occurred to me is that we're conflating the >>> idea of custom metrics and custom metric types. I would propose >>> the MetricSpec field be augmented with an additional field "type" >>> which is >>> a urn specifying the type of metric it is (i.e. the contents of its >>> payload, as well as the form of aggregation). Summing or maxing >>> over ints >>> would be a typical example. Though we could pursue making this >>> opaque to >>> the runner in the long run, that's a more speculative (and >>> difficult) >>> feature to tackle. This would allow the runner to at least >>> aggregate and >>> report/return to the SDK metrics that it did not itself understand >>> the >>> semantic meaning of. (It would probably simplify much of the >>> specialization >>> in the runner itself for metrics that it *did* understand as well.) >>> >>> In addition, rather than having UserMetricOfTypeX for every type >>> X one would have a single URN for UserMetric and its spec would >>> designate >>> the type and payload designate the (qualified) name. >>> >>> - Robert >>> >>> >>> >>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato >>> wrote: >>> Thank you everyone for your feedback so far. I have made a revision today which is to make all metrics refer to a primary entity, so I have restructured some of the protos a little bit. 
The point of this change was to futureproof the possibility of allowing custom user metrics, with custom aggregation functions for their metric updates. Now that each metric has an aggregation_entity
On Thu, Apr 12, 2018 at 8:17 PM Alex Amato wrote: > I agree that there is some confusion about concepts. Here are several > concepts which have come up in discussions, as I see them (not official > names). > > *Metric* > >- For the purposes of my document, I have been referring to a Metric >as any sort of information the SDK can send to the Runner > - This does not mean only quantitative, aggregated values. > - This can include other useful '*monitoring information*', for > supporting debugging/monitoring scenarios such as > - A table of files which are not yet finished reading, causing a > streaming pipeline to be blocked > - It has been pointed out to me, that when many people hear metric, >a very specific thing comes to mind, in particular quantitative, >aggregated values. *That is NOT what my document is limited to. I >consider both that type of metric, and more arbitrary 'monitoring >information', like a table of files with statuses in the proposal.* >- Perhaps there should be another word for this concept, yet I have >not yet come up with a good one, "monitoring information", "monitoring >item" perhaps. > > > *Metric types/Metric classes* > >- A collection of information reported on >ProcessBundleProgressResponse and ProcessBundleResponse from the SDK to the >RunnerHarness. > - e.g. execution time of ParDo functions. >- In my proposal they are defined by a URN and two structs which are >serialized into a MetricSpec and Metric bytes payload field, for requesting >and responding to the metrics. > - e.g. beam:metric:ptransform_execution_times:v1 defines the > information needed to describe how a ptransform >- All metrics which are passed across the FN API have a *metric type* > > > *User metrics* > >- A metric added by a pipeline writer, using an SDK API to create >these. >- In my proposal the various *UserMetric types are a Metric Type. * > - e.g. 
“urn:beam:metric:user_distribution_data:v1” and > “urn:beam:metric:user_counter_data:v1” > define two metric types for packaging these user metrics and > communicating them across the FN API. > - SDK writers would need to write code to package the user metrics > from SDK API calls into their associated metric types to send them > across > the FN API. > > *Custom metric types* > >- A metric type which is not included in a catalog of first class beam >metrics. This can be thought of as metrics a custom engine+runner+sdk >(system as a whole) collects which is not part of the beam model. > - e.g. a closed source runner can define its own URNs and metrics, > extending the beam model > - for example an I/O source specific to a closed source > engine+runner+sdk may export a table of files it is reading with > statuses > as a custom metric type > > > *Custom User Metrics with Custom Metric Types * > >- Not proposed to be supported by the doc >- A user specified metric, written by a pipeline writer with a custom >metric type, likely would be implemented using a general mechanism to >attach the custom metric. >- May have a custom user specified aggregation function as well. > > > *Reporting metrics to external systems such as Dropwizard* > >- My doc does not specifically cover this, it assumes that a runner >harness would be responsible for reporting metrics in formats specific to >those external systems, such as Dropwizard. It assumes that the >URNs+Metric types provided will be specified enough so that it would be >possible to make such a translation. >- Each metric type would need to be handled in the RunnerHarness, to >collect and report the metric to an external system >- Some concern has come up about this, and if this should dictate the >format of the metrics which the SDK sends to the RunnerHarness of the FN >API, rather than using the more custom URN+payload approach. 
>- Though there could be URNs specifically designed to do this, the > intention of the design in the doc is to not require SDKs to give string > "names" to metrics, just to fill in URN payloads, and the Runner Harness > will pick names for metrics if needed to send to external systems. > > Just wanted to clarify this a bit. I hope the example of the table of > files being a more complex metric type describes the usage of custom metric > types. I'll update the doc with this > Thank you for this clarification. I think the table of files fits into the model as one of type string-set (with union as aggregation). > @Robert, I am not sure if you are proposing anything that is not in the > current form of the doc. > Yes, I am. Currently, the URN of the metric spec specifies both (1) the semantic meaning of this metric (i.e. what exactly is being instrumented, whether that be
I agree that there is some confusion about concepts. Here are several concepts which have come up in discussions, as I see them (not official names). *Metric* - For the purposes of my document, I have been referring to a Metric as any sort of information the SDK can send to the Runner - This does not mean only quantitative, aggregated values. - This can include other useful '*monitoring information*', for supporting debugging/monitoring scenarios such as - A table of files which are not yet finished reading, causing a streaming pipeline to be blocked - It has been pointed out to me, that when many people hear metric, a very specific thing comes to mind, in particular quantitative, aggregated values. *That is NOT what my document is limited to. I consider both that type of metric, and more arbitrary 'monitoring information', like a table of files with statuses in the proposal.* - Perhaps there should be another word for this concept, yet I have not yet come up with a good one, "monitoring information", "monitoring item" perhaps. *Metric types/Metric classes* - A collection of information reported on ProcessBundleProgressResponse and ProcessBundleResponse from the SDK to the RunnerHarness. - e.g. execution time of ParDo functions. - In my proposal they are defined by a URN and two structs which are serialized into a MetricSpec and Metric bytes payload field, for requesting and responding to the metrics. - e.g. beam:metric:ptransform_execution_times:v1 defines the information needed to describe how a ptransform - All metrics which are passed across the FN API have a *metric type* *User metrics* - A metric added by a pipeline writer, using an SDK API to create these. - In my proposal the various *UserMetric types are a Metric Type. * - e.g. “urn:beam:metric:user_distribution_data:v1” and “urn:beam:metric:user_counter_data:v1” define two metric types for packaging these user metrics and communicating them across the FN API. 
- SDK writers would need to write code to package the user metrics from SDK API calls into their associated metric types to send them across the FN API. *Custom metric types* - A metric type which is not included in a catalog of first class beam metrics. This can be thought of as metrics a custom engine+runner+sdk (system as a whole) collects which is not part of the beam model. - e.g. a closed source runner can define its own URNs and metrics, extending the beam model - for example an I/O source specific to a closed source engine+runner+sdk may export a table of files it is reading with statuses as a custom metric type *Custom User Metrics with Custom Metric Types * - Not proposed to be supported by the doc - A user specified metric, written by a pipeline writer with a custom metric type, likely would be implemented using a general mechanism to attach the custom metric. - May have a custom user specified aggregation function as well. *Reporting metrics to external systems such as Dropwizard* - My doc does not specifically cover this, it assumes that a runner harness would be responsible for reporting metrics in formats specific to those external systems, such as Dropwizard. It assumes that the URNs+Metric types provided will be specified enough so that it would be possible to make such a translation. - Each metric type would need to be handled in the RunnerHarness, to collect and report the metric to an external system - Some concern has come up about this, and if this should dictate the format of the metrics which the SDK sends to the RunnerHarness of the FN API, rather than using the more custom URN+payload approach. - Though there could be URNs specifically designed to do this, the intention of the design in the doc is to not require SDKs to give string "names" to metrics, just to fill in URN payloads, and the Runner Harness will pick names for metrics if needed to send to external systems. Just wanted to clarify this a bit. 
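The packaging step described under "User metrics" above could be sketched roughly as follows. This is an assumption-laden illustration: the JSON payload encoding and dict shape are invented for the example; the doc only fixes the URN, not the payload format.

```python
import json

def package_user_counter(namespace: str, name: str, value: int) -> dict:
    # Illustrative sketch: wrap a user counter update in the
    # "urn:beam:metric:user_counter_data:v1" metric type named above.
    # The JSON payload encoding here is an assumption for the example;
    # the real payload would be a serialized proto struct.
    payload = json.dumps({"namespace": namespace, "name": name, "value": value})
    return {"urn": "urn:beam:metric:user_counter_data:v1", "payload": payload}
```

The point of the shape is that the SDK fills in only a URN plus payload, and the runner harness derives any string "names" needed by external systems.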
I hope the example of the table of files being a more complex metric type describes the usage of custom metric types. I'll update the doc with this @Robert, I am not sure if you are proposing anything that is not in the current form of the doc. On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles wrote: > Agree with all of this. It echoes a thread on the doc that I was going to > bring here. Let's keep it simple and use concrete use cases to drive > additional abstraction if/when it becomes compelling. > > Kenn > > On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers wrote: > >> Sounds perfect. Just wanted to make sure that "custom metrics of >> supported type" didn't include new ways of aggregating ints. As long as >> that means we have a fixed set of aggregations
Agree with all of this. It echoes a thread on the doc that I was going to bring here. Let's keep it simple and use concrete use cases to drive additional abstraction if/when it becomes compelling. Kenn On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers wrote: > Sounds perfect. Just wanted to make sure that "custom metrics of supported > type" didn't include new ways of aggregating ints. As long as that means we > have a fixed set of aggregations (that align with what users want and > metrics back end support) it seems like we are doing user metrics right. > > - Ben > > On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau > wrote: > >> Maybe leave it out until proven it is needed. ATM counters are used a lot >> but others are less mainstream so being too fine from the start can just >> add complexity and bugs in impls IMHO. >> >> On 12 Apr 2018 at 08:06, "Robert Bradshaw" wrote: >> >>> By "type" of metric, I mean both the data types (including their >>> encoding) and accumulator strategy. So sumint would be a type, as would >>> double-distribution. >>> >>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers >>> wrote: >>> When you say type do you mean accumulator type, result type, or accumulator strategy? Specifically, what is the "type" of sumint, sumlong, meanlong, etc? On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw wrote: > Fully custom metric types is the "more speculative and difficult" > feature that I was proposing we kick down the road (and may never get to). > What I'm suggesting is that we support custom metrics of standard type. > > On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers > wrote: > >> The metric api is designed to prevent user defined metric types based >> on the fact they just weren't used enough to justify support. >> >> Is there a reason we are bringing that complexity back? Shouldn't we >> just need the ability for the standard set plus any special system >> metrics? >> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw >> wrote: >> >>> Thanks. 
I think this has simplified things. >>> >>> One thing that has occurred to me is that we're conflating the idea >>> of custom metrics and custom metric types. I would propose the >>> MetricSpec >>> field be augmented with an additional field "type" which is a urn >>> specifying the type of metric it is (i.e. the contents of its payload, >>> as >>> well as the form of aggregation). Summing or maxing over ints would be a >>> typical example. Though we could pursue making this opaque to the >>> runner in >>> the long run, that's a more speculative (and difficult) feature to >>> tackle. >>> This would allow the runner to at least aggregate and report/return to >>> the >>> SDK metrics that it did not itself understand the semantic meaning of. >>> (It >>> would probably simplify much of the specialization in the runner itself >>> for >>> metrics that it *did* understand as well.) >>> >>> In addition, rather than having UserMetricOfTypeX for every type X >>> one would have a single URN for UserMetric and its spec would designate >>> the >>> type and payload designate the (qualified) name. >>> >>> - Robert >>> >>> >>> >>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato >>> wrote: >>> Thank you everyone for your feedback so far. I have made a revision today which is to make all metrics refer to a primary entity, so I have restructured some of the protos a little bit. The point of this change was to futureproof the possibility of allowing custom user metrics, with custom aggregation functions for their metric updates. Now that each metric has an aggregation_entity associated with it (e.g. PCollection, PTransform), we can design an approach which forwards the opaque bytes metric updates, without deserializing them. These are forwarded to user provided code which then would deserialize the metric update payloads and perform the custom aggregations. I think it has also simplified some of the URN metric protos, as they do not need to keep track of ptransform names inside themselves now. 
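Robert's "type" URN idea — a runner aggregating and returning metrics whose semantics it does not understand, using only the type field — could look roughly like this. The type URN strings below are invented for illustration; the proposal does not define them.

```python
# Illustrative sketch: a runner picks an aggregation purely from the
# "type" URN on the MetricSpec, without knowing what the metric means.
# The URN strings are made up for this example.
AGGREGATORS = {
    "beam:metrics:sum_int64:v1": sum,
    "beam:metrics:max_int64:v1": max,
}

def aggregate(type_urn: str, updates: list):
    # A truly unknown type URN would be passed through opaquely to the
    # SDK in practice; here we just look up a known aggregation function.
    return AGGREGATORS[type_urn](updates)
```

Summing or maxing over ints, as in the message above, then needs no metric-specific runner code at all.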
The result is simpler structures, for the metrics as the entities are pulled outside of the metric. I have mentioned this in the doc now, and wanted to draw attention to this particular revision. On Tue, Apr 10, 2018 at 9:53 AM Alex Amato wrote: > I've gathered a lot of feedback so far and want
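Robert's suggestion above could be sketched roughly as follows. This is a hypothetical illustration only; the URNs, field names, and combiner registry are invented for the example and are not Beam's actual definitions:

```python
# Sketch: a MetricSpec carries a "type" URN, and the runner aggregates
# updates it does not semantically understand by dispatching on that URN
# alone. All names and URNs here are invented for illustration.

from dataclasses import dataclass

@dataclass
class MetricSpec:
    urn: str   # what the metric means, e.g. "myio:files:read" (hypothetical)
    type: str  # how to aggregate it, e.g. "beam:metric:type:sum_int64" (hypothetical)

# A fixed registry of standard aggregation strategies, keyed by type URN.
TYPE_COMBINERS = {
    "beam:metric:type:sum_int64": lambda a, b: a + b,
    "beam:metric:type:max_int64": max,
}

def aggregate(spec: MetricSpec, updates):
    """Runner-side aggregation that needs no knowledge of the metric's meaning."""
    combine = TYPE_COMBINERS[spec.type]
    result = updates[0]
    for update in updates[1:]:
        result = combine(result, update)
    return result

spec = MetricSpec(urn="myio:files:read", type="beam:metric:type:sum_int64")
print(aggregate(spec, [3, 4, 5]))  # 12
```

The runner never inspects `urn`; it only needs `type` to combine updates, which is what lets it aggregate and return metrics whose semantics it does not understand.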
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Sounds perfect. Just wanted to make sure that "custom metrics of supported type" didn't include new ways of aggregating ints. As long as that means we have a fixed set of aggregations (that align with what users want and what metrics backends support), it seems like we are doing user metrics right.

- Ben

On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau wrote:
> [...]
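The "fixed set of aggregations" Ben describes could look something like the sketch below. The set chosen here (a sum counter and a distribution) and the accumulator layout are assumptions for illustration, not the proposal's actual definitions:

```python
# Illustrative only: a small, fixed set of standard aggregations of the kind
# discussed above (counters, distributions). Names and layouts are invented.

def merge_sum(a, b):
    # Counter: merging two partial counts is addition.
    return a + b

def merge_distribution(a, b):
    # Distribution accumulator as (count, sum, min, max).
    return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))

STANDARD_AGGREGATIONS = {
    "sum_int64": merge_sum,
    "distribution_int64": merge_distribution,
}

d1 = (2, 10, 3, 7)  # two reported values: 3 and 7
d2 = (1, 5, 5, 5)   # one reported value: 5
print(merge_distribution(d1, d2))  # (3, 15, 3, 7)
```

Because the set is fixed, every runner can implement each merge function once and support any metric declared with one of these aggregations.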
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Maybe leave it out until proven it is needed. ATM counters are used a lot, but the others are less mainstream, so being too fine-grained from the start can just add complexity and bugs in implementations IMHO.

On 12 Apr 2018 08:06, "Robert Bradshaw" wrote:
> [...]
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
By "type" of metric, I mean both the data types (including their encoding) and the accumulator strategy. So sumint would be a type, as would double-distribution.

On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers wrote:
> [...]
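Robert's definition of a "type" as the pair (encoding, accumulation strategy) could be sketched like this. The fixed-width int64 coding and the `MetricType` structure are simplified stand-ins, not Beam's actual coders or protos:

```python
# Sketch: a metric "type" bundles an encoding and an accumulator strategy,
# so "sumint" is one type and "double-distribution" would be another.
# Coding scheme and structure are illustrative assumptions.

import struct
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class MetricType:
    encode: Callable[[Any], bytes]
    decode: Callable[[bytes], Any]
    combine: Callable[[Any, Any], Any]

# "sumint": big-endian int64 encoding, summing accumulator.
SUM_INT = MetricType(
    encode=lambda v: struct.pack(">q", v),
    decode=lambda b: struct.unpack(">q", b)[0],
    combine=lambda a, b: a + b,
)

# A runner can decode and fold payloads knowing only the type.
payloads = [SUM_INT.encode(v) for v in (1, 2, 3)]
total = 0
for p in payloads:
    total = SUM_INT.combine(total, SUM_INT.decode(p))
print(total)  # 6
```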
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
When you say type, do you mean accumulator type, result type, or accumulator strategy? Specifically, what is the "type" of sumint, sumlong, meanlong, etc.?

On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw wrote:
> [...]
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Fully custom metric types is the "more speculative and difficult" feature that I was proposing we kick down the road (and may never get to). What I'm suggesting is that we support custom metrics of standard type.

On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers wrote:
> [...]
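A "custom metric of standard type" as discussed in this subthread might look like the following: a single UserMetric URN whose spec names a standard aggregation type and whose payload carries the user's qualified metric name. The URNs and field names are invented for illustration:

```python
# Sketch: one UserMetric URN instead of UserMetricOfTypeX per type X.
# The spec designates the (standard) aggregation type; the payload
# designates the qualified name. All URNs here are hypothetical.

from dataclasses import dataclass

@dataclass
class MetricSpec:
    urn: str       # always the single user-metric URN for user metrics
    type: str      # a standard aggregation type URN the runner understands
    payload: bytes # the user's qualified metric name, opaque to aggregation

user_counter = MetricSpec(
    urn="beam:metric:user",
    type="beam:metric:type:sum_int64",
    payload=b"org.example.MyDoFn.elements_read",
)

# The runner needs only `type` to aggregate; the name matters only at display.
print(user_counter.payload.decode())
```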
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
The metric API is designed to prevent user-defined metric types, based on the fact that they just weren't used enough to justify support.

Is there a reason we are bringing that complexity back? Shouldn't we just need the ability for the standard set plus any special system metrics?

On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw wrote:
> [...]
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Thank you everyone for your feedback so far.

I have made a revision today, which is to make all metrics refer to a primary entity, so I have restructured some of the protos a little bit.

The point of this change was to future-proof the possibility of allowing custom user metrics, with custom aggregation functions for their metric updates. Now that each metric has an aggregation_entity associated with it (e.g. PCollection, PTransform), we can design an approach which forwards the opaque bytes metric updates without deserializing them. These are forwarded to user-provided code, which would then deserialize the metric update payloads and perform the custom aggregations.

I think it has also simplified some of the URN metric protos, as they do not need to keep track of PTransform names inside themselves now. The result is simpler structures for the metrics, as the entities are pulled outside of the metric.

I have mentioned this in the doc now, and wanted to draw attention to this particular revision.

On Tue, Apr 10, 2018 at 9:53 AM Alex Amato wrote:
> [...]
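Alex's revision, with the entity pulled outside the metric and custom payloads left opaque to the runner, could be sketched as below. The structures and entity labels are illustrative assumptions, not the proposal's actual protos:

```python
# Sketch: each metric update names an aggregation entity (e.g. a PTransform
# or PCollection) outside the metric itself. The runner groups opaque
# payloads by (entity, urn) without deserializing them, then hands each
# group to user-provided code for custom aggregation. Names are invented.

from collections import defaultdict
from dataclasses import dataclass

@dataclass
class MetricUpdate:
    aggregation_entity: str  # e.g. "ptransform:ParDo(MyDoFn)" (hypothetical)
    urn: str                 # identifies the custom metric
    payload: bytes           # opaque to the runner

def group_updates(updates):
    """Runner-side: collect opaque payloads per entity, no deserialization."""
    grouped = defaultdict(list)
    for u in updates:
        grouped[(u.aggregation_entity, u.urn)].append(u.payload)
    return grouped

updates = [
    MetricUpdate("ptransform:ParDo(MyDoFn)", "my:custom:metric", b"\x01"),
    MetricUpdate("ptransform:ParDo(MyDoFn)", "my:custom:metric", b"\x02"),
]
grouped = group_updates(updates)
# User-provided code would then deserialize and combine each payload list.
print(len(grouped[("ptransform:ParDo(MyDoFn)", "my:custom:metric")]))  # 2
```

Keeping the entity outside the metric is what lets the runner route and group updates correctly while treating the metric contents as a black box.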
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
I've gathered a lot of feedback so far and want to make a decision by Friday, and begin working on related PRs next week.

Please make sure that you provide your feedback before then, and I will post the final decisions made to this thread Friday afternoon.

On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía wrote:
> [...]
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Nice, I created a short link so people can refer to it easily in future discussions, website, etc.

https://s.apache.org/beam-fn-api-metrics

Thanks for sharing.

On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw <rober...@google.com> wrote:
> [...]
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Thanks for the nice writeup. I added some comments.

On Wed, Apr 4, 2018 at 1:53 PM Alex Amato wrote:
> Hello beam community,
>
> Thank you everyone for your initial feedback on this proposal so far. I
> have made some revisions based on the feedback. There were some larger
> questions asking about alternatives. For each of these I have added a
> section tagged with [Alternatives] and discussed my recommendation as well
> as a few other choices we considered.
>
> I would appreciate more feedback on the revised proposal. Please take
> another look and let me know.
>
> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit
>
> Etienne, I would appreciate it if you could please take another look after
> the revisions I have made as well.
>
> Thanks again,
> Alex