Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-19 Thread Alex Amato
Hello, I have rewritten most of the proposal. Though I think that there is some more research that needs to be done to get the Metric specification perfect. I plan to do more research, and would like to ask you all for more help to make this proposal better. In particular, now that the metrics

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-17 Thread Ben Chambers
That sounds like a very reasonable choice -- given the discussion seemed to be focusing on the differences between these two categories, separating them will allow the proposal (and implementation) to address each category in the best way possible without needing to make compromises. Looking

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-17 Thread Alex Amato
Hello, I just wanted to give an update. After some discussion, I've realized that it's best to break up the two concepts, with two separate ways of reporting monitoring data. These two categories are: 1. Metrics - Counters, Gauges, Distributions. These are well-defined concepts for

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-16 Thread Robert Bradshaw
I agree that the user/system dichotomy is false; the real question is how counters can be scoped to avoid accidental (or even intentional) interference. A system that entirely controls the interaction between the "user" (from its perspective) and the underlying system can do this by prefixing all

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-14 Thread Kenneth Knowles
One reason I resist the user/system distinction is that Beam is a multi-party system with at least SDK, runner, and pipeline. Often there may be a DSL like SQL or Scio, or similarly someone may be building a platform for their company where there is no user authoring the pipeline. Should Scio,

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Robert Bradshaw
On Fri, Apr 13, 2018 at 4:30 PM Alex Amato wrote: > There are a few more confusing concepts in this thread > *Name* > >- Name can mean a *"string name"* used to refer to a metric in a >metrics system such as stackdriver, i.e. "ElementCount", "ExecutionTime" >-

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Robert Bradshaw
On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler wrote: > Thanks, Robert! > > I think my lack of clarity is around the MetricSpec. Maybe what's in my > head and what's being proposed are the same thing. When I read that the > MetricSpec describes the proto structure, that

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Andrea Foegler
That's a great summary Alex, thanks! This doesn't address all your questions, but in terms of how I see the MetricSpec being specified / shared is something like this: SDKs just share the same MetricSpec file which defines all the system metrics guaranteed by Beam. SDK-specific additions can be

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Alex Amato
There are a few more confusing concepts in this thread *Name* - Name can mean a *"string name"* used to refer to a metric in a metrics system such as stackdriver, i.e. "ElementCount", "ExecutionTime" - Name can mean a set of *context* fields added to a counter, either embedded in a

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Andrea Foegler
Thanks, Robert! I think my lack of clarity is around the MetricSpec. Maybe what's in my head and what's being proposed are the same thing. When I read that the MetricSpec describes the proto structure, that sounds kind of complicated to me. But I may be misinterpreting it. What I picture is

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Robert Bradshaw
On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles wrote: > > Or just "beam:counter::" or even > "beam:metric::" since metrics have a type separate from > their name. > I proposed keeping the "user" in there to avoid possible clashes with the system namespaces. (No preference on

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Andrea Foegler
I like the generalization from entity -> labels. I view the purpose of those fields as providing context, and labels feel like they support a richer set of contexts. The URN concept gets a little tricky. I totally agree that the context fields should not be embedded in the name. There's a

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Kenneth Knowles
On Fri, Apr 13, 2018 at 1:27 PM Robert Bradshaw wrote: > On Fri, Apr 13, 2018 at 1:19 PM Kenneth Knowles wrote: > >> On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw >> wrote: >> >>> Also, the only use for payloads is because "User

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Robert Bradshaw
On Fri, Apr 13, 2018 at 1:19 PM Kenneth Knowles wrote: > On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw > wrote: > >> Also, the only use for payloads is because "User Counter" is currently a >> single URN, rather than using the namespacing characteristics

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Kenneth Knowles
On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw wrote: > Also, the only use for payloads is because "User Counter" is currently a > single URN, rather than using the namespacing characteristics of URNs to > map user names onto distinct metric names. > Can they be URNs? I

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Robert Bradshaw
+1 to keeping things simple, both in code and the model to understand. I like thinking of things as (name, value, type) triples. Historically, we've packed the entity name (e.g. PTransform name) into the string name field and parsed it out in various places; I think it's worth pulling this out
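
A minimal sketch of the (name, value, type) idea above, with the entity carried as a separate label instead of being packed into the name string. The class `MetricUpdate`, the helper `from_packed`, the `"ptransform"` label key, and the `/`-separated legacy name format are all hypothetical and chosen for illustration; none of them is taken from the proposal or from Beam's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class MetricUpdate:
    """Hypothetical (name, value, type) triple plus context labels."""
    name: str                 # e.g. "ElementCount"
    value: Any                # the reported value
    type: str                 # e.g. "sumint"
    labels: Dict[str, str] = field(default_factory=dict)


def from_packed(packed: str, value: Any, type_: str, sep: str = "/") -> MetricUpdate:
    """Split an assumed legacy 'Entity/MetricName' string into name and label."""
    entity, _, metric = packed.rpartition(sep)
    # Only attach the label when an entity prefix was actually present.
    labels = {"ptransform": entity} if entity else {}
    return MetricUpdate(name=metric, value=value, type=type_, labels=labels)


update = from_packed("MyTransform/ElementCount", 5, "sumint")
```

With the entity in `labels`, consumers no longer need to parse the name to find out which PTransform a value belongs to.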

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Robert Bradshaw
On Fri, Apr 13, 2018 at 10:10 AM Alex Amato wrote: > > *Thank you for this clarification. I think the table of files fits into > the model as one of type string-set (with union as aggregation). * > It's not a list of files, it's a list of metadata for each file, several >

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-13 Thread Robert Bradshaw
On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles wrote: > > To Robert's proto: > > // A mapping of entities to (encoded) values. >> map values; >> > > Are the keys here the names of the metrics, aka what is used for URNs in > the doc? > >> They're the entities to

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-12 Thread Robert Bradshaw
On Thu, Apr 12, 2018 at 8:17 PM Alex Amato wrote: > I agree that there is some confusion about concepts. Here are several > concepts which have come up in discussions, as I see them (not official > names). > > *Metric* > >- For the purposes of my document, I have been

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-12 Thread Alex Amato
I agree that there is some confusion about concepts. Here are several concepts which have come up in discussions, as I see them (not official names). *Metric* - For the purposes of my document, I have been referring to a Metric as any sort of information the SDK can send to the Runner

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-12 Thread Kenneth Knowles
Agree with all of this. It echoes a thread on the doc that I was going to bring here. Let's keep it simple and use concrete use cases to drive additional abstraction if/when it becomes compelling. Kenn On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers wrote: > Sounds perfect.

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-12 Thread Ben Chambers
Sounds perfect. Just wanted to make sure that "custom metrics of supported type" didn't include new ways of aggregating ints. As long as that means we have a fixed set of aggregations (that align with what users want and what metrics back ends support) it seems like we are doing user metrics right.

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-12 Thread Romain Manni-Bucau
Maybe leave it out until it is proven to be needed. ATM counters are used a lot but others are less mainstream, so being too fine-grained from the start can just add complexity and bugs in impls IMHO. On Apr 12, 2018 08:06, "Robert Bradshaw" wrote: > By "type" of metric, I mean both

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-12 Thread Robert Bradshaw
By "type" of metric, I mean both the data types (including their encoding) and accumulator strategy. So sumint would be a type, as would double-distribution. On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers wrote: > When you say type do you mean accumulator type, result type,
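
One way to read the definition above is that a metric "type" bundles a value representation with a rule for combining values. The sketch below illustrates that reading only; `MetricType`, `dist_of`, and the `(count, sum, min, max)` tuple layout for a distribution are assumptions made for the example, not the encodings or names used in the proposal.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Iterable, Tuple


@dataclass(frozen=True)
class MetricType:
    """Hypothetical metric type: a name, an identity value, and a combiner."""
    name: str
    zero: Any
    combine: Callable[[Any, Any], Any]

    def aggregate(self, values: Iterable[Any]) -> Any:
        # Fold all reported values into one using the type's combiner.
        return reduce(self.combine, values, self.zero)


# "sumint": integer data type, summing accumulator.
SUM_INT = MetricType("sumint", 0, lambda a, b: a + b)

Dist = Tuple[int, float, float, float]  # assumed layout: (count, sum, min, max)


def _merge_dist(a: Dist, b: Dist) -> Dist:
    return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))


# "double-distribution": double data type, distribution accumulator.
DOUBLE_DISTRIBUTION = MetricType(
    "double-distribution", (0, 0.0, float("inf"), float("-inf")), _merge_dist
)


def dist_of(x: float) -> Dist:
    """Wrap a single observation as a one-element distribution."""
    return (1, float(x), float(x), float(x))


total = SUM_INT.aggregate([1, 2, 3])
dist = DOUBLE_DISTRIBUTION.aggregate([dist_of(1.0), dist_of(3.0)])
```

Under this reading, two metrics with the same data type but different combiners (for example a sum and a max over ints) would count as different types.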

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-11 Thread Ben Chambers
When you say type do you mean accumulator type, result type, or accumulator strategy? Specifically, what is the "type" of sumint, sumlong, meanlong, etc? On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw wrote: > Fully custom metric types is the "more speculative and difficult"

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-11 Thread Robert Bradshaw
Fully custom metric types is the "more speculative and difficult" feature that I was proposing we kick down the road (and may never get to). What I'm suggesting is that we support custom metrics of standard type. On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers wrote: > The

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-11 Thread Ben Chambers
The metric API is designed to prevent user-defined metric types based on the fact they just weren't used enough to justify support. Is there a reason we are bringing that complexity back? Shouldn't we just need the ability for the standard set plus any special system metrics? On Wed, Apr 11,

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-11 Thread Alex Amato
Thank you everyone for your feedback so far. I have made a revision today which is to make all metrics refer to a primary entity, so I have restructured some of the protos a little bit. The point of this change was to futureproof the possibility of allowing custom user metrics, with custom

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-10 Thread Alex Amato
I've gathered a lot of feedback so far and want to make a decision by Friday, and begin working on related PRs next week. Please make sure that you provide your feedback before then and I will post the final decisions made to this thread Friday afternoon. On Thu, Apr 5, 2018 at 1:38 AM Ismaël

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-05 Thread Ismaël Mejía
Nice, I created a short link so people can refer to it easily in future discussions, website, etc. https://s.apache.org/beam-fn-api-metrics Thanks for sharing. On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw wrote: > Thanks for the nice writeup. I added some comments. >

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-04 Thread Robert Bradshaw
Thanks for the nice writeup. I added some comments. On Wed, Apr 4, 2018 at 1:53 PM Alex Amato wrote: > Hello beam community, > > Thank you everyone for your initial feedback on this proposal so far. I > have made some revisions based on the feedback. There were some larger >

Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

2018-04-04 Thread Alex Amato
Hello beam community, Thank you everyone for your initial feedback on this proposal so far. I have made some revisions based on the feedback. There were some larger questions asking about alternatives. For each of these I have added a section tagged with [Alternatives] and discussed my