Re: Committed vs. attempted metrics results

2017-01-26 Thread Aviem Zur
Ben - yes, there is still some ambiguity regarding the querying of the metrics results. You've discussed in this thread the notion that metrics step names should be converted to unique names when aggregating metrics, so that each step will aggregate its own metrics, and not join with other steps

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Jean-Baptiste Onofré
Fantastic ! Let me take a look on the Spark runner ;) Thanks ! Regards JB On 01/26/2017 03:34 PM, Kenneth Knowles wrote: Hi Etienne, I was drafting a proposal about @OnWindowExpiration when this email arrived. I thought I would try to quickly unblock you by responding with a TL;DR: you can

Re: Default Timestamp and Watermark

2017-01-26 Thread Thomas Groh
The default timestamp should be BoundedWindow.TIMESTAMP_MIN_VALUE, which is equivalent to -2**63 microseconds. We also occasionally refer to this timestamp as "negative infinity". The default watermark policy for a bounded source should be negative infinity until all of the data is read, then

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Etienne Chauchot
Wonderful ! Thanks Kenn ! Etienne Le 26/01/2017 à 15:34, Kenneth Knowles a écrit : Hi Etienne, I was drafting a proposal about @OnWindowExpiration when this email arrived. I thought I would try to quickly unblock you by responding with a TL;DR: you can achieve your goals with state & timers

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Eugene Kirpichov
Hi Etienne, Could you post some snippets of how your transform is to be used in a pipeline? I think that would make it easier to discuss on this thread and could save a lot of churn if the discussion ends up leading to a different API. On Thu, Jan 26, 2017 at 8:29 AM Etienne Chauchot

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
First off, let me say that a *correctly* batching DoFn is a lot of value, especially because it's (too) easy to (often unknowingly) implement it incorrectly. My take is that a BatchingParDo should be a PTransform that takes a DoFn, ? extends Iterable> as a parameter, as

Re: Committed vs. attempted metrics results

2017-01-26 Thread Ben Chambers
It think relaxing the query to not be an exact match is reasonable. I'm wondering if it should be substring or regex. either one preserves the existing behavior of, when passed a full step path returning only the metrics for that specific step, but it adds the ability to just know approximately

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Jean-Baptiste Onofré
Agree, I'm curious as well. I guess it would be something like: .apply(ParDo(new DoFn() { @Override public long batchSize() { return 1000; } @ProcessElement public void processElement(ProcessContext context) { ... } })); If batchSize (overrided by user) returns a

Re: Default Timestamp and Watermark

2017-01-26 Thread Kenneth Knowles
On Thu, Jan 26, 2017 at 9:48 AM, Thomas Groh wrote: > > The default watermark policy for a bounded source should be negative > infinity until all of the data is read, then positive infinity. Just to elaborate - there isn't a way for a bounded source to communicate a

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Kenneth Knowles
On Thu, Jan 26, 2017 at 4:15 PM, Robert Bradshaw < rober...@google.com.invalid> wrote: > On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov > wrote: > > > > you can't wrap DoFn's, period > > As a simple example, given a DoFn it's perfectly natural to want > to

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-26 Thread María García Herrero
Congratulations and thank you for your contributions thus far! On Thu, Jan 26, 2017 at 6:00 PM, Robert Bradshaw < rober...@google.com.invalid> wrote: > Welcome and congratulations! > > On Thu, Jan 26, 2017 at 5:05 PM, Sourabh Bajaj > wrote: > > Congrats!! > > >

Re: Conceptually, what are bundles?

2017-01-26 Thread Etienne Chauchot
Le 25/01/2017 à 20:34, Kenneth Knowles a écrit : There's actually not a JIRA filed beyond BEAM-25 for what Eugene is referring to. Context: Prior to windowing and streaming it was safe to buffer elements in @ProcessElement and then actually perform output in @FinishBundle. This pattern is only

Re: Conceptually, what are bundles?

2017-01-26 Thread Jean-Baptiste Onofré
It makes sense. Agreed. Regards JB On 01/25/2017 08:34 PM, Kenneth Knowles wrote: There's actually not a JIRA filed beyond BEAM-25 for what Eugene is referring to. Context: Prior to windowing and streaming it was safe to buffer elements in @ProcessElement and then actually perform output in

Re: Better developer instructions for using Maven?

2017-01-26 Thread Aljoscha Krettek
+1 to what Dan said On Wed, 25 Jan 2017 at 21:40 Kenneth Knowles wrote: > +1 > > On Jan 25, 2017 11:15, "Jean-Baptiste Onofré" wrote: > > > +1 > > > > It sounds good to me. > > > > Thanks Dan ! > > > > Regards > > JB⁣​ > > > > On Jan 25, 2017, 19:39,

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Eugene Kirpichov
I don't think we should make batching a core feature of the Beam programming model (by adding it to DoFn as this code snippet implies). I'm reasonably sure there are less invasive ways of implementing it. On Thu, Jan 26, 2017 at 12:22 PM Jean-Baptiste Onofré wrote: > Agree,

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-26 Thread Thomas Weise
Congrats! On Thu, Jan 26, 2017 at 7:49 PM, María García Herrero < mari...@google.com.invalid> wrote: > Congratulations and thank you for your contributions thus far! > > On Thu, Jan 26, 2017 at 6:00 PM, Robert Bradshaw < > rober...@google.com.invalid> wrote: > > > Welcome and congratulations! >

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-26 Thread Aviem Zur
Congrats! On Fri, Jan 27, 2017, 06:25 Thomas Weise wrote: > Congrats! > > > On Thu, Jan 26, 2017 at 7:49 PM, María García Herrero < > mari...@google.com.invalid> wrote: > > > Congratulations and thank you for your contributions thus far! > > > > On Thu, Jan 26, 2017 at 6:00 PM,

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-26 Thread Jean-Baptiste Onofré
Welcome aboard !⁣ Regards JB On Jan 27, 2017, 01:27, at 01:27, Davor Bonaci wrote: >Please join me and the rest of Beam PMC in welcoming the following >contributors as our newest committers. They have significantly >contributed >to the project in different ways, and we look

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Jean-Baptiste Onofré
⁣Hi Eugene A simple way would be to create a BatchedDoFn in an extension. WDYT ? Regards JB On Jan 26, 2017, 21:48, at 21:48, Eugene Kirpichov wrote: >I don't think we should make batching a core feature of the Beam >programming model (by adding it to DoFn as

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-26 Thread Kobi Salant
Congrats! Well deserved Stas בתאריך 27 בינו' 2017 7:26,‏ "Frances Perry" כתב: > Woohoo! Congrats ;-) > > On Thu, Jan 26, 2017 at 9:05 PM, Jean-Baptiste Onofré > wrote: > > > Welcome aboard !⁣ > > > > Regards > > JB > > > > On Jan 27, 2017, 01:27, at

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
On Thu, Jan 26, 2017 at 6:58 PM, Kenneth Knowles wrote: > On Thu, Jan 26, 2017 at 4:15 PM, Robert Bradshaw < > rober...@google.com.invalid> wrote: > >> On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov >> wrote: >> > >> > you can't wrap

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-26 Thread Jason Kuster
Congrats all! Very exciting. :) On Thu, Jan 26, 2017 at 4:48 PM, Jesse Anderson wrote: > Welcome! > > On Thu, Jan 26, 2017, 7:27 PM Davor Bonaci wrote: > > > Please join me and the rest of Beam PMC in welcoming the following > > contributors as our

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-26 Thread Sourabh Bajaj
Congrats!! On Thu, Jan 26, 2017 at 5:02 PM Jason Kuster wrote: > Congrats all! Very exciting. :) > > On Thu, Jan 26, 2017 at 4:48 PM, Jesse Anderson > wrote: > > > Welcome! > > > > On Thu, Jan 26, 2017, 7:27 PM Davor Bonaci

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
On Thu, Jan 26, 2017 at 4:20 PM, Ben Chambers wrote: > Here's an example API that would make this part of a DoFn. The idea here is > that it would still be run as `ParDo.of(new MyBatchedDoFn())`, but the > runner (and DoFnRunner) could see that it has asked for

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-26 Thread Robert Bradshaw
Welcome and congratulations! On Thu, Jan 26, 2017 at 5:05 PM, Sourabh Bajaj wrote: > Congrats!! > > On Thu, Jan 26, 2017 at 5:02 PM Jason Kuster > wrote: > >> Congrats all! Very exciting. :) >> >> On Thu, Jan 26, 2017 at 4:48 PM,

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-26 Thread Jesse Anderson
Welcome! On Thu, Jan 26, 2017, 7:27 PM Davor Bonaci wrote: > Please join me and the rest of Beam PMC in welcoming the following > contributors as our newest committers. They have significantly contributed > to the project in different ways, and we look forward to many more >

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Ben Chambers
I think that wrapping the DoFn is tricky -- we backed out IntraBundleParallelization because it did that, and it has weird interactions with both the reflective DoFn and windowing. We could maybe make some kind of "DoFnDelegatingDoFn" that could act as a base class and get some of that right,

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
On Thu, Jan 26, 2017 at 3:31 PM, Ben Chambers wrote: > I think that wrapping the DoFn is tricky -- we backed out > IntraBundleParallelization because it did that, and it has weird > interactions with both the reflective DoFn and windowing. We could maybe > make some

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Eugene Kirpichov
I agree that wrapping the DoFn is probably not the way to go, because the DoFn may be quite tricky due to all the reflective features: e.g. how do you automatically "batch" a DoFn that uses state and timers? What about a DoFn that uses a BoundedWindow parameter? What about a splittable DoFn? What

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Kenneth Knowles
On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > The class for invoking DoFn's, > DoFnInvokers, is absent from the SDK (and present in runners-core) for a > good reason. > This would be true if it weren't for that pesky DoFnTester :-) And even if we

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Ben Chambers
The third option for batching: - Functionality within the DoFn and DoFnRunner built as part of the SDK. I haven't thought through Batching, but at least for the IntraBundleParallelization use case this actually does make sense to expose as a part of the model. Knowing that a DoFn supports

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
On Thu, Jan 26, 2017 at 12:48 PM, Eugene Kirpichov wrote: > I don't think we should make batching a core feature of the Beam > programming model (by adding it to DoFn as this code snippet implies). I'm > reasonably sure there are less invasive ways of implementing