Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-07-11 Thread Etienne Chauchot
Yes there is now a new PTransform that is called GroupIntoBatches Best, Etienne Le 11/07/2017 à 02:38, Robert Bradshaw a écrit : Sorry, just saw https://github.com/apache/beam/pull/2211 On Mon, Jul 10, 2017 at 5:37 PM, Robert Bradshaw wrote: Any progress on this? On

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-07-10 Thread Robert Bradshaw
Sorry, just saw https://github.com/apache/beam/pull/2211 On Mon, Jul 10, 2017 at 5:37 PM, Robert Bradshaw wrote: > Any progress on this? > > On Thu, Mar 9, 2017 at 1:43 AM, Etienne Chauchot wrote: >> Hi all, >> >> We had a discussion with Kenn yesterday

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-07-10 Thread Robert Bradshaw
Any progress on this? On Thu, Mar 9, 2017 at 1:43 AM, Etienne Chauchot wrote: > Hi all, > > We had a discussion with Kenn yesterday about point 1 bellow, I would like > to note it here on the ML: > > Using new method timer.set() instead of timer.setForNowPlus() makes the >

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-03-09 Thread Etienne Chauchot
Hi all, We had a discussion with Kenn yesterday about point 1 bellow, I would like to note it here on the ML: Using new method timer.set() instead of timer.setForNowPlus() makes the timer fire at the right time. Another thing, regarding point 2: if I inject the window in the @Ontimer

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-02-09 Thread Etienne Chauchot
Hi, @JB: good to know for the roadmap! thanks @Kenn: just to be clear: the timer fires fine. What I noticed is that it seems to be SET more than once because timer.setForNowPlus in called the @ProcessElement method. I am not 100% sure of it, what I noticed is that it started to work fine

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-30 Thread Etienne Chauchot
Hi, Le 27/01/2017 à 19:44, Robert Bradshaw a écrit : On Fri, Jan 27, 2017 at 6:55 AM, Etienne Chauchot wrote: Hi Robert, Le 26/01/2017 à 18:17, Robert Bradshaw a écrit : First off, let me say that a *correctly* batching DoFn is a lot of value, especially because it's

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-27 Thread Jean-Baptiste Onofré
It makes sense with the PTransform, less invasive, but the user has to define to define two functions (one perElement, the other perBatch). I like the DoFn approach with annotations, and I would do the same for the batch. The "trigger/window" is a key part as well, as even if the batch is not

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-27 Thread Robert Bradshaw
On Fri, Jan 27, 2017 at 6:55 AM, Etienne Chauchot wrote: > Hi Robert, > > Le 26/01/2017 à 18:17, Robert Bradshaw a écrit : >> >> First off, let me say that a *correctly* batching DoFn is a lot of >> value, especially because it's (too) easy to (often unknowingly) >> implement

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-27 Thread Etienne Chauchot
Hi, Indeed, I did not want to be invasive in DoFn either. So I chose to implement it as a PTransform. Please be aware that it is just the very beginning, it is still in the simple naive approach (no window and no buffering trans-bundle support), I plan to use state API to buffer

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-27 Thread Etienne Chauchot
Hi Robert, Le 26/01/2017 à 18:17, Robert Bradshaw a écrit : First off, let me say that a *correctly* batching DoFn is a lot of value, especially because it's (too) easy to (often unknowingly) implement it incorrectly. I definitely agree, I put a similar comment in another email. As an example

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
On Thu, Jan 26, 2017 at 6:58 PM, Kenneth Knowles wrote: > On Thu, Jan 26, 2017 at 4:15 PM, Robert Bradshaw < > rober...@google.com.invalid> wrote: > >> On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov >> wrote: >> > >> > you can't wrap

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Jean-Baptiste Onofré
⁣Hi Eugene A simple way would be to create a BatchedDoFn in an extension. WDYT ? Regards JB On Jan 26, 2017, 21:48, at 21:48, Eugene Kirpichov wrote: >I don't think we should make batching a core feature of the Beam >programming model (by adding it to DoFn as

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Kenneth Knowles
On Thu, Jan 26, 2017 at 4:15 PM, Robert Bradshaw < rober...@google.com.invalid> wrote: > On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov > wrote: > > > > you can't wrap DoFn's, period > > As a simple example, given a DoFn it's perfectly natural to want > to

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
On Thu, Jan 26, 2017 at 4:20 PM, Ben Chambers wrote: > Here's an example API that would make this part of a DoFn. The idea here is > that it would still be run as `ParDo.of(new MyBatchedDoFn())`, but the > runner (and DoFnRunner) could see that it has asked for

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Ben Chambers
The third option for batching: - Functionality within the DoFn and DoFnRunner built as part of the SDK. I haven't thought through Batching, but at least for the IntraBundleParallelization use case this actually does make sense to expose as a part of the model. Knowing that a DoFn supports

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Kenneth Knowles
On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > The class for invoking DoFn's, > DoFnInvokers, is absent from the SDK (and present in runners-core) for a > good reason. > This would be true if it weren't for that pesky DoFnTester :-) And even if we

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Eugene Kirpichov
I agree that wrapping the DoFn is probably not the way to go, because the DoFn may be quite tricky due to all the reflective features: e.g. how do you automatically "batch" a DoFn that uses state and timers? What about a DoFn that uses a BoundedWindow parameter? What about a splittable DoFn? What

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
On Thu, Jan 26, 2017 at 3:31 PM, Ben Chambers wrote: > I think that wrapping the DoFn is tricky -- we backed out > IntraBundleParallelization because it did that, and it has weird > interactions with both the reflective DoFn and windowing. We could maybe > make some

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Ben Chambers
I think that wrapping the DoFn is tricky -- we backed out IntraBundleParallelization because it did that, and it has weird interactions with both the reflective DoFn and windowing. We could maybe make some kind of "DoFnDelegatingDoFn" that could act as a base class and get some of that right,

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
On Thu, Jan 26, 2017 at 12:48 PM, Eugene Kirpichov wrote: > I don't think we should make batching a core feature of the Beam > programming model (by adding it to DoFn as this code snippet implies). I'm > reasonably sure there are less invasive ways of implementing

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Eugene Kirpichov
I don't think we should make batching a core feature of the Beam programming model (by adding it to DoFn as this code snippet implies). I'm reasonably sure there are less invasive ways of implementing it. On Thu, Jan 26, 2017 at 12:22 PM Jean-Baptiste Onofré wrote: > Agree,

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Jean-Baptiste Onofré
Agree, I'm curious as well. I guess it would be something like: .apply(ParDo(new DoFn() { @Override public long batchSize() { return 1000; } @ProcessElement public void processElement(ProcessContext context) { ... } })); If batchSize (overrided by user) returns a

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
First off, let me say that a *correctly* batching DoFn is a lot of value, especially because it's (too) easy to (often unknowingly) implement it incorrectly. My take is that a BatchingParDo should be a PTransform that takes a DoFn, ? extends Iterable> as a parameter, as

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Eugene Kirpichov
Hi Etienne, Could you post some snippets of how your transform is to be used in a pipeline? I think that would make it easier to discuss on this thread and could save a lot of churn if the discussion ends up leading to a different API. On Thu, Jan 26, 2017 at 8:29 AM Etienne Chauchot

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Etienne Chauchot
Wonderful ! Thanks Kenn ! Etienne Le 26/01/2017 à 15:34, Kenneth Knowles a écrit : Hi Etienne, I was drafting a proposal about @OnWindowExpiration when this email arrived. I thought I would try to quickly unblock you by responding with a TL;DR: you can achieve your goals with state & timers

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Jean-Baptiste Onofré
Fantastic ! Let me take a look on the Spark runner ;) Thanks ! Regards JB On 01/26/2017 03:34 PM, Kenneth Knowles wrote: Hi Etienne, I was drafting a proposal about @OnWindowExpiration when this email arrived. I thought I would try to quickly unblock you by responding with a TL;DR: you can