Yes, there is now a new PTransform called GroupIntoBatches.
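For reference, GroupIntoBatches takes a keyed PCollection and emits per-key batches of at most a configured size. Setting windowing, state, and timers aside, its per-key batching semantics boil down to something like the following plain-Java sketch (the class and method names here are illustrative, not Beam API):

```java
import java.util.ArrayList;
import java.util.List;

public class GroupIntoBatchesSketch {

    // Splits one key's values into batches of at most batchSize elements.
    // The last batch may be smaller; in Beam it is flushed on window
    // expiration rather than dropped.
    static <V> List<List<V>> batch(List<V> values, int batchSize) {
        List<List<V>> batches = new ArrayList<>();
        List<V> current = new ArrayList<>();
        for (V v : values) {
            current.add(v);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // trailing partial batch
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(batch(List.of(1, 2, 3, 4, 5), 2));
        // [[1, 2], [3, 4], [5]]
    }
}
```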
Best,
Etienne
On 11/07/2017 at 02:38, Robert Bradshaw wrote:
Sorry, just saw https://github.com/apache/beam/pull/2211
On Mon, Jul 10, 2017 at 5:37 PM, Robert Bradshaw wrote:
> Any progress on this?
>
> On Thu, Mar 9, 2017 at 1:43 AM, Etienne Chauchot wrote:
>> Hi all,
>>
>> We had a discussion with Kenn yesterday
Any progress on this?
On Thu, Mar 9, 2017 at 1:43 AM, Etienne Chauchot wrote:
> Hi all,
>
> We had a discussion with Kenn yesterday about point 1 below, I would like
> to note it here on the ML:
>
> Using new method timer.set() instead of timer.setForNowPlus() makes the
>
Hi all,
We had a discussion with Kenn yesterday about point 1 below, I would
like to note it here on the ML:
Using new method timer.set() instead of timer.setForNowPlus() makes the
timer fire at the right time.
Another thing, regarding point 2: if I inject the window in the @OnTimer
Hi,
@JB: good to know for the roadmap! thanks
@Kenn: just to be clear: the timer fires fine. What I noticed is that it
seems to be SET more than once, because timer.setForNowPlus is called in the
@ProcessElement method. I am not 100% sure of it; what I noticed is that
it started to work fine
Hi,
On 27/01/2017 at 19:44, Robert Bradshaw wrote:
On Fri, Jan 27, 2017 at 6:55 AM, Etienne Chauchot wrote:
Hi Robert,
On 26/01/2017 at 18:17, Robert Bradshaw wrote:
First off, let me say that a *correctly* batching DoFn is a lot of
value, especially because it's
It makes sense with the PTransform, less invasive, but the user has to
define two functions (one perElement, the other perBatch). I
like the DoFn approach with annotations, and I would do the same for the
batch.
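To make the two-function shape concrete, a hypothetical interface for such a transform (not actual Beam API; all names here are illustrative) might look like:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BatchingParDoSketch {

    // Hypothetical user contract: perElement prepares each element,
    // perBatch processes a full buffer at once (e.g. a bulk RPC).
    interface BatchingFn<I, O> {
        I perElement(I element);
        List<O> perBatch(List<I> batch);
    }

    // Driver applying the two callbacks with a fixed batch size.
    static <I, O> List<O> run(List<I> input, int batchSize, BatchingFn<I, O> fn) {
        List<O> out = new ArrayList<>();
        List<I> buffer = new ArrayList<>();
        for (I e : input) {
            buffer.add(fn.perElement(e));
            if (buffer.size() == batchSize) {
                out.addAll(fn.perBatch(buffer));
                buffer = new ArrayList<>();
            }
        }
        if (!buffer.isEmpty()) {
            out.addAll(fn.perBatch(buffer)); // flush the trailing batch
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> result = run(List.of("a", "b", "c"), 2,
            new BatchingFn<String, String>() {
                public String perElement(String e) { return e; }
                public List<String> perBatch(List<String> b) {
                    List<String> r = new ArrayList<>();
                    for (String s : b) r.add(s.toUpperCase(Locale.ROOT));
                    return r;
                }
            });
        System.out.println(result); // [A, B, C]
    }
}
```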
The "trigger/window" is a key part as well, as even if the batch is not
Hi,
Indeed, I did not want to be invasive in DoFn either. So I chose to
implement it as a PTransform.
Please be aware that it is just the very beginning, it is still in the
simple naive approach (no windowing and no cross-bundle buffering support);
I plan to use state API to buffer
Hi Robert,
On 26/01/2017 at 18:17, Robert Bradshaw wrote:
First off, let me say that a *correctly* batching DoFn is a lot of
value, especially because it's (too) easy to (often unknowingly)
implement it incorrectly.
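A concrete illustration of the pitfall: buffering in @ProcessElement and flushing only when the buffer is full silently drops the trailing partial batch unless there is also a flush on bundle or window completion. A self-contained sketch of the two behaviors (plain Java, not Beam API):

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveBatching {

    // Mimics a DoFn that buffers elements and emits a batch every
    // batchSize elements from @ProcessElement.
    static List<List<String>> processAll(List<String> input, int batchSize,
                                         boolean flushAtEnd) {
        List<List<String>> output = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (String e : input) {
            buffer.add(e);
            if (buffer.size() == batchSize) {
                output.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        // The step that is easy to forget: the @FinishBundle /
        // window-expiration flush for the last, partial batch.
        if (flushAtEnd && !buffer.isEmpty()) {
            output.add(new ArrayList<>(buffer));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> in = List.of("a", "b", "c", "d", "e");
        System.out.println(processAll(in, 2, false)); // [[a, b], [c, d]] -- "e" lost
        System.out.println(processAll(in, 2, true));  // [[a, b], [c, d], [e]]
    }
}
```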
I definitely agree, I put a similar comment in another email. As an
example
On Thu, Jan 26, 2017 at 6:58 PM, Kenneth Knowles
wrote:
> On Thu, Jan 26, 2017 at 4:15 PM, Robert Bradshaw <
> rober...@google.com.invalid> wrote:
>
>> On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov
>> wrote:
>> >
>> > you can't wrap
Hi Eugene
A simple way would be to create a BatchedDoFn in an extension.
WDYT?
Regards
JB
On Jan 26, 2017, at 21:48, Eugene Kirpichov wrote:
>I don't think we should make batching a core feature of the Beam
>programming model (by adding it to DoFn as
On Thu, Jan 26, 2017 at 4:15 PM, Robert Bradshaw <
rober...@google.com.invalid> wrote:
> On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov
> wrote:
> >
> > you can't wrap DoFn's, period
>
> As a simple example, given a DoFn it's perfectly natural to want
> to
On Thu, Jan 26, 2017 at 4:20 PM, Ben Chambers
wrote:
> Here's an example API that would make this part of a DoFn. The idea here is
> that it would still be run as `ParDo.of(new MyBatchedDoFn())`, but the
> runner (and DoFnRunner) could see that it has asked for
The third option for batching:
- Functionality within the DoFn and DoFnRunner built as part of the SDK.
I haven't thought through Batching, but at least for the
IntraBundleParallelization use case this actually does make sense to expose
as a part of the model. Knowing that a DoFn supports
On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:
> The class for invoking DoFn's,
> DoFnInvokers, is absent from the SDK (and present in runners-core) for a
> good reason.
>
This would be true if it weren't for that pesky DoFnTester :-)
And even if we
I agree that wrapping the DoFn is probably not the way to go, because the
DoFn may be quite tricky due to all the reflective features: e.g. how do
you automatically "batch" a DoFn that uses state and timers? What about a
DoFn that uses a BoundedWindow parameter? What about a splittable DoFn?
What
On Thu, Jan 26, 2017 at 3:31 PM, Ben Chambers wrote:
I think that wrapping the DoFn is tricky -- we backed out
IntraBundleParallelization because it did that, and it has weird
interactions with both the reflective DoFn and windowing. We could maybe
make some kind of "DoFnDelegatingDoFn" that could act as a base class and
get some of that right,
On Thu, Jan 26, 2017 at 12:48 PM, Eugene Kirpichov wrote:
I don't think we should make batching a core feature of the Beam
programming model (by adding it to DoFn as this code snippet implies). I'm
reasonably sure there are less invasive ways of implementing it.
On Thu, Jan 26, 2017 at 12:22 PM, Jean-Baptiste Onofré wrote:
Agree, I'm curious as well.
I guess it would be something like:
.apply(ParDo.of(new DoFn<InputT, OutputT>() {
  @Override
  public long batchSize() {
    return 1000;
  }

  @ProcessElement
  public void processElement(ProcessContext context) {
    ...
  }
}));
If batchSize (overridden by the user) returns a
First off, let me say that a *correctly* batching DoFn is a lot of
value, especially because it's (too) easy to (often unknowingly)
implement it incorrectly.
My take is that a BatchingParDo should be a PTransform that takes a
DoFn<Iterable<InputT>, ? extends Iterable<OutputT>> as a parameter, as
Hi Etienne,
Could you post some snippets of how your transform is to be used in a
pipeline? I think that would make it easier to discuss on this thread and
could save a lot of churn if the discussion ends up leading to a different
API.
On Thu, Jan 26, 2017 at 8:29 AM Etienne Chauchot
Wonderful!
Thanks Kenn!
Etienne
On 26/01/2017 at 15:34, Kenneth Knowles wrote:
Hi Etienne,
I was drafting a proposal about @OnWindowExpiration when this email
arrived. I thought I would try to quickly unblock you by responding with a
TL;DR: you can achieve your goals with state & timers
Fantastic!
Let me take a look at the Spark runner ;)
Thanks !
Regards
JB