Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-10-12 Thread Jean-Baptiste Onofré
Great! Let me experiment a bit with SDF (especially in the IO). I'll keep you posted. Regards JB On 10/13/2016 02:55 AM, Eugene Kirpichov wrote: Hey all, An update: https://github.com/apache/incubator-beam/pull/896 has been merged, laying groundwork and adding support for splittable DoFn to

Re: Introducing a Redistribute transform

2016-10-12 Thread Jean-Baptiste Onofré
Hi Eugene, thanks for the update on the mailing list, much appreciated. Let me take a deeper look at that. Regards JB On 10/13/2016 02:03 AM, Eugene Kirpichov wrote: So, based on some offline discussion, the problem is more complex. There's several classes of ultimate user needs which are

Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

2016-10-12 Thread Jean-Baptiste Onofré
Hi Pei, good one! We now have to update the 'other' runners. Thanks. Regards JB On 10/12/2016 10:48 PM, Pei He wrote: Hi, I just want to bump this thread and bring it to attention. PipelineResult now has cancel() and waitUntilFinish(). However, currently only DataflowRunner supports

Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-10-12 Thread Eugene Kirpichov
Hey all, An update: https://github.com/apache/incubator-beam/pull/896 has been merged, laying groundwork and adding support for splittable DoFn to the in-memory runner. What this PR does: - It defines an API, in full accordance with the proposal discussed on this thread. - It adds a mostly

Re: Introducing a Redistribute transform

2016-10-12 Thread Eugene Kirpichov
So, based on some offline discussion, the problem is more complex. There's several classes of ultimate user needs which are potentially orthogonal, even though the current Reshuffle transform, as implemented by the Dataflow runner, happens to satisfy all of them at the same time: 1. Checkpointing

Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

2016-10-12 Thread Pei He
Hi, I just want to bump this thread and bring it to attention. PipelineResult now has cancel() and waitUntilFinish(). However, currently only DataflowRunner supports them, in DataflowPipelineJob. We agreed that users should call "p.run().waitUntilFinish()" if they want to block. But, if they do

Re: Jenkins build is back to stable : beam_PostCommit_MavenVerify » Apache Beam :: Examples :: Java #1503

2016-10-12 Thread Dan Halperin
Just an FYI that the issues here were legitimate issues in an external service that have since been resolved. They were present for approximately 90 minutes in a small set of places, and we were affected :) On Tue, Oct 11, 2016 at 7:37 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote:

Re: Simplifying User-Defined Metrics in Beam

2016-10-12 Thread Robert Bradshaw
+1 to the new metrics design. I strongly favor B as well. On Wed, Oct 12, 2016 at 10:54 AM, Kenneth Knowles wrote: Correction: In my eagerness to see the end of aggregators, I mistook the intention. Both A and B leave aggregators in place until there is a

Re: Simplifying User-Defined Metrics in Beam

2016-10-12 Thread Kenneth Knowles
Correction: In my eagerness to see the end of aggregators, I mistook the intention. Both A and B leave aggregators in place until there is a replacement. In which case, I am strongly in favor of B. As soon as we can remove aggregators, I think we should. On Wed, Oct 12, 2016 at 10:48 AM Kenneth

Re: Simplifying User-Defined Metrics in Beam

2016-10-12 Thread Kenneth Knowles
Huzzah! This is IMO a really great change. I agree that we can get something in to allow work to continue, and improve the API as we learn. On Wed, Oct 12, 2016 at 10:20 AM Ben Chambers wrote: 3. One open question is what to do with Aggregators. In the doc I