Re: Introducing a Redistribute transform

2016-10-12 Thread Jean-Baptiste Onofré
Hi Eugene,

thanks for the update on the mailing list, much appreciated. Let me take a deeper look at that.

Regards
JB

On 10/13/2016 02:03 AM, Eugene Kirpichov wrote:
> So, based on some offline discussion, the problem is more complex. There's several classes of ultimate user needs which are

Re: Introducing a Redistribute transform

2016-10-12 Thread Eugene Kirpichov
So, based on some offline discussion, the problem is more complex. There's several classes of ultimate user needs which are potentially orthogonal, even though the current Reshuffle transform, as implemented by the Dataflow runner, happens to satisfy all of them at the same time:

1. Checkpointing

Re: Introducing a Redistribute transform

2016-10-10 Thread Eugene Kirpichov
Hi Amit,

The transform, the way it's implemented, actually does several things at the same time and that's why it's tricky to document it.

Redistribute.arbitrarily():
- Introduces a fusion barrier (in runners that have it), making sure that the runner can fully parallelize processing the output

Re: Introducing a Redistribute transform

2016-10-10 Thread Amit Sela
On Mon, Oct 10, 2016 at 9:21 PM Robert Bradshaw wrote:
> On Sat, Oct 8, 2016 at 7:31 AM, Amit Sela wrote:
> > > Hi Eugene,
> > >
> > > This is very interesting.
> > > Let me see if I get this right, the "Redistribute" transformation assigns