Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-10-04 Thread Kenneth Knowles
I've filed the IP clearance record:
http://incubator.apache.org/ip-clearance/beam-dataflow-java-worker.html
https://lists.apache.org/thread.html/1cc32072bd888f6b1335f29db2cc4194ab0c70e35552c327c40122e1@%3Cgeneral.incubator.apache.org%3E

Kenn

On Wed, Oct 3, 2018 at 4:19 PM Boyuan Zhang  wrote:

> Hey all,
>
> We are tracking the Dataflow worker donation process here:
> https://issues.apache.org/jira/browse/BEAM-5634 .
>
> Boyuan Zhang
>
> On Mon, Sep 17, 2018 at 5:05 PM Lukasz Cwik  wrote:
>
>> Thanks all, closing the vote with 18 +1s, 5 of which are binding.
>>
>> I'll try to get this code out; hopefully there won't be any legal issues
>> within Google or with the ASF in performing the donation. I will keep the
>> community up to date.
>>
>> On Mon, Sep 17, 2018 at 3:28 PM Ankur Chauhan  wrote:
>>
>>> +1
>>>
>>> Sent from my iPhone
>>>
>>> On Sep 17, 2018, at 15:26, Ankur Goenka  wrote:
>>>
>>> +1
>>>

Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-10-03 Thread Boyuan Zhang
Hey all,

We are tracking the Dataflow worker donation process here:
https://issues.apache.org/jira/browse/BEAM-5634 .

Boyuan Zhang

On Mon, Sep 17, 2018 at 5:05 PM Lukasz Cwik  wrote:

> Thanks all, closing the vote with 18 +1s, 5 of which are binding.
>
> I'll try to get this code out; hopefully there won't be any legal issues
> within Google or with the ASF in performing the donation. I will keep the
> community up to date.
>
> On Mon, Sep 17, 2018 at 3:28 PM Ankur Chauhan  wrote:
>
>> +1
>>
>> Sent from my iPhone
>>
>> On Sep 17, 2018, at 15:26, Ankur Goenka  wrote:
>>
>> +1
>>
>> On Sun, Sep 16, 2018 at 3:20 AM Maximilian Michels 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> On 15.09.18 20:07, Reuven Lax wrote:
>>> > +1
>>> >
>>> > On Sat, Sep 15, 2018 at 9:40 AM Rui Wang >> > > wrote:
>>> >
>>> > +1
>>> >
>>> > -Rui
>>> >
>>> > On Sat, Sep 15, 2018 at 12:32 AM Robert Bradshaw
>>> > mailto:rober...@google.com>> wrote:
>>> >
>>> > +1 (binding)
>>> >
>>> > On Sat, Sep 15, 2018 at 6:44 AM Tim >> > > wrote:
>>> >
>>> > +1
>>> >
>>> > On 15 Sep 2018, at 01:23, Yifan Zou >> > > wrote:
>>> >
>>> >> +1
>>> >>
>>> >> On Fri, Sep 14, 2018 at 4:20 PM David Morávek
>>> >> mailto:david.mora...@gmail.com
>>> >>
>>> >> wrote:
>>> >>
>>> >> +1
>>> >>
>>> >>
>>> >>
>>> >> On 15 Sep 2018, at 00:59, Anton Kedin
>>> >> mailto:ke...@google.com>> wrote:
>>> >>
>>> >>> +1
>>> >>>
>>> >>> On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold
>>> >>> mailto:amyrv...@google.com>>
>>> wrote:
>>> >>>
>>> >>> +1
>>> >>>
>>> >>> On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang
>>> >>> mailto:boyu...@google.com>>
>>> >>> wrote:
>>> >>>
>>> >>> +1
>>> >>>
>>> >>> On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde
>>> >>> >> >>> > wrote:
>>> >>>
>>> >>> +1
>>> >>>
>>> >>> On Fri, Sep 14, 2018 at 2:40 PM Ahmet
>>> >>> Altay >> >>> > wrote:
>>> >>>
>>> >>> +1 (binding)
>>> >>>
>>> >>> On Fri, Sep 14, 2018 at 2:35 PM,
>>> >>> Lukasz Cwik >> >>> > wrote:
>>> >>>
>>> >>> +1 (binding)
>>> >>>
>>> >>> On Fri, Sep 14, 2018 at 2:34 PM
>>> >>> Pablo Estrada <
>>> pabl...@google.com
>>> >>> >
>>> wrote:
>>> >>>
>>> >>> +1
>>> >>>
>>> >>> On Fri, Sep 14, 2018 at 2:32
>>> >>> PM Andrew Pilloud
>>> >>> >> >>> >> >>
>>> >>> wrote:
>>> >>>
>>> >>> +1
>>> >>>
>>> >>> On Fri, Sep 14, 2018 at
>>> >>> 2:31 PM Lukasz Cwik
>>> >>> >> >>> >
>>> wrote:
>>> >>>
>>> >>> There was generally
>>> >>> positive support and
>>> >>> good feedback[1] but
>>> >>> it was not unanimous.
>>> >>> I wanted to bring the
>>> >>> donation of the
>>> >>> Dataflow worker code
>>> >>> base to Apache Beam
>>> >>> master to a vote.
>>> >>>
>>> >>> +1: Support having
>>> >>> the Dataflow worker
>>> >>> code as part of
>>> >>> Apache Beam master
>>> branch
>>> >>> -1: Dataflow worker
>>> >>> code should live
>>> >>>  

Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-17 Thread Lukasz Cwik
Thanks all, closing the vote with 18 +1s, 5 of which are binding.

I'll try to get this code out; hopefully there won't be any legal issues
within Google or with the ASF in performing the donation. I will keep the
community up to date.

On Mon, Sep 17, 2018 at 3:28 PM Ankur Chauhan  wrote:

> +1
>
> Sent from my iPhone
>
> On Sep 17, 2018, at 15:26, Ankur Goenka  wrote:
>
> +1
>
> On Sun, Sep 16, 2018 at 3:20 AM Maximilian Michels  wrote:
>
>> +1 (binding)
>>
>> On 15.09.18 20:07, Reuven Lax wrote:
>> > +1
>> >
>> > On Sat, Sep 15, 2018 at 9:40 AM Rui Wang > > > wrote:
>> >
>> > +1
>> >
>> > -Rui
>> >
>> > On Sat, Sep 15, 2018 at 12:32 AM Robert Bradshaw
>> > mailto:rober...@google.com>> wrote:
>> >
>> > +1 (binding)
>> >
>> > On Sat, Sep 15, 2018 at 6:44 AM Tim > > > wrote:
>> >
>> > +1
>> >
>> > On 15 Sep 2018, at 01:23, Yifan Zou > > > wrote:
>> >
>> >> +1
>> >>
>> >> On Fri, Sep 14, 2018 at 4:20 PM David Morávek
>> >> mailto:david.mora...@gmail.com>>
>> >> wrote:
>> >>
>> >> +1
>> >>
>> >>
>> >>
>> >> On 15 Sep 2018, at 00:59, Anton Kedin
>> >> mailto:ke...@google.com>> wrote:
>> >>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold
>> >>> mailto:amyrv...@google.com>>
>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang
>> >>> mailto:boyu...@google.com>>
>> >>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde
>> >>> > >>> > wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 2:40 PM Ahmet
>> >>> Altay > >>> > wrote:
>> >>>
>> >>> +1 (binding)
>> >>>
>> >>> On Fri, Sep 14, 2018 at 2:35 PM,
>> >>> Lukasz Cwik > >>> > wrote:
>> >>>
>> >>> +1 (binding)
>> >>>
>> >>> On Fri, Sep 14, 2018 at 2:34 PM
>> >>> Pablo Estrada > >>> >
>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 2:32
>> >>> PM Andrew Pilloud
>> >>> > >>> >
>> >>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at
>> >>> 2:31 PM Lukasz Cwik
>> >>> > >>> >
>> wrote:
>> >>>
>> >>> There was generally
>> >>> positive support and
>> >>> good feedback[1] but
>> >>> it was not unanimous.
>> >>> I wanted to bring the
>> >>> donation of the
>> >>> Dataflow worker code
>> >>> base to Apache Beam
>> >>> master to a vote.
>> >>>
>> >>> +1: Support having
>> >>> the Dataflow worker
>> >>> code as part of
>> >>> Apache Beam master
>> branch
>> >>> -1: Dataflow worker
>> >>> code should live
>> >>> elsewhere
>> >>>
>> >>> 1:
>> >>>
>> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>> >>>
>> >>>
>>
>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-17 Thread Thomas Weise
+1 (binding)

On Mon, Sep 17, 2018 at 3:27 PM Ankur Goenka  wrote:

> +1
>
> On Sun, Sep 16, 2018 at 3:20 AM Maximilian Michels  wrote:
>
>> +1 (binding)
>>
>> On 15.09.18 20:07, Reuven Lax wrote:
>> > +1
>> >
>> > On Sat, Sep 15, 2018 at 9:40 AM Rui Wang > > > wrote:
>> >
>> > +1
>> >
>> > -Rui
>> >
>> > On Sat, Sep 15, 2018 at 12:32 AM Robert Bradshaw
>> > mailto:rober...@google.com>> wrote:
>> >
>> > +1 (binding)
>> >
>> > On Sat, Sep 15, 2018 at 6:44 AM Tim > > > wrote:
>> >
>> > +1
>> >
>> > On 15 Sep 2018, at 01:23, Yifan Zou > > > wrote:
>> >
>> >> +1
>> >>
>> >> On Fri, Sep 14, 2018 at 4:20 PM David Morávek
>> >> mailto:david.mora...@gmail.com>>
>> >> wrote:
>> >>
>> >> +1
>> >>
>> >>
>> >>
>> >> On 15 Sep 2018, at 00:59, Anton Kedin
>> >> mailto:ke...@google.com>> wrote:
>> >>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold
>> >>> mailto:amyrv...@google.com>>
>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang
>> >>> mailto:boyu...@google.com>>
>> >>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde
>> >>> > >>> > wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 2:40 PM Ahmet
>> >>> Altay > >>> > wrote:
>> >>>
>> >>> +1 (binding)
>> >>>
>> >>> On Fri, Sep 14, 2018 at 2:35 PM,
>> >>> Lukasz Cwik > >>> > wrote:
>> >>>
>> >>> +1 (binding)
>> >>>
>> >>> On Fri, Sep 14, 2018 at 2:34 PM
>> >>> Pablo Estrada > >>> >
>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 2:32
>> >>> PM Andrew Pilloud
>> >>> > >>> >
>> >>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at
>> >>> 2:31 PM Lukasz Cwik
>> >>> > >>> >
>> wrote:
>> >>>
>> >>> There was generally
>> >>> positive support and
>> >>> good feedback[1] but
>> >>> it was not unanimous.
>> >>> I wanted to bring the
>> >>> donation of the
>> >>> Dataflow worker code
>> >>> base to Apache Beam
>> >>> master to a vote.
>> >>>
>> >>> +1: Support having
>> >>> the Dataflow worker
>> >>> code as part of
>> >>> Apache Beam master
>> branch
>> >>> -1: Dataflow worker
>> >>> code should live
>> >>> elsewhere
>> >>>
>> >>> 1:
>> >>>
>> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>> >>>
>> >>>
>>
>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-17 Thread Ankur Chauhan
+1

Sent from my iPhone

> On Sep 17, 2018, at 15:26, Ankur Goenka  wrote:
> 
> +1
> 
>> On Sun, Sep 16, 2018 at 3:20 AM Maximilian Michels  wrote:
>> +1 (binding)
>> 
>> On 15.09.18 20:07, Reuven Lax wrote:
>> > +1
>> > 
>> > On Sat, Sep 15, 2018 at 9:40 AM Rui Wang > > > wrote:
>> > 
>> > +1
>> > 
>> > -Rui
>> > 
>> > On Sat, Sep 15, 2018 at 12:32 AM Robert Bradshaw
>> > mailto:rober...@google.com>> wrote:
>> > 
>> > +1 (binding)
>> > 
>> > On Sat, Sep 15, 2018 at 6:44 AM Tim > > > wrote:
>> > 
>> > +1
>> > 
>> > On 15 Sep 2018, at 01:23, Yifan Zou > > > wrote:
>> > 
>> >> +1
>> >>
>> >> On Fri, Sep 14, 2018 at 4:20 PM David Morávek
>> >> mailto:david.mora...@gmail.com>>
>> >> wrote:
>> >>
>> >> +1
>> >>
>> >>
>> >>
>> >> On 15 Sep 2018, at 00:59, Anton Kedin
>> >> mailto:ke...@google.com>> wrote:
>> >>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold
>> >>> mailto:amyrv...@google.com>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang
>> >>> mailto:boyu...@google.com>>
>> >>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde
>> >>> > >>> > wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 2:40 PM Ahmet
>> >>> Altay > >>> > wrote:
>> >>>
>> >>> +1 (binding)
>> >>>
>> >>> On Fri, Sep 14, 2018 at 2:35 PM,
>> >>> Lukasz Cwik > >>> > wrote:
>> >>>
>> >>> +1 (binding)
>> >>>
>> >>> On Fri, Sep 14, 2018 at 2:34 PM
>> >>> Pablo Estrada > >>> > wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at 2:32
>> >>> PM Andrew Pilloud
>> >>> > >>> >
>> >>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, Sep 14, 2018 at
>> >>> 2:31 PM Lukasz Cwik
>> >>> > >>> > 
>> >>> wrote:
>> >>>
>> >>> There was generally
>> >>> positive support and
>> >>> good feedback[1] but
>> >>> it was not unanimous.
>> >>> I wanted to bring the
>> >>> donation of the
>> >>> Dataflow worker code
>> >>> base to Apache Beam
>> >>> master to a vote.
>> >>>
>> >>> +1: Support having
>> >>> the Dataflow worker
>> >>> code as part of
>> >>> Apache Beam master branch
>> >>> -1: Dataflow worker
>> >>> code should live
>> >>> elsewhere
>> >>>
>> >>> 1:
>> >>> 
>> >>> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>> >>>
>> >>>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-17 Thread Ankur Goenka
+1

On Sun, Sep 16, 2018 at 3:20 AM Maximilian Michels  wrote:

> +1 (binding)
>
> On 15.09.18 20:07, Reuven Lax wrote:
> > +1
> >
> > On Sat, Sep 15, 2018 at 9:40 AM Rui Wang  > > wrote:
> >
> > +1
> >
> > -Rui
> >
> > On Sat, Sep 15, 2018 at 12:32 AM Robert Bradshaw
> > mailto:rober...@google.com>> wrote:
> >
> > +1 (binding)
> >
> > On Sat, Sep 15, 2018 at 6:44 AM Tim  > > wrote:
> >
> > +1
> >
> > On 15 Sep 2018, at 01:23, Yifan Zou  > > wrote:
> >
> >> +1
> >>
> >> On Fri, Sep 14, 2018 at 4:20 PM David Morávek
> >> mailto:david.mora...@gmail.com>>
> >> wrote:
> >>
> >> +1
> >>
> >>
> >>
> >> On 15 Sep 2018, at 00:59, Anton Kedin
> >> mailto:ke...@google.com>> wrote:
> >>
> >>> +1
> >>>
> >>> On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold
> >>> mailto:amyrv...@google.com>>
> wrote:
> >>>
> >>> +1
> >>>
> >>> On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang
> >>> mailto:boyu...@google.com>>
> >>> wrote:
> >>>
> >>> +1
> >>>
> >>> On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde
> >>>  >>> > wrote:
> >>>
> >>> +1
> >>>
> >>> On Fri, Sep 14, 2018 at 2:40 PM Ahmet
> >>> Altay  >>> > wrote:
> >>>
> >>> +1 (binding)
> >>>
> >>> On Fri, Sep 14, 2018 at 2:35 PM,
> >>> Lukasz Cwik  >>> > wrote:
> >>>
> >>> +1 (binding)
> >>>
> >>> On Fri, Sep 14, 2018 at 2:34 PM
> >>> Pablo Estrada  >>> >
> wrote:
> >>>
> >>> +1
> >>>
> >>> On Fri, Sep 14, 2018 at 2:32
> >>> PM Andrew Pilloud
> >>>  >>> >
> >>> wrote:
> >>>
> >>> +1
> >>>
> >>> On Fri, Sep 14, 2018 at
> >>> 2:31 PM Lukasz Cwik
> >>>  >>> >
> wrote:
> >>>
> >>> There was generally
> >>> positive support and
> >>> good feedback[1] but
> >>> it was not unanimous.
> >>> I wanted to bring the
> >>> donation of the
> >>> Dataflow worker code
> >>> base to Apache Beam
> >>> master to a vote.
> >>>
> >>> +1: Support having
> >>> the Dataflow worker
> >>> code as part of
> >>> Apache Beam master
> branch
> >>> -1: Dataflow worker
> >>> code should live
> >>> elsewhere
> >>>
> >>> 1:
> >>>
> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
> >>>
> >>>
>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-16 Thread Maximilian Michels

+1 (binding)

On 15.09.18 20:07, Reuven Lax wrote:

+1

On Sat, Sep 15, 2018 at 9:40 AM Rui Wang wrote:

+1

-Rui

On Sat, Sep 15, 2018 at 12:32 AM Robert Bradshaw <rober...@google.com> wrote:

+1 (binding)

On Sat, Sep 15, 2018 at 6:44 AM Tim <timrobertson...@gmail.com> wrote:

+1

On 15 Sep 2018, at 01:23, Yifan Zou <yifan...@google.com> wrote:

+1

On Fri, Sep 14, 2018 at 4:20 PM David Morávek <david.mora...@gmail.com> wrote:

+1

On 15 Sep 2018, at 00:59, Anton Kedin <ke...@google.com> wrote:

+1

On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold <amyrv...@google.com> wrote:

+1

On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang <boyu...@google.com> wrote:

+1

On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde <hero...@google.com> wrote:

+1

On Fri, Sep 14, 2018 at 2:40 PM Ahmet Altay <al...@google.com> wrote:

+1 (binding)

On Fri, Sep 14, 2018 at 2:35 PM, Lukasz Cwik <lc...@google.com> wrote:

+1 (binding)

On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada <pabl...@google.com> wrote:

+1

On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud <apill...@google.com> wrote:

+1

On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik <lc...@google.com> wrote:

There was generally positive support and good feedback[1] but it was
not unanimous. I wanted to bring the donation of the Dataflow worker
code base to Apache Beam master to a vote.

+1: Support having the Dataflow worker code as part of Apache Beam
master branch
-1: Dataflow worker code should live elsewhere

1:
https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E




Re: Donating the Dataflow Worker code to Apache Beam

2018-09-16 Thread Maximilian Michels
If anything, merging the Dataflow Worker code shows Google's commitment 
to the Beam project. Yes, it does solve internal issues with syncing 
their runtime with Beam, but Beam was always about the programming model 
for data processing, not about a specific type of execution engine.


Like any other execution engine, it requires effort from some party to
support its Runner. If that stops being the case, removing the code
again will also be an option.


Overall, it should help to sync and develop portability with the
rest of the community, and that helps to grow the popularity of Beam. I
don't see why we would not support it.


+1 from my side

On 14.09.18 10:02, Romain Manni-Bucau wrote:



On Fri, Sep 14, 2018 at 09:48, Robert Bradshaw wrote:


On Fri, Sep 14, 2018 at 8:00 AM Romain Manni-Bucau
<rmannibu...@gmail.com> wrote:

Well, the IBM runner lives outside Beam, for instance, so this is not
really a point IMHO.

My view is simple:
1. Does this module bring anything to Beam as a project? I
understand your answer as a no (please clarify if I'm wrong).


As has been mentioned, this makes it easier for both developers at
Google and developers outside Google to contribute, which is the
immediate benefit. Longer term, I also hope it leads to more code
sharing (there is currently an unnecessary amount of duplication due
to the pain of developing across this boundary), including of
features that aren't yet in upstream runners but that we'd like to see
(e.g. liquid sharding).



This is half true: external devs will be able to contribute but not test,
so there is no real gain here.


2. Does this module bring anything to Beam or Big Data users?
Same answer.


Dataflow is used by many Beam users; making it work well is in their
interest as well. Whatever makes contributors' lives easier (and
wastes less of their time) will translate into more contributions
(new features, faster bugfixes, ...) as well.



Same point: if you can contribute to something you can't test without
mocks, then you still can't work on it reliably.



So in the end this will not bring anything to the community and
will just solve a Google-internal design issue, so why should it hit
Beam?
I get the "we can't test it" point, but this is wrong since you
can use snapshots and staging repos; if not, the enhancement is
trivial enough to make it doable without adding a dead module to the
Beam tree.

Am I missing anything?


While it's true we *can* test this without it being in Beam, as we
have been doing, it's painful. It's like doing away with presubmits
and only relying on postsubmits, but where you can't even look at
the failure and fix it on your local machine. It's a huge time sink
for all those involved, and not good for transparency or openness
(e.g. there are things that only Googlers can do).


This is the case for any vendor implementation based on Beam, since by
design the dependency runs in this direction.



As has been mentioned, we already do this for Flink, Spark, etc.
There's also a precedent for providing connectors to even non-OSS
systems, e.g. we ship the job submission portions for Dataflow, IO
connectors for Amazon Kinesis, and an S3 filesystem adapter. It
certainly wouldn't be in our, or our users', benefit to remove those.



Agreed, but these are modules that touch users directly, i.e. if
you have some S3 bucket you grab the module and run it on your database.
In the worker case, you will never do that.



Eventually, as has been mentioned on the other thread, I hope our
interfaces become stable enough that it's easy to move much if not
all of this into the respective upstream projects. But that is
certainly not the case right now. 



This is likely where the investment should be made, instead of working
around it by making Beam bigger, increasing its maintenance cost for the
community without real gain, and making it harder to get into.



Hopefully this helps answer your questions as to the benefits for Beam.

On Fri, Sep 14, 2018 at 07:22, Reuven Lax <re...@google.com> wrote:

Dataflow tests are part of Beam post submit, and if a PR
breaks the Dataflow runner it will probably be rolled back.
Today Beam contributors that make changes impacting the
runner boundary have no way to make those changes without
breaking Dataflow (unless they ask a Googler to help them).
Fortunately these are not the most common changes, but they
happen, and it's caused a lot of pain in the past.

Putting this code into the github repository allows all
runners to be modified when such a change is made, not just
the non-Dataflow runners. This makes it easier for

Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-15 Thread Reuven Lax
+1

On Sat, Sep 15, 2018 at 9:40 AM Rui Wang  wrote:

> +1
>
> -Rui
>
> On Sat, Sep 15, 2018 at 12:32 AM Robert Bradshaw 
> wrote:
>
>> +1 (binding)
>>
>> On Sat, Sep 15, 2018 at 6:44 AM Tim  wrote:
>>
>>> +1
>>>
>>> On 15 Sep 2018, at 01:23, Yifan Zou  wrote:
>>>
>>> +1
>>>
>>> On Fri, Sep 14, 2018 at 4:20 PM David Morávek 
>>> wrote:
>>>
 +1



 On 15 Sep 2018, at 00:59, Anton Kedin  wrote:

 +1

 On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold 
 wrote:

> +1
>
> On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang 
> wrote:
>
>> +1
>>
>> On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde 
>> wrote:
>>
>>> +1
>>>
>>> On Fri, Sep 14, 2018 at 2:40 PM Ahmet Altay 
>>> wrote:
>>>
 +1 (binding)

 On Fri, Sep 14, 2018 at 2:35 PM, Lukasz Cwik 
 wrote:

> +1 (binding)
>
> On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada 
> wrote:
>
>> +1
>>
>> On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud <
>> apill...@google.com> wrote:
>>
>>> +1
>>>
>>> On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik 
>>> wrote:
>>>
 There was generally positive support and good feedback[1] but
 it was not unanimous. I wanted to bring the donation of the 
 Dataflow worker
 code base to Apache Beam master to a vote.

 +1: Support having the Dataflow worker code as part of Apache
 Beam master branch
 -1: Dataflow worker code should live elsewhere

 1:
 https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E

>>>



Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-15 Thread Rui Wang
+1

-Rui

On Sat, Sep 15, 2018 at 12:32 AM Robert Bradshaw 
wrote:

> +1 (binding)
>
> On Sat, Sep 15, 2018 at 6:44 AM Tim  wrote:
>
>> +1
>>
>> On 15 Sep 2018, at 01:23, Yifan Zou  wrote:
>>
>> +1
>>
>> On Fri, Sep 14, 2018 at 4:20 PM David Morávek 
>> wrote:
>>
>>> +1
>>>
>>>
>>>
>>> On 15 Sep 2018, at 00:59, Anton Kedin  wrote:
>>>
>>> +1
>>>
>>> On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold 
>>> wrote:
>>>
 +1

 On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang 
 wrote:

> +1
>
> On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde 
> wrote:
>
>> +1
>>
>> On Fri, Sep 14, 2018 at 2:40 PM Ahmet Altay  wrote:
>>
>>> +1 (binding)
>>>
>>> On Fri, Sep 14, 2018 at 2:35 PM, Lukasz Cwik 
>>> wrote:
>>>
 +1 (binding)

 On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada 
 wrote:

> +1
>
> On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud <
> apill...@google.com> wrote:
>
>> +1
>>
>> On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik 
>> wrote:
>>
>>> There was generally positive support and good feedback[1] but it
>>> was not unanimous. I wanted to bring the donation of the Dataflow 
>>> worker
>>> code base to Apache Beam master to a vote.
>>>
>>> +1: Support having the Dataflow worker code as part of Apache
>>> Beam master branch
>>> -1: Dataflow worker code should live elsewhere
>>>
>>> 1:
>>> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>>>
>>
>>>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-15 Thread Robert Bradshaw
+1 (binding)

On Sat, Sep 15, 2018 at 6:44 AM Tim  wrote:

> +1
>
> On 15 Sep 2018, at 01:23, Yifan Zou  wrote:
>
> +1
>
> On Fri, Sep 14, 2018 at 4:20 PM David Morávek 
> wrote:
>
>> +1
>>
>>
>>
>> On 15 Sep 2018, at 00:59, Anton Kedin  wrote:
>>
>> +1
>>
>> On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold  wrote:
>>
>>> +1
>>>
>>> On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang  wrote:
>>>
 +1

 On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde 
 wrote:

> +1
>
> On Fri, Sep 14, 2018 at 2:40 PM Ahmet Altay  wrote:
>
>> +1 (binding)
>>
>> On Fri, Sep 14, 2018 at 2:35 PM, Lukasz Cwik 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada 
>>> wrote:
>>>
 +1

 On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud 
 wrote:

> +1
>
> On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik 
> wrote:
>
>> There was generally positive support and good feedback[1] but it
>> was not unanimous. I wanted to bring the donation of the Dataflow 
>> worker
>> code base to Apache Beam master to a vote.
>>
>> +1: Support having the Dataflow worker code as part of Apache
>> Beam master branch
>> -1: Dataflow worker code should live elsewhere
>>
>> 1:
>> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>>
>
>>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Tim
+1

> On 15 Sep 2018, at 01:23, Yifan Zou  wrote:
> 
> +1
> 
>> On Fri, Sep 14, 2018 at 4:20 PM David Morávek  
>> wrote:
>> +1
>> 
>> 
>> 
>>> On 15 Sep 2018, at 00:59, Anton Kedin  wrote:
>>> 
>>> +1
>>> 
 On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold  wrote:
 +1
 
> On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang  wrote:
> +1
> 
>> On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde  wrote:
>> +1
>> 
>>> On Fri, Sep 14, 2018 at 2:40 PM Ahmet Altay  wrote:
>>> +1 (binding)
>>> 
 On Fri, Sep 14, 2018 at 2:35 PM, Lukasz Cwik  wrote:
 +1 (binding)
 
> On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada  
> wrote:
> +1
> 
>> On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud  
>> wrote:
>> +1
>> 
>>> On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik  
>>> wrote:
>>> There was generally positive support and good feedback[1] but it 
>>> was not unanimous. I wanted to bring the donation of the Dataflow 
>>> worker code base to Apache Beam master to a vote.
>>> 
>>> +1: Support having the Dataflow worker code as part of Apache Beam 
>>> master branch
>>> -1: Dataflow worker code should live elsewhere
>>> 
>>> 1: 
>>> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>>> 


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Yifan Zou
+1

On Fri, Sep 14, 2018 at 4:20 PM David Morávek 
wrote:

> +1
>
>
>
> On 15 Sep 2018, at 00:59, Anton Kedin  wrote:
>
> +1
>
> On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold  wrote:
>
>> +1
>>
>> On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang  wrote:
>>
>>> +1
>>>
>>> On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde 
>>> wrote:
>>>
 +1

 On Fri, Sep 14, 2018 at 2:40 PM Ahmet Altay  wrote:

> +1 (binding)
>
> On Fri, Sep 14, 2018 at 2:35 PM, Lukasz Cwik  wrote:
>
>> +1 (binding)
>>
>> On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada 
>> wrote:
>>
>>> +1
>>>
>>> On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud 
>>> wrote:
>>>
 +1

 On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik 
 wrote:

> There was generally positive support and good feedback[1] but it
> was not unanimous. I wanted to bring the donation of the Dataflow 
> worker
> code base to Apache Beam master to a vote.
>
> +1: Support having the Dataflow worker code as part of Apache Beam
> master branch
> -1: Dataflow worker code should live elsewhere
>
> 1:
> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>

>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread David Morávek
+1



> On 15 Sep 2018, at 00:59, Anton Kedin  wrote:
> 
> +1
> 
>> On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold  wrote:
>> +1
>> 
>>> On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang  wrote:
>>> +1
>>> 
 On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde  wrote:
 +1
 
> On Fri, Sep 14, 2018 at 2:40 PM Ahmet Altay  wrote:
> +1 (binding)
> 
>> On Fri, Sep 14, 2018 at 2:35 PM, Lukasz Cwik  wrote:
>> +1 (binding)
>> 
>>> On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada  
>>> wrote:
>>> +1
>>> 
 On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud  
 wrote:
 +1
 
> On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik  wrote:
> There was generally positive support and good feedback[1] but it was 
> not unanimous. I wanted to bring the donation of the Dataflow worker 
> code base to Apache Beam master to a vote.
> 
> +1: Support having the Dataflow worker code as part of Apache Beam 
> master branch
> -1: Dataflow worker code should live elsewhere
> 
> 1: 
> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
> 


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Anton Kedin
+1

On Fri, Sep 14, 2018 at 3:22 PM Alan Myrvold  wrote:

> +1
>
> On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang  wrote:
>
>> +1
>>
>> On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde  wrote:
>>
>>> +1
>>>
>>> On Fri, Sep 14, 2018 at 2:40 PM Ahmet Altay  wrote:
>>>
 +1 (binding)

 On Fri, Sep 14, 2018 at 2:35 PM, Lukasz Cwik  wrote:

> +1 (binding)
>
> On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada 
> wrote:
>
>> +1
>>
>> On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud 
>> wrote:
>>
>>> +1
>>>
>>> On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik 
>>> wrote:
>>>
 There was generally positive support and good feedback[1] but it
 was not unanimous. I wanted to bring the donation of the Dataflow 
 worker
 code base to Apache Beam master to a vote.

 +1: Support having the Dataflow worker code as part of Apache Beam
 master branch
 -1: Dataflow worker code should live elsewhere

 1:
 https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E

>>>



Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Alan Myrvold
+1

On Fri, Sep 14, 2018 at 3:16 PM Boyuan Zhang  wrote:

> +1
>
> On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde  wrote:
>
>> +1
>>
>> On Fri, Sep 14, 2018 at 2:40 PM Ahmet Altay  wrote:
>>
>>> +1 (binding)
>>>
>>> On Fri, Sep 14, 2018 at 2:35 PM, Lukasz Cwik  wrote:
>>>
 +1 (binding)

 On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada 
 wrote:

> +1
>
> On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud 
> wrote:
>
>> +1
>>
>> On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik  wrote:
>>
>>> There was generally positive support and good feedback[1] but it was
>>> not unanimous. I wanted to bring the donation of the Dataflow worker 
>>> code
>>> base to Apache Beam master to a vote.
>>>
>>> +1: Support having the Dataflow worker code as part of Apache Beam
>>> master branch
>>> -1: Dataflow worker code should live elsewhere
>>>
>>> 1:
>>> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>>>
>>
>>>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Boyuan Zhang
+1

On Fri, Sep 14, 2018 at 3:15 PM Henning Rohde  wrote:

> +1
>
> On Fri, Sep 14, 2018 at 2:40 PM Ahmet Altay  wrote:
>
>> +1 (binding)
>>
>> On Fri, Sep 14, 2018 at 2:35 PM, Lukasz Cwik  wrote:
>>
>>> +1 (binding)
>>>
>>> On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada 
>>> wrote:
>>>
 +1

 On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud 
 wrote:

> +1
>
> On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik  wrote:
>
>> There was generally positive support and good feedback[1] but it was
>> not unanimous. I wanted to bring the donation of the Dataflow worker code
>> base to Apache Beam master to a vote.
>>
>> +1: Support having the Dataflow worker code as part of Apache Beam
>> master branch
>> -1: Dataflow worker code should live elsewhere
>>
>> 1:
>> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>>
>
>>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Henning Rohde
+1

On Fri, Sep 14, 2018 at 2:40 PM Ahmet Altay  wrote:

> +1 (binding)
>
> On Fri, Sep 14, 2018 at 2:35 PM, Lukasz Cwik  wrote:
>
>> +1 (binding)
>>
>> On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada  wrote:
>>
>>> +1
>>>
>>> On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud 
>>> wrote:
>>>
 +1

 On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik  wrote:

> There was generally positive support and good feedback[1] but it was
> not unanimous. I wanted to bring the donation of the Dataflow worker code
> base to Apache Beam master to a vote.
>
> +1: Support having the Dataflow worker code as part of Apache Beam
> master branch
> -1: Dataflow worker code should live elsewhere
>
> 1:
> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>

>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Ahmet Altay
+1 (binding)

On Fri, Sep 14, 2018 at 2:35 PM, Lukasz Cwik  wrote:

> +1 (binding)
>
> On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada  wrote:
>
>> +1
>>
>> On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud 
>> wrote:
>>
>>> +1
>>>
>>> On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik  wrote:
>>>
 There was generally positive support and good feedback[1] but it was
 not unanimous. I wanted to bring the donation of the Dataflow worker code
 base to Apache Beam master to a vote.

 +1: Support having the Dataflow worker code as part of Apache Beam
 master branch
 -1: Dataflow worker code should live elsewhere

 1: https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E

>>>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Lukasz Cwik
+1 (binding)

On Fri, Sep 14, 2018 at 2:34 PM Pablo Estrada  wrote:

> +1
>
> On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud 
> wrote:
>
>> +1
>>
>> On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik  wrote:
>>
>>> There was generally positive support and good feedback[1] but it was not
>>> unanimous. I wanted to bring the donation of the Dataflow worker code base
>>> to Apache Beam master to a vote.
>>>
>>> +1: Support having the Dataflow worker code as part of Apache Beam
>>> master branch
>>> -1: Dataflow worker code should live elsewhere
>>>
>>> 1:
>>> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>>>
>>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Pablo Estrada
+1

On Fri, Sep 14, 2018 at 2:32 PM Andrew Pilloud  wrote:

> +1
>
> On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik  wrote:
>
>> There was generally positive support and good feedback[1] but it was not
>> unanimous. I wanted to bring the donation of the Dataflow worker code base
>> to Apache Beam master to a vote.
>>
>> +1: Support having the Dataflow worker code as part of Apache Beam master
>> branch
>> -1: Dataflow worker code should live elsewhere
>>
>> 1:
>> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>>
>


Re: [VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Andrew Pilloud
+1

On Fri, Sep 14, 2018 at 2:31 PM Lukasz Cwik  wrote:

> There was generally positive support and good feedback[1] but it was not
> unanimous. I wanted to bring the donation of the Dataflow worker code base
> to Apache Beam master to a vote.
>
> +1: Support having the Dataflow worker code as part of Apache Beam master
> branch
> -1: Dataflow worker code should live elsewhere
>
> 1:
> https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E
>


[VOTE] Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Lukasz Cwik
There was generally positive support and good feedback[1] but it was not
unanimous. I wanted to bring the donation of the Dataflow worker code base
to Apache Beam master to a vote.

+1: Support having the Dataflow worker code as part of Apache Beam master
branch
-1: Dataflow worker code should live elsewhere

1:
https://lists.apache.org/thread.html/89efd3bc1d30f3d43d4b361a5ee05bd52778c9dc3f43ac72354c2bd9@%3Cdev.beam.apache.org%3E


Re: Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Robert Bradshaw
On Fri, Sep 14, 2018 at 10:02 AM Romain Manni-Bucau 
wrote:

>
> Le ven. 14 sept. 2018 à 09:48, Robert Bradshaw  a
> écrit :
>
>> On Fri, Sep 14, 2018 at 8:00 AM Romain Manni-Bucau 
>> wrote:
>>
>>> Well IBM runner is outside Beam for instance so this is not really a
>>> point IMHO.
>>>
>>> My view is simple:
>>> 1. does this module bring anything to Beam as a project: I understand
>>> your answer as a no (please clarify if I'm wrong)
>>>
>>
>> As has been mentioned, this make it easier for both developers at google
>> and developers outside google to contribute, which is the immediate
>> benefit. Longer term I also hope it leads to more code sharing (there is
>> currently an unnecessary amount of duplication due to the pain of
>> developing across this boundary) including of features that aren't yet in
>> upstream runners but we'd like to see (e.g. liquid sharding).
>>
>
> This is half true, external dev will be able to contribute but not test so
> no real gain here.
>
>
>>
>>
>>> 2. does this module bring anything to Beam or Big Data users: same answer
>>>
>>
>> Dataflow is used by many Beam users, making it work well is in their
>> interest as well. That which makes contributors lives easier (and wastes
>> less of their time) will translate into more contributions (new features,
>> faster bugfixes, ...) as well.
>>
>
> Same point, if you can contribute to something you can't test without
> mocks then you still can't work on it reliably.
>

It won't solve the situation, but it'll make it much better.

Whenever you're interfacing two systems, you have to draw the line
somewhere. Rather than staying abstract, let me illustrate why in this case
moving the code (aka moving the line) is concretely better.

Currently the dataflow worker looks something like

DataflowService <---DataflowRpcs---> DataflowWorkerAdaptors
<---RunnerLibApi---> CommonWorkerCode <---SdkApi---> UserCode.

The first two components are shipped with Dataflow, and the latter two
provided by the user as their Jar. We'd like to move to the latter three
being developed and compiled together. Right now the RunnerLibApi is in
active development, and the XxxWorkerAdaptors live in the Beam repository
for most runners, but not Dataflow, making it easy to evolve along with it.
Say you want to refactor RunnerLibApi to add a parameter to a method. It's
an easy change, possibly mostly automated with an IDE. Now the postcommits
go and test this against the Dataflow runner and someone gets a class
loader or method-not-found exception reproducible only on some worker on
the Dataflow service. That is much more painful than the compiler error
you would have gotten locally, earlier in the development cycle, with no
need for mocks even. So you have to roll this back and do it in two
steps: add the old method as a delegate, get someone at Google to move
Dataflow from using the old method to the new method, then, once that's
all pushed out, delete the old method. And this is for a
fairly simple change. (And in too many cases this last step or two is not
carried through, or hackier workarounds are chosen to avoid this process
entirely, which leaves cruft in the Beam codebase.)

On the other hand, the DataflowRpcs have been stable for a long time, and
when changes are made, they're almost always made by Googlers who are in a
position to pay this price.

The crux of the problem is that RunnerLibApi is not (yet) a stable API.
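
To make the two-step change described above concrete, here is a minimal
sketch in Java. Everything in it is hypothetical and only for illustration:
StepRunner stands in for some RunnerLibApi interface, runStep for the method
being refactored, and attempt for the added parameter. None of these names
come from the actual Beam or Dataflow code.

```java
class TwoPhaseMigration {
  /** Hypothetical stand-in for one RunnerLibApi interface (not a real Beam type). */
  interface StepRunner {
    /** New signature: the refactoring added the (hypothetical) attempt parameter. */
    String runStep(String stepName, int attempt);

    /**
     * Old signature, kept for now as a delegate so that an out-of-repo caller
     * (the Dataflow worker in the scenario above) still links against it.
     * It is deleted only after that caller has moved to the new method.
     */
    @Deprecated
    default String runStep(String stepName) {
      return runStep(stepName, 0);
    }
  }

  /** Hypothetical implementation living in the same repository as the interface. */
  static class AttemptAwareRunner implements StepRunner {
    @Override
    public String runStep(String stepName, int attempt) {
      return stepName + "#" + attempt;
    }
  }

  public static void main(String[] args) {
    StepRunner runner = new AttemptAwareRunner();
    System.out.println(runner.runStep("read"));    // old call site, still works
    System.out.println(runner.runStep("read", 2)); // migrated call site
  }
}
```

If caller and interface were compiled together, removing the delegate too
early would show up as a compile error locally instead of a missing-method
failure on a remote worker.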


>>> So at the end this will not bring anything to the community and just solve
>>> an google internal design issue so why should it hit Beam?
>>> I get the "we can't test it" point but this is wrong since you can use
>>> snapshots and staging repos, if not the enhancement is trivial enough to
>>> make it doable and not add a dead module to beam tree.
>>>
>>> Am I missing anything?
>>>
>>
>> While it's true we *can* test this without it being in Beam, as we have
>> been doing, it's painful. It's like doing away with presubmits and only
>> relying on postsubmits, but where you can't even look at the failure and
>> fix it on your local machine. It's a huge time sink for all those involved,
>> and not good for transparency or openness (e.g. there are things that only
>> googlers can do).
>>
>
> This is the case for any vendor impl based on Beam since by design the
> dependency is in this direction.
>
>
>>
>> As has been mentioned, we already do this for Flink, Spark, etc. There's
>> also a precedent for providing connectors to even non-OSS systems, e.g. we
>> ship the job submission portions for Dataflow, IO connectors for Apache
>> Kenisis, and an S3 filesystem adapter. It certainly wouldn't be in our, or
>> our users, benefit to remove those.
>>
>
> Agree but these are modules which are touching users directly. I.e. if you
> have some S3 bucket you grab the module and run it on your database. In the
> worker case, you will never do it.
>
>
>>
>> Eventually, as has been mentioned on the other thread, I hope our
>> 

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Romain Manni-Bucau
Le ven. 14 sept. 2018 à 09:48, Robert Bradshaw  a
écrit :

> On Fri, Sep 14, 2018 at 8:00 AM Romain Manni-Bucau 
> wrote:
>
>> Well IBM runner is outside Beam for instance so this is not really a
>> point IMHO.
>>
>> My view is simple:
>> 1. does this module bring anything to Beam as a project: I understand
>> your answer as a no (please clarify if I'm wrong)
>>
>
> As has been mentioned, this make it easier for both developers at google
> and developers outside google to contribute, which is the immediate
> benefit. Longer term I also hope it leads to more code sharing (there is
> currently an unnecessary amount of duplication due to the pain of
> developing across this boundary) including of features that aren't yet in
> upstream runners but we'd like to see (e.g. liquid sharding).
>

This is half true: external devs will be able to contribute but not test, so
there is no real gain here.


>
>
>> 2. does this module bring anything to Beam or Big Data users: same answer
>>
>
> Dataflow is used by many Beam users, making it work well is in their
> interest as well. That which makes contributors lives easier (and wastes
> less of their time) will translate into more contributions (new features,
> faster bugfixes, ...) as well.
>

Same point: if you can contribute to something but can't test it without
mocks, then you still can't work on it reliably.


>
>> So at the end this will not bring anything to the community and just solve
>> an google internal design issue so why should it hit Beam?
>> I get the "we can't test it" point but this is wrong since you can use
>> snapshots and staging repos, if not the enhancement is trivial enough to
>> make it doable and not add a dead module to beam tree.
>>
>> Am I missing anything?
>>
>
> While it's true we *can* test this without it being in Beam, as we have
> been doing, it's painful. It's like doing away with presubmits and only
> relying on postsubmits, but where you can't even look at the failure and
> fix it on your local machine. It's a huge time sink for all those involved,
> and not good for transparency or openness (e.g. there are things that only
> googlers can do).
>

This is the case for any vendor impl based on Beam since by design the
dependency is in this direction.


>
> As has been mentioned, we already do this for Flink, Spark, etc. There's
> also a precedent for providing connectors to even non-OSS systems, e.g. we
> ship the job submission portions for Dataflow, IO connectors for Apache
> Kenisis, and an S3 filesystem adapter. It certainly wouldn't be in our, or
> our users, benefit to remove those.
>

Agreed, but these are modules that touch users directly, i.e. if you have
some S3 bucket you grab the module and run it on your database. In the
worker case, you will never do that.


>
> Eventually, as has been mentioned on the other thread, I hope our
> interfaces become stable enough that it's easy to move much if not all of
> this into the respective upstream projects. But that is certainly not the
> case right now.
>

This is likely where investment must be made, instead of working around it
by making Beam bigger, which increases its maintenance cost for the
community without real gain and makes it harder to get into.


>
> Hopefully this helps answer your questions as to the benefits for Beam.
>
>
>> Le ven. 14 sept. 2018 à 07:22, Reuven Lax  a écrit :
>>
>>> Dataflow tests are part of Beam post submit, and if a PR breaks the
>>> Dataflow runner it will probably be rolled back. Today Beam contributors
>>> that make changes impacting the runner boundary have no way to make those
>>> changes without breaking Dataflow (unless they ask a Googler to help them).
>>> Fortunately these are not the most common changes, but they happen, and
>>> it's caused a lot of pain in the past.
>>>
>>> Putting this code into the github repository allows all runners to be
>>> modified when such a change is made, not just the non-Dataflow runners.
>>> This makes it easier for contributors and committers to make changes to
>>> Beam.
>>>
>>> Reuven
>>>
>>> On Thu, Sep 13, 2018 at 10:08 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 Flink, Spark, Apex are usable since they are OS so you grab them+beam
 and you "run".
 If I grab dataflow worker + X OS project and "run" it is the same,
 however if I grab dataflow worker and cant do anything with it, the added
 value for Beam and users is pretty null, no? Just means Google should find
 another way to manage this dependency if this is the case IMHO.

 Romain Manni-Bucau
 @rmannibucau  |  Blog
  | Old Blog
  | Github
  | LinkedIn
  | Book
 


 Le jeu. 13 sept. 2018 à 23:35, Lukasz Cwik  a écrit :

> Romain, the code 

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Stephan Ewen
+1 (non googler)

I think this is actually a nice move.

Even if there is no immediate end-user benefit (no one can directly run
it), it will probably be good and valuable code for other runners to learn
and borrow from, so there is benefit for other developers. Plus, it eases
the life of some other developers, and I am thinking of the individuals
here.

The community is both users and developers, so if something makes things
better for developers, and not worse for users, I see no problem with that.


On Fri, Sep 14, 2018 at 9:47 AM, Robert Bradshaw 
wrote:

> On Fri, Sep 14, 2018 at 8:00 AM Romain Manni-Bucau 
> wrote:
>
>> Well IBM runner is outside Beam for instance so this is not really a
>> point IMHO.
>>
>> My view is simple:
>> 1. does this module bring anything to Beam as a project: I understand
>> your answer as a no (please clarify if I'm wrong)
>>
>
> As has been mentioned, this make it easier for both developers at google
> and developers outside google to contribute, which is the immediate
> benefit. Longer term I also hope it leads to more code sharing (there is
> currently an unnecessary amount of duplication due to the pain of
> developing across this boundary) including of features that aren't yet in
> upstream runners but we'd like to see (e.g. liquid sharding).
>
>
>> 2. does this module bring anything to Beam or Big Data users: same answer
>>
>
> Dataflow is used by many Beam users, making it work well is in their
> interest as well. That which makes contributors lives easier (and wastes
> less of their time) will translate into more contributions (new features,
> faster bugfixes, ...) as well.
>
>> So at the end this will not bring anything to the community and just solve
>> an google internal design issue so why should it hit Beam?
>> I get the "we can't test it" point but this is wrong since you can use
>> snapshots and staging repos, if not the enhancement is trivial enough to
>> make it doable and not add a dead module to beam tree.
>>
>> Am I missing anything?
>>
>
> While it's true we *can* test this without it being in Beam, as we have
> been doing, it's painful. It's like doing away with presubmits and only
> relying on postsubmits, but where you can't even look at the failure and
> fix it on your local machine. It's a huge time sink for all those involved,
> and not good for transparency or openness (e.g. there are things that only
> googlers can do).
>
> As has been mentioned, we already do this for Flink, Spark, etc. There's
> also a precedent for providing connectors to even non-OSS systems, e.g. we
> ship the job submission portions for Dataflow, IO connectors for Apache
> Kenisis, and an S3 filesystem adapter. It certainly wouldn't be in our, or
> our users, benefit to remove those.
>
> Eventually, as has been mentioned on the other thread, I hope our
> interfaces become stable enough that it's easy to move much if not all of
> this into the respective upstream projects. But that is certainly not the
> case right now.
>
> Hopefully this helps answer your questions as to the benefits for Beam.
>
>
>> Le ven. 14 sept. 2018 à 07:22, Reuven Lax  a écrit :
>>
>>> Dataflow tests are part of Beam post submit, and if a PR breaks the
>>> Dataflow runner it will probably be rolled back. Today Beam contributors
>>> that make changes impacting the runner boundary have no way to make those
>>> changes without breaking Dataflow (unless they ask a Googler to help them).
>>> Fortunately these are not the most common changes, but they happen, and
>>> it's caused a lot of pain in the past.
>>>
>>> Putting this code into the github repository allows all runners to be
>>> modified when such a change is made, not just the non-Dataflow runners.
>>> This makes it easier for contributors and committers to make changes to
>>> Beam.
>>>
>>> Reuven
>>>
>>> On Thu, Sep 13, 2018 at 10:08 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 Flink, Spark, Apex are usable since they are OS so you grab them+beam
 and you "run".
 If I grab dataflow worker + X OS project and "run" it is the same,
 however if I grab dataflow worker and cant do anything with it, the added
 value for Beam and users is pretty null, no? Just means Google should find
 another way to manage this dependency if this is the case IMHO.

 Romain Manni-Bucau
 @rmannibucau  |  Blog
  | Old Blog
  | Github
  | LinkedIn
  | Book
 


 Le jeu. 13 sept. 2018 à 23:35, Lukasz Cwik  a écrit :

> Romain, the code is very similar to the adaptation layer between the
> shared libraries part of Apache Beam and any other runner, for example the
> code within runners/spark or runners/apex or runners/flink.
> 

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Robert Bradshaw
On Fri, Sep 14, 2018 at 8:00 AM Romain Manni-Bucau 
wrote:

> Well IBM runner is outside Beam for instance so this is not really a point
> IMHO.
>
> My view is simple:
> 1. does this module bring anything to Beam as a project: I understand your
> answer as a no (please clarify if I'm wrong)
>

As has been mentioned, this makes it easier for both developers at Google
and developers outside Google to contribute, which is the immediate
benefit. Longer term I also hope it leads to more code sharing (there is
currently an unnecessary amount of duplication due to the pain of
developing across this boundary) including of features that aren't yet in
upstream runners but we'd like to see (e.g. liquid sharding).


> 2. does this module bring anything to Beam or Big Data users: same answer
>

Dataflow is used by many Beam users; making it work well is in their
interest as well. That which makes contributors' lives easier (and wastes
less of their time) will translate into more contributions (new features,
faster bugfixes, ...) as well.

> So at the end this will not bring anything to the community and just solve
> an google internal design issue so why should it hit Beam?
> I get the "we can't test it" point but this is wrong since you can use
> snapshots and staging repos, if not the enhancement is trivial enough to
> make it doable and not add a dead module to beam tree.
>
> Am I missing anything?
>

While it's true we *can* test this without it being in Beam, as we have
been doing, it's painful. It's like doing away with presubmits and only
relying on postsubmits, but where you can't even look at the failure and
fix it on your local machine. It's a huge time sink for all those involved,
and not good for transparency or openness (e.g. there are things that only
googlers can do).

As has been mentioned, we already do this for Flink, Spark, etc. There's
also a precedent for providing connectors to even non-OSS systems, e.g. we
ship the job submission portions for Dataflow, IO connectors for Amazon
Kinesis, and an S3 filesystem adapter. It certainly wouldn't be in our, or
our users', benefit to remove those.

Eventually, as has been mentioned on the other thread, I hope our
interfaces become stable enough that it's easy to move much if not all of
this into the respective upstream projects. But that is certainly not the
case right now.

Hopefully this helps answer your questions as to the benefits for Beam.


> Le ven. 14 sept. 2018 à 07:22, Reuven Lax  a écrit :
>
>> Dataflow tests are part of Beam post submit, and if a PR breaks the
>> Dataflow runner it will probably be rolled back. Today Beam contributors
>> that make changes impacting the runner boundary have no way to make those
>> changes without breaking Dataflow (unless they ask a Googler to help them).
>> Fortunately these are not the most common changes, but they happen, and
>> it's caused a lot of pain in the past.
>>
>> Putting this code into the github repository allows all runners to be
>> modified when such a change is made, not just the non-Dataflow runners.
>> This makes it easier for contributors and committers to make changes to
>> Beam.
>>
>> Reuven
>>
>> On Thu, Sep 13, 2018 at 10:08 PM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> Flink, Spark, Apex are usable since they are OS so you grab them+beam
>>> and you "run".
>>> If I grab dataflow worker + X OS project and "run" it is the same,
>>> however if I grab dataflow worker and cant do anything with it, the added
>>> value for Beam and users is pretty null, no? Just means Google should find
>>> another way to manage this dependency if this is the case IMHO.
>>>
>>> Romain Manni-Bucau
>>> @rmannibucau  |  Blog
>>>  | Old Blog
>>>  | Github
>>>  | LinkedIn
>>>  | Book
>>> 
>>>
>>>
>>> Le jeu. 13 sept. 2018 à 23:35, Lukasz Cwik  a écrit :
>>>
 Romain, the code is very similar to the adaptation layer between the
 shared libraries part of Apache Beam and any other runner, for example the
 code within runners/spark or runners/apex or runners/flink.
 If someone wanted to build an emulator of the Dataflow service, they
 would be able to re-use them but that is as impractical as writing an
 emulator for Flink or Spark and plugging them in as the dependency for
 runners/flink and runners/spark respectively.

 On Thu, Sep 13, 2018 at 2:07 PM Raghu Angadi 
 wrote:

> On Thu, Sep 13, 2018 at 12:53 PM Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>> If usable by itself without google karma (can you use a worker
>> without dataflow itself?) it sounds awesome otherwise it sounds weird 
>> IMHO.
>>
>
> Can you elaborate a bit more on using worker without dataflow? I

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-14 Thread Romain Manni-Bucau
Well, the IBM runner is outside Beam, for instance, so this is not really a
point IMHO.

My view is simple:
1. does this module bring anything to Beam as a project: I understand your
answer as a no (please clarify if I'm wrong)
2. does this module bring anything to Beam or Big Data users: same answer

So in the end this will not bring anything to the community and will just
solve a Google-internal design issue, so why should it hit Beam?
I get the "we can't test it" point, but this is wrong since you can use
snapshots and staging repos; if not, the enhancement is trivial enough to
make it doable without adding a dead module to the Beam tree.

Am I missing anything?

Romain Manni-Bucau
@rmannibucau



Le ven. 14 sept. 2018 à 07:22, Reuven Lax  a écrit :

> Dataflow tests are part of Beam post submit, and if a PR breaks the
> Dataflow runner it will probably be rolled back. Today Beam contributors
> that make changes impacting the runner boundary have no way to make those
> changes without breaking Dataflow (unless they ask a Googler to help them).
> Fortunately these are not the most common changes, but they happen, and
> it's caused a lot of pain in the past.
>
> Putting this code into the github repository allows all runners to be
> modified when such a change is made, not just the non-Dataflow runners.
> This makes it easier for contributors and committers to make changes to
> Beam.
>
> Reuven
>
> On Thu, Sep 13, 2018 at 10:08 PM Romain Manni-Bucau 
> wrote:
>
>> Flink, Spark, Apex are usable since they are OS so you grab them+beam and
>> you "run".
>> If I grab dataflow worker + X OS project and "run" it is the same,
>> however if I grab dataflow worker and cant do anything with it, the added
>> value for Beam and users is pretty null, no? Just means Google should find
>> another way to manage this dependency if this is the case IMHO.
>>
>> Romain Manni-Bucau
>> @rmannibucau  |  Blog
>>  | Old Blog
>>  | Github
>>  | LinkedIn
>>  | Book
>> 
>>
>>
>> Le jeu. 13 sept. 2018 à 23:35, Lukasz Cwik  a écrit :
>>
>>> Romain, the code is very similar to the adaptation layer between the
>>> shared libraries part of Apache Beam and any other runner, for example the
>>> code within runners/spark or runners/apex or runners/flink.
>>> If someone wanted to build an emulator of the Dataflow service, they
>>> would be able to re-use them but that is as impractical as writing an
>>> emulator for Flink or Spark and plugging them in as the dependency for
>>> runners/flink and runners/spark respectively.
>>>
>>> On Thu, Sep 13, 2018 at 2:07 PM Raghu Angadi  wrote:
>>>
 On Thu, Sep 13, 2018 at 12:53 PM Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

> If usable by itself without google karma (can you use a worker without
> dataflow itself?) it sounds awesome otherwise it sounds weird IMHO.
>

 Can you elaborate a bit more on using worker without dataflow? I
 essentially see that as a part of the Dataflow runner. A runner is specific to
 a platform.

 I am a Googler, but commenting as a community member.

 Raghu.

>
> Le jeu. 13 sept. 2018 21:36, Kai Jiang  a écrit :
>
>> +1 (non googler)
>>
>> big help for transparency and for future runners.
>>
>> Best,
>> Kai
>>
>> On Thu, Sep 13, 2018, 11:45 Xinyu Liu  wrote:
>>
>>> Big +1 (non-googler).
>>>
>>> From Samza Runner's perspective, we are very happy to see dataflow
>>> worker code so we can learn and compete :).
>>>
>>> Thanks,
>>> Xinyu
>>>
>>> On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi <
>>> suneel.mar...@gmail.com> wrote:
>>>
 +1 (non-googler)

 This is a great  move

 Sent from my iPhone

 On Sep 13, 2018, at 2:25 PM, Tim Robertson <
 timrobertson...@gmail.com> wrote:

 +1 (non googler)
 It sounds pragmatic, helps with transparency should issues arise
 and enables more people to fix.


 On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin 
 wrote:

> From my perspective as a (non-Google) community member, huge +1.
>
> I don't see anything bad for the community about open sourcing
> more of the probably-most-used runner. While the DirectRunner is probably
> still the most referential implementation of Beam, can't hurt 

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Reuven Lax
Dataflow tests are part of Beam post submit, and if a PR breaks the
Dataflow runner it will probably be rolled back. Today Beam contributors
that make changes impacting the runner boundary have no way to make those
changes without breaking Dataflow (unless they ask a Googler to help them).
Fortunately these are not the most common changes, but they happen, and
it's caused a lot of pain in the past.

Putting this code into the github repository allows all runners to be
modified when such a change is made, not just the non-Dataflow runners.
This makes it easier for contributors and committers to make changes to
Beam.

Reuven

On Thu, Sep 13, 2018 at 10:08 PM Romain Manni-Bucau 
wrote:

> Flink, Spark, Apex are usable since they are OS so you grab them+beam and
> you "run".
> If I grab dataflow worker + X OS project and "run" it is the same, however
> if I grab dataflow worker and cant do anything with it, the added value for
> Beam and users is pretty null, no? Just means Google should find another
> way to manage this dependency if this is the case IMHO.
>
> Romain Manni-Bucau
> @rmannibucau  |  Blog
>  | Old Blog
>  | Github
>  | LinkedIn
>  | Book
> 
>
>
> Le jeu. 13 sept. 2018 à 23:35, Lukasz Cwik  a écrit :
>
>> Romain, the code is very similar to the adaptation layer between the
>> shared libraries part of Apache Beam and any other runner, for example the
>> code within runners/spark or runners/apex or runners/flink.
>> If someone wanted to build an emulator of the Dataflow service, they
>> would be able to re-use them but that is as impractical as writing an
>> emulator for Flink or Spark and plugging them in as the dependency for
>> runners/flink and runners/spark respectively.
>>
>> On Thu, Sep 13, 2018 at 2:07 PM Raghu Angadi  wrote:
>>
>>> On Thu, Sep 13, 2018 at 12:53 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 If usable by itself without google karma (can you use a worker without
 dataflow itself?) it sounds awesome otherwise it sounds weird IMHO.

>>>
>>> Can you elaborate a bit more on using worker without dataflow? I
>>> essentially see that as a part of the Dataflow runner. A runner is specific to
>>> a platform.
>>>
>>> I am a Googler, but commenting as a community member.
>>>
>>> Raghu.
>>>

 Le jeu. 13 sept. 2018 21:36, Kai Jiang  a écrit :

> +1 (non googler)
>
> big help for transparency and for future runners.
>
> Best,
> Kai
>
> On Thu, Sep 13, 2018, 11:45 Xinyu Liu  wrote:
>
>> Big +1 (non-googler).
>>
>> From Samza Runner's perspective, we are very happy to see dataflow
>> worker code so we can learn and compete :).
>>
>> Thanks,
>> Xinyu
>>
>> On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi <
>> suneel.mar...@gmail.com> wrote:
>>
>>> +1 (non-googler)
>>>
>>> This is a great  move
>>>
>>> Sent from my iPhone
>>>
>>> On Sep 13, 2018, at 2:25 PM, Tim Robertson <
>>> timrobertson...@gmail.com> wrote:
>>>
>>> +1 (non googler)
>>> It sounds pragmatic, helps with transparency should issues arise and
>>> enables more people to fix.
>>>
>>>
>>> On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin 
>>> wrote:
>>>
 From my perspective as a (non-Google) community member, huge +1.

 I don't see anything bad for the community about open sourcing more
 of the probably-most-used runner. While the DirectRunner is probably still
 the most referential implementation of Beam, can't hurt to see more working
 code. Other runners or runner implementors can refer to this code if they
 want, and ignore it if they don't.

 In terms of having more code and tests to support, well, that's par
 for the course. Will this change make the things that need to be done to
 support them more obvious? (E.g., "this PR is blocked because someone at
 Google on Dataflow team has to fix something" vs "this PR is blocked
 because the Apache Beam code in foo/bar/baz is failing, and anyone who can
 see the code can fix it"). The latter seems like a clear win for the
 community.

 (As long as the code donation is handled properly, but that's
 completely orthogonal and I have no reason to think it wouldn't be.)

 Thanks,
 Dan

 On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik 
 wrote:

> Yes, I'm specifically asking the community for opinions as to
> whether it should be accepted or not.
>
> On Thu, Sep 13, 2018 at 10:51 

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Romain Manni-Bucau
Flink, Spark, and Apex are usable because they are open source: you grab them
plus Beam and you "run".
If I can grab the Dataflow worker plus some open source project and "run", it
is the same. However, if I grab the Dataflow worker and can't do anything with
it, the added value for Beam and users is pretty much null, no? That just means
Google should find another way to manage this dependency if this is the case, IMHO.

Romain Manni-Bucau
@rmannibucau



On Thu, Sep 13, 2018 at 23:35, Lukasz Cwik wrote:

> Romain, the code is very similar to the adaptation layer between the
> shared libraries part of Apache Beam and any other runner, for example the
> code within runners/spark or runners/apex or runners/flink.
> If someone wanted to build an emulator of the Dataflow service, they would
> be able to re-use them but that is as impractical as writing an emulator
> for Flink or Spark and plugging them in as the dependency for runners/flink
> and runners/spark respectively.
>
> On Thu, Sep 13, 2018 at 2:07 PM Raghu Angadi  wrote:
>
>> On Thu, Sep 13, 2018 at 12:53 PM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> If usable by itself without google karma (can you use a worker without
>>> dataflow itself?) it sounds awesome otherwise it sounds weird IMHO.
>>>
>>
>> Can you elaborate a bit more on using worker without dataflow? I
>> essentially see that as a part of Dataflow runner. A runner is specific to
>> a platform.
>>
>> I am a Googler, but commenting as a community member.
>>
>> Raghu.
>>
>>>
>>> On Thu, Sep 13, 2018, 21:36 Kai Jiang wrote:
>>>
 +1 (non googler)

 big help for transparency and for future runners.

 Best,
 Kai

 On Thu, Sep 13, 2018, 11:45 Xinyu Liu  wrote:

> Big +1 (non-googler).
>
> From Samza Runner's perspective, we are very happy to see dataflow
> worker code so we can learn and compete :).
>
> Thanks,
> Xinyu
>
> On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi <
> suneel.mar...@gmail.com> wrote:
>
>> +1 (non-googler)
>>
>> This is a great move
>>
>> Sent from my iPhone
>>
>> On Sep 13, 2018, at 2:25 PM, Tim Robertson 
>> wrote:
>>
>> +1 (non googler)
>> It sounds pragmatic, helps with transparency should issues arise and
>> enables more people to fix.
>>
>>
>> On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin 
>> wrote:
>>
>>> From my perspective as a (non-Google) community member, huge +1.
>>>
>>> I don't see anything bad for the community about open sourcing more
>>> of the probably-most-used runner. While the DirectRunner is probably still
>>> the most referential implementation of Beam, can't hurt to see more working
>>> code. Other runners or runner implementors can refer to this code if they
>>> want, and ignore it if they don't.
>>>
>>> In terms of having more code and tests to support, well, that's par
>>> for the course. Will this change make the things that need to be done to
>>> support them more obvious? (E.g., "this PR is blocked because someone at
>>> Google on Dataflow team has to fix something" vs "this PR is blocked
>>> because the Apache Beam code in foo/bar/baz is failing, and anyone who can
>>> see the code can fix it"). The latter seems like a clear win for the
>>> community.
>>>
>>> (As long as the code donation is handled properly, but that's
>>> completely orthogonal and I have no reason to think it wouldn't be.)
>>>
>>> Thanks,
>>> Dan
>>>
>>> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik 
>>> wrote:
>>>
 Yes, I'm specifically asking the community for opinions as to
 whether it should be accepted or not.

 On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi 
 wrote:

> This is terrific!
>
> Is this thread asking for opinions from the community about whether it
> should be accepted? Assuming the Google-side decision is made to contribute,
> big +1 from me to include it next to other runners.
>
> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik 
> wrote:
>
>> At Google we have been importing the Apache Beam code base and
>> integrating it with the Google portion of the codebase that supports the
>> Dataflow worker. This process is painful as we regularly are making
>> breaking API changes to support libraries related to running portable
>> pipelines (and sometimes in other places as well). This has made it
>> sometimes difficult for PR changes to make changes 

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Lukasz Cwik
Romain, the code is very similar to the adaptation layer between the shared
libraries that are part of Apache Beam and any other runner, for example the
code within runners/spark, runners/apex, or runners/flink.
If someone wanted to build an emulator of the Dataflow service, they would
be able to re-use this code, but that is as impractical as writing an emulator
for Flink or Spark and plugging them in as the dependency for runners/flink
and runners/spark respectively.

On Thu, Sep 13, 2018 at 2:07 PM Raghu Angadi  wrote:

> On Thu, Sep 13, 2018 at 12:53 PM Romain Manni-Bucau 
> wrote:
>
>> If usable by itself without google karma (can you use a worker without
>> dataflow itself?) it sounds awesome otherwise it sounds weird IMHO.
>>
>
> Can you elaborate a bit more on using worker without dataflow? I
> essentially see that as a part of Dataflow runner. A runner is specific to
> a platform.
>
> I am a Googler, but commenting as a community member.
>
> Raghu.
>
>>
>> On Thu, Sep 13, 2018, 21:36 Kai Jiang wrote:
>>
>>> +1 (non googler)
>>>
>>> big help for transparency and for future runners.
>>>
>>> Best,
>>> Kai
>>>
>>> On Thu, Sep 13, 2018, 11:45 Xinyu Liu  wrote:
>>>
 Big +1 (non-googler).

 From Samza Runner's perspective, we are very happy to see dataflow
 worker code so we can learn and compete :).

 Thanks,
 Xinyu

 On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi 
 wrote:

> +1 (non-googler)
>
> This is a great move
>
> Sent from my iPhone
>
> On Sep 13, 2018, at 2:25 PM, Tim Robertson 
> wrote:
>
> +1 (non googler)
> It sounds pragmatic, helps with transparency should issues arise and
> enables more people to fix.
>
>
> On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin 
> wrote:
>
>> From my perspective as a (non-Google) community member, huge +1.
>>
>> I don't see anything bad for the community about open sourcing more
>> of the probably-most-used runner. While the DirectRunner is probably still
>> the most referential implementation of Beam, can't hurt to see more working
>> code. Other runners or runner implementors can refer to this code if they
>> want, and ignore it if they don't.
>>
>> In terms of having more code and tests to support, well, that's par
>> for the course. Will this change make the things that need to be done to
>> support them more obvious? (E.g., "this PR is blocked because someone at
>> Google on Dataflow team has to fix something" vs "this PR is blocked
>> because the Apache Beam code in foo/bar/baz is failing, and anyone who can
>> see the code can fix it"). The latter seems like a clear win for the
>> community.
>>
>> (As long as the code donation is handled properly, but that's
>> completely orthogonal and I have no reason to think it wouldn't be.)
>>
>> Thanks,
>> Dan
>>
>> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik 
>> wrote:
>>
>>> Yes, I'm specifically asking the community for opinions as to
>>> whether it should be accepted or not.
>>>
>>> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi 
>>> wrote:
>>>
 This is terrific!

 Is this thread asking for opinions from the community about whether it should
 be accepted? Assuming the Google-side decision is made to contribute, big +1
 from me to include it next to other runners.

 On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik 
 wrote:

> At Google we have been importing the Apache Beam code base and
> integrating it with the Google portion of the codebase that supports the
> Dataflow worker. This process is painful as we regularly are making
> breaking API changes to support libraries related to running portable
> pipelines (and sometimes in other places as well). This has made it
> sometimes difficult for PR changes to make changes without either breaking
> something for Google or waiting for a Googler to make the change internally
> (e.g. dependency updates).
>
> This code is very similar to the other integrations that exist for
> runners such as Flink/Spark/Apex/Samza. It is an adaption layer that sits
> on top of an execution engine. There is no super secret awesome stuff as
> this code was already publicly visible in the past when it was part of the
> Google Cloud Dataflow github repo[1].
>
> Process wise the code will need to get approval from Google to be
> donated and for it to go through the code donation process but before we
> attempt to do that, I was wondering whether the community would object to
> adding this code to the master branch?
>
> The up 

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Raghu Angadi
On Thu, Sep 13, 2018 at 12:53 PM Romain Manni-Bucau 
wrote:

> If usable by itself without google karma (can you use a worker without
> dataflow itself?) it sounds awesome otherwise it sounds weird IMHO.
>

Can you elaborate a bit more on using worker without dataflow? I
essentially see that as a part of Dataflow runner. A runner is specific to
a platform.

I am a Googler, but commenting as a community member.

Raghu.

>
> On Thu, Sep 13, 2018, 21:36 Kai Jiang wrote:
>
>> +1 (non googler)
>>
>> big help for transparency and for future runners.
>>
>> Best,
>> Kai
>>
>> On Thu, Sep 13, 2018, 11:45 Xinyu Liu  wrote:
>>
>>> Big +1 (non-googler).
>>>
>>> From Samza Runner's perspective, we are very happy to see dataflow
>>> worker code so we can learn and compete :).
>>>
>>> Thanks,
>>> Xinyu
>>>
>>> On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi 
>>> wrote:
>>>
 +1 (non-googler)

 This is a great move

 Sent from my iPhone

 On Sep 13, 2018, at 2:25 PM, Tim Robertson 
 wrote:

 +1 (non googler)
 It sounds pragmatic, helps with transparency should issues arise and
 enables more people to fix.


 On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin 
 wrote:

> From my perspective as a (non-Google) community member, huge +1.
>
> I don't see anything bad for the community about open sourcing more of
> the probably-most-used runner. While the DirectRunner is probably still the
> most referential implementation of Beam, can't hurt to see more working
> code. Other runners or runner implementors can refer to this code if they
> want, and ignore it if they don't.
>
> In terms of having more code and tests to support, well, that's par
> for the course. Will this change make the things that need to be done to
> support them more obvious? (E.g., "this PR is blocked because someone at
> Google on Dataflow team has to fix something" vs "this PR is blocked
> because the Apache Beam code in foo/bar/baz is failing, and anyone who can
> see the code can fix it"). The latter seems like a clear win for the
> community.
>
> (As long as the code donation is handled properly, but that's
> completely orthogonal and I have no reason to think it wouldn't be.)
>
> Thanks,
> Dan
>
> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik  wrote:
>
>> Yes, I'm specifically asking the community for opinions as to whether
>> it should be accepted or not.
>>
>> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi 
>> wrote:
>>
>>> This is terrific!
>>>
>>> Is this thread asking for opinions from the community about whether it should
>>> be accepted? Assuming the Google-side decision is made to contribute, big +1
>>> from me to include it next to other runners.
>>>
>>> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik 
>>> wrote:
>>>
 At Google we have been importing the Apache Beam code base and
 integrating it with the Google portion of the codebase that supports the
 Dataflow worker. This process is painful as we regularly are making
 breaking API changes to support libraries related to running portable
 pipelines (and sometimes in other places as well). This has made it
 sometimes difficult for PR changes to make changes without either breaking
 something for Google or waiting for a Googler to make the change internally
 (e.g. dependency updates).

 This code is very similar to the other integrations that exist for
 runners such as Flink/Spark/Apex/Samza. It is an adaption layer that sits
 on top of an execution engine. There is no super secret awesome stuff as
 this code was already publicly visible in the past when it was part of the
 Google Cloud Dataflow github repo[1].

 Process wise the code will need to get approval from Google to be
 donated and for it to go through the code donation process but before we
 attempt to do that, I was wondering whether the community would object to
 adding this code to the master branch?

 The up side is that people can make breaking changes and fix it for
 all runners. It will also help Googlers contribute more to the portability
 story as it will remove the burden of doing the code import (wasted time)
 and it will allow people to develop in master (can have the whole project
 loaded in a single IDE).

 The downsides are that this will represent more code and unit tests
 to support.

 1:
 https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker

>>>


Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Andrew Psaltis
Big +1 (non googler)

Great help for transparency, future runners, learning, etc...

On Thu, Sep 13, 2018 at 4:08 PM Andrew Pilloud  wrote:

> +1
>
> On Thu, Sep 13, 2018 at 12:53 PM Romain Manni-Bucau 
> wrote:
>
>> If usable by itself without google karma (can you use a worker without
>> dataflow itself?) it sounds awesome otherwise it sounds weird IMHO.
>>
>> On Thu, Sep 13, 2018, 21:36 Kai Jiang wrote:
>>
>>> +1 (non googler)
>>>
>>> big help for transparency and for future runners.
>>>
>>> Best,
>>> Kai
>>>
>>> On Thu, Sep 13, 2018, 11:45 Xinyu Liu  wrote:
>>>
 Big +1 (non-googler).

 From Samza Runner's perspective, we are very happy to see dataflow
 worker code so we can learn and compete :).

 Thanks,
 Xinyu

 On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi 
 wrote:

> +1 (non-googler)
>
> This is a great move
>
> Sent from my iPhone
>
> On Sep 13, 2018, at 2:25 PM, Tim Robertson 
> wrote:
>
> +1 (non googler)
> It sounds pragmatic, helps with transparency should issues arise and
> enables more people to fix.
>
>
> On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin 
> wrote:
>
>> From my perspective as a (non-Google) community member, huge +1.
>>
>> I don't see anything bad for the community about open sourcing more
>> of the probably-most-used runner. While the DirectRunner is probably still
>> the most referential implementation of Beam, can't hurt to see more working
>> code. Other runners or runner implementors can refer to this code if they
>> want, and ignore it if they don't.
>>
>> In terms of having more code and tests to support, well, that's par
>> for the course. Will this change make the things that need to be done to
>> support them more obvious? (E.g., "this PR is blocked because someone at
>> Google on Dataflow team has to fix something" vs "this PR is blocked
>> because the Apache Beam code in foo/bar/baz is failing, and anyone who can
>> see the code can fix it"). The latter seems like a clear win for the
>> community.
>>
>> (As long as the code donation is handled properly, but that's
>> completely orthogonal and I have no reason to think it wouldn't be.)
>>
>> Thanks,
>> Dan
>>
>> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik 
>> wrote:
>>
>>> Yes, I'm specifically asking the community for opinions as to
>>> whether it should be accepted or not.
>>>
>>> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi 
>>> wrote:
>>>
 This is terrific!

 Is this thread asking for opinions from the community about whether it should
 be accepted? Assuming the Google-side decision is made to contribute, big +1
 from me to include it next to other runners.

 On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik 
 wrote:

> At Google we have been importing the Apache Beam code base and
> integrating it with the Google portion of the codebase that supports the
> Dataflow worker. This process is painful as we regularly are making
> breaking API changes to support libraries related to running portable
> pipelines (and sometimes in other places as well). This has made it
> sometimes difficult for PR changes to make changes without either breaking
> something for Google or waiting for a Googler to make the change internally
> (e.g. dependency updates).
>
> This code is very similar to the other integrations that exist for
> runners such as Flink/Spark/Apex/Samza. It is an adaption layer that sits
> on top of an execution engine. There is no super secret awesome stuff as
> this code was already publicly visible in the past when it was part of the
> Google Cloud Dataflow github repo[1].
>
> Process wise the code will need to get approval from Google to be
> donated and for it to go through the code donation process but before we
> attempt to do that, I was wondering whether the community would object to
> adding this code to the master branch?
>
> The up side is that people can make breaking changes and fix it
> for all runners. It will also help Googlers contribute more to the
> portability story as it will remove the burden of doing the code import
> (wasted time) and it will allow people to develop in master (can have the
> whole project loaded in a single IDE).
>
> The downsides are that this will represent more code and unit
> tests to support.
>
> 1:
> 

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Andrew Pilloud
+1

On Thu, Sep 13, 2018 at 12:53 PM Romain Manni-Bucau 
wrote:

> If usable by itself without google karma (can you use a worker without
> dataflow itself?) it sounds awesome otherwise it sounds weird IMHO.
>
> On Thu, Sep 13, 2018, 21:36 Kai Jiang wrote:
>
>> +1 (non googler)
>>
>> big help for transparency and for future runners.
>>
>> Best,
>> Kai
>>
>> On Thu, Sep 13, 2018, 11:45 Xinyu Liu  wrote:
>>
>>> Big +1 (non-googler).
>>>
>>> From Samza Runner's perspective, we are very happy to see dataflow
>>> worker code so we can learn and compete :).
>>>
>>> Thanks,
>>> Xinyu
>>>
>>> On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi 
>>> wrote:
>>>
 +1 (non-googler)

 This is a great move

 Sent from my iPhone

 On Sep 13, 2018, at 2:25 PM, Tim Robertson 
 wrote:

 +1 (non googler)
 It sounds pragmatic, helps with transparency should issues arise and
 enables more people to fix.


 On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin 
 wrote:

> From my perspective as a (non-Google) community member, huge +1.
>
> I don't see anything bad for the community about open sourcing more of
> the probably-most-used runner. While the DirectRunner is probably still the
> most referential implementation of Beam, can't hurt to see more working
> code. Other runners or runner implementors can refer to this code if they
> want, and ignore it if they don't.
>
> In terms of having more code and tests to support, well, that's par
> for the course. Will this change make the things that need to be done to
> support them more obvious? (E.g., "this PR is blocked because someone at
> Google on Dataflow team has to fix something" vs "this PR is blocked
> because the Apache Beam code in foo/bar/baz is failing, and anyone who can
> see the code can fix it"). The latter seems like a clear win for the
> community.
>
> (As long as the code donation is handled properly, but that's
> completely orthogonal and I have no reason to think it wouldn't be.)
>
> Thanks,
> Dan
>
> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik  wrote:
>
>> Yes, I'm specifically asking the community for opinions as to whether
>> it should be accepted or not.
>>
>> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi 
>> wrote:
>>
>>> This is terrific!
>>>
>>> Is this thread asking for opinions from the community about whether it should
>>> be accepted? Assuming the Google-side decision is made to contribute, big +1
>>> from me to include it next to other runners.
>>>
>>> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik 
>>> wrote:
>>>
 At Google we have been importing the Apache Beam code base and
 integrating it with the Google portion of the codebase that supports the
 Dataflow worker. This process is painful as we regularly are making
 breaking API changes to support libraries related to running portable
 pipelines (and sometimes in other places as well). This has made it
 sometimes difficult for PR changes to make changes without either breaking
 something for Google or waiting for a Googler to make the change internally
 (e.g. dependency updates).

 This code is very similar to the other integrations that exist for
 runners such as Flink/Spark/Apex/Samza. It is an adaption layer that sits
 on top of an execution engine. There is no super secret awesome stuff as
 this code was already publicly visible in the past when it was part of the
 Google Cloud Dataflow github repo[1].

 Process wise the code will need to get approval from Google to be
 donated and for it to go through the code donation process but before we
 attempt to do that, I was wondering whether the community would object to
 adding this code to the master branch?

 The up side is that people can make breaking changes and fix it for
 all runners. It will also help Googlers contribute more to the portability
 story as it will remove the burden of doing the code import (wasted time)
 and it will allow people to develop in master (can have the whole project
 loaded in a single IDE).

 The downsides are that this will represent more code and unit tests
 to support.

 1:
 https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker

>>>


Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Romain Manni-Bucau
If it is usable by itself without Google karma (can you use a worker without
Dataflow itself?), it sounds awesome; otherwise it sounds weird, IMHO.

On Thu, Sep 13, 2018, 21:36 Kai Jiang wrote:

> +1 (non googler)
>
> big help for transparency and for future runners.
>
> Best,
> Kai
>
> On Thu, Sep 13, 2018, 11:45 Xinyu Liu  wrote:
>
>> Big +1 (non-googler).
>>
>> From Samza Runner's perspective, we are very happy to see dataflow worker
>> code so we can learn and compete :).
>>
>> Thanks,
>> Xinyu
>>
>> On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi 
>> wrote:
>>
>>> +1 (non-googler)
>>>
>>> This is a great move
>>>
>>> Sent from my iPhone
>>>
>>> On Sep 13, 2018, at 2:25 PM, Tim Robertson 
>>> wrote:
>>>
>>> +1 (non googler)
>>> It sounds pragmatic, helps with transparency should issues arise and
>>> enables more people to fix.
>>>
>>>
>>> On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin 
>>> wrote:
>>>
 From my perspective as a (non-Google) community member, huge +1.

 I don't see anything bad for the community about open sourcing more of
 the probably-most-used runner. While the DirectRunner is probably still the
 most referential implementation of Beam, can't hurt to see more working
 code. Other runners or runner implementors can refer to this code if they
 want, and ignore it if they don't.

 In terms of having more code and tests to support, well, that's par for
 the course. Will this change make the things that need to be done to
 support them more obvious? (E.g., "this PR is blocked because someone at
 Google on Dataflow team has to fix something" vs "this PR is blocked
 because the Apache Beam code in foo/bar/baz is failing, and anyone who can
 see the code can fix it"). The latter seems like a clear win for the
 community.

 (As long as the code donation is handled properly, but that's
 completely orthogonal and I have no reason to think it wouldn't be.)

 Thanks,
 Dan

 On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik  wrote:

> Yes, I'm specifically asking the community for opinions as to whether
> it should be accepted or not.
>
> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi 
> wrote:
>
>> This is terrific!
>>
>> Is this thread asking for opinions from the community about whether it should
>> be accepted? Assuming the Google-side decision is made to contribute, big +1
>> from me to include it next to other runners.
>>
>> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik 
>> wrote:
>>
>>> At Google we have been importing the Apache Beam code base and
>>> integrating it with the Google portion of the codebase that supports the
>>> Dataflow worker. This process is painful as we regularly are making
>>> breaking API changes to support libraries related to running portable
>>> pipelines (and sometimes in other places as well). This has made it
>>> sometimes difficult for PR changes to make changes without either breaking
>>> something for Google or waiting for a Googler to make the change internally
>>> (e.g. dependency updates).
>>>
>>> This code is very similar to the other integrations that exist for
>>> runners such as Flink/Spark/Apex/Samza. It is an adaption layer that sits
>>> on top of an execution engine. There is no super secret awesome stuff as
>>> this code was already publicly visible in the past when it was part of the
>>> Google Cloud Dataflow github repo[1].
>>>
>>> Process wise the code will need to get approval from Google to be
>>> donated and for it to go through the code donation process but before we
>>> attempt to do that, I was wondering whether the community would object to
>>> adding this code to the master branch?
>>>
>>> The up side is that people can make breaking changes and fix it for
>>> all runners. It will also help Googlers contribute more to the portability
>>> story as it will remove the burden of doing the code import (wasted time)
>>> and it will allow people to develop in master (can have the whole project
>>> loaded in a single IDE).
>>>
>>> The downsides are that this will represent more code and unit tests
>>> to support.
>>>
>>> 1:
>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker
>>>
>>


Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Kai Jiang
+1 (non googler)

big help for transparency and for future runners.

Best,
Kai

On Thu, Sep 13, 2018, 11:45 Xinyu Liu  wrote:

> Big +1 (non-googler).
>
> From Samza Runner's perspective, we are very happy to see dataflow worker
> code so we can learn and compete :).
>
> Thanks,
> Xinyu
>
> On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi 
> wrote:
>
>> +1 (non-googler)
>>
>> This is a great move
>>
>> Sent from my iPhone
>>
>> On Sep 13, 2018, at 2:25 PM, Tim Robertson 
>> wrote:
>>
>> +1 (non googler)
>> It sounds pragmatic, helps with transparency should issues arise and
>> enables more people to fix.
>>
>>
>> On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin  wrote:
>>
>>> From my perspective as a (non-Google) community member, huge +1.
>>>
>>> I don't see anything bad for the community about open sourcing more of
>>> the probably-most-used runner. While the DirectRunner is probably still the
>>> most referential implementation of Beam, can't hurt to see more working
>>> code. Other runners or runner implementors can refer to this code if they
>>> want, and ignore it if they don't.
>>>
>>> In terms of having more code and tests to support, well, that's par for
>>> the course. Will this change make the things that need to be done to
>>> support them more obvious? (E.g., "this PR is blocked because someone at
>>> Google on Dataflow team has to fix something" vs "this PR is blocked
>>> because the Apache Beam code in foo/bar/baz is failing, and anyone who can
>>> see the code can fix it"). The latter seems like a clear win for the
>>> community.
>>>
>>> (As long as the code donation is handled properly, but that's completely
>>> orthogonal and I have no reason to think it wouldn't be.)
>>>
>>> Thanks,
>>> Dan
>>>
>>> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik  wrote:
>>>
 Yes, I'm specifically asking the community for opinions as to whether
 it should be accepted or not.

 On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi 
 wrote:

> This is terrific!
>
> Is this thread asking for opinions from the community about whether it should be
> accepted? Assuming the Google-side decision is made to contribute, big +1 from
> me to include it next to other runners.
>
> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik  wrote:
>
>> At Google we have been importing the Apache Beam code base and
>> integrating it with the Google portion of the codebase that supports the
>> Dataflow worker. This process is painful as we regularly are making
>> breaking API changes to support libraries related to running portable
>> pipelines (and sometimes in other places as well). This has made it
>> sometimes difficult for PR changes to make changes without either breaking
>> something for Google or waiting for a Googler to make the change internally
>> (e.g. dependency updates).
>>
>> This code is very similar to the other integrations that exist for
>> runners such as Flink/Spark/Apex/Samza. It is an adaption layer that sits
>> on top of an execution engine. There is no super secret awesome stuff as
>> this code was already publicly visible in the past when it was part of the
>> Google Cloud Dataflow github repo[1].
>>
>> Process wise the code will need to get approval from Google to be
>> donated and for it to go through the code donation process but before we
>> attempt to do that, I was wondering whether the community would object to
>> adding this code to the master branch?
>>
>> The up side is that people can make breaking changes and fix it for
>> all runners. It will also help Googlers contribute more to the portability
>> story as it will remove the burden of doing the code import (wasted time)
>> and it will allow people to develop in master (can have the whole project
>> loaded in a single IDE).
>>
>> The downsides are that this will represent more code and unit tests
>> to support.
>>
>> 1:
>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker
>>
>


Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Xinyu Liu
Big +1 (non-googler).

From Samza Runner's perspective, we are very happy to see dataflow worker
code so we can learn and compete :).

Thanks,
Xinyu

On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi 
wrote:

> +1 (non-googler)
>
> This is a great move
>
> Sent from my iPhone
>
> On Sep 13, 2018, at 2:25 PM, Tim Robertson 
> wrote:
>
> +1 (non googler)
> It sounds pragmatic, helps with transparency should issues arise and
> enables more people to fix.
>
>
> On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin  wrote:
>
>> From my perspective as a (non-Google) community member, huge +1.
>>
>> I don't see anything bad for the community about open sourcing more of
>> the probably-most-used runner. While the DirectRunner is probably still the
>> most referential implementation of Beam, can't hurt to see more working
>> code. Other runners or runner implementors can refer to this code if they
>> want, and ignore it if they don't.
>>
>> In terms of having more code and tests to support, well, that's par for
>> the course. Will this change make the things that need to be done to
>> support them more obvious? (E.g., "this PR is blocked because someone at
>> Google on Dataflow team has to fix something" vs "this PR is blocked
>> because the Apache Beam code in foo/bar/baz is failing, and anyone who can
>> see the code can fix it"). The latter seems like a clear win for the
>> community.
>>
>> (As long as the code donation is handled properly, but that's completely
>> orthogonal and I have no reason to think it wouldn't be.)
>>
>> Thanks,
>> Dan
>>
>> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik  wrote:
>>
>>> Yes, I'm specifically asking the community for opinions as to whether it
>>> should be accepted or not.
>>>
>>> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi 
>>> wrote:
>>>
>>>> This is terrific!
>>>>
>>>> Is thread asking for opinions from the community about if it should be
>>>> accepted? Assuming Google side decision is made to contribute, big +1 from
>>>> me to include it next to other runners.
>>>>
>>>> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik wrote:
>>>>
>>>>> At Google we have been importing the Apache Beam code base and
>>>>> integrating it with the Google portion of the codebase that supports the
>>>>> Dataflow worker. This process is painful as we regularly are making
>>>>> breaking API changes to support libraries related to running portable
>>>>> pipelines (and sometimes in other places as well). This has made it
>>>>> sometimes difficult for PR changes to make changes without either breaking
>>>>> something for Google or waiting for a Googler to make the change internally
>>>>> (e.g. dependency updates).
>>>>>
>>>>> This code is very similar to the other integrations that exist for
>>>>> runners such as Flink/Spark/Apex/Samza. It is an adaption layer that sits
>>>>> on top of an execution engine. There is no super secret awesome stuff as
>>>>> this code was already publicly visible in the past when it was part of the
>>>>> Google Cloud Dataflow github repo[1].
>>>>>
>>>>> Process wise the code will need to get approval from Google to be
>>>>> donated and for it to go through the code donation process but before we
>>>>> attempt to do that, I was wondering whether the community would object to
>>>>> adding this code to the master branch?
>>>>>
>>>>> The up side is that people can make breaking changes and fix it for
>>>>> all runners. It will also help Googlers contribute more to the portability
>>>>> story as it will remove the burden of doing the code import (wasted time)
>>>>> and it will allow people to develop in master (can have the whole project
>>>>> loaded in a single IDE).
>>>>>
>>>>> The downsides are that this will represent more code and unit tests to
>>>>> support.
>>>>>
>>>>> 1:
>>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker
>>>>>



Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Suneel Marthi
+1 (non-googler)

This is a great move

Sent from my iPhone

> On Sep 13, 2018, at 2:25 PM, Tim Robertson  wrote:
> 
> +1 (non googler)
> It sounds pragmatic, helps with transparency should issues arise and enables 
> more people to fix. 
>  
> 
>> On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin  wrote:
>> From my perspective as a (non-Google) community member, huge +1.
>> 
>> I don't see anything bad for the community about open sourcing more of the 
>> probably-most-used runner. While the DirectRunner is probably still the most 
>> referential implementation of Beam, can't hurt to see more working code. 
>> Other runners or runner implementors can refer to this code if they want, 
>> and ignore it if they don't.
>> 
>> In terms of having more code and tests to support, well, that's par for the 
>> course. Will this change make the things that need to be done to support 
>> them more obvious? (E.g., "this PR is blocked because someone at Google on 
>> Dataflow team has to fix something" vs "this PR is blocked because the 
>> Apache Beam code in foo/bar/baz is failing, and anyone who can see the code 
>> can fix it"). The latter seems like a clear win for the community.
>> 
>> (As long as the code donation is handled properly, but that's completely 
>> orthogonal and I have no reason to think it wouldn't be.)
>> 
>> Thanks,
>> Dan
>> 
>>> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik  wrote:
>>> Yes, I'm specifically asking the community for opinions as to whether it 
>>> should be accepted or not.
>>> 
>>>> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi wrote:
>>>> This is terrific!
>>>>
>>>> Is thread asking for opinions from the community about if it should be
>>>> accepted? Assuming Google side decision is made to contribute, big +1 from
>>>> me to include it next to other runners.
>>>>
>>>>> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik wrote:
>>>>> At Google we have been importing the Apache Beam code base and
>>>>> integrating it with the Google portion of the codebase that supports the
>>>>> Dataflow worker. This process is painful as we regularly are making
>>>>> breaking API changes to support libraries related to running portable
>>>>> pipelines (and sometimes in other places as well). This has made it
>>>>> sometimes difficult for PR changes to make changes without either
>>>>> breaking something for Google or waiting for a Googler to make the change
>>>>> internally (e.g. dependency updates).
>>>>>
>>>>> This code is very similar to the other integrations that exist for
>>>>> runners such as Flink/Spark/Apex/Samza. It is an adaption layer that sits
>>>>> on top of an execution engine. There is no super secret awesome stuff as
>>>>> this code was already publicly visible in the past when it was part of
>>>>> the Google Cloud Dataflow github repo[1].
>>>>>
>>>>> Process wise the code will need to get approval from Google to be donated
>>>>> and for it to go through the code donation process but before we attempt
>>>>> to do that, I was wondering whether the community would object to adding
>>>>> this code to the master branch?
>>>>>
>>>>> The up side is that people can make breaking changes and fix it for all
>>>>> runners. It will also help Googlers contribute more to the portability
>>>>> story as it will remove the burden of doing the code import (wasted time)
>>>>> and it will allow people to develop in master (can have the whole project
>>>>> loaded in a single IDE).
>>>>>
>>>>> The downsides are that this will represent more code and unit tests to
>>>>> support.
>>>>>
>>>>> 1:
>>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker


Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Rui Wang
+1

And I think more unit tests is a nice thing rather than a downside :-)

-Rui

On Thu, Sep 13, 2018 at 11:25 AM Tim Robertson 
wrote:

> +1 (non googler)
> It sounds pragmatic, helps with transparency should issues arise and
> enables more people to fix.
>
>
> On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin  wrote:
>
>> From my perspective as a (non-Google) community member, huge +1.
>>
>> I don't see anything bad for the community about open sourcing more of
>> the probably-most-used runner. While the DirectRunner is probably still the
>> most referential implementation of Beam, can't hurt to see more working
>> code. Other runners or runner implementors can refer to this code if they
>> want, and ignore it if they don't.
>>
>> In terms of having more code and tests to support, well, that's par for
>> the course. Will this change make the things that need to be done to
>> support them more obvious? (E.g., "this PR is blocked because someone at
>> Google on Dataflow team has to fix something" vs "this PR is blocked
>> because the Apache Beam code in foo/bar/baz is failing, and anyone who can
>> see the code can fix it"). The latter seems like a clear win for the
>> community.
>>
>> (As long as the code donation is handled properly, but that's completely
>> orthogonal and I have no reason to think it wouldn't be.)
>>
>> Thanks,
>> Dan
>>
>> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik  wrote:
>>
>>> Yes, I'm specifically asking the community for opinions as to whether it
>>> should be accepted or not.
>>>
>>> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi 
>>> wrote:
>>>
>>>> This is terrific!
>>>>
>>>> Is thread asking for opinions from the community about if it should be
>>>> accepted? Assuming Google side decision is made to contribute, big +1 from
>>>> me to include it next to other runners.
>>>>
>>>> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik wrote:
>>>>
>>>>> At Google we have been importing the Apache Beam code base and
>>>>> integrating it with the Google portion of the codebase that supports the
>>>>> Dataflow worker. This process is painful as we regularly are making
>>>>> breaking API changes to support libraries related to running portable
>>>>> pipelines (and sometimes in other places as well). This has made it
>>>>> sometimes difficult for PR changes to make changes without either breaking
>>>>> something for Google or waiting for a Googler to make the change internally
>>>>> (e.g. dependency updates).
>>>>>
>>>>> This code is very similar to the other integrations that exist for
>>>>> runners such as Flink/Spark/Apex/Samza. It is an adaption layer that sits
>>>>> on top of an execution engine. There is no super secret awesome stuff as
>>>>> this code was already publicly visible in the past when it was part of the
>>>>> Google Cloud Dataflow github repo[1].
>>>>>
>>>>> Process wise the code will need to get approval from Google to be
>>>>> donated and for it to go through the code donation process but before we
>>>>> attempt to do that, I was wondering whether the community would object to
>>>>> adding this code to the master branch?
>>>>>
>>>>> The up side is that people can make breaking changes and fix it for
>>>>> all runners. It will also help Googlers contribute more to the portability
>>>>> story as it will remove the burden of doing the code import (wasted time)
>>>>> and it will allow people to develop in master (can have the whole project
>>>>> loaded in a single IDE).
>>>>>
>>>>> The downsides are that this will represent more code and unit tests to
>>>>> support.
>>>>>
>>>>> 1:
>>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker
>>>>>



Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Tim Robertson
+1 (non googler)
It sounds pragmatic, helps with transparency should issues arise and
enables more people to fix.


On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin  wrote:

> From my perspective as a (non-Google) community member, huge +1.
>
> I don't see anything bad for the community about open sourcing more of the
> probably-most-used runner. While the DirectRunner is probably still the
> most referential implementation of Beam, can't hurt to see more working
> code. Other runners or runner implementors can refer to this code if they
> want, and ignore it if they don't.
>
> In terms of having more code and tests to support, well, that's par for
> the course. Will this change make the things that need to be done to
> support them more obvious? (E.g., "this PR is blocked because someone at
> Google on Dataflow team has to fix something" vs "this PR is blocked
> because the Apache Beam code in foo/bar/baz is failing, and anyone who can
> see the code can fix it"). The latter seems like a clear win for the
> community.
>
> (As long as the code donation is handled properly, but that's completely
> orthogonal and I have no reason to think it wouldn't be.)
>
> Thanks,
> Dan
>
> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik  wrote:
>
>> Yes, I'm specifically asking the community for opinions as to whether it
>> should be accepted or not.
>>
>> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi  wrote:
>>
>>> This is terrific!
>>>
>>> Is thread asking for opinions from the community about if it should be
>>> accepted? Assuming Google side decision is made to contribute, big +1 from
>>> me to include it next to other runners.
>>>
>>> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik  wrote:
>>>
>>>> At Google we have been importing the Apache Beam code base and
>>>> integrating it with the Google portion of the codebase that supports the
>>>> Dataflow worker. This process is painful as we regularly are making
>>>> breaking API changes to support libraries related to running portable
>>>> pipelines (and sometimes in other places as well). This has made it
>>>> sometimes difficult for PR changes to make changes without either breaking
>>>> something for Google or waiting for a Googler to make the change internally
>>>> (e.g. dependency updates).
>>>>
>>>> This code is very similar to the other integrations that exist for
>>>> runners such as Flink/Spark/Apex/Samza. It is an adaption layer that sits
>>>> on top of an execution engine. There is no super secret awesome stuff as
>>>> this code was already publicly visible in the past when it was part of the
>>>> Google Cloud Dataflow github repo[1].
>>>>
>>>> Process wise the code will need to get approval from Google to be
>>>> donated and for it to go through the code donation process but before we
>>>> attempt to do that, I was wondering whether the community would object to
>>>> adding this code to the master branch?
>>>>
>>>> The up side is that people can make breaking changes and fix it for all
>>>> runners. It will also help Googlers contribute more to the portability
>>>> story as it will remove the burden of doing the code import (wasted time)
>>>> and it will allow people to develop in master (can have the whole project
>>>> loaded in a single IDE).
>>>>
>>>> The downsides are that this will represent more code and unit tests to
>>>> support.
>>>>
>>>> 1:
>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker
>>>>
>>>


Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Dan Halperin
From my perspective as a (non-Google) community member, huge +1.

I don't see anything bad for the community about open sourcing more of the
probably-most-used runner. While the DirectRunner is probably still the
most referential implementation of Beam, can't hurt to see more working
code. Other runners or runner implementors can refer to this code if they
want, and ignore it if they don't.

In terms of having more code and tests to support, well, that's par for the
course. Will this change make the things that need to be done to support
them more obvious? (E.g., "this PR is blocked because someone at Google on
Dataflow team has to fix something" vs "this PR is blocked because the
Apache Beam code in foo/bar/baz is failing, and anyone who can see the code
can fix it"). The latter seems like a clear win for the community.

(As long as the code donation is handled properly, but that's completely
orthogonal and I have no reason to think it wouldn't be.)

Thanks,
Dan

On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik  wrote:

> Yes, I'm specifically asking the community for opinions as to whether it
> should be accepted or not.
>
> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi  wrote:
>
>> This is terrific!
>>
>> Is thread asking for opinions from the community about if it should be
>> accepted? Assuming Google side decision is made to contribute, big +1 from
>> me to include it next to other runners.
>>
>> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik  wrote:
>>
>>> At Google we have been importing the Apache Beam code base and
>>> integrating it with the Google portion of the codebase that supports the
>>> Dataflow worker. This process is painful as we regularly are making
>>> breaking API changes to support libraries related to running portable
>>> pipelines (and sometimes in other places as well). This has made it
>>> sometimes difficult for PR changes to make changes without either breaking
>>> something for Google or waiting for a Googler to make the change internally
>>> (e.g. dependency updates).
>>>
>>> This code is very similar to the other integrations that exist for
>>> runners such as Flink/Spark/Apex/Samza. It is an adaption layer that sits
>>> on top of an execution engine. There is no super secret awesome stuff as
>>> this code was already publicly visible in the past when it was part of the
>>> Google Cloud Dataflow github repo[1].
>>>
>>> Process wise the code will need to get approval from Google to be
>>> donated and for it to go through the code donation process but before we
>>> attempt to do that, I was wondering whether the community would object to
>>> adding this code to the master branch?
>>>
>>> The up side is that people can make breaking changes and fix it for all
>>> runners. It will also help Googlers contribute more to the portability
>>> story as it will remove the burden of doing the code import (wasted time)
>>> and it will allow people to develop in master (can have the whole project
>>> loaded in a single IDE).
>>>
>>> The downsides are that this will represent more code and unit tests to
>>> support.
>>>
>>> 1:
>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker
>>>
>>


Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Lukasz Cwik
Yes, I'm specifically asking the community for opinions as to whether it
should be accepted or not.

On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi  wrote:

> This is terrific!
>
> Is thread asking for opinions from the community about if it should be
> accepted? Assuming Google side decision is made to contribute, big +1 from
> me to include it next to other runners.
>
> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik  wrote:
>
>> At Google we have been importing the Apache Beam code base and
>> integrating it with the Google portion of the codebase that supports the
>> Dataflow worker. This process is painful as we regularly are making
>> breaking API changes to support libraries related to running portable
>> pipelines (and sometimes in other places as well). This has made it
>> sometimes difficult for PR changes to make changes without either breaking
>> something for Google or waiting for a Googler to make the change internally
>> (e.g. dependency updates).
>>
>> This code is very similar to the other integrations that exist for
>> runners such as Flink/Spark/Apex/Samza. It is an adaption layer that sits
>> on top of an execution engine. There is no super secret awesome stuff as
>> this code was already publicly visible in the past when it was part of the
>> Google Cloud Dataflow github repo[1].
>>
>> Process wise the code will need to get approval from Google to be donated
>> and for it to go through the code donation process but before we attempt to
>> do that, I was wondering whether the community would object to adding this
>> code to the master branch?
>>
>> The up side is that people can make breaking changes and fix it for all
>> runners. It will also help Googlers contribute more to the portability
>> story as it will remove the burden of doing the code import (wasted time)
>> and it will allow people to develop in master (can have the whole project
>> loaded in a single IDE).
>>
>> The downsides are that this will represent more code and unit tests to
>> support.
>>
>> 1:
>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker
>>
>


Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Raghu Angadi
This is terrific!

Is this thread asking for opinions from the community about whether it should
be accepted? Assuming the Google-side decision is made to contribute, big +1
from me to include it next to other runners.

On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik  wrote:

> At Google we have been importing the Apache Beam code base and integrating
> it with the Google portion of the codebase that supports the Dataflow
> worker. This process is painful as we regularly are making breaking API
> changes to support libraries related to running portable pipelines (and
> sometimes in other places as well). This has made it sometimes difficult
> for PR changes to make changes without either breaking something for Google
> or waiting for a Googler to make the change internally (e.g. dependency
> updates).
>
> This code is very similar to the other integrations that exist for runners
> such as Flink/Spark/Apex/Samza. It is an adaption layer that sits on top of
> an execution engine. There is no super secret awesome stuff as this code
> was already publicly visible in the past when it was part of the Google
> Cloud Dataflow github repo[1].
>
> Process wise the code will need to get approval from Google to be donated
> and for it to go through the code donation process but before we attempt to
> do that, I was wondering whether the community would object to adding this
> code to the master branch?
>
> The up side is that people can make breaking changes and fix it for all
> runners. It will also help Googlers contribute more to the portability
> story as it will remove the burden of doing the code import (wasted time)
> and it will allow people to develop in master (can have the whole project
> loaded in a single IDE).
>
> The downsides are that this will represent more code and unit tests to
> support.
>
> 1:
> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker
>


Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Reuven Lax
There have been multiple scenarios where people changed Beam, and ended up
breaking the Dataflow runner because that code lived in a private
repository. I believe that putting the Dataflow runner code in the public
repository will make it easier and simpler to make changes to Apache Beam.

Reuven

On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik  wrote:

> At Google we have been importing the Apache Beam code base and integrating
> it with the Google portion of the codebase that supports the Dataflow
> worker. This process is painful as we regularly are making breaking API
> changes to support libraries related to running portable pipelines (and
> sometimes in other places as well). This has made it sometimes difficult
> for PR changes to make changes without either breaking something for Google
> or waiting for a Googler to make the change internally (e.g. dependency
> updates).
>
> This code is very similar to the other integrations that exist for runners
> such as Flink/Spark/Apex/Samza. It is an adaption layer that sits on top of
> an execution engine. There is no super secret awesome stuff as this code
> was already publicly visible in the past when it was part of the Google
> Cloud Dataflow github repo[1].
>
> Process wise the code will need to get approval from Google to be donated
> and for it to go through the code donation process but before we attempt to
> do that, I was wondering whether the community would object to adding this
> code to the master branch?
>
> The up side is that people can make breaking changes and fix it for all
> runners. It will also help Googlers contribute more to the portability
> story as it will remove the burden of doing the code import (wasted time)
> and it will allow people to develop in master (can have the whole project
> loaded in a single IDE).
>
> The downsides are that this will represent more code and unit tests to
> support.
>
> 1:
> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker
>


Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Lukasz Cwik
At Google we have been importing the Apache Beam code base and integrating
it with the Google portion of the codebase that supports the Dataflow
worker. This process is painful as we regularly are making breaking API
changes to support libraries related to running portable pipelines (and
sometimes in other places as well). This has made it sometimes difficult
for PR changes to make changes without either breaking something for Google
or waiting for a Googler to make the change internally (e.g. dependency
updates).

This code is very similar to the other integrations that exist for runners
such as Flink/Spark/Apex/Samza. It is an adaption layer that sits on top of
an execution engine. There is no super secret awesome stuff as this code
was already publicly visible in the past when it was part of the Google
Cloud Dataflow github repo[1].

Process wise the code will need to get approval from Google to be donated
and for it to go through the code donation process but before we attempt to
do that, I was wondering whether the community would object to adding this
code to the master branch?

The upside is that people can make breaking changes and fix it for all
runners. It will also help Googlers contribute more to the portability
story as it will remove the burden of doing the code import (wasted time)
and it will allow people to develop in master (can have the whole project
loaded in a single IDE).

The downsides are that this will represent more code and unit tests to
support.

1:
https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker