Re: [PROPOSAL] Python 3 support

2018-04-18 Thread Ahmet Altay
Robbe, I added you as a contributor to our JIRA. You should now be able to
assign issues to yourself. The board will update itself automatically based
on the issues. Give it a try.

On Wed, Apr 18, 2018 at 1:15 AM, Robbe Sneyders 
wrote:

> Thanks!
>
> Can someone give me permission to assign issues to myself?
> And edit rights to the Kanban board?
>
> Robbe
>
> On Tue, 17 Apr 2018 at 22:56 Ahmet Altay  wrote:
>
>> Kanban board for python 3:
>> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245
>>
>> (Thank you Davor!)
>>
>> Ahmet
>>
>> On Fri, Apr 6, 2018 at 6:32 PM, Reuven Lax  wrote:
>>
>>> I had a similar problem.
>>>
>>> On Fri, Apr 6, 2018, 6:23 PM Ahmet Altay  wrote:
>>>
 I tried to create a shared kanban board but I failed. I think I am
 lacking some permission to create a shared filter. Could someone help with
 creating this?

 The filter I planned to use was "project = BEAM AND (parent = BEAM-2784
 OR parent = BEAM-1251) ORDER BY Rank ASC"

 Ahmet

 On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders 
 wrote:

> Hi all,
>
> I don't seem to have the permissions to create a Kanban board or even
> assign tasks to myself. Who could help me with this?
>
> I've updated the coders package pull request [1] and added the applied
> strategy to the proposal document [2].
> It would be great to get some feedback on this, so we can start moving
> forward with other subpackages.
>
> Kind regards,
> Robbe
>
> [1] https://github.com/apache/beam/pull/4990
> [2]
> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>
> On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders 
> wrote:
>
>> Hello Robert,
>>
>> I think a Kanban board on Jira as proposed by Ahmet can be helpful
>> for this. I'll look into setting one up tomorrow.
>>
>> In the meantime, you can find the first pull request with the updated
>> coders package here:
>> https://github.com/apache/beam/pull/4990
>>
>> Kind regards,
>> Robbe
>>
>> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw 
>> wrote:
>>
>>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <
>>> robbe.sneyd...@ml6.eu> wrote:
>>>
 Thanks Ahmet and Robert,

 I think we can work on different subpackages in parallel, but it's
 important to apply the same strategy everywhere. I'm currently working on
 applying step 1 (was mostly done already) and 2 of the proposal to the
 coders subpackage to create a first pull request. We can then discuss the
 applied strategy in detail before merging and applying it to the other
 subpackages.

>>>
>>> Sounds good. Again, could you document (in a more permanent/easy to
>>> look up state than email) when packages are started/done?
>>>
>>>
 This strategy also includes the choice of automated tools. I'm
 focusing on writing python 3 code with python 2 compatibility, which means
 depending on the future package instead of the six package (which is
 already used in some places in the current code base). I have already
 noticed that this indeed requires a lot of manual work after running the
 automated script.
 The future package supports python 3.3+ compatibility, so I don't
 think there is a higher cost supporting 3.4 compared to 3.5+.

>>>
>>> Sure. It may incur a higher maintenance burden long-term though.
>>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>>> some time to come.)
>>>
>>>
 I have already added a tox environment to run pylint2 with the
 --py3k argument per updated subpackage, which should help avoid regression
 between step 2 and step 3 of the proposal. This update will be pushed with
 the first pull request.

 Kind regards,
 Robbe


 On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw 
 wrote:

> Thank you, Robbie, for your offer to help with contribution here.
> I read over your doc and the one thing I'd like to add is that this work is
> very parallelizable, but if we have enough people looking at it we'll want
> some way to coordinate so as to not overlap work (or just waste time
> discovering what's been done). Tracking individual JIRAs and PRs gets
> unwieldy, perhaps a spreadsheet with modules/packages on one axis and the
> various automated/manual conversions along the other would be helpful?

Re: [PROPOSAL] Python 3 support

2018-04-18 Thread Robbe Sneyders
Thanks!

Can someone give me permission to assign issues to myself?
And edit rights to the Kanban board?

Robbe

On Tue, 17 Apr 2018 at 22:56 Ahmet Altay  wrote:

> Kanban board for python 3:
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245
>
> (Thank you Davor!)
>
> Ahmet
>
> On Fri, Apr 6, 2018 at 6:32 PM, Reuven Lax  wrote:
>
>> I had a similar problem.
>>
>> On Fri, Apr 6, 2018, 6:23 PM Ahmet Altay  wrote:
>>
>>> I tried to create a shared kanban board but I failed. I think I am
>>> lacking some permission to create a shared filter. Could someone help with
>>> creating this?
>>>
>>> The filter I planned to use was "project = BEAM AND (parent = BEAM-2784
>>> OR parent = BEAM-1251) ORDER BY Rank ASC"
>>>
>>> Ahmet
>>>
>>> On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders 
>>> wrote:
>>>
 Hi all,

 I don't seem to have the permissions to create a Kanban board or even
 assign tasks to myself. Who could help me with this?

 I've updated the coders package pull request [1] and added the applied
 strategy to the proposal document [2].
 It would be great to get some feedback on this, so we can start moving
 forward with other subpackages.

 Kind regards,
 Robbe

 [1] https://github.com/apache/beam/pull/4990
 [2]
 https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing


 On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders 
 wrote:

> Hello Robert,
>
> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
> this. I'll look into setting one up tomorrow.
>
> In the meantime, you can find the first pull request with the updated
> coders package here:
> https://github.com/apache/beam/pull/4990
>
> Kind regards,
> Robbe
>
> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw 
> wrote:
>
>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders 
>> wrote:
>>
>>> Thanks Ahmet and Robert,
>>>
>>> I think we can work on different subpackages in parallel, but it's
>>> important to apply the same strategy everywhere. I'm currently working on
>>> applying step 1 (was mostly done already) and 2 of the proposal to the
>>> coders subpackage to create a first pull request. We can then discuss the
>>> applied strategy in detail before merging and applying it to the other
>>> subpackages.
>>>
>>
>> Sounds good. Again, could you document (in a more permanent/easy to
>> look up state than email) when packages are started/done?
>>
>>
>>> This strategy also includes the choice of automated tools. I'm
>>> focusing on writing python 3 code with python 2 compatibility, which means
>>> depending on the future package instead of the six package (which is
>>> already used in some places in the current code base). I have already
>>> noticed that this indeed requires a lot of manual work after running the
>>> automated script.
>>> The future package supports python 3.3+ compatibility, so I don't
>>> think there is a higher cost supporting 3.4 compared to 3.5+.
>>>
>>
>> Sure. It may incur a higher maintenance burden long-term though.
>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>> some time to come.)
>>
>>
>>> I have already added a tox environment to run pylint2 with the
>>> --py3k argument per updated subpackage, which should help avoid regression
>>> between step 2 and step 3 of the proposal. This update will be pushed with
>>> the first pull request.
>>>
>>> Kind regards,
>>> Robbe
>>>
>>>
>>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw 
>>> wrote:
>>>
 Thank you, Robbie, for your offer to help with contribution here. I
 read over your doc and the one thing I'd like to add is that this work is
 very parallelizable, but if we have enough people looking at it we'll want
 some way to coordinate so as to not overlap work (or just waste time
 discovering what's been done). Tracking individual JIRAs and PRs gets
 unwieldy, perhaps a spreadsheet with modules/packages on one axis and the
 various automated/manual conversions along the other would be helpful?

 A note on automated tools, they're sometimes overly conservative,
 so we should be sure to review the changes manually. (A typical example of
 this is unnecessarily importing six.moves.xrange when there was no big
 reason to use xrange over range in Python 2, or conversely using
 list(range(...)) in Python 3.)

 Also, 

Re: [PROPOSAL] Python 3 support

2018-04-17 Thread Ahmet Altay
Kanban board for python 3:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245

(Thank you Davor!)

Ahmet

On Fri, Apr 6, 2018 at 6:32 PM, Reuven Lax  wrote:

> I had a similar problem.
>
> On Fri, Apr 6, 2018, 6:23 PM Ahmet Altay  wrote:
>
>> I tried to create a shared kanban board but I failed. I think I am
>> lacking some permission to create a shared filter. Could someone help with
>> creating this?
>>
>> The filter I planned to use was "project = BEAM AND (parent = BEAM-2784
>> OR parent = BEAM-1251) ORDER BY Rank ASC"
>>
>> Ahmet
>>
>> On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders 
>> wrote:
>>
>>> Hi all,
>>>
>>> I don't seem to have the permissions to create a Kanban board or even
>>> assign tasks to myself. Who could help me with this?
>>>
>>> I've updated the coders package pull request [1] and added the applied
>>> strategy to the proposal document [2].
>>> It would be great to get some feedback on this, so we can start moving
>>> forward with other subpackages.
>>>
>>> Kind regards,
>>> Robbe
>>>
>>> [1] https://github.com/apache/beam/pull/4990
>>> [2]
>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>
>>> On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders 
>>> wrote:
>>>
 Hello Robert,

 I think a Kanban board on Jira as proposed by Ahmet can be helpful for
 this. I'll look into setting one up tomorrow.

 In the meantime, you can find the first pull request with the updated
 coders package here:
 https://github.com/apache/beam/pull/4990

 Kind regards,
 Robbe

 On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw 
 wrote:

> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders 
> wrote:
>
>> Thanks Ahmet and Robert,
>>
>> I think we can work on different subpackages in parallel, but it's
>> important to apply the same strategy everywhere. I'm currently working on
>> applying step 1 (was mostly done already) and 2 of the proposal to the
>> coders subpackage to create a first pull request. We can then discuss the
>> applied strategy in detail before merging and applying it to the other
>> subpackages.
>>
>
> Sounds good. Again, could you document (in a more permanent/easy to
> look up state than email) when packages are started/done?
>
>
>> This strategy also includes the choice of automated tools. I'm
>> focusing on writing python 3 code with python 2 compatibility, which means
>> depending on the future package instead of the six package (which is
>> already used in some places in the current code base). I have already
>> noticed that this indeed requires a lot of manual work after running the
>> automated script.
>> The future package supports python 3.3+ compatibility, so I don't
>> think there is a higher cost supporting 3.4 compared to 3.5+.
>>
>
> Sure. It may incur a higher maintenance burden long-term though.
> (Basically, if we go out the door with 3.4 it's a promise to support it for
> some time to come.)
>
>
>> I have already added a tox environment to run pylint2 with the --py3k
>> argument per updated subpackage, which should help avoid regression between
>> step 2 and step 3 of the proposal. This update will be pushed with the
>> first pull request.
>>
>> Kind regards,
>> Robbe
>>
>>
>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw 
>> wrote:
>>
>>> Thank you, Robbie, for your offer to help with contribution here. I
>>> read over your doc and the one thing I'd like to add is that this work is
>>> very parallelizable, but if we have enough people looking at it we'll want
>>> some way to coordinate so as to not overlap work (or just waste time
>>> discovering what's been done). Tracking individual JIRAs and PRs gets
>>> unwieldy, perhaps a spreadsheet with modules/packages on one axis and the
>>> various automated/manual conversions along the other would be helpful?
>>>
>>> A note on automated tools, they're sometimes overly conservative, so
>>> we should be sure to review the changes manually. (A typical example of
>>> this is unnecessarily importing six.moves.xrange when there was no big
>>> reason to use xrange over range in Python 2, or conversely using
>>> list(range(...)) in Python 3.)
>>>
>>> Also, +1 to targeting 3.4+ and upgrading tox to prevent
>>> regressions. If there's a cost to supporting 3.4 as opposed to requiring
>>> 3.5+ we should identify it and decide that before widespread announcement.
>>>
>>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay 
>>> wrote:
>>>


 

Re: [PROPOSAL] Python 3 support

2018-04-06 Thread Reuven Lax
I had a similar problem.

On Fri, Apr 6, 2018, 6:23 PM Ahmet Altay  wrote:

> I tried to create a shared kanban board but I failed. I think I am lacking
> some permission to create a shared filter. Could someone help with creating
> this?
>
> The filter I planned to use was "project = BEAM AND (parent = BEAM-2784 OR
> parent = BEAM-1251) ORDER BY Rank ASC"
>
> Ahmet
>
> On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders 
> wrote:
>
>> Hi all,
>>
>> I don't seem to have the permissions to create a Kanban board or even
>> assign tasks to myself. Who could help me with this?
>>
>> I've updated the coders package pull request [1] and added the applied
>> strategy to the proposal document [2].
>> It would be great to get some feedback on this, so we can start moving
>> forward with other subpackages.
>>
>> Kind regards,
>> Robbe
>>
>> [1] https://github.com/apache/beam/pull/4990
>> [2]
>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>
>>
>> On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders  wrote:
>>
>>> Hello Robert,
>>>
>>> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
>>> this. I'll look into setting one up tomorrow.
>>>
>>> In the meantime, you can find the first pull request with the updated
>>> coders package here:
>>> https://github.com/apache/beam/pull/4990
>>>
>>> Kind regards,
>>> Robbe
>>>
>>> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw 
>>> wrote:
>>>
 On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders 
 wrote:

> Thanks Ahmet and Robert,
>
> I think we can work on different subpackages in parallel, but it's
> important to apply the same strategy everywhere. I'm currently working on
> applying step 1 (was mostly done already) and 2 of the proposal to the
> coders subpackage to create a first pull request. We can then discuss the
> applied strategy in detail before merging and applying it to the other
> subpackages.
>

 Sounds good. Again, could you document (in a more permanent/easy to
 look up state than email) when packages are started/done?


> This strategy also includes the choice of automated tools. I'm
> focusing on writing python 3 code with python 2 compatibility, which means
> depending on the future package instead of the six package (which is
> already used in some places in the current code base). I have already
> noticed that this indeed requires a lot of manual work after running the
> automated script.
> The future package supports python 3.3+ compatibility, so I don't
> think there is a higher cost supporting 3.4 compared to 3.5+.
>

 Sure. It may incur a higher maintenance burden long-term though.
 (Basically, if we go out the door with 3.4 it's a promise to support it for
 some time to come.)


> I have already added a tox environment to run pylint2 with the --py3k
> argument per updated subpackage, which should help avoid regression between
> step 2 and step 3 of the proposal. This update will be pushed with the
> first pull request.
>
> Kind regards,
> Robbe
>
>
> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw 
> wrote:
>
>> Thank you, Robbie, for your offer to help with contribution here. I
>> read over your doc and the one thing I'd like to add is that this work is
>> very parallelizable, but if we have enough people looking at it we'll want
>> some way to coordinate so as to not overlap work (or just waste time
>> discovering what's been done). Tracking individual JIRAs and PRs gets
>> unwieldy, perhaps a spreadsheet with modules/packages on one axis and the
>> various automated/manual conversions along the other would be helpful?
>>
>> A note on automated tools, they're sometimes overly conservative, so
>> we should be sure to review the changes manually. (A typical example of
>> this is unnecessarily importing six.moves.xrange when there was no big
>> reason to use xrange over range in Python 2, or conversely using
>> list(range(...)) in Python 3.)
>>
>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions.
>> If there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>> identify it and decide that before widespread announcement.
>>
>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay  wrote:
>>
>>>
>>>
>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau 
>>> wrote:
>>>

 On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <
 robbe.sneyd...@ml6.eu> wrote:

> Hi Anand,
>
> Thanks for the feedback.
>
> It should be no problem to run everything on DataflowRunner as

Re: [PROPOSAL] Python 3 support

2018-04-06 Thread Ahmet Altay
I tried to create a shared kanban board but I failed. I think I am lacking
some permission to create a shared filter. Could someone help with creating
this?

The filter I planned to use was "project = BEAM AND (parent = BEAM-2784 OR
parent = BEAM-1251) ORDER BY Rank ASC"
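
As a rough illustration of the filter above (a sketch, not something taken
from this thread): the JQL string can be assembled from the two parent issue
keys and URL-encoded for use as a query parameter. The helper name
build_subtask_filter is hypothetical, and passing the result as a "jql"
query parameter is an assumption about how one might use it, not something
checked against the Apache JIRA instance.

```python
from urllib.parse import urlencode


def build_subtask_filter(project, parents):
    """Hypothetical helper: JQL selecting sub-tasks of the given parent issues."""
    parent_clause = " OR ".join("parent = {}".format(key) for key in parents)
    return "project = {} AND ({}) ORDER BY Rank ASC".format(project, parent_clause)


# Reproduces the filter quoted in this message.
jql = build_subtask_filter("BEAM", ["BEAM-2784", "BEAM-1251"])

# Assumption: the string would be sent as a `jql` query parameter;
# urlencode only takes care of the escaping.
query_string = urlencode({"jql": jql})
```

Keeping the parent keys in a list means a third umbrella issue could be added
without rewriting the filter by hand.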

Ahmet

On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders 
wrote:

> Hi all,
>
> I don't seem to have the permissions to create a Kanban board or even
> assign tasks to myself. Who could help me with this?
>
> I've updated the coders package pull request [1] and added the applied
> strategy to the proposal document [2].
> It would be great to get some feedback on this, so we can start moving
> forward with other subpackages.
>
> Kind regards,
> Robbe
>
> [1] https://github.com/apache/beam/pull/4990
> [2]
> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>
> On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders  wrote:
>
>> Hello Robert,
>>
>> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
>> this. I'll look into setting one up tomorrow.
>>
>> In the meantime, you can find the first pull request with the updated
>> coders package here:
>> https://github.com/apache/beam/pull/4990
>>
>> Kind regards,
>> Robbe
>>
>> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw  wrote:
>>
>>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders 
>>> wrote:
>>>
 Thanks Ahmet and Robert,

 I think we can work on different subpackages in parallel, but it's
 important to apply the same strategy everywhere. I'm currently working on
 applying step 1 (was mostly done already) and 2 of the proposal to the
 coders subpackage to create a first pull request. We can then discuss the
 applied strategy in detail before merging and applying it to the other
 subpackages.

>>>
>>> Sounds good. Again, could you document (in a more permanent/easy to look
>>> up state than email) when packages are started/done?
>>>
>>>
 This strategy also includes the choice of automated tools. I'm focusing
 on writing python 3 code with python 2 compatibility, which means depending
 on the future package instead of the six package (which is already used in
 some places in the current code base). I have already noticed that this
 indeed requires a lot of manual work after running the automated script.
 The future package supports python 3.3+ compatibility, so I don't think
 there is a higher cost supporting 3.4 compared to 3.5+.

>>>
>>> Sure. It may incur a higher maintenance burden long-term though.
>>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>>> some time to come.)
>>>
>>>
 I have already added a tox environment to run pylint2 with the --py3k
 argument per updated subpackage, which should help avoid regression between
 step 2 and step 3 of the proposal. This update will be pushed with the
 first pull request.

 Kind regards,
 Robbe


 On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw 
 wrote:

> Thank you, Robbie, for your offer to help with contribution here. I
> read over your doc and the one thing I'd like to add is that this work is
> very parallelizable, but if we have enough people looking at it we'll want
> some way to coordinate so as to not overlap work (or just waste time
> discovering what's been done). Tracking individual JIRAs and PRs gets
> unwieldy, perhaps a spreadsheet with modules/packages on one axis and the
> various automated/manual conversions along the other would be helpful?
>
> A note on automated tools, they're sometimes overly conservative, so
> we should be sure to review the changes manually. (A typical example of
> this is unnecessarily importing six.moves.xrange when there was no big
> reason to use xrange over range in Python 2, or conversely using
> list(range(...)) in Python 3.)
>
> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions.
> If there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
> identify it and decide that before widespread announcement.
>
> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay  wrote:
>
>>
>>
>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau 
>> wrote:
>>
>>>
>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <
>>> robbe.sneyd...@ml6.eu> wrote:
>>>
 Hi Anand,

 Thanks for the feedback.

 It should be no problem to run everything on DataflowRunner as well.
 Are there any performance tests in place to check for performance
 regressions?

>>>
>> Yes there is a suite (https://github.com/apache/
>> beam/blob/master/.test-infra/jenkins/job_beam_
>> 

Re: [PROPOSAL] Python 3 support

2018-04-06 Thread Robbe Sneyders
Hi all,

I don't seem to have the permissions to create a Kanban board or even
assign tasks to myself. Who could help me with this?

I've updated the coders package pull request [1] and added the applied
strategy to the proposal document [2].
It would be great to get some feedback on this, so we can start moving
forward with other subpackages.

Kind regards,
Robbe

[1] https://github.com/apache/beam/pull/4990
[2]
https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing


On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders  wrote:

> Hello Robert,
>
> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
> this. I'll look into setting one up tomorrow.
>
> In the meantime, you can find the first pull request with the updated
> coders package here:
> https://github.com/apache/beam/pull/4990
>
> Kind regards,
> Robbe
>
> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw  wrote:
>
>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders 
>> wrote:
>>
>>> Thanks Ahmet and Robert,
>>>
>>> I think we can work on different subpackages in parallel, but it's
>>> important to apply the same strategy everywhere. I'm currently working on
>>> applying step 1 (was mostly done already) and 2 of the proposal to the
>>> coders subpackage to create a first pull request. We can then discuss the
>>> applied strategy in detail before merging and applying it to the other
>>> subpackages.
>>>
>>
>> Sounds good. Again, could you document (in a more permanent/easy to look
>> up state than email) when packages are started/done?
>>
>>
>>> This strategy also includes the choice of automated tools. I'm focusing
>>> on writing python 3 code with python 2 compatibility, which means depending
>>> on the future package instead of the six package (which is already used in
>>> some places in the current code base). I have already noticed that this
>>> indeed requires a lot of manual work after running the automated script.
>>> The future package supports python 3.3+ compatibility, so I don't think
>>> there is a higher cost supporting 3.4 compared to 3.5+.
>>>
>>
>> Sure. It may incur a higher maintenance burden long-term though.
>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>> some time to come.)
>>
>>
>>> I have already added a tox environment to run pylint2 with the --py3k
>>> argument per updated subpackage, which should help avoid regression between
>>> step 2 and step 3 of the proposal. This update will be pushed with the
>>> first pull request.
>>>
>>> Kind regards,
>>> Robbe
>>>
>>>
>>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw 
>>> wrote:
>>>
 Thank you, Robbie, for your offer to help with contribution here. I
 read over your doc and the one thing I'd like to add is that this work is
 very parallelizable, but if we have enough people looking at it we'll want
 some way to coordinate so as to not overlap work (or just waste time
 discovering what's been done). Tracking individual JIRAs and PRs gets
 unwieldy, perhaps a spreadsheet with modules/packages on one axis and the
 various automated/manual conversions along the other would be helpful?

 A note on automated tools, they're sometimes overly conservative, so we
 should be sure to review the changes manually. (A typical example of this
 is unnecessarily importing six.moves.xrange when there was no big reason to
 use xrange over range in Python 2, or conversely using list(range(...)) in
 Python 3.)

 Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions.
 If there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
 identify it and decide that before widespread announcement.

 On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay  wrote:

>
>
> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau 
> wrote:
>
>>
>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders 
>> wrote:
>>
>>> Hi Anand,
>>>
>>> Thanks for the feedback.
>>>
>>> It should be no problem to run everything on DataflowRunner as well.
>>> Are there any performance tests in place to check for performance
>>> regressions?
>>>
>>
> Yes there is a suite (
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
> It may not be very comprehensive and seems to have been failing for a
> while. I would not block python 3 work on performance for now. That is the
> unfortunate state of things.
>
> If anybody in the community is interested, this would be a great
> opportunity to help with benchmarks in general.
>
>
>>
>>> Some questions were raised in the proposal document which I want to
>>> add to this conversation:
>>>
>>> The first comment was about 

Re: [PROPOSAL] Python 3 support

2018-04-02 Thread Robbe Sneyders
Hello Robert,

I think a Kanban board on Jira as proposed by Ahmet can be helpful for
this. I'll look into setting one up tomorrow.

In the meantime, you can find the first pull request with the updated
coders package here:
https://github.com/apache/beam/pull/4990

Kind regards,
Robbe

On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw  wrote:

> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders 
> wrote:
>
>> Thanks Ahmet and Robert,
>>
>> I think we can work on different subpackages in parallel, but it's
>> important to apply the same strategy everywhere. I'm currently working on
>> applying step 1 (was mostly done already) and 2 of the proposal to the
>> coders subpackage to create a first pull request. We can then discuss the
>> applied strategy in detail before merging and applying it to the other
>> subpackages.
>>
>
> Sounds good. Again, could you document (in a more permanent/easy to look
> up state than email) when packages are started/done?
>
>
>> This strategy also includes the choice of automated tools. I'm focusing
>> on writing python 3 code with python 2 compatibility, which means depending
>> on the future package instead of the six package (which is already used in
>> some places in the current code base). I have already noticed that this
>> indeed requires a lot of manual work after running the automated script.
>> The future package supports python 3.3+ compatibility, so I don't think
>> there is a higher cost supporting 3.4 compared to 3.5+.
>>
>
> Sure. It may incur a higher maintenance burden long-term though.
> (Basically, if we go out the door with 3.4 it's a promise to support it for
> some time to come.)
>
>
>> I have already added a tox environment to run pylint2 with the --py3k
>> argument per updated subpackage, which should help avoid regression between
>> step 2 and step 3 of the proposal. This update will be pushed with the
>> first pull request.
>>
>> Kind regards,
>> Robbe
>>
>>
>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw  wrote:
>>
>>> Thank you, Robbie, for your offer to help with contribution here. I read
>>> over your doc and the one thing I'd like to add is that this work is very
>>> parallelizable, but if we have enough people looking at it we'll want some
>>> way to coordinate so as to not overlap work (or just waste time discovering
>>> what's been done). Tracking individual JIRAs and PRs gets unwieldy, perhaps
>>> a spreadsheet with modules/packages on one axis and the various
>>> automated/manual conversions along the other would be helpful?
>>>
>>> A note on automated tools, they're sometimes overly conservative, so we
>>> should be sure to review the changes manually. (A typical example of this
>>> is unnecessarily importing six.moves.xrange when there was no big reason to
>>> use xrange over range in Python 2, or conversely using list(range(...)) in
>>> Python 3.)
>>>
>>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
>>> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>>> identify it and decide that before widespread announcement.
>>>
>>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay  wrote:
>>>


 On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau 
 wrote:

>
> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders 
> wrote:
>
>> Hi Anand,
>>
>> Thanks for the feedback.
>>
>> It should be no problem to run everything on DataflowRunner as well.
>> Are there any performance tests in place to check for performance
>> regressions?
>>
>
 Yes there is a suite (
 https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
 It may not be very comprehensive and seems to have been failing for a
 while. I would not block python 3 work on performance for now. That is the
 unfortunate state of things.

 If anybody in the community is interested, this would be a great
 opportunity to help with benchmarks in general.


>
>> Some questions were raised in the proposal document which I want to
>> add to this conversation:
>>
>> The first comment was about the targeted python 3 versions. We
>> proposed to target 3.6 since it is the latest version available and added
>> 3.5 because 3.6 adoption seems rather low (hard to find any relevant
>> sources on this though).
>> If the beam community prefers 3.4, I would propose to target 3.4 only
>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>> progress. 3.4 has the advantage of already being installed on the workers
>> and allows pySpark pipelines to be moved over to beam more easily.
>> It would be great to get some opinions on this.
>>
>
 My preference is to support 3.4+. I searched a bit on the web to
 understand the usage 

Re: [PROPOSAL] Python 3 support

2018-03-30 Thread Robert Bradshaw
On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders 
wrote:

> Thanks Ahmet and Robert,
>
> I think we can work on different subpackages in parallel, but it's
> important to apply the same strategy everywhere. I'm currently working on
> applying step 1 (was mostly done already) and 2 of the proposal to the
> coders subpackage to create a first pull request. We can then discuss the
> applied strategy in detail before merging and applying it to the other
> subpackages.
>

Sounds good. Again, could you document (somewhere more permanent and easier to
look up than email) when packages are started/done?


> This strategy also includes the choice of automated tools. I'm focusing on
> writing python 3 code with python 2 compatibility, which means depending on
> the future package instead of the six package (which is already used in
> some places in the current code base). I have already noticed that this
> indeed requires a lot of manual work after running the automated script.
> The future package supports python 3.3+ compatibility, so I don't think
> there is a higher cost supporting 3.4 compared to 3.5+.
>

Sure. It may incur a higher maintenance burden long-term though.
(Basically, if we go out the door with 3.4 it's a promise to support it for
some time to come.)


> I have already added a tox environment to run pylint2 with the --py3k
> argument per updated subpackage, which should help avoid regression between
> step 2 and step 3 of the proposal. This update will be pushed with the
> first pull request.
>
> Kind regards,
> Robbe
>
>
> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw  wrote:
>
>> Thank you, Robbe, for your offer to help with contribution here. I read
>> over your doc and the one thing I'd like to add is that this work is very
>> parallelizable, but if we have enough people looking at it we'll want some
>> way to coordinate so as to not overlap work (or just waste time discovering
>> what's been done). Tracking individual JIRAs and PRs gets unwieldy, perhaps
>> a spreadsheet with modules/packages on one axis and the various
>> automated/manual conversions along the other would be helpful?
>>
>> A note on automated tools, they're sometimes overly conservative, so we
>> should be sure to review the changes manually. (A typical example of this
>> is unnecessarily importing six.moves.xrange when there was no big reason to
>> use xrange over range in Python 2, or conversely using list(range(...)) in
>> Python 3.)
>>
>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
>> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>> identify it and decide that before widespread announcement.
>>
>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay  wrote:
>>
>>>
>>>
>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau 
>>> wrote:
>>>

 On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders 
 wrote:

> Hi Anand,
>
> Thanks for the feedback.
>
> It should be no problem to run everything on DataflowRunner as well.
> Are there any performance tests in place to check for performance
> regressions?
>

>>> Yes there is a suite (
>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>> It may not be very comprehensive and seems to be failing for a while. I
>>> would not block python 3 work on performance for now. That is the
>>> unfortunate state of things.
>>>
>>> If anybody in the community is interested, this would be a great
>>> opportunity to help with benchmarks in general.
>>>
>>>

> Some questions were raised in the proposal document which I want to
> add to this conversation:
>
> The first comment was about the targeted python 3 versions. We
> proposed to target 3.6 since it is the latest version available and added
> 3.5 because 3.6 adoption seems rather low (hard to find any relevant
> sources on this though).
> If the beam community prefers 3.4, I would propose to target 3.4 only
> during porting and add 3.5 and 3.6 later so we don't slow down the porting
> progress. 3.4 has the advantage of already being installed on the workers
> and allows pySpark pipelines to be moved over to beam more easily.
> It would be great to get some opinions on this.
>

>>> My preference is to support 3.4+. I searched a bit on the web to
>>> understand the usage statistics for python 3, it seems like python 3.4 has
>>> ~20% usage and python 3.4+ has 99% (
>>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>>> Based on that, I think it makes sense to support it.
>>>
>>>
>>>

> Another comment was made on how to avoid regression during the porting
> progress.
> After applying step 1 and step 2, no python 3 compatibility lint
> warnings should remain, so it would be great 

Re: [PROPOSAL] Python 3 support

2018-03-30 Thread Robbe Sneyders
Thanks Ahmet and Robert,

I think we can work on different subpackages in parallel, but it's
important to apply the same strategy everywhere. I'm currently working on
applying step 1 (was mostly done already) and 2 of the proposal to the
coders subpackage to create a first pull request. We can then discuss the
applied strategy in detail before merging and applying it to the other
subpackages.

This strategy also includes the choice of automated tools. I'm focusing on
writing python 3 code with python 2 compatibility, which means depending on
the future package instead of the six package (which is already used in
some places in the current code base). I have already noticed that this
indeed requires a lot of manual work after running the automated script.
The future package supports python 3.3+ compatibility, so I don't think
there is a higher cost supporting 3.4 compared to 3.5+.
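
To illustrate the style, here is a minimal sketch of what "python 3 code with
python 2 compatibility" can look like. The describe helper is hypothetical and
not taken from the Beam code base. On python 3 the builtins import resolves to
the standard library; on python 2 it is assumed that the future package is
installed and provides it.

```python
from __future__ import absolute_import, division, print_function

# On python 3 this is the standard library module; on python 2 the
# future package is assumed to be installed and to provide it.
from builtins import range, str


def describe(values):
    """Summarize a sequence of numbers (hypothetical helper, for illustration)."""
    total = 0
    for i in range(len(values)):  # lazy range on both interpreters
        total += values[i]
    mean = total / len(values)  # true division, thanks to the __future__ import
    return str("n={} mean={}").format(len(values), mean)


print(describe([1, 2]))
```

The code reads like plain python 3, and the compatibility layer is confined to
the two import lines at the top.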

I have already added a tox environment to run pylint2 with the --py3k
argument per updated subpackage, which should help avoid regression between
step 2 and step 3 of the proposal. This update will be pushed with the
first pull request.
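
As a rough sketch of the per-subpackage check (this is not the actual tox
configuration from the pull request), the lint command could be assembled
along these lines. The py3k_lint_command helper, the apache_beam root path and
the plain pylint executable name are assumptions for illustration; only the
--py3k argument comes from the description above.

```python
def py3k_lint_command(ported_subpackages, root="apache_beam"):
    """Build a pylint --py3k command line covering only ported subpackages.

    Hypothetical helper: the root path and executable name are assumptions.
    """
    paths = ["{}/{}".format(root, name) for name in sorted(ported_subpackages)]
    return ["pylint", "--py3k"] + paths


# Only subpackages that already went through step 1 and step 2 are listed.
print(" ".join(py3k_lint_command(["coders"])))
```

Keeping an explicit list of ported subpackages means the check can be strict
for those paths without failing on code that has not been converted yet.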

Kind regards,
Robbe


On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw  wrote:

> Thank you, Robbe, for your offer to help with contribution here. I read
> over your doc and the one thing I'd like to add is that this work is very
> parallelizable, but if we have enough people looking at it we'll want some
> way to coordinate so as to not overlap work (or just waste time discovering
> what's been done). Tracking individual JIRAs and PRs gets unwieldy, perhaps
> a spreadsheet with modules/packages on one axis and the various
> automated/manual conversions along the other would be helpful?
>
> A note on automated tools, they're sometimes overly conservative, so we
> should be sure to review the changes manually. (A typical example of this
> is unnecessarily importing six.moves.xrange when there was no big reason to
> use xrange over range in Python 2, or conversely using list(range(...)) in
> Python 3.)
>
> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
> identify it and decide that before widespread announcement.
>
> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay  wrote:
>
>>
>>
>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau 
>> wrote:
>>
>>>
>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders 
>>> wrote:
>>>
 Hi Anand,

 Thanks for the feedback.

 It should be no problem to run everything on DataflowRunner as well.
 Are there any performance tests in place to check for performance
 regressions?

>>>
>> Yes there is a suite (
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>> It may not be very comprehensive and seems to be failing for a while. I
>> would not block python 3 work on performance for now. That is the
>> unfortunate state of things.
>>
>> If anybody in the community is interested, this would be a great
>> opportunity to help with benchmarks in general.
>>
>>
>>>
 Some questions were raised in the proposal document which I want to add
 to this conversation:

 The first comment was about the targeted python 3 versions. We proposed
 to target 3.6 since it is the latest version available and added 3.5
 because 3.6 adoption seems rather low (hard to find any relevant sources on
 this though).
 If the beam community prefers 3.4, I would propose to target 3.4 only
 during porting and add 3.5 and 3.6 later so we don't slow down the porting
 progress. 3.4 has the advantage of already being installed on the workers
 and allows pySpark pipelines to be moved over to beam more easily.
 It would be great to get some opinions on this.

>>>
>> My preference is to support 3.4+. I searched a bit on the web to
>> understand the usage statistics for python 3, it seems like python 3.4 has
>> ~20% usage and python 3.4+ has 99% (
>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>> Based on that, I think it makes sense to support it.
>>
>>
>>
>>>
 Another comment was made on how to avoid regression during the porting
 progress.
 After applying step 1 and step 2, no python 3 compatibility lint
 warnings should remain, so it would be great if we could enforce this check
 for every pull request on an already updated subpackage.
 After applying step 3, all tests should run on python 3, so again it
 would be great if we can enforce these per updated subpackage.
 Any insights on how to best accomplish this?

>>> So you can look at some of the recent changes to tox.ini in the git log
>>> to see what we’ve done so far around this; I suspect you can repeat that
>>> same pattern.
>>>
>>
>> +1 updating tox.ini and 

Re: [PROPOSAL] Python 3 support

2018-03-29 Thread Robert Bradshaw
Thank you, Robbe, for your offer to help with contribution here. I read
over your doc and the one thing I'd like to add is that this work is very
parallelizable, but if we have enough people looking at it we'll want some
way to coordinate so as to not overlap work (or just waste time discovering
what's been done). Tracking individual JIRAs and PRs gets unwieldy; perhaps
a spreadsheet with modules/packages on one axis and the various
automated/manual conversions along the other would be helpful?

A note on automated tools: they're sometimes overly conservative, so we
should be sure to review the changes manually. (A typical example of this
is unnecessarily importing six.moves.xrange when there was no big reason to
use xrange over range in Python 2, or conversely using list(range(...)) in
Python 3.)
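
To make that concrete, here is a small sketch (the function names are made up
for illustration) of what to look for when reviewing automated conversions.
Plain range is usually enough for a simple loop, and list(range(...)) is only
needed when an actual list is required.

```python
def squares(n):
    """Iterate with plain range; no six.moves.xrange needed for small loops."""
    return [i * i for i in range(n)]


def indices_without_first(n):
    """list(range(...)) is justified here: the result is mutated as a list."""
    indices = list(range(n))
    if indices:
        indices.pop(0)
    return indices


# Wrapping in list() only to iterate would be redundant:
#     for i in list(range(n)): ...
print(squares(4), indices_without_first(3))
```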

Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
identify it and decide that before widespread announcement.

On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay  wrote:

>
>
> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau 
> wrote:
>
>>
>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders 
>> wrote:
>>
>>> Hi Anand,
>>>
>>> Thanks for the feedback.
>>>
>>> It should be no problem to run everything on DataflowRunner as well.
>>> Are there any performance tests in place to check for performance
>>> regressions?
>>>
>>
> Yes there is a suite (
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
> It may not be very comprehensive and seems to be failing for a while. I
> would not block python 3 work on performance for now. That is the
> unfortunate state of things.
>
> If anybody in the community is interested, this would be a great
> opportunity to help with benchmarks in general.
>
>
>>
>>> Some questions were raised in the proposal document which I want to add
>>> to this conversation:
>>>
>>> The first comment was about the targeted python 3 versions. We proposed
>>> to target 3.6 since it is the latest version available and added 3.5
>>> because 3.6 adoption seems rather low (hard to find any relevant sources on
>>> this though).
>>> If the beam community prefers 3.4, I would propose to target 3.4 only
>>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>>> progress. 3.4 has the advantage of already being installed on the workers
>>> and allows pySpark pipelines to be moved over to beam more easily.
>>> It would be great to get some opinions on this.
>>>
>>
> My preference is to support 3.4+. I searched a bit on the web to
> understand the usage statistics for python 3, it seems like python 3.4 has
> ~20% usage and python 3.4+ has 99% (
> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
> Based on that, I think it makes sense to support it.
>
>
>
>>
>>> Another comment was made on how to avoid regression during the porting
>>> progress.
>>> After applying step 1 and step 2, no python 3 compatibility lint
>>> warnings should remain, so it would be great if we could enforce this check
>>> for every pull request on an already updated subpackage.
>>> After applying step 3, all tests should run on python 3, so again it
>>> would be great if we can enforce these per updated subpackage.
>>> Any insights on how to best accomplish this?
>>>
>> So you can look at some of the recent changes to tox.ini in the git log
>> to see what we’ve done so far around this; I suspect you can repeat that
>> same pattern.
>>
>
> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh would
> help a lot to prevent regressions.
>
>
>
>>
>>> Thanks,
>>> Robbe
>>>
>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay  wrote:
>>>
 Thank you Robbe.

 I reviewed the document and it looks reasonable to me. I will touch on some
 points that were not mentioned:
 - Runners exercise different code paths. Doing auto conversions and
 focusing on DirectRunner is not enough. It is worthwhile to run things on
 DataflowRunner as well. This can be triggered from Jenkins. It will
 validate that we are still compatible for python 2.
 - Similar to above but with an eye on perf regressions.

 For project tracking on JIRA, please feel free to create any new
 issues, close stale ones, or take ownership of any open issues. All JIRAs
 should be assigned to the people actively working on them. If you want to
 track it in a separate way, you can also propose that. (For example a
 kanban board is used for portability effort which is fully supported in
 JIRA.)

 I will also call out to a few other people in addition to Holden who
 helped out or showed interest in helping with Python 3. @cclaus, @luke-zhu,
 @udim, @robertwb, @charlesccychen, @tvalentyn. You can include these
 people (and myself) 

Re: [PROPOSAL] Python 3 support

2018-03-27 Thread Ahmet Altay
On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau  wrote:

>
> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders 
> wrote:
>
>> Hi Anand,
>>
>> Thanks for the feedback.
>>
>> It should be no problem to run everything on DataflowRunner as well.
>> Are there any performance tests in place to check for performance
>> regressions?
>>
>
Yes there is a suite (
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
It may not be very comprehensive and seems to have been failing for a while. I
would not block python 3 work on performance for now. That is the
unfortunate state of things.

If anybody in the community is interested, this would be a great
opportunity to help with benchmarks in general.


>
>> Some questions were raised in the proposal document which I want to add
>> to this conversation:
>>
>> The first comment was about the targeted python 3 versions. We proposed
>> to target 3.6 since it is the latest version available and added 3.5
>> because 3.6 adoption seems rather low (hard to find any relevant sources on
>> this though).
>> If the beam community prefers 3.4, I would propose to target 3.4 only
>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>> progress. 3.4 has the advantage of already being installed on the workers
>> and allows pySpark pipelines to be moved over to beam more easily.
>> It would be great to get some opinions on this.
>>
>
My preference is to support 3.4+. I searched a bit on the web to understand
the usage statistics for python 3; it seems like python 3.4 has ~20% usage
and python 3.4+ has 99% (
https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
Based on that, I think it makes sense to support it.



>
>> Another comment was made on how to avoid regression during the porting
>> progress.
>> After applying step 1 and step 2, no python 3 compatibility lint warnings
>> should remain, so it would be great if we could enforce this check for
>> every pull request on an already updated subpackage.
>> After applying step 3, all tests should run on python 3, so again it
>> would be great if we can enforce these per updated subpackage.
>> Any insights on how to best accomplish this?
>>
> So you can look at some of the recent changes to tox.ini in the git log to
> see what we’ve done so far around this; I suspect you can repeat that same
> pattern.
>

+1 updating tox.ini and adding new checks to run_mini_py3lint.sh would help
a lot to prevent regressions.



>
>> Thanks,
>> Robbe
>>
>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay  wrote:
>>
>>> Thank you Robbe.
>>>
>>> I reviewed the document and it looks reasonable to me. I will touch on some
>>> points that were not mentioned:
>>> - Runners exercise different code paths. Doing auto conversions and
>>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>>> DataflowRunner as well. This can be triggered from Jenkins. It will
>>> validate that we are still compatible for python 2.
>>> - Similar to above but with an eye on perf regressions.
>>>
>>> For project tracking on JIRA, please feel free to create any new issues,
>>> close stale ones, or take ownership of any open issues. All JIRAs should be
>>> assigned to the people actively working on them. If you want to track it in
>>> a separate way, you can also propose that. (For example a kanban board is
>>> used for portability effort which is fully supported in JIRA.)
>>>
>>> I will also call out to a few other people in addition to Holden who
>>> helped out or showed interest in helping with Python 3. @cclaus, @luke-zhu,
>>> @udim, @robertwb, @charlesccychen, @tvalentyn. You can include these
>>> people (and myself) for reviews and other questions that you have.
>>>
>>> Welcome again, and looking forward to your contributions.
>>>
>>> Thank you,
>>> Ahmet
>>>
>>>
>>>
>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders 
>>> wrote:
>>>
 Hello everyone,

 In the next month(s), my colleague Matthias and I will commit a lot of
 time and effort to python 3 support for beam and we would like to discuss
 the best way to go forward with this.

 We have drawn up a document [1] with a high level outline of the
 proposed approach and would like to get your feedback on this.

 The main Jira issue [2] for python 3 support has been mostly inactive
 for the past year. Other smaller issues have been opened, but it's hard to
 track the general progress. It would be great if anyone could offer some
 insights on how to best handle this project on Jira.

 @Holden Karau, you seem to have already put in a lot of effort to add
 python 3 support, so it would be great to get your insights and find a way
 to merge our efforts.

 Kind regards,
 Robbe

 [1] https://docs.google.com/document/d/1xDG0MWVlDKDPu_
 

Re: [PROPOSAL] Python 3 support

2018-03-27 Thread Holden Karau
On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders 
wrote:

> Hi Anand,
>
> Thanks for the feedback.
>
> It should be no problem to run everything on DataflowRunner as well.
> Are there any performance tests in place to check for performance
> regressions?
>
> Some questions were raised in the proposal document which I want to add to
> this conversation:
>
> The first comment was about the targeted python 3 versions. We proposed to
> target 3.6 since it is the latest version available and added 3.5 because
> 3.6 adoption seems rather low (hard to find any relevant sources on this
> though).
> If the beam community prefers 3.4, I would propose to target 3.4 only
> during porting and add 3.5 and 3.6 later so we don't slow down the porting
> progress. 3.4 has the advantage of already being installed on the workers
> and allows pySpark pipelines to be moved over to beam more easily.
> It would be great to get some opinions on this.
>
> Another comment was made on how to avoid regression during the porting
> progress.
> After applying step 1 and step 2, no python 3 compatibility lint warnings
> should remain, so it would be great if we could enforce this check for
> every pull request on an already updated subpackage.
> After applying step 3, all tests should run on python 3, so again it would
> be great if we can enforce these per updated subpackage.
> Any insights on how to best accomplish this?
>
So you can look at some of the recent changes to tox.ini in the git log to
see what we’ve done so far around this; I suspect you can repeat that same
pattern.

>
> Thanks,
> Robbe
>
> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay  wrote:
>
>> Thank you Robbe.
>>
>> I reviewed the document and it looks reasonable to me. I will touch on some
>> points that were not mentioned:
>> - Runners exercise different code paths. Doing auto conversions and
>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>> DataflowRunner as well. This can be triggered from Jenkins. It will
>> validate that we are still compatible for python 2.
>> - Similar to above but with an eye on perf regressions.
>>
>> For project tracking on JIRA, please feel free to create any new issues,
>> close stale ones, or take ownership of any open issues. All JIRAs should be
>> assigned to the people actively working on them. If you want to track it in
>> a separate way, you can also propose that. (For example a kanban board is
>> used for portability effort which is fully supported in JIRA.)
>>
>> I will also call out to a few other people in addition to Holden who
>> helped out or showed interest in helping with Python 3. @cclaus, @luke-zhu,
>> @udim, @robertwb, @charlesccychen, @tvalentyn. You can include these
>> people (and myself) for reviews and other questions that you have.
>>
>> Welcome again, and looking forward to your contributions.
>>
>> Thank you,
>> Ahmet
>>
>>
>>
>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders 
>> wrote:
>>
>>> Hello everyone,
>>>
>>> In the next month(s), my colleague Matthias and I will commit a lot of
>>> time and effort to python 3 support for beam and we would like to discuss
>>> the best way to go forward with this.
>>>
>>> We have drawn up a document [1] with a high level outline of the
>>> proposed approach and would like to get your feedback on this.
>>>
>>> The main Jira issue [2] for python 3 support has been mostly inactive
>>> for the past year. Other smaller issues have been opened, but it's hard to
>>> track the general progress. It would be great if anyone could offer some
>>> insights on how to best handle this project on Jira.
>>>
>>> @Holden Karau, you seem to have already put in a lot of effort to add
>>> python 3 support, so it would be great to get your insights and find a way
>>> to merge our efforts.
>>>
>>> Kind regards,
>>> Robbe
>>>
>>> [1]
>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>
>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>> --
>>>
>>> [image: https://ml6.eu] 
>>>
>>> * Robbe Sneyders*
>>>
>>> ML6 Gent
>>> 
>>>
>>> M: +32 474 71 31 08
>>>
>>
>> --
>
> [image: https://ml6.eu] 
>
> * Robbe Sneyders*
>
> ML6 Gent
> 
>
> M: +32 474 71 31 08
>
-- 
Twitter: https://twitter.com/holdenkarau


Re: [PROPOSAL] Python 3 support

2018-03-27 Thread Robbe Sneyders
Hi Anand,

Thanks for the feedback.

It should be no problem to run everything on DataflowRunner as well.
Are there any performance tests in place to check for performance
regressions?

Some questions were raised in the proposal document which I want to add to
this conversation:

The first comment was about the targeted python 3 versions. We proposed to
target 3.6 since it is the latest version available and added 3.5 because
3.6 adoption seems rather low (hard to find any relevant sources on this
though).
If the beam community prefers 3.4, I would propose to target 3.4 only
during porting and add 3.5 and 3.6 later so we don't slow down the porting
progress. 3.4 has the advantage of already being installed on the workers
and allows pySpark pipelines to be moved over to beam more easily.
It would be great to get some opinions on this.

Another comment was made on how to avoid regressions during the porting
process.
After applying step 1 and step 2, no python 3 compatibility lint warnings
should remain, so it would be great if we could enforce this check for
every pull request on an already updated subpackage.
After applying step 3, all tests should run on python 3, so again it would
be great if we can enforce these per updated subpackage.
Any insights on how to best accomplish this?

Thanks,
Robbe

On Fri, 23 Mar 2018 at 19:59 Ahmet Altay  wrote:

> Thank you Robbe.
>
> I reviewed the document and it looks reasonable to me. I will touch on some
> points that were not mentioned:
> - Runners exercise different code paths. Doing auto conversions and
> focusing on DirectRunner is not enough. It is worthwhile to run things on
> DataflowRunner as well. This can be triggered from Jenkins. It will
> validate that we are still compatible for python 2.
> - Similar to above but with an eye on perf regressions.
>
> For project tracking on JIRA, please feel free to create any new issues,
> close stale ones, or take ownership of any open issues. All JIRAs should be
> assigned to the people actively working on them. If you want to track it in
> a separate way, you can also propose that. (For example a kanban board is
> used for portability effort which is fully supported in JIRA.)
>
> I will also call out to a few other people in addition to Holden who
> helped out or showed interest in helping with Python 3. @cclaus, @luke-zhu,
> @udim, @robertwb, @charlesccychen, @tvalentyn. You can include these
> people (and myself) for reviews and other questions that you have.
>
> Welcome again, and looking forward to your contributions.
>
> Thank you,
> Ahmet
>
>
>
> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders 
> wrote:
>
>> Hello everyone,
>>
>> In the next month(s), my colleague Matthias and I will commit a lot of
>> time and effort to python 3 support for beam and we would like to discuss
>> the best way to go forward with this.
>>
>> We have drawn up a document [1] with a high level outline of the proposed
>> approach and would like to get your feedback on this.
>>
>> The main Jira issue [2] for python 3 support has been mostly inactive for
>> the past year. Other smaller issues have been opened, but it's hard to
>> track the general progress. It would be great if anyone could offer some
>> insights on how to best handle this project on Jira.
>>
>> @Holden Karau, you seem to have already put in a lot of effort to add
>> python 3 support, so it would be great to get your insights and find a way
>> to merge our efforts.
>>
>> Kind regards,
>> Robbe
>>
>> [1]
>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>
>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>> --
>>
>> [image: https://ml6.eu] 
>>
>> * Robbe Sneyders*
>>
>> ML6 Gent
>> 
>>
>> M: +32 474 71 31 08
>>
>
> --

[image: https://ml6.eu] 

* Robbe Sneyders*

ML6 Gent


M: +32 474 71 31 08


Re: [PROPOSAL] Python 3 support

2018-03-23 Thread Ahmet Altay
Thank you Robbe.

I reviewed the document and it looks reasonable to me. I will touch on some
points that were not mentioned:
- Runners exercise different code paths. Doing auto conversions and focusing
on DirectRunner is not enough. It is worthwhile to run things on
DataflowRunner as well. This can be triggered from Jenkins. It will
validate that we are still compatible with python 2.
- Similar to above, but with an eye on perf regressions.

For project tracking on JIRA, please feel free to create any new issues,
close stale ones, or take ownership of any open issues. All JIRAs should be
assigned to the people actively working on them. If you want to track it in
a separate way, you can also propose that. (For example, a kanban board,
which is fully supported in JIRA, is used for the portability effort.)

I will also call out to a few other people in addition to Holden who helped
out or showed interest in helping with Python 3. @cclaus, @luke-zhu, @udim,
@robertwb, @charlesccychen, @tvalentyn. You can include these people (and
myself) for reviews and other questions that you have.

Welcome again, and looking forward to your contributions.

Thank you,
Ahmet



On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders 
wrote:

> Hello everyone,
>
> In the next month(s), my colleague Matthias and I will commit a lot of
> time and effort to python 3 support for beam and we would like to discuss
> the best way to go forward with this.
>
> We have drawn up a document [1] with a high level outline of the proposed
> approach and would like to get your feedback on this.
>
> The main Jira issue [2] for python 3 support has been mostly inactive for
> the past year. Other smaller issues have been opened, but it's hard to
> track the general progress. It would be great if anyone could offer some
> insights on how to best handle this project on Jira.
>
> @Holden Karau, you seem to have already put in a lot of effort to add
> python 3 support, so it would be great to get your insights and find a way
> to merge our efforts.
>
> Kind regards,
> Robbe
>
> [1] https://docs.google.com/document/d/1xDG0MWVlDKDPu_
> IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
> [2] https://issues.apache.org/jira/browse/BEAM-1251
> --
>
> [image: https://ml6.eu] 
>
> * Robbe Sneyders*
>
> ML6 Gent
> 
>
> M: +32 474 71 31 08
>