Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Robbe Sneyders
Congrats Valentyn!

[image: https://ml6.eu] <https://ml6.eu/>

* Robbe Sneyders*

ML6 Gent
<https://www.google.be/maps/place/ML6/@51.037408,3.7044893,17z/data=!3m1!4b1!4m5!3m4!1s0x47c37161feeca14b:0xb8f72585fdd21c90!8m2!3d51.037408!4d3.706678?hl=nl>

M: +32 474 71 31 08


On Tue, 27 Aug 2019 at 09:26, Gleb Kanterov  wrote:

> Congratulations Valentyn!
>
> On Tue, Aug 27, 2019 at 7:22 AM jincheng sun 
> wrote:
>
>> Congrats Valentyn!
>>
>> Best,
>> Jincheng
>>
>> On Tue, Aug 27, 2019 at 10:37 AM Ankur Goenka wrote:
>>
>>> Congratulations Valentyn!
>>>
>>> On Mon, Aug 26, 2019, 5:02 PM Yifan Zou  wrote:
>>>
>>>> Congratulations, Valentyn! Well deserved!
>>>>
>>>> On Mon, Aug 26, 2019 at 3:31 PM Aizhamal Nurmamat kyzy <
>>>> aizha...@google.com> wrote:
>>>>
>>>>> Congratulations! and thank you for your contributions, Valentyn!
>>>>>
>>>>> On Mon, Aug 26, 2019 at 3:26 PM Thomas Weise  wrote:
>>>>>
>>>>>> Congrats!
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 26, 2019 at 3:22 PM Heejong Lee 
>>>>>> wrote:
>>>>>>
>>>>>>> Congratulations! :)
>>>>>>>
>>>>>>> On Mon, Aug 26, 2019 at 2:44 PM Rui Wang  wrote:
>>>>>>>
>>>>>>>> Congratulations!
>>>>>>>>
>>>>>>>>
>>>>>>>> -Rui
>>>>>>>>
>>>>>>>> On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang <
>>>>>>>> hannahji...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Congratulations Valentyn, well deserved!
>>>>>>>>>
>>>>>>>>> On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath <
>>>>>>>>> chamik...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Congrats Valentyn!
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Valentyn!
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu 
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you Valentyn! Congratulations!
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw <
>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>>>>>>>>>>> committer: Valentyn Tymofieiev
>>>>>>>>>>>>>
>>>>>>>>>>>>> Valentyn has made numerous contributions to Beam over the last
>>>>>>>>>>>>> several
>>>>>>>>>>>>> years (including 100+ pull requests), most recently pushing
>>>>>>>>>>>>> through
>>>>>>>>>>>>> the effort to make Beam compatible with Python 3. He is also
>>>>>>>>>>>>> an active
>>>>>>>>>>>>> participant in design discussions on the list, participates in
>>>>>>>>>>>>> release
>>>>>>>>>>>>> candidate validation, and proactively helps keep our tests
>>>>>>>>>>>>> green.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In consideration of Valentyn's contributions, the Beam PMC
>>>>>>>>>>>>> trusts him
>>>>>>>>>>>>> with the responsibilities of a Beam committer [1].
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you, Valentyn, for your contributions and looking
>>>>>>>>>>>>> forward to many more!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Robert, on behalf of the Apache Beam PMC
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>>>>>>>>>
>>>>>>>>>>>>
>
> --
> Cheers,
> Gleb
>


Re: Deprecating Avro for fastavro on Python 3

2019-04-02 Thread Robbe Sneyders
Hi all,

Thank you for the feedback. Looking at the responses, it seems like there
is a consensus to move forward with fastavro as the default implementation
on Python 3.

There are two questions left, however:
- Should fastavro also become the default implementation on Python 2?
This is a trade-off between having a consistent API across Python versions,
or keeping the current behavior on Python 2.

- Should we keep the avro-python3 dependency?
With the proposed solution, we could remove the avro-python3 dependency,
but it might have to be re-added if we want to support Avro again on Python
3 in a future version.
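
To make the dependency question a bit more concrete, here is a minimal sketch of how an optional Avro dependency could be handled. It is only an illustration: `pick_avro_backend` and `module_available` are hypothetical helpers, not part of Beam, and the preference order (fastavro first, avro as a fallback) is an assumption based on the consensus above.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def pick_avro_backend(is_available=module_available):
    """Hypothetical helper (not a Beam API): prefer fastavro, fall back to avro."""
    for backend in ("fastavro", "avro"):
        if is_available(backend):
            return backend
    raise ImportError("neither fastavro nor avro is installed")
```

With a check like this, the avro-python3 dependency could be dropped from the requirements and still be picked up when a user has it installed.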

Kind regards,
Robbe


On Thu, 28 Mar 2019 at 18:28, Ahmet Altay  wrote:

> Hi Ismaël,
>
> It is great to hear that Avro is planning to make a release soon.
>
> To answer your concerns, fastavro has a set of tests using regular avro
> files[1] and it also has a large set of users (with 675470 package
> downloads). This is in addition to it being a py2 & py3 compatible package
> and offering ~7x performance improvements [2]. Another data point, we were
> testing fastavro for a while behind an experimental flag and have not seen
> issues related to compatibility.
>
> pyavro-rs sounds promising; however, I could not find a released version of
> it on PyPI. The source code does not appear to be maintained either: the
> last commit was on Jul 2, 2018 (for comparison, the last change to fastavro
> was on Mar 19, 2019).
>
> I think given the state of things, it makes sense to switch to fastavro as
> the default implementation to unblock python 3 changes. When avro offers a
> similar level of performance we could switch back without any visible user
> impact.
>
> Ahmet
>
> [1] https://github.com/fastavro/fastavro/tree/master/tests
> [2] https://pypi.org/project/fastavro/
>
> On Thu, Mar 28, 2019 at 7:53 AM Ismaël Mejía  wrote:
>
>> Hello,
>>
>> The problem of switching implementations is the risk of losing
>> interoperability, and this is more important than performance. Does
>> fastavro have tests that guarantee that it is fully compatible with
>> Avro’s Java version? (given that it is the de-facto implementation
>> used everywhere).
>>
>> If performance is the more important criterion, it may be worth looking
>> at pyavro-rs [1]; its performance is covered in a great talk from last
>> year [2].
>>
>> I have been involved actively in the Avro community in the last months
>> and I am now a committer there. Also Dan Kulp who has done multiple
>> contributions in Beam is now a PMC member too. We are at this point
>> working hard to get the next release of Avro out, actually the branch
>> cut of Avro 1.9.0 is happening this week, and we plan to improve the
>> release cadence. Please understand that the issue with Avro is that it
>> is a really specific and 'old' project (~10 years), so some of its active
>> contributors moved to other areas because it is stable, but we are still
>> there working on it and we are eager to improve it for everyone's
>> needs (and of course Beam's needs).
>>
>> I know that Python 3’s Avro implementation is still lacking and could
>> be improved (views expressed here are clearly valid), but maybe this
>> is a chance to contribute there too. Remember Apache projects are a
>> family and we have a history of cross-collaboration with other
>> communities, e.g. Flink and Calcite, so why not give Avro a chance
>> too.
>>
>> Regards,
>> Ismaël
>>
>> [1] https://github.com/flavray/pyavro-rs
>> [2]
>> https://ep2018.europython.eu/media/conference/slides/how-to-write-rust-instead-of-c-and-get-away-with-it-yes-its-a-python-talk.pdf
>>
>> On Wed, Mar 27, 2019 at 11:42 PM Chamikara Jayalath
>>  wrote:
>> >
>> > +1 for making use_fastavro the default for Python3. I don't see any
>> significant drawbacks in doing this from Beam's point of view. One concern
>> is whether avro and fastavro can safely co-exist in the same environment so
>> that Beam continues to work for users who already have avro library
>> installed.
>> >
>> > Note that there are two use_fastavro flags (confusingly enough).
>> > (1) for avro file source [1]
>> > (2) an experiment flag [2] with the same name that makes Dataflow
>> runner use fastavro library for reading/writing intermediate files and for
>> reading Avro files exported by BigQuery.
>> >
>>

Deprecating Avro for fastavro on Python 3

2019-03-27 Thread Robbe Sneyders
Hi all,

We're looking at fixing avroio on Python 3, which still fails due to a
non-picklable schema class in Avro [1]. This is fixed when using the latest
Avro master, but the last release dates back to May 2017.
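
For readers unfamiliar with this class of failure, here is a small standalone sketch of it. It does not use Avro or Beam at all: `parse_schema` is a hypothetical parser whose result contains a closure, which is just one way an object can become non-picklable. The workaround shown (ship the schema as a JSON string and re-parse it on the other side) is an assumption for illustration, not necessarily what the Avro fix or the Beam PR does.

```python
import json
import pickle

SCHEMA_JSON = json.dumps({
    "type": "record",
    "name": "User",
    "fields": [{"name": "name", "type": "string"}],
})


def parse_schema(schema_json):
    """Hypothetical parser whose result holds a closure, so pickle rejects it."""
    parsed = json.loads(schema_json)
    field_names = {field["name"] for field in parsed["fields"]}
    parsed["validate"] = lambda record: set(record) == field_names
    return parsed


def can_pickle(obj):
    """Report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def ship_schema(schema_json):
    """Pickle only the JSON string, then re-parse it after unpickling."""
    payload = pickle.dumps(schema_json)
    return parse_schema(pickle.loads(payload))
```

Here `can_pickle(parse_schema(SCHEMA_JSON))` is False, while the JSON string itself pickles fine and can be parsed again after the round trip.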

Fastavro does not have the same problem, but is currently also failing due
to a dependency of avroio on Avro for schema parsing.

We would therefore propose to (temporarily?) deprecate Avro on Python 3,
and implement a pure fastavro solution instead. +Frederik Bode already
submitted a PR for this [2].

Use of fastavro is currently activated with the `use_fastavro` flag, which
defaults to False. Since this flag would not make sense anymore on Python
3, we would like to switch the default value to True. The documentation
already mentions that this will probably become the default in the long
term, but this change would also impact Python 2. Is this a problem?
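
As a rough sketch of the options for the default, assuming a hypothetical helper `resolve_use_fastavro` (not a Beam function) and a hypothetical `python2_default` knob: on Python 3 only fastavro would be meaningful, while on Python 2 the default could go either way. Raising an error for `use_fastavro=False` on Python 3 is my own assumption, not something the proposal has decided.

```python
import sys


def resolve_use_fastavro(use_fastavro=None, python_major=None, python2_default=True):
    """Hypothetical helper: work out the effective value of a use_fastavro flag.

    use_fastavro=None means the caller did not set the flag explicitly.
    """
    if python_major is None:
        python_major = sys.version_info[0]
    if python_major >= 3:
        # With Avro deprecated on Python 3, only fastavro makes sense there.
        if use_fastavro is False:
            raise ValueError("use_fastavro=False is not supported on Python 3")
        return True
    # Python 2: honor an explicit choice, otherwise use the configurable default.
    return python2_default if use_fastavro is None else use_fastavro
```

Setting `python2_default=False` corresponds to keeping today's behavior on Python 2; leaving it True gives the same default on both versions.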

Also, looking at the performance gain of fastavro, is there any reason to
not deprecate Avro in favor of fastavro on Python 3 indefinitely?

[1] https://issues.apache.org/jira/browse/BEAM-6522#comment-16784499
[2] https://github.com/apache/beam/pull/8130

Kind regards,
Robbe


Re: Python precommit duration is above 1hr

2019-03-10 Thread Robbe Sneyders
Yes, this is largely due to the addition of Python 3 test suites.

Running tests in parallel is actively being investigated by +Mark Liu in
this Jira ticket [1] and this PR [2]. Until then, we will add the other
Python 3.6 and 3.7 test suites to postcommit only.
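
To illustrate the general idea of running suites in parallel with a timeout per target, here is a minimal standard-library sketch. It is not how the Jira ticket or the PR implements it, and it does not use tox, detox or retox; `run_suite`, `run_suites_in_parallel` and the placeholder commands in `DEMO_SUITES` are all made up for this example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_suite(name, command, timeout_seconds):
    """Run one suite and return (name, status): 'passed', 'failed' or 'timeout'."""
    try:
        completed = subprocess.run(command, capture_output=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return name, "timeout"
    return name, "passed" if completed.returncode == 0 else "failed"


def run_suites_in_parallel(suites, timeout_seconds=60):
    """Run every suite concurrently; suites maps a name to a command line."""
    with ThreadPoolExecutor(max_workers=len(suites) or 1) as pool:
        futures = [
            pool.submit(run_suite, name, command, timeout_seconds)
            for name, command in suites.items()
        ]
        return dict(future.result() for future in futures)


# Placeholder commands standing in for real test targets.
DEMO_SUITES = {
    "py2": [sys.executable, "-c", "print('ok')"],
    "py3": [sys.executable, "-c", "raise SystemExit(1)"],
}
```

Because each target has its own timeout, a hanging suite is reported as `timeout` instead of silently stalling the whole run.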

[1] https://issues.apache.org/jira/browse/BEAM-6527
[2] https://github.com/apache/beam/pull/7675

Kind regards,
Robbe


On Sat, 9 Mar 2019 at 20:22, Robert Bradshaw  wrote:

> Perhaps this is the duplication of all (or at least most) previously
> existing tests for running under Python 3. I agree that this is excessive;
> we should probably split out Py2, Py3, and the linters into separate
> targets.
>
> We could look into using detox or retox to parallelize the testing as
> well. (The issue last time was suppression of output on timeout, but that
> can be worked around by adding timeouts to the individual tox targets.)
>
> On Fri, Mar 8, 2019 at 11:26 PM Mikhail Gryzykhin 
> wrote:
>
>> Hi everyone,
>>
>> It seems that our Python pre-commit duration has been growing really fast
>> <http://104.154.241.245/d/_TNndF2iz/pre-commit-test-latency?orgId=1=now-6M=now>
>> .
>>
>> Has anyone been following the trend, or does anyone know what the biggest
>> changes to Python have been lately?
>>
>> I don't see a single jump, but the duration of the pre-commits has almost
>> doubled since the new year.
>>
>> Regards,
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>


Re: Python 3: final step

2019-01-09 Thread Robbe Sneyders
Hi all,

We've made quite some progress over the last few weeks. I'll give a short
status update on where we are right now.

Our current goal is to make all unit tests succeed in Python 3. We are
currently at:
1842 tests: (SKIP=350, errors=100, failures=9)
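
For anyone who wants to track these numbers over time, a small sketch for turning such a summary line into counts follows. It assumes the total of 1842 includes the skipped tests (as in nose-style summaries); `summarize` is a hypothetical helper, not part of the test tooling.

```python
import re


def summarize(line):
    """Parse a summary such as '1842 tests: (SKIP=350, errors=100, failures=9)'."""
    total = int(re.match(r"\s*(\d+) tests", line).group(1))
    counts = {key.lower(): int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}
    # Assumption: skipped tests are counted in the total but never executed.
    executed = total - counts.get("skip", 0)
    passed = executed - counts.get("errors", 0) - counts.get("failures", 0)
    return {"total": total, "executed": executed, "passed": passed}
```

Under that assumption, the line above corresponds to 1492 executed tests, of which 1383 pass.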

All of these remaining errors and failures are in the io and examples
packages, except for some non-blocking failing typehints tests. All other
packages have been ported. We're currently working on porting the io
package, and have finished all core modules (iobase, filesystemio,
filebasedsource/sink, textio, ...). This allowed us to fix the wordcount
end to end examples, which now run successfully on Python 3! [1]

Next up are all other io sources and sinks. Most of the skipped tests are
due to missing GCP components in the Python 3 test suite. We will add a
separate GCP test suite for Python 3 when we start porting the first GCP io
module, which should be very soon.

Anyone who wants to help out can start by porting one of these
sources/sinks.

Kind regards,
Robbe

[1] https://github.com/apache/beam/pull/7447


On Tue, 8 Jan 2019 at 04:58, Ahmet Altay  wrote:

> +Matthias Feys, +Valentyn Tymofieiev, and +Mark Liu could add more
> details here since they have been working on Python 3 for a while now.
>
> The hope is that we might have Python 3 working with DirectRunner in
> the release after this one (2.12). Mark is also working on getting Python 3
> working on a cluster. He has been able to run WordCount on the Dataflow
> service with some hacks, but it was not yet ready to run out of the box. I would
> like to note that we are targeting python 3 support only for portable
> runners so running on Dataflow and Flink should happen at the same time.
>
> Matthias and Valentyn are still working on converting the SDK to be Python
> 3 compatible. They are now mostly dealing with the harder-to-convert parts of
> the SDK (e.g. parts where Python 2/3 differences result in performance
> regressions or subtle changes in behavior).
>
> To the folks working on this, it would be really helpful if you could
> update BEAM-1251 regularly. We have shared this issue with many people and
> not all of them will read this thread.
>
> Ahmet
>
> On Mon, Jan 7, 2019 at 8:15 AM Maximilian Michels  wrote:
>
>> Also curious because I see Python 3 requests quite often. I always say,
>> we're
>> close, but how close are we? :)
>>
>> Thanks,
>> Max
>>
>> On 05.01.19 00:03, Manu Zhang wrote:
>> > Guys,
>> >
>> > Happy New Year !!!
>> > I haven't got much time to contribute to Python 3 support. What is the
>> progress
>> > now ? It seems there are quite a few open issues under
>> > https://issues.apache.org/jira/browse/BEAM-1251. People have kept
>> asking about
>> > Python 3 support in tf.transform
>> > (https://github.com/tensorflow/transform/issues/1) which is blocked by
>> BEAM-1251.
>> >
>> > Thanks,
>> > Manu Zhang
>> >
>> >
>> > On Fri, Oct 12, 2018 at 3:17 AM Valentyn Tymofieiev <
>> valen...@google.com
>> > <mailto:valen...@google.com>> wrote:
>> >
>> > I cc'ed a few folks who are familiar with Jenkins setup on
>> > https://issues.apache.org/jira/browse/BEAM-5663, I think we can
>> continue the
>> > discussion there or start a separate thread.
>> >
>> > On Wed, Oct 10, 2018 at 8:54 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>> >
>> > Does anyone know how to set up python version on Jenkins ? It’s
>> Python
>> > 3.5.2 now.
>> >
>> > Thanks,
>> > Manu Zhang
>> > On Oct 5, 2018, 9:24 AM +0800, Valentyn Tymofieiev <
>> valen...@google.com
>> > <mailto:valen...@google.com>>, wrote:
>> >> I have put together a guide [1] to help get started with
>> investigating
>> >> Python 3-related test failures that may be helpful for new
>> folks
>> >> joining the effort.
>> >>
>> >> Comments and improvements welcome!
>> >>
>> >> Thanks,
>> >> Valentyn
>> >> [1]
>> >>
>> https://docs.google.com/document/d/1s1BJVCY65LB_SYK1SU1u7NbZiFANoq-nEYaEvzRbYlA
>> >>
>> >>
>> >> On 

Python 3: final step

2018-09-05 Thread Robbe Sneyders
Hi everyone,

With the merging of [1], we now have Python 3 tests running on Jenkins,
which allows us to move forward with the last step of the Python 3 porting.

You can follow the progress on the Jira Kanban Board [2]. If you're
interested in helping by porting a module, you can assign one of the issues
to yourself and start coding. You can find the different steps outlined in
the design document [3].

We could also use some extra reviewers. If you're interested, let us know,
and we'll tag you in our PRs.

[1] https://github.com/apache/beam/pull/6266
[2] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245
[3] https://s.apache.org/beam-python-3

Kind regards,
Robbe

Re: Python 3 support in the Python SDK

2018-07-04 Thread Robbe Sneyders
Hi Sergei,

We're currently finishing the futurization step. We have some open PRs,
which I see you've started reviewing. Thanks for that!

As Charles said, we will start working on the Beam Python 3 tests when the
futurization step is finished completely. We're planning to start with the
coders package again, and define a strategy to apply to all packages. I
will tag you in the related PR for discussion, and then we can coordinate
the remaining work.

Kind regards,
Robbe

On Mon, 2 Jul 2018 at 22:57 Sergei Lebedev  wrote:

> Hi Charles,
>
> Thanks for the heads up. Looking at BEAM-2784, most of the sub-tickets are
> either DONE or IN PROGRESS, meaning that the futurization is almost
> finished, right? Should I wait a bit, and then help to port/debug the test
> code?
>
> Sergei
>
> On Mon, Jul 2, 2018 at 10:43 PM Charles Chen  wrote:
>
>> Hi Sergei,
>>
>> Matthias and Robbe are actively working on this support.  Their plan is
>> to futurize all relevant modules and then work on Beam Python 3 tests; this
>> is being tracked in https://issues.apache.org/jira/browse/BEAM-2784 and
>> I added https://issues.apache.org/jira/browse/BEAM-4715 as well.  We can
>> use https://issues.apache.org/jira/browse/BEAM-1251 to coordinate, as
>> you suggest.
>>
>> Best,
>> Charles
>>
>> On Mon, Jul 2, 2018 at 1:29 PM Sergei Lebedev 
>> wrote:
>>
>>> Hello,
>>>
>>> The Beam Python SDK does not currently support Python 3. This limits
>>> the use of Beam itself, as well as some other projects depending on it
>>> (e.g. TensorFlow Model Analysis [1]).
>>>
>>> There is an ongoing effort on making the SDK Python 3-compatible (see
>>> e.g. [2]). However, there is no up-to-date roadmap listing all the parts
>>> involved and the corresponding status. Therefore my question: what would be
>>> a good way to coordinate the work? Should I polish the umbrella ticket [3]
>>> and do status updates there?
>>>
>>> I'd be happy to discuss this further either on the #beam-python Slack
>>> channel, or directly on the mailing list.
>>>
>>> Regards,
>>> Sergei
>>>
>>> [1]: https://github.com/tensorflow/model-analysis/issues/8
>>> [2]:
>>> https://github.com/apache/beam/pulls?utf8=%E2%9C%93=is%3Aopen+is%3Apr+%22python+3%22
>>> [3]: https://issues.apache.org/jira/browse/BEAM-1251
>>>
>> --


Looking for contributors for Python 3 support

2018-05-11 Thread Robbe Sneyders
Hello everyone,

We have started adding Python 3 support to Beam. It took a while to get the
best approach sorted out, but the first PR [1] has been merged and we're
ready to start working on additional subpackages in parallel. We would like
to prevent regression as much as possible, so any help to speed up the
process is appreciated!

A Kanban board with the open issues can be found at [2]. If you're
interested in helping, you can select a subpackage to port and assign yourself
the corresponding issue (or comment on the issue if you cannot assign
yourself).

The approach used is documented in [3]. We are currently working on step 2.

When submitting a new PR, you can tag me (@RobbeSneyders), @aaltay,
and @tvalentyn.

Thanks!
Robbe

[1] https://github.com/apache/beam/pull/5053
[2]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail=BEAM-3058

[3]
https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE
<https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit>


Re: [PROPOSAL] Python 3 support

2018-04-18 Thread Robbe Sneyders
Thanks!

Can someone give me permission to assign issues to myself?
And edit rights to the Kanban board?

Robbe

On Tue, 17 Apr 2018 at 22:56 Ahmet Altay <al...@google.com> wrote:

> Kanban board for python 3:
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245
>
> (Thank you Davor!)
>
> Ahmet
>
> On Fri, Apr 6, 2018 at 6:32 PM, Reuven Lax <re...@google.com> wrote:
>
>> I had a similar problem.
>>
>> On Fri, Apr 6, 2018, 6:23 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> I tried to create a shared kanban board but I failed. I think I am
>>> lacking some permission to create a shared filter. Could someone help with
>>> creating this?
>>>
>>> The filter I planned to use was "project = BEAM AND (parent = BEAM-2784
>>> OR parent = BEAM-1251) ORDER BY Rank ASC"
>>>
>>> Ahmet
>>>
>>> On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders <robbe.sneyd...@ml6.eu>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I don't seem to have the permissions to create a Kanban board or even
>>>> assign tasks to myself. Who could help me with this?
>>>>
>>>> I've updated the coders package pull request [1] and added the applied
>>>> strategy to the proposal document [2].
>>>> It would be great to get some feedback on this, so we can start moving
>>>> forward with other subpackages.
>>>>
>>>> Kind regards,
>>>> Robbe
>>>>
>>>> [1] https://github.com/apache/beam/pull/4990
>>>> [2]
>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>
>>>>
>>>> On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders <robbe.sneyd...@ml6.eu>
>>>> wrote:
>>>>
>>>>> Hello Robert,
>>>>>
>>>>> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
>>>>> this. I'll look into setting one up tomorrow.
>>>>>
>>>>> In the meantime, you can find the first pull request with the updated
>>>>> coders package here:
>>>>> https://github.com/apache/beam/pull/4990
>>>>>
>>>>> Kind regards,
>>>>> Robbe
>>>>>
>>>>> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw <rober...@google.com>
>>>>> wrote:
>>>>>
>>>>>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <robbe.sneyd...@ml6.eu>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Ahmet and Robert,
>>>>>>>
>>>>>>> I think we can work on different subpackages in parallel, but it's
>>>>>>> important to apply the same strategy everywhere. I'm currently working
>>>>>>> on applying step 1 (was mostly done already) and 2 of the proposal to
>>>>>>> the coders subpackage to create a first pull request. We can then
>>>>>>> discuss the applied strategy in detail before merging and applying it
>>>>>>> to the other subpackages.
>>>>>>>
>>>>>>
>>>>>> Sounds good. Again, could you document (in a more permanent/easy to
>>>>>> look up state than email) when packages are started/done?
>>>>>>
>>>>>>
>>>>>>> This strategy also includes the choice of automated tools. I'm
>>>>>>> focusing on writing python 3 code with python 2 compatibility, which
>>>>>>> means depending on the future package instead of the six package
>>>>>>> (which is already used in some places in the current code base). I
>>>>>>> have already noticed that this indeed requires a lot of manual work
>>>>>>> after running the automated script.
>>>>>>> The future package supports python 3.3+ compatibility, so I don't
>>>>>>> think there is a higher cost supporting 3.4 compared to 3.5+.
>>>>>>>
>>>>>>
>>>>>> Sure. It may incur a higher maintenance burden long-term though.
>>>>>> (Basically, if we go out the door with 3.4 it's a promise to support
>>>>>> it for some time to come.)
>>>>>>
>>>>>>
>>>>>>> I ha

Re: [PROPOSAL] Python 3 support

2018-04-06 Thread Robbe Sneyders
Hi all,

I don't seem to have the permissions to create a Kanban board or even
assign tasks to myself. Who could help me with this?

I've updated the coders package pull request [1] and added the applied
strategy to the proposal document [2].
It would be great to get some feedback on this, so we can start moving
forward with other subpackages.

Kind regards,
Robbe

[1] https://github.com/apache/beam/pull/4990
[2]
https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing


On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders <robbe.sneyd...@ml6.eu> wrote:

> Hello Robert,
>
> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
> this. I'll look into setting one up tomorrow.
>
> In the meantime, you can find the first pull request with the updated
> coders package here:
> https://github.com/apache/beam/pull/4990
>
> Kind regards,
> Robbe
>
> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw <rober...@google.com> wrote:
>
>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <robbe.sneyd...@ml6.eu>
>> wrote:
>>
>>> Thanks Ahmet and Robert,
>>>
>>> I think we can work on different subpackages in parallel, but it's
>>> important to apply the same strategy everywhere. I'm currently working on
>>> applying step 1 (was mostly done already) and 2 of the proposal to the
>>> coders subpackage to create a first pull request. We can then discuss the
>>> applied strategy in detail before merging and applying it to the other
>>> subpackages.
>>>
>>
>> Sounds good. Again, could you document (in a more permanent/easy to look
>> up state than email) when packages are started/done?
>>
>>
>>> This strategy also includes the choice of automated tools. I'm focusing
>>> on writing python 3 code with python 2 compatibility, which means depending
>>> on the future package instead of the six package (which is already used in
>>> some places in the current code base). I have already noticed that this
>>> indeed requires a lot of manual work after running the automated script.
>>> The future package supports python 3.3+ compatibility, so I don't think
>>> there is a higher cost supporting 3.4 compared to 3.5+.
>>>
>>
>> Sure. It may incur a higher maintenance burden long-term though.
>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>> some time to come.)
>>
>>
>>> I have already added a tox environment to run pylint2 with the --py3k
>>> argument per updated subpackage, which should help avoid regression between
>>> step 2 and step 3 of the proposal. This update will be pushed with the
>>> first pull request.
>>>
>>> Kind regards,
>>> Robbe
>>>
>>>
>>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> Thank you, Robbe, for your offer to help with contribution here. I
>>>> read over your doc and the one thing I'd like to add is that this work is
>>>> very parallelizable, but if we have enough people looking at it we'll want
>>>> some way to coordinate so as to not overlap work (or just waste time
>>>> discovering what's been done). Tracking individual JIRAs and PRs gets
>>>> unwieldy, perhaps a spreadsheet with modules/packages on one axis and the
>>>> various automated/manual conversions along the other would be helpful?
>>>>
>>>> A note on automated tools, they're sometimes overly conservative, so we
>>>> should be sure to review the changes manually. (A typical example of this
>>>> is unnecessarily importing six.moves.xrange when there was no big reason to
>>>> use xrange over range in Python 2, or conversely using list(range(...)) in
>>>> Python 3.)
>>>>
>>>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions.
>>>> If there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>>>> identify it and decide that before widespread announcement.
>>>>
>>>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <hol...@pigscanfly.ca>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <robbe.sneyd...@ml6.eu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Anand,
>>>>>>>

Re: [PROPOSAL] Python 3 support

2018-04-02 Thread Robbe Sneyders
Hello Robert,

I think a Kanban board on Jira as proposed by Ahmet can be helpful for
this. I'll look into setting one up tomorrow.

In the meantime, you can find the first pull request with the updated
coders package here:
https://github.com/apache/beam/pull/4990

Kind regards,
Robbe

On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw <rober...@google.com> wrote:

> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <robbe.sneyd...@ml6.eu>
> wrote:
>
>> Thanks Ahmet and Robert,
>>
>> I think we can work on different subpackages in parallel, but it's
>> important to apply the same strategy everywhere. I'm currently working on
>> applying step 1 (was mostly done already) and 2 of the proposal to the
>> coders subpackage to create a first pull request. We can then discuss the
>> applied strategy in detail before merging and applying it to the other
>> subpackages.
>>
>
> Sounds good. Again, could you document (in a more permanent/easy to look
> up state than email) when packages are started/done?
>
>
>> This strategy also includes the choice of automated tools. I'm focusing
>> on writing python 3 code with python 2 compatibility, which means depending
>> on the future package instead of the six package (which is already used in
>> some places in the current code base). I have already noticed that this
>> indeed requires a lot of manual work after running the automated script.
>> The future package supports python 3.3+ compatibility, so I don't think
>> there is a higher cost supporting 3.4 compared to 3.5+.
>>
>
> Sure. It may incur a higher maintenance burden long-term though.
> (Basically, if we go out the door with 3.4 it's a promise to support it for
> some time to come.)
>
>
>> I have already added a tox environment to run pylint2 with the --py3k
>> argument per updated subpackage, which should help avoid regression between
>> step 2 and step 3 of the proposal. This update will be pushed with the
>> first pull request.
>>
>> Kind regards,
>> Robbe
>>
>>
>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <rober...@google.com> wrote:
>>
>>> Thank you, Robbe, for your offer to help with contribution here. I read
>>> over your doc and the one thing I'd like to add is that this work is very
>>> parallelizable, but if we have enough people looking at it we'll want some
>>> way to coordinate so as to not overlap work (or just waste time discovering
>>> what's been done). Tracking individual JIRAs and PRs gets unwieldy, perhaps
>>> a spreadsheet with modules/packages on one axis and the various
>>> automated/manual conversions along the other would be helpful?
>>>
>>> A note on automated tools, they're sometimes overly conservative, so we
>>> should be sure to review the changes manually. (A typical example of this
>>> is unnecessarily importing six.moves.xrange when there was no big reason to
>>> use xrange over range in Python 2, or conversely using list(range(...)) in
>>> Python 3.)
>>>
>>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
>>> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>>> identify it and decide that before widespread announcement.
>>>
>>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <hol...@pigscanfly.ca>
>>>> wrote:
>>>>
>>>>>
>>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <robbe.sneyd...@ml6.eu>
>>>>> wrote:
>>>>>
>>>>>> Hi Anand,
>>>>>>
>>>>>> Thanks for the feedback.
>>>>>>
>>>>>> It should be no problem to run everything on DataflowRunner as well.
>>>>>> Are there any performance tests in place to check for performance
>>>>>> regressions?
>>>>>>
>>>>>
>>>> Yes there is a suite (
>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>>> It may not be very comprehensive and seems to be failing for a while. I
>>>> would not block python 3 work on performance for now. That is the
>>>> unfortunate state of things.
>>>>
>>>> If anybody in the community is interested, this would be a great
>>>> opportunity to help with benchmarks in general.
>>>>
>>>>
>>>>>
>>>>>

Re: [PROPOSAL] Python 3 support

2018-03-30 Thread Robbe Sneyders
Thanks Ahmet and Robert,

I think we can work on different subpackages in parallel, but it's
important to apply the same strategy everywhere. I'm currently working on
applying step 1 (was mostly done already) and 2 of the proposal to the
coders subpackage to create a first pull request. We can then discuss the
applied strategy in detail before merging and applying it to the other
subpackages.

This strategy also includes the choice of automated tools. I'm focusing on
writing python 3 code with python 2 compatibility, which means depending on
the future package instead of the six package (which is already used in
some places in the current code base). I have already noticed that this
indeed requires a lot of manual work after running the automated script.
The future package supports python 3.3+ compatibility, so I don't think
there is a higher cost supporting 3.4 compared to 3.5+.
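
As a rough illustration of the manual cleanup meant here (a minimal sketch,
not code from the Beam code base; the function names are hypothetical): an
automated conversion may wrap every range call in list(...) or import a
compatibility xrange, whereas a plain range is usually enough when the result
is only iterated over, and a list comprehension avoids the extra wrapper when
a real list is needed.

```python
from __future__ import absolute_import, division, print_function


def sum_of_squares(n):
    """Sum i*i for i in [0, n).

    The loop only iterates, so a plain range works on both Python 2 and
    Python 3; no list(range(...)) wrapper or xrange import is needed.
    """
    total = 0
    for i in range(n):
        total += i * i
    return total


def first_evens(n):
    """Return the first n even numbers as a real list on both versions."""
    return [2 * i for i in range(n)]
```

On Python 2, range builds a full list, so for very large n a lazy range (for
example from a compatibility package) may still be the better choice; that
trade-off is exactly what needs manual review.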

I have already added a tox environment to run pylint2 with the --py3k
argument per updated subpackage, which should help avoid regression between
step 2 and step 3 of the proposal. This update will be pushed with the
first pull request.
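
A minimal sketch of how such a per-subpackage check could be driven from a
script, assuming pylint is available as a command-line tool and accepts the
--py3k argument mentioned above. The subpackage path, the helper names, and
the default executable name are hypothetical, and this is not the tox
configuration from the pull request.

```python
import subprocess

# Hypothetical list of subpackages that have already been ported.
UPDATED_SUBPACKAGES = ["apache_beam/coders"]


def build_py3k_lint_command(subpackages, pylint_executable="pylint"):
    """Build the argument list for a pylint --py3k run over the given paths."""
    return [pylint_executable, "--py3k"] + list(subpackages)


def run_py3k_lint(subpackages, pylint_executable="pylint"):
    """Run the check and return the exit code.

    An exit code of 0 means pylint reported nothing; any other value means
    it reported messages or failed to run the check.
    """
    command = build_py3k_lint_command(subpackages, pylint_executable)
    return subprocess.call(command)
```

Extending UPDATED_SUBPACKAGES as each subpackage is ported would keep the
check limited to code that is already expected to pass.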

Kind regards,
Robbe


On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <rober...@google.com> wrote:

> Thank you, Robbe, for your offer to help with contribution here. I read
> over your doc and the one thing I'd like to add is that this work is very
> parallelizable, but if we have enough people looking at it we'll want some
> way to coordinate so as to not overlap work (or just waste time discovering
> what's been done). Tracking individual JIRAs and PRs gets unwieldy; perhaps
> a spreadsheet with modules/packages on one axis and the various
> automated/manual conversions along the other would be helpful?
>
> A note on automated tools: they're sometimes overly conservative, so we
> should be sure to review the changes manually. (A typical example of this
> is unnecessarily importing six.moves.xrange when there was no big reason to
> use xrange over range in Python 2, or conversely using list(range(...)) in
> Python 3.)
>
> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
> identify it and decide that before widespread announcement.
>
> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:
>
>>
>>
>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <hol...@pigscanfly.ca>
>> wrote:
>>
>>>
>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <robbe.sneyd...@ml6.eu>
>>> wrote:
>>>
>>>> Hi Anand,
>>>>
>>>> Thanks for the feedback.
>>>>
>>>> It should be no problem to run everything on DataflowRunner as well.
>>>> Are there any performance tests in place to check for performance
>>>> regressions?
>>>>
>>>
>> Yes there is a suite (
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>> It may not be very comprehensive and seems to have been failing for a
>> while. I would not block python 3 work on performance for now. That is the
>> unfortunate state of things.
>>
>> If anybody in the community is interested, this would be a great
>> opportunity to help with benchmarks in general.
>>
>>
>>>
>>>> Some questions were raised in the proposal document which I want to add
>>>> to this conversation:
>>>>
>>>> The first comment was about the targeted python 3 versions. We proposed
>>>> to target 3.6 since it is the latest version available and added 3.5
>>>> because 3.6 adoption seems rather low (hard to find any relevant sources on
>>>> this though).
>>>> If the beam community prefers 3.4, I would propose to target 3.4 only
>>>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>>>> progress. 3.4 has the advantage of already being installed on the workers
>>>> and allows pySpark pipelines to be moved over to beam more easily.
>>>> It would be great to get some opinions on this.
>>>>
>>>
>> My preference is to support 3.4+. I searched a bit on the web to
>> understand the usage statistics for python 3; it seems like python 3.4 has
>> ~20% usage and python 3.4+ has 99% (
>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>> Based on that, I think it makes sense to support it.
>>
>>
>>
>>>
>>>> Another comment was made on how to avoid regressions during the porting
>>>> process.
>>>> After applying step 1 

Re: [PROPOSAL] Python 3 support

2018-03-27 Thread Robbe Sneyders
Hi Anand,

Thanks for the feedback.

It should be no problem to run everything on DataflowRunner as well.
Are there any performance tests in place to check for performance
regressions?

Some questions were raised in the proposal document which I want to add to
this conversation:

The first comment was about the targeted python 3 versions. We proposed to
target 3.6 since it is the latest version available and added 3.5 because
3.6 adoption seems rather low (hard to find any relevant sources on this
though).
If the beam community prefers 3.4, I would propose to target 3.4 only
during porting and add 3.5 and 3.6 later so we don't slow down the porting
progress. 3.4 has the advantage of already being installed on the workers
and allows pySpark pipelines to be moved over to beam more easily.
It would be great to get some opinions on this.

Another comment was made on how to avoid regressions during the porting
process.
After applying step 1 and step 2, no python 3 compatibility lint warnings
should remain, so it would be great if we could enforce this check for
every pull request on an already updated subpackage.
After applying step 3, all tests should run on python 3, so again it would
be great if we can enforce these per updated subpackage.
Any insights on how to best accomplish this?

Thanks,
Robbe

On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:

> Thank you Robbe.
>
> I reviewed the document and it looks reasonable to me. I will touch on some
> points that were not mentioned:
> - Runners exercise different code paths. Doing auto conversions and
> focusing on DirectRunner is not enough. It is worthwhile to run things on
> DataflowRunner as well. This can be triggered from Jenkins. It will
> validate that we are still compatible with python 2.
> - Similar to above but with an eye on perf regressions.
>
> For project tracking on JIRA, please feel free to create any new issues,
> close stale ones, or take ownership of any open issues. All JIRAs should be
> assigned to the people actively working on them. If you want to track it in
> a separate way, you can also propose that. (For example a kanban board is
> used for portability effort which is fully supported in JIRA.)
>
> I will also call out to a few other people in addition to Holden who
> helped out or showed interest in helping with Python 3. @cclaus, @luke-zhu,
> @udim, @robertwb, @charlesccychen, @tvalentyn. You can include these
> people (and myself) for reviews and other questions that you have.
>
> Welcome again, and looking forward to your contributions.
>
> Thank you,
> Ahmet
>
>
>
> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <robbe.sneyd...@ml6.eu>
> wrote:
>
>> Hello everyone,
>>
>> In the next month(s), my colleague Matthias and I will commit a lot of
>> time and effort to python 3 support for beam and we would like to discuss
>> the best way to go forward with this.
>>
>> We have drawn up a document [1] with a high level outline of the proposed
>> approach and would like to get your feedback on this.
>>
>> The main Jira issue [2] for python 3 support has been mostly inactive for
>> the past year. Other smaller issues have been opened, but it's hard to
>> track the general progress. It would be great if anyone could offer some
>> insights on how to best handle this project on Jira.
>>
>> @Holden Karau, you seem to have already put in a lot of effort to add
>> python 3 support, so it would be great to get your insights and find a way
>> to merge our efforts.
>>
>> Kind regards,
>> Robbe
>>
>> [1]
>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>
>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>
>


[PROPOSAL] Python 3 support

2018-03-23 Thread Robbe Sneyders
Hello everyone,

In the next month(s), my colleague Matthias and I will commit a lot of
time and effort to python 3 support for beam and we would like to discuss
the best way to go forward with this.

We have drawn up a document [1] with a high level outline of the proposed
approach and would like to get your feedback on this.

The main Jira issue [2] for python 3 support has been mostly inactive for
the past year. Other smaller issues have been opened, but it's hard to
track the general progress. It would be great if anyone could offer some
insights on how to best handle this project on Jira.

@Holden Karau, you seem to have already put in a lot of effort to add
python 3 support, so it would be great to get your insights and find a way
to merge our efforts.

Kind regards,
Robbe

[1]
https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing

[2] https://issues.apache.org/jira/browse/BEAM-1251