Re: [Final Reminder] Beam 2.14 release branch will be cut tomorrow at 6pm UTC

2019-06-18 Thread Ismaël Mejía
Cutting the next release branch is not equal to starting the release
vote. In the past we have cut the branch even if there were still open
issues and then given people some days to trim their issues.

So the release manager should create the release branch on the
specified date and sync with the people working on the open issues so
that they cherry-pick their PRs into the release branch if needed or move
them to the next release, and start the vote ONLY when the open issue
list [1] count gets down to 0.

Note: We can propose a different alternative, but this has been
effective in the past and gives contributors time to fix
critical/blocker issues or issues that somehow need to be
synced/discussed. Creating new RCs is ‘long’ and not yet 100%
automated (so error-prone), and votes also take a long time, so the fewer
RCs/votes we have to do, the better.

[1] 
https://issues.apache.org/jira/browse/BEAM-7478?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%202.14.0
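
For readers who want to see which issues the filter in [1] counts as open, the
query string can be decoded with the Python standard library. This is only a
minimal sketch: the helper name ready_for_vote is hypothetical and simply
restates the rule above (start the vote only when the count is 0); it does not
talk to Jira.

```python
from urllib.parse import unquote

# Query string copied from link [1] above (the part after "jql=").
ENCODED_JQL = (
    "project%20%3D%20BEAM%20AND%20status%20in%20"
    "(Open%2C%20%22In%20Progress%22%2C%20Reopened)"
    "%20AND%20fixVersion%20%3D%202.14.0"
)


def decode_jql(encoded):
    """Turn the percent-encoded filter into readable JQL."""
    return unquote(encoded)


def ready_for_vote(open_issue_count):
    """Hypothetical helper: the vote starts only when no issues remain open."""
    return open_issue_count == 0


print(decode_jql(ENCODED_JQL))
```

The decoded filter selects BEAM issues that are Open, In Progress or Reopened
and have fixVersion 2.14.0.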


On Wed, Jun 19, 2019 at 3:19 AM Chamikara Jayalath  wrote:
>
>
>
> On Tue, Jun 18, 2019 at 6:00 PM Anton Kedin  wrote:
>>
>> What is the right thing to do if it is not ready by the proposed branch cut 
>> time tomorrow? I don't think the Jira issue provides enough context about 
>> the severity of the problem and why it has to go out specifically in 2.14.0. 
>> Without additional context I think the expected path forward should look 
>> like this:
>> * if it's a regression or something that really needs to block the release 
>> then I think more information about the problem is needed;
>
>
> Context is that GCS may start throttling some of the requests and raising 429 
> errors so Beam should implement logic for retrying such failures with 
> exponential backoff. Java SDK is already handling such failures correctly. 
> +Heejong Lee is actively working on a fix for Python SDK. I believe this will 
> be a relatively small change and a PR should be available within a day or so. 
> We can also try to cherry-pick the fix to release branch after it is cut if 
> you want to go ahead with the scheduled branch cut time.
>
> Thanks,
> Cham
>
>>
>> * if it's not a regression, proceed with the release even without the fix;
>> * if the fix is ready before the release is completed, consider 
>> cherry-picking and re-doing the appropriate steps of the release process;
>> * if the fix is not ready, consider doing a follow-up 2.14.1 release;
>> * otherwise delay until 2.15.0;
>>
>> Regards,
>> Anton
>>
>>
>> On Tue, Jun 18, 2019 at 4:37 PM Chamikara Jayalath  
>> wrote:
>>>
>>> Please note that https://issues.apache.org/jira/browse/BEAM-7424 was marked 
>>> as a blocker and we'd like to get the fix to Python SDK into the 2.14 
>>> release.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Tue, Jun 18, 2019 at 4:16 PM Anton Kedin  wrote:

>>>> It's a reminder, I am planning to cut the release branch tomorrow, on
>>>> Wednesday, June 19, at 11am PDT (Seattle local time, corresponds to
>>>> [19:00@GMT+1] and [18:00@UTC]). Please make sure all the code you want in
>>>> the release is submitted by that time, and that all blocking Jiras have
>>>> the release version attached.
>>>>
>>>> Thank you,
>>>> Anton
>>>>
>>>> [1]
>>>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>>>> [2]
>>>> https://issues.apache.org/jira/browse/BEAM-7478?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%202.14.0


Re: [SQL] Let's split the TableProvider

2019-06-18 Thread Kenneth Knowles
Nice doc. I really appreciate how it gives an overview of the code.

Kenn

On Fri, Jun 14, 2019 at 4:14 PM Anton Kedin  wrote:

> Hi dev@, and especially anyone interested in SQL,
>
> We have an interface called TableProvider (and some other related classes)
> in Beam SQL that manages how we resolve the table schemas, construct IOs
> and do other related and unrelated things when parsing the queries. At the
> moment it feels very overloaded and not easy to use or extend. I propose we
> split it into a few more abstractions. Here's an initial draft doc for
> discussion, let me know what you think:
>
> https://docs.google.com/document/d/1QAPz74XMctCsiUnutWR1ejEXjKGmpSQJ61qtfAiBE4E
>
> Regards,
> Anton
>
>
>
>
>


Looping timer blog

2019-06-18 Thread Reza Rokni
Hi folks,

Just wanted to drop a note here on a new pattern that folks may find
interesting, called Looping Timers. It allows for default values to be
created in interval windows in the absence of any external data coming into
the pipeline. The details are in this blog below:

https://beam.apache.org/blog/2019/06/11/looping-timers.html

Its main utility is when dealing with time series data. There are still
rough edges, like dealing with TTL, and it would be great to hear
feedback on ways it can be improved.
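
To make the effect concrete without Beam, here is a plain-Python sketch of what
the pattern achieves: every interval window gets an output, and windows with no
input get a default value. This is not the Beam timer API and not the code from
the blog; fill_missing_windows and its parameters are hypothetical names used
only for illustration.

```python
def fill_missing_windows(events, start, end, window_size, default=0):
    """Sum event values per fixed interval window, emitting `default`
    for windows in [start, end) that received no data.

    events: iterable of (timestamp, value) pairs.
    Returns a list of (window_start, total) in window order.
    """
    totals = {}
    for timestamp, value in events:
        if start <= timestamp < end:
            # Align the timestamp to the start of its window.
            offset = (timestamp - start) // window_size
            window_start = start + offset * window_size
            totals[window_start] = totals.get(window_start, 0) + value

    result = []
    tick = start
    # This loop plays the role of a timer that re-arms itself every window.
    while tick < end:
        result.append((tick, totals.get(tick, default)))
        tick += window_size
    return result


# Two events in the first minute, one in the second, none in the third.
print(fill_missing_windows([(5, 1), (7, 2), (65, 4)], 0, 180, 60))
```

In a streaming pipeline there is no known `end`, which is why the blog uses a
timer that keeps resetting itself instead of a bounded loop.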

The next pattern to publish in this domain will assist with holding and
propagating values from one interval window to the next, which, coupled
with looping timers, starts to solve some interesting problems.

Cheers

Reza



-- 

This email may be confidential and privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it has gone
to the wrong person.

The above terms reflect a potential business arrangement, are provided
solely as a basis for further discussion, and are not intended to be and do
not constitute a legally binding obligation. No legally binding obligations
will be created, implied, or inferred until an agreement in final form is
executed in writing by all parties involved.


Re: [Final Reminder] Beam 2.14 release branch will be cut tomorrow at 6pm UTC

2019-06-18 Thread Chamikara Jayalath
On Tue, Jun 18, 2019 at 6:00 PM Anton Kedin  wrote:

> What is the right thing to do if it is not ready by the proposed branch
> cut time tomorrow? I don't think the Jira issue provides enough context
> about the severity of the problem and why it has to go out specifically in
> 2.14.0. Without additional context I think the expected path forward should
> look like this:
> * if it's a regression or something that really needs to block the release
> then I think more information about the problem is needed;
>

Context is that GCS may start throttling some of the requests and raising
429 errors so Beam should implement logic for retrying such failures with
exponential backoff. Java SDK is already handling such failures
correctly. +Heejong Lee is actively working on a fix for Python SDK. I
believe this will be a relatively small change and a PR should be available
within a day or so. We can also try to cherry-pick the fix to release
branch after it is cut if you want to go ahead with the scheduled branch
cut time.
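
For readers unfamiliar with the technique, retrying throttled requests with
exponential backoff can be sketched as below. This is an illustration only, not
the Java SDK's logic and not the pending Python SDK fix: ThrottledError,
call_with_backoff and all parameter values are hypothetical, and random jitter
is an added assumption that the thread does not mention.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 (Too Many Requests) response."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call `request()` and retry on ThrottledError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # The delay ceiling doubles on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, delay))
```

Passing `sleep` as a parameter keeps the helper easy to test without actually
waiting.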

Thanks,
Cham


> * if it's not a regression, proceed with the release even without the fix;
> * if the fix is ready before the release is completed, consider
> cherry-picking and re-doing the appropriate steps of the release process;
> * if the fix is not ready, consider doing a follow-up 2.14.1 release;
> * otherwise delay until 2.15.0;
>
> Regards,
> Anton
>
>
> On Tue, Jun 18, 2019 at 4:37 PM Chamikara Jayalath 
> wrote:
>
>> Please note that https://issues.apache.org/jira/browse/BEAM-7424 was
>> marked as a blocker and we'd like to get the fix to Python SDK into the
>> 2.14 release.
>>
>> Thanks,
>> Cham
>>
>> On Tue, Jun 18, 2019 at 4:16 PM Anton Kedin  wrote:
>>
>>> It's a reminder, I am planning to cut the release branch tomorrow, on
>>> Wednesday, June 19, at 11am PDT (Seattle local time, corresponds to
>>> [19:00@GMT+1] and [18:00@UTC]). Please make sure all the code you want
>>> in the release is submitted by that time, and that all blocking Jiras have
>>> the release version attached.
>>>
>>> Thank you,
>>> Anton
>>>
>>> [1]
>>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>>> [2]
>>> https://issues.apache.org/jira/browse/BEAM-7478?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%202.14.0
>>>
>>


Re: [Final Reminder] Beam 2.14 release branch will be cut tomorrow at 6pm UTC

2019-06-18 Thread Anton Kedin
What is the right thing to do if it is not ready by the proposed branch cut
time tomorrow? I don't think the Jira issue provides enough context about
the severity of the problem and why it has to go out specifically in
2.14.0. Without additional context I think the expected path forward should
look like this:
* if it's a regression or something that really needs to block the release
then I think more information about the problem is needed;
* if it's not a regression, proceed with the release even without the fix;
* if the fix is ready before the release is completed, consider
cherry-picking and re-doing the appropriate steps of the release process;
* if the fix is not ready, consider doing a follow-up 2.14.1 release;
* otherwise delay until 2.15.0;
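
One possible reading of the list above, written as a small decision function.
The ordering of the checks is an interpretation, and the function and parameter
names are hypothetical; it is a sketch for discussion, not an agreed release
policy.

```python
def path_forward(blocks_release, fix_ready_before_release,
                 patch_release_feasible):
    """Return the suggested next step for an issue flagged as a blocker."""
    if blocks_release:
        # A regression or true blocker: the Jira needs more context first.
        return "ask for more information about the problem"
    if fix_ready_before_release:
        return "cherry-pick the fix and redo the affected release steps"
    if patch_release_feasible:
        return "release 2.14.0 without the fix, then follow up with 2.14.1"
    return "release 2.14.0 without the fix and deliver it in 2.15.0"


print(path_forward(False, True, False))
```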

Regards,
Anton


On Tue, Jun 18, 2019 at 4:37 PM Chamikara Jayalath 
wrote:

> Please note that https://issues.apache.org/jira/browse/BEAM-7424 was
> marked as a blocker and we'd like to get the fix to Python SDK into the
> 2.14 release.
>
> Thanks,
> Cham
>
> On Tue, Jun 18, 2019 at 4:16 PM Anton Kedin  wrote:
>
>> It's a reminder, I am planning to cut the release branch tomorrow, on
>> Wednesday, June 19, at 11am PDT (Seattle local time, corresponds to
>> [19:00@GMT+1] and [18:00@UTC]). Please make sure all the code you want
>> in the release is submitted by that time, and that all blocking Jiras have
>> the release version attached.
>>
>> Thank you,
>> Anton
>>
>> [1]
>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>> [2]
>> https://issues.apache.org/jira/browse/BEAM-7478?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%202.14.0
>>
>


Re: [Final Reminder] Beam 2.14 release branch will be cut tomorrow at 6pm UTC

2019-06-18 Thread Chamikara Jayalath
Please note that https://issues.apache.org/jira/browse/BEAM-7424 was marked
as a blocker and we'd like to get the fix to Python SDK into the 2.14
release.

Thanks,
Cham

On Tue, Jun 18, 2019 at 4:16 PM Anton Kedin  wrote:

> It's a reminder, I am planning to cut the release branch tomorrow, on
> Wednesday, June 19, at 11am PDT (Seattle local time, corresponds to
> [19:00@GMT+1] and [18:00@UTC]). Please make sure all the code you want in
> the release is submitted by that time, and that all blocking Jiras have the
> release version attached.
>
> Thank you,
> Anton
>
> [1]
> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
> [2]
> https://issues.apache.org/jira/browse/BEAM-7478?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%202.14.0
>


[Final Reminder] Beam 2.14 release branch will be cut tomorrow at 6pm UTC

2019-06-18 Thread Anton Kedin
It's a reminder, I am planning to cut the release branch tomorrow, on
Wednesday, June 19, at 11am PDT (Seattle local time, corresponds to
[19:00@GMT+1] and [18:00@UTC]). Please make sure all the code you want in
the release is submitted by that time, and that all blocking Jiras have the
release version attached.
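
The time conversion above can be checked with the Python standard library. The
sketch uses fixed UTC offsets (PDT as UTC-7) rather than a time zone database,
which is an assumption that holds for this date.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: PDT is UTC-7; "GMT+1" is UTC+1.
PDT = timezone(timedelta(hours=-7), "PDT")
GMT_PLUS_1 = timezone(timedelta(hours=1), "GMT+1")

# Planned branch cut: Wednesday, June 19, 2019, 11am PDT.
branch_cut = datetime(2019, 6, 19, 11, 0, tzinfo=PDT)


def as_hhmm(moment, tz):
    """Format an aware datetime as HH:MM in the given time zone."""
    return moment.astimezone(tz).strftime("%H:%M")


print(as_hhmm(branch_cut, timezone.utc))  # 18:00
print(as_hhmm(branch_cut, GMT_PLUS_1))    # 19:00
```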

Thank you,
Anton

[1]
https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
[2]
https://issues.apache.org/jira/browse/BEAM-7478?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%202.14.0


Re: Plan for dropping python 2 support

2019-06-18 Thread Ahmet Altay
Thank you for the update, very helpful. It might be worthwhile to share a
version of this with user mailing list after 2.14.

Remaining question for me is: There is no plan for an LTS release
currently. Would it make sense for us to target one after known remaining
issues are mostly fixed? What release would that be?

On Tue, Jun 18, 2019 at 12:08 AM Valentyn Tymofieiev 
wrote:

> To give a better understanding where we are w.r.t. Python 3,  I'd like to
> give a quick overview of the recent work that has been happening in Beam
> community to support Python 3, and to summarize the current status of this
> effort.
>
> Current status:
>
> 1. Beam 2.11.0 was the first release that offered Python 3 support,
>    specifically Python 3.5 support. Due to several limitations that have been
>    fixed since 2.11.0, Beam 2.13.0 (or newer version) is recommended for
>    Python 3 pipelines.
> 2. Pipelines running on Portable Flink / Spark runners may have to use
>    Beam 2.14.0 once it becomes available.
> 3. Python 3.5 or newer version of the interpreter is required to install
>    Beam and run Python 3 pipelines.
>
>
> Known remaining limitations of current Python 3 offering:
>
>
> 1. Several syntactic constructs introduced in Python 3 (keyword-only
>    arguments, dataclasses) are not yet supported. See: BEAM-5878, BEAM-7284.
> 2. Pickling errors occasionally prevent usage of the --save_main_session
>    flag, but changes to the pipeline code may help to overcome this
>    limitation. See: BEAM-6158, BEAM-7540.
> 3. Beam has limited type inference capabilities in Python 3.6+, and type
>    checking of Beam typehints is not always enforced. See: BEAM-2713,
>    BEAM-7377.
>
>
> The cause of limitations 1-2 largely lies in Beam dependency 'dill' that
> supports pickling. In the immediate future we will be working on evaluating
> replacements or/and fixes to address this. We are also working on an
> improved typehints support in Python 3, see: BEAM-2713.
>
> The efforts to make Beam codebase Python3-compatible started back in 2017.
> Most of this work is visible in BEAM-1251[1] and in Kanban Board [2].
>
>
> 2017:
>
> - BEAM-1251 is opened, and first efforts to make Beam codebase
>   Python3-compatible followed shortly.
>
>
> Q3-Q4 2018:
>
> - Active work on "futurizing" Beam codebase piece-by-piece while
>   preventing regressions in performance in existing Python 2 offering.
> - Building test infrastructure to incorporate Python 3 test scenarios.
>
>
> Apache Beam 2.11.0 (Q1 2019):
>
> - "Futurization" of Beam Python codebase completed.
> - Apache Beam 2.11.0 is released with Python 3 support, with limitations.
> - Continuous pre-commit and post-commit test suites added for Python 3.5.
> - Gaps in Python 3 support in Datastore IO, Avro IO, Bigquery IO
>   identified and scoped.
> - Continuous testing mostly limited to Python 3.5.
>
>
> Apache Beam 2.12.0 (Q2 2019):
>
> - Pre and Post-commit test coverage expanded to Python 3.5, 3.6, 3.7.
> - Direct and Dataflow runners added support for Python 3.6 - 3.7.
>
>
> Apache Beam 2.13.0 (Q2 2019):
>
> - Avro IO support enabled on Python 3.
> - Datastore IO support enabled on Python 3.
> - Bigquery IO support for BYTES datatype enabled on Python 3.
>
>
> Apache Beam 2.14.0 (to be released in Q3 2019):
>
> - Python 3 bug fixes for Bigquery IO and Portable Runner.
> - Every Python SDK commit exercises Direct, Dataflow, and Portable Flink
>   runners on Python 3 in various test suites.
> - Beam 2.14.0 will declare Python 3.5, 3.6, 3.7 support in PyPi.
>
>
> Next steps:
>
> - Address known limitations and user feedback.
> - Increase Python 3 test coverage in portable runner.
> - Assist Beam users in Python 2 -> Python 3 migration.
> - Deprecate Python 2 support in Beam, clean up the codebase.
>
>
> I'd like to thank all Beam contributors who have been helping to push this
> effort so far.
>
>
> [1] https://issues.apache.org/jira/browse/BEAM-1251
>
> [2]
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail
>
> On Tue, Jun 18, 2019 at 12:03 AM Valentyn Tymofieiev 
> wrote:
>
>> I like the update Ismaël referenced [1], I think we should prepare a
>> similar update for Beam users. I would propose the following:
>> - Designate last LTS release that we will have in 2019 to be the last LTS
>> release with Python 2 support.
>> - Add a Beam-specific deprecation warning on Python 2 starting from the
>> last LTS release, or last 2 releases of Beam in 2019, whichever happens
>> earlier.
>> - Remove Python 2 support starting from the first release in 2020.
>>
>> The cost of maintaining Python 2.7 support is higher than 0. Some issues
>> that come to mind:
>> - Maintaining Py2.7 / Py 3+ compatibility 

Re: [DISCUSS] Portability representation of schemas

2019-06-18 Thread Robert Bradshaw
Thanks for updating that alternative.

As for the to/from functions, it does seem pragmatic to dangle them
off the purely portable representation (either as a field there, or as
an opaque logical type whose payload contains the to/from functions,
or a separate coder that wraps the schema coder (though I can't see
how the latter would work well if nesting is allowed)) until we figure
out a good way to attach them to the operations themselves.
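
As a rough illustration of "opaque metadata attached to a transparent schema",
the sketch below models a schema as an ordered list of (field name, field type)
pairs, lets a field type nest another such list, and carries SDK-specific data
(for example, serialized to/from functions) as an uninterpreted byte payload.
None of these class or field names come from the proposal or the PR; they are
hypothetical and only show the shape being discussed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldType:
    """A primitive type name, or a nested row when `row_schema` is set."""
    name: str
    row_schema: Optional["Schema"] = None


@dataclass(frozen=True)
class Field:
    name: str
    type: FieldType


@dataclass(frozen=True)
class Schema:
    fields: Tuple[Field, ...]
    # Opaque, SDK-specific payload; readers of the schema never parse it.
    sdk_payload: bytes = b""


def field_paths(schema, prefix=""):
    """List dotted paths of all leaf fields, descending into nested rows."""
    paths = []
    for f in schema.fields:
        path = prefix + f.name
        if f.type.row_schema is not None:
            paths.extend(field_paths(f.type.row_schema, path + "."))
        else:
            paths.append(path)
    return paths


address = Schema((Field("city", FieldType("STRING")),))
user = Schema(
    (Field("id", FieldType("INT64")), Field("address", FieldType("ROW", address))),
    sdk_payload=b"opaque-to-other-sdks",
)
print(field_paths(user))
```

The point of the shape is that a runner or another SDK can walk the fields
without understanding the payload, while the originating SDK can still recover
what it needs from it.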

On Tue, Jun 18, 2019 at 2:37 AM Brian Hulette  wrote:
>
> Realized I completely ignored one of your points, added another response 
> inline.
>
> On Fri, Jun 14, 2019 at 2:20 AM Robert Bradshaw  wrote:
>>
>> On Thu, Jun 13, 2019 at 8:42 PM Reuven Lax  wrote:
>> >
>> > Spoke to Brian about his proposal. It is essentially this:
>> >
>> > We create PortableSchemaCoder, with a well-known URN. This coder is 
>> > parameterized by the schema (i.e. list of field name -> field type pairs).
>>
>> Given that we have a field type that is (list of field names -> field
>> type pairs), is there a reason to do this enumeration at the top level
>> as well? This would likely also eliminate some of the strangeness
>> where we want to treat a PCollection with a single-field row as a
>> PCollection with just that value instead.
>
>
> This is part of what I was suggesting in my "Root schema is a logical type" 
> alternative [1], except that the language about SDK-specific logical types is 
> now obsolete. I'll update it to better reflect this alternative.
> I do think at the very least we should just have one (list of field names -> 
> field type pairs) that is re-used, which is what I did in my PR [2].
>
> [1] 
> https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit#heading=h.7570feur1qin
> [2] 
> https://github.com/apache/beam/pull/8853/files#diff-f0d64c2cfc4583bfe2a7e5ee59818ae2L686
>
>>
>>
>> > Java also continues to have its own CustomSchemaCoder. This is 
>> > parameterized by the schema as well as the to/from functions needed to 
>> > make the Java API "nice."
>> >
>> > When the expansion service expands a Java PTransform for usage across 
>> > languages, it will add a transform mapping the  PCollection with 
>> > CustomSchemaCoder to a PCollection which has PortableSchemaCoder. This way 
>> > Java can maintain the information needed to maintain its API (and Python 
>> > can do the same), but there's no need to shove this information into the 
>> > well-known portable representation.
>> >
>> > Brian, can you confirm that this was your proposal? If so, I like it.
>>
>> The major downside of this that I see is that it assumes that
>> transparency is only needed at certain "boundaries" and everything
>> between these boundaries is opaque. I think we'd be better served by a
>> format where schemas are transparently represented throughout. For
>> example, the "boundaries" between runner and SDK are not known at
>> pipeline construction time, and we want the runner <-> SDK
>> communication to understand the schemas to be able to use more
>> efficient transport mechanisms (e.g. batches of arrow records). It may
>> also be common for a pipeline in language X to invoke two transforms
>> in language Y in succession (e.g. two SQL statements) in which case
>> introducing two extra transforms in the expansion service would be
>> wasteful. I also think we want to allow the flexibility for runners to
>> swap out transforms as optimizations regardless of construction-time
>> boundaries (e.g. implementing a projection natively, rather than
>> outsourcing to the SDK).
>>
>> Are the to/from conversion functions the only extra information needed
>> to make the Java APIs nice? If so, can they be attached to the
>> operations themselves (where it seems they're actually needed/used),
>> rather than to the schema/coder of the PCollection? Alternatively, I'd
>> prefer this be opaque metadata attached to a transparent schema rather
>> than making the whole schema opaque.
>>
>> > We've gone back and forth discussing abstracts for over a month now. I 
>> > suggest that the next step should be to create a PR, and move discussion 
>> > to that PR. Having actual code can often make discussion much more 
>> > concrete.
>>
>> +1 to a PR, though I feel like there are fundamental high-level issues
>> that are still not decided. (I suppose we should be open to throwing
>> whole PRs away in that case.) There are certainly pieces that we
>> know we need (like the ability to serialize a row consistently in
>> all languages) that we can get in immediately.
>>
>> > Reuven
>> >
>> > On Thu, Jun 13, 2019 at 6:28 AM Robert Bradshaw  
>> > wrote:
>> >>
>> >> On Thu, Jun 13, 2019 at 5:47 AM Reuven Lax  wrote:
>> >>>
>> >>>
>> >>> On Wed, Jun 12, 2019 at 8:29 PM Kenneth Knowles  wrote:
>> 
>>  Can we choose a first step? I feel there's consensus around:
>> 
>>   - the basic idea of what a schema looks like, ignoring logical types 
>>  or SDK-specific bits
>>   - the 

Re: Plan for dropping python 2 support

2019-06-18 Thread Valentyn Tymofieiev
To give a better understanding of where we are w.r.t. Python 3, I'd like to
give a quick overview of the recent work that has been happening in the Beam
community to support Python 3, and to summarize the current status of this
effort.

Current status:

   1. Beam 2.11.0 was the first release that offered Python 3 support,
      specifically Python 3.5 support. Due to several limitations that have been
      fixed since 2.11.0, Beam 2.13.0 (or newer version) is recommended for
      Python 3 pipelines.
   2. Pipelines running on Portable Flink / Spark runners may have to use Beam
      2.14.0 once it becomes available.
   3. Python 3.5 or newer version of the interpreter is required to install
      Beam and run Python 3 pipelines.


Known remaining limitations of current Python 3 offering:


   1. Several syntactic constructs introduced in Python 3 (keyword-only
      arguments, dataclasses) are not yet supported. See: BEAM-5878, BEAM-7284.
   2. Pickling errors occasionally prevent usage of the --save_main_session
      flag, but changes to the pipeline code may help to overcome this
      limitation. See: BEAM-6158, BEAM-7540.
   3. Beam has limited type inference capabilities in Python 3.6+, and type
      checking of Beam typehints is not always enforced. See: BEAM-2713,
      BEAM-7377.


The cause of limitations 1-2 largely lies in 'dill', the Beam dependency that
supports pickling. In the immediate future we will be working on evaluating
replacements and/or fixes to address this. We are also working on
improved typehints support in Python 3, see: BEAM-2713.
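
For readers less familiar with the Python 3 constructs named in limitation 1,
here is what they look like in plain Python 3.7+. The sketch only shows the
syntax; it says nothing about how Beam or dill handle these constructs, and the
names scale and WordCount are made up for the example.

```python
from dataclasses import dataclass


def scale(value, *, factor=2):
    """`factor` is keyword-only: it cannot be passed positionally."""
    return value * factor


@dataclass
class WordCount:
    """A dataclass: __init__ and __eq__ are generated from the fields."""
    word: str
    count: int = 0


print(scale(3, factor=4))
print(WordCount("beam"))
```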

The efforts to make Beam codebase Python3-compatible started back in 2017.
Most of this work is visible in BEAM-1251[1] and in Kanban Board [2].


2017:

   - BEAM-1251 is opened, and first efforts to make Beam codebase
     Python3-compatible followed shortly.


Q3-Q4 2018:

   - Active work on "futurizing" Beam codebase piece-by-piece while
     preventing regressions in performance in existing Python 2 offering.
   - Building test infrastructure to incorporate Python 3 test scenarios.


Apache Beam 2.11.0 (Q1 2019):

   - "Futurization" of Beam Python codebase completed.
   - Apache Beam 2.11.0 is released with Python 3 support, with limitations.
   - Continuous pre-commit and post-commit test suites added for Python 3.5.
   - Gaps in Python 3 support in Datastore IO, Avro IO, Bigquery IO
     identified and scoped.
   - Continuous testing mostly limited to Python 3.5.


Apache Beam 2.12.0 (Q2 2019):

   - Pre and Post-commit test coverage expanded to Python 3.5, 3.6, 3.7.
   - Direct and Dataflow runners added support for Python 3.6 - 3.7.


Apache Beam 2.13.0 (Q2 2019):

   - Avro IO support enabled on Python 3.
   - Datastore IO support enabled on Python 3.
   - Bigquery IO support for BYTES datatype enabled on Python 3.


Apache Beam 2.14.0 (to be released in Q3 2019):

   - Python 3 bug fixes for Bigquery IO and Portable Runner.
   - Every Python SDK commit exercises Direct, Dataflow, and Portable Flink
     runners on Python 3 in various test suites.
   - Beam 2.14.0 will declare Python 3.5, 3.6, 3.7 support in PyPi.


Next steps:

   - Address known limitations and user feedback.
   - Increase Python 3 test coverage in portable runner.
   - Assist Beam users in Python 2 -> Python 3 migration.
   - Deprecate Python 2 support in Beam, clean up the codebase.


I'd like to thank all Beam contributors who have been helping to push this
effort so far.


[1] https://issues.apache.org/jira/browse/BEAM-1251

[2]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail

On Tue, Jun 18, 2019 at 12:03 AM Valentyn Tymofieiev 
wrote:

> I like the update Ismaël referenced [1], I think we should prepare a
> similar update for Beam users. I would propose the following:
> - Designate last LTS release that we will have in 2019 to be the last LTS
> release with Python 2 support.
> - Add a Beam-specific deprecation warning on Python 2 starting from the
> last LTS release, or last 2 releases of Beam in 2019, whichever happens
> earlier.
> - Remove Python 2 support starting from the first release in 2020.
>
> The cost of maintaining Python 2.7 support is higher than 0. Some issues
> that come to mind:
> - Maintaining Py2.7 / Py 3+ compatibility of Beam codebase makes it
> difficult to use Python 3 syntax in Beam which may be necessary to support
> and test syntactic constructs introduced in Python 3.
> - Running additional test suites increases the load on test infrastructure
> and increases flakiness.
>
> [1] https://spark.apache.org/news/plan-for-dropping-python-2-support.html
>
> On Tue, Jun 11, 2019 at 7:57 AM Robert Bradshaw 
> wrote:
>
>> Sounds good.
>>
>> On Fri, Jun 7, 2019 at 8:28 PM Ahmet Altay  wrote:
>>
>>> I agree with you. A more recent LTS release with python 2 support will
>>> be good. Cost of maintaining python 2 support is also fairly low (maybe
>>> zero 

Re: Testing code in extensions against runner

2019-06-18 Thread Reza Rokni
Thanx!

It would definitely be great for folks adding utilities / extensions to be
able to run them against all runners.

Cheers
Reza

On Fri, 7 Jun 2019, 19:05 Lukasz Cwik,  wrote:

> We have currently been having every runner define and manage its own
> suite/tests, so yes, modifying flink_runner.gradle is currently the correct
> thing to do.
>
> There is a larger discussion about whether this is the right way since we
> would like to capture things like perf benchmarks and validates runner
> tests so we can add information to the website about how well a feature is
> supported by each runner automatically.
>
>
>
> On Thu, Jun 6, 2019 at 8:36 PM Reza Rokni  wrote:
>
>> Hi,
>>
>> I would like to validate some code that I am building under
>> extensions against different runners. It makes use of some caches in a DoFn
>> which are a little off the beaten path.
>>
>> I have added @ValidatesRunner to the class and by adding the right
>> values to the gradle file in flink_runner have got the tests to run.
>> However it does not feel right for me to change the flink_runner.gradle
>> file to achieve this, especially as this is all experimental and under
>> extensions.
>>
>> I could copy over all the bits needed from the gradle file over to my
>> extensions gradle, but then I would need to do that for all runners , which
>> also feels a bit heavy weight. Is there a way, or should there be a way of
>> having a task added to my gradle file which will do tests against all
>> runners for me?
>>
>> Cheers
>> Reza
>>
>>
>


Re: Plan for dropping python 2 support

2019-06-18 Thread Valentyn Tymofieiev
I like the update Ismaël referenced [1]; I think we should prepare a
similar update for Beam users. I would propose the following:
- Designate last LTS release that we will have in 2019 to be the last LTS
release with Python 2 support.
- Add a Beam-specific deprecation warning on Python 2 starting from the
last LTS release, or last 2 releases of Beam in 2019, whichever happens
earlier.
- Remove Python 2 support starting from the first release in 2020.
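
The second item could look roughly like the sketch below. It is a hypothetical
illustration, not code from the Beam SDK: the function name and message text
are made up. It uses the default UserWarning category because
DeprecationWarning is ignored by default on Python 2.7, so users would not see
it.

```python
import sys
import warnings


def warn_if_python2(version_info=sys.version_info):
    """Warn when running on Python 2; return True if a warning was emitted."""
    if version_info[0] == 2:
        # Default category is UserWarning, which is shown by default.
        warnings.warn(
            "Python 2 support is deprecated and is planned for removal; "
            "please migrate your pipelines to Python 3."
        )
        return True
    return False
```

Calling such a helper once at SDK import time would show the message on every
Python 2 run without affecting Python 3 users.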

The cost of maintaining Python 2.7 support is higher than 0. Some issues
that come to mind:
- Maintaining Py2.7 / Py 3+ compatibility of Beam codebase makes it
difficult to use Python 3 syntax in Beam which may be necessary to support
and test syntactic constructs introduced in Python 3.
- Running additional test suites increases the load on test infrastructure
and increases flakiness.

[1] https://spark.apache.org/news/plan-for-dropping-python-2-support.html

On Tue, Jun 11, 2019 at 7:57 AM Robert Bradshaw  wrote:

> Sounds good.
>
> On Fri, Jun 7, 2019 at 8:28 PM Ahmet Altay  wrote:
>
>> I agree with you. A more recent LTS release with python 2 support will be
>> good. Cost of maintaining python 2 support is also fairly low (maybe zero
>> actually besides keeping some pre-existing compatibility code).
>>
>> I believe we are referring to two separate things with support:
>> - Supporting existing releases for patches - I agree that we need to give
>> users a long enough window to upgrade. Great if it happens with an LTS
>> release. Even if it does not, I think it will be fair to offer patches on
>> the last python 2 supporting release during some part of 2020 if that
>> becomes necessary.
>> - Making new releases with python 2 support - Each new Beam release with
>> python 2 support will implicitly extend the lifetime of beam's python 2
>> support. I do not think we need to extend this to beyond 2019. 2 releases
>> (~ 3 months) after solid python 3 support will very likely put the last
>> python 2 supporting release to last quarter of 2019 already.
>>
>> On Fri, Jun 7, 2019 at 2:15 AM Robert Bradshaw 
>> wrote:
>>
>>> I don't think the second release with robust/recommended Python 3
>>> support should be the last release with Python 2 support--that is
>>> simply not enough time for people to migrate. (Look at how long it
>>> took us...) It does make a lot of sense to at least have one LTS
>>> release with support for both.
>>>
>>> Regarding timeline, I think we could safely say we expect to support
>>> Python 2 through 2019, likely for some of 2020 (possibly only via an
>>> LTS release), and (very) unlikely beyond 2020.
>>>
>>> On Wed, Jun 5, 2019 at 6:34 PM Ahmet Altay  wrote:
>>> >
>>> > I agree with the sentiment on this thread. Our priority needs to be
>>> offering good python 3 support that we can comfortably recommend users to
>>> switch. Progress on that so far has been promising and I do anticipate that
>>> we will reach there in the near future.
>>> >
>>> > My proposal would be, once we reach to that state, we can mark the
>>> first subsequent Beam release as the last Beam release that supports Python
>>> 2. (Alternatively: in line with the previous experimental/deprecated
>>> discussion we can make 2 more release with python 2 support rather than
>>> just 1 more.) With the current state, we would not give users plenty of
>>> time to upgrade to python 3. So in addition, I would suggest we consider
>>> an upgrade relief by offering something like 6-month support on the last
>>> python 2 compatible release. We might do that in the context of an LTS
>>> release.
>>> >
>>> > I do not believe we have a timeline we can share with users at this
>>> point. However if we go with this suggestion, we will probably support
>>> python 2 approximately until mid-2020.
>>> >
>>> > Ahmet
>>> >
>>> > On Wed, Jun 5, 2019 at 4:53 AM Tanay Tummalapalli 
>>> wrote:
>>> >>
>>> >> We can support Python 2 for some time in 2020, but, we should target
>>> a date no later than 2020 to drop support.
>>> >> If we do plan to drop support for Python 2 in 2020, we should sign
>>> the Python 3 statement[1], declaring that we will "drop support for Python
>>> 2.7 no later than 2020".
>>> >>
>>> >> In addition to the statement, keeping a target release and date(if
>>> possible) or timeline to drop support would also help users to decide when
>>> they need to work on migrating to Python 3.
>>> >>
>>> >> Regards,
>>> >> - TT
>>> >>
>>> >> [1] https://python3statement.org/
>>> >>
>>> >> On Wed, Jun 5, 2019 at 4:37 PM Robert Bradshaw 
>>> wrote:
>>> >>>
>>> >>> Until Python 3 support for Beam is officially out of beta and
>>> >>> recommended, I don't think we can tell people to stop using Python 2.
>>> >>> Given that 2020 is just over 6 months away, that seems a short
>>> >>> transition time, so I would guess we'll have to continue supporting
>>> >>> Python 2 sometime into 2020.
>>> >>>
>>> >>> A quick survey of users would be valuable here. But first priority is
>>> >>> making