Re: Python PreCommit failing due to broken target ":sdks:python:test-suites:tox:py2:docs"

2019-09-19 Thread Chamikara Jayalath
This was fixed. Thanks all.

- Cham

On Thu, Sep 19, 2019 at 1:49 PM Chamikara Jayalath 
wrote:

> Please see https://issues.apache.org/jira/browse/BEAM-8286
>
> Thanks,
> Cham
>


Re: Plan for dropping python 2 support

2019-09-19 Thread Ahmet Altay
Thanks a lot for sharing your thoughts. I completely agree that we need to
minimize the burden on our users as much as possible, especially in this
case, when we are only now offering a robust python 3 solution. However, I do
share the same concerns related to dependencies and tool chains: it will be
increasingly difficult for us to keep our code base compatible with python2
and python3 over time. (To be very explicit, one of those dependencies is
Dataflow's python pre-portability workers.)

On Thu, Sep 19, 2019 at 5:17 PM Maximilian Michels  wrote:

> Granted that we have just finalized Python 3 support, we should
> allow time for it to mature and for users to make the switch.
>
> > Oh, and one more thing, I think it'd make sense for Apache Beam to
> > sign https://python3statement.org/. The promise is that we'd
> > discontinue Python 2 support *in* 2020, which is not committing us to
> > January if we're not ready. Worth a vote?
>
> +1
>

+1


>
> On 19.09.19 15:59, Robert Bradshaw wrote:
> > Oh, and one more thing, I think it'd make sense for Apache Beam to
> > sign https://python3statement.org/. The promise is that we'd
> > discontinue Python 2 support *in* 2020, which is not committing us to
> > January if we're not ready. Worth a vote?
> >
> >
> > On Thu, Sep 19, 2019 at 3:58 PM Robert Bradshaw 
> wrote:
> >>
> >> Exactly how long we support Python 2 depends on our users. Other than
> >> those that speak up (such as yourself, thanks!), it's hard to get a
> >> handle on how many need Python 2 and for how long. (Should we send out
> >> a survey? Maybe after some experience with 2.16?)
>

+1, we had some success collecting information from users via
Twitter surveys.


> >>
> >> On the one hand, the whole ecosystem is finally moving on, and even if
> >> Beam continues to support Python 2 our dependencies, or other projects
> >> that are being used in conjunction with Beam, will also be going
> >> Python 3 only. On the other hand, Beam is, admittedly, quite late to
> >> the party and could be the one holding people back, and looking at how
> >> long it took us, if we just barely make it by the end of the year it's
> >> unreasonable to say at that point "oh, and we're dropping 2.7 at the
> >> same time."
> >>
> >> The good news is that 2.16 is shaping up to be a release I would
> >> recommend everyone migrate to Python 3 on. The remaining issues are
> >> things like some issues with main sessions (which already has issues
> >> in Python 2) and not supporting keyword-only arguments (a new feature,
> >> not a regression). I would guess that even 2.15 is already good enough
> >> for most people, at least to kick the tires and running tests to start
> >> the effort.
>

I share the same sentiment. Beam 2.16 will offer strong python 3
support. Yes, there are known issues, but this is not much different from
the known issues in the rest of the python offering.


> >>
> >> (I also agree with the sentiment that once we go 3.x only, it'll be
> >> likely harder to maintain a 2.x LTS... but the whole LTS thing is
> >> being discussed in another thread.)

>>
> >> On Thu, Sep 19, 2019 at 2:44 PM Chad Dombrova 
> wrote:
> >>>
> >>> Hi all,
> >>> I had a read through this thread in the archives. It occurred before I
> joined the mailing list, so I hope that this email connects up with the
> thread properly for everyone.
> >>>
> >>> I'd like to respond to the following points:
> >>>
>  I believe we are referring to two separate things with support:
>  - Supporting existing releases for patches - I agree that we need to
> give
>  users a long enough window to upgrade. Great if it happens with an LTS
>  release. Even if it does not, I think it will be fair to offer
> patches on
>  the last python 2 supporting release during some part of 2020 if that
>  becomes necessary.
>  - Making new releases with python 2 support - Each new Beam release
> with
>  python 2 support will implicitly extend the lifetime of beam's python
> 2
>  support. I do not think we need to extend this to beyond 2019. 2
> releases
>  (~ 3 months) after solid python 3 support will very likely put the
> last
>  python 2 supporting release to last quarter of 2019 already.
> >>>
> >>>
> >>> With so many important features still under active development
> (portability, expansion, external IO transforms, schema coders) and new
> versions of executors tied to the Beam source, staying behind is not really
> an option for many of us, and with python3 support not yet fully completed,
> the window in which Beam is fully working for both python versions is
> rapidly approaching 2 months, and could ultimately be even less, depending
> on how long it takes to complete the dozen remaining issues in Jira, and
> whatever pops up thereafter.
> >>>
>  The cost of maintaining Python 2.7 support is higher than 0. Some
> issues
>  that come to mind:
>  - Maintaining Py2.7 / Py 3+ compatibility of Beam codebase 

Re: Plan for dropping python 2 support

2019-09-19 Thread Maximilian Michels
Granted that we have just finalized Python 3 support, we should
allow time for it to mature and for users to make the switch.



Oh, and one more thing, I think it'd make sense for Apache Beam to
sign https://python3statement.org/. The promise is that we'd
discontinue Python 2 support *in* 2020, which is not committing us to
January if we're not ready. Worth a vote?


+1

On 19.09.19 15:59, Robert Bradshaw wrote:

Oh, and one more thing, I think it'd make sense for Apache Beam to
sign https://python3statement.org/. The promise is that we'd
discontinue Python 2 support *in* 2020, which is not committing us to
January if we're not ready. Worth a vote?


On Thu, Sep 19, 2019 at 3:58 PM Robert Bradshaw  wrote:


Exactly how long we support Python 2 depends on our users. Other than
those that speak up (such as yourself, thanks!), it's hard to get a
handle on how many need Python 2 and for how long. (Should we send out
a survey? Maybe after some experience with 2.16?)

On the one hand, the whole ecosystem is finally moving on, and even if
Beam continues to support Python 2 our dependencies, or other projects
that are being used in conjunction with Beam, will also be going
Python 3 only. On the other hand, Beam is, admittedly, quite late to
the party and could be the one holding people back, and looking at how
long it took us, if we just barely make it by the end of the year it's
unreasonable to say at that point "oh, and we're dropping 2.7 at the
same time."

The good news is that 2.16 is shaping up to be a release I would
recommend everyone migrate to Python 3 on. The remaining issues are
things like some issues with main sessions (which already has issues
in Python 2) and not supporting keyword-only arguments (a new feature,
not a regression). I would guess that even 2.15 is already good enough
for most people, at least to kick the tires and running tests to start
the effort.

(I also agree with the sentiment that once we go 3.x only, it'll be
likely harder to maintain a 2.x LTS... but the whole LTS thing is
being discussed in another thread.)

On Thu, Sep 19, 2019 at 2:44 PM Chad Dombrova  wrote:


Hi all,
I had a read through this thread in the archives. It occurred before I joined 
the mailing list, so I hope that this email connects up with the thread 
properly for everyone.

I'd like to respond to the following points:


I believe we are referring to two separate things with support:
- Supporting existing releases for patches - I agree that we need to give
users a long enough window to upgrade. Great if it happens with an LTS
release. Even if it does not, I think it will be fair to offer patches on
the last python 2 supporting release during some part of 2020 if that
becomes necessary.
- Making new releases with python 2 support - Each new Beam release with
python 2 support will implicitly extend the lifetime of beam's python 2
support. I do not think we need to extend this to beyond 2019. 2 releases
(~ 3 months) after solid python 3 support will very likely put the last
python 2 supporting release to last quarter of 2019 already.



With so many important features still under active development (portability, 
expansion, external IO transforms, schema coders) and new versions of executors 
tied to the Beam source, staying behind is not really an option for many of us, 
and with python3 support not yet fully completed, the window in which Beam is 
fully working for both python versions is rapidly approaching 2 months, and 
could ultimately be even less, depending on how long it takes to complete the 
dozen remaining issues in Jira, and whatever pops up thereafter.


The cost of maintaining Python 2.7 support is higher than 0. Some issues
that come to mind:
- Maintaining Py2.7 / Py 3+ compatibility of Beam codebase makes it
difficult to use Python 3 syntax in Beam which may be necessary to support
and test syntactic constructs introduced in Python 3.
- Running additional test suites increases the load on test infrastructure
and increases flakiness.



I would argue that the cost of maintaining a python2-only LTS version will be 
far greater than maintaining python2 support for a little while longer.  
Dropping support for python2 could mean a number of things from simply 
disabling the python2 tests, to removing 2-to-3 idioms in favor of python3-only 
constructs.  If what you have in mind is anything like the latter then the 
master branch will become quite divergent from the LTS release, and backporting 
changes will not be as simple as cherry-picking commits.  All-in-all, I 
think it's a lose/lose for everyone -- users and developers, of which I am both 
-- to drop python2 support on such a short timeline.

I'm an active contributor to this project and it will put me and the company 
that I work for in a very bad position if you force us onto an LTS release in 
early 2020.  I understand the appeal of moving to python3-only code and I want 
to get there too, but I would hope that you give your 

Re: Next LTS?

2019-09-19 Thread Ahmet Altay
I agree with retiring 2.7 as the LTS family. Based on my experience with
users, 2.7 does not have particularly high adoption and, as pointed out, has
known critical issues. Declaring another LTS pending demand sounds
reasonable, but how are we going to gauge this demand?

+Yifan Zou  +Alan Myrvold  on the
tooling question as well. Unless we address the tooling problem it seems
difficult to feasibly maintain LTS versions over time.

On Thu, Sep 19, 2019 at 3:45 PM Austin Bennett 
wrote:

> To be clear, I was picking on - or reminding us of - the promise: I don't
> have a strong personal need/desire (at least currently) for LTS to exist.
> Though, worth ensuring we live up to what we keep on the website.  And,
> without an active LTS, probably something we should take off the site?
>
> On Thu, Sep 19, 2019 at 1:33 PM Pablo Estrada  wrote:
>
>> +Łukasz Gajowy  had at some point thought of
>> setting up jenkins jobs without coupling them to the state of the repo
>> during the last Seed Job. It may be that that improvement can help test
>> older LTS-type releases?
>>
>> On Thu, Sep 19, 2019 at 1:11 PM Robert Bradshaw 
>> wrote:
>>
>>> In many ways the 2.7 LTS was trying to flesh out the process. I think
>>> we learned some valuable lessons. It would have been good to push out
>>> something (even if it didn't have everything we wanted) but that is
>>> unlikely to be worth pursuing now (and 2.7 should probably be retired
>>> as LTS and no longer recommended).
>>>
>>> I agree that it does not seem there is strong demand for an LTS at
>>> this point. I would propose that we keep 2.16, etc. as potential
>>> candidates, but only declare one as LTS pending demand. The question
>>> of how to keep our tooling stable (or backwards/forwards compatible)
>>> is a good one, especially as we move to drop Python 2.7 in 2020 (which
>>> could itself be a driver for an LTS).
>>>
>>> On Thu, Sep 19, 2019 at 12:27 PM Kenneth Knowles 
>>> wrote:
>>> >
>>> > Yes, I pretty much dropped 2.7.1 release process due to lack of
>>> interest.
>>> >
>>> > There are known problems so that I cannot recommend anyone to use
>>> 2.7.0, yet 2.7 is the current LTS family. So my work on 2.7.1 was
>>> philosophical. I did not like the fact that we had a designated LTS family
>>> with no usable releases.
>>> >
>>> > But many backports were proposed to block 2.7.1 and took a very long
>>> time to get contributors to implement the backports. I ended up doing many
>>> of them just to move it along. This indicates a lack of interest to me. The
>>> problem is that we cannot really use a strict cut off date as a way to
>>> ensure people do the important things and skip the unimportant things,
>>> because we do know that the issues are critical.
>>> >
>>> > And, yes, the fact that Jenkins jobs are separately evolving but
>>> pretty tightly coupled to the repo contents is a serious problem that I
>>> wish we had fixed. So verification of each PR was manual.
>>> >
>>> > Altogether, I still think LTS is valuable to have as a promise to
>>> users that we will backport critical fixes. I would like to keep that
>>> promise and continue to try. Things that are rapidly changing (which
>>> something always will be) just won't have fixes backported, and that seems
>>> OK.
>>> >
>>> > Kenn
>>> >
>>> > On Thu, Sep 19, 2019 at 10:59 AM Maximilian Michels 
>>> wrote:
>>> >>
>>> >> An LTS only makes sense if we end up patching the LTS, which so far we
>>> >> have never done. There has been work done in backporting fixes, see
>>> >> https://github.com/apache/beam/commits/release-2.7.1 but the effort
>>> was
>>> >> never completed. The main reason I believe were complications with
>>> >> running the evolved release scripts against old Beam versions.
>>> >>
>>> >> Now that the portability layer keeps maturing, it makes me optimistic
>>> >> that we might have a maintained LTS in the future.
>>> >>
>>> >> -Max
>>> >>
>>> >> On 19.09.19 08:40, Ismaël Mejía wrote:
>>> >> > The fact that end users never asked AFAIK in the ML for an LTS and
>>> for
>>> >> > a subsequent minor release of the existing LTS shows IMO the low
>>> >> > interest on having a LTS.
>>> >> >
>>> >> > We still are heavily iterating in many areas (portability/schema)
>>> and
>>> >> > I am not sure users (and in particular users of open source runners)
>>> >> > get a big benefit of relying on an old version. Maybe this is the
>>> >> > moment to reconsider if having a LTS does even make sense given (1)
>>> >> > that our end user facing APIs are 'mostly' stable (even if many
>>> still
>>> >> > called @Experimental). (2) that users get mostly improvements on
>>> >> > runners translation and newer APIs with a low cost just by updating
>>> >> > the version number, and (3) that in case of any regression in an
>>> >> > intermediary release we still can do a minor release even if we have
>>> >> > not yet done so, let's not forget that the only thing we need to do
>>> >> > this is enough interest to do the release from 

Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-19 Thread Maximilian Michels

That's even better.

On 19.09.19 16:35, Robert Bradshaw wrote:

On Thu, Sep 19, 2019 at 4:33 PM Maximilian Michels  wrote:



This is obviously less than ideal for the user... Should we "fix" the
Java SDK? Or is the long-term solution here to have runners do this
rewrite?


I think ideal would be that the Runner adds the Impulse override. That
way also the Python SDK would not have to have separate code paths for
Reads.


Or, rather, that the Runner adds the non-Impulse override (in Java and Python).


On 19.09.19 11:46, Robert Bradshaw wrote:

On Thu, Sep 19, 2019 at 11:22 AM Maximilian Michels  wrote:


The flag is insofar relevant to the PortableRunner because it affects
the translation of the pipeline. Without the flag we will generate
primitive Reads which are unsupported in portability. The workaround we
have used so far is to check for the Runner (e.g. PortableRunner) during
pipeline translation and then add it automatically.

A search in the Java code base reveals 18 occurrences of the flag, all
inside the Dataflow Runner. This is good because the Java SDK itself
does not make use of it. In portable Java pipelines the pipeline author
has to take care to override primitive reads with the JavaReadViaImpulse
wrapper.


This is obviously less than ideal for the user... Should we "fix" the
Java SDK? Or is the long-term solution here to have runners do this
rewrite?


On the Python side the IO code uses the flag directly to either generate
a primitive Read or a portable Impulse + ParDoReadAdapter.
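(As a rough illustration of the branching described above -- a minimal sketch,
not actual Beam IO code: the _SketchRead name and the use of a plain FlatMap in
place of the ParDoReadAdapter are placeholder assumptions, while
lookup_experiment, Impulse, and Read are existing Python SDK APIs:)

    import apache_beam as beam
    from apache_beam.options.pipeline_options import DebugOptions

    class _SketchRead(beam.PTransform):
        # Placeholder transform showing the two expansion paths an IO can take.
        def __init__(self, source):
            self._source = source

        def expand(self, pbegin):
            debug = pbegin.pipeline.options.view_as(DebugOptions)
            if debug.lookup_experiment('beam_fn_api'):
                # Portable path: start from Impulse and read inside a DoFn.
                return (pbegin
                        | beam.Impulse()
                        | beam.FlatMap(lambda _: self._source.read(
                            self._source.get_range_tracker(None, None))))
            # Legacy path: a primitive Read, understood by non-portable runners.
            return pbegin | beam.io.Read(self._source)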

Would it be conceivable to remove the beam_fn_api flag and introduce a
legacy flag which the Dataflow Runner could then use? With more runners
implementing portability, I believe this would make sense.

Thanks,
Max

On 18.09.19 18:29, Ahmet Altay wrote:

I believe the flag was never relevant for PortableRunner. I might be
wrong as well. The flag affects a few bits in the core code and that is
why the solution cannot be by just setting the flag in Dataflow runner.
It requires some amount of clean up. I agree that it would be good to
clean this up, and I also agree to not rush this especially if this is
not currently impacting users.

Ahmet

On Wed, Sep 18, 2019 at 12:56 PM Maximilian Michels <m...@apache.org> wrote:

   > I disagree that this flag is obsolete. It is still serving a
  purpose for batch users using dataflow runner and that is a decent
  chunk of beam python users.

  It is obsolete for the PortableRunner. If the Dataflow Runner needs
  this
  flag, couldn't we simply add it there? As far as I know Dataflow users
  do not use the PortableRunner. I might be wrong.

  As Kyle mentioned, he already fixed the issue. The fix is only present
  in the 2.16.0 release though. This flag has repeatedly caused friction
  for users and that's why I want to get rid of it.

  There is of course no need to rush this but it would be great to tackle
  this for the next release. Filed a JIRA:
  https://jira.apache.org/jira/browse/BEAM-8274

  Cheers,
  Max

  On 17.09.19 15:39, Kyle Weaver wrote:
   > Actually, the reported issues are already fixed on head. We're just
   > trying to prevent similar issues in the future.
   >
   > Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
   >
   >
   >
   > On Tue, Sep 17, 2019 at 3:38 PM Ahmet Altay <al...@google.com> wrote:
   >
   >
   >
   > On Tue, Sep 17, 2019 at 2:26 PM Maximilian Michels <m...@apache.org> wrote:
   >
   >  > Is not this flag set automatically for the portable runner
   >
   > Yes, the flag is set automatically, but it has been broken
   > before and
   > likely will be again. It just adds additional complexity to
   > portable
   > Runners. There is no other portability API than the Fn
  API. This
   > flag
   > historically had its justification, but seems obsolete now.
   >
   >
   > I disagree that this flag is obsolete. It is still serving a
  purpose
   > for batch users using dataflow runner and that is a decent chunk of
   > beam python users.
   >
   > I agree with switching the default. I would like to give
  enough time
   > to decouple the flag from the core code. (With a quick search
  I saw
   > two instances related to Read and Create.) Have time to test
  changes
   > and then switch the default.
   >
   >
   > An isinstance check might be smarter, but does not get rid of
   > 

Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-19 Thread Robert Bradshaw
On Thu, Sep 19, 2019 at 4:33 PM Maximilian Michels  wrote:
>
> > This is obviously less than ideal for the user... Should we "fix" the
> > Java SDK? Or is the long-term solution here to have runners do this
> > rewrite?
>
> I think ideal would be that the Runner adds the Impulse override. That
> way also the Python SDK would not have to have separate code paths for
> Reads.

Or, rather, that the Runner adds the non-Impulse override (in Java and Python).

> On 19.09.19 11:46, Robert Bradshaw wrote:
> > On Thu, Sep 19, 2019 at 11:22 AM Maximilian Michels  wrote:
> >>
> >> The flag is insofar relevant to the PortableRunner because it affects
> >> the translation of the pipeline. Without the flag we will generate
> >> primitive Reads which are unsupported in portability. The workaround we
> >> have used so far is to check for the Runner (e.g. PortableRunner) during
> >> pipeline translation and then add it automatically.
> >>
> >> A search in the Java code base reveals 18 occurrences of the flag, all
> >> inside the Dataflow Runner. This is good because the Java SDK itself
> >> does not make use of it. In portable Java pipelines the pipeline author
> >> has to take care to override primitive reads with the JavaReadViaImpulse
> >> wrapper.
> >
> > This is obviously less than ideal for the user... Should we "fix" the
> > Java SDK? Or is the long-term solution here to have runners do this
> > rewrite?
> >
> >> On the Python side the IO code uses the flag directly to either generate
> >> a primitive Read or a portable Impulse + ParDoReadAdapter.
> >>
> >> Would it be conceivable to remove the beam_fn_api flag and introduce a
> >> legacy flag which the Dataflow Runner could then use? With more runners
> >> implementing portability, I believe this would make sense.
> >>
> >> Thanks,
> >> Max
> >>
> >> On 18.09.19 18:29, Ahmet Altay wrote:
> >>> I believe the flag was never relevant for PortableRunner. I might be
> >>> wrong as well. The flag affects a few bits in the core code and that is
> >>> why the solution cannot be by just setting the flag in Dataflow runner.
> >>> It requires some amount of clean up. I agree that it would be good to
> >>> clean this up, and I also agree to not rush this especially if this is
> >>> not currently impacting users.
> >>>
> >>> Ahmet
> >>>
> >>> On Wed, Sep 18, 2019 at 12:56 PM Maximilian Michels <m...@apache.org> wrote:
> >>>
> >>>   > I disagree that this flag is obsolete. It is still serving a
> >>>  purpose for batch users using dataflow runner and that is a decent
> >>>  chunk of beam python users.
> >>>
> >>>  It is obsolete for the PortableRunner. If the Dataflow Runner needs
> >>>  this
> >>>  flag, couldn't we simply add it there? As far as I know Dataflow 
> >>> users
> >>>  do not use the PortableRunner. I might be wrong.
> >>>
> >>>  As Kyle mentioned, he already fixed the issue. The fix is only 
> >>> present
> >>>  in the 2.16.0 release though. This flag has repeatedly caused 
> >>> friction
> >>>  for users and that's why I want to get rid of it.
> >>>
> >>>  There is of course no need to rush this but it would be great to 
> >>> tackle
> >>>  this for the next release. Filed a JIRA:
> >>>  https://jira.apache.org/jira/browse/BEAM-8274
> >>>
> >>>  Cheers,
> >>>  Max
> >>>
> >>>  On 17.09.19 15:39, Kyle Weaver wrote:
> >>>   > Actually, the reported issues are already fixed on head. We're 
> >>> just
> >>>   > trying to prevent similar issues in the future.
> >>>   >
> >>>   > Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
> >>>   >
> >>>   >
> >>>   >
> >>>   > On Tue, Sep 17, 2019 at 3:38 PM Ahmet Altay <al...@google.com> wrote:
> >>>   >
> >>>   >
> >>>   >
> >>>   > On Tue, Sep 17, 2019 at 2:26 PM Maximilian Michels <m...@apache.org> wrote:
> >>>   >
> >>>   >  > Is not this flag set automatically for the portable 
> >>> runner
> >>>   >
> >>>   > Yes, the flag is set automatically, but it has been broken
> >>>   > before and
> >>>   > likely will be again. It just adds additional complexity 
> >>> to
> >>>   > portable
> >>>   > Runners. There is no other portability API than the Fn
> >>>  API. This
> >>>   > flag
> >>>   > historically had its justification, but seems obsolete 
> >>> now.
> >>>   >
> >>>   >
> >>>   > I disagree that this flag is obsolete. It is still serving a
> >>>  purpose
> >>>   > for batch users using dataflow runner 

Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-19 Thread Maximilian Michels

This is obviously less than ideal for the user... Should we "fix" the
Java SDK? Or is the long-term solution here to have runners do this
rewrite?


I think ideal would be that the Runner adds the Impulse override. That 
way also the Python SDK would not have to have separate code paths for 
Reads.
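(A rough sketch of what such a runner-side override might look like, per the
suggestion above; _ReadToImpulseOverride and _ImpulseBasedRead are made-up
names for illustration only, while PTransformOverride, Read, Impulse, and
pipeline.replace_all are existing Python SDK hooks:)

    import apache_beam as beam
    from apache_beam.io.iobase import Read
    from apache_beam.pipeline import PTransformOverride

    class _ImpulseBasedRead(beam.PTransform):
        # Hypothetical replacement that expands a bounded read from Impulse.
        def __init__(self, source):
            self._source = source

        def expand(self, pbegin):
            return (pbegin
                    | beam.Impulse()
                    | beam.FlatMap(lambda _: self._source.read(
                        self._source.get_range_tracker(None, None))))

    class _ReadToImpulseOverride(PTransformOverride):
        # Matches primitive Read transforms and swaps in the Impulse version.
        def matches(self, applied_ptransform):
            return isinstance(applied_ptransform.transform, Read)

        def get_replacement_transform(self, ptransform):
            return _ImpulseBasedRead(ptransform.source)

    # A runner could apply this before translation:
    #   pipeline.replace_all([_ReadToImpulseOverride()])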


On 19.09.19 11:46, Robert Bradshaw wrote:

On Thu, Sep 19, 2019 at 11:22 AM Maximilian Michels  wrote:


The flag is insofar relevant to the PortableRunner because it affects
the translation of the pipeline. Without the flag we will generate
primitive Reads which are unsupported in portability. The workaround we
have used so far is to check for the Runner (e.g. PortableRunner) during
pipeline translation and then add it automatically.

A search in the Java code base reveals 18 occurrences of the flag, all
inside the Dataflow Runner. This is good because the Java SDK itself
does not make use of it. In portable Java pipelines the pipeline author
has to take care to override primitive reads with the JavaReadViaImpulse
wrapper.


This is obviously less than ideal for the user... Should we "fix" the
Java SDK? Or is the long-term solution here to have runners do this
rewrite?


On the Python side the IO code uses the flag directly to either generate
a primitive Read or a portable Impulse + ParDoReadAdapter.

Would it be conceivable to remove the beam_fn_api flag and introduce a
legacy flag which the Dataflow Runner could then use? With more runners
implementing portability, I believe this would make sense.

Thanks,
Max

On 18.09.19 18:29, Ahmet Altay wrote:

I believe the flag was never relevant for PortableRunner. I might be
wrong as well. The flag affects a few bits in the core code and that is
why the solution cannot be by just setting the flag in Dataflow runner.
It requires some amount of clean up. I agree that it would be good to
clean this up, and I also agree to not rush this especially if this is
not currently impacting users.

Ahmet

On Wed, Sep 18, 2019 at 12:56 PM Maximilian Michels <m...@apache.org> wrote:

  > I disagree that this flag is obsolete. It is still serving a
 purpose for batch users using dataflow runner and that is a decent
 chunk of beam python users.

 It is obsolete for the PortableRunner. If the Dataflow Runner needs
 this
 flag, couldn't we simply add it there? As far as I know Dataflow users
 do not use the PortableRunner. I might be wrong.

 As Kyle mentioned, he already fixed the issue. The fix is only present
 in the 2.16.0 release though. This flag has repeatedly caused friction
 for users and that's why I want to get rid of it.

 There is of course no need to rush this but it would be great to tackle
 this for the next release. Filed a JIRA:
 https://jira.apache.org/jira/browse/BEAM-8274

 Cheers,
 Max

 On 17.09.19 15:39, Kyle Weaver wrote:
  > Actually, the reported issues are already fixed on head. We're just
  > trying to prevent similar issues in the future.
  >
  > Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
  >
  >
  >
  > On Tue, Sep 17, 2019 at 3:38 PM Ahmet Altay <al...@google.com> wrote:
  >
  >
  >
  > On Tue, Sep 17, 2019 at 2:26 PM Maximilian Michels <m...@apache.org> wrote:
  >
  >  > Is not this flag set automatically for the portable runner
  >
  > Yes, the flag is set automatically, but it has been broken
  > before and
  > likely will be again. It just adds additional complexity to
  > portable
  > Runners. There is no other portability API than the Fn
 API. This
  > flag
  > historically had its justification, but seems obsolete now.
  >
  >
  > I disagree that this flag is obsolete. It is still serving a
 purpose
  > for batch users using dataflow runner and that is a decent chunk of
  > beam python users.
  >
  > I agree with switching the default. I would like to give
 enough time
  > to decouple the flag from the core code. (With a quick search
 I saw
  > two instances related to Read and Create.) Have time to test
 changes
  > and then switch the default.
  >
  >
  > An isinstance check might be smarter, but does not get rid of
  > the root
  > of the problem.
  >
  >
  > I might be wrong, IIUC, it will temporarily resolve the reported
  > issues. Is this not accurate?
  >
  >
  > -Max
  >
  > On 17.09.19 14:20, Ahmet Altay 

Re: Plan for dropping python 2 support

2019-09-19 Thread Robert Bradshaw
Oh, and one more thing, I think it'd make sense for Apache Beam to
sign https://python3statement.org/. The promise is that we'd
discontinue Python 2 support *in* 2020, which is not committing us to
January if we're not ready. Worth a vote?


On Thu, Sep 19, 2019 at 3:58 PM Robert Bradshaw  wrote:
>
> Exactly how long we support Python 2 depends on our users. Other than
> those that speak up (such as yourself, thanks!), it's hard to get a
> handle on how many need Python 2 and for how long. (Should we send out
> a survey? Maybe after some experience with 2.16?)
>
> On the one hand, the whole ecosystem is finally moving on, and even if
> Beam continues to support Python 2 our dependencies, or other projects
> that are being used in conjunction with Beam, will also be going
> Python 3 only. On the other hand, Beam is, admittedly, quite late to
> the party and could be the one holding people back, and looking at how
> long it took us, if we just barely make it by the end of the year it's
> unreasonable to say at that point "oh, and we're dropping 2.7 at the
> same time."
>
> The good news is that 2.16 is shaping up to be a release I would
> recommend everyone migrate to Python 3 on. The remaining issues are
> things like some issues with main sessions (which already has issues
> in Python 2) and not supporting keyword-only arguments (a new feature,
> not a regression). I would guess that even 2.15 is already good enough
> for most people, at least to kick the tires and running tests to start
> the effort.
>
> (I also agree with the sentiment that once we go 3.x only, it'll be
> likely harder to maintain a 2.x LTS... but the whole LTS thing is
> being discussed in another thread.)
>
> On Thu, Sep 19, 2019 at 2:44 PM Chad Dombrova  wrote:
> >
> > Hi all,
> > I had a read through this thread in the archives. It occurred before I 
> > joined the mailing list, so I hope that this email connects up with the 
> > thread properly for everyone.
> >
> > I'd like to respond to the following points:
> >
> >> I believe we are referring to two separate things with support:
> >> - Supporting existing releases for patches - I agree that we need to give
> >> users a long enough window to upgrade. Great if it happens with an LTS
> >> release. Even if it does not, I think it will be fair to offer patches on
> >> the last python 2 supporting release during some part of 2020 if that
> >> becomes necessary.
> >> - Making new releases with python 2 support - Each new Beam release with
> >> python 2 support will implicitly extend the lifetime of beam's python 2
> >> support. I do not think we need to extend this to beyond 2019. 2 releases
> >> (~ 3 months) after solid python 3 support will very likely put the last
> >> python 2 supporting release to last quarter of 2019 already.
> >
> >
> > With so many important features still under active development 
> > (portability, expansion, external IO transforms, schema coders) and new 
> > versions of executors tied to the Beam source, staying behind is not really 
> > an option for many of us, and with python3 support not yet fully completed, 
> > the window in which Beam is fully working for both python versions is 
> > rapidly approaching 2 months, and could ultimately be even less, depending 
> > on how long it takes to complete the dozen remaining issues in Jira, and 
> > whatever pops up thereafter.
> >
> >> The cost of maintaining Python 2.7 support is higher than 0. Some issues
> >> that come to mind:
> >> - Maintaining Py2.7 / Py 3+ compatibility of Beam codebase makes it
> >> difficult to use Python 3 syntax in Beam which may be necessary to support
> >> and test syntactic constructs introduced in Python 3.
> >> - Running additional test suites increases the load on test infrastructure
> >> and increases flakiness.
> >
> >
> > I would argue that the cost of maintaining a python2-only LTS version will 
> > be far greater than maintaining python2 support for a little while longer.  
> > Dropping support for python2 could mean a number of things from simply 
> > disabling the python2 tests, to removing 2-to-3 idioms in favor of 
> > python3-only constructs.  If what you have in mind is anything like the 
> > latter then the master branch will become quite divergent from the LTS 
> > release, and backporting changes will not be as simple as cherry-picking 
> > commits.  All-in-all, I think it's a lose/lose for everyone -- users and 
> > developers, of which I am both -- to drop python2 support on such a short 
> > timeline.
> >
> > I'm an active contributor to this project and it will put me and the 
> > company that I work for in a very bad position if you force us onto an LTS 
> > release in early 2020.  I understand the appeal of moving to python3-only 
> > code and I want to get there too, but I would hope that you give your users 
> > as much time to transition their own code as the Beam project itself has 
> > taken.  I'm not asking for a full 12 months to transition, but 

Re: Plan for dropping python 2 support

2019-09-19 Thread Robert Bradshaw
Exactly how long we support Python 2 depends on our users. Other than
those that speak up (such as yourself, thanks!), it's hard to get a
handle on how many need Python 2 and for how long. (Should we send out
a survey? Maybe after some experience with 2.16?)

On the one hand, the whole ecosystem is finally moving on, and even if
Beam continues to support Python 2 our dependencies, or other projects
that are being used in conjunction with Beam, will also be going
Python 3 only. On the other hand, Beam is, admittedly, quite late to
the party and could be the one holding people back, and looking at how
long it took us, if we just barely make it by the end of the year it's
unreasonable to say at that point "oh, and we're dropping 2.7 at the
same time."

The good news is that 2.16 is shaping up to be a release I would
recommend everyone migrate to Python 3 on. The remaining issues are
things like some issues with main sessions (which already has issues
in Python 2) and not supporting keyword-only arguments (a new feature,
not a regression). I would guess that even 2.15 is already good enough
for most people, at least to kick the tires and running tests to start
the effort.
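(For readers unfamiliar with the main-session feature mentioned above: it is
the save_main_session pipeline option, which pickles __main__ globals for the
workers. A minimal sketch of a pipeline that relies on it; the pipeline
contents below are placeholders, not anything from this thread:)

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

    options = PipelineOptions()
    # Ship globals defined in __main__ to the workers; this is the feature
    # with the known issues referenced above.
    options.view_as(SetupOptions).save_main_session = True

    with beam.Pipeline(options=options) as p:
        p | beam.Create(['a', 'b', 'c']) | beam.Map(str.upper)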

(I also agree with the sentiment that once we go 3.x only, it'll be
likely harder to maintain a 2.x LTS... but the whole LTS thing is
being discussed in another thread.)

On Thu, Sep 19, 2019 at 2:44 PM Chad Dombrova  wrote:
>
> Hi all,
> I had a read through this thread in the archives. It occurred before I joined 
> the mailing list, so I hope that this email connects up with the thread 
> properly for everyone.
>
> I'd like to respond to the following points:
>
>> I believe we are referring to two separate things with support:
>> - Supporting existing releases for patches - I agree that we need to give
>> users a long enough window to upgrade. Great if it happens with an LTS
>> release. Even if it does not, I think it will be fair to offer patches on
>> the last python 2 supporting release during some part of 2020 if that
>> becomes necessary.
>> - Making new releases with python 2 support - Each new Beam release with
>> python 2 support will implicitly extend the lifetime of beam's python 2
>> support. I do not think we need to extend this to beyond 2019. 2 releases
>> (~ 3 months) after solid python 3 support will very likely put the last
>> python 2 supporting release to last quarter of 2019 already.
>
>
> With so many important features still under active development (portability, 
> expansion, external IO transforms, schema coders) and new versions of 
> executors tied to the Beam source, staying behind is not really an option for 
> many of us, and with python3 support not yet fully completed, the window in 
> which Beam is fully working for both python versions is rapidly approaching 2 
> months, and could ultimately be even less, depending on how long it takes to 
> complete the dozen remaining issues in Jira, and whatever pops up thereafter.
>
>> The cost of maintaining Python 2.7 support is higher than 0. Some issues
>> that come to mind:
>> - Maintaining Py2.7 / Py 3+ compatibility of Beam codebase makes it
>> difficult to use Python 3 syntax in Beam which may be necessary to support
>> and test syntactic constructs introduced in Python 3.
>> - Running additional test suites increases the load on test infrastructure
>> and increases flakiness.
>
>
> I would argue that the cost of maintaining a python2-only LTS version will be 
> far greater than maintaining python2 support for a little while longer.  
> Dropping support for python2 could mean a number of things from simply 
> disabling the python2 tests, to removing 2-to-3 idioms in favor of 
> python3-only constructs.  If what you have in mind is anything like the 
> latter then the master branch will become quite divergent from the LTS 
> release, and backporting changes will not be as simple as cherry-picking 
> commits.  All-in-all, I think it's a lose/lose for everyone -- users and 
> developers, of which I am both -- to drop python2 support on such a short 
> timeline.
>
> I'm an active contributor to this project and it will put me and the company 
> that I work for in a very bad position if you force us onto an LTS release in 
> early 2020.  I understand the appeal of moving to python3-only code and I 
> want to get there too, but I would hope that you give your users as much 
> time to transition their own code as the Beam project itself has taken.  I'm 
> not asking for a full 12 months to transition, but more than a couple will be 
> required.
>
> thanks,
> -chad
>
>
>
>


Re: Next LTS?

2019-09-19 Thread Austin Bennett
To be clear, I was picking on - or reminding us of - the promise: I don't
have a strong personal need/desire (at least currently) for LTS to exist.
Though, worth ensuring we live up to what we keep on the website.  And,
without an active LTS, probably something we should take off the site?

On Thu, Sep 19, 2019 at 1:33 PM Pablo Estrada  wrote:

> +Łukasz Gajowy  had at some point thought of
> setting up jenkins jobs without coupling them to the state of the repo
> during the last Seed Job. It may be that that improvement can help test
> older LTS-type releases?
>
> On Thu, Sep 19, 2019 at 1:11 PM Robert Bradshaw 
> wrote:
>
>> In many ways the 2.7 LTS was trying to flesh out the process. I think
>> we learned some valuable lessons. It would have been good to push out
>> something (even if it didn't have everything we wanted) but that is
>> unlikely to be worth pursuing now (and 2.7 should probably be retired
>> as LTS and no longer recommended).
>>
>> I agree that it does not seem there is strong demand for an LTS at
>> this point. I would propose that we keep 2.16, etc. as potential
>> candidates, but only declare one as LTS pending demand. The question
>> of how to keep our tooling stable (or backwards/forwards compatible)
>> is a good one, especially as we move to drop Python 2.7 in 2020 (which
>> could itself be a driver for an LTS).
>>
>> On Thu, Sep 19, 2019 at 12:27 PM Kenneth Knowles  wrote:
>> >
>> > Yes, I pretty much dropped 2.7.1 release process due to lack of
>> interest.
>> >
>> > There are known problems so that I cannot recommend anyone to use
>> 2.7.0, yet 2.7 is the current LTS family. So my work on 2.7.1 was
>> philosophical. I did not like the fact that we had a designated LTS family
>> with no usable releases.
>> >
>> > But many backports were proposed to block 2.7.1 and took a very long
>> time to get contributors to implement the backports. I ended up doing many
>> of them just to move it along. This indicates a lack of interest to me. The
>> problem is that we cannot really use a strict cut off date as a way to
>> ensure people do the important things and skip the unimportant things,
>> because we do know that the issues are critical.
>> >
>> > And, yes, the fact that Jenkins jobs are separately evolving but pretty
>> tightly coupled to the repo contents is a serious problem that I wish we
>> had fixed. So verification of each PR was manual.
>> >
>> > Altogether, I still think LTS is valuable to have as a promise to users
>> that we will backport critical fixes. I would like to keep that promise and
>> continue to try. Things that are rapidly changing (which something always
>> will be) just won't have fixes backported, and that seems OK.
>> >
>> > Kenn
>> >
>> > On Thu, Sep 19, 2019 at 10:59 AM Maximilian Michels 
>> wrote:
>> >>
>> >> An LTS only makes sense if we end up patching the LTS, which so far we
>> >> have never done. There has been work done in backporting fixes, see
>> >> https://github.com/apache/beam/commits/release-2.7.1 but the effort
>> was
>> >> never completed. The main reason I believe were complications with
>> >> running the evolved release scripts against old Beam versions.
>> >>
>> >> Now that the portability layer keeps maturing, it makes me optimistic
>> >> that we might have a maintained LTS in the future.
>> >>
>> >> -Max
>> >>
>> >> On 19.09.19 08:40, Ismaël Mejía wrote:
>> >> > The fact that end users never asked AFAIK in the ML for an LTS and
>> for
>> >> > a subsequent minor release of the existing LTS shows IMO the low
>> >> > interest on having a LTS.
>> >> >
>> >> > We still are heavily iterating in many areas (portability/schema) and
>> >> > I am not sure users (and in particular users of open source runners)
>> >> > get a big benefit of relying on an old version. Maybe this is the
>> >> > moment to reconsider if having a LTS does even make sense given (1)
>> >> > that our end user facing APIs are 'mostly' stable (even if many still
>> >> > called @Experimental). (2) that users get mostly improvements on
>> >> > runners translation and newer APIs with a low cost just by updating
>> >> > the version number, and (3) that in case of any regression in an
>> >> > intermediary release we still can do a minor release even if we have
>> >> > not yet done so, let's not forget that the only thing we need to do
>> >> > this is enough interest to do the release from the maintainers.
>> >> >
>> >> >
>> >> > On Tue, Sep 17, 2019 at 12:00 AM Valentyn Tymofieiev
>> >> >  wrote:
>> >> >>
> >> >> I support nominating 2.16.0 as LTS release since it has robust
>> Python 3 support compared with prior releases, and also for reasons of
>> pending Python 2 deprecation. This has been discussed before [1]. As Robert
>> pointed out in that thread, LTS nomination in Beam is currently
>> retroactive. If we keep the retroactive policy, the question is how long we
>> should wait for a release to be considered "safe" for nomination.  Looks
>> like in case of 2.7.0 we 

Re: When will Beam drop support for python2?

2019-09-19 Thread Chad Dombrova
thanks, email sent!  let me know if it got screwed up. I don't have a Pony
login, so I used the "reply via mail client" button.

-chad


On Thu, Sep 19, 2019 at 1:44 PM Valentyn Tymofieiev 
wrote:

> There was a previous thread on this topic[1]. Should we continue this
> conversation there?
>
> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
>
> On Thu, Sep 19, 2019 at 12:55 PM Chad Dombrova  wrote:
>
>> Hi all,
>> I saw it mentioned on another thread that Beam will drop python2 support
>> by the end of the year, and I'd like to voice my concern over this
>> timeline.  As far as I can tell, Beam's support for python3 is brand new,
>> and based on the master Jira ticket on this topic [1], there are still at
>> least a dozen *known* issues remaining to be resolved.  If we assume it
>> takes another month to resolve all of those, and python2 support is dropped
>> at the end of the year, that leaves a window of barely over 2 months where
>> Beam is fully working for both python versions.  I think that will be an
>> uncomfortably short window for some users to transition their production
>> pipelines to Beam on python3, my company included.  Of course, users can
>> choose to stay on older versions, but with so many important features still
>> under active development (portability, expansion, external IO transforms,
>> schema coders) and new versions of executors tied to the Beam source,
>> staying behind is not really an option for many of us.
>>
>> So I'm hoping we could extend support for python2 for a bit longer, if
>> possible.
>>
>> I'm curious who is using Beam on python3 in production, and for which
>> runners?
>>
>> thanks,
>> -chad
>>
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-1251
>>
>>


Re: Plan for dropping python 2 support

2019-09-19 Thread Chad Dombrova
Hi all,
I had a read through this thread in the archives. It occurred before I
joined the mailing list, so I hope that this email connects up with the
thread properly for everyone.

I'd like to respond to the following points:

I believe we are referring to two separate things with support:
> - Supporting existing releases for patches - I agree that we need to give
> users a long enough window to upgrade. Great if it happens with an LTS
> release. Even if it does not, I think it will be fair to offer patches on
> the last python 2 supporting release during some part of 2020 if that
> becomes necessary.
> - Making new releases with python 2 support - Each new Beam release with
> python 2 support will implicitly extend the lifetime of beam's python 2
> support. I do not think we need to extend this to beyond 2019. 2 releases
> (~ 3 months) after solid python 3 support will very likely put the last
> python 2 supporting release to last quarter of 2019 already.


With so many important features still under active development
(portability, expansion, external IO transforms, schema coders) and new
versions of executors tied to the Beam source, staying behind is not really
an option for many of us, and with python3 support not yet fully completed,
the window in which Beam is fully working for both python versions is
rapidly approaching 2 months, and could ultimately be even less, depending
on how long it takes to complete the dozen remaining issues in Jira, and
whatever pops up thereafter.

The cost of maintaining Python 2.7 support is higher than 0. Some issues
> that come to mind:
> - Maintaining Py2.7 / Py 3+ compatibility of Beam codebase makes it
> difficult to use Python 3 syntax in Beam which may be necessary to support
> and test syntactic constructs introduced in Python 3.
> - Running additional test suites increases the load on test infrastructure
> and increases flakiness.
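(As a concrete example of the Python 3-only syntax referred to in the quote
above, here is a made-up illustration -- not Beam code -- of keyword-only
arguments, one of the constructs a 2/3-compatible codebase cannot adopt:)

    # Python 3 only: parameters after the bare * must be passed by keyword.
    def window_into(pcoll, *, trigger=None, allowed_lateness=0):
        return (pcoll, trigger, allowed_lateness)

    # The closest Python 2-compatible spelling loses the enforced syntax
    # and the self-documenting signature:
    def window_into_py2(pcoll, **kwargs):
        trigger = kwargs.pop('trigger', None)
        allowed_lateness = kwargs.pop('allowed_lateness', 0)
        return (pcoll, trigger, allowed_lateness)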


I would argue that the cost of maintaining a python2-only LTS version will
be far greater than maintaining python2 support for a little while longer.
Dropping support for python2 could mean a number of things from simply
disabling the python2 tests, to removing 2-to-3 idioms in favor of
python3-only constructs.  If what you have in mind is anything like the
latter then the master branch will become quite divergent from the LTS
release, and backporting changes will not be as simple as cherry-picking
commits.  All-in-all, I think it's a lose/lose for everyone -- users and
developers, of which I am both -- to drop python2 support on such a short
timeline.

I'm an active contributor to this project and it will put me and the
company that I work for in a very bad position if you force us onto an LTS
release in early 2020.  I understand the appeal of moving to python3-only
code and I want to get there too, but I would hope that you give your users
as much time to transition their own code as the Beam project itself has
taken.  I'm not asking for a full 12 months to transition, but more than a
couple will be required.

thanks,
-chad


Re: When will Beam drop support for python2?

2019-09-19 Thread Valentyn Tymofieiev
There was a previous thread on this topic[1]. Should we continue this
conversation there?
https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E

On Thu, Sep 19, 2019 at 12:55 PM Chad Dombrova  wrote:

> Hi all,
> I saw it mentioned on another thread that Beam will drop python2 support
> by the end of the year, and I'd like to voice my concern over this
> timeline.  As far as I can tell, Beam's support for python3 is brand new,
> and based on the master Jira ticket on this topic [1], there are still at
> least a dozen *known* issues remaining to be resolved.  If we assume it
> takes another month to resolve all of those, and python2 support is dropped
> at the end of the year, that leaves a window of barely over 2 months where
> Beam is fully working for both python versions.  I think that will be an
> uncomfortably short window for some users to transition their production
> pipelines to Beam on python3, my company included.  Of course, users can
> choose to stay on older versions, but with so many important features still
> under active development (portability, expansion, external IO transforms,
> schema coders) and new versions of executors tied to the Beam source,
> staying behind is not really an option for many of us.
>
> So I'm hoping we could extend support for python2 for a bit longer, if
> possible.
>
> I'm curious who is using Beam on python3 in production, and for which
> runners?
>
> thanks,
> -chad
>
>
> [1] https://issues.apache.org/jira/browse/BEAM-1251
>
>


Re: Next LTS?

2019-09-19 Thread Pablo Estrada
+Łukasz Gajowy  had at some point thought of
setting up jenkins jobs without coupling them to the state of the repo
during the last Seed Job. It may be that that improvement can help test
older LTS-type releases?

On Thu, Sep 19, 2019 at 1:11 PM Robert Bradshaw  wrote:

> In many ways the 2.7 LTS was trying to flesh out the process. I think
> we learned some valuable lessons. It would have been good to push out
> something (even if it didn't have everything we wanted) but that is
> unlikely to be worth pursuing now (and 2.7 should probably be retired
> as LTS and no longer recommended).
>
> I agree that it does not seem there is strong demand for an LTS at
> this point. I would propose that we keep 2.16, etc. as potential
> candidates, but only declare one as LTS pending demand. The question
> of how to keep our tooling stable (or backwards/forwards compatible)
> is a good one, especially as we move to drop Python 2.7 in 2020 (which
> could itself be a driver for an LTS).
>
> On Thu, Sep 19, 2019 at 12:27 PM Kenneth Knowles  wrote:
> >
> > Yes, I pretty much dropped 2.7.1 release process due to lack of interest.
> >
> > There are known problems so that I cannot recommend anyone to use 2.7.0,
> yet 2.7 is the current LTS family. So my work on 2.7.1 was
> philosophical. I did not like the fact that we had a designated LTS family
> with no usable releases.
> >
> > But many backports were proposed to block 2.7.1 and took a very long
> time to get contributors to implement the backports. I ended up doing many
> of them just to move it along. This indicates a lack of interest to me. The
> problem is that we cannot really use a strict cut off date as a way to
> ensure people do the important things and skip the unimportant things,
> because we do know that the issues are critical.
> >
> > And, yes, the fact that Jenkins jobs are separately evolving but pretty
> tightly coupled to the repo contents is a serious problem that I wish we
> had fixed. So verification of each PR was manual.
> >
> > Altogether, I still think LTS is valuable to have as a promise to users
> that we will backport critical fixes. I would like to keep that promise and
> continue to try. Things that are rapidly changing (which something always
> will be) just won't have fixes backported, and that seems OK.
> >
> > Kenn
> >
> > On Thu, Sep 19, 2019 at 10:59 AM Maximilian Michels 
> wrote:
> >>
> >> An LTS only makes sense if we end up patching the LTS, which so far we
> >> have never done. There has been work done in backporting fixes, see
> >> https://github.com/apache/beam/commits/release-2.7.1 but the effort was
> >> never completed. The main reason I believe were complications with
> >> running the evolved release scripts against old Beam versions.
> >>
> >> Now that the portability layer keeps maturing, it makes me optimistic
> >> that we might have a maintained LTS in the future.
> >>
> >> -Max
> >>
> >> On 19.09.19 08:40, Ismaël Mejía wrote:
> >> > The fact that end users never asked AFAIK in the ML for an LTS and for
> >> > a subsequent minor release of the existing LTS shows IMO the low
> >> > interest on having a LTS.
> >> >
> >> > We still are heavily iterating in many areas (portability/schema) and
> >> > I am not sure users (and in particular users of open source runners)
> >> > get a big benefit of relying on an old version. Maybe this is the
> >> > moment to reconsider if having a LTS does even make sense given (1)
> >> > that our end user facing APIs are 'mostly' stable (even if many still
> >> > called @Experimental). (2) that users get mostly improvements on
> >> > runners translation and newer APIs with a low cost just by updating
> >> > the version number, and (3) that in case of any regression in an
> >> > intermediary release we still can do a minor release even if we have
> >> > not yet done so, let's not forget that the only thing we need to do
> >> > this is enough interest to do the release from the maintainers.
> >> >
> >> >
> >> > On Tue, Sep 17, 2019 at 12:00 AM Valentyn Tymofieiev
> >> >  wrote:
> >> >>
> >> >> I support nominating 2.16.0 as LTS release since it has robust
> Python 3 support compared with prior releases, and also for reasons of
> pending Python 2 deprecation. This has been discussed before [1]. As Robert
> pointed out in that thread, LTS nomination in Beam is currently
> retroactive. If we keep the retroactive policy, the question is how long we
> should wait for a release to be considered "safe" for nomination.  Looks
> like in case of 2.7.0 we waited a month, see [2,3].
> >> >>
> >> >> Thanks,
> >> >> Valentyn
> >> >>
> >> >> [1]
> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
> >> >> [2] https://beam.apache.org/blog/2018/10/03/beam-2.7.0.html
> >> >> [3]
> https://lists.apache.org/thread.html/896cbc9fef2e60f19b466d6b1e12ce1aeda49ce5065a0b1156233f01@%3Cdev.beam.apache.org%3E
> >> >>
> >> >> On Mon, Sep 16, 2019 at 2:46 

When will Beam drop support for python2?

2019-09-19 Thread Chad Dombrova
Hi all,
I saw it mentioned on another thread that Beam will drop python2 support by
the end of the year, and I'd like to voice my concern over this timeline.
As far as I can tell, Beam's support for python3 is brand new, and based on
the master Jira ticket on this topic [1], there are still at least a dozen
*known* issues remaining to be resolved.  If we assume it takes another
month to resolve all of those, and python2 support is dropped at the end of
the year, that leaves a window of barely over 2 months where Beam is fully
working for both python versions.  I think that will be an uncomfortably
short window for some users to transition their production pipelines to
Beam on python3, my company included.  Of course, users can choose to stay
on older versions, but with so many important features still under active
development (portability, expansion, external IO transforms, schema coders)
and new versions of executors tied to the Beam source, staying behind is
not really an option for many of us.

So I'm hoping we could extend support for python2 for a bit longer, if
possible.

I'm curious who is using Beam on python3 in production, and for which
runners?

thanks,
-chad


[1] https://issues.apache.org/jira/browse/BEAM-1251


Re: Next LTS?

2019-09-19 Thread Kenneth Knowles
Yes, I pretty much dropped 2.7.1 release process due to lack of interest.

There are known problems so that I cannot recommend anyone to use 2.7.0,
yet 2.7 is the current LTS family. So my work on 2.7.1 was
philosophical. I did not like the fact that we had a designated LTS family
with no usable releases.

But many backports were proposed to block 2.7.1, and it took a very long time
to get contributors to implement the backports. I ended up doing many of
them just to move it along. This indicates a lack of interest to me. The
problem is that we cannot really use a strict cut-off date as a way to
ensure people do the important things and skip the unimportant things,
because we do know that the issues are critical.

And, yes, the fact that Jenkins jobs are separately evolving but pretty
tightly coupled to the repo contents is a serious problem that I wish we
had fixed. So verification of each PR was manual.

Altogether, I still think LTS is valuable to have as a promise to users
that we will backport critical fixes. I would like to keep that promise and
continue to try. Things that are rapidly changing (which something always
will be) just won't have fixes backported, and that seems OK.

Kenn

On Thu, Sep 19, 2019 at 10:59 AM Maximilian Michels  wrote:

> An LTS only makes sense if we end up patching the LTS, which so far we
> have never done. There has been work done in backporting fixes, see
> https://github.com/apache/beam/commits/release-2.7.1 but the effort was
> never completed. The main reason, I believe, was complications with
> running the evolved release scripts against old Beam versions.
>
> Now that the portability layer keeps maturing, it makes me optimistic
> that we might have a maintained LTS in the future.
>
> -Max
>
> On 19.09.19 08:40, Ismaël Mejía wrote:
> > The fact that end users never asked AFAIK in the ML for an LTS and for
> > a subsequent minor release of the existing LTS shows IMO the low
> > interest on having a LTS.
> >
> > We still are heavily iterating in many areas (portability/schema) and
> > I am not sure users (and in particular users of open source runners)
> > get a big benefit of relying on an old version. Maybe this is the
> > moment to reconsider if having a LTS does even make sense given (1)
> > that our end user facing APIs are 'mostly' stable (even if many still
> > called @Experimental). (2) that users get mostly improvements on
> > runners translation and newer APIs with a low cost just by updating
> > the version number, and (3) that in case of any regression in an
> > intermediary release we still can do a minor release even if we have
> > not yet done so, let's not forget that the only thing we need to do
> > this is enough interest to do the release from the maintainers.
> >
> >
> > On Tue, Sep 17, 2019 at 12:00 AM Valentyn Tymofieiev
> >  wrote:
> >>
> >> I support nominating 2.16.0 as LTS release since it has robust Python 3
> support compared with prior releases, and also for reasons of pending
> Python 2 deprecation. This has been discussed before [1]. As Robert pointed
> out in that thread, LTS nomination in Beam is currently retroactive. If we
> keep the retroactive policy, the question is how long we should wait for a
> release to be considered "safe" for nomination.  Looks like in case of
> 2.7.0 we waited a month, see [2,3].
> >>
> >> Thanks,
> >> Valentyn
> >>
> >> [1]
> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
> >> [2] https://beam.apache.org/blog/2018/10/03/beam-2.7.0.html
> >> [3]
> https://lists.apache.org/thread.html/896cbc9fef2e60f19b466d6b1e12ce1aeda49ce5065a0b1156233f01@%3Cdev.beam.apache.org%3E
> >>
> >> On Mon, Sep 16, 2019 at 2:46 PM Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> According to our policies page [1]: "There will be at least one new
> LTS release in a 12 month period, and LTS releases are considered
> deprecated after 12 months"
> >>>
> >>> The last LTS was released 2018-10-02 [2].
> >>>
> >>> Does that mean the next release (2.16) should be the next LTS?  It
> looks like we are in danger of not living up to that promise.
> >>>
> >>> Cheers,
> >>> Austin
> >>>
> >>>
> >>>
> >>> [1] https://beam.apache.org/community/policies/
> >>>
> >>> [2]  https://beam.apache.org/get-started/downloads/
>


Jenkins queue times steadily increasing for a few months now

2019-09-19 Thread Daniel Oliveira
Hi everyone,

A little while ago I was taking a look at the Precommit Latency metrics on
Grafana (link) and saw that the monthly 90th-percentile metric has been
steadily increasing over the past few months, from around 10 minutes to
currently around 30 minutes.

After doing some light digging I was shown this page (beam load statistics),
which seems to imply that queue times shoot up when all the test executors
are occupied, and that this has been happening for longer and more often
recently. I also took a look at the commit history for our Jenkins tests and
I see that new tests have steadily been added.

I wanted to bring this up with the dev@ to ask:

1. Is this accurate? Can anyone provide insight into the metrics? Does
anyone know how to double check my assumptions with more concrete metrics?

2. Does anyone have ideas on how to address this?

Thanks,
Daniel Oliveira


Re: Prevent Shuffling on Writing Files

2019-09-19 Thread Shannon Duncan
As a follow-up: the pricing being the number of bytes written + read to the
shuffle is confirmed.

However, we were able to figure out a way to lower shuffle costs and things
are right in the world again.

Thanks, y'all!
Shannon
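
For reference, the distinction Reuven draws below, runner-chosen sharding in
batch versus an explicit shard count, can be sketched roughly like this (an
illustration only; the Create input and output paths are made up, not taken
from the real job):

import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class WriteShardingSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    PCollection<String> lines = p.apply(Create.of(Arrays.asList("a", "b", "c")));

    // Runner-determined sharding (fine for batch): the only shuffle should be
    // a small one over the list of temporary file names.
    lines.apply("WriteLines", TextIO.write().to("/tmp/output/part"));

    // Explicit sharding: groups the full data set onto N keys before writing,
    // which is what shows up as extra shuffled bytes; generally only needed
    // for streaming.
    lines.apply("WriteShardedLines",
        TextIO.write().to("/tmp/output/sharded-part").withNumShards(10));

    p.run().waitUntilFinish();
  }
}

The second form is the one that puts the full data set through the shuffle
counters, since every record is grouped by a shard key before any file is
written.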

On Wed, Sep 18, 2019 at 4:52 PM Reuven Lax  wrote:

> I believe that the Total shuffle data processed counter counts the number of
> bytes written to shuffle + the number of bytes read. So if you shuffle 1GB
> of data, you should expect to see 2GB on the counter.
>
> On Wed, Sep 18, 2019 at 2:39 PM Shannon Duncan 
> wrote:
>
>> Ok, just ran the job on a small input and did not specify numShards, so
>> it's literally just:
>>
>> .apply("WriteLines", TextIO.write().to(options.getOutput()));
>>
>> Output of map for join:
>> [image: image.png]
>>
>> Details of Shuffle:
>> [image: image.png]
>>
>> Reported Bytes Shuffled:
>> [image: image.png]
>>
>>
>> On Wed, Sep 18, 2019 at 4:24 PM Reuven Lax  wrote:
>>
>>>
>>>
>>> On Wed, Sep 18, 2019 at 2:12 PM Shannon Duncan <
>>> joseph.dun...@liveramp.com> wrote:
>>>
 I will attempt to do without sharding (though I believe we did do a run
 without shards and it incurred the extra shuffle costs).

>>>
>>> It shouldn't. There will be a shuffle, but that shuffle should contain a
>>> small amount of data (essentially a list of filenames).
>>>

 Pipeline is simple.

 The only shuffle that is explicitly defined is the shuffle after
 merging files together into a single PCollection (Flatten Transform).

 So it's a Read > Flatten > Shuffle > Map (Format) > Write. We expected
 to pay for the middle shuffle but were surprised to see that
 the output data from the Flatten was quadrupled in the reflected shuffled
 GB shown in Dataflow, which led me down this path of finding things.

 [image: image.png]

 On Wed, Sep 18, 2019 at 4:08 PM Reuven Lax  wrote:

> In that case you should be able to leave sharding unspecified, and you
> won't incur the extra shuffle. Specifying explicit sharding is generally
> necessary only for streaming.
>
> On Wed, Sep 18, 2019 at 2:06 PM Shannon Duncan <
> joseph.dun...@liveramp.com> wrote:
>
>> batch on dataflowRunner.
>>
>> On Wed, Sep 18, 2019 at 4:05 PM Reuven Lax  wrote:
>>
>>> Are you using streaming or batch? Also which runner are you using?
>>>
>>> On Wed, Sep 18, 2019 at 1:57 PM Shannon Duncan <
>>> joseph.dun...@liveramp.com> wrote:
>>>
 So I followed up on why TextIO shuffles and dug into the code some.
 It is using the shards and getting all the values into a keyed group to
 write to a single file.

 However... I wonder if there is a way to just take the records that
 are on a worker and write them out, thus not needing a shard number.
 Closer to how Hadoop handles writes.

 Maybe just a regular ParDo: on bundle setup it creates a writer, and
 processElement reuses that writer to write to the same file for all
 elements within a bundle?

 I feel like this goes beyond the scope of the user mailing list, so
 I'm expanding it to dev as well.
 +dev

 Finding a solution that prevents quadrupling shuffle costs when
 simply writing out a file is a necessity for large-scale jobs that work
 with 100+ TB of data. If anyone has any ideas I'd love to hear them.

 Thanks,
 Shannon Duncan
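
A minimal sketch of the bundle-scoped writer idea above, just to make it
concrete (illustrative only: the class and prefix names are made up, and it
deliberately skips the temp-file/finalization step that makes WriteFiles safe
when bundles are retried):

import java.io.IOException;
import java.io.Writer;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.transforms.DoFn;

// One output file per bundle, written directly from the worker, so no
// shard-number shuffle is needed. File names are non-deterministic and
// retried bundles can leave partial or duplicate files behind.
class WritePerBundleFn extends DoFn<String, Void> {
  private final String outputPrefix;
  private transient Writer writer;

  WritePerBundleFn(String outputPrefix) {
    this.outputPrefix = outputPrefix;
  }

  @StartBundle
  public void startBundle() throws IOException {
    String fileName = outputPrefix + "-" + UUID.randomUUID() + ".txt";
    writer = Channels.newWriter(
        FileSystems.create(
            FileSystems.matchNewResource(fileName, false /* isDirectory */),
            "text/plain"),
        StandardCharsets.UTF_8.name());
  }

  @ProcessElement
  public void processElement(@Element String line) throws IOException {
    writer.write(line);
    writer.write('\n');
  }

  @FinishBundle
  public void finishBundle() throws IOException {
    writer.close();
  }
}

For production use you would still want the temp-file-and-rename finalization
that WriteFiles provides; the sketch only illustrates the data path.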

 On Wed, Sep 18, 2019 at 1:06 PM Shannon Duncan <
 joseph.dun...@liveramp.com> wrote:

> We have been using Beam for a bit now. However, we just turned on
> the Dataflow shuffle service and were very surprised that the shuffled
> data amounts were quadruple the amounts we expected.
>
> Turns out that the file writing TextIO is doing shuffles within
> itself.
>
> Is there a way to prevent shuffling in the writing phase?
>
> Thanks,
> Shannon Duncan
>



Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-19 Thread Robert Bradshaw
On Thu, Sep 19, 2019 at 11:22 AM Maximilian Michels  wrote:
>
> The flag is relevant to the PortableRunner insofar as it affects
> the translation of the pipeline. Without the flag we will generate
> primitive Reads which are unsupported in portability. The workaround we
> have used so far is to check for the Runner (e.g. PortableRunner) during
> pipeline translation and then add it automatically.
>
> A search in the Java code base reveals 18 occurrences of the flag, all
> inside the Dataflow Runner. This is good because the Java SDK itself
> does not make use of it. In portable Java pipelines the pipeline author
> has to take care to override primitive reads with the JavaReadViaImpulse
> wrapper.

This is obviously less than ideal for the user... Should we "fix" the
Java SDK? Or is the long-term solution here to have runners do this
rewrite?

> On the Python side the IO code uses the flag directly to either generate
> a primitive Read or a portable Impulse + ParDoReadAdapter.
>
> Would it be conceivable to remove the beam_fn_api flag and introduce a
> legacy flag which the Dataflow Runner could then use? With more runners
> implementing portability, I believe this would make sense.
>
> Thanks,
> Max
>
> On 18.09.19 18:29, Ahmet Altay wrote:
> > I believe the flag was never relevant for PortableRunner. I might be
> > wrong as well. The flag affects a few bits in the core code and that is
> > why the solution cannot be by just setting the flag in Dataflow runner.
> > It requires some amount of clean up. I agree that it would be good to
> > clean this up, and I also agree to not rush this especially if this is
> > not currently impacting users.
> >
> > Ahmet
> >
> > On Wed, Sep 18, 2019 at 12:56 PM Maximilian Michels wrote:
> >
> >  > I disagree that this flag is obsolete. It is still serving a
> > purpose for batch users using dataflow runner and that is decent
> > chunk of beam python users.
> >
> > It is obsolete for the PortableRunner. If the Dataflow Runner needs
> > this
> > flag, couldn't we simply add it there? As far as I know Dataflow users
> > do not use the PortableRunner. I might be wrong.
> >
> > As Kyle mentioned, he already fixed the issue. The fix is only present
> > in the 2.16.0 release though. This flag has repeatedly caused friction
> > for users and that's why I want to get rid of it.
> >
> > There is of course no need to rush this but it would be great to tackle
> > this for the next release. Filed a JIRA:
> > https://jira.apache.org/jira/browse/BEAM-8274
> >
> > Cheers,
> > Max
> >
> > On 17.09.19 15:39, Kyle Weaver wrote:
> >  > Actually, the reported issues are already fixed on head. We're just
> >  > trying to prevent similar issues in the future.
> >  >
> >  > Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
> >  >
> >  >
> >  > On Tue, Sep 17, 2019 at 3:38 PM Ahmet Altay wrote:
> >  >
> >  >
> >  >
> >  > On Tue, Sep 17, 2019 at 2:26 PM Maximilian Michels wrote:
> >  >
> >  >  > Is not this flag set automatically for the portable runner
> >  >
> >  > Yes, the flag is set automatically, but it has been broken
> >  > before and
> >  > likely will be again. It just adds additional complexity to
> >  > portable
> >  > Runners. There is no other portability API than the Fn
> > API. This
> >  > flag
> >  > historically had its justification, but seems obsolete now.
> >  >
> >  >
> >  > I disagree that this flag is obsolete. It is still serving a
> > purpose
> >  > for batch users using dataflow runner and that is decent chunk of
> >  > beam python users.
> >  >
> >  > I agree with switching the default. I would like to give
> > enough time
> >  > to decouple the flag from the core code. (With a quick search
> > I saw
> >  > two instances related to Read and Create.) Have time to test
> > changes
> >  > and then switch the default.
> >  >
> >  >
> >  > An isinstance check might be smarter, but does not get rid of
> >  > the root
> >  > of the problem.
> >  >
> >  >
> >  > I might be wrong, IIUC, it will temporarily resolve the reported
> >  > issues. Is this not accurate?
> >  >
> >  >
> >  > -Max
> >  >
> >  > On 17.09.19 14:20, Ahmet Altay wrote:
> >  >

Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-19 Thread Maximilian Michels
The flag is relevant to the PortableRunner insofar as it affects
the translation of the pipeline. Without the flag we will generate
primitive Reads which are unsupported in portability. The workaround we 
have used so far is to check for the Runner (e.g. PortableRunner) during 
pipeline translation and then add it automatically.


A search in the Java code base reveals 18 occurrences of the flag, all 
inside the Dataflow Runner. This is good because the Java SDK itself 
does not make use of it. In portable Java pipelines the pipeline author 
has to take care to override primitive reads with the JavaReadViaImpulse 
wrapper.


On the Python side the IO code uses the flag directly to either generate 
a primitive Read or a portable Impulse + ParDoReadAdapter.


Would it be conceivable to remove the beam_fn_api flag and introduce a 
legacy flag which the Dataflow Runner could then use? With more runners 
implementing portability, I believe this would make sense.


Thanks,
Max
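
For context, the flag is just an entry in the experiments pipeline option, so
a pipeline author can set or inspect it explicitly along these lines (a sketch
against the public ExperimentalOptions interface, not the runner-internal
handling discussed here):

import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class FnApiFlagSketch {
  public static void main(String[] args) {
    ExperimentalOptions options =
        PipelineOptionsFactory.fromArgs(args).as(ExperimentalOptions.class);

    // Add the experiment only if neither the user nor the runner set it already.
    List<String> experiments =
        options.getExperiments() == null
            ? new ArrayList<>()
            : new ArrayList<>(options.getExperiments());
    if (!experiments.contains("beam_fn_api")) {
      experiments.add("beam_fn_api");
      options.setExperiments(experiments);
    }

    Pipeline p = Pipeline.create(options);
    // ... build the pipeline as usual, then run it.
    p.run();
  }
}

Setting it this way mirrors what the portable runners try to do automatically
during translation, which is exactly the part that has been fragile.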

On 18.09.19 18:29, Ahmet Altay wrote:
I believe the flag was never relevant for PortableRunner. I might be 
wrong as well. The flag affects a few bits in the core code and that is 
why the solution cannot be by just setting the flag in Dataflow runner. 
It requires some amount of clean up. I agree that it would be good to 
clean this up, and I also agree to not rush this especially if this is 
not currently impacting users.


Ahmet

On Wed, Sep 18, 2019 at 12:56 PM Maximilian Michels wrote:


 > I disagree that this flag is obsolete. It is still serving a
purpose for batch users using dataflow runner and that is decent
chunk of beam python users.

It is obsolete for the PortableRunner. If the Dataflow Runner needs
this
flag, couldn't we simply add it there? As far as I know Dataflow users
do not use the PortableRunner. I might be wrong.

As Kyle mentioned, he already fixed the issue. The fix is only present
in the 2.16.0 release though. This flag has repeatedly caused friction
for users and that's why I want to get rid of it.

There is of course no need to rush this but it would be great to tackle
this for the next release. Filed a JIRA:
https://jira.apache.org/jira/browse/BEAM-8274

Cheers,
Max

On 17.09.19 15:39, Kyle Weaver wrote:
 > Actually, the reported issues are already fixed on head. We're just
 > trying to prevent similar issues in the future.
 >
 > Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
 >
 >
 > On Tue, Sep 17, 2019 at 3:38 PM Ahmet Altay wrote:
 >
 >
 >
 >     On Tue, Sep 17, 2019 at 2:26 PM Maximilian Michels wrote:
 >
 >          > Is not this flag set automatically for the portable runner
 >
 >         Yes, the flag is set automatically, but it has been broken
 >         before and
 >         likely will be again. It just adds additional complexity to
 >         portable
 >         Runners. There is no other portability API than the Fn
API. This
 >         flag
 >         historically had its justification, but seems obsolete now.
 >
 >
 >     I disagree that this flag is obsolete. It is still serving a
purpose
 >     for batch users using dataflow runner and that is decent chunk of
 >     beam python users.
 >
 >     I agree with switching the default. I would like to give
enough time
 >     to decouple the flag from the core code. (With a quick search
I saw
 >     two instances related to Read and Create.) Have time to test
changes
 >     and then switch the default.
 >
 >
 >         An isinstance check might be smarter, but does not get rid of
 >         the root
 >         of the problem.
 >
 >
 >     I might be wrong, IIUC, it will temporarily resolve the reported
 >     issues. Is this not accurate?
 >
 >
 >         -Max
 >
 >         On 17.09.19 14:20, Ahmet Altay wrote:
 >          > Could you make that change and see if it would have
addressed
 >         the issue
 >          > here?
 >          >
 >          > On Tue, Sep 17, 2019 at 2:18 PM Kyle Weaver wrote:
 >          >     The flag is automatically set, but not in a smart
way. Taking
 >          >     another 

Failed Nexmark non-Direct runner tests with vendored Calcite

2019-09-19 Thread Kai Jiang
Hi Community,

When vendoring Calcite in Beam SQL (pull/9189), we only pass the Direct runner
Nexmark test; all non-Direct runner tests fail (Spark runner, Flink runner,
Dataflow runner).

From the stack traces, it looks like a transform translator registered with
AutoService is not a subtype of the expected interface.

java.lang.RuntimeException: java.util.ServiceConfigurationError:
org.apache.beam.runners.core.construction.TransformPayloadTranslatorRegistrar:
Provider
org.apache.beam.runners.direct.TransformEvaluatorRegistry$DirectTransformsRegistrar
not a subtype
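
For reference, the discovery path involved is plain ServiceLoader; @AutoService
only generates the META-INF/services entry. The "not a subtype" error is what
ServiceLoader throws when a provider was compiled against a differently
relocated (vendored) copy of the interface than the one being loaded. A
simplified sketch (the class name is made up):

import java.util.ServiceLoader;
import org.apache.beam.runners.core.construction.TransformPayloadTranslatorRegistrar;

public class RegistrarDiscoverySketch {
  public static void main(String[] args) {
    ServiceLoader<TransformPayloadTranslatorRegistrar> loader =
        ServiceLoader.load(TransformPayloadTranslatorRegistrar.class);
    for (TransformPayloadTranslatorRegistrar registrar : loader) {
      // Iterating forces each provider class to be loaded and cast to the
      // interface; this is where ServiceConfigurationError ("Provider ...
      // not a subtype") is thrown if the provider implements a relocated
      // copy of the interface instead of this one.
      System.out.println(registrar.getClass().getName());
    }
  }
}

If the vendoring change ends up relocating runner construction classes
differently than the runner jars expect, that mismatch would explain the
error.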

Any ideas or help with this would be appreciated.

Best,
Kai


Re: Next LTS?

2019-09-19 Thread Maximilian Michels
An LTS only makes sense if we end up patching the LTS, which so far we 
have never done. There has been work done in backporting fixes, see 
https://github.com/apache/beam/commits/release-2.7.1 but the effort was 
never completed. The main reason, I believe, was complications with
running the evolved release scripts against old Beam versions.


Now that the portability layer keeps maturing, it makes me optimistic 
that we might have a maintained LTS in the future.


-Max

On 19.09.19 08:40, Ismaël Mejía wrote:

The fact that end users never asked AFAIK in the ML for an LTS and for
a subsequent minor release of the existing LTS shows IMO the low
interest on having a LTS.

We still are heavily iterating in many areas (portability/schema) and
I am not sure users (and in particular users of open source runners)
get a big benefit of relying on an old version. Maybe this is the
moment to reconsider if having a LTS does even make sense given (1)
that our end user facing APIs are 'mostly' stable (even if many still
called @Experimental). (2) that users get mostly improvements on
runners translation and newer APIs with a low cost just by updating
the version number, and (3) that in case of any regression in an
intermediary release we still can do a minor release even if we have
not yet done so, let's not forget that the only thing we need to do
this is enough interest to do the release from the maintainers.


On Tue, Sep 17, 2019 at 12:00 AM Valentyn Tymofieiev
 wrote:


I support nominating 2.16.0 as LTS release since it has robust Python 3 support compared 
with prior releases, and also for reasons of pending Python 2 deprecation. This has been 
discussed before [1]. As Robert pointed out in that thread, LTS nomination in Beam is 
currently retroactive. If we keep the retroactive policy, the question is how long we 
should wait for a release to be considered "safe" for nomination.  Looks like 
in case of 2.7.0 we waited a month, see [2,3].

Thanks,
Valentyn

[1] 
https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
[2] https://beam.apache.org/blog/2018/10/03/beam-2.7.0.html
[3] 
https://lists.apache.org/thread.html/896cbc9fef2e60f19b466d6b1e12ce1aeda49ce5065a0b1156233f01@%3Cdev.beam.apache.org%3E

On Mon, Sep 16, 2019 at 2:46 PM Austin Bennett  
wrote:


Hi All,

According to our policies page [1]: "There will be at least one new LTS release in a 
12 month period, and LTS releases are considered deprecated after 12 months"

The last LTS was released 2018-10-02 [2].

Does that mean the next release (2.16) should be the next LTS?  It looks like 
we are in danger of not living up to that promise.

Cheers,
Austin



[1] https://beam.apache.org/community/policies/

[2]  https://beam.apache.org/get-started/downloads/


Re: Next LTS?

2019-09-19 Thread Ismaël Mejía
The fact that end users never asked in the ML (AFAIK) for an LTS, or for
a subsequent minor release of the existing LTS, shows IMO the low
interest in having an LTS.

We are still heavily iterating in many areas (portability/schema) and
I am not sure users (and in particular users of open source runners)
get a big benefit from relying on an old version. Maybe this is the
moment to reconsider whether having an LTS even makes sense, given (1)
that our end-user-facing APIs are 'mostly' stable (even if many are still
marked @Experimental), (2) that users mostly get improvements in
runner translation and newer APIs at a low cost just by updating
the version number, and (3) that in case of any regression in an
intermediate release we can still do a minor release even if we have
not yet done so; let's not forget that the only thing we need to do
this is enough interest from the maintainers to do the release.


On Tue, Sep 17, 2019 at 12:00 AM Valentyn Tymofieiev
 wrote:
>
> I support nominating 2.16.0 as LTS release since it has robust Python 3 
> support compared with prior releases, and also for reasons of pending Python 
> 2 deprecation. This has been discussed before [1]. As Robert pointed out in 
> that thread, LTS nomination in Beam is currently retroactive. If we keep the 
> retroactive policy, the question is how long we should wait for a release to 
> be considered "safe" for nomination.  Looks like in case of 2.7.0 we waited a 
> month, see [2,3].
>
> Thanks,
> Valentyn
>
> [1] 
> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
> [2] https://beam.apache.org/blog/2018/10/03/beam-2.7.0.html
> [3] 
> https://lists.apache.org/thread.html/896cbc9fef2e60f19b466d6b1e12ce1aeda49ce5065a0b1156233f01@%3Cdev.beam.apache.org%3E
>
> On Mon, Sep 16, 2019 at 2:46 PM Austin Bennett  
> wrote:
>>
>> Hi All,
>>
>> According to our policies page [1]: "There will be at least one new LTS 
>> release in a 12 month period, and LTS releases are considered deprecated 
>> after 12 months"
>>
>> The last LTS was released 2018-10-02 [2].
>>
>> Does that mean the next release (2.16) should be the next LTS?  It looks 
>> like we are in danger of not living up to that promise.
>>
>> Cheers,
>> Austin
>>
>>
>>
>> [1] https://beam.apache.org/community/policies/
>>
>> [2]  https://beam.apache.org/get-started/downloads/


Re: [discuss] How we support our users on Slack / Mailing list / StackOverflow

2019-09-19 Thread Ismaël Mejía
Sorry for the late answer. The issue here is that once we have a
communication channel, users expect answers on it. The Python SDK is gaining
momentum and we need to serve users where they are (as mentioned by others
above).

One strong advantage of 'real-time' communication (Slack/IRC) is that it is
better suited for collaboration and for creating community bonds: think, for
example, of how many people who answered a question you were looking for on
StackOverflow you can remember by name, versus the people with whom you have
interacted in a short conversation on an IRC-like channel. I mention this
because it is a way to make users feel welcome, and it is often a first step
towards contribution (for example the 'would you be willing to add this to
the docs?' case).

StackOverflow is probably the most 'scalable' system, for many reasons such as
being indexed better by search engines, which helps future users find answers
quickly. But it is not perfect either; the reputation system is basically
elitist towards casual people answering questions. In any case there is value
in encouraging moving some answers from Slack to SO, but there is also value
in improving our own website docs, so this should probably be decided case by
case.

A first approach is probably to document (and recommend) that if users don't
get their questions answered in Slack, they should ask on SO or the user
mailing list instead.

I personally think there is value in getting more people involved in
'real-time' communications. Of course this is probably not for everyone; I
understand that people may not want to do this to avoid being interrupted, or
for other reasons. But it is a trade-off worth paying, not only to help people
but eventually to grow the community, as in the Go community case Robert
mentioned, so it is probably worth considering.

On Wed, Sep 11, 2019 at 3:27 AM Robert Burke  wrote:
>
> For the Go SDK, emailing the dev list or asking on Slack are probably the 
> best ways to get an answer from me. I'm not in the habit to search for open 
> Go SDK questions on stack overflow right now, but will chip in if they're 
> pointed out to me
>
> As Alexey mentions, Slack largely works for quick back and forths with 
> community members, especially if both folks are awake at the same time. Eg. 
> I've been handling a few questions there, and helping the user in question 
> even get a few quick fix PRs in, making the SDK better for everyone.
> On the other hand, I can be more responsive on beam-go because it's low 
> enough traffic I can be notified of every question/response. I look forward 
> to when there's enough traffic there I can turn that off. :D
>
>
> On Tue, Sep 10, 2019, 4:45 PM Alexey Romanenko  
> wrote:
>>
>> Pablo, thank you for raising this question.
>>
>> I can't say for Python, but as someone who has tried to keep an eye on Java 
>> SDK related questions on ML/Slack/SO for a while, I'd say that Slack is not 
>> very effective for this.
>> There are several reasons for that:
>> - People tend to expect a quick feedback on Slack which is not happening all 
>> the time, especially, for not evident questions where you need some time to 
>> provide an answer. Also, timezones difference play its role in terms of 
>> reaction time.
>> - Discussions do not always happen inside Slack threads, so they can get 
>> mixed up with the messages of other questions/topics and it becomes 
>> difficult to follow.
>> - It’s not so easy to search for similar issues and provide quick link with 
>> already answered question.
>>
>> So, I’d say that Slack is perfect to discuss quick and urgent questions but 
>> not sure it should be placed on the first place as a users support thing. 
>> IMHO, we need to redirect users to user@ or SO for that (up to them to 
>> choose). Though, the more important thing is to regularly keep track of 
>> unanswered questions there and do our best to minimise this number.
>>
>>
>> On 9 Sep 2019, at 11:38, Kyle Weaver  wrote:
>>
>> I pinned a message to #beam reminding people of the user@, but pinned 
>> messages aren't immediately visible. We might be better off editing the 
>> topic, which always appears at the top of the channel, to include 
>> https://beam.apache.org/community/contact-us/ or links to user@ and SO. We 
>> should also add the same topic to the #beam-java and #beam-python channels, 
>> which currently don't have any topic.
>>
>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>>
>>
>> On Mon, Sep 9, 2019 at 9:06 AM Pablo Estrada  wrote:
>>>
>>> +Ismaël Mejía can you share your impressions from Slack? Do you think 
>>> Java/Python/other users get appropriate support there?
>>>
>>> On Fri, Sep 6, 2019 at 7:16 PM Ahmet Altay  wrote:

 I agree Slack can be used by Beam users and it would be good to meet users 
 where they are. If I understand correctly, the issue Pablo is raising is 
 that there are not enough people online in Slack that can answer python 
 questions. We 

Re: Pointers on Contributing to Structured Streaming Spark Runner

2019-09-19 Thread Ismaël Mejía
25/09 looks ok. I just updated the meeting invitation to the new date.
I will prepare a mini agenda in the shared minutes document in the meantime.
I cannot see the old invitees; can someone please confirm that they see
the updated date?
Thanks,
Ismaël

On Thu, Sep 19, 2019 at 2:13 PM Etienne Chauchot  wrote:
>
> Hi Rahul and Xinyu,
> I just added you to the list of guests in the meeting. Time is 5pm GMT +2.
> That being said, for some reason the last meeting scheduled was 08/28. Ismael 
> initially created the meeting and I do not have the rights to add a new date. 
> Ismael, can you add a date? I suggest 09/25. WDYT?
>
> Best
> Etienne
>
> Le jeudi 19 septembre 2019 à 00:49 +0530, rahul patwari a écrit :
>
> Hi,
>
> I would love to join the call.
> Can you also share the meeting invitation with me?
>
> Thanks,
> Rahul
>
> On Wed 18 Sep, 2019, 11:48 PM Xinyu Liu,  wrote:
>
> Alexey and Etienne: I'm very happy to join the sync-up meeting. Please 
> forward the meeting info to me. I am based in California, US and hopefully 
> the time will work :).
>
> Thanks,
> Xinyu
>
> On Wed, Sep 18, 2019 at 6:39 AM Etienne Chauchot  wrote:
>
> Hi Xinyu,
>
> Thanks for offering help ! My comments are inline:
>
> Le vendredi 13 septembre 2019 à 12:16 -0700, Xinyu Liu a écrit :
>
> Hi, Etienne,
>
> The slides are very informative! Thanks for sharing the details about how the 
> Beam APIs are mapped onto Spark Structured Streaming.
>
>
> Thanks !
>
> We (LinkedIn) are also interested in trying the new SparkRunner to run Beam 
> pipelines in batch, and in contributing to it too. From my understanding, it 
> seems the functionality on the batch side is mostly complete and covers quite 
> a large percentage of the tests (a few missing pieces like state and timer in 
> ParDo and SDF).
>
>
> Correct, it passes 89% of the tests, but more than SDF, state and timers is 
> missing; there is also ongoing encoders work that I would like to 
> commit/push before merging.
>
> If so, is it possible to merge the new runner sooner into master so it's much 
> easier for us to pull it in (we have an internal fork) and contribute back?
>
>
> Sure, see my other mail on this thread. As Alexey mentioned, please join the 
> sync meeting we have, the more the merrier !
>
>
> Also curious about the scheme part in the runner. Seems we can leverage the 
> schema-aware work in PCollection and translate from Beam schema to Spark, so 
> it can be optimized in the planner layer. It will be great to hear back your 
> plans on that.
>
>
> Well, it is not designed yet but, if you remember my talk, we need to store 
> the Beam windowing information with the data itself, so we end up having a 
> dataset of windowed values. One lead that was discussed is to store it as a 
> Spark schema such as this (see the sketch right after this list):
>
> 1. field1: binary data for the Beam windowing information (it cannot be mapped 
> to fields because the Beam windowing info is a complex structure)
>
> 2. the fields of the data as defined in the Beam schema, if there is one
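
Sketched in Spark types, that shape would look roughly like this (the field
names are made up; the real mapping from the Beam schema is still to be
designed):

import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class WindowedValueSchemaSketch {
  public static void main(String[] args) {
    // One binary column for the serialized Beam windowing metadata, then the
    // columns derived from the Beam schema of the data itself.
    StructType rowSchema =
        new StructType()
            .add("windowing_info", DataTypes.BinaryType, false /* nullable */)
            .add("user_field_1", DataTypes.StringType, true)
            .add("user_field_2", DataTypes.LongType, true);
    System.out.println(rowSchema.treeString());
  }
}

The binary column is what would carry the window, timestamp and pane
information that Spark's planner does not need to understand.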
>
>
> Congrats on this great work!
>
> Thanks !
>
> Best,
>
> Etienne
>
> Thanks,
> Xinyu
>
> On Wed, Sep 11, 2019 at 6:02 PM Rui Wang  wrote:
>
> Hello Etienne,
>
> Your slide mentioned that streaming mode development is blocked because Spark 
> lacks support for multiple aggregations in its streaming mode, but that a 
> design is ongoing. Do you have a link or something else to their design 
> discussion/doc?
>
>
> -Rui
>
> On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot  wrote:
>
> Hi Rahul,
> Sure, and great ! Thanks for proposing !
> If you want details, here is the presentation I did 30 mins ago at 
> ApacheCon. You will find the video on YouTube shortly, but in the meantime, 
> here are my presentation slides.
>
> And here is the structured streaming branch. I'll be happy to review your 
> PRs, thanks !
>
> https://github.com/apache/beam/tree/spark-runner_structured-streaming
>
> Best
> Etienne
>
> Le mercredi 11 septembre 2019 à 16:37 +0530, rahul patwari a écrit :
>
> Hi Etienne,
>
> I came to know about the work going on in the Structured Streaming Spark Runner 
> from the Apache Beam Wiki - Works in Progress.
> I have contributed to BeamSql earlier, and I am working on supporting 
> PCollectionView in BeamSql.
>
> I would love to understand the Runner's side of Apache Beam and contribute to 
> the Structured Streaming Spark Runner.
>
> Can you please point me in the right direction?
>
> Thanks,
> Rahul


Re: Cassandra flaky on Jenkins?

2019-09-19 Thread Jean-Baptiste Onofré
Hi Etienne,

let me take a look, I'm not sure.

Regards
JB

On 19/09/2019 16:42, Etienne Chauchot wrote:
> Hi all,
> I just created a PR (1) that tries to fix the flakiness of
> CassandraIOTest (underlying
> ticket https://jira.apache.org/jira/browse/BEAM-8025 that was assigned
> to me). We will see with the test repetitions if it is no longer flaky.
> 
> JB, I don't know if my PR will also fix the ticket
> https://issues.apache.org/jira/browse/BEAM-7355 assigned to you, or if
> the tickets are the same/related. I hope it does.
> 
> 
> [1]
> https://github.com/apache/beam/pull/9614
> 
> Best,
> Etienne
> Le mercredi 04 septembre 2019 à 16:27 +0200, Jean-Baptiste Onofré a écrit :
>> Thanks David,
>>
>> it makes sense, it gives me time to investigate and fix.
>>
>> Regards
>> JB
>>
>> On 04/09/2019 15:01, David Morávek wrote:
>> Hi, temporarily disabling the test, until BEAM-8025 is resolved (marking it
>> as blocker for 2.16), so we can unblock ongoing pull requests.
>>
>> Best,
>> D.
>>
>> On Tue, Sep 3, 2019 at 3:57 PM Jean-Baptiste Onofré wrote:
>>
>> Hi Max,
>>
>> yup, I'm starting the investigation.
>>
>> I keep you posted.
>>
>> Regards
>> JB
>>
>> On 03/09/2019 15:34, Maximilian Michels wrote:
>> > The newest incarnation of this is here:
>> > https://jira.apache.org/jira/browse/BEAM-8025
>> >
>> > Would be good if you could take a look JB.
>> >
>> > Thanks,
>> > Max
>> >
>> > On 03.09.19 15:32, David Morávek wrote:
>> >> yes, that looks similar. example:
>> >>
>> >> https://github.com/apache/beam/pull/9464
>> >>
>> >> D.
>> >>
>> >> On 3 Sep 2019, at 15:18, Jean-Baptiste Onofré wrote:
>> >>> Thanks David,
>> >>>
>> >>> the build is running on my machine to see if I can reproduce
>> locally.
>> >>>
>> >>> It sounds like https://issues.apache.org/jira/browse/BEAM-7355
>> right ?
>> >>>
>> >>> Regards
>> >>> JB
>> >>>
>> >>> On 03/09/2019 15:11, David Morávek wrote:
>>  I’m running into these failures too
>> 
>>  D.
>> 
>>  Sent from my iPhone
>> 
>> > On 3 Sep 2019, at 14:34, Jean-Baptiste Onofré wrote:
>> >
>> > Hi,
>> >
>> > Let me take a look. Do you always have this issue on Jenkins or
>> > randomly ?
>> >
>> > Regards
>> > JB
>> >
>> >> On 03/09/2019 14:19, Alex Van Boxel wrote:
>> >> Hi, is it only me that are bumping on the flaky Cassandra on
>> >> Jenkins? I
>> >> like to get my PR approved but I can't get past the Cassandra
>> >> error...
>> >>
>> >> * org.apache.beam.sdk.io.cassandra.CassandraIOTest.classMethod
>> >>
>> >>
>> >>
>> >>
>> >> _/
>> >> _/ Alex Van Boxel
>> >
>> > -- 
>> > Jean-Baptiste Onofré
>> > jbono...@apache.org
>> > http://blog.nanthrax.net
>> > Talend - http://www.talend.com
>> >>>
>> >>> -- 
>> >>> Jean-Baptiste Onofré
>> >>> jbono...@apache.org
>> >>> http://blog.nanthrax.net
>> >>> Talend - http://www.talend.com
>>
>> -- 
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>>

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Cassandra flaky on Jenkins?

2019-09-19 Thread Etienne Chauchot
Hi all,
I just created a PR (1) that tries to fix the flakiness of CassandraIOTest
(underlying ticket https://jira.apache.org/jira/browse/BEAM-8025 that was
assigned to me). We will see with the test repetitions if it is no longer
flaky.

JB, I don't know if my PR will also fix the ticket
https://issues.apache.org/jira/browse/BEAM-7355 assigned to you, or
if the tickets are the same/related. I hope it does.

[1] https://github.com/apache/beam/pull/9614

Best,
Etienne