Re: [ANNOUNCE] Beam 2.19.0 Released

2020-02-04 Thread Hannah Jiang
Thanks Boyuan! On Tue, Feb 4, 2020 at 4:46 PM Connell O'Callaghan wrote: > Well done and thank you Boyuan (and all involved)!!! > > On Tue, Feb 4, 2020 at 4:25 PM Boyuan Zhang wrote: > >> The Apache Beam team is pleased to announce the release of version 2.19.0 >> . >> >> Apache Beam is an

Re: [ANNOUNCE] Beam 2.19.0 Released

2020-02-04 Thread Connell O'Callaghan
Well done and thank you Boyuan (and all involved)!!! On Tue, Feb 4, 2020 at 4:25 PM Boyuan Zhang wrote: > The Apache Beam team is pleased to announce the release of version 2.19.0. > > Apache Beam is an open source unified programming model to define and > execute data processing pipelines,

Re: [RELEASE VOTE RESULT] Release 2.19.0, release candidate #1

2020-02-04 Thread Boyuan Zhang
Thanks everyone for the kind words! I want to say thank you to every release manager, who not only completed the release but also tried their best to improve the process. Making a release process perfect may be super hard, but it's getting better and better! Thanks, everyone : D On Mon, Feb 3,

[ANNOUNCE] Beam 2.19.0 Released

2020-02-04 Thread Boyuan Zhang
The Apache Beam team is pleased to announce the release of version 2.19.0. Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing. See https://beam.apache.org You can download the release

Seattle Beam Meetup - March 2

2020-02-04 Thread Aizhamal Nurmamat kyzy
Hello everyone, We are hosting a Beam Meetup in Seattle on March 2! If you are in the Seattle area please come and join us at Google office in South Lake Union. Meetup agenda: 18:00 - Registration, speed networking, food and drinks. 18:30 - Encoding free-text drug names in electronic health

Re: Python2.7 Beam End-of-Life Date

2020-02-04 Thread Robert Bradshaw
On Tue, Feb 4, 2020 at 12:12 PM Chad Dombrova wrote: >> >> Not to mention that all the nice work for the type hints will have to be >> redone in the for 3.x. > > Note that there's a tool for automatically converting type comments to > annotations: https://github.com/ilevkivskyi/com2ann > > So

Re: [DISCUSSION] Improve release notes by adding a change list file

2020-02-04 Thread Ahmet Altay
There are a few approvals on the PR and no objections in this thread. I will merge the PR. Thank you all for the feedback. And please remind PR authors to update this file to make it a reality. Hopefully this will make releases a tiny bit easier and provide better information for users. Ahmet

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Alan Krumholz
Seems like the image we use in KFP to orchestrate the job has cloudpickle==0.8.1 and that one doesn't seem to cause issues. I think I'm unblocked for now but I'm sure I won't be the last one to try to do this using GCP managed notebooks :( Thanks for all the help! On Tue, Feb 4, 2020 at 12:24 PM

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Alan Krumholz
I'm using a managed notebook instance from GCP. It seems those already come with cloudpickle==1.2.2 as soon as you provision them. apache-beam[gcp] will then install dill==0.3.1.1. I'm going to try to uninstall cloudpickle before installing apache-beam and see if this fixes the problem. Thank you On

Re: Python2.7 Beam End-of-Life Date

2020-02-04 Thread Chad Dombrova
> > > Not to mention that all the nice work for the type hints will have to be > redone in the for 3.x. > Note that there's a tool for automatically converting type comments to annotations: https://github.com/ilevkivskyi/com2ann So don't let that part bother you. I'm curious what other
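
For readers unfamiliar with the distinction, here is a hand-written illustration of the kind of rewrite a type-comment-to-annotation tool performs. It is not output from com2ann, and the function names are hypothetical:

```python
from typing import List


def total_length_commented(words):
    # type: (List[str]) -> int
    # Python 2 compatible style: the signature lives in a comment,
    # so the interpreter itself records no annotations.
    return sum(len(w) for w in words)


def total_length_annotated(words: List[str]) -> int:
    # Python 3 only style: the same signature as real annotations,
    # visible at runtime through __annotations__.
    return sum(len(w) for w in words)


if __name__ == "__main__":
    print(total_length_commented(["beam", "py"]))
    print(total_length_annotated.__annotations__)
```

Both functions behave identically at runtime; only the annotated one exposes its types to introspection.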

Re: Python2.7 Beam End-of-Life Date

2020-02-04 Thread Ahmet Altay
For reference, this was last discussed in September [1]. I agree that it is a good time to rethink this, and I also lean towards deprecating sooner. /cc +Valentyn Tymofieiev +Robert Bradshaw +Chad [1]

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Valentyn Tymofieiev
The fact that you have cloudpickle==1.2.2 further confirms that you may be hitting the same error as https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype . Could you try to start over with a clean virtual environment? On Tue,

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Alan Krumholz
Hi Valentyn, Here is the pip freeze from my machine (note that the error is in Dataflow; the job runs fine on my machine) ansiwrap==0.8.4 apache-beam==2.19.0 arrow==0.15.5 asn1crypto==1.3.0 astroid==2.3.3 astropy==3.2.3 attrs==19.3.0 avro-python3==1.9.1 azure-common==1.1.24

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Valentyn Tymofieiev
I don't think there is a mismatch between dill versions here, but https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype mentions a similar error and may be related. What is the output of pip freeze on your machine (or better: pip
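
One way to compare two `pip freeze` outputs for the packages of interest is sketched below. The helper names are hypothetical, the sample inputs in the usage section are made up for illustration, and only plain `name==version` lines are handled:

```python
def parse_freeze(text):
    """Parse `pip freeze` output into {name: version}.

    Lines without a `==` pin (editable installs, direct URLs, comments)
    are skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def diff_freezes(local_text, remote_text, packages):
    """Return {package: (local_version, remote_version)} where the two differ."""
    local = parse_freeze(local_text)
    remote = parse_freeze(remote_text)
    return {
        p: (local.get(p), remote.get(p))
        for p in packages
        if local.get(p) != remote.get(p)
    }


if __name__ == "__main__":
    # Illustrative inputs only, not taken from any real environment.
    local = "apache-beam==2.19.0\ncloudpickle==1.2.2\ndill==0.3.1.1\n"
    remote = "cloudpickle==0.8.1\ndill==0.3.1.1\n"
    print(diff_freezes(local, remote, ["dill", "cloudpickle"]))
```

A package missing from one side shows up with `None` for that side, which makes an absent dependency as visible as a version difference.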

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Alan Krumholz
BTW it doesn't seem to be related to the BQ sink. My job is failing now too without that part (and it wasn't earlier today): def test_error( bq_table: str) -> str: import apache_beam as beam from apache_beam.options.pipeline_options import PipelineOptions class

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Alan Krumholz
Here is a test job that sometimes fails and sometimes doesn't (but most times it does). There seems to be something stochastic that causes this, as after several tests a couple of them did succeed def test_error( bq_table: str) -> str: import apache_beam as beam from

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Mikhail Gryzykhin
Hi Alan, +Valentyn Tymofieiev Can you verify if my assumption is correct? It seems that the problem might come from dill version mismatch. Dill version should match on worker and user code. Between Beam 2.17 and Beam 2.18 we upgraded dill version to 0.3.1.1 which has an incompatible format
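
A minimal sketch of checking the submitting environment against the versions the workers are expected to have. The helper name is hypothetical, `importlib.metadata` needs Python 3.8 or later, and the expected pin has to be supplied by the caller (0.3.1.1 below is simply the dill version mentioned in this message):

```python
from importlib import metadata  # standard library since Python 3.8


def find_version_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed)} for every pin that does not match.

    `installed` is None when the package is not installed at all.
    `lookup` is injectable so the check can be exercised without
    touching the real environment.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Assumption: the caller knows which dill version the workers use.
    print(find_version_mismatches({"dill": "0.3.1.1"}))
```

An empty result means every listed package matches its pin in the current environment; it says nothing about what is actually installed on the workers.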

Re: Jenkins jobs not running for my PR 10438

2020-02-04 Thread Tomo Suzuki
Thank you, Ahmet! On Tue, Feb 4, 2020 at 12:36 PM Tomo Suzuki wrote: > > Hi Beam Committers, > > Would you run the precommit checks for > https://github.com/apache/beam/pull/10765 > with following 6 additional commands (one command per comment) ?: > > Run Java PostCommit > Run Java

Python2.7 Beam End-of-Life Date

2020-02-04 Thread Sam Rohde
Hi All, Just curious when Beam will drop support for Python 2.7? Not being able to use all the nice features of 3.x and appeasing both 2.7 and 3.x linters is somewhat troublesome. Not to mention that all the nice work for the type hints will have to be redone in the for 3.x. It seems the faster

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Alan Krumholz
I tried breaking apart my pipeline. It seems the step that breaks it is beam.io.WriteToBigQuery. Let me see if I can create a self-contained example that breaks to share with you. Thanks! On Tue, Feb 4, 2020 at 9:53 AM Pablo Estrada wrote: > Hm that's odd. No changes to the pipeline? Are you able

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Pablo Estrada
Hm that's odd. No changes to the pipeline? Are you able to share some of the code? +Udi Meiri do you have any idea what could be going on here? On Tue, Feb 4, 2020 at 9:25 AM Alan Krumholz wrote: > Hi Pablo, > This is strange... it doesn't seem to be the last beam release as last > night it

Re: Jenkins jobs not running for my PR 10438

2020-02-04 Thread Tomo Suzuki
Hi Beam Committers, Would you run the precommit checks for https://github.com/apache/beam/pull/10765 with following 6 additional commands (one command per comment) ?: Run Java PostCommit Run Java HadoopFormatIO Performance Test Run BigQueryIO Streaming Performance Test Java Run Dataflow

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Alan Krumholz
Hi Pablo, This is strange... it doesn't seem to be the latest Beam release, as last night it was already using 2.19.0. I wonder if it was some release from the Dataflow team (not Beam related): Job type Batch Job status Succeeded SDK version Apache Beam Python 3.5 SDK 2.19.0 Region us-central1 Start

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Pablo Estrada
Hi Alan, could it be that you're picking up the new Apache Beam 2.19.0 release? Could you try depending on beam 2.18.0 to see if the issue surfaces when using the new release? If something was working and no longer works, it sounds like a bug. This may have to do with how we pickle (dill /