Support of KafkaIO Dynamic Read

2021-01-07 Thread Boyuan Zhang
Hi team,

I'm working on KafkaIO dynamic read support, which is tracked by BEAM-11325.
I started the documentation here:
https://docs.google.com/document/d/1FU3GxVRetHPLVizP3Mdv6mP5tpjZ3fd99qNjUI5DT5k/edit?usp=sharing
It states the problem I want to solve and the proposed solutions.

Please feel free to drop any comments/concerns/suggestions/ideas : ) If the
design looks good in general, I'll start the dev work as soon as possible.

Thanks for your help!


Re: pulllicenses fails while building

2021-01-07 Thread Reuven Lax
Unfortunately, setting that env variable didn't change anything.

On Thu, Jan 7, 2021 at 7:09 PM Ahmet Altay  wrote:

> Googled this a bit. Setting this env variable might fix the problem by
> bypassing the check:
> export PYTHONHTTPSVERIFY=0
>
> Could you try that? If it works we can make that part of the script.
>
> Also, I am not sure why licenses are pulled in the regular development case.
> I thought this task was not meant to run by default.
>
> /cc +Tyson Hamilton  +Emily Ye 
>
> On Thu, Jan 7, 2021 at 5:42 PM Reuven Lax  wrote:
>
>> I recently upgraded to Python 3 on my build machine. Now all attempts to
>> build Beam fail with the following. Anyone know how to fix this?
>>
>> urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED]
>> certificate verify failed: unable to get local issuer certificate
>> (_ssl.c:1123)>
>>
>> ERROR:root:Invalid url for paranamer-2.7:
>> https://raw.githubusercontent.com/paul-hammant/paranamer/master/LICENSE.txt
>> .
>>
>>
>>
>>


Re: pulllicenses fails while building

2021-01-07 Thread Ahmet Altay
Googled this a bit. Setting this env variable might fix the problem by
bypassing the check:
export PYTHONHTTPSVERIFY=0

Could you try that? If it works we can make that part of the script.

Also, I am not sure why licenses are pulled in the regular development case. I
thought this task was not meant to run by default.

/cc +Tyson Hamilton  +Emily Ye 

On Thu, Jan 7, 2021 at 5:42 PM Reuven Lax  wrote:

> I recently upgraded to Python 3 on my build machine. Now all attempts to
> build Beam fail with the following. Anyone know how to fix this?
>
> urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED]
> certificate verify failed: unable to get local issuer certificate
> (_ssl.c:1123)>
>
> ERROR:root:Invalid url for paranamer-2.7:
> https://raw.githubusercontent.com/paul-hammant/paranamer/master/LICENSE.txt
> .
>
>
>
>


pulllicenses fails while building

2021-01-07 Thread Reuven Lax
I recently upgraded to Python 3 on my build machine. Now all attempts to
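build Beam fail with the following. Anyone know how to fix this?

urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED]
certificate verify failed: unable to get local issuer certificate
(_ssl.c:1123)>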

ERROR:root:Invalid url for paranamer-2.7:
https://raw.githubusercontent.com/paul-hammant/paranamer/master/LICENSE.txt.


Re: Why are all the website files failing RAT?

2021-01-07 Thread Brian Hulette
I just merged https://github.com/apache/beam/pull/13697 which should
resolve this.

On Thu, Jan 7, 2021 at 3:39 PM Ahmet Altay  wrote:

>
>
> On Thu, Jan 7, 2021 at 3:35 PM Brian Hulette  wrote:
>
>>
>>
>> On Thu, Jan 7, 2021 at 2:00 PM Ahmet Altay  wrote:
>>
>>>
>>>
>>> On Thu, Jan 7, 2021 at 12:25 PM Kyle Weaver  wrote:
>>>
>>>> I repro'd this by running "./gradlew :rat". If I understand correctly,
>>>> these are all Hugo dependencies that are downloaded automatically. I looked
>>>> at a few of them and they do have licenses, but I guess rat just doesn't
>>>> recognize them for whatever reason.

>>>
>>> Do we know why precommits are not failing? They are running rat. And
>>> what changed recently?
>>>
>>
>> These are build files that only exist if you've built the website. So
>> they don't exist for the rat precommit
>>
>>
>>>

>>>> The rat task is supposed to ignore everything in beam/.gitignore [1],
>>>> but the website directory has its own .gitignore [2]. The website's
>>>> .gitignore includes www/node_modules. I don't think there's really a need
>>>> for the website to have its own .gitignore, so one potential fix would be
>>>> to move the website .gitignore rules into the root one.

>>>
>>> Merging gitignores sounds good. Another option: we could add rat
>>> exclusions using the second gitignore list as well.
>>>
>>
>>>
>>>> I filed a JIRA for this as well:
>>>> https://issues.apache.org/jira/browse/BEAM-11582.

>>>
>>> Are you interested in doing one of the above things? :)
>>>
>>
>> I'll send a PR to merge the gitignores
>>
>
> Thank you.
>
>
>>
>>
>>>
>>>

>>>> [1]
>>>> https://github.com/apache/beam/blob/30f9a607509940f78459e4fba847617399780246/build.gradle#L119
>>>> [2]
>>>> https://github.com/apache/beam/blob/2ad28542ec051dac6ebb8f0c6ea0c1b86a70f2cf/website/.gitignore#L16
>>>>
>>>> On Thu, Jan 7, 2021 at 9:50 AM Reuven Lax wrote:
>>>>
>>>>> My builds are failing recently, with complaints of 2445 license
>>>>> violations. It appears to be a bunch of website files, see below. Any idea
>>>>> what is happening here?
>>>>>
>>>>> Unapproved Licenses:
>>>>> /Users/relax/beam/website/www/node_modules/callsites/index.js
>>>>> /Users/relax/beam/website/www/node_modules/callsites/readme.md
>>>>> /Users/relax/beam/website/www/node_modules/reusify/test.js
>>>>> /Users/relax/beam/website/www/node_modules/reusify/README.md
>>>>> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/reuseNoCodeFunction.js
>>>>> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/fib.js
>>>>> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/createNoCodeFunction.js
>>>>> /Users/relax/beam/website/www/node_modules/reusify/.coveralls.yml
>>>>> /Users/relax/beam/website/www/node_modules/reusify/reusify.js
>>>>> /Users/relax/beam/website/www/node_modules/reusify/.travis.yml
>>>>> /Users/relax/beam/website/www/node_modules/pretty-hrtime/.npmignore
>>>>> /Users/relax/beam/website/www/node_modules/pretty-hrtime/.jshintignore
>>>>> /Users/relax/beam/website/www/node_modules/pretty-hrtime/index.js
>>>>> /Users/relax/beam/website/www/node_modules/@types/color-name/README.md
>



Re: Website is MIT licensed?

2021-01-07 Thread Ahmet Altay
I agree with you. I guess we missed this during the website migration.

https://github.com/apache/beam/pull/13698 should fix this.

On Thu, Jan 7, 2021 at 12:42 PM Kyle Weaver  wrote:

> Hi all,
>
> I discovered the Beam website's NPM module purports to be MIT licensed
> [1]. I don't think we publish the website's NPM package anywhere, but
> wouldn't the website be Apache 2.0 licensed like everything else?
>
> Thanks,
> Kyle
>
> [1]
> https://github.com/apache/beam/blob/30f9a607509940f78459e4fba847617399780246/website/www/package.json#L6
>


Re: Why are all the website files failing RAT?

2021-01-07 Thread Ahmet Altay
On Thu, Jan 7, 2021 at 3:35 PM Brian Hulette  wrote:

>
>
> On Thu, Jan 7, 2021 at 2:00 PM Ahmet Altay  wrote:
>
>>
>>
>> On Thu, Jan 7, 2021 at 12:25 PM Kyle Weaver  wrote:
>>
>>> I repro'd this by running "./gradlew :rat". If I understand correctly,
>>> these are all Hugo dependencies that are downloaded automatically. I looked
>>> at a few of them and they do have licenses, but I guess rat just doesn't
>>> recognize them for whatever reason.
>>>
>>
>> Do we know why precommits are not failing? They are running rat. And what
>> changed recently?
>>
>
> These are build files that only exist if you've built the website. So they
> don't exist for the rat precommit
>
>
>>
>>>
>>> The rat task is supposed to ignore everything in beam/.gitignore [1],
>>> but the website directory has its own .gitignore [2]. The website's
>>> .gitignore includes www/node_modules. I don't think there's really a need
>>> for the website to have its own .gitignore, so one potential fix would be
>>> to move the website .gitignore rules into the root one.
>>>
>>
>> Merging gitignores sounds good. Another option: we could add rat
>> exclusions using the second gitignore list as well.
>>
>
>>
>>> I filed a JIRA for this as well:
>>> https://issues.apache.org/jira/browse/BEAM-11582.
>>>
>>
>> Are you interested in doing one of the above things? :)
>>
>
> I'll send a PR to merge the gitignores
>

Thank you.


>
>
>>
>>
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/30f9a607509940f78459e4fba847617399780246/build.gradle#L119
>>> [2]
>>> https://github.com/apache/beam/blob/2ad28542ec051dac6ebb8f0c6ea0c1b86a70f2cf/website/.gitignore#L16
>>>
>>> On Thu, Jan 7, 2021 at 9:50 AM Reuven Lax  wrote:
>>>
>>>> My builds are failing recently, with complaints of 2445 license
>>>> violations. It appears to be a bunch of website files, see below. Any idea
>>>> what is happening here?
>>>>
>>>> Unapproved Licenses:
>>>> /Users/relax/beam/website/www/node_modules/callsites/index.js
>>>> /Users/relax/beam/website/www/node_modules/callsites/readme.md
>>>> /Users/relax/beam/website/www/node_modules/reusify/test.js
>>>> /Users/relax/beam/website/www/node_modules/reusify/README.md
>>>> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/reuseNoCodeFunction.js
>>>> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/fib.js
>>>> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/createNoCodeFunction.js
>>>> /Users/relax/beam/website/www/node_modules/reusify/.coveralls.yml
>>>> /Users/relax/beam/website/www/node_modules/reusify/reusify.js
>>>> /Users/relax/beam/website/www/node_modules/reusify/.travis.yml
>>>> /Users/relax/beam/website/www/node_modules/pretty-hrtime/.npmignore
>>>> /Users/relax/beam/website/www/node_modules/pretty-hrtime/.jshintignore
>>>> /Users/relax/beam/website/www/node_modules/pretty-hrtime/index.js
>>>> /Users/relax/beam/website/www/node_modules/@types/color-name/README.md

>>>


Re: Why are all the website files failing RAT?

2021-01-07 Thread Brian Hulette
On Thu, Jan 7, 2021 at 2:00 PM Ahmet Altay  wrote:

>
>
> On Thu, Jan 7, 2021 at 12:25 PM Kyle Weaver  wrote:
>
>> I repro'd this by running "./gradlew :rat". If I understand correctly,
>> these are all Hugo dependencies that are downloaded automatically. I looked
>> at a few of them and they do have licenses, but I guess rat just doesn't
>> recognize them for whatever reason.
>>
>
> Do we know why precommits are not failing? They are running rat. And what
> changed recently?
>

These are build files that only exist if you've built the website. So they
don't exist for the rat precommit


>
>>
>> The rat task is supposed to ignore everything in beam/.gitignore [1], but
>> the website directory has its own .gitignore [2]. The website's .gitignore
>> includes www/node_modules. I don't think there's really a need for the
>> website to have its own .gitignore, so one potential fix would be to move
>> the website .gitignore rules into the root one.
>>
>
> Merging gitignores sounds good. Another option: we could add rat exclusions
> using the second gitignore list as well.
>

>
>> I filed a JIRA for this as well:
>> https://issues.apache.org/jira/browse/BEAM-11582.
>>
>
> Are you interested in doing one of the above things? :)
>

I'll send a PR to merge the gitignores


>
>
>>
>> [1]
>> https://github.com/apache/beam/blob/30f9a607509940f78459e4fba847617399780246/build.gradle#L119
>> [2]
>> https://github.com/apache/beam/blob/2ad28542ec051dac6ebb8f0c6ea0c1b86a70f2cf/website/.gitignore#L16
>>
>> On Thu, Jan 7, 2021 at 9:50 AM Reuven Lax  wrote:
>>
>>> My builds are failing recently, with complaints of 2445 license
>>> violations. It appears to be a bunch of website files, see below. Any idea
>>> what is happening here?
>>>
>>> Unapproved Licenses:
>>> /Users/relax/beam/website/www/node_modules/callsites/index.js
>>> /Users/relax/beam/website/www/node_modules/callsites/readme.md
>>> /Users/relax/beam/website/www/node_modules/reusify/test.js
>>> /Users/relax/beam/website/www/node_modules/reusify/README.md
>>>
>>> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/reuseNoCodeFunction.js
>>> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/fib.js
>>>
>>> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/createNoCodeFunction.js
>>> /Users/relax/beam/website/www/node_modules/reusify/.coveralls.yml
>>> /Users/relax/beam/website/www/node_modules/reusify/reusify.js
>>> /Users/relax/beam/website/www/node_modules/reusify/.travis.yml
>>> /Users/relax/beam/website/www/node_modules/pretty-hrtime/.npmignore
>>> /Users/relax/beam/website/www/node_modules/pretty-hrtime/.jshintignore
>>> /Users/relax/beam/website/www/node_modules/pretty-hrtime/index.js
>>> /Users/relax/beam/website/www/node_modules/@types/color-name/README.md
>>>
>>


Re: [VOTE] Release 2.27.0, release candidate #4

2021-01-07 Thread Brian Hulette
+1 (non-binding)

Tested Python DataFrame pipeline and sql_taxi example (multi-language,
streaming) on Dataflow.

On Thu, Jan 7, 2021 at 2:35 PM Chamikara Jayalath 
wrote:

> +1 (non-binding).
>
> Tested Java quickstart for Direct/Dataflow runners and multi-language
> pipelines.
>
> Thanks,
> Cham
>
> On Thu, Jan 7, 2021 at 10:47 AM Pablo Estrada  wrote:
>
>> Thanks Valentyn,
>> these were built using Maven 3.6.0, and OpenJDK version 1.8.0_232.
>> Best
>> -P.
>>
>> On Thu, Jan 7, 2021 at 9:45 AM Yichi Zhang  wrote:
>>
>>> +1, I verified the python mobile game walkthrough on core runners.
>>>
>>> On Thu, Jan 7, 2021 at 8:29 AM Valentyn Tymofieiev 
>>> wrote:
>>>
>>>> On Thu, Jan 7, 2021 at 8:06 AM Ismaël Mejía wrote:
>>>>
>>>>> > Also I wonder if we now need to clarify both Java 8 and Java 11
>>>>> versions separately?
>>>>>
>>>>> You mean for the docker images? Otherwise we should not be using Java
>>>>> 11 at all to produce the artifacts.
>>>>>
>>>>
>>>> Thanks for clarification, yes, looks like Docker images are the only
>>>> artifacts where this concern applies.  The Docker images include several
>>>> jars[1], that I believe are currently built locally (by a release manager
>>>> at container image build time). Are the jars included in Java 11 images
>>>> built with Java 11? If so, is it worth calling out the compiler?  Also,
>>>> perhaps we should look into building release artifacts using Jenkins or
>>>> Github Actions, with a preconfigured version of compilers, instead of
>>>> whatever version is installed on the release manager's machine.
>>>>
>>>> [1]
>>>> https://github.com/apache/beam/blob/master/sdks/java/container/Dockerfile#L24-L33


>
>>>>> On Thu, Jan 7, 2021 at 4:51 PM Valentyn Tymofieiev <
>>>>> valen...@google.com> wrote:
>>>>> >
>>>>> > Noting that announcement does not include the version of the Java
>>>>> compilers used - looks like the release guide still requires it:
>>>>> >
>>>>> > * Java artifacts were built with Maven MAVEN_VERSION and
>>>>> OpenJDK/Oracle JDK JDK_VERSION.
>>>>> >
>>>>> >
>>>>> > Could you please add this info to this thread for posterity?
>>>>> >
>>>>> > Also I wonder if we now need to clarify both Java 8 and Java 11
>>>>> versions separately?
>>>>> >
>>>>> > Other than that, +1 from me. Ran several mobile gaming pipelines on
>>>>> Direct and Dataflow runners with Python 3.8.
>>>>> >
>>>>> > On Thu, Jan 7, 2021 at 12:49 AM Jan Lukavský
>>>>> wrote:
>>>>> >>
>>>>> >> +1 (non-binding).
>>>>> >>
>>>>> >> I've validated the RC against my dependent projects (mainly Java
>>>>> SDK, Flink and DirectRunner).
>>>>> >>
>>>>> >> Thanks,
>>>>> >>
>>>>> >>  Jan
>>>>> >>
>>>>> >> On 1/7/21 2:15 AM, Ahmet Altay wrote:
>>>>> >>
>>>>> >> +1 (binding) - validated python quickstarts.
>>>>> >>
>>>>> >> Thank you Pablo.
>>>>> >>
>>>>> >> On Wed, Jan 6, 2021 at 1:57 PM Pablo Estrada
>>>>> wrote:
>>>>> >>>
>>>>> >>> +1 (binding)
>>>>> >>> I've built and unit tested existing Dataflow Templates with the
>>>>> new version.
>>>>> >>> Best
>>>>> >>> -P.
>>>>> >>>
>>>>> >>> On Tue, Jan 5, 2021 at 11:17 PM Pablo Estrada
>>>>> wrote:
> 
>>>>> >>>> Hi everyone,
>>>>> >>>> Please review and vote on the release candidate #4 for the
>>>>> version 2.27.0, as follows:
>>>>> >>>> [ ] +1, Approve the release
>>>>> >>>> [ ] -1, Do not approve the release (please provide specific
>>>>> comments)
>>>>> >>>>
>>>>> >>>> NOTE. What happened to RC #2? I started building RC2 before
>>>>> completing all the cherry-picks, so the tag for RC2 was created on an
>>>>> incorrect commit.
>>>>> >>>>
>>>>> >>>> NOTE. What happened to RC #3? I started building RC3, but a new
>>>>> bug was discovered (BEAM-11569) that required amending the branch. Thus
>>>>> this is now RC4.
>>>>> >>>>
>>>>> >>>> Reviewers are encouraged to test their own use cases with the
>>>>> release candidate, and vote +1
>>>>> >>>>  if no issues are found.
>>>>> >>>>
>>>>> >>>> The complete staging area is available for your review, which
>>>>> includes:
>>>>> >>>> * JIRA release notes [1],
>>>>> >>>> * the official Apache source release to be deployed to
>>>>> dist.apache.org [2], which is signed with the key with fingerprint
>>>>> C79DDD47DAF3808F0B9DDFAC02B2D9F742008494 [3],
>>>>> >>>> * all artifacts to be deployed to the Maven Central Repository
>>>>> [4],
>>>>> >>>> * source code tag "v2.27.0-RC4" [5],
>>>>> >>>> * website pull request listing the release [6], publishing the
>>>>> API reference manual [7], and the blog post [8].
>>>>> >>>> * Python artifacts are deployed along with the source release to
>>>>> the dist.apache.org [2].
>>>>> >>>> * Validation sheet with a tab for 2.27.0 release to help with
>>>>> validation [9].
>>>>> >>>> * Docker images published to Docker Hub [10].
>>>>> >>>>
>>>>> >>>> The vote will be open for at least 72 hours, but given the
>>>>> holidays, we will likely extend for a few more days. The release will be
>>>>> adopted by majority approval, with at least 3 PMC affirmative votes.
> 

Re: Compatibility between Beam v2.23 and Beam v2.26

2021-01-07 Thread Antonio Si
Hi Jan,

I created this Jira: https://issues.apache.org/jira/browse/BEAM-11583

Thanks.

Antonio.

On 2021/01/07 08:43:34, Jan Lukavský  wrote: 
> Hi Antonio,
> 
> can you please create one?
> 
> Thanks,
> 
>   Jan
> 
> On 1/6/21 10:31 PM, Antonio Si wrote:
> > Thanks for the information. Do we have a jira to track this issue or do you 
> > want me to create a jira for this?
> >
> > Thanks.
> >
> > Antonio.
> >
> > On 2021/01/06 17:59:47, Kenneth Knowles  wrote:
> >> Agree with Boyuan & Kyle. That PR is the problem, and we probably do not
> >> have adequate testing. We have a cultural understanding of not breaking
> >> encoded data forms but this is the encoded form of the TypeSerializer, and
> >> actually there are two problems.
> >>
> >> 1. When you have a serialized object that does not have the
> >> serialVersionUid explicitly set, the UID is generated based on many details
> >> that are irrelevant for binary compatibility. Any Java-serialized object
> >> that is intended for anything other than transient transmission *must* have
> >> a serialVersionUid set and an explicit serialized form. Else it is
> >> completely normal for it to break due to irrelevant changes. The
> >> serialVersionUid has no mechanism for upgrade/downgrade so you *must* keep
> >> it the same forever, and any versioning or compat scheme exists within the
> >> single serialVersionUid.
> >> 2. In this case there was an actual change to the fields of the object
> >> stored, so you need to explicitly add the serialized form and also the
> >> ability to read from prior serialized forms.
> >>
> >> I believe explicitly setting the serialVersionUid to the original (and
> >> keeping it that way forever) and adding the ability to decode prior forms
> >> will regain the ability to read the snapshot. But also this seems like
> >> something that would be part of Flink best practice documentation since
> >> naive use of Java serialization often hits this problem.
> >>
> >> Kenn
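
Kenn's two points above (pin the serialVersionUID explicitly, and give the class an explicit read path that tolerates older serialized forms) can be sketched in plain Java. This is a minimal illustration and not the Beam fix: PinnedSnapshot, coderName and extraFlag are hypothetical names, and the UID literal is simply the "stream classdesc" value from the exception quoted further down, reused here as an example of "the original" value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical stand-in for a class whose Java-serialized form ends up in long-lived state.
class PinnedSnapshot implements Serializable {
    // Set explicitly so unrelated code changes cannot alter it; it must stay the same forever.
    // Assumption: the literal is the stream-side UID from the quoted stack trace.
    private static final long serialVersionUID = 5241803328188007316L;

    private String coderName;
    // Hypothetical field added in a later version; older streams will not contain it.
    private boolean extraFlag;

    PinnedSnapshot(String coderName, boolean extraFlag) {
        this.coderName = coderName;
        this.extraFlag = extraFlag;
    }

    String coderName() { return coderName; }

    boolean extraFlag() { return extraFlag; }

    // Explicit read path: a field missing from an older stream falls back to the given default.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = in.readFields();
        coderName = (String) fields.get("coderName", (Object) null);
        extraFlag = fields.get("extraFlag", false);
    }

    // Reports the UID that Java serialization will actually use for this class.
    static long declaredUid() {
        return ObjectStreamClass.lookup(PinnedSnapshot.class).getSerialVersionUID();
    }

    // Serializes and deserializes in memory, wrapping checked exceptions for brevity.
    static PinnedSnapshot roundTrip(PinnedSnapshot original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PinnedSnapshot) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        PinnedSnapshot copy = roundTrip(new PinnedSnapshot("utf8", true));
        System.out.println(declaredUid() + " " + copy.coderName() + " " + copy.extraFlag());
    }
}
```

The round trip only exercises the current form. Reading a stream written by an older class version would go through the same readObject, where extraFlag would take its default.
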
> >>
> >> On Tue, Jan 5, 2021 at 4:30 PM Kyle Weaver  wrote:
> >>
> >>> This raises a few related questions from me:
> >>>
> >>> 1. Do we claim to support resuming Flink checkpoints made with previous
> >>> Beam versions?
> >>> 2. Does 1. require full binary compatibility between different versions of
> >>> runner internals like CoderTypeSerializer?
> >>>
> >> 3. Do we have tests for 1.?
> >> Kenn
> >>
> >>
> >>> On Tue, Jan 5, 2021 at 4:05 PM Boyuan Zhang  wrote:
> >>>
> > >>>> https://github.com/apache/beam/pull/13240 seems suspicious to me.
> > >>>>
> > >>>> +Maximilian Michels Any insights here?
> > >>>>
> > >>>> On Tue, Jan 5, 2021 at 8:48 AM Antonio Si wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> I would like to followup with this question to see if there is a
> > >>>>> solution/workaround for this issue.
> > >>>>>
> > >>>>> Thanks.
> > >>>>>
> > >>>>> Antonio.
> > >>>>>
> > >>>>> On 2020/12/19 18:33:48, Antonio Si wrote:
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> We were using Beam v2.23 and recently, we are testing upgrade to Beam v2.26. For Beam v2.26, we are passing --experiments=use_deprecated_read and --fasterCopy=true.
> > >>>>>> We run into this exception when we resume our pipeline:
> > >>>>>>
> > >>>>>> Caused by: java.io.InvalidClassException: org.apache.beam.runners.flink.translation.types.CoderTypeSerializer; local class incompatible: stream classdesc serialVersionUID = 5241803328188007316, local class serialVersionUID = 7247319138941746449
> > >>>>>>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
> > >>>>>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1942)
> > >>>>>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1808)
> > >>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2099)
> > >>>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
> > >>>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
> > >>>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
> > >>>>>>     at org.apache.flink.api.common.typeutils.TypeSerializerSerializationUtil$TypeSerializerSerializationProxy.read(TypeSerializerSerializationUtil.java:301)
> > >>>>>>     at org.apache.flink.api.common.typeutils.TypeSerializerSerializationUtil.tryReadSerializer(TypeSerializerSerializationUtil.java:116)
> > >>>>>>     at org.apache.flink.api.common.typeutils.TypeSerializerConfigSnapshot.readSnapshot(TypeSerializerConfigSnapshot.java:113)
> > >>>>>>     at org.apache.flink.api.common.typeutils.TypeSerializerSnapshot.readVersionedSnapshot(TypeSerializerSnapshot.java:174)
> > >>>>>>     at org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializationUtil$TypeSerializerSnapshotSerializationProxy.deserializeV2(TypeSerializerSnapshotSerializationUtil.java:179)
> >>at
> 

Re: [VOTE] Release 2.27.0, release candidate #4

2021-01-07 Thread Chamikara Jayalath
+1 (non-binding).

Tested Java quickstart for Direct/Dataflow runners and multi-language
pipelines.

Thanks,
Cham

On Thu, Jan 7, 2021 at 10:47 AM Pablo Estrada  wrote:

> Thanks Valentyn,
> these were built using Maven 3.6.0, and OpenJDK version 1.8.0_232.
> Best
> -P.
>
> On Thu, Jan 7, 2021 at 9:45 AM Yichi Zhang  wrote:
>
>> +1, I verified the python mobile game walkthrough on core runners.
>>
>> On Thu, Jan 7, 2021 at 8:29 AM Valentyn Tymofieiev 
>> wrote:
>>
>>> On Thu, Jan 7, 2021 at 8:06 AM Ismaël Mejía  wrote:
>>>
>>>> > Also I wonder if we now need to clarify both Java 8 and Java 11
>>>> versions separately?
>>>>
>>>> You mean for the docker images? Otherwise we should not be using Java
>>>> 11 at all to produce the artifacts.

>>>
>>> Thanks for clarification, yes, looks like Docker images are the only
>>> artifacts where this concern applies.  The Docker images include several
>>> jars[1], that I believe are currently built locally (by a release manager
>>> at container image build time). Are the jars included in Java 11 images
>>> built with Java 11? If so, is it worth calling out the compiler?  Also,
>>> perhaps we should look into building release artifacts using Jenkins or
>>> Github Actions, with a preconfigured version of compilers, instead of
>>> whatever version is installed on the release manager's machine.
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/java/container/Dockerfile#L24-L33
>>>
>>>

>>>> On Thu, Jan 7, 2021 at 4:51 PM Valentyn Tymofieiev
>>>> wrote:
>>>> >
>>>> > Noting that announcement does not include the version of the Java
>>>> compilers used - looks like the release guide still requires it:
>>>> >
>>>> > * Java artifacts were built with Maven MAVEN_VERSION and
>>>> OpenJDK/Oracle JDK JDK_VERSION.
>>>> >
>>>> >
>>>> > Could you please add this info to this thread for posterity?
>>>> >
>>>> > Also I wonder if we now need to clarify both Java 8 and Java 11
>>>> versions separately?
>>>> >
>>>> > Other than that, +1 from me. Ran several mobile gaming pipelines on
>>>> Direct and Dataflow runners with Python 3.8.
>>>> >
>>>> > On Thu, Jan 7, 2021 at 12:49 AM Jan Lukavský wrote:
>>>> >>
>>>> >> +1 (non-binding).
>>>> >>
>>>> >> I've validated the RC against my dependent projects (mainly Java
>>>> SDK, Flink and DirectRunner).
>>>> >>
>>>> >> Thanks,
>>>> >>
>>>> >>  Jan
>>>> >>
>>>> >> On 1/7/21 2:15 AM, Ahmet Altay wrote:
>>>> >>
>>>> >> +1 (binding) - validated python quickstarts.
>>>> >>
>>>> >> Thank you Pablo.
>>>> >>
>>>> >> On Wed, Jan 6, 2021 at 1:57 PM Pablo Estrada
>>>> wrote:
>>>> >>>
>>>> >>> +1 (binding)
>>>> >>> I've built and unit tested existing Dataflow Templates with the new
>>>> version.
>>>> >>> Best
>>>> >>> -P.
>>>> >>>
>>>> >>> On Tue, Jan 5, 2021 at 11:17 PM Pablo Estrada
>>>> wrote:
 
>>>> >>>> Hi everyone,
>>>> >>>> Please review and vote on the release candidate #4 for the version
>>>> 2.27.0, as follows:
>>>> >>>> [ ] +1, Approve the release
>>>> >>>> [ ] -1, Do not approve the release (please provide specific
>>>> comments)
>>>> >>>>
>>>> >>>> NOTE. What happened to RC #2? I started building RC2 before
>>>> completing all the cherry-picks, so the tag for RC2 was created on an
>>>> incorrect commit.
>>>> >>>>
>>>> >>>> NOTE. What happened to RC #3? I started building RC3, but a new
>>>> bug was discovered (BEAM-11569) that required amending the branch. Thus
>>>> this is now RC4.
>>>> >>>>
>>>> >>>> Reviewers are encouraged to test their own use cases with the
>>>> release candidate, and vote +1
>>>> >>>>  if no issues are found.
>>>> >>>>
>>>> >>>> The complete staging area is available for your review, which
>>>> includes:
>>>> >>>> * JIRA release notes [1],
>>>> >>>> * the official Apache source release to be deployed to
>>>> dist.apache.org [2], which is signed with the key with fingerprint
>>>> C79DDD47DAF3808F0B9DDFAC02B2D9F742008494 [3],
>>>> >>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>> >>>> * source code tag "v2.27.0-RC4" [5],
>>>> >>>> * website pull request listing the release [6], publishing the API
>>>> reference manual [7], and the blog post [8].
>>>> >>>> * Python artifacts are deployed along with the source release to
>>>> the dist.apache.org [2].
>>>> >>>> * Validation sheet with a tab for 2.27.0 release to help with
>>>> validation [9].
>>>> >>>> * Docker images published to Docker Hub [10].
>>>> >>>>
>>>> >>>> The vote will be open for at least 72 hours, but given the
>>>> holidays, we will likely extend for a few more days. The release will be
>>>> adopted by majority approval, with at least 3 PMC affirmative votes.
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> -P.
>>>> >>>>
>>>> >>>> [1]
>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349380
>>>> >>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.27.0/
>>>> >>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>> >>>> [4]
 

Re: Why are all the website files failing RAT?

2021-01-07 Thread Ahmet Altay
On Thu, Jan 7, 2021 at 12:25 PM Kyle Weaver  wrote:

> I repro'd this by running "./gradlew :rat". If I understand correctly,
> these are all Hugo dependencies that are downloaded automatically. I looked
> at a few of them and they do have licenses, but I guess rat just doesn't
> recognize them for whatever reason.
>

Do we know why precommits are not failing? They are running rat. And what
changed recently?


>
> The rat task is supposed to ignore everything in beam/.gitignore [1], but
> the website directory has its own .gitignore [2]. The website's .gitignore
> includes www/node_modules. I don't think there's really a need for the
> website to have its own .gitignore, so one potential fix would be to move
> the website .gitignore rules into the root one.
>

Merging gitignores sounds good. Another option: we could add rat exclusions
using the second gitignore list as well.


> I filed a JIRA for this as well:
> https://issues.apache.org/jira/browse/BEAM-11582.
>

Are you interested in doing one of the above things? :)


>
> [1]
> https://github.com/apache/beam/blob/30f9a607509940f78459e4fba847617399780246/build.gradle#L119
> [2]
> https://github.com/apache/beam/blob/2ad28542ec051dac6ebb8f0c6ea0c1b86a70f2cf/website/.gitignore#L16
>
> On Thu, Jan 7, 2021 at 9:50 AM Reuven Lax  wrote:
>
>> My builds are failing recently, with complaints of 2445 license
>> violations. It appears to be a bunch of website files, see below. Any idea
>> what is happening here?
>>
>> Unapproved Licenses:
>> /Users/relax/beam/website/www/node_modules/callsites/index.js
>> /Users/relax/beam/website/www/node_modules/callsites/readme.md
>> /Users/relax/beam/website/www/node_modules/reusify/test.js
>> /Users/relax/beam/website/www/node_modules/reusify/README.md
>>
>> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/reuseNoCodeFunction.js
>> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/fib.js
>>
>> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/createNoCodeFunction.js
>> /Users/relax/beam/website/www/node_modules/reusify/.coveralls.yml
>> /Users/relax/beam/website/www/node_modules/reusify/reusify.js
>> /Users/relax/beam/website/www/node_modules/reusify/.travis.yml
>> /Users/relax/beam/website/www/node_modules/pretty-hrtime/.npmignore
>> /Users/relax/beam/website/www/node_modules/pretty-hrtime/.jshintignore
>> /Users/relax/beam/website/www/node_modules/pretty-hrtime/index.js
>> /Users/relax/beam/website/www/node_modules/@types/color-name/README.md
>>
>


Website is MIT licensed?

2021-01-07 Thread Kyle Weaver
Hi all,

I discovered the Beam website's NPM module purports to be MIT licensed [1].
I don't think we publish the website's NPM package anywhere, but wouldn't
the website be Apache 2.0 licensed like everything else?

Thanks,
Kyle

[1]
https://github.com/apache/beam/blob/30f9a607509940f78459e4fba847617399780246/website/www/package.json#L6


Re: Why are all the website files failing RAT?

2021-01-07 Thread Kyle Weaver
I repro'd this by running "./gradlew :rat". If I understand correctly,
these are all Hugo dependencies that are downloaded automatically. I looked
at a few of them and they do have licenses, but I guess rat just doesn't
recognize them for whatever reason.

The rat task is supposed to ignore everything in beam/.gitignore [1], but
the website directory has its own .gitignore [2]. The website's .gitignore
includes www/node_modules. I don't think there's really a need for the
website to have its own .gitignore, so one potential fix would be to move
the website .gitignore rules into the root one.
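
The proposed fix (folding the website's .gitignore rules into the root list that the rat task already honours) amounts to re-rooting each nested pattern under its directory. The following is a minimal Java sketch of that idea only, not Beam's build logic: GitignoreMerge and its methods are hypothetical names, the root patterns in main are made-up samples, and gitignore semantics are simplified (negated "!" patterns are skipped).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: rewrites patterns from a nested .gitignore so they can live in the root one.
class GitignoreMerge {

    // Converts the lines of one .gitignore into patterns relative to the repository root.
    // baseDir is "" for the root file, or e.g. "website" for website/.gitignore.
    static List<String> toRootPatterns(List<String> lines, String baseDir) {
        List<String> out = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blanks and comments; negations are out of scope for this sketch.
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("!")) {
                continue;
            }
            if (baseDir.isEmpty()) {
                out.add(line);
                continue;
            }
            // A slash anywhere except the very end anchors the pattern to the .gitignore's directory.
            String withoutTrailingSlash =
                line.endsWith("/") ? line.substring(0, line.length() - 1) : line;
            boolean anchored = withoutTrailingSlash.contains("/");
            String relative = line.startsWith("/") ? line.substring(1) : line;
            out.add(anchored ? baseDir + "/" + relative : baseDir + "/**/" + relative);
        }
        return out;
    }

    // Root patterns first, then re-rooted nested patterns, without duplicates.
    static List<String> merge(List<String> rootLines, String subDir, List<String> subLines) {
        Set<String> merged = new LinkedHashSet<>(toRootPatterns(rootLines, ""));
        merged.addAll(toRootPatterns(subLines, subDir));
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> root = Arrays.asList("# sample root rules", ".gradle", "build/");
        List<String> website = Arrays.asList("www/node_modules");
        for (String pattern : merge(root, "website", website)) {
            System.out.println(pattern);
        }
    }
}
```

With the website rule from [2], www/node_modules becomes website/www/node_modules in the merged list, which is the directory holding the files rat flagged.
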

I filed a JIRA for this as well:
https://issues.apache.org/jira/browse/BEAM-11582.

[1]
https://github.com/apache/beam/blob/30f9a607509940f78459e4fba847617399780246/build.gradle#L119
[2]
https://github.com/apache/beam/blob/2ad28542ec051dac6ebb8f0c6ea0c1b86a70f2cf/website/.gitignore#L16

On Thu, Jan 7, 2021 at 9:50 AM Reuven Lax  wrote:

> My builds are failing recently, with complaints of 2445 license
> violations. It appears to be a bunch of website files, see below. Any idea
> what is happening here?
>
> Unapproved Licenses:
> /Users/relax/beam/website/www/node_modules/callsites/index.js
> /Users/relax/beam/website/www/node_modules/callsites/readme.md
> /Users/relax/beam/website/www/node_modules/reusify/test.js
> /Users/relax/beam/website/www/node_modules/reusify/README.md
>
> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/reuseNoCodeFunction.js
> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/fib.js
>
> /Users/relax/beam/website/www/node_modules/reusify/benchmarks/createNoCodeFunction.js
> /Users/relax/beam/website/www/node_modules/reusify/.coveralls.yml
> /Users/relax/beam/website/www/node_modules/reusify/reusify.js
> /Users/relax/beam/website/www/node_modules/reusify/.travis.yml
> /Users/relax/beam/website/www/node_modules/pretty-hrtime/.npmignore
> /Users/relax/beam/website/www/node_modules/pretty-hrtime/.jshintignore
> /Users/relax/beam/website/www/node_modules/pretty-hrtime/index.js
> /Users/relax/beam/website/www/node_modules/@types/color-name/README.md
>


Re: Standardizing the "Runner" concept across website content

2021-01-07 Thread Austin Bennett
For those unfamiliar with these concepts, I generally conflate everything into
a "Runner" to keep things simple, though I also mention "execution engine"
at times. Glad there appears to be concrete consensus on how we want to
talk about this. It will also help guide me in being consistent :-)



On Wed, Jan 6, 2021 at 3:05 PM Griselda Cuevas  wrote:

> Thank you all for this productive conversation!
>
> Interestingly enough, a usability study we ran for Apache Beam (more
> details coming soon) pointed out that our documentation and website assume
> that the readers will be already familiar with Data Processing basic
> concepts such as engines, pipelines, etc. So introducing a glossary and
> even rethinking how we add these concepts into our new documentation is a
> good practice to have in mind.
>
> In the meantime, I will adopt the suggestion of differentiating between
> engine and runner. The first application I made of this is in the copy for
> the home page, which you can find as an attached file in this Jira ticket
> [1] in case you want to add comments/suggestions.
>
> The home page is the most important page in the website, as it's the one
> that explains Beam to the world and markets its features, so I'd appreciate
> feedback there too.
>
> Thanks everyone!
>
> [1]
> https://issues.apache.org/jira/browse/BEAM-11346?jql=project%20%3D%20beam%20AND%20assignee%20%3D%20gris%20ORDER%20BY%20priority%20DESC
>
> On Wed, 6 Jan 2021 at 13:33, Kenneth Knowles  wrote:
>
>>
>>
>> On Wed, Jan 6, 2021 at 12:28 PM Robert Burke  wrote:
>>
>>> +1 on consolidating and being consistent with our terms.
>>>
>>> I've always considered them (Runner/Engine) synonymous. From a user
>>> perspective, an engine without a runner isn't any good for their beam
>>> pipeline. That there's an adapter is an implementation detail in some
>>> instances. I do appreciate not using Adapter as a term, avoiding confusing
>>> descriptions.
>>>
>>> However, if we make the change and there's a clear glossary of terms
>>> somewhere then
>>>
>>> That puts the lifecycle of a pipeline to be (loosely) something like...
>>>
>>> A Beam User authors Pipelines by writing DoFns, adding them as
>>> PTransforms connected by PCollections into a Pipeline using a Beam SDK. An
>>> SDK converts the pipeline into a portable representation, and submits it to
>>> the Job Management Service of a Beam Runner. A Beam Runner translates the
>>> portable pipeline representation into terms an underlying Engine
>>> understands for Execution. The Beam Runner also reverses this translation
>>> when the Engine delegates tasks to workers, so that the Beam SDKs can
>>> execute the user's DoFns in keeping with the Beam Semantics.
>>>
>>
>> An explicit glossary is a great idea to combine with standardizing
>> terminology across the site. I think the important context is that most of
>> the engines already existed before Beam and many of them are more
>> well-known. In fact, a pretty good way for a user to understand the essence
>> of what Beam is about is by taking a look at all the engines for which
>> there are Beam runners :-)
>>
>> Engine: a system/product for doing [big] data processing
>> Pipeline: user authors this logic that says what they want to compute (I
>> think the fact that it is a DAG of PTransforms is relevant but we can get
>> away with omitting it for the high-level view and to avoid introducing the
>> term PTransform too early)
>> Runner: executes a Beam pipeline on an engine (agree that "adapter" is
>> too generic)
>>
>> I'd say below that level of granularity is getting into things that you
>> need to know only after you have started writing pipelines. Possibly you
>> need to introduce SDK harness to make clear that Beam pipelines are
>> inherently multi-language/multi-runtime, even if the engine isn't (my
>> personal opinion is that "UDF server" is the best understood terminology
>> for this, and so much better that it is never too late to abandon the
>> cryptic term "SDK harness").
>>
>> Kenn
>>
>>
>>> (Not covered, bundles etc, but you get the idea...)
>>>
>>> On Wed, Jan 6, 2021, 11:16 AM Robert Bradshaw 
>>> wrote:
>>>
>>>> +1 to keeping the distinction between Runner and Engine as Kenn
>>>> described, and cleaning up the site with these in mind (I don't think the
>>>> term engine is widely used yet).
>>>>
>>>> On Wed, Jan 6, 2021 at 11:15 AM Yichi Zhang  wrote:
>>>>
>>>>> I agree with what kenn said, in most cases I would refer to the term
>>>>> runner as the adapter for translating user's pipeline code into a job
>>>>> representation and submitting it to the execution engine. Though in some
>>>>> cases they may still be used interchangeably such as direct runner?
>>>>>
>>>>> On Wed, Jan 6, 2021 at 11:02 AM Kenneth Knowles 
>>>>> wrote:
>>>>>
>>>>>> I personally try to always distinguish two concepts: the thing doing
>>>>>> the computing (like Spark or Flink), and the adapter for running a Beam
>>>>>> pipeline (like SparkRunner or FlinkRunner). I use the term 

Re: [VOTE] Release 2.27.0, release candidate #4

2021-01-07 Thread Pablo Estrada
Thanks Valentyn,
these were built using Maven 3.6.0, and OpenJDK version 1.8.0_232.
Best
-P.

On Thu, Jan 7, 2021 at 9:45 AM Yichi Zhang  wrote:

> +1, I verified the python mobile game walkthrough on core runners.
>
> On Thu, Jan 7, 2021 at 8:29 AM Valentyn Tymofieiev 
> wrote:
>
>> On Thu, Jan 7, 2021 at 8:06 AM Ismaël Mejía  wrote:
>>
>>> > Also I wonder if we now need to clarify both Java 8 and Java 11
>>> versions separately?
>>>
>>> You mean for the docker images? Otherwise we should not be using Java
>>> 11 at all to produce the artifacts.
>>>
>>
>> Thanks for clarification, yes, looks like Docker images are the only
>> artifacts where this concern applies.  The Docker images include several
>> jars[1], that I believe are currently built locally (by a release manager
>> at container image build time). Are the jars included in Java 11 images
>> built with Java 11? If so, is it worth calling out the compiler?  Also,
>> perhaps we should look into building release artifacts using Jenkins or
>> Github Actions, with a preconfigured version of compilers, instead of
>> whatever version is installed on the release manager's machine.
>>
>> [1]
>> https://github.com/apache/beam/blob/master/sdks/java/container/Dockerfile#L24-L33
>>
>>
>>>
>>> On Thu, Jan 7, 2021 at 4:51 PM Valentyn Tymofieiev 
>>> wrote:
>>> >
>>> > Noting that announcement does not include the version of the Java
>>> compilers used - looks like the release guide still requires it:
>>> >
>>> > * Java artifacts were built with Maven MAVEN_VERSION and
>>> OpenJDK/Oracle JDK JDK_VERSION.
>>> >
>>> >
>>> > Could you please add this info to this thread for posterity?
>>> >
>>> > Also I wonder if we now need to clarify both Java 8 and Java 11
>>> versions separately?
>>> >
>>> > Other than that, +1 from me. Ran several mobile gaming pipelines on
>>> Direct and Dataflow runners with Python 3.8.
>>> >
>>> > On Thu, Jan 7, 2021 at 12:49 AM Jan Lukavský  wrote:
>>> >>
>>> >> +1 (non-binding).
>>> >>
>>> >> I've validated the RC against my dependent projects (mainly Java SDK,
>>> Flink and DirectRunner).
>>> >>
>>> >> Thanks,
>>> >>
>>> >>  Jan
>>> >>
>>> >> On 1/7/21 2:15 AM, Ahmet Altay wrote:
>>> >>
>>> >> +1 (binding) - validated python quickstarts.
>>> >>
>>> >> Thank you Pablo.
>>> >>
>>> >> On Wed, Jan 6, 2021 at 1:57 PM Pablo Estrada 
>>> wrote:
>>> >>>
>>> >>> +1 (binding)
>>> >>> I've built and unit tested existing Dataflow Templates with the new
>>> version.
>>> >>> Best
>>> >>> -P.
>>> >>>
>>> >>> On Tue, Jan 5, 2021 at 11:17 PM Pablo Estrada 
>>> wrote:
>>> 
>>>  Hi everyone,
>>>  Please review and vote on the release candidate #4 for the version
>>> 2.27.0, as follows:
>>>  [ ] +1, Approve the release
>>>  [ ] -1, Do not approve the release (please provide specific
>>> comments)
>>> 
>>>  NOTE. What happened to RC #2? I started building RC2 before
>>> completing all the cherry-picks, so the tag for RC2 was created on an
>>> incorrect commit.
>>> 
>>>  NOTE. What happened to RC #3? I started building RC3, but a new bug
>>> was discovered (BEAM-11569) that required amending the branch. Thus this is
>>> now RC4.
>>> 
>>>  Reviewers are encouraged to test their own use cases with the
>>> release candidate, and vote +1
>>>   if no issues are found.
>>> 
>>>  The complete staging area is available for your review, which
>>> includes:
>>>  * JIRA release notes [1],
>>>  * the official Apache source release to be deployed to
>>> dist.apache.org [2], which is signed with the key with fingerprint
>>> C79DDD47DAF3808F0B9DDFAC02B2D9F742008494 [3],
>>>  * all artifacts to be deployed to the Maven Central Repository [4],
>>>  * source code tag "v2.27.0-RC4" [5],
>>>  * website pull request listing the release [6], publishing the API
>>> reference manual [7], and the blog post [8].
>>>  * Python artifacts are deployed along with the source release to
>>> the dist.apache.org [2].
>>>  * Validation sheet with a tab for 2.27.0 release to help with
>>> validation [9].
>>>  * Docker images published to Docker Hub [10].
>>> 
>>>  The vote will be open for at least 72 hours, but given the
>>> holidays, we will likely extend for a few more days. The release will be
>>> adopted by majority approval, with at least 3 PMC affirmative votes.
>>> 
>>>  Thanks,
>>>  -P.
>>> 
>>>  [1]
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12349380
>>>  [2] https://dist.apache.org/repos/dist/dev/beam/2.27.0/
>>>  [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>  [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1149/
>>>  [5] https://github.com/apache/beam/tree/v2.27.0-RC4
>>>  [6] https://github.com/apache/beam/pull/13602
>>>  [7] https://github.com/apache/beam-site/pull/610
>>>  [8] https://github.com/apache/beam/pull/13603
>>>  [9]
>>> 

Why are all the website files failing RAT?

2021-01-07 Thread Reuven Lax
My builds are failing recently, with complaints of 2445 license violations.
It appears to be a bunch of website files, see below. Any idea what is
happening here?

Unapproved Licenses:
/Users/relax/beam/website/www/node_modules/callsites/index.js
/Users/relax/beam/website/www/node_modules/callsites/readme.md
/Users/relax/beam/website/www/node_modules/reusify/test.js
/Users/relax/beam/website/www/node_modules/reusify/README.md
/Users/relax/beam/website/www/node_modules/reusify/benchmarks/reuseNoCodeFunction.js
/Users/relax/beam/website/www/node_modules/reusify/benchmarks/fib.js
/Users/relax/beam/website/www/node_modules/reusify/benchmarks/createNoCodeFunction.js
/Users/relax/beam/website/www/node_modules/reusify/.coveralls.yml
/Users/relax/beam/website/www/node_modules/reusify/reusify.js
/Users/relax/beam/website/www/node_modules/reusify/.travis.yml
/Users/relax/beam/website/www/node_modules/pretty-hrtime/.npmignore
/Users/relax/beam/website/www/node_modules/pretty-hrtime/.jshintignore
/Users/relax/beam/website/www/node_modules/pretty-hrtime/index.js
/Users/relax/beam/website/www/node_modules/@types/color-name/README.md


Re: [VOTE] Release 2.27.0, release candidate #4

2021-01-07 Thread Yichi Zhang
+1, I verified the python mobile game walkthrough on core runners.

On Thu, Jan 7, 2021 at 8:29 AM Valentyn Tymofieiev 
wrote:

> On Thu, Jan 7, 2021 at 8:06 AM Ismaël Mejía  wrote:
>
>> > Also I wonder if we now need to clarify both Java 8 and Java 11
>> versions separately?
>>
>> You mean for the docker images? Otherwise we should not be using Java
>> 11 at all to produce the artifacts.
>>
>
> Thanks for clarification, yes, looks like Docker images are the only
> artifacts where this concern applies.  The Docker images include several
> jars[1], that I believe are currently built locally (by a release manager
> at container image build time). Are the jars included in Java 11 images
> built with Java 11? If so, is it worth calling out the compiler?  Also,
> perhaps we should look into building release artifacts using Jenkins or
> Github Actions, with a preconfigured version of compilers, instead of
> whatever version is installed on the release manager's machine.
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/container/Dockerfile#L24-L33
>
>
>>
>> On Thu, Jan 7, 2021 at 4:51 PM Valentyn Tymofieiev 
>> wrote:
>> >
>> > Noting that announcement does not include the version of the Java
>> compilers used - looks like the release guide still requires it:
>> >
>> > * Java artifacts were built with Maven MAVEN_VERSION and OpenJDK/Oracle
>> JDK JDK_VERSION.
>> >
>> >
>> > Could you please add this info to this thread for posterity?
>> >
>> > Also I wonder if we now need to clarify both Java 8 and Java 11
>> versions separately?
>> >
>> > Other than that, +1 from me. Ran several mobile gaming pipelines on
>> Direct and Dataflow runners with Python 3.8.
>> >
>> > On Thu, Jan 7, 2021 at 12:49 AM Jan Lukavský  wrote:
>> >>
>> >> +1 (non-binding).
>> >>
>> >> I've validated the RC against my dependent projects (mainly Java SDK,
>> Flink and DirectRunner).
>> >>
>> >> Thanks,
>> >>
>> >>  Jan
>> >>
>> >> On 1/7/21 2:15 AM, Ahmet Altay wrote:
>> >>
>> >> +1 (binding) - validated python quickstarts.
>> >>
>> >> Thank you Pablo.
>> >>
>> >> On Wed, Jan 6, 2021 at 1:57 PM Pablo Estrada 
>> wrote:
>> >>>
>> >>> +1 (binding)
>> >>> I've built and unit tested existing Dataflow Templates with the new
>> version.
>> >>> Best
>> >>> -P.
>> >>>
>> >>> On Tue, Jan 5, 2021 at 11:17 PM Pablo Estrada 
>> wrote:
>> 
>>  Hi everyone,
>>  Please review and vote on the release candidate #4 for the version
>> 2.27.0, as follows:
>>  [ ] +1, Approve the release
>>  [ ] -1, Do not approve the release (please provide specific comments)
>> 
>>  NOTE. What happened to RC #2? I started building RC2 before
>> completing all the cherry-picks, so the tag for RC2 was created on an
>> incorrect commit.
>> 
>>  NOTE. What happened to RC #3? I started building RC3, but a new bug
>> was discovered (BEAM-11569) that required amending the branch. Thus this is
>> now RC4.
>> 
>>  Reviewers are encouraged to test their own use cases with the
>> release candidate, and vote +1
>>   if no issues are found.
>> 
>>  The complete staging area is available for your review, which
>> includes:
>>  * JIRA release notes [1],
>>  * the official Apache source release to be deployed to
>> dist.apache.org [2], which is signed with the key with fingerprint
>> C79DDD47DAF3808F0B9DDFAC02B2D9F742008494 [3],
>>  * all artifacts to be deployed to the Maven Central Repository [4],
>>  * source code tag "v2.27.0-RC4" [5],
>>  * website pull request listing the release [6], publishing the API
>> reference manual [7], and the blog post [8].
>>  * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2].
>>  * Validation sheet with a tab for 2.27.0 release to help with
>> validation [9].
>>  * Docker images published to Docker Hub [10].
>> 
>>  The vote will be open for at least 72 hours, but given the holidays,
>> we will likely extend for a few more days. The release will be adopted by
>> majority approval, with at least 3 PMC affirmative votes.
>> 
>>  Thanks,
>>  -P.
>> 
>>  [1]
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12349380
>>  [2] https://dist.apache.org/repos/dist/dev/beam/2.27.0/
>>  [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>  [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1149/
>>  [5] https://github.com/apache/beam/tree/v2.27.0-RC4
>>  [6] https://github.com/apache/beam/pull/13602
>>  [7] https://github.com/apache/beam-site/pull/610
>>  [8] https://github.com/apache/beam/pull/13603
>>  [9]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=194829106
>>  [10] https://hub.docker.com/search?q=apache%2Fbeam=image
>>
>


Re: Nexmark ratelimiting not working

2021-01-07 Thread Andrew Pilloud
It looks like the 'isRateLimited' flag was used on these tests inside
Google ~5 years ago, but I don't think we've ever used it in Beam. I don't
believe there are any tests, so it doesn't really surprise me that it is
broken.

The configs we normally run nexmark on Flink with are here:
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Flink.groovy

On Thu, Jan 7, 2021 at 6:36 AM Teodor Spæren 
wrote:

> Hello and happy new year!
>
> I've been trying to use Nexmark for evaluating some performance
> improvements I've made to Beam. I've discovered a problem where Nexmark
> won't work if you enable the ratelimiting mode. I've filed a bug
> report for this[1].
>
> Reproducing the problem is easy: simply run the normal nexmark suite
> with ratelimiting mode enabled and see that the number of results you get
> is way lower than it should be.
>
> I've reached the limit of what I'm able to debug. Is this reproducible
> for anyone else and does anyone have a clue about why this might happen?
>
> Any help is very much appreciated!
>
> Teodor
>
> [1]: https://issues.apache.org/jira/browse/BEAM-11547
>


Re: [VOTE] Release 2.27.0, release candidate #4

2021-01-07 Thread Valentyn Tymofieiev
On Thu, Jan 7, 2021 at 8:06 AM Ismaël Mejía  wrote:

> > Also I wonder if we now need to clarify both Java 8 and Java 11 versions
> separately?
>
> You mean for the docker images? Otherwise we should not be using Java
> 11 at all to produce the artifacts.
>

Thanks for clarification, yes, looks like Docker images are the only
artifacts where this concern applies.  The Docker images include several
jars[1], that I believe are currently built locally (by a release manager
at container image build time). Are the jars included in Java 11 images
built with Java 11? If so, is it worth calling out the compiler?  Also,
perhaps we should look into building release artifacts using Jenkins or
Github Actions, with a preconfigured version of compilers, instead of
whatever version is installed on the release manager's machine.

[1]
https://github.com/apache/beam/blob/master/sdks/java/container/Dockerfile#L24-L33


>
> On Thu, Jan 7, 2021 at 4:51 PM Valentyn Tymofieiev 
> wrote:
> >
> > Noting that announcement does not include the version of the Java
> compilers used - looks like the release guide still requires it:
> >
> > * Java artifacts were built with Maven MAVEN_VERSION and OpenJDK/Oracle
> JDK JDK_VERSION.
> >
> >
> > Could you please add this info to this thread for posterity?
> >
> > Also I wonder if we now need to clarify both Java 8 and Java 11 versions
> separately?
> >
> > Other than that, +1 from me. Ran several mobile gaming pipelines on
> Direct and Dataflow runners with Python 3.8.
> >
> > On Thu, Jan 7, 2021 at 12:49 AM Jan Lukavský  wrote:
> >>
> >> +1 (non-binding).
> >>
> >> I've validated the RC against my dependent projects (mainly Java SDK,
> Flink and DirectRunner).
> >>
> >> Thanks,
> >>
> >>  Jan
> >>
> >> On 1/7/21 2:15 AM, Ahmet Altay wrote:
> >>
> >> +1 (binding) - validated python quickstarts.
> >>
> >> Thank you Pablo.
> >>
> >> On Wed, Jan 6, 2021 at 1:57 PM Pablo Estrada 
> wrote:
> >>>
> >>> +1 (binding)
> >>> I've built and unit tested existing Dataflow Templates with the new
> version.
> >>> Best
> >>> -P.
> >>>
> >>> On Tue, Jan 5, 2021 at 11:17 PM Pablo Estrada 
> wrote:
> 
>  Hi everyone,
>  Please review and vote on the release candidate #4 for the version
> 2.27.0, as follows:
>  [ ] +1, Approve the release
>  [ ] -1, Do not approve the release (please provide specific comments)
> 
>  NOTE. What happened to RC #2? I started building RC2 before
> completing all the cherry-picks, so the tag for RC2 was created on an
> incorrect commit.
> 
>  NOTE. What happened to RC #3? I started building RC3, but a new bug
> was discovered (BEAM-11569) that required amending the branch. Thus this is
> now RC4.
> 
>  Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1
>   if no issues are found.
> 
>  The complete staging area is available for your review, which
> includes:
>  * JIRA release notes [1],
>  * the official Apache source release to be deployed to
> dist.apache.org [2], which is signed with the key with fingerprint
> C79DDD47DAF3808F0B9DDFAC02B2D9F742008494 [3],
>  * all artifacts to be deployed to the Maven Central Repository [4],
>  * source code tag "v2.27.0-RC4" [5],
>  * website pull request listing the release [6], publishing the API
> reference manual [7], and the blog post [8].
>  * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
>  * Validation sheet with a tab for 2.27.0 release to help with
> validation [9].
>  * Docker images published to Docker Hub [10].
> 
>  The vote will be open for at least 72 hours, but given the holidays,
> we will likely extend for a few more days. The release will be adopted by
> majority approval, with at least 3 PMC affirmative votes.
> 
>  Thanks,
>  -P.
> 
>  [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12349380
>  [2] https://dist.apache.org/repos/dist/dev/beam/2.27.0/
>  [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>  [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1149/
>  [5] https://github.com/apache/beam/tree/v2.27.0-RC4
>  [6] https://github.com/apache/beam/pull/13602
>  [7] https://github.com/apache/beam-site/pull/610
>  [8] https://github.com/apache/beam/pull/13603
>  [9]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=194829106
>  [10] https://hub.docker.com/search?q=apache%2Fbeam=image
>


Re: [VOTE] Release 2.27.0, release candidate #4

2021-01-07 Thread Jean-Baptiste Onofre
+1 (binding)

Regards
JB

> Le 6 janv. 2021 à 08:17, Pablo Estrada  a écrit :
> 
> Hi everyone,
> Please review and vote on the release candidate #4 for the version 2.27.0, as 
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> NOTE. What happened to RC #2? I started building RC2 before completing all 
> the cherry-picks, so the tag for RC2 was created on an incorrect commit.
> 
> NOTE. What happened to RC #3? I started building RC3, but a new bug was 
> discovered (BEAM-11569) that required amending the branch. Thus this is now 
> RC4.
> 
> Reviewers are encouraged to test their own use cases with the release 
> candidate, and vote +1
>  if no issues are found.
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2],
> which is signed with the key with fingerprint
> C79DDD47DAF3808F0B9DDFAC02B2D9F742008494 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.27.0-RC4" [5],
> * website pull request listing the release [6], publishing the API reference
> manual [7], and the blog post [8].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.27.0 release to help with validation [9].
> * Docker images published to Docker Hub [10].
> 
> The vote will be open for at least 72 hours, but given the holidays, we will 
> likely extend for a few more days. The release will be adopted by majority 
> approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> -P.
> 
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12349380
> [2] https://dist.apache.org/repos/dist/dev/beam/2.27.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1149/
> [5] https://github.com/apache/beam/tree/v2.27.0-RC4
> [6] https://github.com/apache/beam/pull/13602
> [7] https://github.com/apache/beam-site/pull/610
> [8] https://github.com/apache/beam/pull/13603
> [9]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=194829106
> [10] https://hub.docker.com/search?q=apache%2Fbeam=image
> 


Re: [VOTE] Release 2.27.0, release candidate #4

2021-01-07 Thread Ismaël Mejía
> Also I wonder if we now need to clarify both Java 8 and Java 11 versions 
> separately?

You mean for the docker images? Otherwise we should not be using Java
11 at all to produce the artifacts.

On Thu, Jan 7, 2021 at 4:51 PM Valentyn Tymofieiev  wrote:
>
> Noting that announcement does not include the version of the Java compilers 
> used - looks like the release guide still requires it:
>
> * Java artifacts were built with Maven MAVEN_VERSION and OpenJDK/Oracle JDK 
> JDK_VERSION.
>
>
> Could you please add this info to this thread for posterity?
>
> Also I wonder if we now need to clarify both Java 8 and Java 11 versions 
> separately?
>
> Other than that, +1 from me. Ran several mobile gaming pipelines on Direct 
> and Dataflow runners with Python 3.8.
>
> On Thu, Jan 7, 2021 at 12:49 AM Jan Lukavský  wrote:
>>
>> +1 (non-binding).
>>
>> I've validated the RC against my dependent projects (mainly Java SDK, Flink 
>> and DirectRunner).
>>
>> Thanks,
>>
>>  Jan
>>
>> On 1/7/21 2:15 AM, Ahmet Altay wrote:
>>
>> +1 (binding) - validated python quickstarts.
>>
>> Thank you Pablo.
>>
>> On Wed, Jan 6, 2021 at 1:57 PM Pablo Estrada  wrote:
>>>
>>> +1 (binding)
>>> I've built and unit tested existing Dataflow Templates with the new version.
>>> Best
>>> -P.
>>>
>>> On Tue, Jan 5, 2021 at 11:17 PM Pablo Estrada  wrote:

 Hi everyone,
 Please review and vote on the release candidate #4 for the version 2.27.0, 
 as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific comments)

 NOTE. What happened to RC #2? I started building RC2 before completing all 
 the cherry-picks, so the tag for RC2 was created on an incorrect commit.

 NOTE. What happened to RC #3? I started building RC3, but a new bug was 
 discovered (BEAM-11569) that required amending the branch. Thus this is 
 now RC4.

 Reviewers are encouraged to test their own use cases with the release 
 candidate, and vote +1
  if no issues are found.

 The complete staging area is available for your review, which includes:
 * JIRA release notes [1],
 * the official Apache source release to be deployed to dist.apache.org 
 [2], which is signed with the key with fingerprint 
 C79DDD47DAF3808F0B9DDFAC02B2D9F742008494 [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "v2.27.0-RC4" [5],
 * website pull request listing the release [6], publishing the API 
 reference manual [7], and the blog post [8].
 * Python artifacts are deployed along with the source release to the 
 dist.apache.org [2].
 * Validation sheet with a tab for 2.27.0 release to help with validation 
 [9].
 * Docker images published to Docker Hub [10].

 The vote will be open for at least 72 hours, but given the holidays, we 
 will likely extend for a few more days. The release will be adopted by 
 majority approval, with at least 3 PMC affirmative votes.

 Thanks,
 -P.

 [1] 
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12349380
 [2] https://dist.apache.org/repos/dist/dev/beam/2.27.0/
 [3] https://dist.apache.org/repos/dist/release/beam/KEYS
 [4] https://repository.apache.org/content/repositories/orgapachebeam-1149/
 [5] https://github.com/apache/beam/tree/v2.27.0-RC4
 [6] https://github.com/apache/beam/pull/13602
 [7] https://github.com/apache/beam-site/pull/610
 [8] https://github.com/apache/beam/pull/13603
 [9] 
 https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=194829106
 [10] https://hub.docker.com/search?q=apache%2Fbeam=image


Re: [VOTE] Release 2.27.0, release candidate #4

2021-01-07 Thread Valentyn Tymofieiev
Noting that announcement does not include the version of the Java compilers
used - looks like the release guide still requires it:

* Java artifacts were built with Maven MAVEN_VERSION and
OpenJDK/Oracle JDK JDK_VERSION.


Could you please add this info to this thread for posterity?

Also I wonder if we now need to clarify both Java 8 and Java 11 versions
separately?

Other than that, +1 from me. Ran several mobile gaming pipelines on Direct
and Dataflow runners with Python 3.8.

On Thu, Jan 7, 2021 at 12:49 AM Jan Lukavský  wrote:

> +1 (non-binding).
>
> I've validated the RC against my dependent projects (mainly Java SDK,
> Flink and DirectRunner).
>
> Thanks,
>
>  Jan
> On 1/7/21 2:15 AM, Ahmet Altay wrote:
>
> +1 (binding) - validated python quickstarts.
>
> Thank you Pablo.
>
> On Wed, Jan 6, 2021 at 1:57 PM Pablo Estrada  wrote:
>
>> +1 (binding)
>> I've built and unit tested existing Dataflow Templates with the new
>> version.
>> Best
>> -P.
>>
>> On Tue, Jan 5, 2021 at 11:17 PM Pablo Estrada  wrote:
>>
>>> Hi everyone,
>>> Please review and vote on the release candidate #4 for the
>>> version 2.27.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>> *NOTE*. What happened to RC #2? I started building RC2 before
>>> completing all the cherry-picks, so the tag for RC2 was created on an
>>> incorrect commit.
>>>
>>> *NOTE*. What happened to RC #3? I started building RC3, but a new bug
>>> was discovered (BEAM-11569) that required amending the branch. Thus this is
>>> now RC4.
>>>
>>> Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1
>>>  if no issues are found.
>>>
>>> The complete staging area is available for your review, which includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org [2],
>>> which is signed with the key with fingerprint
>>> C79DDD47DAF3808F0B9DDFAC02B2D9F742008494 [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.27.0-RC4" [5],
>>> * website pull request listing the release [6], publishing the API
>>> reference manual [7], and the blog post [8].
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2].
>>> * Validation sheet with a tab for 2.27.0 release to help with validation
>>> [9].
>>> * Docker images published to Docker Hub [10].
>>>
>>> The vote will be open for at least 72 hours, but given the holidays, we
>>> will likely extend for a few more days. The release will be adopted by
>>> majority approval, with at least 3 PMC affirmative votes.
>>>
>>> Thanks,
>>> -P.
>>>
>>> [1]
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12349380
>>>
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.27.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1149/
>>> [5] https://github.com/apache/beam/tree/v2.27.0-RC4
>>> [6] https://github.com/apache/beam/pull/13602
>>> [7] https://github.com/apache/beam-site/pull/610
>>> [8] https://github.com/apache/beam/pull/13603
>>> [9]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=194829106
>>>
>>> [10] https://hub.docker.com/search?q=apache%2Fbeam=image
>>>
>>


Nexmark ratelimiting not working

2021-01-07 Thread Teodor Spæren

Hello and happy new year!

I've been trying to use Nexmark for evaluating some performance
improvements I've made to Beam. I've discovered a problem where Nexmark
won't work if you enable the ratelimiting mode. I've filed a bug
report for this[1].

Reproducing the problem is easy: simply run the normal nexmark suite
with ratelimiting mode enabled and see that the number of results you get
is way lower than it should be.


I've reached the limit of what I'm able to debug. Is this reproducible 
for anyone else and does anyone have a clue about why this might happen?


Any help is very much appreciated!

Teodor

[1]: https://issues.apache.org/jira/browse/BEAM-11547


Re: Compatibility between Beam v2.23 and Beam v2.26

2021-01-07 Thread Maximilian Michels

Thanks for mentioning me here @Boyuan.

In Beam there is no guarantee that checkpoints work across Beam 
releases. Checkpoint compatibility can break due to a lot of reasons 
(primarily DAG changes and serializer changes). Even though in this case 
the serialization id might have guaranteed compatibility, we make 
internal changes to Beam all the time. There is currently no process 
that we follow to ensure compatibility.


I do want to note that Flink has a serializer migration strategy which 
we currently do not leverage: 
https://github.com/apache/beam/blob/d8966d640549932d7551461ff59fa1085730f768/runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializer.java#L182


However, this requires that in addition to the new serializer, the old 
serializer is kept around. Flink will then migrate the state by reading 
first with the old serializer and then subsequently writing with the new 
one.
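
As a rough illustration of the read-with-old, write-with-new idea described above, here is a minimal plain-Java sketch. It does not use Flink's API: the `Codec` interface, the two codecs, and the `migrate` helper are all hypothetical names made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class StateMigrationSketch {
    // Hypothetical stand-in for a serializer; not a Flink or Beam interface.
    interface Codec<T> {
        byte[] encode(T value);
        T decode(byte[] bytes);
    }

    // "Old" form (assumed for the demo): the number as decimal text.
    static final Codec<Long> OLD = new Codec<Long>() {
        public byte[] encode(Long v) {
            return Long.toString(v).getBytes(StandardCharsets.UTF_8);
        }
        public Long decode(byte[] b) {
            return Long.parseLong(new String(b, StandardCharsets.UTF_8));
        }
    };

    // "New" form (assumed for the demo): fixed 8-byte big-endian encoding.
    static final Codec<Long> NEW = new Codec<Long>() {
        public byte[] encode(Long v) {
            return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
        }
        public Long decode(byte[] b) {
            return ByteBuffer.wrap(b).getLong();
        }
    };

    // Migration: decode every entry with the old codec, re-encode with the new one.
    // This only works if the old codec is still available at restore time.
    static <T> List<byte[]> migrate(List<byte[]> state, Codec<T> oldCodec, Codec<T> newCodec) {
        List<byte[]> migrated = new ArrayList<>();
        for (byte[] entry : state) {
            migrated.add(newCodec.encode(oldCodec.decode(entry)));
        }
        return migrated;
    }

    public static void main(String[] args) {
        List<byte[]> oldState = Collections.singletonList(OLD.encode(42L));
        List<byte[]> newState = migrate(oldState, OLD, NEW);
        System.out.println(NEW.decode(newState.get(0)));
    }
}
```

The point of the sketch is the dependency it makes visible: `migrate` needs both codecs, which is why the old serializer has to be kept around.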


-Max

On 07.01.21 09:43, Jan Lukavský wrote:

Hi Antonio,

can you please create one?

Thanks,

  Jan

On 1/6/21 10:31 PM, Antonio Si wrote:
Thanks for the information. Do we have a jira to track this issue or 
do you want me to create a jira for this?


Thanks.

Antonio.

On 2021/01/06 17:59:47, Kenneth Knowles  wrote:

Agree with Boyuan & Kyle. That PR is the problem, and we probably do not
have adequate testing. We have a cultural understanding of not breaking
encoded data forms but this is the encoded form of the TypeSerializer, and
actually there are two problems.

1. When you have a serialized object that does not have the
serialVersionUID explicitly set, the UID is generated based on many details
that are irrelevant for binary compatibility. Any Java-serialized object
that is intended for anything other than transient transmission *must* have
a serialVersionUID set and an explicit serialized form. Else it is
completely normal for it to break due to irrelevant changes. The
serialVersionUID has no mechanism for upgrade/downgrade so you *must* keep
it the same forever, and any versioning or compat scheme exists within the
single serialVersionUID.
2. In this case there was an actual change to the fields of the object
stored, so you need to explicitly add the serialized form and also the
ability to read from prior serialized forms.

I believe explicitly setting the serialVersionUID to the original (and
keeping it that way forever) and adding the ability to decode prior forms
will regain the ability to read the snapshot. But also this seems like
something that would be part of Flink best practice documentation since
naive use of Java serialization often hits this problem.
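
To make point 1 concrete, here is a small sketch that asks the JVM which serialVersionUID it associates with a class, once without and once with an explicit declaration. The classes `Implicit` and `Pinned` and the helper `uidOf` are hypothetical names for this example; `ObjectStreamClass.lookup` and `getSerialVersionUID` are standard `java.io` calls.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionUidDemo {
    // No explicit serialVersionUID: the JVM computes one from the class's
    // name, fields, methods, and so on, so unrelated edits can change it.
    static class Implicit implements Serializable {
        int value;
    }

    // Explicit serialVersionUID: the value stays fixed no matter how the
    // rest of the class evolves.
    static class Pinned implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Returns the UID the serialization runtime will write for this class.
    static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Implicit UID: " + uidOf(Implicit.class));
        System.out.println("Pinned UID:   " + uidOf(Pinned.class));
    }
}
```

Adding a field or method to `Implicit` and rerunning would typically print a different first value, while the `Pinned` line stays at 1. A mismatch of that computed value is the kind of difference the InvalidClassException in this thread reports.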

Kenn

On Tue, Jan 5, 2021 at 4:30 PM Kyle Weaver  wrote:


This raises a few related questions from me:

1. Do we claim to support resuming Flink checkpoints made with previous
Beam versions?
2. Does 1. require full binary compatibility between different versions of
runner internals like CoderTypeSerializer?
3. Do we have tests for 1.?
Kenn



On Tue, Jan 5, 2021 at 4:05 PM Boyuan Zhang  wrote:


https://github.com/apache/beam/pull/13240 seems suspicious to me.

  +Maximilian Michels  Any insights here?

On Tue, Jan 5, 2021 at 8:48 AM Antonio Si wrote:



Hi,

I would like to follow up on this question to see if there is a
solution/workaround for this issue.

Thanks.

Antonio.

On 2020/12/19 18:33:48, Antonio Si  wrote:

Hi,

We were using Beam v2.23 and are now testing an upgrade to Beam v2.26.
For Beam v2.26, we are passing --experiments=use_deprecated_read and
--fasterCopy=true.

We run into this exception when we resume our pipeline:

Caused by: java.io.InvalidClassException: org.apache.beam.runners.flink.translation.types.CoderTypeSerializer; local class incompatible: stream classdesc serialVersionUID = 5241803328188007316, local class serialVersionUID = 7247319138941746449
   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1942)
   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1808)
   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2099)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
   at org.apache.flink.api.common.typeutils.TypeSerializerSerializationUtil$TypeSerializerSerializationProxy.read(TypeSerializerSerializationUtil.java:301)
   at org.apache.flink.api.common.typeutils.TypeSerializerSerializationUtil.tryReadSerializer(TypeSerializerSerializationUtil.java:116)
   at org.apache.flink.api.common.typeutils.TypeSerializerConfigSnapshot.readSnapshot(TypeSerializerConfigSnapshot.java:113)


   at

Re: [VOTE] Release 2.27.0, release candidate #4

2021-01-07 Thread Jan Lukavský

+1 (non-binding).

I've validated the RC against my dependent projects (mainly Java SDK, 
Flink and DirectRunner).


Thanks,

 Jan

On 1/7/21 2:15 AM, Ahmet Altay wrote:

+1 (binding) - validated python quickstarts.

Thank you Pablo.

On Wed, Jan 6, 2021 at 1:57 PM Pablo Estrada wrote:


+1 (binding)
I've built and unit tested existing Dataflow Templates with the
new version.
Best
-P.

On Tue, Jan 5, 2021 at 11:17 PM Pablo Estrada <pabl...@google.com> wrote:

Hi everyone,
Please review and vote on the release candidate #4 for the
version 2.27.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific
comments)

*NOTE*. What happened to RC #2? I started building RC2 before
completing all the cherry-picks, so the tag for RC2 was
created on an incorrect commit.

*NOTE*. What happened to RC #3? I started building RC3, but a
new bug was discovered (BEAM-11569) that required amending the
branch. Thus this is now RC4.

Reviewers are encouraged to test their own use cases with the
release candidate, and vote +1 if no issues are found.

The complete staging area is available for your review, which
includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to
dist.apache.org [2], which is signed with the key with fingerprint
C79DDD47DAF3808F0B9DDFAC02B2D9F742008494 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.27.0-RC4" [5],
* website pull request listing the release [6], publishing the
API reference manual [7], and the blog post [8],
* Python artifacts, deployed along with the source release to
dist.apache.org [2],
* validation sheet with a tab for the 2.27.0 release to help with
validation [9],
* Docker images published to Docker Hub [10].

The vote will be open for at least 72 hours, but given the
holidays, we will likely extend for a few more days. The
release will be adopted by majority approval, with at least 3
PMC affirmative votes.

Thanks,
-P.

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12349380
[2] https://dist.apache.org/repos/dist/dev/beam/2.27.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1149/
[5] https://github.com/apache/beam/tree/v2.27.0-RC4
[6] https://github.com/apache/beam/pull/13602
[7] https://github.com/apache/beam-site/pull/610
[8] https://github.com/apache/beam/pull/13603
[9] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=194829106
[10] https://hub.docker.com/search?q=apache%2Fbeam=image



Re: Compatibility between Beam v2.23 and Beam v2.26

2021-01-07 Thread Jan Lukavský

Hi Antonio,

can you please create one?

Thanks,

 Jan

On 1/6/21 10:31 PM, Antonio Si wrote:

Thanks for the information. Do we have a jira to track this issue or do you 
want me to create a jira for this?

Thanks.

Antonio.

On 2021/01/06 17:59:47, Kenneth Knowles  wrote:

Agree with Boyuan & Kyle. That PR is the problem, and we probably do not
have adequate testing. We have a cultural understanding of not breaking
encoded data forms but this is the encoded form of the TypeSerializer, and
actually there are two problems.

1. When you have a serialized object that does not have the
serialVersionUID explicitly set, the UID is generated based on many details
that are irrelevant for binary compatibility. Any Java-serialized object
that is intended for anything other than transient transmission *must* have
a serialVersionUID set and an explicit serialized form. Else it is
completely normal for it to break due to irrelevant changes. The
serialVersionUID has no mechanism for upgrade/downgrade so you *must* keep
it the same forever, and any versioning or compat scheme exists within the
single serialVersionUID.
2. In this case there was an actual change to the fields of the object
stored, so you need to explicitly add the serialized form and also the
ability to read from prior serialized forms.

I believe explicitly setting the serialVersionUID to the original (and
keeping it that way forever) and adding the ability to decode prior forms
will regain the ability to read the snapshot. But also this seems like
something that would be part of Flink best practice documentation since
naive use of Java serialization often hits this problem.
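
One way to picture "an explicit serialized form plus the ability to read prior forms" under a single fixed UID is the sketch below. It is not the Beam fix; the `Settings` class, its fields, and the format numbering are hypothetical and chosen only for illustration. The class writes a format number ahead of its data and branches on it when reading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class VersionedFormDemo {
    static class Settings implements Serializable {
        // Fixed forever; versioning lives inside the serialized form instead.
        private static final long serialVersionUID = 1L;
        static final int CURRENT_FORMAT = 2;

        transient String name;
        transient boolean fasterCopy;                // hypothetical field added in format 2
        transient int writeFormat = CURRENT_FORMAT;  // demo only: lets us emit the old form

        Settings(String name, boolean fasterCopy) {
            this.name = name;
            this.fasterCopy = fasterCopy;
        }

        // Explicit serialized form: format number first, then the fields.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeInt(writeFormat);
            out.writeUTF(name);
            if (writeFormat >= 2) {
                out.writeBoolean(fasterCopy);
            }
        }

        // Reads both the current form and the prior one.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            int format = in.readInt();
            name = in.readUTF();
            // Format 1 streams have no flag, so fall back to a default.
            fasterCopy = (format >= 2) ? in.readBoolean() : false;
            writeFormat = CURRENT_FORMAT;
        }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Settings deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Settings) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Settings old = new Settings("demo", true);
        old.writeFormat = 1; // pretend this was written by the previous release
        Settings restored = deserialize(serialize(old));
        System.out.println(restored.name + " fasterCopy=" + restored.fasterCopy);
    }
}
```

Because the UID never changes, the stream header always matches the local class, and the decision about how to interpret the payload is made by `readObject` rather than by the JVM's compatibility check.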

Kenn

On Tue, Jan 5, 2021 at 4:30 PM Kyle Weaver  wrote:


This raises a few related questions from me:

1. Do we claim to support resuming Flink checkpoints made with previous
Beam versions?
2. Does 1. require full binary compatibility between different versions of
runner internals like CoderTypeSerializer?
3. Do we have tests for 1.?
Kenn



On Tue, Jan 5, 2021 at 4:05 PM Boyuan Zhang  wrote:


https://github.com/apache/beam/pull/13240 seems suspicious to me.

  +Maximilian Michels  Any insights here?

On Tue, Jan 5, 2021 at 8:48 AM Antonio Si  wrote:


Hi,

I would like to follow up on this question to see if there is a
solution/workaround for this issue.

Thanks.

Antonio.

On 2020/12/19 18:33:48, Antonio Si  wrote:

Hi,

We were using Beam v2.23 and are now testing an upgrade to Beam v2.26.
For Beam v2.26, we are passing --experiments=use_deprecated_read and
--fasterCopy=true.

We run into this exception when we resume our pipeline:

Caused by: java.io.InvalidClassException: org.apache.beam.runners.flink.translation.types.CoderTypeSerializer; local class incompatible: stream classdesc serialVersionUID = 5241803328188007316, local class serialVersionUID = 7247319138941746449
   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1942)
   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1808)
   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2099)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
   at org.apache.flink.api.common.typeutils.TypeSerializerSerializationUtil$TypeSerializerSerializationProxy.read(TypeSerializerSerializationUtil.java:301)
   at org.apache.flink.api.common.typeutils.TypeSerializerSerializationUtil.tryReadSerializer(TypeSerializerSerializationUtil.java:116)
   at org.apache.flink.api.common.typeutils.TypeSerializerConfigSnapshot.readSnapshot(TypeSerializerConfigSnapshot.java:113)
   at org.apache.flink.api.common.typeutils.TypeSerializerSnapshot.readVersionedSnapshot(TypeSerializerSnapshot.java:174)
   at org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializationUtil$TypeSerializerSnapshotSerializationProxy.deserializeV2(TypeSerializerSnapshotSerializationUtil.java:179)
   at org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializationUtil$TypeSerializerSnapshotSerializationProxy.read(TypeSerializerSnapshotSerializationUtil.java:150)
   at org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializationUtil.readSerializerSnapshot(TypeSerializerSnapshotSerializationUtil.java:76)
   at org.apache.flink.runtime.state.metainfo.StateMetaInfoSnapshotReadersWriters$CurrentReaderImpl.readStateMetaInfoSnapshot(StateMetaInfoSnapshotReadersWriters.java:219)
   at org.apache.flink.runtime.state.OperatorBackendSerializationProxy.read(OperatorBackendSerializationProxy.java:119)
   at org.apache.flink.runtime.state.OperatorStateRestoreOperation.restore(OperatorStateRestoreOperation.java:83)

It looks like it is not able to