Re: Disabling Jenkins Jobs

2023-12-13 Thread Alexey Romanenko
Never mind, it was an issue with the InfluxDB credentials that perhaps appeared 
during the transfer of the workflows from Jenkins to GitHub Actions.
I fixed this.

—
Alexey

> On 11 Dec 2023, at 17:56, Alexey Romanenko  wrote:
> 
> Hi, 
> 
> I’ve just noticed that some workflows stopped sending metrics to Grafana with 
> an error “org.apache.beam.sdk.testutils.publishing.InfluxDBPublisher: Unable 
> to publish metrics due to error: Response code: 401. Reason: "unable to parse 
> authentication credentials”” (e.g. Nexmark, Tpcds).
> 
> Is it a known issue?
> 
> —
> Alexey
> 
>> On 27 Nov 2023, at 17:16, Yi Hu via dev  wrote:
>> 
>> Hi all,
>> 
>> Just another update that we have shut down all precommit tests on Jenkins.
>> 
>> For now, the remaining tests running on Jenkins are PostCommit suites. They 
>> are also exercised on GitHub Actions, but cannot be triggered from a pull 
>> request due to [1]. That said, once the blocker [1] is resolved, we can 
>> fully shut down the Jenkins server. For now, I plan to stop half of the 
>> Jenkins worker nodes since developers will no longer see Jenkins jobs under 
>> their pull requests unless triggered manually.
>> 
>> [1] https://github.com/apache/beam/issues/28909
>> 
>> Regards,
>> Yi
>> 
> 



Re: Disabling Jenkins Jobs

2023-12-11 Thread Alexey Romanenko
Hi, 

I’ve just noticed that some workflows stopped sending metrics to Grafana with 
an error “org.apache.beam.sdk.testutils.publishing.InfluxDBPublisher: Unable to 
publish metrics due to error: Response code: 401. Reason: "unable to parse 
authentication credentials”” (e.g. Nexmark, Tpcds).

Is it a known issue?

—
Alexey

> On 27 Nov 2023, at 17:16, Yi Hu via dev  wrote:
> 
> Hi all,
> 
> Just another update that we have shut down all precommit tests on Jenkins.
> 
> For now, the remaining tests running on Jenkins are PostCommit suites. They 
> are also exercised on GitHub Actions, but cannot be triggered from a pull 
> request due to [1]. That said, once the blocker [1] is resolved, we can fully 
> shut down the Jenkins server. For now, I plan to stop half of the Jenkins 
> worker nodes since developers will no longer see Jenkins jobs under their 
> pull requests unless triggered manually.
> 
> [1] https://github.com/apache/beam/issues/28909
> 
> Regards,
> Yi
> 



Re: Embeddings generation in MLTransform

2023-12-05 Thread Alexey Romanenko
You need to send a blank email to dev-unsubscr...@beam.apache.org 


—
Alexey


> On 5 Dec 2023, at 11:57, Divya Sanghi  wrote:
> 
> Can someone suggest how to unsubscribe?
> 
> On Mon, Oct 30, 2023 at 7:33 PM Anand Inguva via dev wrote:
>> Hi all,
>> 
>> In Apache Beam 2.50.0 Python SDK, we added MLTransform, which is used to 
>> pre/post process data using common ML operations. Now, we are planning to 
>> generate embeddings with ML models using MLTransform. 
>> 
>> I have created a doc on how we can do this. Please go through the doc if 
>> interested and let me know of any feedback. 
>> 
>> Thanks,
>> Anand
>> 
>> Doc: 
>> https://docs.google.com/document/d/1En4bfbTu4rvu7LWJIKV3G33jO-xJfTdbaSFSURmQw_s/edit#heading=h.wskna8eurvjv



Re: Implementing tuple type support in for ClickHouse connector

2023-12-04 Thread Alexey Romanenko
Did you by chance take a look at 
org.apache.beam.sdk.schemas.Schema.LogicalType? Could it be helpful for your case?
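
To illustrate the nesting problem, here is a rough sketch in plain Python (not Beam or ClickHouse connector code). It parses a ClickHouse Tuple type string recursively into a tree; each node could then be mapped to a nested, row-like schema type. The `ColumnType` class and both helper functions are hypothetical names made up for this example, and named tuple elements (e.g. `Tuple(a String, b Int32)`) are not handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnType:
    """Hypothetical stand-in for a connector-side column type (not a Beam class)."""
    name: str  # e.g. "String", "Int32", "Tuple"
    elements: List["ColumnType"] = field(default_factory=list)


def split_top_level(args: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in args:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_type(text: str) -> ColumnType:
    """Parse a type string; Tuple(...) becomes a node with one child per element."""
    text = text.strip()
    if text.startswith("Tuple(") and text.endswith(")"):
        inner = text[len("Tuple("):-1]
        return ColumnType("Tuple", [parse_type(p) for p in split_top_level(inner)])
    # Any other type is kept as a leaf, parameters included (e.g. "Decimal(10, 2)").
    return ColumnType(text)
```

For example, `parse_type("Tuple(String, Tuple(Int32, Float64))")` yields a Tuple node whose second element is itself a Tuple node, which is the shape a nested row field would need to mirror.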

> On 4 Dec 2023, at 12:02, Mark Zitnik  wrote:
> 
> Yes I know it is done in  org.apache.beam.sdk.io.clickhouse.TableSchema (Did 
> it for several other types), but since Tuple is a nested type that can hold 
> any number of other ClickHouse types I was wondering what is the best type 
> from the Apache Beam side in order to implement it.
> 
> Mark  
> 
> On Mon, Dec 4, 2023 at 12:24 PM Alexey Romanenko <aromanenko@gmail.com> wrote:
>> Hi Mark,
>> 
>> What do you mean by “support” in this case? To map this ClickHouse data type 
>> to a Beam Schema data type as it’s done in 
>> org.apache.beam.sdk.io.clickhouse.TableSchema for other types or something 
>> else?
>> 
>> —
>> Alexey
>> 
>>> On 3 Dec 2023, at 10:35, Mark Zitnik <m...@clickhouse.com> wrote:
>>> 
>>> Hi Team,
>>> 
>>> I am one of the committers of the ClickHouse integration team.
>>> I need to add support for Tuple in the ClickHouse connector for Apache 
>>> Beam. What is the best approach for implementing that? 
>>> Tuple (https://clickhouse.com/docs/en/sql-reference/data-types/tuple) is a 
>>> nested data type 
>>> (https://clickhouse.com/docs/en/sql-reference/data-types#data_types).
>>> If you can point me to a reference in other connectors, that would help.
>>> 
>>> Thanks
>>> -MZ
>>> 
>>>  
>> 



Re: Implementing tuple type support in for ClickHouse connector

2023-12-04 Thread Alexey Romanenko
Hi Mark,

What do you mean by “support” in this case? To map this ClickHouse data type to 
a Beam Schema data type as it’s done in 
org.apache.beam.sdk.io.clickhouse.TableSchema for other types or something else?

—
Alexey

> On 3 Dec 2023, at 10:35, Mark Zitnik  wrote:
> 
> Hi Team,
> 
> I am one of the committers of the ClickHouse integration team.
> I need to add support for Tuple in the ClickHouse connector for Apache Beam. 
> What is the best approach for implementing that? 
> Tuple (https://clickhouse.com/docs/en/sql-reference/data-types/tuple) is a 
> nested data type 
> (https://clickhouse.com/docs/en/sql-reference/data-types#data_types).
> If you can point me to a reference in other connectors, that would help.
> 
> Thanks
> -MZ
> 
>  



Re: Upgrading Avro dependencies

2023-11-15 Thread Alexey Romanenko
As I mentioned before, we can always generate version-dependent Avro classes by 
running “org.apache.avro.tool.Main” directly with a “JavaExec” Gradle task.

Please, see this implementation in Avro extension: 
https://github.com/apache/beam/blob/c713425e1ac2cdc3ec2ec264c9bf61f7356856bd/sdks/java/extensions/avro/build.gradle#L135

In this case, we don’t depend on any third-party plugin, which iirc can only be 
run with one Avro version for the whole project.
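
As a rough illustration of the idea (a sketch in Python rather than Gradle, so not the actual build.gradle code), generating classes per Avro version comes down to invoking the same tool main class once per version, with that version's avro-tools jar on the classpath. The jar directory, schema directory and output layout below are hypothetical.

```python
from typing import Dict, List

AVRO_TOOL_MAIN = "org.apache.avro.tool.Main"


def avro_compile_commands(
    versions: List[str], schema_dir: str, output_root: str, jar_dir: str
) -> Dict[str, List[str]]:
    """Build one 'compile schema' command line per Avro version.

    Paths are illustrative assumptions; each version gets its own classpath
    and its own output directory so generated sources never mix.
    """
    commands = {}
    for version in versions:
        classpath = f"{jar_dir}/avro-tools-{version}.jar"
        output_dir = f"{output_root}/avro{version.replace('.', '')}"
        commands[version] = [
            "java", "-cp", classpath, AVRO_TOOL_MAIN,
            "compile", "schema", schema_dir, output_dir,
        ]
    return commands
```

Because every version has its own classpath, the generated code always matches the Avro version it will be compiled and tested against, which is exactly what a single plugin instance cannot guarantee.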

—
Alexey 


 
> On 15 Nov 2023, at 17:14, John Casey  wrote:
> 
> Alright, it looks like something was broken with my setup. I think a 
> straightforward upgrade is actually possible, so I'm going to continue on that.
> 
> On Wed, Nov 15, 2023 at 10:17 AM John Casey <theotherj...@google.com> wrote:
>> So, that's the thing. I've upgraded to 1.11.3, but the plugin still seems to 
>> be generating Avro code that isn't compatible with 1.11.3.
>> 
>> Looking at the PR https://github.com/apache/beam/pull/29390, it looks like 
>> it generates avro code based on 1.9.2, which ends up being incompatible. 
>> 
>> I don't think we can continue to support every avro version with our current 
>> setup.
>> 
>> John
>> 
>> On Tue, Nov 14, 2023 at 4:20 PM Alexey Romanenko <aromanenko@gmail.com> wrote:
>>> Thanks! Please, let me know if you need any help on this.
>>> 
>>> —
>>> Alexey
>>> 
>>>> On 14 Nov 2023, at 17:52, John Casey <theotherj...@google.com> wrote:
>>>> 
>>>> The vulnerability said to upgrade to 1.11.3, so I think that would be my 
>>>> starting point.
>>>> 
>>>> 
>>>> On Mon, Nov 13, 2023 at 12:23 PM Alexey Romanenko <aromanenko@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>>> On 10 Nov 2023, at 19:23, John Casey <theotherj...@google.com> wrote:
>>>>>> 
>>>>>> I guess I'm a bit confused as to why specifically generateTestAvroJava 
>>>>>> seems to use the wrong version. I see our version-specific generated 
>>>>>> code, but this action appears to be inherited from the plugin, and is 
>>>>>> configured with whichever avro version is provided. Given that I tried 
>>>>>> to just change to 1.11.3, I'm confused as to why it's generating invalid 
>>>>>> java files for the provided avro version.
>>>>>> 
>>>>>> Unlike the classes generated out of the JavaExec you referenced, this 
>>>>>> appears to only generate one version of the files.
>>>>> 
>>>>> It was supposed to generate files with a specific Avro version every time 
>>>>> to run the same tests against this specific Avro version. 
>>>>> 
>>>>>> It may be that we don't need this action, but it still seems to run, as 
>>>>>> we depend on it in the applyAvroNature() action.
>>>>> 
>>>>> I've started to wonder whether we really still need this action.
>>>>> 
>>>>>> We could remove this entirely. The java exec only generates versions for 
>>>>>> pre-configured test versions anyways
>>>>> 
>>>>> Right. The question is in how many places in Beam we need to generate 
>>>>> these files, and which version(s) of Avro to use.
>>>>> 
>>>>> —
>>>>> Alexey
>>>>> 
>>>>>> 
>>>>>> On Fri, Nov 10, 2023 at 12:53 PM Alexey Romanenko <aromanenko@gmail.com> wrote:
>>>>>>> Hi John,
>>>>>>> 
>>>>>>> This old Avro version in Beam is a very long story. Briefly, since 
>>>>>>> initially it was tightly integrated into the Java SDK “core” module, it 
>>>>>>> was not possible to upgrade the Avro version without breaking changes 
>>>>>>> for users (because of some incompatible Avro changes, as you have 
>>>>>>> noticed before). So, we decided to extract Avro-related classes from 
>>>>>>> Beam “core” to a dedicated Avro extension [2] that supports and 
>>>>>>> actually is tested with different Avro versions. More details on this 
>>>>>>> work are here [1].
>>>>>>> 
>>>>>>> Regarding auto-generated classes. Initially, we used a Gradle plugin 
>>

Re: Upgrading Avro dependencies

2023-11-14 Thread Alexey Romanenko
Thanks! Please, let me know if you need any help on this.

—
Alexey

> On 14 Nov 2023, at 17:52, John Casey  wrote:
> 
> The vulnerability said to upgrade to 1.11.3, so I think that would be my 
> starting point.
> 
> 
> On Mon, Nov 13, 2023 at 12:23 PM Alexey Romanenko <aromanenko@gmail.com> wrote:
>> 
>> 
>>> On 10 Nov 2023, at 19:23, John Casey <theotherj...@google.com> wrote:
>>> 
>>> I guess I'm a bit confused as to why specifically generateTestAvroJava 
>>> seems to use the wrong version. I see our version-specific generated code, 
>>> but this action appears to be inherited from the plugin, and is configured 
>>> with whichever avro version is provided. Given that I tried to just change 
>>> to 1.11.3, I'm confused as to why it's generating invalid java files for the 
>>> provided avro version.
>>> 
>>> Unlike the classes generated out of the JavaExec you referenced, this 
>>> appears to only generate one version of the files.
>> 
>> It was supposed to generate files with a specific Avro version every time to 
>> run the same tests against this specific Avro version. 
>> 
>>> It may be that we don't need this action, but it still seems to run, as we 
>>> depend on it in the applyAvroNature() action.
>> 
>> I've started to wonder whether we really still need this action.
>> 
>>> We could remove this entirely. The java exec only generates versions for 
>>> pre-configured test versions anyways
>> 
>> Right. The question is in how many places in Beam we need to generate these 
>> files, and which version(s) of Avro to use.
>> 
>> —
>> Alexey
>> 
>>> 
>>> On Fri, Nov 10, 2023 at 12:53 PM Alexey Romanenko <aromanenko@gmail.com> wrote:
>>>> Hi John,
>>>> 
>>>> This old Avro version in Beam is a very long story. Briefly, since 
>>>> initially it was tightly integrated into the Java SDK “core” module, it 
>>>> was not possible to upgrade the Avro version without breaking changes for 
>>>> users (because of some incompatible Avro changes, as you have noticed 
>>>> before). So, we decided to extract Avro-related classes from Beam “core” 
>>>> to a dedicated Avro extension [2] that supports and actually is tested 
>>>> with different Avro versions. More details on this work are here [1].
>>>> 
>>>> Regarding auto-generated classes. Initially, we used a Gradle plugin for 
>>>> that but it’s limited to only one Avro version per instance of this 
>>>> plugin, so it was not possible to generate these classes with different 
>>>> Avro versions. So, we do this with a special Gradle task (“JavaExec”) that 
>>>> executes “org.apache.avro.tool.Main” and generates Avro classes for every 
>>>> tested Avro version [3].
>>>> 
>>>> We still keep an old Avro version, 1.8.2, as the default dependency version 
>>>> but it will be overridden if users have a more recent one as a project 
>>>> dependency in their classpath.
>>>> 
>>>> I think we need to completely remove the Avro Gradle plugin (use a 
>>>> “JavaExec” task to generate Avro classes with a provided Avro version 
>>>> instead) and update the default Avro version to a more recent one since 
>>>> now it’s not part of Java “core”.
>>>> 
>>>> Any thoughts?
>>>> 
>>>> —
>>>> Alexey
>>>>  
>>>> 
>>>> [1] https://github.com/apache/beam/issues/24292
>>>> [2] https://github.com/apache/beam/tree/master/sdks/java/extensions/avro
>>>> [3] 
>>>> https://github.com/apache/beam/blob/c713425e1ac2cdc3ec2ec264c9bf61f7356856bd/sdks/java/extensions/avro/build.gradle#L135
>>>> 
>>>> 
>>>> 
>>>>> On 10 Nov 2023, at 18:05, John Casey via dev <dev@beam.apache.org> wrote:
>>>>> 
>>>>> Hi All,
>>>>> 
>>>>> There was a CVE detected in Avro 1.8.2 (CVE-2023-39410), so I'm trying to 
>>>>> upgrade to avro 1.11.3.
>>>>> 
>>>>> Unfortunately, it seems that our auto-generated Avro test classes aren't 
>>>>> being generated properly with this new version. I've updated our avro 
>>>>> generation plugin as well, but for whatever reason, it seems that the 
>>>>> generated AvroTest file is being generated with references to classes 
>>>>> that did exist in 1.8.2, but no longer exist in 1.11.3.
>>>>> 
>>>>> It seems like our autogeneration is being run with the wrong avro 
>>>>> version, but I can't seem to find where that would be configured.
>>>>> 
>>>>> Here is the PR with my changes so far: 
>>>>> https://github.com/apache/beam/pull/29390
>>>>> 
>>>>> Is anyone familiar with what might be misconfigured here?
>>>>> 
>>>>> John
>>>> 
>> 



Re: Upgrading Avro dependencies

2023-11-13 Thread Alexey Romanenko


> On 10 Nov 2023, at 19:23, John Casey  wrote:
> 
> I guess I'm a bit confused as to why specifically generateTestAvroJava seems 
> to use the wrong version. I see our version-specific generated code, but this 
> action appears to be inherited from the plugin, and is configured with 
> whichever avro version is provided. Given that I tried to just change to 
> 1.11.3, I'm confused as to why it's generating invalid java files for the 
> provided avro version.
> 
> Unlike the classes generated out of the JavaExec you referenced, this appears 
> to only generate one version of the files.

It was supposed to generate files with a specific Avro version every time to 
run the same tests against this specific Avro version. 

> It may be that we don't need this action, but it still seems to run, as we 
> depend on it in the applyAvroNature() action.

I've started to wonder whether we really still need this action.

> We could remove this entirely. The java exec only generates versions for 
> pre-configured test versions anyways

Right. The question is in how many places in Beam we need to generate these 
files, and which version(s) of Avro to use.

—
Alexey

> 
> On Fri, Nov 10, 2023 at 12:53 PM Alexey Romanenko <aromanenko@gmail.com> wrote:
>> Hi John,
>> 
>> This old Avro version in Beam is a very long story. Briefly, since initially 
>> it was tightly integrated into the Java SDK “core” module, it was not 
>> possible to upgrade the Avro version without breaking changes for users 
>> (because of some incompatible Avro changes, as you have noticed before). So, 
>> we decided to extract Avro-related classes from Beam “core” to a dedicated 
>> Avro extension [2] that supports and actually is tested with different Avro 
>> versions. More details on this work are here [1].
>> 
>> Regarding auto-generated classes. Initially, we used a Gradle plugin for 
>> that but it’s limited to only one Avro version per instance of this 
>> plugin, so it was not possible to generate these classes with different Avro 
>> versions. So, we do this with a special Gradle task (“JavaExec”) that 
>> executes “org.apache.avro.tool.Main” and generates Avro classes for every 
>> tested Avro version [3].
>> 
>> We still keep an old Avro version, 1.8.2, as the default dependency version 
>> but it will be overridden if users have a more recent one as a project 
>> dependency in their classpath.
>> 
>> I think we need to completely remove the Avro Gradle plugin (use a 
>> “JavaExec” task to generate Avro classes with a provided Avro version 
>> instead) and update the default Avro version to a more recent one since now 
>> it’s not part of Java “core”.
>> 
>> Any thoughts?
>> 
>> —
>> Alexey
>>  
>> 
>> [1] https://github.com/apache/beam/issues/24292
>> [2] https://github.com/apache/beam/tree/master/sdks/java/extensions/avro
>> [3] 
>> https://github.com/apache/beam/blob/c713425e1ac2cdc3ec2ec264c9bf61f7356856bd/sdks/java/extensions/avro/build.gradle#L135
>> 
>> 
>> 
>>> On 10 Nov 2023, at 18:05, John Casey via dev <dev@beam.apache.org> wrote:
>>> 
>>> Hi All,
>>> 
>>> There was a CVE detected in Avro 1.8.2 (CVE-2023-39410), so I'm trying to 
>>> upgrade to avro 1.11.3.
>>> 
>>> Unfortunately, it seems that our auto-generated Avro test classes aren't 
>>> being generated properly with this new version. I've updated our avro 
>>> generation plugin as well, but for whatever reason, it seems that the 
>>> generated AvroTest file is being generated with references to classes that 
>>> did exist in 1.8.2, but no longer exist in 1.11.3.
>>> 
>>> It seems like our autogeneration is being run with the wrong avro version, 
>>> but I can't seem to find where that would be configured.
>>> 
>>> Here is the PR with my changes so far: 
>>> https://github.com/apache/beam/pull/29390
>>> 
>>> Is anyone familiar with what might be misconfigured here?
>>> 
>>> John
>> 



Re: Adding Dead Letter Queues to Beam IOs

2023-11-13 Thread Alexey Romanenko
Thanks a lot for working on this, a long-awaited and highly demanded user 
feature.

I’ll try to take a look at the design doc in the next few days.

—
Alexey

> On 8 Nov 2023, at 21:43, John Casey via dev  wrote:
> 
> Hi All,
> 
> I've written up a design for adding DLQs to existing Beam IOs. It's been 
> through a round of reviews with some Dataflow folks at Google, but I'd 
> appreciate any comments the rest of Beam have around how to refine the design.
> 
> TL;DR: Make it easy for a user to configure IOs to route bad data to an 
> alternate sink instead of crashing the pipeline or having the record be 
> retried indefinitely.
> 
> https://docs.google.com/document/d/1NGeCk6tOqF-TiGEAV7ixd_vhIiWz9sHPlCa1P_77Ajs/edit?usp=sharing
> 
> Thanks!
> 
> John
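
The TL;DR above can be illustrated with a minimal sketch in plain Python (not the Beam API and not the design from the doc). The names `write_with_dead_letters` and `write_one` are hypothetical, and treating `ValueError` as the "bad data" signal is an assumption of this example; any other exception still propagates and fails the run.

```python
from typing import Any, Callable, Iterable, List, Tuple


def write_with_dead_letters(
    records: Iterable[Any], write_one: Callable[[Any], None]
) -> Tuple[List[Any], List[Tuple[Any, str]]]:
    """Write each record; route records that fail validation to a dead-letter list.

    Returns (written, dead_letters), where each dead letter keeps the original
    record together with the error message, so it can be inspected or replayed.
    """
    written, dead_letters = [], []
    for record in records:
        try:
            write_one(record)
            written.append(record)
        except ValueError as err:
            # Bad data: neither crash the whole run nor retry forever.
            dead_letters.append((record, str(err)))
    return written, dead_letters
```

In a real pipeline the dead-letter list would be an alternate sink (a table, topic or file) rather than an in-memory list.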



Re: Upgrading Avro dependencies

2023-11-10 Thread Alexey Romanenko
Hi John,

This old Avro version in Beam is a very long story. Briefly, since initially it 
was tightly integrated into the Java SDK “core” module, it was not possible to 
upgrade the Avro version without breaking changes for users (because of some 
incompatible Avro changes, as you have noticed before). So, we decided to 
extract Avro-related classes from Beam “core” to a dedicated Avro extension [2] 
that supports and actually is tested with different Avro versions. More details 
on this work are here [1].

Regarding auto-generated classes. Initially, we used a Gradle plugin for that 
but it’s limited to only one Avro version per instance of this plugin, so it 
was not possible to generate these classes with different Avro versions. So, we 
do this with a special Gradle task (“JavaExec”) that executes 
“org.apache.avro.tool.Main” and generates Avro classes for every tested Avro 
version [3].

We still keep an old Avro version, 1.8.2, as the default dependency version but 
it will be overridden if users have a more recent one as a project dependency 
in their classpath.

I think we need to completely remove the Avro Gradle plugin (use a “JavaExec” 
task to generate Avro classes with a provided Avro version instead) and update 
the default Avro version to a more recent one since now it’s not part of Java 
“core”.

Any thoughts?

—
Alexey
 

[1] https://github.com/apache/beam/issues/24292
[2] https://github.com/apache/beam/tree/master/sdks/java/extensions/avro
[3] 
https://github.com/apache/beam/blob/c713425e1ac2cdc3ec2ec264c9bf61f7356856bd/sdks/java/extensions/avro/build.gradle#L135



> On 10 Nov 2023, at 18:05, John Casey via dev  wrote:
> 
> Hi All,
> 
> There was a CVE detected in Avro 1.8.2 (CVE-2023-39410), so I'm trying to 
> upgrade to avro 1.11.3.
> 
> Unfortunately, it seems that our auto-generated Avro test classes aren't 
> being generated properly with this new version. I've updated our avro 
> generation plugin as well, but for whatever reason, it seems that the 
> generated AvroTest file is being generated with references to classes that 
> did exist in 1.8.2, but no longer exist in 1.11.3.
> 
> It seems like our autogeneration is being run with the wrong avro version, 
> but I can't seem to find where that would be configured.
> 
> Here is the PR with my changes so far: 
> https://github.com/apache/beam/pull/29390
> 
> Is anyone familiar with what might be misconfigured here?
> 
> John



Re: [VOTE] Release 2.52.0, release candidate #3

2023-11-10 Thread Alexey Romanenko
+1 (binding)

Java SDK with Spark runner

—
Alexey

> On 9 Nov 2023, at 16:44, Ritesh Ghorse via dev  wrote:
> 
> +1 (non-binding)
> 
> Validated Python SDK quickstart batch and streaming.
> 
> Thanks!
> 
> On Thu, Nov 9, 2023 at 9:25 AM Jan Lukavský wrote:
>> +1 (binding)
>> 
>> Validated Java SDK with Flink runner on own use cases.
>> 
>>  Jan
>> 
>> On 11/9/23 03:31, Danny McCormick via dev wrote:
>>> Hi everyone,
>>> Please review and vote on the release candidate #3 for the version 2.52.0, 
>>> as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>> 
>>> 
>>> Reviewers are encouraged to test their own use cases with the release 
>>> candidate, and vote +1 if no issues are found. Only PMC member votes will 
>>> count towards the final vote, but votes from all community members are 
>>> encouraged and helpful for finding regressions; you can either test your 
>>> own use cases or use cases from the validation sheet [10].
>>> 
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1]
>>> * the official Apache source release to be deployed to dist.apache.org 
>>>   [2], which is signed with the key with fingerprint D20316F712213422 [3]
>>> * all artifacts to be deployed to the Maven Central Repository [4]
>>> * source code tag "v2.52.0-RC3" [5]
>>> * website pull request listing the release [6], the blog post [6], and 
>>>   publishing the API reference manual [7]
>>> * Python artifacts are deployed along with the source release to the 
>>>   dist.apache.org [2] and PyPI [8].
>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>> * Validation sheet with a tab for 2.52.0 release to help with validation [10]
>>> * Docker images published to Docker Hub [11]
>>> * PR to run tests against release branch [12]
>>> 
>>> The vote will be open for at least 72 hours. It is adopted by majority 
>>> approval, with at least 3 PMC affirmative votes.
>>> 
>>> For guidelines on how to try the release in your projects, check out our 
>>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>>> 
>>> Thanks,
>>> Danny
>>> 
>>> [1] https://github.com/apache/beam/milestone/16
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4] https://repository.apache.org/content/repositories/orgapachebeam-1361/
>>> [5] https://github.com/apache/beam/tree/v2.52.0-RC3
>>> [6] https://github.com/apache/beam/pull/29331
>>> [7] https://github.com/apache/beam-site/pull/653
>>> [8] https://pypi.org/project/apache-beam/2.52.0rc2/
>>> [9] 
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC3/go/pkg/beam
>>> [10] 
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
>>> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>> [12] https://github.com/apache/beam/pull/29319



Re: Disabling Jenkins Jobs

2023-11-07 Thread Alexey Romanenko
Danny, Yi,

Thank you for taking care of this!

—
Alexey

> On 7 Nov 2023, at 17:10, Yi Hu via dev  wrote:
> 
> Hi Alexey,
> 
> > all Jenkins jobs are stuck and there is a big Build Queue on 
> > https://ci-beam.apache.org/
> 
> This is not intentional. It is likely that INFRA's routine Jenkins upgrade 
> on Nov 5 caused this outage. I have created 
> https://issues.apache.org/jira/projects/INFRA/issues/INFRA-25147?filter=allopenissues
>  
> 
> Regards,
> Yi
> 
> On Tue, Nov 7, 2023 at 10:35 AM Danny McCormick via dev <dev@beam.apache.org> wrote:
>> I don't think it's related. I noticed the problem half an hour ago; it seems 
>> there's an expired cert on the Jenkins machines. I'm hoping 
>> https://github.com/apache/beam/actions/runs/6786537134/job/18447281366 will 
>> fix this since the IO-Datastores cert is the problematic piece I think (and 
>> that has fixed a similar problem before). I'm not totally confident it will 
>> though given that the job succeeded last week.
>> 
>> On Tue, Nov 7, 2023 at 10:18 AM Alexey Romanenko <aromanenko@gmail.com> wrote:
>>> Not sure if it’s related, but it seems that all Jenkins jobs are stuck 
>>> and there is a big Build Queue on https://ci-beam.apache.org/
>>> 
>>> Random clicks on jobs show the “All nodes of label ‘beam’ are offline” 
>>> message.
>>> 
>>> Is it a known problem?
>>> 
>>> —
>>> Alexey
>>> 
>>>> On 24 Oct 2023, at 21:50, Yi Hu via dev <dev@beam.apache.org> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> We have shut down most tests in the Jenkins Load Tests and Performance 
>>>> Tests categories [1, 2], as they have been migrated to GitHub Actions for 
>>>> a while and are continuously publishing the metrics as expected. Please 
>>>> refer to https://github.com/apache/beam/pull/29092 for these tests. Note 
>>>> that pull requests mostly do not involve these tests, so this does not 
>>>> affect the development and release process.
>>>> 
>>>> For the queueing issue mentioned before, after the self-hosted runners 
>>>> switched back from GitHub webhook scaling to load-based scaling, it is 
>>>> back to being stable. The issue was likely due to the webhook scaling on 
>>>> the GitHub side.
>>>> 
>>>> Regards,
>>>> Yi
>>>> 
>>>> [1] https://ci-beam.apache.org/view/LoadTests/
>>>> [2] https://ci-beam.apache.org/view/PerformanceTests/
>>>> 
>>>> 
>>>> 
>>> 



Re: Disabling Jenkins Jobs

2023-11-07 Thread Alexey Romanenko
Not sure if it’s related, but it seems that all Jenkins jobs are stuck and 
there is a big Build Queue on https://ci-beam.apache.org/

Random clicks on jobs show the “All nodes of label ‘beam’ are offline” 
message.

Is it a known problem?

—
Alexey

> On 24 Oct 2023, at 21:50, Yi Hu via dev  wrote:
> 
> Hi all,
> 
> We have shut down most tests in the Jenkins Load Tests and Performance Tests 
> categories [1, 2], as they have been migrated to GitHub Actions for a while 
> and are continuously publishing the metrics as expected. Please refer to 
> https://github.com/apache/beam/pull/29092 for these tests. Note that pull 
> requests mostly do not involve these tests, so this does not affect the 
> development and release process.
> 
> For the queueing issue mentioned before, after the self-hosted runners 
> switched back from GitHub webhook scaling to load-based scaling, it is back 
> to being stable. The issue was likely due to the webhook scaling on the 
> GitHub side.
> 
> Regards,
> Yi
> 
> [1] https://ci-beam.apache.org/view/LoadTests/
> [2] https://ci-beam.apache.org/view/PerformanceTests/
> 
> 
> 



Re: Processing time watermarks in KinesisIO

2023-10-27 Thread Alexey Romanenko
Ahh, ok, I see.

Yes, it looks like a bug. So, I'd propose to deprecate the old “processing 
time” watermark policy, which we can remove later, and create a new, fixed one.

PS: It’s recommended to use “org.apache.beam.sdk.io.aws2.kinesis.KinesisIO” 
instead of the deprecated “org.apache.beam.sdk.io.kinesis.KinesisIO”.
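
To make the problem concrete, here is a small simulation in plain Python (not KinesisIO code). The function name is made up, and the "maximum ingestion time seen so far" watermark is a deliberately simplified stand-in for an arrival-time policy. It shows how far each element's timestamp lags behind the watermark at the moment the element is read from a backlog.

```python
from typing import List


def lateness_per_record(
    ingestion_times: List[int],
    read_times: List[int],
    processing_time_watermark: bool,
) -> List[int]:
    """Return (watermark - event timestamp) for each record when it is read.

    Event timestamps are always the ingestion times. The watermark is either
    the wall-clock read time (processing-time policy) or, in this simplified
    model, the largest ingestion time seen so far.
    """
    result = []
    max_seen = float("-inf")
    for ingested, read_at in zip(ingestion_times, read_times):
        max_seen = max(max_seen, ingested)
        watermark = read_at if processing_time_watermark else max_seen
        result.append(watermark - ingested)
    return result
```

With a backlog ingested at times 0, 10, 20 but read at times 1000, 1001, 1002, the processing-time watermark puts every element roughly a thousand time units behind the watermark, and that gap grows with the age of the backlog. Once it exceeds the allowed lateness, elements get dropped. With timestamps and watermark derived from the same clock, the gap stays at zero for in-order data.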

—
Alexey

> On 27 Oct 2023, at 17:42, Jan Lukavský  wrote:
> 
> No, I'm referring to this [1] policy which has unexpected (and hardly 
> avoidable on the user-code side) data loss issues. The problem is that 
> assigning timestamps to elements and watermarks is completely decoupled and 
> unrelated, which I'd say is a bug.
> 
>  Jan
> 
> [1] 
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kinesis/KinesisIO.Read.html#withProcessingTimeWatermarkPolicy--
> 
> On 10/27/23 16:51, Alexey Romanenko wrote:
>> Why not just create a custom watermark policy for that? Or do you mean to 
>> make it the default policy?
>> 
>> —
>> Alexey
>> 
>>> On 27 Oct 2023, at 10:25, Jan Lukavský <je...@seznam.cz> wrote:
>>> 
>>> 
>>> Hi, 
>>> 
>>> when discussing [1] we found out that the issue is actually caused by 
>>> processing time watermarks in KinesisIO. Enabling this watermark outputs 
>>> watermarks based on current processing time, _but event timestamps are 
>>> derived from ingestion timestamp_. This can cause unbounded lateness when 
>>> processing backlog. I think this setup is error-prone and will likely cause 
>>> data loss due to dropped elements. This can be solved in two ways: 
>>> 
>>>  a) deprecate processing time watermarks, or 
>>> 
>>>  b) modify KinesisIO's watermark policy so that it assigns event timestamps 
>>> as well (the processing-time watermark policy would have to derive event 
>>> timestamps from processing-time). 
>>> 
>>> I'd prefer option b), but it might be a breaking change. Moreover, I'm not 
>>> sure if I understand the purpose of the processing-time watermark policy; it 
>>> might be essentially ill-defined from the beginning, so it might really 
>>> be better to remove it completely. There is also a related issue [2]. 
>>> 
>>> Any thoughts on this? 
>>> 
>>>  Jan 
>>> 
>>> [1] https://github.com/apache/beam/issues/25975 
>>> 
>>> [2] https://github.com/apache/beam/issues/28760 
>>> 
>> 



Re: Processing time watermarks in KinesisIO

2023-10-27 Thread Alexey Romanenko
Why not just create a custom watermark policy for that? Or do you mean to make 
it the default policy?

—
Alexey

> On 27 Oct 2023, at 10:25, Jan Lukavský  wrote:
> 
> 
> Hi, 
> 
> when discussing [1] we found out that the issue is actually caused by 
> processing time watermarks in KinesisIO. Enabling this watermark outputs 
> watermarks based on current processing time, _but event timestamps are 
> derived from ingestion timestamp_. This can cause unbounded lateness when 
> processing backlog. I think this setup is error-prone and will likely cause 
> data loss due to dropped elements. This can be solved in two ways: 
> 
>  a) deprecate processing time watermarks, or 
> 
>  b) modify KinesisIO's watermark policy so that it assigns event timestamps 
> as well (the processing-time watermark policy would have to derive event 
> timestamps from processing-time). 
> 
> I'd prefer option b), but it might be a breaking change. Moreover, I'm not 
> sure if I understand the purpose of the processing-time watermark policy; it 
> might be essentially ill-defined from the beginning, so it might really be 
> better to remove it completely. There is also a related issue [2]. 
> 
> Any thoughts on this? 
> 
>  Jan 
> 
> [1] https://github.com/apache/beam/issues/25975 
> 
> [2] https://github.com/apache/beam/issues/28760 
> 



Re: [NOTICE] Deprecation Avro classes in "core" and use "extensions/avro" instead for Java SDK

2023-10-18 Thread Alexey Romanenko
Heads up!

Finally, all Avro-related code and the Avro dependency that were deprecated 
before (see a message above) have been removed from the Beam Java SDK “core” 
module [1]. We believe that a sufficient number of Beam releases (six!) have 
passed since this code was deprecated, and that users had an opportunity to 
switch to the new Avro extension as was recommended before.

We did our best to make this transition as smooth as possible but, please, let 
me know if you find any failed tests or any other strange behavior because of 
this change.

Thanks,
Alexey


[1] https://github.com/apache/beam/pull/27851/


> On 22 Feb 2023, at 20:21, Robert Bradshaw via dev  wrote:
> 
> Thanks for pushing this through!
> 
> On Wed, Feb 22, 2023 at 10:38 AM Alexey Romanenko
>  wrote:
>> 
>> Hi all,
>> 
>> As part of migrating the Avro-related classes from the Java SDK “core” module 
>> to a dedicated extension [1] (as discussed here [2] and here [3]), 
>> two important PRs have been merged [4][5]. Therefore, the old Avro-related 
>> classes are now deprecated in “core” (still possible to use but not 
>> recommended) and all other Beam modules that depended on them have switched 
>> to “extensions/avro” instead.
>> 
>> We did our best to make this change smooth, compatible and non-breaking but, 
>> since this was one of the oldest parts of “core”, anything is, 
>> unfortunately, possible and we may have missed something despite 
>> all efforts. So I’d like to ask the community to run any kind 
>> of tests or pipelines that use, for example, AvroCoder or AvroUtils or 
>> any other Avro-related classes and check that the new changes don’t break 
>> anything and everything works as expected.
>> 
>> —
>> Alexey
>> 
>> [1] https://github.com/apache/beam/issues/24292
>> [2] https://lists.apache.org/thread/mz8hvz8dwhd0tzmv2lyobhlz7gtg4gq7
>> [3] https://lists.apache.org/thread/47oz1mlwj0orvo1458v5pw5c20bwt08q
>> [4] https://github.com/apache/beam/pull/24992
>> [5] https://github.com/apache/beam/pull/25534
>> 
>> 



Re: Disabling Jenkins Jobs

2023-10-16 Thread Alexey Romanenko
Thanks for moving this forward!

On the flip side, I noticed that many PreCommit actions (actually all for my PR 
#27851) are stuck waiting for a runner. 
Is this expected behaviour? 
Should we increase the number of runners while moving all Jenkins jobs to GitHub 
Actions?

—
Alexey

> On 13 Oct 2023, at 17:09, Yi Hu via dev  wrote:
> 
> Hi all,
> 
> We have shut down another set of Jenkins PreCommit jobs; see the newly shut 
> down ones in https://github.com/apache/beam/pull/28840 . The complete list of 
> fully migrated tests is tracked in https://github.com/apache/beam/issues/28426 
> 
> Regards,
> Yi
> 
> On Thu, Sep 14, 2023 at 3:41 PM Yi Hu  > wrote:
>> Hi all,
>> 
>> We just shutdown a second list of Jenkins test suites. The full list is 
>> tracked in https://github.com/apache/beam/issues/28426 . In the Issue there 
>> is also information about the status of migration as we proceed.
>> 
>> Regards,
>> Yi
>> 
>> On Thu, Sep 14, 2023 at 12:44 PM Danny McCormick via dev 
>> <dev@beam.apache.org> wrote:
>>> Sure, I added 
>>> https://cwiki.apache.org/confluence/display/BEAM/GitHub+Actions+Tips
>>> 
>>> On Thu, Sep 14, 2023 at 12:39 PM Ahmet Altay wrote:
>>>> That is great. Could we add a link to that README from 
>>>> https://cwiki.apache.org/confluence/display/BEAM/Developer+Guides ? That 
>>>> will increase discoverability for people like me who use wiki as the 
>>>> starting point for finding how to do things.
>>>> 
>>>> On Thu, Sep 14, 2023 at 8:41 AM Danny McCormick wrote:
>>>>> Most of our docs for GitHub actions are located here - 
>>>>> https://github.com/apache/beam/blob/master/.github/workflows/README.md. I 
>>>>> added https://github.com/apache/beam/pull/28453 to add instructions to 
>>>>> that page.
>>>>> 
>>>>> Thanks,
>>>>> Danny
>>>>> 
>>>>> On Wed, Sep 13, 2023 at 12:31 PM Ahmet Altay via dev wrote:
>>>>>> This is all great. Do you mind documenting the github actions flow for 
>>>>>> running these jobs on the wiki? Or if it already exists, share a link. 
>>>>>> Thank you!
>>>>>> 
>>>>>> On Wed, Sep 13, 2023 at 6:19 AM XQ Hu via dev wrote:
>>>>>>> This is awesome! Thanks, Danny, Yi, Andrey, and Vlado!
>>>>>>> 
>>>>>>> On Wed, Sep 13, 2023 at 8:08 AM Danny McCormick via dev 
>>>>>>> <dev@beam.apache.org> wrote:
>>>>>>>> Right now, it is just the set of jobs in Yi's PR - 
>>>>>>>> https://github.com/apache/beam/pull/28316/files but the plan is to 
>>>>>>>> slowly move jobs over time once we've built confidence that they work.
>>>>>>>> 
>>>>>>>> There is a set of jobs that are non-idempotent which we are moving in 
>>>>>>>> one shot (which does include the website publishing job in this PR 
>>>>>>>> that I just merged) and 
>>>>>>>> monitoring closely. The best source of truth is just looking at 
>>>>>>>> https://ci-beam.apache.org/ though. Any migrated jobs will have the 
>>>>>>>> disabled symbol next to them - for example, the whitespace jobs have 
>>>>>>>> now been migrated so they are disabled on Jenkins:
>>>>>>>> 
>>>>>>>> To run a job that has been migrated manually, you can navigate to that 
>>>>>>>> job in the Actions tab and click "run workflow". So for the website 
>>>>>>>> publish job you would navigate to 
>>>>>>>> https://github.com/apache/beam/actions/workflows/beam_PostCommit_Website_Publish.yml
>>>>>>>> and click this button:
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Danny
>>>>>>>> 
>>>>>>>> On Tue, Sep 12, 2023 at 6:22 PM Ahmet Altay via dev 
>>>>>>>> <dev@beam.apache.org> wrote:
>>>>>>>>> Thank you for doing this.
>>>>>>>>> 
>>>>>>>>> Is there a list of jobs that will be disabled? I am particularly 
>>>>>>>>> curious about: website publishing job (which I need to use manually 
>>>>>>>>> sometimes) and the job that publishes daily staging builds (which we 
>>>>>>>>> share with users sometimes.)
>>>>>>>>> 
>>>>>>>>> Thank you.
>>>>>>>>> Ahmet
>>>>>>>>> 
>>>>>>>>> On Tue, Sep 12, 2023 at 11:14 AM Danny McCormick via dev 
>>>>>>>>> <dev@beam.apache.org> wrote:
>>>>>>>>>> Hey everyone, I wanted to let you know that as part of the migration 
>>>>>>>>>> from Jenkins to GitHub Actions we are going to start disabling 
>>>>>>>>>> Jenkins jobs if they have a corresponding GitHub Actions job that 
>>>>>>>>>> has been running successfully for a while. We are starting with Yi's 
>>>>>>>>>> PR here - https://github.com/apache/beam/pull/28316. This is the 
>>>>>>>>>> next step in the process we kicked off last year [1] now that 
>>>>>>>>>> self-hosted runners have been in place and working for a while [2].
>>>>>>>>>> 
>>>>>>>>>> We will not migrate jobs until we've confirmed we have parity with 
>>>>>>>>>> the existing Jenkins implementations (for example, some jobs are 
>>>>>>>>>> still missing test publishing and we won't 

Re: [DISCUSS] Drop Euphoria extension

2023-10-16 Thread Alexey Romanenko
Can we just deprecate it for a while and then remove completely?

—
Alexey

> On 13 Oct 2023, at 18:59, Jan Lukavský  wrote:
> 
> Hi,
> 
> it has been some time since Euphoria extension [1] has been adopted by Beam 
> as a possible "Java 8 API". Beam has evolved from that time a lot, the 
> current API seems actually more elegant than the original Euphoria's and last 
> but not least, it has no maintainers and no known users. If there are any 
> users, please speak up!
> 
> Otherwise I'd like to propose to drop it from codebase, I'll start a vote 
> thread during next week, if there are no objections.
> 
> Best,
> 
>  Jan
> 
> [1] https://beam.apache.org/documentation/sdks/java/euphoria/
> 



Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-06 Thread Alexey Romanenko
+1 (binding)

—
Alexey

> On 5 Oct 2023, at 18:38, Jean-Baptiste Onofré  wrote:
> 
> +1 (binding)
> 
> Thanks !
> Regards
> JB
> 
> On Tue, Oct 3, 2023 at 7:58 PM Kenneth Knowles  wrote:
>> 
>> Hi everyone,
>> 
>> Please review and vote on the release candidate #1 for the version 2.51.0, 
>> as follows:
>> 
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>> 
>> Reviewers are encouraged to test their own use cases with the release 
>> candidate, and vote +1 if no issues are found. Only PMC member votes will 
>> count towards the final vote, but votes from all community members are 
>> encouraged and helpful for finding regressions; you can either test your own 
>> use cases or use cases from the validation sheet [10].
>> 
>> The complete staging area is available for your review, which includes:
>> 
>> GitHub Release notes [1],
>> the official Apache source release to be deployed to dist.apache.org [2], 
>> which is signed with the key with fingerprint  [3],
>> all artifacts to be deployed to the Maven Central Repository [4],
>> source code tag "v1.2.3-RC3" [5],
>> website pull request listing the release [6], the blog post [6], and 
>> publishing the API reference manual [7].
>> Java artifacts were built with Gradle GRADLE_VERSION and OpenJDK/Oracle JDK 
>> JDK_VERSION.
>> Python artifacts are deployed along with the source release to the 
>> dist.apache.org [2] and PyPI[8].
>> Go artifacts and documentation are available at pkg.go.dev [9]
>> Validation sheet with a tab for 1.2.3 release to help with validation [10].
>> Docker images published to Docker Hub [11].
>> PR to run tests against release branch [12].
>> 
>> The vote will be open for at least 72 hours. It is adopted by majority 
>> approval, with at least 3 PMC affirmative votes.
>> 
>> For guidelines on how to try the release in your projects, check out our 
>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>> 
>> Thanks,
>> Kenn
>> 
>> [1] https://github.com/apache/beam/milestone/15
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.51.0
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4] https://repository.apache.org/content/repositories/orgapachebeam-1356/
>> [5] https://github.com/apache/beam/tree/v2.51.0-RC1
>> [6] https://github.com/apache/beam/pull/28800
>> [7] https://github.com/apache/beam-site/pull/649
>> [8] https://pypi.org/project/apache-beam/2.51.0rc1/
>> [9] https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.51.0-RC1/go/pkg/beam
>> [10] 
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=437054928
>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>> [12] https://github.com/apache/beam/pull/28663



Re: [ANNOUNCE] New PMC Member: Alex Van Boxel

2023-10-04 Thread Alexey Romanenko
Congrats Alex, very well deserved!

—
Alexey

> On 4 Oct 2023, at 00:38, Austin Bennett  wrote:
> 
> Thanks for all you do, @Alex Van Boxel  !
> 
> On Tue, Oct 3, 2023 at 12:50 PM Ahmed Abualsaud via dev wrote:
>> Congratulations!
>> 
>> On Tue, Oct 3, 2023 at 3:48 PM Byron Ellis via dev wrote:
>>> Congrats!
>>> 
>>> On Tue, Oct 3, 2023 at 12:40 PM Danielle Syse via dev wrote:
>>>> Congratulations Alex!! Definitely well deserved!
>>>> 
>>>> On Tue, Oct 3, 2023 at 2:57 PM Ahmet Altay via dev wrote:
>>>>> Congratulations Alex! Well deserved!
>>>>> 
>>>>> On Tue, Oct 3, 2023 at 11:54 AM Ritesh Ghorse via dev 
>>>>> <dev@beam.apache.org> wrote:
>>>>>> Congratulations Alex!
>>>>>> 
>>>>>> On Tue, Oct 3, 2023 at 2:54 PM Danny McCormick via dev 
>>>>>> <dev@beam.apache.org> wrote:
>>>>>>> Congrats Alex, this is well deserved!
>>>>>>> 
>>>>>>> On Tue, Oct 3, 2023 at 2:50 PM Jack McCluskey via dev 
>>>>>>> <dev@beam.apache.org> wrote:
>>>>>>>> Congrats, Alex!
>>>>>>>> 
>>>>>>>> On Tue, Oct 3, 2023 at 2:49 PM XQ Hu via dev wrote:
>>>>>>>>> Configurations, Alex!
>>>>>>>>> 
>>>>>>>>> On Tue, Oct 3, 2023 at 2:40 PM Kenneth Knowles wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>> 
>>>>>>>>>> Please join me and the rest of the Beam PMC in welcoming Alex Van 
>>>>>>>>>> Boxel <alexvanbo...@apache.org> as 
>>>>>>>>>> our newest PMC member.
>>>>>>>>>> 
>>>>>>>>>> Alex has been with Beam since 2016, very early in the life of the 
>>>>>>>>>> project. Alex has contributed code, design ideas, and perhaps most 
>>>>>>>>>> importantly been a huge part of organizing Beam Summits, and of 
>>>>>>>>>> course presenting at them as well. Alex really brings the ASF 
>>>>>>>>>> community spirit to Beam.
>>>>>>>>>> 
>>>>>>>>>> Congratulations Alex and thanks for being a part of Apache Beam!
>>>>>>>>>> 
>>>>>>>>>> Kenn, on behalf of the Beam PMC (which now includes Alex)



Re: [ANNOUNCE] New PMC Member: Robert Burke

2023-10-04 Thread Alexey Romanenko
Congrats Robert, very well deserved!

—
Alexey

> On 4 Oct 2023, at 00:39, Austin Bennett  wrote:
> 
> Thanks for all you do @Robert Burke  !  
> 
> On Tue, Oct 3, 2023 at 12:53 PM Ahmed Abualsaud  > wrote:
>> Congrats Rebo! 
>> 
>> On 2023/10/03 18:39:47 Kenneth Knowles wrote:
>> > Hi all,
>> > 
>> > Please join me and the rest of the Beam PMC in welcoming Robert Burke
>> > <lostl...@apache.org> as our newest PMC member.
>> > 
>> > Robert has been a part of the Beam community since 2017. He is our resident
>> > Gopher, producing the Go SDK and most recently the local, portable, Prism
>> > runner. Robert has presented on Beam many times, having written not just
>> > core Beam code but quite interesting pipelines too :-)
>> > 
>> > Congratulations Robert and thanks for being a part of Apache Beam!
>> > 
>> > Kenn, on behalf of the Beam PMC (which now includes Robert)
>> > 



Re: [ANNOUNCE] New PMC Member: Valentyn Tymofieiev

2023-10-04 Thread Alexey Romanenko
Congrats Valentyn, very well deserved!

—
Alexey

> On 4 Oct 2023, at 00:39, Austin Bennett  wrote:
> 
> Thanks for everything @Valentyn Tymofieiev  !  
> 
> On Tue, Oct 3, 2023 at 12:53 PM Ahmed Abualsaud  > wrote:
>> Congrats Valentyn! 
>> 
>> On 2023/10/03 18:39:49 Kenneth Knowles wrote:
>> > Hi all,
>> > 
>> > Please join me and the rest of the Beam PMC in welcoming Valentyn
>> > Tymofieiev <tvalen...@apache.org> as our 
>> > newest PMC member.
>> > 
>> > Valentyn has been contributing to Beam since 2017. Notable highlights
>> > include his work on the Python SDK and also in our container management.
>> > Valentyn also is involved in many discussions around Beam's infrastructure
>> > and community processes. If you look through Valentyn's history, you will
>> > see an abundance of the most critical maintenance work that is the beating
>> > heart of any project.
>> > 
>> > Congratulations Valentyn and thanks for being a part of Apache Beam!
>> > 
>> > Kenn, on behalf of the Beam PMC (which now includes Valentyn)
>> > 



Re: User-facing website vs. contributor-facing website

2023-09-26 Thread Alexey Romanenko
AFAIR, the main reason to use the Wiki was its lower entry bar, compared to 
the main website, for anybody to add new information related to any aspect 
of Beam.

On the flip side, as mentioned above, this information is mostly not 
reviewed, badly structured and, in the end, not easy to find. IMHO, the Wiki 
should be used just as an “archive” for things that are relevant at the moment 
and that we don’t want to lose (e.g. the design documents list or 
git/gradle/etc tips), or for very internal things that could “pollute” the 
website.

I believe the main entry point should be the Beam website, which accumulates 
most of the things that users and contributors need frequently and that are 
fairly stable in terms of content. Reviewing new doc updates is also useful in 
terms of knowledge sharing.

I'm also against another framework or tool for similar purposes; it would just 
complicate things on all levels.

—
Alexey


> On 22 Sep 2023, at 18:14, Robert Bradshaw via dev  wrote:
> 
> On Fri, Sep 22, 2023 at 8:05 AM Danny McCormick via dev  > wrote:
>> > I do feel strongly that https://beam.apache.org/contribute/ should remain 
>> > on the main site, as it's aimed at users (who hopefully want to step up 
>> > and contribute)
>> 
>> To be clear, I don't think anyone is suggesting getting rid of the section, 
>> my comments were about replacing the side panel links with links to the wiki 
>> (or now markdown or wherever we put our docs) instead of hosting those 
>> things as part of our site.
>> 
>> > Related, I stumbled across this the other day: 
>> > https://github.com/apache/beam-site which appears to be unused which could 
>> > probably even have different review and committer sets if we wanted?
>> 
>> That actually holds our published release docs, just not on master - 
>> https://github.com/apache/beam-site/tree/release-docs.
> 
> Yeah. Basically we're using it as hosting for our voluminous auto-generated 
> docs. 
>  
>> A separate repo is always an option regardless, though I don't see a ton of 
>> advantages and it moves us further from the core codebase.
> 
> I don't see any advantage, and plenty of downsides, to a separate repo. What 
> is the issue we're trying to solve here? 
>  
>> > I feel like that's actually pretty easy with Github actions? I think maybe 
>> > there's even one that exists Github Pages and probably any other static 
>> > site generator thingy we could care to name.
>> 
>> Do you know of any actions that do this? 
>> https://github.com/kamranahmedse/github-pages-blog-action is kinda close, 
>> but not obviously better than a folder of markdown docs (no side nav AFAIK). 
>> I'm not sure if actions are really helpful here anyways.
>> 
>> Building our own is definitely doable, but maybe not trivial (feel free to 
>> fact check that) and does introduce a second website framework (hugo vs 
>> jekyll).
> 
> Yeah, -1 on introducing yet another framework. Mostly, we need to prioritize 
> a place to push content that's easy to keep up to date. 
>  
>> On Fri, Sep 22, 2023 at 10:42 AM Byron Ellis via dev wrote:
>>> I feel like that's actually pretty easy with Github actions? I think maybe 
>>> there's even one that exists Github Pages and probably any other static 
>>> site generator thingy we could care to name. Related, I stumbled across 
>>> this the other day: https://github.com/apache/beam-site which appears to be 
>>> unused which could probably even have different review and committer sets 
>>> if we wanted?
>>> 
>>> On Thu, Sep 21, 2023 at 3:19 PM Robert Bradshaw via dev 
>>> <dev@beam.apache.org> wrote:
>>>> TBH, I'm not a huge fan of the wikis either. My ideal flow would be 
>>>> something like g3doc, and markdown files in github do a reasonable enough 
>>>> job emulating that. (I don't think the overhead of having to do a PR for 
>>>> small edits like typos is onerous, as those are super easy reviews to do as 
>>>> well...) For anything in-depth, a pointer to an "actual" doc with better 
>>>> collaborative editing tools is generally in order anyway. 
>>>> 
>>>> I do feel strongly that https://beam.apache.org/contribute/ should remain 
>>>> on the main site, as it's aimed at users (who hopefully want to step up 
>>>> and contribute). The top level should probably mostly be a pointer to 
>>>> this, but I think it's valuable (for the audience that reaches it from 
>>>> github) to be a bit tailored to that audience (e.g. assume they just 
>>>> forked/downloaded the repository and want to edit-build-push. Generally a 
>>>> more advanced user than would find the page on the website.)
>>>> 
>>>> The release guide? Meh. Wherever those doing releases find it most 
>>>> convenient. If that was me I'd probably put a markdown file right in the 
>>>> release directory next to the relevant scripts... (If not jump to literate 
>>>> programming right there :). 
>>>> 
>>>> On Thu, Sep 21, 2023 at 1:20 

Re: Contribution of Asgarde: Error Handling for Beam?

2023-09-13 Thread Alexey Romanenko
I agree with Cham on these two options. 

In the end, it would be great to have such functionality (error handling / DLQ) 
integrated into Beam core API, but it will require, for sure, some technical 
discussions and reviews before - so it will take more time. 

Though, to make it available for users soon as a part of Beam distribution, 
adding this as an extension looks very feasible for me.   

—
Alexey
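
For readers who have not looked at the library, the "failure monad" / "andThen" style of error handling discussed in this thread can be sketched in a few lines of plain Python. This is not Asgarde's API and it does not use Beam: Result, Failure and and_then are hypothetical names chosen for illustration, and a plain list stands in for a PCollection, with a second list playing the role of the dead-letter side output.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Failure:
    step: str      # name of the step that raised
    element: Any   # input element that caused the error
    error: str     # textual form of the exception


@dataclass
class Result:
    outputs: List[Any]
    failures: List[Failure] = field(default_factory=list)

    def and_then(self, step: str, fn: Callable[[Any], Any]) -> "Result":
        """Apply fn to every good element; route exceptions to failures."""
        good = []
        bad = list(self.failures)  # failures from earlier steps are kept
        for element in self.outputs:
            try:
                good.append(fn(element))
            except Exception as exc:
                bad.append(Failure(step, element, repr(exc)))
        return Result(good, bad)


# Two chained steps; "x" fails to parse and 0 fails on division.
result = (
    Result(["1", "2", "x", "0"])
    .and_then("parse", int)
    .and_then("invert", lambda n: 1 / n)
)
print(result.outputs)
print([(f.step, f.element) for f in result.failures])
```

The point of the pattern is that the try/except and the dead-letter routing are written once in and_then, so each pipeline step stays a plain function and the accumulated failures can be written to a dead letter queue at the end.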

> On 12 Sep 2023, at 19:44, Chamikara Jayalath via dev  
> wrote:
> 
> Thanks Mazlum, this sounds great. I think there are two ways we can proceed 
> if we decide to integrate the Asgarde library into Beam.
> 
> (1) Directly import the code into Beam without significant modifications 
> and/or a review (though we may add tests).
> 
> (2) Go through a design/code review to determine whether this is the best 
> approach for implementing error handling / DLQ in Beam transforms or whether 
> there are other alternatives/modifications to Asgarde we want to consider.
> 
> If we do (1) I prefer adding Asgarde as a separate Gradle module in Beam. We 
> can later integrate it into the core module after a design/code review.
> 
> Thanks,
> Cham
> 
> 
> 
> On Tue, Sep 12, 2023 at 10:26 AM Mazlum TOSUN  > wrote:
>> Hello Austin and everyone,
>> 
>> I am open for discussion.
>> 
>> My first intention with Asgarde was to help the Beam community, because Dead 
>> Letter Queue is so important in Beam and all the data pipeline frameworks.
>> When I worked with Beam on production with my customers, we needed to catch 
>> errors with side outputs and dead letter queue.
>> 
>> This library really helped us to keep a less verbose code while applying all 
>> the error handling logic, that is error prone and verbose if it is repeated.
>> 
>> As Kenneth said, my intention was to stay as close as possible to Beam, with 
>> a Wrapper and a Failure Monad on top of a PCollection, to handle all the 
>> code and complexity for try catch blocks and side output.
>> 
>> As for governance, even though I am the creator of this library, what matters 
>> most isn't me but the community and helping the community.
>> If the best solution to help the community is including the library directly 
>> on Beam, we can go in this direction, with of course your reviews and 
>> recommendations.
>> 
>> Then the library will belong to the community and we will continue to 
>> improve it.
>> 
>> For the decision about the best place, I will comply with the majority.
>> 
>> Best regards,
>> 
>> Mazlum
>> 
>> On Mon, Sep 11, 2023 at 11:15 PM Austin Bennett wrote:
>>> @Mazlum TOSUN  --  you and I have spoken a 
>>> few times about this.  it'd be good for you to comment here on list, on any 
>>> of your concerns with governance, and/or other thoughts.  Ex: if you think 
>>> contributing asgarde directly is the thing [ or perhaps expressing any 
>>> interest helping write/contribute the relevant functionality into beam ... 
>>> it is possible that by adding the actual functionality into beam - like 
>>> Kenn's mentioned 'other place' we could make asgarde as a separate add-on 
>>> obsolete ].  
>>> 
>>> On Fri, Sep 8, 2023 at 8:55 AM Kenneth Knowles wrote:
>>>> For anyone who hasn't clicked over to Asgarde, my TL;DR description of it 
>>>> is that it adds the "failure monad" aka "andThen" style error/result 
>>>> handling on top of chaining of PCollections. So it is at a similar level 
>>>> of abstraction of our basic transforms and generally useful for chaining 
>>>> dead-letter side outputs. It is no more or less appropriate for the core 
>>>> SDK than, say, the Project/Filter/Join transforms, or Watch, etc. If we 
>>>> actually aspired to have a thin core with the accessories like that in 
>>>> another place, then it should go to that other place.
>>>> 
>>>> Kenn
>>>> 
>>>> On Fri, Sep 8, 2023 at 11:24 AM Daniel Collins via dev 
>>>> <dev@beam.apache.org> wrote:
>>>>> > until we *require* Asgard on a core transform, it shouldn't be in the 
>>>>> > main repo
>>>>> 
>>>>> I don't think this is necessarily true if it solves end user use cases. 
>>>>> If there is a specific transform that solves a specific use case, we 
>>>>> could include it in the transforms folder for end-users, even if it isn't 
>>>>> utilized in the I/Os at present. Hence the suggestion to take the most 
>>>>> promising transforms and propose adding them with documentation, apis and 
>>>>> rationale.
>>>>> 
>>>>> -Daniel
>>>>> 
>>>>> On Fri, Sep 8, 2023 at 11:20 AM Robert Burke wrote:
>>>>>> I would say until we *require* Asgard on a core transform, it shouldn't 
>>>>>> be in the main repo. 
>>>>>> 
>>>>>> Incorporating something before there's a need for it is premature 
>>>>>> abstraction. We can't do things because they *might* be useful. Let's 
>>>>>> see concrete places where they are useful, or we're already having 
Re: [Discuss] Get rid of OWNERS files

2023-08-08 Thread Alexey Romanenko
I generally agree with this (initially that was a good intention imho) but 
what could be an alternative? The review bot may also assign reviewers 
that are no longer active on the project.

—
Alexey


> On 8 Aug 2023, at 16:55, Danny McCormick via dev  wrote:
> 
> Hey everyone, I'd like to propose getting rid of OWNERS files from the Beam 
> repo. Right now, I don't think they are serving a meaningful purpose:
> 
> - Many OWNERS files are outdated and point to people who are no longer 
> actively involved in the project (examples: 1, 2, 3, there are many more)
> - Many dependencies don't have owners assigned
> - Many major directories function fine without OWNERS files
> - We lack sufficient documentation of what OWNERS files mean 
> (https://s.apache.org/beam-owners is not helpful and I couldn't find other 
> resources) 
> - We now have the review bot to automatically assign reviewers based on areas 
> of ownership. That has proven more likely to stay up to date.
> 
> Given all of these, I don't see any obvious usefulness for OWNERS files. 
> Please chime in if you disagree (or agree). If there are no objections I'll 
> assume silent consensus and remove them next week.
> 
> Thanks,
> Danny



Re: [VOTE] Release 2.49.0, release candidate #1

2023-07-07 Thread Alexey Romanenko
+1 (binding)

Tested with https://github.com/Talend/beam-samples/actions
(Java SDK v8/v11/v17, Spark 3.x runner).

—
Alexey

> On 6 Jul 2023, at 17:34, Yi Hu via dev  wrote:
> 
> Hi everyone,
> Please review and vote on the release candidate #1 for the version 2.49.0, as 
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> 
> Reviewers are encouraged to test their own use cases with the release 
> candidate, and vote +1 if
> no issues are found. Only PMC member votes will count towards the final vote, 
> but votes from all
> community members are encouraged and helpful for finding regressions; you can 
> either test your own
> use cases or use cases from the validation sheet [10].
> 
> The complete staging area is available for your review, which includes:
> * GitHub Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2], 
> which is signed with the key with fingerprint either CB6974C8170405CB 
> (y...@apache.org) or D20316F712213422 (GitHub Action automated) [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.49.0-RC1" [5],
> * website pull request listing the release [6], the blog post [6], and 
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle GRADLE_VERSION and OpenJDK/Oracle JDK 
> JDK_VERSION.
> * Python artifacts are deployed along with the source release to the 
> dist.apache.org [2] and PyPI [8].
> * Go artifacts and documentation are available at pkg.go.dev [9]
> * Validation sheet with a tab for 2.49.0 release to help with validation [10].
> * Docker images published to Docker Hub [11].
> * PR to run tests against release branch [12].
> 
> The vote will be open for at least 72 hours. It is adopted by majority 
> approval, with at least 3 PMC affirmative votes.
> 
> For guidelines on how to try the release in your projects, check out our blog 
> post at /blog/validate-beam-release/.
> 
> Thanks,
> Release Manager
> 
> [1] https://github.com/apache/beam/milestone/13
> [2] https://dist.apache.org/repos/dist/dev/beam/2.49.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1348/
> [5] https://github.com/apache/beam/tree/v2.49.0-RC1
> [6] https://github.com/apache/beam/pull/27374
> [7] https://github.com/apache/beam-site/pull/646
> [8] https://pypi.org/project/apache-beam/2.49.0rc1/
> [9] https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.49.0-RC1/go/pkg/beam
> [10] 
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=934901728
> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
> [12] https://github.com/apache/beam/pull/27307
> 
> -- 
> Yi Hu, (he/him/his)
> Software Engineer
> 
> 



Re: [DISCUSS] Enable Github Discussions?

2023-07-03 Thread Alexey Romanenko
-1 
I understand that for some people, who may not be very familiar with the ASF 
and its “Apache Way” [1], it may sound a bit obsolete, but mailing lists are 
one of the key elements of every ASF project, and Apache Beam is one. Having 
user@, dev@ and commits@ lists is required for an ASF project to maintain open 
discussions that are publicly accessible and archived in the same way for all 
ASF projects. 

I just want to remind everyone that a key motto at the Apache Software 
Foundation is: 
  “If it didn't happen on the mailing list, it didn't happen.”

—
Alexey

[1] https://apache.org/theapacheway/index.html

> On 1 Jul 2023, at 19:54, Anand Inguva via dev  wrote:
> 
> +1 for GitHub discussions as well. But I am also a little concerned about 
> multiple places for discussions. As Danny said, if we have a good plan on how 
> to move forward on how/when to archive the current mailing list, that would 
> be great.
> 
> Thanks,
> Anand
> 
> On Sat, Jul 1, 2023, 3:21 AM Damon Douglas  > wrote:
>> I'm very strong +1 for replacing the use of Email with GitHub Discussions. 
>> Thank you for bringing this up.
>> 
>> On Fri, Jun 30, 2023 at 7:38 AM Danny McCormick via dev > > wrote:
>>> Thanks for starting this discussion!
>>> 
>>> I'm a weak -1 for this proposal. While I think that GH Discussions can be a 
>>> good forum, I think most of the things that Discussions do are covered by 
>>> some combination of the dev/user lists and GitHub issues, and the net 
>>> outcome of this will be creating one more forum to pay attention to. I know 
>>> in the past we've had a hard time keeping up with Stack overflow questions 
>>> for a similar reason. With that said, I'm not opposed to trying it out and 
>>> experimenting as long as we have (a) clear criteria for understanding if 
>>> the change is effective or not (can be subjective), (b) a clear idea of 
>>> when we'd revisit the discussion, and (c) a clear path to rollback the 
>>> decision without it being too much work (this might mean something like 
>>> disabling future discussions and keeping the history or somehow moving the 
>>> history to the dev or user list). If we do this, I also think we should 
>>> update https://beam.apache.org/community/contact-us/ with a clear taxonomy 
>>> of what goes where (this is what I'm unsure of today).
>>> 
>>> FWIW, if we were proposing cutting either the user list or both the user 
>>> and dev list in favor of discussions, I would be +1. I do think the 
>>> advantages of discussions over email are real (threaded, easy to convert 
>>> to/from issues, markdown, one place for all things Beam).
>>> 
>>> Thanks,
>>> Danny
>>> 
>>> On Fri, Jun 30, 2023 at 10:23 AM Svetak Sundhar via dev 
>>> <dev@beam.apache.org> wrote:
>>>> Hi all,
>>>> 
>>>> I wanted to start a discussion to gauge interest on enabling Github 
>>>> Discussions in Apache Beam.
>>>> 
>>>> Pros:
>>>> + GH Discussions allows for folks to get unblocked on small/medium 
>>>> implementation blocker (Google employees can often get this help by 
>>>> scheduling a call with teammates whereas there is a larger barrier for 
>>>> non-Google employees to get this help).
>>>> + On the above point, more visibility into the development blockers that 
>>>> others have previously faced.
>>>> + GH Discussions is more discoverable and approachable for new users and 
>>>> contributors.
>>>> + A centralized place to have discussions. Long term, it makes sense to 
>>>> eventually fully migrate to GH Discussions.
>>>> 
>>>> Cons:
>>>> - For a period of time when we use both the dev list and GH Discussions, 
>>>> context can be confusing. 
>>>> - Anything else?
>>>> 
>>>> To be clear, I’m not advocating that we move off the dev list immediately. 
>>>> I propose that over time we slowly start moving discussions over to GH 
>>>> discussions, utilizing things such as the poll feature.
>>>> 
>>>> I am aware that the Airflow project [1] uses both GH Discussions today and 
>>>> a dev@ list [2] today.
>>>> 
>>>> [1] https://github.com/apache/airflow/discussions
>>>> [2] https://lists.apache.org/list.html?d...@airflow.apache.org
>>>> 
>>>> Thanks,
>>>> 
>>>> Svetak Sundhar
>>>> Data Engineer
>>>> svetaksund...@google.com
  



Re: [VOTE] Release 2.48.0 release candidate #2

2023-05-30 Thread Alexey Romanenko
+1 (binding)

Tested with  https://github.com/Talend/beam-samples/ 
(Java SDK v8/v11/v17, Spark 3.x runner).

> On 27 May 2023, at 19:38, Bruno Volpato via dev  wrote:
> 
> I was able to check that containers are all there and complete my validation.
> 
> +1 (non-binding).
> 
> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates (Java 
> SDK 11, Dataflow runner).
> 
> 
> Thanks Ritesh and Danny!
> 
> On Fri, May 26, 2023 at 10:09 AM Danny McCormick via dev wrote:
>> It looks like some Dataflow containers didn't get published, so some jobs 
>> using the legacy runner (runner v2 disabled) will fail. I kicked off the 
>> container release, so that should hopefully be available later today.
>> 
>> Thanks,
>> Danny
>> 
>> On Thu, May 25, 2023 at 11:19 PM Ritesh Ghorse via dev wrote:
>>> Hi everyone,
>>> Please review and vote on the release candidate #2 for the version 2.48.0, 
>>> as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>> 
>>> 
>>> Reviewers are encouraged to test their own use cases with the release 
>>> candidate, and vote +1 if no issues are found. Only PMC member votes will 
>>> count towards the final vote, but votes from all community members are 
>>> encouraged and helpful for finding regressions; you can either test your 
>>> own use cases or use cases from the validation sheet [10].
>>> 
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org 
>>>  [2], which is signed with the key with 
>>> fingerprint E4C74BEC861570F5A3E44E46280A0AC32DBAE62B [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.48.0-RC2" [5],
>>> * website pull request listing the release [6], the blog post [6], and 
>>> publishing the API reference manual [7] (to be generated).
>>> * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle JDK 
>>> 8.0.322. 
>>> * Python artifacts are deployed along with the source release to the 
>>> dist.apache.org  [2] and PyPI[8].
>>> * Go artifacts and documentation are available at pkg.go.dev 
>>>  [9]
>>> * Validation sheet with a tab for 2.48.0 release to help with validation 
>>> [10].
>>> * Docker images published to Docker Hub [11].
>>> * PR to run tests against release branch [12].
>>> 
>>> The vote will be open for at least 72 hours. It is adopted by majority 
>>> approval, with at least 3 PMC affirmative votes.
>>> 
>>> For guidelines on how to try the release in your projects, check out our 
>>> blog post at /blog/validate-beam-release/.
>>> 
>>> NOTE: Dataflow containers for Python are not finalized yet (likely to 
>>> happen on Tuesday). I will follow up on this thread once that is done. Feel 
>>> free to test it on other runners until then. 
>>> 
>>> Thanks,
>>> Ritesh Ghorse
>>> 
>>> [1] https://github.com/apache/beam/milestone/12
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.48.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4] https://repository.apache.org/content/repositories/orgapachebeam-1346/
>>> [5] https://github.com/apache/beam/tree/v2.48.0-RC2
>>> [6] https://github.com/apache/beam/pull/26903
>>> [7] https://github.com/apache/beam-site/pull/645
>>> [8] https://pypi.org/project/apache-beam/2.48.0rc2/
>>> [9] 
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.48.0-RC2/go/pkg/beam 
>>> 
>>> [10] 
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=458120434
>>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>> [12] https://github.com/apache/beam/pull/26811
>>> 
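
A note on the adoption rule quoted above ("majority approval, with at least 3 PMC affirmative votes", with only PMC member votes counting): it can be sketched in a few lines of Python. This is only an illustration, not tooling used by the Beam project. The function name `vote_passes` and the (value, binding) tuple layout are made up for this example, and "majority approval" is assumed here to mean more binding +1 votes than binding -1 votes.

```python
from typing import Iterable, Tuple


def vote_passes(votes: Iterable[Tuple[int, bool]],
                min_binding_plus_ones: int = 3) -> bool:
    """Return True if a release vote would be adopted.

    Each vote is (value, binding): value is +1 or -1, and binding is True
    for PMC members. Only binding votes are tallied (assumption: "majority"
    means more binding +1 votes than binding -1 votes).
    """
    binding = [value for value, is_binding in votes if is_binding]
    plus = sum(1 for value in binding if value > 0)
    minus = sum(1 for value in binding if value < 0)
    return plus >= min_binding_plus_ones and plus > minus


# Hypothetical tally: three binding +1 votes and one non-binding +1 vote.
example = [(+1, True), (+1, True), (+1, True), (+1, False)]
print(vote_passes(example))
```

Non-binding votes are ignored by the tally in this sketch, in line with the statement that they are encouraged and helpful for finding regressions but do not count towards the final result.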



Re: Beam SQL found limitations

2023-05-22 Thread Alexey Romanenko
Hi Piotr,

Thanks for the details! I'm cross-posting this to dev@ as well since, I guess, 
people there can provide more insight on this.

A while ago, I faced similar issues trying to run Beam SQL against the TPC-DS 
benchmark. 
We had a discussion around that [1]; please take a look, since it may be 
helpful.

[1] https://lists.apache.org/thread/tz8h1lycmob5vpkwznvc2g6ol2s6n99b

—
Alexey 

> On 18 May 2023, at 11:36, Wiśniowski Piotr 
>  wrote:
> 
> Hi,
> 
> After experimenting with Beam SQL I found some limitations. I tested on 
> near-latest main (precisely `5aad2467469fafd2ed2dd89012bc80c0cd76b168`) with 
> Calcite, the direct runner and openjdk version "11.0.19". Please let me know if 
> some of them are known, being worked on, have tickets or have an estimated fix 
> time. I believe most of them are low-hanging fruit, or my thinking is just not 
> right for the problem. If this is the case, please guide me to a working solution.
> 
>  From my perspective it is ok to have a fix just on master - no need to wait 
> for release. Priority order: 
> - 7. Windowing function on a stream. In detail: how to get the previous message 
> for a key? Setting an arbitrarily long expiration is OK, but access to the 
> previous record must happen fairly quickly, not wait for the big window to 
> finish and emit the expired keys. Ideally I would like to do it in a pure Beam 
> pipeline, as saving to some external key/value store and then reading it back 
> could potentially result in race conditions, which I would like to avoid; but 
> if it's the only option, let it be.
> - 5. single UNION ALL possible
> - 4. UNNEST ARRAY with nested ROW
> - 3. Using * when there is Row type present in the schema
> - 1. `CROSS JOIN` between two unrelated tables is not supported - even if one 
> is a static number table
> - 2. ROW construction not supported. It is not possible to nest data
> 
> Below are the queries that I use to test these scenarios.
> 
> Thank you for looking at these topics!
> 
> Best
> 
> Wiśniowski Piotr
> 
> ---
> -- 1. `CROSS JOIN` between two unrelated tables is not supported. 
> ---
> -- Only supported is `CROSS JOIN UNNEST` when exploding array from same table.
> -- It is not possible to number rows
> WITH data_table AS (
> SELECT 1 AS a
> ),
> number_table AS (
> SELECT 
> numbers_exploded AS number_item
> FROM UNNEST(ARRAY[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]) AS 
> numbers_exploded
> )
> SELECT 
> data_table.a,
> number_table.number_item
> FROM data_table
> CROSS JOIN number_table
> ;
> -- CROSS JOIN, JOIN ON FALSE is not supported!
> 
> 
> ---
> -- 2. ROW construction not supported. It is not possible to nest data
> ---
> SELECT ROW(1,2,'a') AS r; -- java.lang.NoSuchFieldException: EXPR$0 
> SELECT (1,2,'a') AS r; -- java.lang.NoSuchFieldException: EXPR$0 
> SELECT MAP['field1',1,'field2','a']; -- Parameters must be of the same type
> SELECT MAP['field1','b','field2','a']; -- null
> -- WORKAROUND - manually compose json string, 
> -- drawback - decomposing might be not supported or would need to be also 
> based on string operations
> SELECT ('{"field1":"' || 1 || '","field2":"' || 'a' || '"}') AS `json_object`;
> 
> 
> ---
> -- 3. Using * when there is Row type present in the schema
> ---
> CREATE EXTERNAL TABLE test_tmp_1(
> `ref` VARCHAR,
> `author` ROW<
> `name` VARCHAR,
> `email` VARCHAR
> >
> )
> TYPE text
> LOCATION 'python/dbt/tests/using_star_limitation.jsonl'
> TBLPROPERTIES '{"format":"json", 
> "deadLetterFile":"top/python/dbt/tests/dead"}';
> SELECT * FROM test_tmp_1;
> --  java.lang.NoSuchFieldException: name 
> -- WORKAROUND - refer to columns explicitly with alias
> SELECT 
> `ref` AS ref_value, 
> test_tmp_1.`author`.`name` AS author_name, -- table name must be 
> referenced explicitly - this could be fixed too
> test_tmp_1.`author`.`email` AS author_email
> FROM test_tmp_1;
> 
> 
> ---
> -- 4. UNNEST ARRAY with nested ROW
> ---
> CREATE EXTERNAL TABLE test_tmp(
> `ref` VARCHAR,
> `commits` ARRAY<ROW<
> `id` VARCHAR,
> `author` ROW<
> `name` VARCHAR,
> `email` VARCHAR
> >
> >>
> )
> TYPE text
> LOCATION 'python/dbt/tests/array_with_nested_rows_limitation.jsonl'
> TBLPROPERTIES '{"format":"json", "deadLetterFile":"python/dbt/tests/dead"}';
> SELECT
> test_tmp.`ref` AS branch_name,
> commit_item.`id` AS commit_hash,
> commit_item.`author`.`name` AS author_name
> FROM test_tmp
> CROSS JOIN UNNEST(test_tmp.commits) AS commit_item;
> -- Row expected 4 fields (Field{name=ref, description=, type=STRING, 
> options={{}}}, Field{name=commits, description=, type=ARRAY author ROW> NOT NULL>, options={{}}}, 
> Field{name=id, description=, type=STRING, options={{}}}, Field{name=author, 
> description=, type=ROW, options={{}}}). 
> 
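
For item 7 in the list above (getting the previous message for a key without waiting for a large window to close), the desired semantics can be sketched in plain Python. This is not Beam SQL and does not use any Beam API: the dictionary below merely stands in for per-key state, and the names `with_previous` and `stream` are made up for this illustration. Whether and how Beam SQL can express this is exactly the open question in the thread.

```python
from typing import Dict, Hashable, Iterable, Iterator, Optional, Tuple, TypeVar

V = TypeVar("V")


def with_previous(
    messages: Iterable[Tuple[Hashable, V]],
) -> Iterator[Tuple[Hashable, V, Optional[V]]]:
    """Yield (key, current, previous) for each message as soon as it arrives.

    `previous` is None for the first message seen for a key.
    """
    last_seen: Dict[Hashable, V] = {}  # stand-in for per-key state
    for key, value in messages:
        # Emit immediately, pairing the message with the last value for its key.
        yield key, value, last_seen.get(key)
        # Remember the current value for the next message with the same key.
        last_seen[key] = value


# Hypothetical keyed stream of (key, value) messages.
stream = [("a", 1), ("b", 10), ("a", 2), ("a", 3), ("b", 20)]
result = list(with_previous(stream))
print(result)
```

The point of the sketch is that each output is produced at arrival time from state kept per key, rather than after a window expires; it says nothing about ordering or parallelism, which a real streaming pipeline would also have to handle.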

Re: Repository Deletion Postmortem

2023-05-17 Thread Alexey Romanenko
Thanks for follow-up on this, it should be helpful.

> The git revert command was used to create a revert commit. However, the 
> created commit deleted the entire repository.

Do you have an idea what the command was and how it could have deleted the 
entire repository?

I’m asking about this to find a way to protect ourselves against this in the 
future.

—
Alexey

> On 16 May 2023, at 21:39, Jack McCluskey via dev  wrote:
> 
> Hey everyone,
> 
> Now that 2.47.0 is out, I've got a short write-up on how the repository was 
> very briefly deleted back when I was cutting the release branch at 
> https://s.apache.org/beam-repo-deletion-postmortem. This includes the rough 
> order of operations, the recovery, and then the action items that came from 
> the experience. It's a very good sign that between Danny McCormick and myself 
> we were able to knock out two of the three core action items pretty soon 
> after we got history restored in the repository. As we continue to improve 
> and simplify the release process we'll hopefully find fewer and fewer avenues 
> for big issues to pop up.
> 
> Thanks,
> 
> Jack McCluskey
> 
> -- 
> 
>   
> 
> Jack McCluskey
> SWE - DataPLS PLAT/ Dataflow ML
> RDU
> jrmcclus...@google.com 
> 
> 



Re: [VOTE] Release 2.47.0, release candidate #1

2023-04-28 Thread Alexey Romanenko
+1 (binding)

Tested with  https://github.com/Talend/beam-samples/ 
(Java SDK v8/v11/v17, Spark 3.x runner).

---
Alexey

> On 28 Apr 2023, at 16:06, Jack McCluskey via dev  wrote:
> 
> There was a GCP outage that impacted pushing containers to GCR, I expected it 
> to impact Java containers specifically but it looks like it also affected 
> Python containers. I believe the situation is resolved and I can get the 
> containers pushed now, if that continues to be an issue I'll follow up. 
> 
> On Thu, Apr 27, 2023 at 7:21 PM Chamikara Jayalath wrote:
>> I tried to run a Java multi-lang pipeline and it's failing due to the 
>> following error during worker setup.
>> 
>> Error syncing pod, skipping" err="failed to \"StartContainer\" for 
>> \"sdk-1-0\" with ImagePullBackOff: \"Back-off pulling image 
>> \\\"gcr.io/cloud-dataflow/v1beta3/beam_python3.8_sdk:2.47.0\\\ 
>> "\""
>>  pod="default/df-runinferenceexample-chami-04271607-gwf8-harness-vj8w" 
>> podUID=37d8de0a068391920b98dce559c4886f
>> 
>> Are these containers not available yet to test Dataflow ?
>> 
>> Thanks,
>> Cham
>> 
>> On Thu, Apr 27, 2023 at 2:17 PM Robert Bradshaw via dev wrote:
>>> The artifacts and signatures all look good, and I validated a couple of 
>>> Python pipelines in a fresh install. 
>>> 
>>> Assuming all the tests (including the Dataflow ones) pass (modulo the two 
>>> mentioned above; seems a fair justification to not block on those) I'm +1 
>>> (binding) on this release. 
>>> 
>>> On Wed, Apr 26, 2023 at 12:39 PM Jack McCluskey via dev <dev@beam.apache.org> wrote:
 There's also a good chance that newer test suites haven't been included in 
 mass_comment.py 
 (https://github.com/apache/beam/blob/master/release/src/main/scripts/mass_comment.py)
  and as a result they were not executed. 
 
 On Wed, Apr 26, 2023 at 3:29 PM Jack McCluskey wrote:
> The Dataflow CrossLanguageValidatesRunner GoUsingJava Tests have been 
> broken for quite some time (https://github.com/apache/beam/issues/21645) 
> and the Kafka issue is tied to a test timeout that John Casey has fixed 
> but didn't get cherrypicked (just fell through the cracks while waiting 
> on tests to pass, but conversations with them led to the conclusion that 
> we would just get it into an RC2 if necessary since it's a matter of how 
> the tests run not how the code under test functions.) 
> 
> The tests still marked "pending" passed but did not get updated on the 
> GitHub side from when Jenkins was straining under load, I'm guessing 
> those builds have since been deleted under our new retention policy to 
> alleviate the OOM Jenkins issues. I will try to re-run those for the sake 
> of having clear and obvious results.
> 
> On Wed, Apr 26, 2023 at 3:23 PM Valentyn Tymofieiev wrote:
>> Thanks, Jack!
>> 
>> re [12]: 
>> 
>> I am seeing some test errors - have they been investigated?
>> Also, did all test suites run? I think I am not seeing output of some of 
>> the suites, like 
>> Run Python Dataflow V2 ValidatesRunner
>> 
>> 
>> On Wed, Apr 26, 2023 at 9:14 PM Jack McCluskey via dev <dev@beam.apache.org> wrote:
>>> Hi everyone,
>>> 
>>> Please review and vote on the release candidate #1 for the version 
>>> 2.47.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>> 
>>> Reviewers are encouraged to test their own use cases with the release 
>>> candidate, and vote +1 if no issues are found.
>>> 
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org 
>>>  [2], which is signed with the key with 
>>> fingerprint DF3CBA4F3F4199F4 [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.47.0-RC1" [5],
>>> * website pull request listing the release [6], the blog post [6], and 
>>> publishing the API reference manual [7].
>>> * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle JDK 
>>> 8.0.322.
>>> * Python artifacts are deployed along with the source release to the 
>>> dist.apache.org  [2] and PyPI[8].
>>> * Go artifacts and documentation are available at pkg.go.dev 
>>>  [9]
>>> * Validation sheet with a tab for 2.47.0 release to help with 
>>> validation [10].
>>> * Docker images published to Docker Hub [11].
>>> * PR to run tests against release branch [12].
>>> 
>>> The vote 

Re: [ANNOUNCE] New committer: Anand Inguva

2023-04-25 Thread Alexey Romanenko
Congratulations, Anand! Well deserved!

> On 25 Apr 2023, at 06:02, Byron Ellis via dev  wrote:
> 
> Congrats Anand!
> 
> On Mon, Apr 24, 2023 at 9:54 AM Ahmet Altay via dev wrote:
>> Congratulations Anand!
>> 
>> On Mon, Apr 24, 2023 at 8:05 AM Kerry Donny-Clark via dev <dev@beam.apache.org> wrote:
>>> Great work Anand, this is well deserved.
>>> 
>>> 
>>> On Mon, Apr 24, 2023 at 10:35 AM Yi Hu via dev wrote:
 Congrats Anand!
 
 On Fri, Apr 21, 2023 at 3:54 PM Danielle Syse via dev >>> > wrote:
> Congratulations!
> 
> On Fri, Apr 21, 2023 at 3:53 PM Damon Douglas via dev <dev@beam.apache.org> wrote:
>> Congratulations Anand!
>> 
>> On Fri, Apr 21, 2023 at 12:28 PM Ritesh Ghorse via dev <dev@beam.apache.org> wrote:
>>> Congratulations Anand!
>>> 
>>> On Fri, Apr 21, 2023 at 3:24 PM Ahmed Abualsaud via dev <dev@beam.apache.org> wrote:
 Congrats Anand!
 
 On Fri, Apr 21, 2023 at 3:18 PM Anand Inguva via dev 
 mailto:dev@beam.apache.org>> wrote:
> Thanks everyone. Really excited to be a part of Beam Committers. 
> 
> On Fri, Apr 21, 2023 at 3:07 PM XQ Hu via dev  > wrote:
>> Congratulations, Anand!!!
>> 
>> On Fri, Apr 21, 2023 at 2:31 PM Jack McCluskey via dev <dev@beam.apache.org> wrote:
>>> Congratulations, Anand!
>>> 
>>> On Fri, Apr 21, 2023 at 2:28 PM Valentyn Tymofieiev via dev <dev@beam.apache.org> wrote:
 Congratulations!
 
 On Fri, Apr 21, 2023 at 8:19 PM Jan Lukavský >>> > wrote:
> Congrats Anand!
> 
> On 4/21/23 20:05, Robert Burke wrote:
>> Congratulations Anand!
>> 
>> On Fri, Apr 21, 2023, 10:55 AM Danny McCormick via dev <dev@beam.apache.org> wrote:
>>> Woohoo, congrats Anand! This is very well deserved!
>>> 
>>> On Fri, Apr 21, 2023 at 1:54 PM Chamikara Jayalath <chamik...@apache.org> wrote:
 Hi all,
 
 Please join me and the rest of the Beam PMC in welcoming a new 
 committer: Anand Inguva (ananding...@apache.org)
 
 Anand has been contributing to Apache Beam for more than a 
 year and  authored and reviewed more than 100 PRs. Anand has 
 been a core contributor to Beam Python SDK and drove the 
 efforts to support Python 3.10 and Python 3.11. 
 
 Considering their contributions to the project over this 
 timeframe, the Beam PMC trusts Anand with the responsibilities 
 of a Beam committer. [1]
 
 Thank you Anand! And we are looking to see more of your 
 contributions!
 
 Cham, on behalf of the Apache Beam PMC
 
 [1]
 https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer



Re: [ANNOUNCE] New committer: Damon Douglas

2023-04-25 Thread Alexey Romanenko
Congratulations, Damon! Well deserved!

> On 25 Apr 2023, at 09:17, Jan Lukavský  wrote:
> 
> Congrats Damon!
> 
> On 4/25/23 06:15, Alex Kosolapov wrote:
>> Congratulations, Damon!
>> 
>>  
>> From: Kenneth Knowles  
>> Reply-To: "dev@beam.apache.org"  
>>  
>> Date: Monday, April 24, 2023 at 12:52 PM
>> To: "dev@beam.apache.org"   
>> 
>> Subject: [EXTERNAL] [ANNOUNCE] New committer: Damon Douglas
>> 
>>  
>> Hi all,
>> 
>> Please join me and the rest of the Beam PMC in welcoming a new committer: 
>> Damon Douglas (damondoug...@apache.org )
>> 
>> Damon has contributed widely: Beam Katas, playground, infrastructure, and 
>> many IO connectors. Damon does lots of code review in addition to code. 
>> (yes, you can review code as a non-committer!)
>> 
>> Considering their contributions to the project over this timeframe, the Beam 
>> PMC trusts Damon with the responsibilities of a Beam committer. [1]
>> 
>> Thank you Damon! And we are looking to see more of your contributions!
>> 
>> Kenn, on behalf of the Apache Beam PMC
>> 
>> [1]
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>> 



Re: [DISCUSS] @Experimental, @Internal, @Stable, etc annotations

2023-04-18 Thread Alexey Romanenko
://github.com/apache/beam/blob/b9f27f9da2e63b564feecaeb593d7b12783192b0/sdks/java/core/src/main/java/org/apache/beam/sdk/annotations/Experimental.java#L48
>>>>> 
>>>>> On Fri, Apr 14, 2023 at 1:26 PM Ahmet Altay via dev <dev@beam.apache.org> wrote:
>>>>>> 
>>>>>> 
>>>>>> On Fri, Apr 14, 2023 at 1:15 PM Kenneth Knowles <k...@apache.org> wrote:
>>>>>>> 
>>>>>>> Thanks for the discussion. Many good points. Probably just removing all 
>>>>>>> the annotations is a noop to users, and will solve the "afraid to use 
>>>>>>> experimental features" problem.
>>>>>>> 
>>>>>>> Regarding stability, the capabilities of Java (and Python is much much 
>>>>>>> worse) make it infeasible to produce quality software with the rule 
>>>>>>> "once it is public it is frozen forever". But on the other hand, there 
>>>>>>> isn't much of a practical alternative. Most projects just make breaking 
>>>>>>> changes at minor releases quite often, in my experience. I don't want 
>>>>>>> to follow that pattern, for sure.
>>>>>>> 
>>>>>>> Regarding Danny's comment of not seeing this culture - check out any of 
>>>>>>> our more mature IOs, which all have very high cyclomatic complexity due 
>>>>>>> to never being significantly refactored. Adhering to in-place state 
>>>>>>> compatibility for update instead of focusing on blue/green deployment 
>>>>>>> is also a culprit here. I don't have examples to mind, but the point 
>>>>>>> about the culture of stagnation came from my recent experiences as code 
>>>>>>> reviewer where there was some idea that we couldn't change things even 
>>>>>>> when they were plainly wrong and the change was plainly a fix.
>>>>>>> 
>>>>>>> Often, it comes from corners like triggered side inputs where we simply 
>>>>>>> never had a clear concept and so bringing things into alignment with a 
>>>>>>> spec will break someone, by necessity. To be clear: I have not received 
>>>>>>> pushback on that one (yet). Some other examples are 
>>>>>>> https://s.apache.org/finishing-triggers-drop-data (breaking change 
>>>>>>> necessary to eliminate data loss risk) 
>>>>>>> https://github.com/apache/beam/issues/20528 (fix was too slow because 
>>>>>>> we were hesitant to commit a breaking fix) 
>>>>>>> https://github.com/apache/beam/pull/8134#pullrequestreview-218592801 
>>>>>>> (left unsafe API in place, applied doc-only fix).
>>>>>>> 
>>>>>>> But indeed, of all the issues I raised, the customer concern with 
>>>>>>> `@Experimental` was the most important. We have had a few threads about 
>>>>>>> it in the past, too, and it hasn't gotten better.
>>>>>>> 
>>>>>>>  1. It does not have the intended effect (making users OK with evolving 
>>>>>>> APIs and behavior to allow us to reach a high level of quality)
>>>>>>>  2. It has an unintended effect (making users afraid to use things 
>>>>>>> which they should be happy to use)
>>>>>>>  3. We don't use it consistently (many less-safe things are not 
>>>>>>> experimental, many totally stable things are experimental)
>>>>>>> 
>>>>>>> Because of 3, if we don't have a feasible way to move to 
>>>>>>> "evolving/unstable by default" in a way that users know and are OK 
>>>>>>> with, then 1 is impossible. And so the only way to fix 2 is to just 
>>>>>>> eliminate the annotation approach entirely and go with language 
>>>>>>> conventions.
>>>>>> 
>>>>>> +1 to eliminating @Experimental as a Beam level annotation. That is the 
>>>>>> simplest approach that will get us to a consistent state, and it will 
>>>>>> align the goals and intentions of us with users'.
>>>>>>  
>>>>>>> 
>>>>>>> Kenn
>>>>>>> 
>>>>>>> On Wed, Apr 12, 2023 at 5:10 PM Ahmet Altay via dev 

Re: [DISCUSS] @Experimental, @Internal, @Stable, etc annotations

2023-04-04 Thread Alexey Romanenko
Great and long-awaited topic to discuss.

My personal opinion, based on what I have seen in different open-source 
projects, is that annotations like @Experimental or @Stable stop being useful 
over time and even become misleading. What actually matters is artifact 
publishing and the public API, regardless of how it was annotated. Once a 
class/method has been published and is available for users to use, it should be 
considered “stable” (even if it’s not yet stable from its developers’ point 
of view) and can’t be easily removed/changed in the next releases. 

At Beam, we have a “good” example with @Experimental: it was used to annotate 
many parts of the code when they were created, but then it was probably never 
removed, even though this code is now used by many users and the API can’t 
simply be changed despite the annotation. 

So, I’m in favor of dropping such annotations and considering all public and 
user-available API as “stable”. If a public API needs to be changed/removed, 
then we should follow the procedure of API deprecation and final removal after, 
at least, 3 major (x.y) Beam releases. This should give us clear rules for API 
changes and help avoid breaking changes for users.
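
To make the arithmetic of such a rule concrete, here is a minimal Python sketch. It rests on assumptions that would still need to be agreed on: "x.y releases" is read as releases that bump the second version component, and "after 3 releases" is read as "no earlier than three x.y releases after the one that introduced the deprecation". The function name `earliest_removal` is made up for this illustration.

```python
def earliest_removal(deprecated_in: str, releases_to_wait: int = 3) -> str:
    """Return the first x.y.0 release in which a deprecated API could be removed.

    Assumes versions look like "x.y.z" and that each x.y release increments y
    by one, with x unchanged.
    """
    major, minor, _patch = (int(part) for part in deprecated_in.split("."))
    return f"{major}.{minor + releases_to_wait}.0"


# Example: an API deprecated in 2.46.0 could be removed in 2.49.0 at the earliest.
print(earliest_removal("2.46.0"))
```

Under this reading, users would have the deprecated API available in 2.46.x, 2.47.x and 2.48.x before it disappears.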

—
Alexey 


> On 3 Apr 2023, at 17:04, Byron Ellis via dev  wrote:
> 
> Honestly, I think APIs could be pretty simply defined if you think of it in 
> terms of the user:
> 
> @Deprecated = this was either stable or evolve but the 
> functionality/interface will go away at a future date
> 
> @Stable = the user of this API opting out of changes to functionality and 
> interface. For example, default options don't change for a transform 
> annotated this way.
> 
> Evolving (No Annotation) = the user is opting in to changes to functionality 
> but not to interface. We should generally try to write backwards compatible 
> code, but on the other hand the release model does not force users into an 
> upgrade
> 
> @Experimental = this functionality / interface might be a bad idea and could 
> go away at any time
> 
> 
> On Mon, Apr 3, 2023 at 7:22 AM Danny McCormick via dev wrote:
>> tl;dr - I'd like "evolving" to be further defined, specifically around how 
>> we will make decisions about breaking behavior and API changes
>> 
>> I don't particularly care what tags we use as long as they're well 
>> documented. With that said, I think the following framing needs to be 
>> documented with more definition to flesh out the underlying philosophy:
>> 
>> >  - new code is changeable/evolving by default (so we don't have to always 
>> > remember to annotate it) but users have confidence they can use it in 
>> > production (because we have good software engineering practices)
>>  > - Experimental would be reserved for more risky things
>>  > - after we are confident an API is stable, because it has been the same 
>> across a couple releases, we mark it
>> 
>> Here, we have 3 classes of APIs - "experimental", "stable", and "evolving" 
>> (or alternately "undefined").
>> 
>> "Experimental" seems clear - we can make any changes we want. "Stable" is 
>> reasonably straightforward as well - we will only make non-breaking changes 
>> except in exceptional cases (e.g. security hole, total failure of 
>> functionality, etc...)
>> 
>> With "evolving" is the idea that we can still make any changes we want, but 
>> we think it's less likely we'll need to? Are silent behavior changes 
>> acceptable here (my vote would be no)? What about breaking API changes (my 
>> vote would be rarely)?
>> 
>> I think being able to change our APIs is an ok goal, but outside of a true 
>> experimental context we should still be weighing the cost of API changes 
>> against the benefit; we have a problem of people not updating to newer SDKs, 
>> and introducing more breaking changes will just exacerbate that problem. 
>> Maybe my concerns are just a consequence of me not really seeing the same 
>> things that you're seeing, specifically: "I'm seeing a culture of being 
>> afraid to change things, even when it would be good for users, because our 
>> API surface area is far too large and not explicitly chosen." Mostly what 
>> I've seen is a healthy concern about making it hard for users to upgrade 
>> versions, but my view is probably just limited here.
>> 
>> My ideal framing for "evolving" is: an evolving API can make breaking API 
>> changes between versions, but this will be rare and weighed against the cost 
>> of slowing users' upgrade process. All breaking changes will be communicated 
>> in our change log. An evolving API will not make silent behavior changes 
>> except in exceptional cases (e.g. patching a security gap, fixing total 
>> failures of functionality).
>> 
>> Thanks,
>> Danny
>> 
>> On Mon, Apr 3, 2023 at 9:02 AM Jan Lukavský wrote:
>>> Hi,
>>> 
>>> removing @Experimental and adding explicit @Stable annotation makes 
>>> sense to me. FWIW, when we were designing Euphoria 

Re: Error debezium

2023-04-03 Thread Alexey Romanenko
Hi Juan Manuel montoya,

Thanks for working on this! Just for reference: all Beam Java artifacts are 
normally built with Java 8, but they are tested and supposed to work with 
Java 11 and Java 17.

Also, it would be great to track progress on this Debezium integration work by 
creating a GitHub issue in the Beam repository 
(https://github.com/apache/beam/issues)

—
Alexey

> On 31 Mar 2023, at 07:48, Juan Manuel montoya  
> wrote:
> 
> Dear dev debezium,
> 
> I am writing to inform you about an error I encountered while attempting to 
> use Debezium with Apache Beam. Specifically, I received an "Unsupported 
> signal: 2" error message, which appears to indicate that Debezium was unable 
> to connect with the Java runtime environment.
> 
> After investigating this issue further, I discovered that Debezium requires 
> Java 8 or OpenJDK 11, while Apache Beam recommends Java 8 or later. 
> Therefore, it is essential to ensure that the correct version of Java is 
> installed and configured correctly on the system.
> 
> I am currently working on resolving this issue and will keep you updated on 
> any progress I make. In the meantime, please let me know if you have any 
> suggestions or recommendations for resolving this problem.
> 
> Thank you for your attention to this matter.
> 
> Best regards,
> Juan Manuel Buritica Montoya
> I am from Colombia. Thanks!



Re: [VOTE] Release 2.46.0, release candidate #1

2023-03-07 Thread Alexey Romanenko
+1 (binding)

Tested with  https://github.com/Talend/beam-samples/ 
(Java SDK v8/v11/v17, Spark 3.x runner).

---
Alexey

> On 7 Mar 2023, at 07:38, Ahmet Altay via dev  wrote:
> 
> +1 (binding) - I validated python quickstarts on direct & dataflow runners.
> 
> Thank you for doing the release!
> 
> On Sat, Mar 4, 2023 at 8:01 AM Chamikara Jayalath via dev <dev@beam.apache.org> wrote:
>> +1 (binding)
>> 
>> Validated multi-language Java and Python pipelines. 
>> 
>> On Fri, Mar 3, 2023 at 1:59 PM Danny McCormick via dev wrote:
>>> > I have encountered a failure in a Python pipeline running with Runner v1: 
>>> 
>>> > RuntimeError: Beam SDK base version 2.46.0 does not match Dataflow Python 
>>> > worker version 2.45.0. Please check Dataflow worker startup logs and make 
>>> > sure that correct version of Beam SDK is installed.
>>> 
>>> > We should understand why Python ValidatesRunner tests (which have passed) 
>>> >  didn't catch this error.
>>> 
>>> > This can be remediated in Dataflow containers without  changes to the 
>>> > release candidate.
>>> 
>>> Good catch! I've kicked off a release to fix this, it should be done later 
>>> this evening - I won't be available when it completes, but I would expect 
>>> it to be around 5:00 PST.
>>> 
>>> On Fri, Mar 3, 2023 at 3:49 PM Danny McCormick >> > wrote:
 Hey Reuven, could you provide some more context on the bug/why it is 
 important? Does it meet the standard in 
 https://beam.apache.org/contribute/release-guide/#7-triage-release-blocking-issues-in-github?
 
 The release branch was cut last Wednesday, so that is why it is not 
 included.
>> 
>> Seems like this was a revert of a previous commit that was also not included 
>> in the 2.46.0 release branch (https://github.com/apache/beam/pull/25627) ? 
>> 
>> If so we might not need a new RC but good to confirm.
>> 
>> Thanks,
>> Cham
>> 
 
 On Fri, Mar 3, 2023 at 3:24 PM Reuven Lax >>> > wrote:
> If possible, I would like to see if we could include 
> https://github.com/apache/beam/pull/25642 as we believe this bug has been 
> impacting multiple users. This was merged 4 days ago, but this RC cut 
> does not seem to include it.
> 
> On Fri, Mar 3, 2023 at 12:18 PM Valentyn Tymofieiev via dev <dev@beam.apache.org> wrote:
>> I have encountered a failure in a Python pipeline running with Runner 
>> v1: 
>> 
>> RuntimeError: Beam SDK base version 2.46.0 does not match Dataflow 
>> Python worker version 2.45.0. Please check Dataflow worker startup logs 
>> and make sure that correct version of Beam SDK is installed.
>> 
>> We should understand why Python ValidatesRunner tests (which have 
>> passed)  didn't catch this error.
>> 
>> This can be remediated in Dataflow containers without  changes to the 
>> release candidate.
>> 
>> On Fri, Mar 3, 2023 at 11:22 AM Robert Bradshaw via dev <dev@beam.apache.org> wrote:
>>> +1 (binding).
>>> 
>>> I verified that the artifacts and signatures all look good, all the
>>> containers are pushed, and tested some pipelines with a fresh install
>>> from one of the Python wheels.
>>> 
>>> On Fri, Mar 3, 2023 at 11:13 AM Danny McCormick
>>> <dannymccorm...@google.com> wrote:
>>> >
>>> > > The released artifacts seem to be missing the last commit at
>>> > > https://github.com/apache/beam/commit/c528eab18b32342daed53b750fe330d30c7e5224
>>> > > . Is this essential to the release, or just useful for validating 
>>> > > it?
>>> >
>>> > It's strictly a test infrastructure change, it has no functional 
>>> > impact. For context, the changes included were from 
>>> > https://github.com/apache/beam/pull/25661 and 
>>> > https://github.com/apache/beam/pull/25654, both were keeping 
>>> > integration tests from running correctly.
>>> 
>>> Thanks.
>>> 
>>> > On Fri, Mar 3, 2023 at 2:09 PM Robert Bradshaw wrote:
>>> >>
>>> >> The released artifacts seem to be missing the last commit at
>>> >> https://github.com/apache/beam/commit/c528eab18b32342daed53b750fe330d30c7e5224
>>> >> . Is this essential to the release, or just useful for validating it?
>>> >>
>>> >> On Fri, Mar 3, 2023 at 11:02 AM Danny McCormick
>>> >> <dannymccorm...@google.com> wrote:
>>> >> >
>>> >> > Thanks for calling that out, and thanks for helping me fix it! We 
>>> >> > should be all set now
>>> >> >
>>> >> > On Fri, Mar 3, 2023 at 1:38 PM Robert Bradshaw 
>>> >> > <rober...@google.com> wrote:
>>> >> >>
>>> >> >> It appears your public key is not published in
>>> >> >> https://dist.apache.org/repos/dist/release/beam/KEYS .
>>> >> >>
>>> >> >> On Fri, Mar 3, 2023 at 

[NOTICE] Deprecation Avro classes in "core" and use "extensions/avro" instead for Java SDK

2023-02-22 Thread Alexey Romanenko
Hi all,

As a part of the migration of the Avro-related classes from the Java SDK “core” module to 
a dedicated extension [1] (as discussed here [2] and here [3]), two 
important PRs have been merged [4][5]. As a result, the old Avro-related classes 
are now deprecated in “core” (still usable but not recommended), and all 
other Beam modules that depended on them have switched to "extensions/avro” 
instead.

We did our best to make this change smooth, compatible and non-breaking. However, 
since this was one of the oldest parts of “core”, we may have missed 
something despite all our efforts. So 
I’d like to ask the community to run any tests or 
pipelines that use, for example, AvroCoder, AvroUtils or any other 
Avro-related classes and check that the new changes don’t break anything and 
everything works as expected.

—
Alexey

[1] https://github.com/apache/beam/issues/24292
[2] https://lists.apache.org/thread/mz8hvz8dwhd0tzmv2lyobhlz7gtg4gq7
[3] https://lists.apache.org/thread/47oz1mlwj0orvo1458v5pw5c20bwt08q
[4] https://github.com/apache/beam/pull/24992
[5] https://github.com/apache/beam/pull/25534
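
For pipelines that still import the deprecated “core” classes, the move should mostly be a change of import. Below is a minimal sketch of an import rewriter for Java sources. It assumes the new classes live under `org.apache.beam.sdk.extensions.avro` with unchanged simple names; the exact package mapping is an assumption to verify against the Beam release being used, and `rewrite_avro_imports` is a hypothetical helper, not part of Beam.

```python
import re

# Assumed mapping from deprecated "core" classes to "extensions/avro" classes.
# Verify these package names against the Beam release you are upgrading to.
ASSUMED_IMPORT_MAPPING = {
    "org.apache.beam.sdk.coders.AvroCoder":
        "org.apache.beam.sdk.extensions.avro.coders.AvroCoder",
    "org.apache.beam.sdk.io.AvroIO":
        "org.apache.beam.sdk.extensions.avro.io.AvroIO",
    "org.apache.beam.sdk.schemas.utils.AvroUtils":
        "org.apache.beam.sdk.extensions.avro.schemas.utils.AvroUtils",
}


def rewrite_avro_imports(java_source: str) -> str:
    """Rewrite Java import lines for the deprecated Avro classes (hypothetical helper)."""
    for old, new in ASSUMED_IMPORT_MAPPING.items():
        # Match "import <old>;" and "import static <old>.member;" at line start.
        pattern = r"^(\s*import\s+(?:static\s+)?)" + re.escape(old) + r"(?=[.;])"
        java_source = re.sub(
            pattern,
            lambda match, new=new: match.group(1) + new,
            java_source,
            flags=re.MULTILINE,
        )
    return java_source


example = """import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
"""
print(rewrite_avro_imports(example))
```

Only the import lines are touched, so unrelated classes such as `StringUtf8Coder` stay as they are. The module that uses the new imports would also need a dependency on the Avro extension artifact.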




Re: Broken Jenkins jobs

2023-02-20 Thread Alexey Romanenko
Thanks for taking care of this, Yi!

—
Alexey

> On 20 Feb 2023, at 20:21, Yi Hu via dev  wrote:
> 
> Hi Alexey,
> 
> Thanks for raising this. The breaking change is found: 
> https://github.com/apache/beam/pull/25566
> 
> Best,
> Yi
> 
> On Mon, Feb 20, 2023 at 12:44 PM Alexey Romanenko <aromanenko@gmail.com> wrote:
>> Hi all,
>> 
>> Jenkins jobs “beam_PreCommit_SQL_Java11_Commit” [1] and 
>> “beam_PreCommit_SQL_Java17_Commit” [2] seem to have been broken since Feb 17th.
>> 
>> Is anyone looking into this?
>> 
>> —
>> Alexey
>> 
>> [1] https://ci-beam.apache.org/job/beam_PreCommit_SQL_Java11_Commit/
>> [2] https://ci-beam.apache.org/job/beam_PreCommit_SQL_Java17_Commit/



Broken Jenkins jobs

2023-02-20 Thread Alexey Romanenko
Hi all,

Jenkins jobs “beam_PreCommit_SQL_Java11_Commit” [1] and 
“beam_PreCommit_SQL_Java17_Commit” [2] seem to have been broken since Feb 17th.

Is anyone looking into this?

—
Alexey

[1] https://ci-beam.apache.org/job/beam_PreCommit_SQL_Java11_Commit/
[2] https://ci-beam.apache.org/job/beam_PreCommit_SQL_Java17_Commit/

[ANNOUNCE] New PMC Member: Jan Lukavský

2023-02-16 Thread Alexey Romanenko
Hi all,

Please join me and the rest of the Beam PMC in welcoming Jan Lukavský 
as our newest PMC member.

Jan has been a part of the Beam community and a long-time contributor since 2018 in 
many significant ways, including code contributions in different areas, 
participating in technical discussions, advocating for users, giving a talk at 
Beam Summit and even writing one of the few Beam books!

Congratulations Jan and thanks for being a part of Apache Beam!

---
Alexey

Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-13 Thread Alexey Romanenko
+1 (binding)

Tested with  https://github.com/Talend/beam-samples/ 
(Java SDK v8/v11/v17, Spark 3.x runner).

---
Alexey

> On 13 Feb 2023, at 17:54, Ahmet Altay via dev  wrote:
> 
> +1 (binding) - I validated python quick starts on direct runner and python 
> streaming quickstart on dataflow.
> 
> Thank you!
> 
> On Mon, Feb 13, 2023 at 5:17 AM Bruno Volpato via dev wrote:
>> +1 (non-binding)
>> 
>> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates (Java 
>> SDK 11, Dataflow runner).
>> 
>> 
>> Thanks!
>> 
>> On Mon, Feb 13, 2023 at 1:13 AM Chamikara Jayalath via dev 
>> <dev@beam.apache.org> wrote:
>>> +1 (binding)
>>> 
>>> Tried several Java and Python multi-language pipelines.
>>> 
>>> Thanks,
>>> Cham
>>> 
>>> On Fri, Feb 10, 2023 at 1:52 PM Luke Cwik via dev wrote:
 +1
 
 Validated release artifact signatures and verified the Java Flink and 
 Spark quickstarts.
 
 On Fri, Feb 10, 2023 at 9:27 AM John Casey via dev wrote:
> Addendum to above email. 
> 
> Java artifacts were built with Gradle 7.5.1 and OpenJDK 1.8.0_362
> 
> On Fri, Feb 10, 2023 at 11:14 AM John Casey wrote:
>> Hi everyone,
>> Please review and vote on the release candidate #3 for the version 
>> 2.45.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>> 
>> 
>> Reviewers are encouraged to test their own use cases with the release 
>> candidate, and vote +1 if no issues are found.
>> 
>> The complete staging area is available for your review, which includes:
>> * GitHub Release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org 
>> [2], which is signed with the key with 
>> fingerprint 921F35F5EC5F5DDE [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.45.0-RC1" [5],
>> * website pull request listing the release [6], the blog post [6], and 
>> publishing the API reference manual [7].
>> * Java artifacts were built with Gradle GRADLE_VERSION and 
>> OpenJDK/Oracle JDK JDK_VERSION.
>> * Python artifacts are deployed along with the source release to the 
>> dist.apache.org [2] and PyPI [8].
>> * Go artifacts and documentation are available at pkg.go.dev [9]
>> * Validation sheet with a tab for 2.45.0 release to help with validation 
>> [10].
>> * Docker images published to Docker Hub [11].
>> 
>> The vote will be open for at least 72 hours. It is adopted by majority 
>> approval, with at least 3 PMC affirmative votes.
>> 
>> For guidelines on how to try the release in your projects, check out our 
>> blog post at /blog/validate-beam-release/.
>> 
>> Thanks,
>> John Casey
>> 
>> [1] https://github.com/apache/beam/milestone/8
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.45.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4] 
>> https://repository.apache.org/content/repositories/orgapachebeam-1293/
>> [5] https://github.com/apache/beam/tree/v2.45.0-RC1
>> [6] https://github.com/apache/beam/pull/25407
>> [7] https://github.com/apache/beam-site/pull/640
>> [8] https://pypi.org/project/apache-beam/2.45.0rc1/
>> [9] 
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.45.0-RC1/go/pkg/beam
>> [10] 
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2030665842
>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
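
The adoption rule quoted in the vote email (majority approval, with at least 3 PMC affirmative votes) can be sketched as a small tally. This is only a sketch under stated assumptions: only binding (PMC) votes count towards the result, "majority" is read as more binding +1 votes than binding -1 votes, and the `Vote` type and `release_vote_passes` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1 to approve, -1 to reject
    binding: bool   # True for PMC members


def release_vote_passes(votes, min_binding_approvals=3):
    """Return True if the tally meets the rule described in the vote email."""
    binding_votes = [vote for vote in votes if vote.binding]
    approvals = sum(1 for vote in binding_votes if vote.value > 0)
    rejections = sum(1 for vote in binding_votes if vote.value < 0)
    return approvals >= min_binding_approvals and approvals > rejections


# Votes explicitly labelled in this thread; replies that do not say
# "binding" or "non-binding" are left out rather than guessed.
votes = [
    Vote("Alexey Romanenko", +1, binding=True),
    Vote("Ahmet Altay", +1, binding=True),
    Vote("Bruno Volpato", +1, binding=False),
    Vote("Chamikara Jayalath", +1, binding=True),
]
print(release_vote_passes(votes))
```

Non-binding votes are kept in the list because they are still useful validation signals, even though they do not change the outcome of the tally.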



Re: [BEAM-13261] PR review needed

2023-01-16 Thread Alexey Romanenko
Thanks, I’ll take a look

—
Alexey

> On 14 Jan 2023, at 10:38, Piyush Sagar  wrote:
> 
> Hello Beam Devs,
> 
> Requesting a review for a PR: https://github.com/apache/beam/pull/24851 and 
> Task: https://github.com/apache/beam/issues/21200
> 
> Thanks
> 



Re: [VOTE] Release 2.44.0, release candidate #1

2023-01-11 Thread Alexey Romanenko
+1 (binding)

Tested with  https://github.com/Talend/beam-samples/ 
(Java SDK v8/v11/v17, Spark 3 runner).

---
Alexey

> On 11 Jan 2023, at 16:53, Ritesh Ghorse via dev  wrote:
> 
> +1 (non-binding)
> Validated Go Dataframe Transform wrapper on Dataflow runner and Go SDK 
> quickstart on Direct and Dataflow Runner. 
> 
> Thanks!
> 
> On Wed, Jan 11, 2023 at 12:51 AM Anand Inguva via dev wrote:
>> I ran the Python word count on DirectRunner and Dataflow Runner. 
>> 
>> Steps:
>> 1. pip install --pre apache_beam in a fresh virtualenv.
>> 2. Run the command Ahmet provided except removing the sdk_location from CMD 
>> args.
>> 
>> The job was successful.   
>> 
>> On Tue, Jan 10, 2023 at 6:48 PM Ahmet Altay via dev wrote:
>>> I validated python quick starts (direct, dataflow) X (batch, streaming). I 
>>> ran into an issue with the dataflow batch case, running the wordcount with 
>>> the standard:
>>> 
>>> python -m apache_beam.examples.wordcount \
>>> --output  \
>>> --staging_location  \
>>> --temp_location \
>>> --runner DataflowRunner \
>>> --job_name wordcount-$USER \
>>> --project  \
>>> --num_workers 1 \
>>> --region us-central1 \
>>> --sdk_location apache-beam-2.44.0.zip
>>> 
>>> results in:
>>> 
>>> "/usr/local/lib/python3.10/site-packages/dataflow_worker/shuffle.py", line 
>>> 589, in __enter__ raise RuntimeError(_PYTHON_310_SHUFFLE_ERROR_MESSAGE) 
>>> RuntimeError: This pipeline requires Dataflow Runner v2 in order to run 
>>> with currently used version of Apache Beam on Python 3.10+. Please verify 
>>> that the Dataflow Runner v2 is not disabled in the pipeline options or 
>>> enable it explicitly via: --dataflow_service_option=use_runner_v2. 
>>> Alternatively, downgrade to Python 3.9 to use Dataflow Runner v1.
>>> 
>>> Questions:
>>> - I am not explicitly opting out of runner v2, and this is a standard 
>>> wordcount example, I expected it to just work.
>>> 
>>> Then I tried to add --dataflow_service_option=use_runner_v2 to the above 
>>> wordcount command, which results in the following error:
>>> 
>>> "message": "Dataflow Runner v2 requires a valid FnApi job, Please 
>>> resubmit your job with a valid configuration. Note that if using Templates, 
>>> you may need to regenerate your template with the '--use_runner_v2'."
>>> 
>>> Maybe I am doing something wrong and it is an error on my end. It would be 
>>> good for someone else with python experience to check this.
>>> 
>>> /cc @Valentyn Tymofieiev  
>>> 
>>> Ahmet
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Jan 10, 2023 at 10:54 AM Kenneth Knowles wrote:
 I have published a new maven staging repository: 
 https://repository.apache.org/content/repositories/orgapachebeam-1290/
 
 It looks like it has everything, though I did not automate a check. At 
 least there were no errors during publish which I ran with --no-parallel 
 overnight, and some specific things that were missing from 
 orgapachebeam-1289 are present.
 
 I will restart the 72 hour waiting period, since the RC is only now usable.
 
 Kenn
 
 On Mon, Jan 9, 2023 at 6:51 PM Kenneth Knowles wrote:
> I have discovered that many pom files are missing from the nexus 
> repository. I should be able to re-publish a new one. It will take some 
> time as this is one of the longest-running processes.
> 
> On Mon, Jan 9, 2023 at 1:42 PM Kenneth Knowles wrote:
>> Correction: this is release candidate #1.
>> 
>> On Mon, Jan 9, 2023 at 1:25 PM Kenneth Knowles wrote:
>>> Hi everyone,
>>> 
>>> Please review and vote on the release candidate #3 for the version 
>>> 2.44.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>> 
>>> Reviewers are encouraged to test their own use cases with the release 
>>> candidate, and vote +1 if
>>> no issues are found.
>>> 
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org 
>>> [2], which is signed with the key with 
>>> fingerprint 6ED551A8AE02461C [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.44.0-RC1" [5],
>>> * website pull request listing the release [6], the blog post [6], and 
>>> publishing the API reference manual [7].
>>> * Java artifacts were built with Gradle 7.5.1 and OpenJDK 1.8.0_232.
>>> * Python artifacts are deployed along with the source release to the 
>>> dist.apache.org [2] and PyPI [8].
>>> * Go artifacts and documentation are available at pkg.go.dev 
>>> 

Re: How to write an IO guide draft

2023-01-10 Thread Alexey Romanenko
I doubt that it will become a "de-facto" standard behaviour for all runners in the 
short term, as long as the cross-language functionality brings additional complexity 
into pipeline deployment and performance overhead. 

Perhaps this will change in the long term, but for now I would guess that 
most Beam pipelines still use IO connectors from the same SDK as the pipeline 
itself.

—
Alexey

> On 10 Jan 2023, at 16:51, Sachin Agarwal via dev  wrote:
> 
> I think the idea of cross language is that an IO is only in one language and 
> others can use that IO. My feeling is that the idea of “what language is this 
> IO in” becomes an implementation detail that folks won’t have to care about 
> longer term. There are enhancements needed to the expansion service to make 
> that happen but that’s my understanding of the strategy. 
> 
> On Tue, Jan 10, 2023 at 7:40 AM Austin Bennett wrote:
>> This is great, thanks for putting this together!  
>> 
>> A related question:  are we as a community targeting java to be the 
>> canonical/target IO language if an IO does not currently exist?  If that is 
>> not the case, then I would imagine we are hoping that we might eventually 
>> also wind up with good examples for implementing IOs in other languages as 
>> well [ not suggesting that you/John address that, but that we add GH Issues 
>> as that might be worthwhile to hope others take on ]?
>> 
>> 
>> 
>> On Mon, Jan 9, 2023 at 8:58 AM John Casey via dev wrote:
>>> Hi All,
>>> 
>>> I spent the last few weeks of December drafting a "How to write an IO 
>>> guide": 
>>> https://docs.google.com/document/d/1-WxZTNu9RrLhh5O7Dl5PbnKqz3e5gm1x3gDBBhszVF8/edit#
>>> 
>>> and an associated code sample: https://github.com/apache/beam/pull/24799
>>> 
>>> My goal is to make it easier for a new IO developer to create a new IO from 
>>> scratch. This is intended to complement the various standards documents 
>>> that have been floating around. Where those are intended to prescribe 
>>> structure of an IO, this is more focused on the mechanics of internal 
>>> design.
>>> 
>>> Please take a look and let me know what you think,
>>> 
>>> John



Re: Beam Website Feedback

2023-01-09 Thread Alexey Romanenko
Always happy to help!

Many thanks for your work to make Beam website better!

—
Alexey

> On 6 Jan 2023, at 21:54, Alex Kosolapov  wrote:
> 
> Thank you, Ahmet! Happy to help! Both changes [1] and [2] have been reviewed 
> and merged by Alexey Romanenko.
>  
> We wanted to thank Alexey Romanenko, David Huntsperger, Pablo Estrada, Alya 
> Boiko for reviewing and helping to contribute 52 enhancements, fixes and case 
> study related additions for the Beam website in the last 6 months since 
> July’22! [3]
>  
> [1] https://github.com/apache/beam/pull/1
> [2] https://github.com/apache/beam/pull/24747 
> [3] 
> https://github.com/apache/beam/pulls?page=1=is%3Apr+author%3Abullet03+is%3Aclosed+merged%3A%3E%3D2022-07-01
>  
> From: Ahmet Altay 
> Date: Tuesday, January 3, 2023 at 2:22 PM
> To: Alex Kosolapov 
> Cc: "dev@beam.apache.org", Rebecca Szper, Bulat Safiullin, Alexey Romanenko, 
> Rajkumar Gupta
> Subject: [EXTERNAL] Re: Beam Website Feedback
>  
> Thank you Alex and Bulat for improving this. We all very much appreciate it.
>  
> On Thu, Dec 22, 2022 at 9:21 AM Alex Kosolapov <alex.kosola...@akvelon.com> wrote:
>> Hi all,
>>  
>> We were preparing some improvements for check-links.sh script that is used 
>> for testing Apache Beam website links during the website build with Bulat 
>> (@bullet03 <https://github.com/bullet03>).
>>  
>> We saw several categories of link checks and error statuses:
>> * 404 - actual incorrect links - fixed in [1] and [2]
>> * Valid links that appear to the script as incorrect, e.g., 9xx status code 
>> for LinkedIn requiring authentication in LinkedIn, some GitHub documentation 
>> links, example links, some Meetup links, etc.
>>  
>> We propose to add a “verified_list” to check-links.sh so that manually 
>> verified links can be skipped in testing. The current verified list includes 15 
>> links based on a review of the most recent test results. The inconvenience of this 
>> approach is that a verified link may become outdated, which would require an 
>> update of the “verified_list” in check-links.sh. This approach is implemented 
>> in [3].
>>  
>> [3] also contains check-links.sh improvements:
>> * Added a function that checks and reports Apache Beam staging website links 
>> to prevent the production website from having links to staging
>> * Added a check that reports Apache Beam website absolute links (links of 
>> the form https://beam.apache.org/path) - relative links in the sources are 
>> preferred to properly build and review website staging
>> * Added sorting of invalid links by their error code - this may be more 
>> convenient for reviewing output
>>  
>> [4] - optionally, update absolute links to relative links so that a staging 
>> website more closely resembles the production website
>>  
>> We submitted [3] and [4] for PR review and tagged Alexey Romanenko to kindly 
>> help with reviewing these PRs. Please share your comments about proposed 
>> approach in the PRs or list.
>>  
>> [1] https://github.com/apache/beam/pull/24635
>> [2] https://github.com/apache/beam/pull/24744 
>> [3] https://github.com/apache/beam/pull/1 
>> [4] https://github.com/apache/beam/pull/24747 
>>  
>> Thank you,
>> Alex
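
The checks described in the message above (skipping manually verified links, flagging links to staging, reporting absolute production links, and sorting invalid links by error code) can be sketched in a few lines. This is not the logic of check-links.sh itself. It is a minimal Python sketch in which the `VERIFIED_LIST` entry, the staging host marker, and the `classify_links` helper are assumptions made for illustration, and the HTTP status codes are passed in rather than fetched.

```python
from urllib.parse import urlparse

# Hypothetical manually verified link that the checker should skip.
VERIFIED_LIST = {"https://example.com/needs-login"}
PRODUCTION_HOST = "beam.apache.org"
# Assumed marker for staging hosts; the real staging URL scheme may differ.
STAGING_MARKER = "staging"


def classify_links(link_statuses):
    """Split {url: http_status} into skipped, staging, absolute and invalid links."""
    report = {"skipped": [], "staging": [], "absolute": [], "invalid": []}
    for url, status in link_statuses.items():
        host = urlparse(url).netloc
        if url in VERIFIED_LIST:
            report["skipped"].append(url)
        elif STAGING_MARKER in host:
            report["staging"].append(url)
        elif status >= 400:
            report["invalid"].append((status, url))
        elif host == PRODUCTION_HOST:
            report["absolute"].append(url)
    # Sort invalid links by error code so similar failures are grouped.
    report["invalid"].sort()
    return report


statuses = {
    "https://example.com/needs-login": 999,
    "https://beam.apache.org/get-started/": 200,
    "https://beam.apache.org/documentation/resources/videos-and-podcasts": 404,
    "https://example.com/gone": 410,
}
report = classify_links(statuses)
print(report["invalid"])
```

Keeping the verified list as explicit data makes the trade-off mentioned in the message visible: an entry silences the check for that link until someone removes it, so the list needs periodic review.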
>>  
>> From: Rebecca Szper via dev <dev@beam.apache.org>
>> Reply-To: "dev@beam.apache.org" <dev@beam.apache.org>, Rebecca Szper 
>> <rsz...@google.com>
>> Date: Wednesday, December 21, 2022 at 10:15 AM
>> To: Ahmet Altay <al...@google.com>
>> Cc: Alexey Romanenko <aromanenko@gmail.com>, dev <dev@beam.apache.org>, Rajkumar Gupta <rajkumargu...@google.com>
>> Subject: [EXTERNAL] Re: Beam Website Feedback
>>  
>> Our team doesn't maintain the Beam website infrastructure, but last time 
>> something like this came up, David said that there are consultants that work 
>> on this type of thing. He pinged @bullet03 <https://github.com/bullet03> on 
>> the Beam ticket, who was able to help.
>>  
>> On Tue, Dec 20, 2022 at 5:06 PM Ahmet Altay <al...@google.com> wrote:
>>>  
>>>  
>>> On Tue, Dec 20, 2022 at 1:12 PM Ahmet Altay <al...@google.com> wrote:
>>>>  
>>>>  
>>>> On Tue, Dec 20, 2022 at 9:14 AM Alexey Romanenko <aromanenko@gmail.com> wrote:
>>>>> Thanks Ahmet! I’d prefer

Re: [DISCUSS] Avro dependency update, design doc

2023-01-02 Thread Alexey Romanenko
Here is the recent update on the progress for this topic.

After receiving feedback on the design document [1] presented to the community 
earlier, and after several follow-up discussions (many thanks for this!), it was 
decided to take “option 4” (“Move Avro from “core” to generic Avro 
extensions using multiple Avro version specific adapters to handle breaking 
changes”) as the way to move forward. 

We created an umbrella issue to track the progress [2], and the first step 
(“Create Avro extension for Java SDK”) [3] is already finished and 
merged. This newly created extension (“sdks/java/extensions/avro/") replicates 
the same Avro support behaviour as is currently implemented in Java SDK 
“core”. It required almost no changes to the current user API (only a relaxation 
of access modifiers for several class members and methods to make them 
accessible from other packages), so it should not introduce any breaking 
changes for users, especially if they still use Beam's current Avro version 
(1.8.2). 

The next step will be to switch all Beam Java modules to use the new Avro 
extension instead of using the “core” Avro classes. Again, we don’t expect any 
user API breaking changes for this step.

Note: As the price of a smooth, non-breaking transition, we have to maintain 
two identical copies of Beam's Avro code (in “core" and in “extensions/avro”) 
until the old code is deprecated (expected to be the third step). So, 
until then, please apply your Java SDK Avro-related changes (if any) in both 
places to keep them in sync.


Also, please, share any of your feedback, questions, ideas or concerns on this 
topic.

 
[1] 
https://docs.google.com/document/d/1tKIyTk_-HhkmVuJsxvWP5eTELESpCBe_Vmb1nJ3Ia34/edit?usp=sharing
[2] https://github.com/apache/beam/issues/24292
[3] https://github.com/apache/beam/issues/24293

—
Alexey



> On 18 Nov 2022, at 15:56, Alexey Romanenko  wrote:
> 
> Since there are no objections in principle to the proposed option 2 
> (extract Avro-related code from “core” to an Avro extension but keep it in 
> “core” for some time during the transition period), we will try to move 
> forward and take this path. 
> 
> I’m pretty sure that we will face some hidden issues while working on this, 
> so I’ll keep you posted =)
> 
> —
> Alexey
> 
>> On 11 Nov 2022, at 18:05, Austin Bennett  wrote:
>> 
>> @Moritz: I *think* should be fine, and don't have anything specific to offer 
>> for what might go wrong throughout the process.  :-) :shrug:
>> 
>> 
>> 
>> On Fri, Nov 11, 2022 at 2:07 AM Moritz Mack <mm...@talend.com> wrote:
>>> Thanks a lot for the feedback so far! I can only second Alexey. It was 
>>> painful to come to realize that the only feasible option seems to be 
>>> copying a lot of code during the transition phase.
>>> 
>>> For that reason, it will be critical to be disciplined about the removal of 
>>> the to-be deprecated code in core and, ahead of time, agree on when to 
>>> remove it again. Any thought on how long the transition phase should be?
>>> 
>>>  
>>> 
>>>  I am concerned of what could go wrong for users in the 
>>> in-between/transition state while more slowly transitioning avro to 
>>> extension.
>>> 
>>>  
>>> 
>>> @Austin Do you have any specific concern in mind here?
>>> 
>>> To minimize this risk, we propose that all APIs should be kept as is to 
>>> make the migration as easy as possible and kick off with the Avro version 
>>> used in core. The only thing that changes will be package names.
>>> 
>>>  
>>> 
>>> / Moritz
>>> 
>>>  
>>> 
>>> On 10.11.22, 22:46, "Kenneth Knowles" <k...@apache.org> wrote:
>>> 
>>>  
>>> 
>>> Thank you for writing this document. It really helps to understand the 
>>> options. I agree that option 2 (make a new extension and deprecate from 
>>> core) seems best. I think +Reuven Lax <mailto:re...@google.com> might have 
>>> the most context on any technical issue we will encounter around schema 
>>> codegen.
>>> 
>>>  
>>> 
>>> Kenn
>>> 
>>>  
>>> 
>>> On Thu, Nov 10, 2022 at 7:24 AM Alexey Romanenko >> <mailto:aromanenko@gmail.com>

Re: Beam Website Feedback

2022-12-20 Thread Alexey Romanenko
Thanks Ahmet! I’d prefer to fix the links as you did and also add a redirect from 
the old one - perhaps there are other similar links that have been changed in the 
same way.

Btw, I’m not sure that we still check for broken links as we did before, iirc, 
but it would probably be a good idea to add such a check before publishing the 
website.

—
Alexey



> On 20 Dec 2022, at 18:04, Ahmet Altay via dev  wrote:
> 
> I did a search and found a few places with the broken link. Correct links 
> should be: https://beam.apache.org/get-started/resources/videos-and-podcasts/
> 
> I created a PR to update the website 
> (https://github.com/apache/beam/pull/24733). I do not know if that is the 
> best solution. As an alternative we could consider setting up a redirect for 
> the old link. We do not know who else would be still linking to the old one.
> 
> Ahmet
> 
> On Tue, Dec 20, 2022 at 8:52 AM Alexey Romanenko <aromanenko@gmail.com> wrote:
>> Hi Rajkumar,
>> 
>> Could you specify where (which page) this link was found?
>> Thanks!
>> 
>> —
>> Alexey
>> 
>>> On 20 Dec 2022, at 10:08, Rajkumar Gupta via dev <dev@beam.apache.org> wrote:
>>> 
>>> Hi Team,
>>> 
>>> Just a minor point, while browsing the site I noticed that the link below 
>>> is not working. Can you please check? 
>>> https://beam.apache.org/documentation/resources/videos-and-podcasts 
>>> 
>>> Regards,
>>> Raj
>>> 
>>> -- 
>>> Rajkumar Gupta | Technical Solutions Engineer - Google Cloud Platform | 
>>> rajkumargu...@google.com <mailto:rfol...@google.com> | +91-9223541460 
>>> 



Re: Beam Website Feedback

2022-12-20 Thread Alexey Romanenko
Hi Rajkumar,

Could you specify where (which page) this link was found?
Thanks!

—
Alexey

> On 20 Dec 2022, at 10:08, Rajkumar Gupta via dev  wrote:
> 
> Hi Team,
> 
> Just a minor point, while browsing the site I noticed that the link below is 
> not working. Can you please check? 
> https://beam.apache.org/documentation/resources/videos-and-podcasts 
> 
> Regards,
> Raj
> 
> -- 
> Rajkumar Gupta | Technical Solutions Engineer - Google Cloud Platform | 
> rajkumargu...@google.com  | +91-9223541460



Re: [Proposal] Adopt a Beam I/O Standard

2022-12-15 Thread Alexey Romanenko
Cham, do you remember what the reason was for not finalising that doc? 

Personally, I find having such standards very useful (if they remain flexible 
over time, of course), especially for new developers and PR reviewers, and 
it’d be great to finally have such a doc as a part of the contribution guide.

—
Alexey  

> On 13 Dec 2022, at 04:32, Chamikara Jayalath via dev  
> wrote:
> 
> Yeah, I don't think we either finalized or documented (in the Website) the 
> previous iteration. This doc seems to contain details from the documents 
> shared in the previous iteration.
> 
> Thanks,
> Cham
> 
>  
> 
> On Mon, Dec 12, 2022 at 6:49 PM Robert Burke <rob...@frantil.com> wrote:
>> I think ultimately: until the docs are clearly available on the Beam site 
>> itself, it's not documentation. See also, design docs, previous emails, and 
>> similar.
>> 
>> On Mon, Dec 12, 2022, 6:07 PM Andrew Pilloud via dev <dev@beam.apache.org> wrote:
>>> I believe the previous iteration was here: 
>>> https://lists.apache.org/thread/3o8glwkn70kqjrf6wm4dyf8bt27s52hk
>>> 
>>> The associated docs are:
>>> https://s.apache.org/beam-io-api-standard-documentation
>>> https://s.apache.org/beam-io-api-standard
>>> 
>>> This is missing all the relational stuff that was in those docs, this 
>>> appears to be another attempt starting from the beginning?
>>> 
>>> Andrew
>>> 
>>> 
>>> On Mon, Dec 12, 2022 at 9:57 AM Alexey Romanenko <aromanenko@gmail.com> wrote:
>>>> Thanks for writing this!
>>>> 
>>>> IIRC, a similar design doc was sent for review here a while ago. Is this 
>>>> just an updated version or a new one?
>>>> 
>>>> —
>>>> Alexey
>>>> 
>>>>> On 11 Dec 2022, at 15:16, Herman Mak via dev <dev@beam.apache.org> wrote:
>>>>> 
>>>>> Hello Everyone,
>>>>> 
>>>>> *TLDR*
>>>>> 
>>>>> Should we adopt a set of standards that Connector I/Os should adhere to? 
>>>>> Attached is a first version of a Beam I/O Standards guideline that 
>>>>> includes opinionated best practices across important components of a 
>>>>> Connector I/O, namely Documentation, Development and Testing. 
>>>>> 
>>>>> *The Long Version*
>>>>> 
>>>>> Apache Beam is a unified open-source programming model for both batch and 
>>>>> streaming. It runs on multiple platform runners and integrates with over 
>>>>> 50 services using individually developed I/O Connectors 
>>>>> <https://beam.apache.org/documentation/io/connectors/>. 
>>>>> 
>>>>> Given that Apache Beam connectors are written by many different 
>>>>> developers and at varying points in time, they vary in syntax style, 
>>>>> documentation completeness and testing done. For a new adopter of Apache 
>>>>> Beam, that can definitely cause some uncertainty.
>>>>> 
>>>>> So should we adopt a set of standards that Connector I/Os should adhere 
>>>>> to? 
>>>>> Attached is a first version, in Doc format, of a Beam I/O Standards 
>>>>> guideline that includes opinionated best practices across important 
>>>>> components of a Connector I/O, namely Documentation, Development and 
>>>>> Testing. And the aim is to incorporate this into the documentation and to 
>>>>> have it referenced as standards for new Connector I/Os (and ideally have 
>>>>> existing Connectors upgraded over time). If it looks helpful, the 
>>>>> immediate next step is that we can convert it into a .md as a PR into the 
>>>>> Beam repo!
>>>>> 
>>>>> Thanks and looking forward to feedback and discussion,
>>>>> 
>>>>>  [PUBLIC] Beam I/O Standards 
>>>>> <https://docs.google.com/document/d/1BCTpSZDUjK90hYZjcn8aAnPd9vuRfj8YU1j3mpSgRwI/edit?usp=drive_web>
>>>>> 
>>>>> 
>>>>> Herman Mak |   Customer Engineer, Hong Kong, Google Cloud |
>>>>> herman...@google.com <mailto:herman...@google.com> |+852-3923-5417 
>>>>> 
>>>>> 
>>>>> 
>>>> 



Re: [Proposal] | Move FileIO and TextIO from :sdks:java:core to :sdks:java:io:file

2022-12-14 Thread Alexey Romanenko
On 12 Dec 2022, at 22:23, Robert Bradshaw via dev  wrote:
> 
> Saving up all the breaking changes until a major release definitely
> has its downsides (look at Python 3). The migration path is often as
> important (if not more so) than the final destination.

Actually, it proves that major releases should not be delayed for a long 
period of time and should be issued more often to reduce the number of breaking 
changes (which, of course, are likely to happen). That will help users to do much 
smoother and less risky upgrades, and developers to not carry the burden forever. 
Beam 2.0.0 was released back in May 2017 and we've almost never talked about 
Beam 3.0 and what the criteria for it are. I understand that it’s a completely 
different discussion, but it seems that the time has come =)

> As for this particular change, I would question how the benefit (it's
> unclear what the exact benefit is--better internal organization?)
> exceeds the pain of making every user refactor their code. I think a
> stronger case can be made for things like the Avro dependency that
> cause real pain.

Agree. I think that if it doesn’t cause any pain with additional external 
dependencies and this code is used in almost every other SDK module, then there 
is no reason for such breaking changes. On the other hand, the Avro case that 
you mentioned above is a good example of why it is sometimes better to keep 
such code outside of “core”.

> As for the pipeline update feature, we've long discussed having
> "pick-your-implementation" transforms that specify alternative,
> equivalent implementations. Upgrades can choose the old one whereas
> new pipelines can get the latest and greatest. It won't solve all
> issues, and requires keeping old codepaths around, but could be an
> important step forward.
> 
> On Mon, Dec 12, 2022 at 10:20 AM Kenneth Knowles  wrote:
>> 
>> I agree with Moritz. To answer a few specifics in my own words:
>> 
>> - It is a perfectly sensible refactor, but as a counterpoint without 
>> file-based IO the SDK isn't functional so it is also a reasonable design 
>> point to have this included. There are other things in the core SDK that are 
>> far less "core" and could be moved out with greater benefit. The main goal 
>> for any separation of modules would be lighter weight transitive 
>> dependencies, IMO.
>> 
>> - No, Beam has not made any deliberate breaking changes of this nature. 
>> Hence we are still on major version 2. We have made some bugfixes for data 
>> loss risks that could be called "breaking changes" but since the feature was 
>> unsafe to use in the first place we did not bump the major version.
>> 
>> - It is sometimes possible to do such a refactor and have the deprecated 
>> location proxy to the new location. In this case that seems hard to achieve.
>> 
>> - It is not actually necessary to maintain both locations, as we can declare 
>> the old location will be unmaintained (but left alone) and all new 
>> development goes to the new location. That isn't a great choice for users 
>> who may simply upgrade their SDK version and not notice that their old code 
>> is now pointing at a version that will not receive e.g. security updates.
>> 
>> - I like the style where if/when we transition from Beam 2 to Beam 3 we 
>> should have the exact functionality of Beam 3 available as an opt-in flag 
>> first. So if a user passes --beam-3 they get exactly what will be the 
>> default functionality when we bump the major version. It really is a problem 
>> to do a whole bunch of stuff feverishly before a major version bump. The 
>> other style that I think works well is the linux kernel style where major 
>> versions alternate between stable and unstable (in other words, returning to 
>> the 0.x style with every alternating version).
>> 
>> - I do think Beam suffers from fear and inability to do significant code 
>> gardening. I don't think backwards compatibility in the code sense is the 
>> biggest blocker. I think the "pipeline update" feature is perhaps the thing 
>> most holding Beam back from making radical rapid forward progress.
>> 
>> Kenn
>> 
>> On Mon, Dec 12, 2022 at 2:25 AM Moritz Mack  wrote:
>>> 
>>> Hi Damon,
>>> 
>>> 
>>> 
>>> I fear the current release / versioning strategy of Beam doesn’t lend 
>>> itself well for such breaking changes. Alexey and I have spent quite some 
>>> time discussing how to proceed with the problematic Avro dependency in core 
>>> (and respectively AvroIO, of course).
>>> 
>>> Such changes essentially always require duplicating code to continue 
>>> supporting a deprecated legacy code path to not break users’ code. But this 
>>> comes at a very high price. Until the deprecated code path can be finally 
>>> removed again, it must be maintained in two places.
>>> 
>>> Unfortunately, the removal of deprecated code is rather problematic without 
>>> a major version release as it would break semantic versioning and people’s 
>>> expectations. With that deprecations 

Re: [Proposal] Adopt a Beam I/O Standard

2022-12-12 Thread Alexey Romanenko
Thanks for writing this!

IIRC, a similar design doc was sent for review here a while ago. Is this just 
an updated version or a new one?

—
Alexey

> On 11 Dec 2022, at 15:16, Herman Mak via dev  wrote:
> 
> Hello Everyone,
> 
> *TLDR*
> 
> Should we adopt a set of standards that Connector I/Os should adhere to? 
> Attached is a first version of a Beam I/O Standards guideline that includes 
> opinionated best practices across important components of a Connector I/O, 
> namely Documentation, Development and Testing. 
> 
> *The Long Version*
> 
> Apache Beam is a unified open-source programming model for both batch and 
> streaming. It runs on multiple platform runners and integrates with over 50 
> services using individually developed I/O Connectors. 
> 
> Given that Apache Beam connectors are written by many different developers 
> and at varying points in time, they vary in syntax style, documentation 
> completeness and testing done. For a new adopter of Apache Beam, that can 
> definitely cause some uncertainty.
> 
> So should we adopt a set of standards that Connector I/Os should adhere to? 
> Attached is a first version, in Doc format, of a Beam I/O Standards guideline 
> that includes opinionated best practices across important components of a 
> Connector I/O, namely Documentation, Development and Testing. And the aim is 
> to incorporate this into the documentation and to have it referenced as 
> standards for new Connector I/Os (and ideally have existing Connectors 
> upgraded over time). If it looks helpful, the immediate next step is that we 
> can convert it into a .md as a PR into the Beam repo!
> 
> Thanks and looking forward to feedbacks and discussion,
> 
>  [PUBLIC] Beam I/O Standards 
> 
> 
> 
> Herman Mak |   Customer Engineer, Hong Kong, Google Cloud |
> herman...@google.com  |+852-3923-5417 
> 
> 
> 



Re: Issue with website update and Jenkins

2022-11-30 Thread Alexey Romanenko
Thanks, Kenn!

> On 30 Nov 2022, at 18:59, Kenneth Knowles  wrote:
> 
> I filed https://issues.apache.org/jira/browse/INFRA-23967
> 
> On Tue, Nov 29, 2022 at 7:53 AM Alexey Romanenko <aromanenko@gmail.com> wrote:
>> It looks that there is again the same issue with 
>> beam_PostCommit_Website_Publish job - the last successful build was 6 days 
>> ago.
>> 
>> Could someone take a look on this?
>> 
>> Thanks,
>> Alexey
>> 
>>> On 11 Oct 2022, at 19:23, David Huntsperger via dev <dev@beam.apache.org> wrote:
>>> 
>>> Thank you!
>>> 
>>> On Mon, Oct 10, 2022 at 8:18 AM Alexey Romanenko <aromanenko@gmail.com> wrote:
>>>> The issue is resolved.
>>>> Thank you to everybody who made this working again!
>>>> 
>>>> ---
>>>> Alexey
>>>> 
>>>>> On 7 Oct 2022, at 15:26, Alexey Romanenko <aromanenko@gmail.com> wrote:
>>>>> 
>>>>> Hi everybody,
>>>>> 
>>>>> The is an issue with updating a content on Beam website. I believe it’s 
>>>>> caused by not running the Jenkins job that publishes the Beam website 
>>>>> into the asf-site branch since 30 Sep 2022 [1] with an error message 
>>>>> “There are no nodes with the label ‘git-websites’”:
>>>>> - https://ci-beam.apache.org/job/beam_PostCommit_Website_Publish/
>>>>> 
>>>>> In it’s order, iinm, it’s caused by the fact that several 
>>>>> “apache-beam-jenkins-*” nodes are offline.
>>>>> 
>>>>> Does anyone aware of this problem and work on this? To whom we need to 
>>>>> address such problems, INFRA?
>>>>> 
>>>>> How we can prevent this silent behaviour in the future? Beam website 
>>>>> publishing job (“beam_PostCommit_Website_Publish”) just got stuck and 
>>>>> hangs for a while.
>>>>> 
>>>>> —
>>>>> Alexey
>>>>> 
>>>> 
>> 



Re: [DISCUSSION][JAVA] Current state of Java 17 support

2022-11-30 Thread Alexey Romanenko

> On 30 Nov 2022, at 03:56, Tomo Suzuki via dev  wrote:
> 
> > Do we still need to support Java 8 SDK?
> 
> Yes, for Google Cloud customers who still use Java 8, I want Apache Beam to 
> support Java 8. Do you observe any special burden maintaining Java 8?

I can only think of the additional resource costs if we test all 
supported JDKs, as Austin mentioned above. Imho, we should do that for every JDK 
that is officially supported.
Another, less costly way is to run the Java tests for all JDKs only during the 
release preparation stage.

I agree that it would make sense to continue supporting Java 8 as long as a 
significant number of users are using it.

—
Alexey


> 
> Regards,
> Tomo
> 
> On Tue, Nov 29, 2022 at 21:48 Austin Bennett <aus...@apache.org> wrote:
>> -1 for ongoing Java8 support [ or, said another way, +1 for dropping support 
>> of Java8 ]
>> 
>> +1 for having tests that run for ANY JDK that we say we support.  Is there 
>> any reason the resources to support are too costly [ or outweigh the 
>> benefits of additional confidence in ensuring we support what we say we do 
>> ]?  I am not certain on whether this would only be critical for releases, or 
>> should be done as part of regular CI.  
>> 
>> On Tue, Nov 29, 2022 at 8:51 AM Alexey Romanenko <aromanenko@gmail.com> wrote:
>>> Hello,
>>> 
>>> I’m sorry if it’s already discussed somewhere but I find myself a little 
>>> bit lost in the subject. 
>>> So, I’d like to clarify this - what is a current official state of Java 17 
>>> support at Beam?
>>> 
>>> I recall that a great job was done to make Beam compatible with Java 17 [1] 
>>> and Beam already provides “beam_java17_sdk” Docker image [2] but, iiuc, 
>>> Java 8 is still the default JVM to run all Java tests on Jenkins ("Java 
>>> PreCommit" in the first order) and there are only limited number of tests 
>>> that are running with JDK 11 and 17 on Jenkins by dedicated jobs.
>>> 
>>> So, my question would sound like if Beam officially supports Java 17 (and 
>>> 11), do we need to run all Beam Java SDK related tests (VR and IT test 
>>> including) against all supported Java SDKs? 
>>> 
>>> Do we still need to support Java 8 SDK?
>>> 
>>> In the same time, as we are heading to move everything from Jenkins to 
>>> GitHub actions, what would be the default JDK there or we will run all 
>>> Java-related actions against all supported JDKs?
>>> 
>>> —
>>> Alexey
>>> 
>>> [1] https://issues.apache.org/jira/browse/BEAM-12240
>>> [2] https://hub.docker.com/r/apache/beam_java17_sdk
>>> 
>>> 
>>> 
> -- 
> Regards,
> Tomo



Re: Issue with website update and Jenkins

2022-11-29 Thread Alexey Romanenko
It looks like the same issue has come back with the 
beam_PostCommit_Website_Publish job: the last successful build was 6 days ago.

Could someone take a look at this?

Thanks,
Alexey

> On 11 Oct 2022, at 19:23, David Huntsperger via dev  
> wrote:
> 
> Thank you!
> 
> On Mon, Oct 10, 2022 at 8:18 AM Alexey Romanenko <aromanenko@gmail.com> wrote:
>> The issue is resolved.
>> Thank you to everybody who made this working again!
>> 
>> ---
>> Alexey
>> 
>>> On 7 Oct 2022, at 15:26, Alexey Romanenko <aromanenko@gmail.com> wrote:
>>> 
>>> Hi everybody,
>>> 
>>> The is an issue with updating a content on Beam website. I believe it’s 
>>> caused by not running the Jenkins job that publishes the Beam website into 
>>> the asf-site branch since 30 Sep 2022 [1] with an error message “There are 
>>> no nodes with the label ‘git-websites’”:
>>> - https://ci-beam.apache.org/job/beam_PostCommit_Website_Publish/
>>> 
>>> In it’s order, iinm, it’s caused by the fact that several 
>>> “apache-beam-jenkins-*” nodes are offline.
>>> 
>>> Does anyone aware of this problem and work on this? To whom we need to 
>>> address such problems, INFRA?
>>> 
>>> How we can prevent this silent behaviour in the future? Beam website 
>>> publishing job (“beam_PostCommit_Website_Publish”) just got stuck and hangs 
>>> for a while.
>>> 
>>> —
>>> Alexey
>>> 
>> 



[DISCUSSION][JAVA] Current state of Java 17 support

2022-11-29 Thread Alexey Romanenko
Hello,

I’m sorry if this has already been discussed somewhere, but I find myself a little bit 
lost in the subject. 
So, I’d like to clarify: what is the current official state of Java 17 
support in Beam?

I recall that a great job was done to make Beam compatible with Java 17 [1] and 
Beam already provides a “beam_java17_sdk” Docker image [2] but, iiuc, Java 8 is 
still the default JVM for running all Java tests on Jenkins ("Java PreCommit" 
first of all) and only a limited number of tests run with 
JDK 11 and 17 on Jenkins, in dedicated jobs.

So, my question is: if Beam officially supports Java 17 (and 11), 
do we need to run all Beam Java SDK related tests (including VR and IT tests) 
against all supported JDKs? 

Do we still need to support Java 8 SDK?

At the same time, as we are moving everything from Jenkins to GitHub 
Actions, what would be the default JDK there, or will we run all Java-related 
actions against all supported JDKs?

—
Alexey

[1] https://issues.apache.org/jira/browse/BEAM-12240
[2] https://hub.docker.com/r/apache/beam_java17_sdk





Re: Beam, Flink state and Avro Schema Evolution is problematic

2022-11-23 Thread Alexey Romanenko
+ dev

Many thanks for sharing your observations and findings on this topic, Cristian!
I'm copying it to dev@ as well to attract more attention to this problem.

—
Alexey


> On 18 Nov 2022, at 18:21, Cristian Constantinescu  wrote:
> 
> Hi everyone,
> 
> I'm using Beam on Flink with Avro generated records. If the record
> schema changes, the Flink state cannot be restored. I just want to
> send this email out for anyone who may need this info in the future
> and also ask others for possible solutions as this problem is so
> easily hit, that I'm having a hard time figuring out what other users
> of Beam running on the Flink runner are doing to circumvent it.
> 
> The in-depth discussion of the issue can be found here [1] (thanks
> Maximilian). There are also a few more emails about this here [2], and
> here [3].
> 
> The gist of the issue is that Beam serializes the coders used into the
> Flink state, and some of those coders hold references to the
> Bean/Pojos/Java classes they serialize/deserialize to. Flink
> serializes its state using Java serialization, that means that in the
> Flink state we will get a reference to the Bean/Pojo/Java class name
> and the related serialVersionUID. When the pojo (Avro generated)
> changes, so does its serialVersionUID, and Flink cannot deserialize
> the Beam state anymore because the serialVersionUID doesn't match, not
> on the Coder, but on the Pojo type that coder was holding when it got
> serialized.
> 
> I decided to try each coder capable of handling Pojos, one by one, to
> see if any would work. That is, I tried the SerializableCoder,
> AvroCoder and the SchemaCoder/RowCoder. In the case of AvroCoder and
> SerializableCoder, I have used the SpecificRecord version (not the
> GenericRecord one) and the non-Row (ie: the one that returns a Pojo
> type, not Row type) version respectively. They all failed the below
> test (added it to be very explicit, but really, it's just simple
> schema evolution).
> 
> Test:
> 1. Create an Avro POJO (IDL-generated):
>    record FooRecord {
>        union {null, string} dummy1 = null;
>    }
> 2. Create a pipeline with a simple stateful DoFn, set desired coder
> for FooRecord (I tried the SerializableCoder, AvroCoder and the
> SchemaCoder/RowCoder), and populate state with a few FooRecord
> objects.
> 3. Start the pipeline
> 4. Stop the pipeline with a savepoint.
> 5. Augment FooRecord to add another field after dummy1.
> 6. Start the pipeline restoring from the saved savepoint.
> 7. Observed this exception when deserializing the savepoint -->
> "Caused by: java.io.InvalidClassException: com.mymodels.FooRecord;
> local class incompatible: stream classdesc serialVersionUID = <some number>,
> local class serialVersionUID = <another number>"
> 
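The failure in step 7 can be reproduced with plain Java serialization, without Beam or Flink. The following is only a minimal sketch under stated assumptions: `FooRecord` here is a hypothetical hand-written stand-in for the generated class, and "the class changed after the savepoint" is simulated by flipping one bit of the serialVersionUID recorded in the serialized bytes, rather than by actually regenerating the class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerialVersionUidMismatchDemo {

    // Hypothetical stand-in for a generated record class that declares its own UID.
    static class FooRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        String dummy1 = "some value";
    }

    // Plain Java serialization, standing in for how the state was written.
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Simulates a changed class by flipping one bit of the UID stored in the stream.
    // Stream layout up to that field: 4-byte header, TC_OBJECT, TC_CLASSDESC,
    // 2-byte name length, class name, then the 8-byte serialVersionUID.
    static byte[] withDifferentUid(byte[] data, Class<?> type) {
        byte[] copy = data.clone();
        int uidOffset = 4 + 1 + 1 + 2 + type.getName().length();
        copy[uidOffset + 7] ^= 0x01;
        return copy;
    }

    // Returns the InvalidClassException message, or null if the restore succeeds.
    static String restoreError(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return null;
        } catch (InvalidClassException e) {
            return e.getMessage();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] saved = serialize(new FooRecord());
        System.out.println("unchanged class: " + restoreError(saved));
        System.out.println("changed UID:     "
                + restoreError(withDifferentUid(saved, FooRecord.class)));
    }
}
```

The second restore fails with the same "local class incompatible" message as above, which suggests the check happens on the class descriptor in the stream before any field data is read.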
> There are a few workarounds.
> 
> Workaround A:
> Right now my working solution is to implement what was suggested by
> Pavel (thanks Pavel) in [3]. Quote from him "having my business
> logic-related POJOs still Avro-generated, but I introduced another,
> generic one, which just stores schema & payload bytes, and does not
> need to change. then using a DelegateCoder that converts the POJO
> to/from that generic schema-bytes pojo that never changes".
> 
> Basically something like this (pseudocode):
> record FlinkStateValue {
>     string schema;
>     bytes value;
> }
> 
> var delegateCoder = DelegateCoder.of(
>     AvroCoder.of(FlinkStateValue.class),
>     // FooRecord -> stable envelope (schema + encoded bytes)
>     (FooRecord in) ->
>         FlinkStateValue.setSchema(FooRecord.getSchema())
>             .setValue(AvroCoder.of(FooRecord.class).encode(in)),
>     // stable envelope -> FooRecord
>     (FlinkStateValue in) ->
>         AvroCoder.of(FooRecord.class).decode(in.getValue()));
> 
> p.getCoderRegistry().registerCoderForClass(FooRecord.class, delegateCoder);
> 
> The downside is that now there's yet another deserialization step,
> which wastes CPU cycles. The upside is that things are decoupled, that
> is, I think the DelegateCoder could use a RowCoder.of(FooRecord)
> instead of the AvroCoder.of(FooRecord), or any other coder for that
> matter and you can change between them with only a code change.
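The envelope idea behind Workaround A can be sketched with the JDK alone. This is an illustration under assumptions, not the Beam or Avro implementation: `StateEnvelope`, the `FooRecord-v1`/`FooRecord-v2` identifiers and the hand-rolled byte layout are all hypothetical, and a real pipeline would delegate the payload encoding to an Avro or Row coder. What it shows is that only the never-changing envelope is exposed to Java serialization, while the payload is decoded according to the schema recorded next to it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class StableEnvelopeDemo {

    // Envelope whose shape never changes; plays the role of FlinkStateValue.
    static final class StateEnvelope implements Serializable {
        private static final long serialVersionUID = 1L;
        final String schema;   // identifies how the payload was written
        final byte[] payload;  // record bytes, opaque to Java serialization

        StateEnvelope(String schema, byte[] payload) {
            this.schema = schema;
            this.payload = payload;
        }
    }

    // Hypothetical business record after evolution: dummy2 was added later.
    static final class FooRecord {
        final String dummy1;
        final String dummy2;

        FooRecord(String dummy1, String dummy2) {
            this.dummy1 = dummy1;
            this.dummy2 = dummy2;
        }
    }

    static final String SCHEMA_V1 = "FooRecord-v1";
    static final String SCHEMA_V2 = "FooRecord-v2";

    // Writes the given fields in order using a simple length-prefixed layout.
    private static byte[] encodeFields(String... fields) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String field : fields) {
                out.writeUTF(field);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Encodes with the current (v2) layout.
    static StateEnvelope toEnvelope(FooRecord record) {
        return new StateEnvelope(SCHEMA_V2, encodeFields(record.dummy1, record.dummy2));
    }

    // Envelope as an older pipeline version would have written it (v1 layout).
    static StateEnvelope legacyEnvelope(String dummy1) {
        return new StateEnvelope(SCHEMA_V1, encodeFields(dummy1));
    }

    // Decodes using the schema recorded in the envelope; a field that the
    // writer did not know about gets a default (empty string in this sketch).
    static FooRecord fromEnvelope(StateEnvelope envelope) {
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(envelope.payload))) {
            String dummy1 = in.readUTF();
            String dummy2 = SCHEMA_V2.equals(envelope.schema) ? in.readUTF() : "";
            return new FooRecord(dummy1, dummy2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FooRecord restored = fromEnvelope(legacyEnvelope("old state"));
        System.out.println(restored.dummy1 + " / [" + restored.dummy2 + "]");
    }
}
```

The extra encode/decode hop is the CPU cost mentioned above; in exchange, changes to FooRecord never touch the class that the state backend sees.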
> 
> Workaround B:
> Difficulty hard! Use the Flink state api [4] and update the Beam
> serialized state to modify the FooRecord serialVersionUID stored in
> that state to the new one after the schema evolution, then save the
> state and start your pipeline with the evolved FooRecord.
> 
> Workaround C:
> Wrap the Avro generated FooRecord to a real Pojo or AutoValue or
> anything that you have full control over serialVersionUID, and use
> that in your pipeline especially when putting things into the state.
> 
> The problem arises when the Avro generated records have lots of properties
> and/or nested records. It becomes tedious to essentially duplicate
> them to Pojo/AutoValue.
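For Workaround C, the point of "full control over serialVersionUID" can be checked with `java.io.ObjectStreamClass`. A small sketch, assuming hypothetical wrapper classes `FooState` and `UnpinnedFooState` that are not part of Beam or Avro: a class that declares the constant reports exactly that value, while a class without it gets a value the JVM derives from the class structure.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class PinnedUidDemo {

    // Hypothetical hand-written wrapper used in state; the UID is pinned by us
    // and stays the same as long as we do not edit this constant.
    static final class FooState implements Serializable {
        private static final long serialVersionUID = 1L;
        final String dummy1;

        FooState(String dummy1) {
            this.dummy1 = dummy1;
        }
    }

    // Same shape without a declared UID: the JVM computes one from the class
    // structure, so it may change when members are added or removed.
    static final class UnpinnedFooState implements Serializable {
        final String dummy1;

        UnpinnedFooState(String dummy1) {
            this.dummy1 = dummy1;
        }
    }

    // Looks up the UID that Java serialization would write for this class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("pinned:   " + uidOf(FooState.class));
        System.out.println("computed: " + uidOf(UnpinnedFooState.class));
    }
}
```

Pinning the UID only removes the descriptor mismatch; whether old state still deserializes into a meaningful object after fields change is a separate question.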
> 
> Conclusion:
> I want to end by asking advice from the community. For those of you
> who use Beam with Avro records running on the Flink runner, how do you
> handle state when the Avro schema inevitably evolves?
> 
> It just seems like it's such a simple use case and such an easy
> 

Re: [DISCUSS] Avro dependency update, design doc

2022-11-18 Thread Alexey Romanenko
Since there are no objections in principle to the proposed option 2 (extract 
Avro-related code from “core” into an Avro extension, but also keep it in “core” for 
some time as a transition period), we will try to move forward and take 
this path. 

I’m pretty sure that we will face some hidden issues while working on this, so 
I’ll keep you posted =)

—
Alexey

> On 11 Nov 2022, at 18:05, Austin Bennett  wrote:
> 
> @Moritz: I *think* should be fine, and don't have anything specific to offer 
> for what might go wrong throughout the process.  :-) :shrug:
> 
> 
> 
> On Fri, Nov 11, 2022 at 2:07 AM Moritz Mack <mm...@talend.com> wrote:
>> Thanks a lot for the feedback so far! I can only second Alexey. It was 
>> painful to come to realize that the only feasible option seems to be copying 
>> a lot of code during the transition phase.
>> 
>> For that reason, it will be critical to be disciplined about the removal of 
>> the to-be deprecated code in core and, ahead of time, agree on when to 
>> remove it again. Any thought on how long the transition phase should be?
>> 
>>  
>> 
>>  I am concerned of what could go wrong for users in the 
>> in-between/transition state while more slowly transitioning avro to 
>> extension.
>> 
>>  
>> 
>> @Austin Do you have any specific concern in mind here?
>> 
>> To minimize this risk, we propose that all APIs should be kept as is to make 
>> the migration as easy as possible and kick off with the Avro version used in 
>> core. The only  thing that changes will be package names.
>> 
>>  
>> 
>> / Moritz
>> 
>>  
>> 
>> On 10.11.22, 22:46, "Kenneth Knowles" <k...@apache.org> wrote:
>> 
>>  
>> 
>> Thank you for writing this document. It really helps to understand the 
>> options. I agree that option 2 (make a new extension and deprecate from 
>> core) seems best. I think +Reuven Lax <re...@google.com> might have 
>> the most context on any technical issue we will encounter around schema 
>> codegen.
>> 
>>  
>> 
>> Kenn
>> 
>>  
>> 
>> On Thu, Nov 10, 2022 at 7:24 AM Alexey Romanenko <aromanenko@gmail.com> wrote:
>> 
>> Personally, I think that keeping two mostly identical versions of 
>> Avro-related code in two different places (“core" and "extension") is rather 
>> bad practice, especially, in case of need to fix some issues there - though, 
>> it’s a very low risk there since this code is quite mature and it’s not 
>> touched often. On the other hand, it should give time for users (several 
>> Beam releases) to update their code and use Avro from extension artifact 
>> instead of core.
>> 
>>  
>> 
>> Though, if we accept that this breaking change at compile time is allowable, 
>> then this process of transition should be much faster and can be performed 
>> within only one Beam release. Our main concern here is runtime breaking 
>> changes that we can miss but must be avoided by all means. 
>> 
>>  
>> 
>> —
>> 
>> Alexey
>> 
>> 
>> 
>> 
>> On 9 Nov 2022, at 18:47, Austin Bennett <aus...@apache.org> wrote:
>> 
>>  
>> 
>> Being tied to a specific version of a dependency, and esp. one that is 
>> not-[actually-long-term]critical, sounds like a problem.  It doesn't seem 
>> like Avro needs to be in core.  I am in favor of about any path someone 
>> wants to address towards removing that from core [ #2 in the design doc 
>> seems reasonable ].  
>> 
>>  
>> 
>> Naturally, having ways to more easily change versions [esp. to remediate 
>> CVEs, but for any specific reason ], seems very valuable.
>> 
>>  
>> 
>> It reads as a significant problem; I wouldn't take issue with a breaking [ 
>> compile time ] change, if that got things addressed and somewhat 
>> straightforwardly - I am concerned of what could go wrong for users in the 
>> in-between/transition state while more slowly transitioning avro to 
>> extension.
>> 
>>  
>> 
>> On Wed, Nov 9, 2022 at 5:43 AM Alexey Romanenko <aromanenko@gmail.com> wrote:
>> 
>> Any thought

Re: [VOTE] Release 2.43.0, release candidate #2

2022-11-15 Thread Alexey Romanenko
+1 (binding)

—
Alexey

> On 15 Nov 2022, at 14:37, Ritesh Ghorse via dev  wrote:
> 
> +1 (non-binding)
> 
> Validated Go SDK quickstart on Direct and Dataflow runner. Also validated 
> Dataframe wrapper on Portable and Dataflow runner.
> 
> 
> On Tue, Nov 15, 2022 at 5:17 AM Anand Inguva via dev wrote:
>> +1(non-binding)
>> 
>> Validated Python wordcount example on Direct and Dataflow runner. Staging of 
>> the Python dependencies works as expected now.
>> 
>> Thanks,
>> Anand
>> 
>> On Sun, Nov 13, 2022 at 9:52 AM Chamikara Jayalath via dev <dev@beam.apache.org> wrote:
>>> Hi everyone,
>>> Please review and vote on the release candidate #2 for the version 2.43.0, 
>>> as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>> 
>>> 
>>> Reviewers are encouraged to test their own use cases with the release 
>>> candidate, and vote +1 if
>>> no issues are found.
>>> 
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org [2], 
>>> which is signed with the key with fingerprint 
>>> 40C61FBE1761E5DB652A1A780CCD5EB2A718A56E [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.43.0-RC2" [5],
>>> * website pull request listing the release [6], the blog post [6], and 
>>> publishing the API reference manual [7].
>>> * Java artifacts were built with Gradle 7.5.1 and openjdk version 
>>> 1.8.0_181-google-v7.
>>> * Python artifacts are deployed along with the source release to 
>>> dist.apache.org [2] and PyPI [8].
>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>> * Validation sheet with a tab for 2.43.0 release to help with validation 
>>> [10].
>>> * Docker images published to Docker Hub [11].
>>> 
>>> The vote will be open for at least 72 hours. It is adopted by majority 
>>> approval, with at least 3 PMC affirmative votes.
>>> 
>>> For guidelines on how to try the release in your projects, check out our 
>>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>>> 
>>> Thanks,
>>> Cham
>>> 
>>> [1] https://github.com/apache/beam/milestone/5
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.43.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4] https://repository.apache.org/content/repositories/orgapachebeam-1288/
>>> [5] https://github.com/apache/beam/tree/v2.43.0-RC2
>>> [6] https://github.com/apache/beam/pull/24044
>>> [7] https://github.com/apache/beam-site/pull/636
>>> [8] https://pypi.org/project/apache-beam/2.43.0rc2/
>>> [9] 
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.43.0-RC2/go/pkg/beam
>>> [10] 
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1310009119
>>> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image



Re: bhulette stepping back (for now)

2022-11-14 Thread Alexey Romanenko
Hey Brian,

Many thanks for your contributions! Good luck with your new adventure!

—
Alexey


> On 12 Nov 2022, at 20:47, Chamikara Jayalath via dev  
> wrote:
> 
> Good luck with your next endeavor Brian! Thanks for all the contributions to 
> Beam (and hopefully more in the future when you have time :-) )
> 
> - Cham
> 
> On Fri, Nov 11, 2022 at 10:47 PM Moritz Mack wrote:
>> Also, thanks so much for all the great and through reviews! That was always 
>> much appreciated!
>> 
>> All the best, Brian
>> 
>>  
>> 
>> On 11.11.22, 23:23, "Ahmet Altay via dev" wrote:
>> 
>>  
>> 
>> Thank you for everything Brian!
>> 
>>  
>> 
>> On Fri, Nov 11, 2022 at 11:27 AM Austin Bennett wrote:
>> 
>> Thanks for everything you've done, @bhule...@apache.org!
>> 
>>  
>> 
>> On Fri, Nov 11, 2022 at 11:01 AM Pablo Estrada via dev wrote:
>> 
>> I promised I wouldn't cry so I won't. Cya!
>> 
>>  
>> 
>> On Fri, Nov 11, 2022 at 10:46 AM Robin Qiu via dev wrote:
>> 
>> Thanks for your contribution Brian! Hope you enjoy your new team!
>> 
>>  
>> 
>> Best,
>> 
>> Robin
>> 
>>  
>> 
>> On Fri, Nov 11, 2022 at 10:27 AM Kenneth Knowles wrote:
>> 
>> Your contributions have been huge. You will be missed! But have a fabulous 
>> time with BigQuery. And thank you so much for letting us know [1]
>> 
>>  
>> 
>> Kenn
>> 
>>  
>> 
>> [1] See "stepping down considerately" from 
>> https://www.apache.org/foundation/policies/conduct.html 
>> 
>>  
>> 
>> On Thu, Nov 10, 2022 at 4:00 PM Brian Hulette wrote:
>> 
>> Hi dev@beam,
>> 
>>  
>> 
>> I just wanted to let the community know that I will be stepping back from 
>> Beam development for now. I'm switching to a different team within Google 
>> next week - I will be working on BigQuery.
>> 
>>  
>> 
>> I'm removing myself from automated code review assignments [1], and won't 
>> actively monitor the beam lists anymore. That being said, I'm happy to 
>> contribute to discussions or code reviews when it would be particularly 
>> helpful, e.g. for anything relating to DataFrames/Schemas/SQL. I can always 
>> be reached at bhule...@apache.org, and 
>> @TheNeuralBit [2] on GitHub.
>> 
>>  
>> 
>> Brian
>> 
>>  
>> 
>> [1] https://github.com/apache/beam/pull/24108 
>> 
>> [2] https://github.com/TheNeuralBit 
>> 
>> 
>> 



Re: [ANNOUNCE] New committer: Yi Hu

2022-11-10 Thread Alexey Romanenko
Congratulations! Well deserved!

—
Alexey

> On 9 Nov 2022, at 21:01, Tomo Suzuki via dev  wrote:
> 
> Congratulations!
> 
> On Wed, Nov 9, 2022 at 3:00 PM John Casey via dev wrote:
>> Congrats! this is well deserved Yi
>> 
>> On Wed, Nov 9, 2022 at 2:58 PM Austin Bennett wrote:
>>> Congrats, and Thanks, Yi!  
>>> 
>>> On Wed, Nov 9, 2022 at 11:24 AM Valentyn Tymofieiev via dev <dev@beam.apache.org> wrote:
>>>> I am with the Beam PMC on this, congratulations and very well deserved, Yi!
>>>> 
>>>> On Wed, Nov 9, 2022 at 11:08 AM Byron Ellis via dev wrote:
>>>>> Congratulations!
>>>>> 
>>>>> On Wed, Nov 9, 2022 at 11:00 AM Pablo Estrada via dev <dev@beam.apache.org> wrote:
>>>>>> +1 thanks Yi : D
>>>>>> 
>>>>>> On Wed, Nov 9, 2022 at 10:47 AM Danny McCormick via dev <dev@beam.apache.org> wrote:
>>>>>>> Congrats Yi! I've really appreciated the ways you've consistently taken 
>>>>>>> responsibility for improving our team's infra and working through sharp 
>>>>>>> edges in the codebase that others have ignored. This is definitely well 
>>>>>>> deserved!
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Danny
>>>>>>> 
>>>>>>> On Wed, Nov 9, 2022 at 1:37 PM Anand Inguva via dev <dev@beam.apache.org> wrote:
>>>>>>>> Congratulations Yi!
>>>>>>>> 
>>>>>>>> On Wed, Nov 9, 2022 at 1:35 PM Ritesh Ghorse via dev <dev@beam.apache.org> wrote:
>>>>>>>>> Congratulations Yi!
>>>>>>>>> 
>>>>>>>>> On Wed, Nov 9, 2022 at 1:34 PM Ahmed Abualsaud via dev <dev@beam.apache.org> wrote:
>>>>>>>>>> Congrats Yi!
>>>>>>>>>> 
>>>>>>>>>> On Wed, Nov 9, 2022 at 1:33 PM Sachin Agarwal via dev <dev@beam.apache.org> wrote:
>>>>>>>>>>> Congratulations Yi!
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Nov 9, 2022 at 10:32 AM Kenneth Knowles wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> Please join me and the rest of the Beam PMC in welcoming a new 
>>>>>>>>>>>> committer: Yi Hu (y...@apache.org)
>>>>>>>>>>>> 
>>>>>>>>>>>> Yi started contributing to Beam in early 2022. Yi's contributions 
>>>>>>>>>>>> are very diverse! I/Os, performance tests, Jenkins, support for 
>>>>>>>>>>>> Schema logical types. Not only code but a very large amount of 
>>>>>>>>>>>> code review. Yi is also noted for picking up smaller issues that 
>>>>>>>>>>>> normally would be left on the backburner and filing issues that he 
>>>>>>>>>>>> finds rather than ignoring them.
>>>>>>>>>>>> 
>>>>>>>>>>>> Considering their contributions to the project over this 
>>>>>>>>>>>> timeframe, the Beam PMC trusts Yi with the responsibilities of a 
>>>>>>>>>>>> Beam committer. [1]
>>>>>>>>>>>> 
>>>>>>>>>>>> Thank you Yi! And we are looking to see more of your contributions!
>>>>>>>>>>>> 
>>>>>>>>>>>> Kenn, on behalf of the Apache Beam PMC
>>>>>>>>>>>> 
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
> 
> 
> -- 
> Regards,
> Tomo



Re: [DISCUSS] Avro dependency update, design doc

2022-11-10 Thread Alexey Romanenko
Personally, I think that keeping two mostly identical versions of Avro-related 
code in two different places (“core" and "extension") is rather bad practice, 
especially if some issues need to be fixed there - though the risk of that is 
very low, since this code is quite mature and is not touched often. On the 
other hand, it should give users time (several Beam releases) to update 
their code and use Avro from the extension artifact instead of core.

However, if we accept that a breaking change at compile time is allowable, 
then the transition should be much faster and can be performed 
within a single Beam release. Our main concern here is runtime breaking changes, 
which we could miss but which must be avoided by all means. 

—
Alexey

> On 9 Nov 2022, at 18:47, Austin Bennett  wrote:
> 
> Being tied to a specific version of a dependency, and esp. one that is 
> not-[actually-long-term]critical, sounds like a problem.  It doesn't seem 
> like Avro needs to be in core.  I am in favor of about any path someone wants 
> to address towards removing that from core [ #2 in the design doc seems 
> reasonable ].  
> 
> Naturally, having ways to more easily change versions [esp. to remediate 
> CVEs, but for any specific reason ], seems very valuable.
> 
> It reads as a significant problem; I wouldn't take issue with a breaking [ 
> compile time ] change, if that got things addressed and somewhat 
> straightforwardly - I am concerned of what could go wrong for users in the 
> in-between/transition state while more slowly transitioning avro to extension.
> 
> On Wed, Nov 9, 2022 at 5:43 AM Alexey Romanenko <aromanenko@gmail.com> wrote:
>> Any thoughts on this? For now, we'd need to decide which path finally to 
>> take to move forward.
>> 
>> Thanks in advance!
>> 
>> —
>> Alexey
>> 
>>> On 4 Nov 2022, at 16:44, Alexey Romanenko <aromanenko@gmail.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> Following-up an Avro dependency update discussion [1] that showed a lot of 
>>> uncertainties to move forward, Moritz and I decided to create a design 
>>> document [2] with potential options, that we believe, can be considered and 
>>> used further. Unfortunately, all solutions lead to breaking changes in some 
>>> way, though, for some of them the negative effect can be reduced by 
>>> preparing users for this in advance and make this transition smoother.
>>> 
>>> Please, take a look on this doc and leave your comments and opinions - your 
>>> feedback is very welcomed!
>>> 
>>> [1] https://lists.apache.org/thread/mz8hvz8dwhd0tzmv2lyobhlz7gtg4gq7
>>> [2] 
>>> https://docs.google.com/document/d/1tKIyTk_-HhkmVuJsxvWP5eTELESpCBe_Vmb1nJ3Ia34/edit?usp=sharing
>>> 
>>> —
>>> Alexey
>> 



Re: [DISCUSS] Avro dependency update, design doc

2022-11-09 Thread Alexey Romanenko
Any thoughts on this? We now need to decide which path to take in order to 
move forward.

Thanks in advance!

—
Alexey

> On 4 Nov 2022, at 16:44, Alexey Romanenko  wrote:
> 
> Hi all,
> 
> Following-up an Avro dependency update discussion [1] that showed a lot of 
> uncertainties to move forward, Moritz and I decided to create a design 
> document [2] with potential options, that we believe, can be considered and 
> used further. Unfortunately, all solutions lead to breaking changes in some 
> way, though, for some of them the negative effect can be reduced by preparing 
> users for this in advance and make this transition smoother.
> 
> Please, take a look on this doc and leave your comments and opinions - your 
> feedback is very welcomed!
> 
> [1] https://lists.apache.org/thread/mz8hvz8dwhd0tzmv2lyobhlz7gtg4gq7
> [2] https://docs.google.com/document/d/1tKIyTk_-HhkmVuJsxvWP5eTELESpCBe_Vmb1nJ3Ia34/edit?usp=sharing
> 
> —
> Alexey



Re: [VOTE] Release 2.43.0, release candidate #1

2022-11-09 Thread Alexey Romanenko
+1 (binding)

Tested with https://github.com/Talend/beam-samples/
(Java SDK v8 & v11, Spark 3 runner).

---
Alexey

> On 9 Nov 2022, at 01:38, Chamikara Jayalath via dev  
> wrote:
> 
> Hi everyone,
> Please review and vote on the release candidate #1 for the version 2.43.0, as 
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> 
> Reviewers are encouraged to test their own use cases with the release 
> candidate, and vote +1 if
> no issues are found.
> 
> The complete staging area is available for your review, which includes:
> * GitHub Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2], 
> which is signed with the key with fingerprint 
> 40C61FBE1761E5DB652A1A780CCD5EB2A718A56E [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.43.0-RC1" [5],
> * website pull request listing the release [6], the blog post [6], and 
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle 7.5.1 and openjdk version 
> 1.8.0_181-google-v7.
> * Python artifacts are deployed along with the source release to the 
> dist.apache.org [2] and PyPI [8].
> * Go artifacts and documentation are available at pkg.go.dev [9]
> * Validation sheet with a tab for 2.43.0 release to help with validation [10].
> * Docker images published to Docker Hub [11].
> 
> The vote will be open for at least 72 hours. It is adopted by majority 
> approval, with at least 3 PMC affirmative votes.
> 
> For guidelines on how to try the release in your projects, check out our blog 
> post at https://beam.apache.org/blog/validate-beam-release/.
> 
> Thanks,
> Cham
> 
> [1] https://github.com/apache/beam/milestone/5
> [2] https://dist.apache.org/repos/dist/dev/beam/2.43.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1287/
> [5] https://github.com/apache/beam/tree/v2.43.0-RC1
> [6] https://github.com/apache/beam/pull/24044
> [7] https://github.com/apache/beam-site/pull/635
> [8] https://pypi.org/project/apache-beam/2.43.0rc1/
> [9] https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.43.0-RC1/go/pkg/beam
> [10] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1310009119
> [11] https://hub.docker.com/search?q=apache%2Fbeam=image



[DISCUSS] Avro dependency update, design doc

2022-11-04 Thread Alexey Romanenko
Hi all,

Following up on the Avro dependency update discussion [1], which showed a lot 
of uncertainty about how to move forward, Moritz and I decided to create a 
design document [2] with potential options that, we believe, can be considered 
and used further. Unfortunately, all solutions lead to breaking changes in some 
way, though for some of them the negative effect can be reduced by preparing 
users for this in advance and making the transition smoother.

Please take a look at this doc and leave your comments and opinions - your 
feedback is very welcome!

[1] https://lists.apache.org/thread/mz8hvz8dwhd0tzmv2lyobhlz7gtg4gq7
[2] https://docs.google.com/document/d/1tKIyTk_-HhkmVuJsxvWP5eTELESpCBe_Vmb1nJ3Ia34/edit?usp=sharing

—
Alexey

Re: [PROPOSAL] Re-enable checkerframework by default

2022-10-21 Thread Alexey Romanenko
+1 to make it “on” by default and to mention that in the Contribution Guide.

I recall one PR where it took me some time to realise why it was failing on 
Jenkins but not locally, because of this difference in behaviour. 

—
Alexey

> On 20 Oct 2022, at 00:51, Kenneth Knowles  wrote:
> 
> Hi all,
> 
> Some time ago we turned off checker framework locally by default, and only 
> turn it on with `-PenableCheckerFramework` and also on Jenkins.
> 
> My opinion is that this causes more headache than it solves, by delaying 
> finding out about errors. The increased compilation time of checkerframework 
> is real. But during iteration almost every step of a compile is cached so it 
> only matters specifically for :sdks:java:core. My take is that anyone editing 
> that is probably experienced enough with Beam to know they can turn it off. 
> So I propose we turn it on by default, with the option to disable it.
> 
> Kenn



Re: [VOTE] Release 2.42.0, release candidate #2

2022-10-14 Thread Alexey Romanenko
+1 (binding)

Tested with https://github.com/Talend/beam-samples/
(Java SDK v8 & v11, Spark 3 runner).

---
Alexey

> On 14 Oct 2022, at 05:17, Ahmet Altay via dev  wrote:
> 
> +1 (binding)
> 
> Tested python quickstart examples on the direct runner. Thank you!
> 
> On Thu, Oct 13, 2022 at 5:35 PM Robert Bradshaw via dev wrote:
> +1 (binding)
> 
> Validated release artifacts and signatures. Tested a Python pipeline
> on a clean install.
> 
> On Thu, Oct 13, 2022 at 1:22 PM Ritesh Ghorse via dev wrote:
> >
> > +1 (non-binding)
> > Validated Go SDK Quickstart on Direct and Dataflow runner.
> >
> > Thanks,
> > Ritesh Ghorse
> >
> > On Thu, Oct 13, 2022 at 4:01 PM Pablo Estrada via dev wrote:
> >>
> >> +1 (binding)
> >>
> >> I've validated local/unit tests for existing dataflow templates. They look 
> >> good!
> >> Best
> >> -P.
> >>
> >> On Thu, Oct 13, 2022 at 10:41 AM Ning Kang via dev wrote:
> >>>
> >>> +1 Thank you, Robert!
> >>>
> >>> On Thu, Oct 13, 2022 at 12:47 AM Robert Burke wrote:
> 
>  Hi everyone,
>  Please review and vote on the release candidate #2 for the version 
>  2.42.0, as follows:
>  [ ] +1, Approve the release
>  [ ] -1, Do not approve the release (please provide specific comments)
> 
>  Reviewers are encouraged to test their own use cases with the release 
>  candidate, and vote +1 if no issues are found.
> 
>  The complete staging area is available for your review, which includes:
>  * GitHub Release notes [1],
>  * the official Apache source release to be deployed to dist.apache.org 
>   [2], which is signed with the key with 
>  fingerprint A52F5C83BAE26160120EC25F3D56ACFBFB2975E1 [3],
>  * all artifacts to be deployed to the Maven Central Repository [4],
>  * source code tag "v2.42.0-RC2" [5],
>  * website pull request listing the release [6], the blog post [6], and 
>  publishing the API reference manual [7].
>  * Java artifacts were built with Gradle 7.5.1 and AdoptOpen JDK 
>  1.8.0_292.
>  * Python artifacts are deployed along with the source release to the 
>  dist.apache.org  [2] and PyPI [8]
>  * Go Package information and SDK RC [9]
>  * Validation sheet with a tab for 2.42.0 release to help with validation 
>  [10].
>  * Docker images published to Docker Hub [11]. (Soon)
> 
>  The vote will be open for at least 72 hours. It is adopted by majority 
>  approval, with at least 3 PMC affirmative votes.
> 
>  Updates from RC1 include a fix to SpannerIO backlog estimation [12] and 
>  a fix to the BigQueryIO interpretation of coders on an internal flatten 
>  [13]. Otherwise, previous validation should be unaffected.
> 
>  For guidelines on how to try the release in your projects, check out our 
>  blog post at https://beam.apache.org/blog/validate-beam-release/ 
>  .
> 
>  Thanks,
>  Robert Burke
>  2.42.0 Release Manager
> 
>  [1] https://github.com/apache/beam/milestone/4 
>  
>  [2] https://dist.apache.org/repos/dist/dev/beam/2.42.0/ 
>  
>  [3] https://dist.apache.org/repos/dist/release/beam/KEYS 
>  
>  [4] 
>  https://repository.apache.org/content/repositories/orgapachebeam-1286/ 
>  
>  [5] https://github.com/apache/beam/tree/v2.42.0-RC2 
>  
>  [6] https://github.com/apache/beam/pull/23406 
>  
>  [7] https://github.com/apache/beam-site/pull/634 
>  
>  [8] https://pypi.org/project/apache-beam/2.42.0rc2/ 
>  
>  [9] 
>  https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.42.0-RC2/go/pkg/beam
>   
>  
>  [10] 
>  https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=265602293
>   
>  
>  [11] https://hub.docker.com/search?q=apache%2Fbeam=image 
>  
>  [12] https://github.com/apache/beam/issues/23494 
>  
>  [13] https://github.com/apache/beam/issues/23561 
>  

Re: Issue with website update and Jenkins

2022-10-10 Thread Alexey Romanenko
The issue is resolved.
Thank you to everybody who got this working again!

---
Alexey

> On 7 Oct 2022, at 15:26, Alexey Romanenko  wrote:
> 
> Hi everybody,
> 
> The is an issue with updating a content on Beam website. I believe it’s 
> caused by not running the Jenkins job that publishes the Beam website into 
> the asf-site branch since 30 Sep 2022 [1] with an error message “There are no 
> nodes with the label ‘git-websites’”:
> - https://ci-beam.apache.org/job/beam_PostCommit_Website_Publish/
> 
> In it’s order, iinm, it’s caused by the fact that several 
> “apache-beam-jenkins-*” nodes are offline.
> 
> Does anyone aware of this problem and work on this? To whom we need to 
> address such problems, INFRA?
> 
> How we can prevent this silent behaviour in the future? Beam website 
> publishing job (“beam_PostCommit_Website_Publish”) just got stuck and hangs 
> for a while.
> 
> —
> Alexey
> 



Issue with website update and Jenkins

2022-10-07 Thread Alexey Romanenko
Hi everybody,

There is an issue with updating content on the Beam website. I believe it’s 
caused by the Jenkins job that publishes the Beam website into the asf-site 
branch not having run since 30 Sep 2022 [1], with the error message “There are 
no nodes with the label ‘git-websites’”:
- https://ci-beam.apache.org/job/beam_PostCommit_Website_Publish/

In turn, iinm, this is caused by the fact that several 
“apache-beam-jenkins-*” nodes are offline.

Is anyone aware of this problem and working on it? To whom should we address 
such problems, INFRA?

How can we prevent this silent behaviour in the future? The Beam website 
publishing job (“beam_PostCommit_Website_Publish”) just got stuck and has been 
hanging for a while.

—
Alexey



Re: [VOTE] Release 2.42.0, release candidate #1

2022-10-06 Thread Alexey Romanenko
Is it a regression, or was it just recently discovered?

—
Alexey

> On 6 Oct 2022, at 01:13, Eike Falkenberg via dev  wrote:
> 
> Hey everyone,
> 
> We have gotten customer feedback about this issue and provided a PR that is 
> in review.
> Because of the significance of the user/customer impact, I would like to get 
> this cherry-picked into the 2.42.0 release. Is that still possible?
> 
> Thanks!
> 
> Eike
> 
> 
> 
> 
> 
> On 2022/09/30 01:12:11 Robert Burke via dev wrote:
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version 2.42.0,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> > 
> > Reviewers are encouraged to test their own use cases with the release
> > candidate, and vote +1 if no issues are found.
> > 
> > The complete staging area is available for your review, which includes:
> > * GitHub Release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org 
> >  [2],
> > which is signed with the key with fingerprint
> > A52F5C83BAE26160120EC25F3D56ACFBFB2975E1 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "v2.42.0-RC1" [5],
> > * website pull request listing the release [6], the blog post [6], and
> > publishing the API reference manual [7].
> > * Java artifacts were built with Gradle GRADLE_VERSION and OpenJDK/Oracle
> > JDK JDK_VERSION.
> > * Python artifacts are deployed along with the source release to the
> > dist.apache.org  [2] and PyPI [8]
> > * Go Package information and SDK RC  [9]
> > * Validation sheet with a tab for 2.42.0 release to help with validation
> > [10].
> > * Docker images published to Docker Hub [11].
> > 
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> > 
> > For guidelines on how to try the release in your projects, check out our
> > blog post at https://beam.apache.org/blog/validate-beam-release/ 
> > .
> > 
> > Thanks,
> > Robert Burke
> > 2.42.0 Release Manager
> > 
> > [1] https://github.com/apache/beam/milestone/4 
> > 
> > [2] https://dist.apache.org/repos/dist/dev/beam/2.42.0/ 
> > 
> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS 
> > 
> > [4] https://repository.apache.org/content/repositories/orgapachebeam-1285/ 
> > 
> > [5] https://github.com/apache/beam/tree/v2.42.0-RC1 
> > 
> > [6] https://github.com/apache/beam/pull/23406 
> > 
> > [7] https://github.com/apache/beam-site/pull/634 
> > 
> > [8] https://pypi.org/project/apache-beam/2.42.0rc1/ 
> > 
> > [9]
> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.42.0-RC1/go/pkg/beam 
> > 
> > [10]
> > https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=265602293
> >  
> > 
> > [11] https://hub.docker.com/search?q=apache%2Fbeam=image 
> > 
> >



Re: Beam Website Feedback

2022-10-04 Thread Alexey Romanenko
Thanks for your feedback. 

At the time, using Google website search was the simplest solution since, 
before that, we didn’t have a search at all. I agree that it can be frustrating 
to have ad links before the actual results (I’m not sure we can avoid them 
there), but “it is what it is”, and the correct links still appear further 
down, which is better than nothing. 

The Beam community always welcomes suggestions and, especially, contributions 
to improve the project in any possible way. I’d be happy to assist on this 
topic if someone decides to improve the Beam website search.

—
Alexey

> On 3 Oct 2022, at 23:21, Borris  wrote:
> 
> This is my experience of trying the search capability.
> 
> I know I want to read about dataframes (I was reading this 10 minutes ago but 
> browsing history didn't take me back to where I wanted)
> I search for "dataframes"
> I am presented with a whole load of pages that are elsewhere (other sites) - 
> maybe what I want is some pages below, but I stop at this point as I think 
> its a fundamental failure of what I expect from the search dialogue
> If I enter "beam.apache.org: dataframe" to the search dialogue then the 
> sensible relevant page is now visible, only 5 links down
> I know this may be a penalty of getting a "free" search service from your 
> viewpoint
> But from my viewpoint this is a failure. Your search capability fails to 
> understand that by searching for something on your site, rather than 
> generically through a search engine, I am massively predisposed to the pages 
> on your site, whereas the search results are more predisposed to offering 
> advertising opportunities.
> It is very frustrating that something as simple as, on the Beam site, going 
> to the page about Beam Dataframes takes such a level of hoop jumping
> That is my feedback offering. Thank you for taking the time to read it.
> 
> 
> 
> 
> 



Re: [VOTE] Release 2.42.0, release candidate #1

2022-10-03 Thread Alexey Romanenko
+1 (binding)

Tested with https://github.com/Talend/beam-samples/
(Java SDK v8 & v11, Spark 3 runner).

---
Alexey

> On 3 Oct 2022, at 14:32, Chamikara Jayalath via dev wrote:
> 
> +1 (binding)
> 
> Verified checksums and signatures of artifacts.
> Validated some multi-language pipelines.
> 
> Thanks,
> Cham
> 
> On Thu, Sep 29, 2022 at 6:12 PM Robert Burke via dev wrote:
> Hi everyone,
> Please review and vote on the release candidate #1 for the version 2.42.0, as 
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> Reviewers are encouraged to test their own use cases with the release 
> candidate, and vote +1 if no issues are found.
> 
> The complete staging area is available for your review, which includes:
> * GitHub Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org 
>  [2], which is signed with the key with fingerprint 
> A52F5C83BAE26160120EC25F3D56ACFBFB2975E1 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.42.0-RC1" [5],
> * website pull request listing the release [6], the blog post [6], and 
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle GRADLE_VERSION and OpenJDK/Oracle JDK 
> JDK_VERSION.
> * Python artifacts are deployed along with the source release to the 
> dist.apache.org  [2] and PyPI [8] 
> * Go Package information and SDK RC  [9]
> * Validation sheet with a tab for 2.42.0 release to help with validation [10].
> * Docker images published to Docker Hub [11].
> 
> The vote will be open for at least 72 hours. It is adopted by majority 
> approval, with at least 3 PMC affirmative votes.
> 
> For guidelines on how to try the release in your projects, check out our blog 
> post at https://beam.apache.org/blog/validate-beam-release/ 
> .
> 
> Thanks,
> Robert Burke
> 2.42.0 Release Manager
> 
> [1] https://github.com/apache/beam/milestone/4 
> 
> [2] https://dist.apache.org/repos/dist/dev/beam/2.42.0/ 
> 
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS 
> 
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1285/ 
> 
> [5] https://github.com/apache/beam/tree/v2.42.0-RC1 
> 
> [6] https://github.com/apache/beam/pull/23406 
> 
> [7] https://github.com/apache/beam-site/pull/634 
> 
> [8] https://pypi.org/project/apache-beam/2.42.0rc1/ 
> 
> [9] https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.42.0-RC1/go/pkg/beam 
>  
> [10] 
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=265602293
>  
> 
> [11] https://hub.docker.com/search?q=apache%2Fbeam=image 
> 
> 



Re: Performance and Cost benchmarking

2022-09-27 Thread Alexey Romanenko
Thanks for raising this topic.

> On 26 Sep 2022, at 23:32, Andrew Pilloud via dev  wrote:
> 
> I left some comments on your design. Your doc discusses a bunch of
> details about infrastructure such as testing frameworks, automation,
> and performance databases, but doesn't describe how it will fit in
> with our existing infrastructure (Load Tests, Nexmark, Jenkins,
> InfluxDB, Grafana). I would suspect we actually have most of the
> infrastructure already built?

Right, I second this question. We already have infrastructure ready to run a 
bunch of different benchmarks/load tests and to collect/present/analyse the 
results. Of course, there is room for improvement, but it would be great to 
take this into account and add details on how this benchmark can be integrated 
into it (to avoid duplicated work for further support).


—
Alexey

> On Mon, Sep 26, 2022 at 9:07 AM Pranav Bhandari
>  wrote:
>> 
>> Hello,
>> 
>> Hope this email finds you well. I have attached a link to a doc which 
>> discusses the design for a performance and cost benchmarking framework to be 
>> used by Beam IOs and Google-provided dataflow templates.
>> 
>> Please feel free to comment on the doc with any questions, concerns or ideas 
>> you might have.
>> 
>> Thank you,
>> Pranav Bhandari
>> 
>> 
>> https://docs.google.com/document/d/14GatBilwuR4jJGb-ZNpYeuB-KkVmDvEm/edit?usp=sharing=102139643796739130048=true=true



[ANNOUNCE][Testing] TPC-DS benchmark suite in Beam

2022-09-16 Thread Alexey Romanenko
Hi everybody,

As some of you may know, at Talend we’ve been working for a while on adding 
the TPC-DS benchmark suite to Beam. We believe that having TPC-DS as part of 
the Beam testing workflow and release routine will help the community quickly 
detect performance regressions or improvements, identify missing or 
incorrect Beam SQL features, and execute Beam SQL on different runtime 
environments with different runners. 

What is TPC-DS? From TPC-DS specification document [1]:

“TPC-DS is a decision support benchmark that models several generally 
applicable aspects of a decision support system, including queries and data 
maintenance. The benchmark provides a representative evaluation of performance 
as a general purpose decision support system.” 

The TPC-DS benchmark suite for Beam is implemented as a separate testing tool 
for the Java SDK (like the well-known Nexmark benchmark suite) [2]. For now it 
supports a limited number of TPC-DS SQL queries (mostly because of limited SQL 
syntax support in Beam) and CSV and Parquet as input data formats, and it runs 
on Jenkins with the three most popular Beam runners (Spark [3], Flink [4], 
Dataflow [5]). The job metrics are stored in InfluxDB and can be accessed 
through Grafana dashboards [6][7][8]. 
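
For readers unfamiliar with how such metrics can reach InfluxDB, here is a 
minimal Python sketch that formats one query runtime as an InfluxDB 
line-protocol point. It is not the code Beam uses: the measurement name 
`tpcds_sketch`, the tag names, and the `runtime_sec` field are all 
hypothetical and chosen only for illustration.

```python
def escape_tag(value: str) -> str:
    """Escape the characters that are special in line-protocol tags."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(measurement: str, tags: dict, runtime_sec: float,
                     timestamp_ns: int) -> str:
    """Build one point: measurement,tag=value field=value timestamp.

    Assumes the measurement name contains no commas or spaces. The field
    name `runtime_sec` is a hypothetical choice for this sketch.
    """
    tag_part = ",".join(
        f"{escape_tag(key)}={escape_tag(val)}"
        for key, val in sorted(tags.items())
    )
    return f"{measurement},{tag_part} runtime_sec={runtime_sec} {timestamp_ns}"


# Hypothetical measurement and tag names, for illustration only.
point = to_line_protocol(
    "tpcds_sketch",
    {"runner": "spark", "query": "q3"},
    12.5,
    1663286400000000000,
)
print(point)
```

Sorting the tags by key keeps the output deterministic, which makes points 
easier to compare across runs.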

More details can be found in Beam documentation [9].

For sure, there are still plenty of things to do, like adding new runners and 
support for other SDKs, data formats, etc., so your contributions are very 
welcome in any form. Still, we already have a first working and automated 
version that can be used by the community. 

Also, I’d like to thank everybody who worked on this improvement!

—
Alexey


[1] https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
[2] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
[3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Spark/
[4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Flink/
[5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Dataflow/
[6] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
[7] http://metrics.beam.apache.org/d/8INnSY9Mv/tpc-ds-flink-sql?orgId=1
[8] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
[9] https://beam.apache.org/documentation/sdks/java/testing/tpcds/








Re: Beam Dependency Check Report (2022-09-15)

2022-09-16 Thread Alexey Romanenko
Is it a bug that this email is empty? 

> On 15 Sep 2022, at 19:40, Apache Jenkins Server  
> wrote:
> 



Re: [Infrastructure] Periodically run Java microbenchmarks on Jenkins

2022-09-14 Thread Alexey Romanenko
Ahh, great! I didn’t know that the 'beam-perf' label is used for that. 
Thanks!

> On 14 Sep 2022, at 17:47, Andrew Pilloud  wrote:
> 
> We do have a dedicated machine for benchmarks. This is a single
> machine limited to running one test at a time. Set the
> jenkinsExecutorLabel for the job to 'beam-perf' to use it. For
> example:
> https://github.com/apache/beam/blob/66bbee84ed477d86008905646e68b100591b6f78/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Direct.groovy#L36
> 
> Andrew
> 
> On Wed, Sep 14, 2022 at 8:28 AM Alexey Romanenko
>  wrote:
>> 
>> I think it depends on the goal why to run that benchmarks. In ideal case, we 
>> need to run them on the same dedicated machine(s) and with the same 
>> configuration all the time but I’m not sure that it can be achieved in 
>> current infrastructure reality.
>> 
>> On the other hand, IIRC, the initial goal of benchmarks, like Nexmark, was 
>> to detect fast any major regressions, especially between releases, that are 
>> not so sensitive to ideal conditions. And here we a field for improvements.
>> 
>> —
>> Alexey
>> 
>> On 13 Sep 2022, at 22:57, Kenneth Knowles  wrote:
>> 
>> Good idea. I'm curious about our current benchmarks. Some of them run on 
>> clusters, but I think some of them are running locally and just being noisy. 
>> Perhaps this could improve that. (or if they are running on local 
>> Spark/Flink then maybe the results are not really meaningful anyhow)
>> 
>> On Tue, Sep 13, 2022 at 2:54 AM Moritz Mack  wrote:
>>> 
>>> Hi team,
>>> 
>>> 
>>> 
>>> I’m looking for some help to setup infrastructure to periodically run Java 
>>> microbenchmarks (JMH).
>>> 
>>> Results of these runs will be added to our community metrics (InfluxDB) to 
>>> help us track performance, see [1].
>>> 
>>> 
>>> 
>>> To prevent noisy runs this would require a dedicated Jenkins machine that 
>>> runs at most one job (benchmark) at a time. Benchmark runs take quite some 
>>> time, but on the other hand they don’t have to run very frequently (once a 
>>> week should be fine initially).
>>> 
>>> 
>>> 
>>> Thanks so much,
>>> 
>>> Moritz
>>> 
>>> 
>>> 
>>> [1] https://github.com/apache/beam/pull/23041
>>> 
>>> As a recipient of an email from Talend, your contact personal data will be 
>>> on our systems. Please see our privacy notice.
>>> 
>>> 
>>> 
>> 



Re: [Infrastructure] Periodically run Java microbenchmarks on Jenkins

2022-09-14 Thread Alexey Romanenko
I think it depends on the goal of running these benchmarks. Ideally, we would 
run them on the same dedicated machine(s) and with the same configuration 
every time, but I’m not sure that can be achieved with the current 
infrastructure. 

On the other hand, IIRC, the initial goal of benchmarks like Nexmark was to 
quickly detect any major regressions, especially between releases, and that is 
not so sensitive to ideal conditions. And here we have room for improvement.
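
As a rough illustration of this kind of coarse check, here is a minimal Python 
sketch that flags a run only when the median runtime grows by more than a 
generous tolerance. The function name, the 25% default, and the sample numbers 
are assumptions for illustration, not something Beam's tooling defines.

```python
from statistics import median


def is_major_regression(baseline_runs, candidate_runs, tolerance=0.25):
    """Return True if the candidate median runtime exceeds the baseline
    median by more than `tolerance` (0.25 means 25% slower).

    Medians are used so that a single noisy run on shared infrastructure
    does not trigger an alert. Both lists must be non-empty.
    """
    baseline = median(baseline_runs)
    candidate = median(candidate_runs)
    return candidate > baseline * (1.0 + tolerance)


# Hypothetical runtimes in seconds for two releases.
previous_release = [10.0, 11.0, 10.5]
noisy_but_fine = [10.8, 11.2, 10.9]
clearly_slower = [15.0, 16.0, 14.0]

print(is_major_regression(previous_release, noisy_but_fine))
print(is_major_regression(previous_release, clearly_slower))
```

A wide tolerance trades sensitivity for robustness: small slowdowns go 
unnoticed, but results from non-dedicated machines remain usable.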

—
Alexey

> On 13 Sep 2022, at 22:57, Kenneth Knowles  wrote:
> 
> Good idea. I'm curious about our current benchmarks. Some of them run on 
> clusters, but I think some of them are running locally and just being noisy. 
> Perhaps this could improve that. (or if they are running on local Spark/Flink 
> then maybe the results are not really meaningful anyhow)
> 
> On Tue, Sep 13, 2022 at 2:54 AM Moritz Mack wrote:
> Hi team,
> 
>  
> 
> I’m looking for some help to setup infrastructure to periodically run Java 
> microbenchmarks (JMH).
> 
> Results of these runs will be added to our community metrics (InfluxDB) to 
> help us track performance, see [1]. 
> 
>  
> 
> To prevent noisy runs this would require a dedicated Jenkins machine that 
> runs at most one job (benchmark) at a time. Benchmark runs take quite some 
> time, but on the other hand they don’t have to run very frequently (once a 
> week should be fine initially).
> 
>  
> 
> Thanks so much,
> 
> Moritz
> 
>  
> 
> [1] https://github.com/apache/beam/pull/23041 
> 
> As a recipient of an email from Talend, your contact personal data will be on 
> our systems. Please see our privacy notice. 
> 



Re: [VOTE] Release 2.41.0, release candidate #2

2022-08-19 Thread Alexey Romanenko
+1 (binding)

I tested it with https://github.com/Talend/beam-samples/ - no byte_buddy issue 
for now.
(Java SDK v8 & v11, Spark 3 runner).

---
Alexey

> On 19 Aug 2022, at 10:40, Jan Lukavský  wrote:
> 
> +1 (non-binding)
> 
> Validated Java SDK PIpelines on Flink Runner.
> 
>  Jan
> 
> On 8/18/22 22:31, Kiley Sok via dev wrote:
>> Hi everyone,
>> Please review and vote on the release candidate #1 for the version 2.41.0, 
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>> 
>> 
>> Reviewers are encouraged to test their own use cases with the release 
>> candidate, and vote +1 if no issues are found.
>> 
>> The complete staging area is available for your review, which includes:
>> * GitHub Release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org 
>>  [2], which is signed with the key with fingerprint 
>> 4D5731CC0AA38097D091EB091E7B28884452AE5D [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.41.0-RC2" [5],
>> * website pull request listing the release [6], the blog post [6], and 
>> publishing the API reference manual [7].
>> * Java artifacts were built with Gradle 7.4 and OpenJDK/Oracle JDK 1.8.0_312.
>> * Python artifacts are deployed along with the source release to the 
>> dist.apache.org  [2] and PyPI[8].
>> * Validation sheet with a tab for 2.41.0 release to help with validation [9].
>> * Docker images published to Docker Hub [10].
>> 
>> The vote will be open for at least 72 hours. It is adopted by majority 
>> approval, with at least 3 PMC affirmative votes.
>> 
>> For guidelines on how to try the release in your projects, check out our 
>> blog post at https://beam.apache.org/blog/validate-beam-release/ 
>> .
>> 
>> Thanks,
>> Release Manager
>> 
>> [1] https://github.com/apache/beam/milestone/3 
>> 
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.41.0/ 
>> 
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS 
>> 
>> [4] https://repository.apache.org/content/repositories/orgapachebeam-1283/ 
>> 
>> [5] https://github.com/apache/beam/tree/v2.41.0-RC2 
>> 
>> [6] https://github.com/apache/beam/pull/22706 
>> 
>> [7] https://github.com/apache/beam-site/pull/633 
>> 
>> [8] https://pypi.org/project/apache-beam/2.41.0rc2/ 
>> 
>> [9] 
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=331459080
>>  
>> 
>> [10] https://hub.docker.com/search?q=apache%2Fbeam=image 
>> 



Re: [VOTE] Release 2.41.0, release candidate #1

2022-08-18 Thread Alexey Romanenko
Just for reference: it looks like it happens only with classes in the default 
package [1].

[1] https://github.com/raphw/byte-buddy/issues/1301#issuecomment-1218475514


> On 18 Aug 2022, at 12:40, Alexey Romanenko  wrote:
> 
> 
>> On 17 Aug 2022, at 22:52, Kenneth Knowles wrote:
>> 
>> Seems like there has been a lot of progress on 
>> https://github.com/raphw/byte-buddy/issues/1301. Since it has been 
>> identified, I think we can be pretty confident that the version downgrade is 
>> the necessary part. So we can revert the PR for the release, then on main 
>> branch we can proceed with unvendoring but keeping the same version.
> 
> +1 for this.
> 
>> BTW just noting for the thread that I also checked mvn dependency:tree of 
>> Talend/beam-samples and confirmed that only Beam depends on bytebuddy so it 
>> is not a dependency conflict.
> 
> Sorry, I forgot to mention that I'd checked this before I posted that issue 
> on dev@ (to make sure that it depends only on one version of byte_buddy).
> Thanks for highlighting this.
> 
>> I also googled the error message and found 
>> https://stackoverflow.com/questions/51650074/apache-beam-invisible-parameter-type-exception
>> and https://issues.apache.org/jira/browse/BEAM-5061 where using JDK 10 
>> instead of 8 causes a similar symptom but it does not seem related. It was 
>> never directly addressed.
> 
> To me, it doesn’t look closely related, since it concerns a pretty old 
> version of byte_buddy, and in “beam-samples” we test the build with OpenJDK 
> v8, v11, and even v17, and it fails in the same way for all of them.
> 
> —
> Alexey
> 
>> 
>> Kenn
>> 
>> On Wed, Aug 17, 2022 at 10:36 AM Kiley Sok > <mailto:kiley...@google.com>> wrote:
>> PR to revert the change for the release 
>> https://github.com/apache/beam/pull/22759 
>> <https://github.com/apache/beam/pull/22759>
>> 
>> I'll rebuild a new RC once the tests pass and the PR is merged.
>> 
>> On Wed, Aug 17, 2022 at 8:16 AM Liam Miller-Cushon > <mailto:cus...@google.com>> wrote:
>> On Tue, Aug 16, 2022 at 2:42 PM Kiley Sok > <mailto:kiley...@google.com>> wrote:
>> Liam, are we okay to roll back this change for this release?
>> 
>> No concerns from me with rolling back to unblock the release.
>> 
>> It looks like this is a change between bytebuddy 1.12.3 and 1.12.4. I filed 
>> https://github.com/raphw/byte-buddy/issues/1301 
>> <https://github.com/raphw/byte-buddy/issues/1301> to get help understanding 
>> what changed, it sounds like the change might be WAI but there's a suggested 
>> fix. I will prepare a PR for that as a follow-up.
> 



Re: [VOTE] Release 2.41.0, release candidate #1

2022-08-18 Thread Alexey Romanenko

> On 17 Aug 2022, at 22:52, Kenneth Knowles  wrote:
> 
> Seems like there has been a lot of progress on 
> https://github.com/raphw/byte-buddy/issues/1301. Since it has been 
> identified, I think we can be pretty confident that the version downgrade is 
> the necessary part. So we can revert the PR for the release, then on main 
> branch we can proceed with unvendoring but keeping the same version.

+1 for this.

> BTW just noting for the thread that I also checked mvn dependency:tree of 
> Talend/beam-samples and confirmed that only Beam depends on bytebuddy so it 
> is not a dependency conflict.

Sorry, I forgot to mention that I'd checked this before I posted that issue on 
dev@ (to make sure that it depends only on one version of byte_buddy).
Thanks for highlighting this.

> I also googled the error message and found 
> https://stackoverflow.com/questions/51650074/apache-beam-invisible-parameter-type-exception
> and https://issues.apache.org/jira/browse/BEAM-5061 where using JDK 10 instead 
> of 8 causes a similar symptom but it does not seem related. It was never 
> directly addressed.

To me, it doesn’t look closely related, since it concerns a pretty old version 
of byte_buddy, and in “beam-samples” we test the build with OpenJDK v8, v11, 
and even v17, and it fails in the same way for all of them.

—
Alexey

> 
> Kenn
> 
> On Wed, Aug 17, 2022 at 10:36 AM Kiley Sok  > wrote:
> PR to revert the change for the release 
> https://github.com/apache/beam/pull/22759 
> 
> 
> I'll rebuild a new RC once the tests pass and the PR is merged.
> 
> On Wed, Aug 17, 2022 at 8:16 AM Liam Miller-Cushon  > wrote:
> On Tue, Aug 16, 2022 at 2:42 PM Kiley Sok  > wrote:
> Liam, are we okay to roll back this change for this release?
> 
> No concerns from me with rolling back to unblock the release.
> 
> It looks like this is a change between bytebuddy 1.12.3 and 1.12.4. I filed 
> https://github.com/raphw/byte-buddy/issues/1301 to get help understanding 
> what changed, it sounds like the change might be WAI but there's a suggested 
> fix. I will prepare a PR for that as a follow-up.



Re: [VOTE] Release 2.41.0, release candidate #1

2022-08-17 Thread Alexey Romanenko


> On 16 Aug 2022, at 23:23, Kenneth Knowles  wrote:
> 
> And as a follow up we should make sure there is some test that would exercise 
> this, since that PR was green and was a while ago too and our postcommits did 
> not catch it either.

I’m concerned that it was not caught by the Beam tests - it has to fail with 
literally every Java Beam pipeline that creates at least one new DoFn 
instance. However, it fails only when we compile and run a pipeline outside of 
the Beam codebase.

Does it mean that the Beam tests, for some reason, still use the old vendored 
byte-buddy version?

—
Alexey




> Kenn
> 
> On Tue, Aug 16, 2022 at 12:50 PM Kiley Sok via dev  <mailto:dev@beam.apache.org>> wrote:
> cc: @Liam Miller-Cushon <mailto:cus...@google.com>, who worked on the 
> bytebuddy update. 
> 
> Liam, do you have any context on this error?
> 
> On Tue, Aug 16, 2022 at 10:11 AM Alexey Romanenko  <mailto:aromanenko@gmail.com>> wrote:
> I tested with "beam-samples" [1] and found that a rather simple test pipeline 
> fails [2] with this runtime error:
> 
> Error: Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.776 s <<< 
> FAILURE! - in SerializationTest
> Error: SerializationTest.nonSerilizableTest  Time elapsed: 2.708 s  <<< ERROR!
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalStateException: Invisible parameter type of 
> SerializationTest$1 arg0 for public 
> SerializationTest$1$DoFnInvoker(SerializationTest$1)
> Caused by: java.lang.IllegalStateException: Invisible parameter type of 
> SerializationTest$1 arg0 for public 
> SerializationTest$1$DoFnInvoker(SerializationTest$1)
> 
> 
> It seems to be caused by the “bytebuddy” dependency update [3] from version 
> 1.11.0 to 1.12.9; it was also vendored before (not sure if that’s related).
> 
> Downgrading the “bytebuddy” version to 1.11.0 fixes the error. 
> 
> I’ve not yet dug into the cause of this problem, but maybe someone knows 
> some details?
> 
> [1] https://github.com/Talend/beam-samples/ 
> <https://github.com/Talend/beam-samples/>
> [2] 
> https://github.com/Talend/beam-samples/runs/7856722514?check_suite_focus=true 
> <https://github.com/Talend/beam-samples/runs/7856722514?check_suite_focus=true>
>  
> [3] https://github.com/apache/beam/pull/17317 
> <https://github.com/apache/beam/pull/17317>
> 
> —
> Alexey
> 
>> On 16 Aug 2022, at 17:54, Ritesh Ghorse via dev > <mailto:dev@beam.apache.org>> wrote:
>> 
>> +1 (non-binding), Validated Go SDK Quickstart on Direct and Dataflow runner
>> 
>> 
>> On Tue, Aug 16, 2022 at 4:26 AM Jan Lukavský > <mailto:je...@seznam.cz>> wrote:
>> +1 (non-binding)
>> 
>> Validated Java SDK with classical Flink Runner.
>> 
>> On 8/15/22 23:06, Chamikara Jayalath via dev wrote:
>>> +1 as well
>>> (I believe Kiley is addressing the container tags issue)
>>> 
>>> Thanks,
>>> Cham
>>> 
>>> On Mon, Aug 15, 2022 at 1:00 PM Robert Bradshaw >> <mailto:rober...@google.com>> wrote:
>>> +1 (binding).
>>> 
>>> I verified the release artifacts and signatures, and ran a couple of
>>> simple Python pipelines.
>>> 
>>> On Mon, Aug 15, 2022 at 12:40 PM Chamikara Jayalath via dev
>>> mailto:dev@beam.apache.org>> wrote:
>>> >
>>> >
>>> >
>>> > On Mon, Aug 15, 2022 at 11:37 AM Kiley Sok >> > <mailto:kiley...@google.com>> wrote:
>>> >>
>>> >> Thanks everyone!
>>> >>
>>> >> @Chamikara Jayalath The Spark issue is running successfully for me, 
>>> >> could you try it again? I'll look into the container tags.
>>> >
>>> >
>>> > Thanks. Regarding the Spark issue, it could just be my setup then. Feel 
>>> > free to close the Github issue.
>>> >
>>> > - Cham
>>> >
>>> >>
>>> >>
>>> >> On Mon, Aug 15, 2022 at 11:04 AM Pablo Estrada >> >> <mailto:pabl...@google.com>> wrote:
>>> >>>
>>> >>> +1 - I validated tests/build with existing Dataflow Templates
>>> >>> Best
>>> >>> -P.
>>> >>>
>>> >>> On Fri, Aug 12, 2022 at 9:20 PM Ahmet Altay via dev 
>>> >>> mailto:dev@beam.apache.org>> wrote:
>>> >>>>
>

Re: Forward StackOverflow questions with the apache-beam tag to a new mailing list

2022-08-17 Thread Alexey Romanenko
Good point about unanswered SO questions. +1 that we need to improve the 
situation there.

Yes, we could try to stream them to a new dedicated list, but it would require 
people here to subscribe to, and regularly check, one more list, which perhaps 
wouldn’t be very efficient either. 

I believe that a digest of the N latest unanswered questions sent to dev@ (or 
to user@? or both?) every 3-4 days would be a better option.
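
As an illustration of the digest idea, here is a minimal Python sketch that 
picks the N latest unanswered questions and formats them as a plain-text 
message. The field names (title, link, answer_count, created) and the function 
names are assumptions made for this example; they are not taken from any 
existing Beam tooling, and fetching the questions and sending the mail are 
left out.

```python
def latest_unanswered(questions, n):
    """Return the n most recently created questions that have no answers yet."""
    unanswered = [q for q in questions if q["answer_count"] == 0]
    unanswered.sort(key=lambda q: q["created"], reverse=True)
    return unanswered[:n]


def format_digest(questions):
    """Render the selected questions as a plain-text digest body."""
    lines = [f"{len(questions)} unanswered apache-beam question(s):"]
    for q in questions:
        lines.append(f"- {q['title']} <{q['link']}>")
    return "\n".join(lines)
```

Capping the digest at N entries keeps each mail short enough to act on, which 
is the main difference from forwarding every notification.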

—
Alexey

> On 17 Aug 2022, at 04:42, Chamikara Jayalath via dev  
> wrote:
> 
> Hi folks,
> 
> It seems like many of the questions posted to StackOverflow with the 
> apache-beam tag [1] go unanswered or take longer than they should to receive 
> an acceptable answer.
> 
> What do you all think about creating a new mailing list, 
> stackoverf...@beam.apache.org  
> (assuming Apache Infra is OK with this and it's technically feasible to do 
> so), where notifications regarding such questions will be forwarded to ?
> 
> This should allow folks who are interested in answering related questions to 
> get notified early. Hopefully getting more eyeballs on these questions will 
> increase the response rate (and the quality of the answers).
> 
> Another option might be to post such notifications (or aggregations) to dev@ 
> but this might unnecessarily spam all members of the dev list.
> 
> Thanks,
> Cham
> 
> [1] https://stackoverflow.com/questions/tagged/apache-beam 
> 


Re: [VOTE] Release 2.41.0, release candidate #1

2022-08-16 Thread Alexey Romanenko
I tested with "beam-samples" [1] and found that a rather simple test pipeline 
fails [2] with this runtime error:

Error: Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.776 s <<< 
FAILURE! - in SerializationTest
Error: SerializationTest.nonSerilizableTest  Time elapsed: 2.708 s  <<< ERROR!
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
 java.lang.IllegalStateException: Invisible parameter type of 
SerializationTest$1 arg0 for public 
SerializationTest$1$DoFnInvoker(SerializationTest$1)
Caused by: java.lang.IllegalStateException: Invisible parameter type of 
SerializationTest$1 arg0 for public 
SerializationTest$1$DoFnInvoker(SerializationTest$1)


It seems to be caused by the “bytebuddy” dependency update [3] from version 
1.11.0 to 1.12.9; it was also vendored before (not sure if that’s related).

Downgrading the “bytebuddy” version to 1.11.0 fixes the error. 

I’ve not yet dug into the cause of this problem, but maybe someone knows 
some details?

[1] https://github.com/Talend/beam-samples/ 

[2] 
https://github.com/Talend/beam-samples/runs/7856722514?check_suite_focus=true 
[3] https://github.com/apache/beam/pull/17317 


—
Alexey

> On 16 Aug 2022, at 17:54, Ritesh Ghorse via dev  wrote:
> 
> +1 (non-binding), Validated Go SDK Quickstart on Direct and Dataflow runner
> 
> 
> On Tue, Aug 16, 2022 at 4:26 AM Jan Lukavský  > wrote:
> +1 (non-binding)
> 
> Validated Java SDK with classical Flink Runner.
> 
> On 8/15/22 23:06, Chamikara Jayalath via dev wrote:
>> +1 as well
>> (I believe Kiley is addressing the container tags issue)
>> 
>> Thanks,
>> Cham
>> 
>> On Mon, Aug 15, 2022 at 1:00 PM Robert Bradshaw > > wrote:
>> +1 (binding).
>> 
>> I verified the release artifacts and signatures, and ran a couple of
>> simple Python pipelines.
>> 
>> On Mon, Aug 15, 2022 at 12:40 PM Chamikara Jayalath via dev
>> mailto:dev@beam.apache.org>> wrote:
>> >
>> >
>> >
>> > On Mon, Aug 15, 2022 at 11:37 AM Kiley Sok > > > wrote:
>> >>
>> >> Thanks everyone!
>> >>
>> >> @Chamikara Jayalath The Spark issue is running successfully for me, could 
>> >> you try it again? I'll look into the container tags.
>> >
>> >
>> > Thanks. Regarding the Spark issue, it could just be my setup then. Feel 
>> > free to close the Github issue.
>> >
>> > - Cham
>> >
>> >>
>> >>
>> >> On Mon, Aug 15, 2022 at 11:04 AM Pablo Estrada > >> > wrote:
>> >>>
>> >>> +1 - I validated tests/build with existing Dataflow Templates
>> >>> Best
>> >>> -P.
>> >>>
>> >>> On Fri, Aug 12, 2022 at 9:20 PM Ahmet Altay via dev > >>> > wrote:
>> 
>>  +1 - I validated python quickstarts on direct runner.
>> 
>>  Thank you Kiley!
>> 
>> 
>> 
>>  On Thu, Aug 11, 2022 at 9:56 PM Kiley Sok via dev >  > wrote:
>> >
>> > Hi everyone,
>> > Please review and vote on the release candidate #1 for the version 
>> > 2.41.0, as follows:
>> > [ ] +1, Approve the release
>> > [ ] -1, Do not approve the release (please provide specific comments)
>> >
>> >
>> > Reviewers are encouraged to test their own use cases with the release 
>> > candidate, and vote +1 if no issues are found.
>> >
>> > The complete staging area is available for your review, which includes:
>> > * GitHub Release notes [1],
>> > * the official Apache source release to be deployed to dist.apache.org 
>> >  [2], which is signed with the key with 
>> > fingerprint 4D5731CC0AA38097D091EB091E7B28884452AE5D [3],
>> > * all artifacts to be deployed to the Maven Central Repository [4],
>> > * source code tag "v2.41.0-RC1" [5],
>> > * website pull request listing the release [6], the blog post [6], and 
>> > publishing the API reference manual [7].
>> > * Java artifacts were built with Gradle 7.4 and OpenJDK/Oracle JDK 
>> > 1.8.0_232.
>> > * Python artifacts are deployed along with the source release to the 
>> > dist.apache.org  [2] and PyPI[8].
>> > * Validation sheet with a tab for 2.41.0 release to help with 
>> > validation [9].
>> > * Docker images published to Docker Hub [10].
>> >
>> > The vote will be open for at least 72 hours. It is adopted by majority 
>> > approval, with at least 3 PMC affirmative votes.
>> >
>> > For guidelines on how to try the release in your projects, check out 
>> > our blog post at https://beam.apache.org/blog/validate-beam-release/ 
>> > .
>> >
>> > Thanks,
>> > Release Manager
>> >
>> > [1] https://github.com/apache/beam/milestone/3 
>> > 

Re: Runner benchmarks in portable mode

2022-07-29 Thread Alexey Romanenko
AFAIK, in Beam, we don’t have any “official” benchmarks or load tests that 
run with any of the portable runners. It would be a great improvement to have 
them.

—
Alexey

> On 28 Jul 2022, at 19:30, Bharath Kumara Subramanian 
>  wrote:
> 
> Hi,
> 
> We are currently working on making beam portable mode mainstream in addition 
> to supporting classic mode for Samza runner.
> 
> I was looking at OSS benchmarks on how other runners performed in portable 
> mode in comparison with the classic mode. However, all I found was 
> performance numbers and metrics for various classic runners.
> 
> Checking in to see if anyone in the community has benchmarked portable mode 
> numbers for their runners. 
> 
> Additionally, I found vanilla metrics around gRPC performance, although I am 
> looking for pointers to get granular insights on E2E pipeline latency, e.g., 
> the time spent on the network across stages vs. the serialization cost for 
> gRPC vs. the actual time spent executing the ParDo, and so on.
> 
> 
> Thanks,
> Bharath
> 



Re: [PROPOSAL] Stop Spark2 support in Spark Runner

2022-06-30 Thread Alexey Romanenko


> On 30 Jun 2022, at 07:53, Moritz Mack  wrote:
> 
> Thanks so much Luke!
> Just to confirm, given beam-runners-spark has a much longer history, it looks 
> like the numbers are swapped.

Yes, I guess it’s a typo there. Spark3 runner artifacts have been distributed 
starting from Beam 2.29.0.
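
To put the swap into numbers, here is a small Python sketch that exchanges 
the two labels and computes each artifact's share of last month's downloads. 
The totals 472 and 5224 are the "All" rows from the quoted stats; the 
function and variable names are made up for this illustration, and the result 
is only as reliable as the assumption that the two labels were exchanged.

```python
# Monthly download totals exactly as labelled in the quoted stats.
listed = {"beam-runners-spark": 472, "beam-runners-spark-3": 5224}


def swap_labels(totals):
    """Exchange the two labels, assuming they were attached to the wrong lists."""
    (name_a, count_a), (name_b, count_b) = totals.items()
    return {name_a: count_b, name_b: count_a}


def shares(totals):
    """Return each artifact's share of all downloads in percent, to one decimal."""
    total = sum(totals.values())
    return {name: round(100 * count / total, 1) for name, count in totals.items()}


corrected = swap_labels(listed)
print(shares(corrected))
# prints {'beam-runners-spark': 91.7, 'beam-runners-spark-3': 8.3}
```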

> Looks like this – unfortunately - sheds a different light on this thread… 
> thoughts?

As I said privately, I believe that the deprecation of the Spark2 runner 
should finally accelerate the process of migration to Spark3. Officially, we 
still keep it for the next 3 Beam releases, but we can extend this period if 
needed.

—
Alexey

>  
> On 29.06.22, 17:21, "Luke Cwik" mailto:lc...@google.com>> 
> wrote:
>  
> Here are the download stats for the last month listed by version and count.
>  
> beam-runners-spark
> 2.29.0 16
> 2.30.0 22
> 2.31.0 22
> 2.32.0 16
> 2.33.0 18
> 2.34.0 17
> 2.35.0 47
> 2.36.0 160
> 2.37.0 17
> 2.38.0 114
> 2.39.0 23
> All 472
>  
> beam-runners-spark-3
> 0.1.0-incubating 24
> 0.2.0-incubating 41
> 0.3.0-incubating 44
> 0.4.0 36
> 0.5.0 26
> 0.6.0 48
> 2.0.0 400
> 2.1.0 30
> 2.2.0 34
> 2.3.0 25
> 2.4.0 32
> 2.5.0 77
> 2.6.0 23
> 2.7.0 25
> 2.8.0 27
> 2.9.0 26
> 2.10.0 27
> 2.11.0 21
> 2.12.0 321
> 2.13.0 162
> 2.14.0 21
> 2.15.0 32
> 2.16.0 35
> 2.17.0 18
> 2.18.0 67
> 2.19.0 29
> 2.20.0 22
> 2.21.0 23
> 2.22.0 47
> 2.23.0 19
> 2.24.0 63
> 2.25.0 17
> 2.26.0 23
> 2.27.0 18
> 2.28.0 81
> 2.29.0 268
> 2.30.0 26
> 2.31.0 24
> 2.32.0 69
> 2.33.0 403
> 2.34.0 352
> 2.35.0 1543
> 2.36.0 50
> 2.37.0 19
> 2.38.0 420
> 2.39.0 86
> All 5224
>  
> On Wed, Jun 29, 2022 at 1:24 AM Moritz Mack  <mailto:mm...@talend.com>> wrote:
> Who could help pulling the latest Maven download stats for beam-runners-spark 
> and beam-runners-spark-3 for the last few Beam releases?
>  
> Thanks so much!
> / Moritz
>  
> On 01.04.22, 16:54, "Moritz Mack"  <mailto:mm...@talend.com>> wrote:
>  
> I just started looking into the Spark runner code a bit to hopefully help 
> support it.
> Besides having to maintain (test!) twice the number of artifacts, there’s 
> also a significant negative impact on developer ergonomics / productivity 
> supporting multiple major versions (separate modules to deal with breaking 
> changes and all the trouble that comes with that).
>  
> Thanks, Alexey, for opening the discussion. Certainly a big +1 from my side.
>  
> / Moritz
>  
>  
>  
> From: Alexey Romanenko  <mailto:aromanenko@gmail.com>>
> Date: Thursday, 31. March 2022 at 18:51
> To: dev mailto:dev@beam.apache.org>>
> Subject: Re: [PROPOSAL] Stop Spark2 support in Spark Runner
> 
> 
> 
> > On 31 Mar 2022, at 18:02, Robert Bradshaw  > <mailto:rober...@google.com>> wrote:
> > 
> > Generally makes sense to me, though I'm curious what the maintenance
> > burden is (high or low) in keeping it around.
> 
> Well, we need to provide two versions of spark runner artifacts, job-servers 
> and docker images, to test them separately (different Jenkins jobs). We also 
> have two different code paths for the cases where API is not compatible 
> between Spark2 and Spark3.  
> 
> > We should probably
> > deprecate it for a period of time before removing support.
> 
> Agreed, and I’d even suggest asking users on user@/Twitter beforehand. 
> 
> 
> Actually, I see some problem with naming. By default, we used to call “Spark 
> runner” as a runner that works with Spark2 (for example, the artifa

Re: Jira -> GitHub Issues Migration (This Friday)

2022-06-24 Thread Alexey Romanenko
My main point was to minimise the number of PRs that are not properly linked 
to their issues, since that risks keeping those issues open even when they are 
already technically resolved. Imho, with an issue prefix in the PR title, as 
we had before, it’s much easier and faster to verify.

Though, if everybody finds it not useful, then we will keep the current 
behaviour. 
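
To illustrate how such a verification could be automated, here is a minimal 
Python sketch that looks for "#N" references in a PR title or description. 
The regular expression and the function names are assumptions made for this 
example, not an existing Beam or GitHub check, and the pattern only covers 
the short "#N" form of a reference.

```python
import re

# "#N" not preceded by a word character or "/", so URL fragments are skipped.
ISSUE_REF = re.compile(r"(?<![\w/])#(\d+)\b")


def linked_issues(title, description):
    """Return the sorted issue numbers referenced as #N in the title or description."""
    text = f"{title}\n{description or ''}"
    return sorted({int(number) for number in ISSUE_REF.findall(text)})


def is_linked(title, description):
    """Tell whether the PR references at least one issue."""
    return bool(linked_issues(title, description))
```

A check like this accepts a reference in either place, so it would work both 
with a title prefix and with the description-only convention.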


> On 24 Jun 2022, at 21:40, Robert Bradshaw  wrote:
> 
> #N will automatically link to the issue (or PR), which should
> generally show up in the description. I'm not sold on the value of
> having to have it in the title. [BEAM-] was primarily useful
> because one had to reference an external system.
> 
> On Fri, Jun 24, 2022 at 12:34 PM Danny McCormick
>  wrote:
>> 
>> I don't think a similar "[issue #] title" linking construct exists for 
>> GitHub. AFAIK the recommended method of linking is to put things in the 
>> description, and I don't really see a reason that putting the issue number 
>> in the description is harder or more likely to be forgotten; I kinda expect 
>> it will just take some time for people to adjust to doing this the new way.
>> 
>> I'm probably ambivalent (or slightly opposed) to mandating an issue in the 
>> PR title - I mostly just see it as a small extra tax with limited value 
>> since I do think we should ask for issues in the description to preserve 
>> linking. With that said, I also haven't been around the project for as long 
>> and didn't previously build that feature into my regular workflow - others 
>> might find it more useful.
>> 
>> Thanks,
>> Danny
>> 
>> On Fri, Jun 24, 2022 at 1:30 PM Alexey Romanenko  
>> wrote:
>>> 
>>> I really liked and used the “feature” that we had before when we asked 
>>> developers to add a Jira issue ID as a prefix to commit messages/PR titles.
>>> 
>>> What about having a similar thing with GitHub Issues? As an additional 
>>> bonus, we could link (is it possible?) a PR to the corresponding issue (as I 
>>> can see now, people quite often forget to add this to the PR 
>>> description).
>>> 
>>> —
>>> Alexey
>>> 
>>> On 7 Jun 2022, at 16:04, Danny McCormick  wrote:
>>> 
>>> This is definitely possible - if we have a list of the dependencies to 
>>> ignore we can specify an ignore list, or as they come up you can comment 
>>> "@dependabot ignore". In this case, having the explicit ignore list 
>>> probably makes sense. I'll follow up with Tomo to make sure the GCP 
>>> dependencies get added to the ignore list.
>>> 
>>> Thanks,
>>> Danny
>>> 
>>> On Mon, Jun 6, 2022 at 8:10 PM Ahmet Altay  wrote:
>>>> 
>>>> 
>>>> 
>>>> On Sat, Jun 4, 2022 at 8:30 PM Sachin Agarwal  wrote:
>>>>> 
>>>>> This is great, thank you so much Danny! I checked my issues and all look 
>>>>> correct.  Thank you!
>>>>> 
>>>>> On Sat, Jun 4, 2022 at 6:40 PM Danny McCormick 
>>>>>  wrote:
>>>>>> 
>>>>>> All Jiras should now be migrated to Issues*, and the owners should be 
>>>>>> assigned or tagged. Hopefully this will help us be a more productive 
>>>>>> community and will make it easier for newcomers! If you see any issues 
>>>>>> with the migration, or generally with using issues, please let me know.
>>>> 
>>>> 
>>>> Is it possible to disable dependabot for certain dependencies? GCP 
>>>> dependencies are managed in a way to happen at a lockstep, based on 
>>>> specifically validated sets
>>>> 
>>>> /cc @Tomo Suzuki - who did lots of work in that area.
>>>> 
>>>> 
>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Danny
>>>>>> 
>>>>>> * The "Beam Dependency Report Jiras" were not migrated for 2 reasons. 
>>>>>> (1) There were >500 of them dating back to Nov 2019, many outdated. (2) 
>>>>>> Dependabot has been added to the repo and should take care of putting up 
>>>>>> PRs for outdated dependencies. If that is an issue for any reason let me 
>>>>>> know, it is not too late to migrate those as well.
>>>>>> 
>>>>>> On Fri, Jun 3, 2022 at 11:14 AM Danny McCormick 
>>>>>>  wrote:
>>>>>>> 
>>>>>>> Hey Hector, they were just enabled (

Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

2022-06-24 Thread Alexey Romanenko
Thanks, Danny!

> On 24 Jun 2022, at 19:23, Danny McCormick  wrote:
> 
> Sure, I put up a fix - https://github.com/apache/beam/pull/22048 
> <https://github.com/apache/beam/pull/22048>
> On Fri, Jun 24, 2022 at 1:20 PM Alexey Romanenko  <mailto:aromanenko@gmail.com>> wrote:
> 
> 
>> > 2. The links in this report start with api.github.* and don’t take us 
>> > directly to the issues.
>> 
>> > Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
>> 
>> This is already fixed - Pablo actually beat me to it! 
>> <https://github.com/apache/beam/pull/22033>
> It also adds a colon after the URL, and some mail clients treat it as part 
> of the URL, which leads to a broken link.
> Should we just remove the colon there or add a space in between?
> 
> —
> Alexey
> 
>> 
>> Thanks,
>> Danny
>> 
>> On Thu, Jun 23, 2022 at 8:30 PM Brian Hulette > <mailto:bhule...@google.com>> wrote:
>> +1 for that proposal!
>> 
>> > 1. P2 and P3 issues should be noticed and resolved as well. Shall we have 
>> > a longer time window for the remaining untriaged or stagnant issues and 
>> > include them?
>> 
>> I worry these lists would get _very_ long and wouldn't be actionable. But 
>> maybe it's worth reporting something like "There are 376 P2's with no update 
>> in the last 6 months" with a link to a query?
>> 
>> > 2. The links in this report start with api.github.* and don’t take us 
>> > directly to the issues.
>> 
>> Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
>> 
>> On Thu, Jun 23, 2022 at 2:37 PM Pablo Estrada > <mailto:pabl...@google.com>> wrote:
>> Thanks. I like the proposal, and I've found the emails useful.
>> Best
>> -P.
>> 
>> On Thu, Jun 23, 2022 at 2:33 PM Manu Zhang > <mailto:owenzhang1...@gmail.com>> wrote:
>> Sounds good! It’s like our internal reports of JIRA tickets exceeding SLA 
>> time and having no response from engineers.  We either resolve them or 
>> downgrade the priority to extend the time window.
>> 
>> Besides,
>> 1. P2 and P3 issues should be noticed and resolved as well. Shall we have a 
>> longer time window for the remaining untriaged or stagnant issues and 
>> include them?
>> 2. The links in this report start with api.github.* and don’t take us 
>> directly to the issues.
>> 
>> 
>> On Fri, Jun 24, 2022 at 4:48 AM Danny McCormick > <mailto:dannymccorm...@google.com>> wrote:
>> That generally sounds right to me - I also would vote that we consolidate to 
>> 1 email and stop distinguishing between flaky P1s and normal P1s.
>> 
>> So the single daily report would be:
>> 
>> - Unassigned P0s
>> - P0s with no update in the last 36 hours
>> - Unassigned P1s
>> - P1s with no update in the last 7 days
>> 
>> I think that will generate a pretty good list of issues that require some 
>> kind of action.
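
The four report sections proposed above can be sketched in Python as follows. 
This is only an illustration of the selection rules: the issue fields 
(number, priority, assignees, updated_at) and the function name are 
assumptions for this example and do not describe the actual report tooling.

```python
from datetime import datetime, timedelta, timezone

# Staleness windows from the proposal: 36 hours for P0s, 7 days for P1s.
STALE_AFTER = {"P0": timedelta(hours=36), "P1": timedelta(days=7)}


def daily_report(issues, now):
    """Group issue numbers into the unassigned and stale sections per priority."""
    report = {}
    for priority, window in STALE_AFTER.items():
        matching = [i for i in issues if i["priority"] == priority]
        report[f"Unassigned {priority}s"] = [
            i["number"] for i in matching if not i["assignees"]
        ]
        report[f"Stale {priority}s"] = [
            i["number"] for i in matching if now - i["updated_at"] > window
        ]
    return report
```

An issue that is both unassigned and stale shows up in both sections, and 
priorities other than P0 and P1 are ignored, which keeps the list short.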
>> 
>> On Thu, Jun 23, 2022 at 4:43 PM Kenneth Knowles > <mailto:k...@apache.org>> wrote:
>> Sounds good to me. Perhaps P0s > 36 hours ago (presumably they are more like 
>> ~hours for true outages of CI/website/etc) and P1s > 7 days?
>> 
>> On Thu, Jun 23, 2022 at 1:27 PM Brian Hulette > <mailto:bhule...@google.com>> wrote:
>> I think that Danny's alternate proposal (a daily email that shows only issues 
>> last updated >7 days ago, and those with no assignee) fits well with the two 
>> goals you describe, if we include "triage needed" issues in the latter 
>> category. Maybe we also explicitly separate these two concerns in the report?
>> 
>> 
>> On Thu, Jun 23, 2022 at 1:14 PM Kenneth Knowles > <mailto:k...@apache.org>> wrote:
>> Forking thread because lots of people may just ignore this topic, per the 
>> discussion :-)
>> 
>> (sometimes gmail doesn't fork thread properly, but here's hoping...)
>> 
>> I'll add some other outcomes of these emails:
>> 
>>  - people file P0s that are not outages and P1s that are not data loss and I 
>> downgrade them
>>  - I randomly open up a few flaky test bugs and see if I can fix them really 
>> quick
>>  - people file legit P0s and P1s and I subscribe and follow them
>> 
>> Of these, only the last one seems important (not just that *I* follow them, 
>> but that new P0s and P1s get immediate attention from many eyes)
>> 
>> So maybe one take on the goal is to:
>> 
>>  - have new P0s and P1s evaluated quickly: P0s are an ou

Re: Jira -> GitHub Issues Migration (This Friday)

2022-06-24 Thread Alexey Romanenko
a(s) 12:33, Danny McCormick (dannymccorm...@google.com 
> <mailto:dannymccorm...@google.com>) wrote:
> Given the consensus here, I updated the tool to do this. This means that we 
> won't update the JIRAs to be read-only until after the migration is complete. 
> I'll rerun the tool if any extra jiras come in during the intervening period. 
> The tool will also still write the mapping to the file in case there are 
> unforeseen issues so that we can backfill if needed.
> 
> Thanks for the suggestion and followup Brian, Ahmet, and Alexey.
> 
> On Thu, Jun 2, 2022 at 12:16 PM Alexey Romanenko  <mailto:aromanenko@gmail.com>> wrote:
> +1 That would be very helpful for mapping!
> 
>> On 2 Jun 2022, at 17:48, Ahmet Altay > <mailto:al...@google.com>> wrote:
>> 
>> Is it possible to add comments on the JIRAs with a link to the new 
>> corresponding github issue?
>> 
>> On Thu, Jun 2, 2022 at 8:47 AM Danny McCormick > <mailto:dannymccorm...@google.com>> wrote:
>> Thanks for the feedback, I agree it would be good to keep that option open - 
>> I updated the tool to write those to a file when we create an issue. I'll 
>> share that after the migration.
>> 
>> Thanks,
>> Danny
>> 
>> On Wed, Jun 1, 2022 at 7:03 PM Brian Hulette > <mailto:bhule...@google.com>> wrote:
>> Thanks Danny. Regarding links to GitHub issues, if we could at least save 
>> off a record of jira <-> issue mappings we could look at adding the links 
>> later. I think it would be nice to have those links so that anyone landing 
>> in a jira through a search or an old link can quickly find the current 
>> ticket, but I don't think that needs to block the migration.
>> 
>> On Wed, Jun 1, 2022 at 7:05 AM Danny McCormick > <mailto:dannymccorm...@google.com>> wrote:
>> Hey Brian,
>> 
>> 1. Right now, the plan is to (1) turn on the issues tab, (2) make the JIRA 
>> read only, (3) run the migration tool. Since the migration tool won't be run 
>> until after Jiras are read only, there shouldn't be issues with making sure 
>> everything gets captured.
>> 2. That current ordering does mean it's difficult to add a link to the newly 
>> created Issue, and I hadn't built in that feature. With that said, I will 
>> ask Infra if they're able to put up a banner redirecting people to GitHub 
>> for the Beam project - that should hopefully minimize some of the issues - 
>> and I'll also look into updating the tool to do that in case the banner 
>> isn't doable. I'm also planning on doing a few passes to update our docs and 
>> code comments from Jiras to issues once the migration is done.
>> 
>> Thanks,
>> Danny
>> 
>> On Tue, May 31, 2022 at 8:09 PM Brian Hulette > <mailto:bhule...@google.com>> wrote:
>> Thanks Danny, it's great to see this happening!
>> 
>> A couple of questions:
>> - Is there something we can do to remind people creating a jira that they 
>> should create a bug instead (e.g. a template)? If not I suppose we can just 
>> re-run the migration tool a few times up until jira creation is disabled to 
>> make sure everything is captured.
>> - Will your migration tooling comment on the original jira with a link to 
>> the new issue in GitHub?
>> 
>> Brian
>> 
>> On Tue, May 31, 2022 at 9:57 AM Robert Bradshaw > <mailto:rober...@google.com>> wrote:
>> Thanks for finally making this happen.
>> 
>> On Tue, May 31, 2022 at 7:18 AM Sachin Agarwal > <mailto:sachi...@google.com>> wrote:
>> >
>> > Thank you Danny! This will help us a lot, especially with new 
>> > contributors. Thanks so much!
>> >
>> > On Tue, May 31, 2022 at 4:10 AM Danny McCormick > > <mailto:dannymccorm...@google.com>> wrote:
>> >>
>> >> Hey folks, this is a reminder that we will be migrating from Jira to 
>> >> GitHub Issues this Friday (6/4). A few key details to keep in mind:
>> >>
>> >> 1. All active Jiras will get automatically migrated and assigned over the 
>> >> course of the weekend.
>> >> 2. Starting Friday (once the Issues tab is open), please stop 
>> >> creating Jiras and start creating Issues instead. You should also 
>> >> reference issues in your PRs and commits instead of Jiras. The Jira 
>> >> creation flow will eventually be disabled.
>> >> 3. If you encounter any issues that can't be resolved by looking at the 
>> >> doc updates, please let me know and/or follow up in this thread.
>> >>
>> >> I'm looking forward to seeing how Issues can minimize friction for new 
>> >> contributors and I'm hopeful that this will be a smooth transition. If 
>> >> you have any last minute concerns let me know. For more context, see the 
>> >> original thread on this topic.
>> >>
>> >> Thanks,
>> >> Danny
> 



Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

2022-06-24 Thread Alexey Romanenko


> > 2. The links in this report start with api.github.* and don’t take us 
> > directly to the issues.
> 
> > Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
> 
> This is already fixed - Pablo actually beat me to it! 
> 
It also adds a colon right after the URL, and some mail clients treat it as 
part of the URL, which leads to a broken link.
Should we just remove the colon, or add a space before it?
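
To illustrate the problem, here is a minimal Python sketch. The regex is only 
a rough stand-in for a mail client's autolinker, and the issue number, title 
and function names are made up for the example; the real report generator may 
format its lines differently.

```python
import re

# Rough approximation of an autolinker: a URL runs until the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def glued_line(url: str, title: str) -> str:
    """Report line with the colon directly after the URL (the reported problem)."""
    return f"{url}: {title}"


def safe_line(url: str, title: str, separator: str = " ") -> str:
    """Report line where whitespace follows the URL, so nothing sticks to it."""
    return f"{url}{separator}{title}"


def detected_link(line: str) -> str:
    """Return what the approximate autolinker would turn into a link."""
    return URL_PATTERN.search(line).group(0)


# Hypothetical issue URL and title, used only for illustration.
issue_url = "https://github.com/apache/beam/issues/1"
print(detected_link(glued_line(issue_url, "Flaky test")))  # colon is swallowed
print(detected_link(safe_line(issue_url, "Flaky test")))   # clean URL
```

Passing `separator=" : "` keeps a colon in the line while still separating it 
from the URL.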

—
Alexey

> 
> Thanks,
> Danny
> 
> On Thu, Jun 23, 2022 at 8:30 PM Brian Hulette  > wrote:
> +1 for that proposal!
> 
> > 1. P2 and P3 issues should be noticed and resolved as well. Shall we have a 
> > longer time window for the remaining untriaged or stagnant issues and 
> > include them?
> 
> I worry these lists would get _very_ long and wouldn't be actionable. But 
> maybe it's worth reporting something like "There are 376 P2's with no update 
> in the last 6 months" with a link to a query?
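
A link like that could be built from GitHub's issue search qualifiers. This is 
only a sketch: it assumes the priority is stored as a label literally named 
"P2", and the function name and cutoff date are illustrative.

```python
from datetime import date
from urllib.parse import urlencode

ISSUES_URL = "https://github.com/apache/beam/issues"


def stale_issues_url(label: str, not_updated_since: date) -> str:
    """Build a search link for open issues with `label` not updated since a date."""
    # "updated:<YYYY-MM-DD" matches issues last updated before that date.
    query = f"is:issue is:open label:{label} updated:<{not_updated_since.isoformat()}"
    return f"{ISSUES_URL}?{urlencode({'q': query})}"


# Illustrative cutoff, roughly six months before the date of this thread.
print(stale_issues_url("P2", date(2021, 12, 24)))
```

The report would then only need the count plus this one link, rather than a 
long list of individual issues.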
> 
> > 2. The links in this report start with api.github.* and don’t take us 
> > directly to the issues.
> 
> Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
> 
> On Thu, Jun 23, 2022 at 2:37 PM Pablo Estrada  > wrote:
> Thanks. I like the proposal, and I've found the emails useful.
> Best
> -P.
> 
> On Thu, Jun 23, 2022 at 2:33 PM Manu Zhang  > wrote:
> Sounds good! It’s like our internal reports of JIRA tickets that exceed their SLA 
> time with no response from engineers. We either resolve them or 
> downgrade the priority to extend the time window.
> 
> Also:
> 1. P2 and P3 issues should be noticed and resolved as well. Shall we have a 
> longer time window for the remaining untriaged or stagnant issues and include 
> them?
> 2. The links in this report start with api.github.* and don’t take us 
> directly to the issues.
> 
> 
> On Fri, Jun 24, 2022 at 4:48 AM Danny McCormick wrote:
> That generally sounds right to me - I also would vote that we consolidate to 
> 1 email and stop distinguishing between flaky P1s and normal P1s.
> 
> So the single daily report would be:
> 
> - Unassigned P0s
> - P0s with no update in the last 36 hours
> - Unassigned P1s
> - P1s with no update in the last 7 days
> 
> I think that will generate a pretty good list of issues that require some 
> kind of action.
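
The four report sections above could be computed roughly as follows. This is a 
sketch under assumptions: the `Issue` record and `build_report` function are 
hypothetical and do not come from the actual automation; only the 36-hour and 
7-day thresholds are taken from the thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Staleness thresholds proposed in the thread.
STALE_AFTER = {"P0": timedelta(hours=36), "P1": timedelta(days=7)}


@dataclass
class Issue:
    # Hypothetical minimal record; the real tool would read these from GitHub.
    number: int
    priority: str
    assignee: Optional[str]
    updated_at: datetime


def build_report(issues: List[Issue], now: datetime) -> Dict[str, List[int]]:
    """Group issue numbers into the sections of the single daily report."""
    report: Dict[str, List[int]] = {}
    for priority, limit in STALE_AFTER.items():
        matching = [i for i in issues if i.priority == priority]
        report[f"Unassigned {priority}s"] = [
            i.number for i in matching if i.assignee is None
        ]
        report[f"Stale {priority}s"] = [
            i.number for i in matching if now - i.updated_at > limit
        ]
    return report


# Made-up sample data for illustration.
now = datetime(2022, 6, 24, 12, 0)
sample = [
    Issue(1, "P0", None, now - timedelta(hours=1)),
    Issue(2, "P0", "someone", now - timedelta(hours=40)),
    Issue(3, "P1", "someone", now - timedelta(days=2)),
    Issue(4, "P1", None, now - timedelta(days=8)),
    Issue(5, "P2", None, now - timedelta(days=100)),
]
print(build_report(sample, now))
```

An issue that is both unassigned and stale shows up in both sections, and P2 
or lower issues are left out entirely.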
> 
> On Thu, Jun 23, 2022 at 4:43 PM Kenneth Knowles  > wrote:
> Sounds good to me. Perhaps P0s > 36 hours ago (presumably they are more like 
> ~hours for true outages of CI/website/etc) and P1s > 7 days?
> 
> On Thu, Jun 23, 2022 at 1:27 PM Brian Hulette  > wrote:
> I think that Danny's alternate proposal (a daily email that shows only issues 
> last updated >7 days ago, and those with no assignee) fits well with the two 
> goals you describe, if we include "triage needed" issues in the latter 
> category. Maybe we also explicitly separate these two concerns in the report?
> 
> 
> On Thu, Jun 23, 2022 at 1:14 PM Kenneth Knowles  > wrote:
> Forking thread because lots of people may just ignore this topic, per the 
> discussion :-)
> 
> (sometimes gmail doesn't fork thread properly, but here's hoping...)
> 
> I'll add some other outcomes of these emails:
> 
>  - people file P0s that are not outages and P1s that are not data loss and I 
> downgrade them
>  - I randomly open up a few flaky test bugs and see if I can fix them really 
> quick
>  - people file legit P0s and P1s and I subscribe and follow them
> 
> Of these, only the last one seems important (not just that *I* follow them, 
> but that new P0s and P1s get immediate attention from many eyes)
> 
> So maybe one take on the goal is to:
> 
>  - have new P0s and P1s evaluated quickly: P0s are an outage or outage-like 
> occurrence that needs immediate remedy, and P1s need to be evaluated for 
> release blocking, etc.
>  - make sure P0s and P1s get attention appropriate to their priority
> 
> It can also be helpful to just state the failure modes which would happen by 
> default if we don't have a good process or automation:
> 
>  - Real P0 gets filed and not noticed or fixed in a timely manner, blocking 
> users and/or community in real time
>  - Real P1 gets filed and not noticed, so release goes out with known data 
> loss bug or other total loss of functionality
>  - Non-real P0s and P1s accumulate, throwing off our data and making it hard 
> to find the real problems
>  - Flakes are never fixed
> 
> WDYT?
> 
> If we have P0s and P1s in the "awaiting triage" state, those are the ones we 
> need to notice. Then for a P0 or P1 outside of that state, we just need some 
> way of making sure it doesn't stagnate. Or if it does stagnate, that 
> empirically demonstrates it isn't really P1 (just like our P2 to P3 downgrade 
> automation). If everything is P1, nothing is, as they say.
> 
> Kenn
> 
> On Thu, Jun 23, 

Re: [VOTE] Release 2.40.0, candidate #2

2022-06-24 Thread Alexey Romanenko
+1 (binding)
The same testing as for RC1

—
Alexey

> On 24 Jun 2022, at 05:51, Pablo Estrada  wrote:
> 
> +1 (binding)
> 
> I've tested by building and running local tests for existing Dataflow 
> Templates.
> 
> On Thu, Jun 23, 2022 at 2:39 PM Pablo Estrada  > wrote:
> Hi all,
> I believe Dataflow containers have just been pushed, so feel free to validate 
> those now as well.
> -P.
> 
> On Thu, Jun 23, 2022 at 12:04 PM Pablo Estrada  > wrote:
> Hi everyone,
> 
> Please review and vote on the release candidate #2 for the version 2.40.0, as 
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>  
>  
> Reviewers are encouraged to test their own use cases with the release 
> candidate, and vote +1 if no issues are found.
>  
> The complete staging area is available for your review, which includes:
> * Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org 
>  [2], which is signed with the key with fingerprint 
> C79DDD47DAF3808F0B9DDFAC02B2D9F742008494 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.40.0-RC2" [5],
> * website pull request listing the release [6], the blog post [6], and 
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle 7.4 and openjdk version "1.8.0_232".
> * Python artifacts are deployed along with the source release to the 
> dist.apache.org  [2] and PyPI[8].
> * Validation sheet with a tab for 2.40.0 release to help with validation [9].
> * Docker images published to Docker Hub [10].
>  
> The vote will be open for at least 72 hours. It is adopted by majority 
> approval, with at least 3 PMC affirmative votes.
>  
> For guidelines on how to try the release in your projects, check out our blog 
> post at https://beam.apache.org/blog/validate-beam-release/ 
> .
>  
> Thanks,
> -P.
> 
> P.S.: Dataflow containers have not yet been pushed, so please hold off before 
> testing that.
>  
> [1] https://github.com/apache/beam/releases/tag/v2.40.0-RC2 
>  
> [2] https://dist.apache.org/repos/dist/dev/beam/2.40.0/ 
> 
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS 
> 
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1275/ 
>  
> [5] https://github.com/apache/beam/tree/v2.40.0-RC2 
> 
> [6] https://github.com/apache/beam/pull/21947 
> 
> [7] https://github.com/apache/beam-site/pull/632 
> 
> [8] https://pypi.org/project/apache-beam/2.40.0rc2/ 
> 
> [9] 
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1844197258
>  
> 
> [10] https://hub.docker.com/search?q=apache%2Fbeam=image 
> 


Re: [VOTE] Release 2.40.0, candidate #1

2022-06-22 Thread Alexey Romanenko
+1 (binding)

I tested it with https://github.com/Talend/beam-samples/ 
(Java 8 & 11 SDK, Spark 3 runner).

---
Alexey

> On 21 Jun 2022, at 04:28, Ahmet Altay  wrote:
> 
> 
> 
> On Mon, Jun 20, 2022 at 6:35 PM Pablo Estrada  > wrote:
> I have not yet pushed Dataflow containers. I'll push those tomorrow.
> 
> Ack. I missed the note at the end of your first email.
>  
> 
> On Mon, Jun 20, 2022 at 6:34 PM Ahmet Altay  > wrote:
> I ran into issues with running python jobs on Dataflow. They failed with an 
> "Failed to fetch \"2.40.0\" from request 
> \"/v2/cloud-dataflow/v1beta3/python38/manifests/2.40.0\"." error. It might 
> also be an issue on my end because I was using a different setup than my 
> usual. It would be good for someone else to verify running python on dataflow.
> 
> Ahmet
> 
> On Mon, Jun 20, 2022 at 4:10 PM Pablo Estrada  > wrote:
> Hi everyone,
> 
> Please review and vote on the release candidate #1 for the version 2.40.0, as 
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>  
>  
> Reviewers are encouraged to test their own use cases with the release 
> candidate, and vote +1 if no issues are found.
>  
> The complete staging area is available for your review, which includes:
> * Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org 
>  [2], which is signed with the key with fingerprint 
> C79DDD47DAF3808F0B9DDFAC02B2D9F742008494 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.40.0-RC1" [5],
> * website pull request listing the release [6], the blog post [6], and 
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle 7.4 and openjdk version "1.8.0_232".
> * Python artifacts are deployed along with the source release to the 
> dist.apache.org  [2] and PyPI[8].
> * Validation sheet with a tab for 2.40.0 release to help with validation [9].
> * Docker images published to Docker Hub [10].
>  
> The vote will be open for at least 72 hours. It is adopted by majority 
> approval, with at least 3 PMC affirmative votes.
>  
> For guidelines on how to try the release in your projects, check out our blog 
> post at https://beam.apache.org/blog/validate-beam-release/ 
> .
>  
> Thanks,
> -P.
> 
> P.S.: Dataflow containers have not yet been pushed, so please hold off before 
> testing that. I'll push them by tomorrow.
>  
> [1] https://github.com/apache/beam/releases/tag/untagged-c5c3f847bb360d87ac15 
>  
> [2] https://dist.apache.org/repos/dist/dev/beam/2.40.0/ 
> 
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS 
> 
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1274/ 
> 
> [5] https://github.com/apache/beam/tree/v2.40.0-RC1 
> 
> [6] https://github.com/apache/beam/pull/21947 
> 
> [7] https://github.com/apache/beam-site/pull/632 
> 
> [8] https://pypi.org/project/apache-beam/2.40.0rc1/ 
> 
> [9] 
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1844197258
>  
> 
> [10] https://hub.docker.com/search?q=apache%2Fbeam=image 
> 



Re: Jenkins CI currently unavailable

2022-06-14 Thread Alexey Romanenko
In addition to what Kenn said, we have some documentation here:
https://cwiki.apache.org/confluence/display/BEAM/Jenkins+Tips 

Though I'm not sure how up to date it is.

—
Alexey

> On 14 Jun 2022, at 16:42, Kenneth Knowles  wrote:
> 
> The UI is https://ci-beam.apache.org/  and it is 
> integrated with ASF's LDAP. I don't know if this URL is documented anywhere.
> 
> Usage of the UI is standard Jenkins. You can select any job and click "build 
> with parameters" and put in a git ref to build from.
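
For reference, the "build with parameters" action in the UI corresponds to 
Jenkins' remote `buildWithParameters` endpoint. The sketch below only builds 
the URL; the parameter name `sha1` is an assumption (check the job's 
configuration for the real name), and an actual trigger would need an 
authenticated POST by someone with the right permissions.

```python
from urllib.parse import quote, urlencode

# Jenkins UI address quoted in this thread.
JENKINS_BASE = "https://ci-beam.apache.org"


def build_with_parameters_url(job: str, params: dict) -> str:
    """Return the URL of the remote 'buildWithParameters' endpoint for a job."""
    job_path = quote(job, safe="")
    return f"{JENKINS_BASE}/job/{job_path}/buildWithParameters?{urlencode(params)}"


# "sha1" is an assumed parameter name, used here only for illustration.
url = build_with_parameters_url("beam_SeedJob", {"sha1": "refs/heads/master"})
print(url)
```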
> 
> Kenn
> 
> On Mon, Jun 13, 2022 at 5:54 PM Reuven Lax  > wrote:
> I am a committer, but I'm not sure how to even get to the Jenkins UI! Is this 
> documented anywhere?
> 
> This PR affects how the Dataflow runner works, so we should run Dataflow 
> postcommits on it.
> 
> On Mon, Jun 13, 2022 at 4:22 PM Kiley Sok  > wrote:
> Reuven, it looks like yours may have been stuck in a weird state when we 
> re-enabled the precommits. I kicked off the tests on your PR with 'retest 
> this please'
> 
> To clarify, precommits are working as before (pr comment and update 
> triggered) and so you should be able to check in code. 
> 
> If you want further testing with post commits, you'll need a committer to 
> manually trigger them on the Jenkins UI. I believe you can do this by 'Build 
> with Parameter' and putting in the PR number (correct me if I'm wrong @Robert 
> Burke ). If there are no objections, I can 
> re-enable triggers for the common postcommits. 
> 
> 
> 
> On Mon, Jun 13, 2022 at 4:06 PM Reuven Lax  > wrote:
> Are there any pointers on how to manually trigger the runs?
> 
> On Mon, Jun 13, 2022 at 4:04 PM Robert Burke  > wrote:
> You know, I do forget that committers can manually trigger Jenkins runs.
> 
> And after fiddling with the Jenkins options and filling in the expected but 
> missing PR number parameter, I think I've managed it.
> 
> Thanks for the reminder!
> 
> On Mon, Jun 13, 2022, 3:38 PM Kiley Sok  > wrote:
> Can you run the post commits from the Jenkins UI to unblock? We've turned off 
> the triggers for all post commits, but could turn it on for a select few as 
> well.
> 
> On Mon, Jun 13, 2022 at 3:31 PM Robert Burke  > wrote:
> There are a couple of Go SDK PRs that are basically blocked on final manual 
> runs of the post commits, that we'd like to get in for the 2.40 cut.
> 
> Are we intending on delaying the 2.40 cut a little bit so PRs like those can 
> make it in?
> 
> 
> On Mon, Jun 13, 2022, 1:32 PM Ahmet Altay  > wrote:
> Thank you all for working on this.
> 
> On Mon, Jun 13, 2022 at 10:09 AM Kenneth Knowles  > wrote:
> Yes, the ghprb plugin was disabled. That was the entire action. I believe my 
> PR will reduce the load caused by the ghprb plugin; we are currently 
> restarting Jenkins to re-enable it. So we can unfreeze master as soon as 
> Jenkins reboots. Basically, if your PR has a precommit status great, 
> otherwise please wait.
> 
> What we lose from my change is postcommit comment triggers. I see how this is 
> unfortunate for our established release process that runs them all on the 
> release branch.
> 
> Going forward, we are using old/unmaintained plugins and need to stop relying 
> on them. There are two obvious choices:
> 
> (1) use some "latest and greatest" Jenkins plugin - most likely the 
> multibranch pipeline plugin (aka Jenkinsfile plugin)
> (2) use GitHub Actions
> 
> I believe both of these will be comparable in migration effort. I'm in favor 
> of expanding our GitHub Actions usage to take over what we do with Jenkins. 
> We have figured out how to have self-hosted workers, just like we do for 
> Jenkins. I don't know what other pitfalls may exist.
> 
> A good first step would be to bring the GitHub Actions precommits to parity 
> with the Jenkins precommits.
> 
> +1. After spending some time, these two plugins are not very compatible and 
> migration to the new plugin would itself be a sufficiently large migration. 
> We are using GH actions to an extent today and in general they were working 
> fine.
>  
> 
> Kenn
> 
> On Mon, Jun 13, 2022 at 9:44 AM Brian Hulette  > wrote:
> Can someone familiar with this clarify the current status? It looks like 
> PostCommits and PreCommit_Cron jobs are still running on a schedule, e.g. 
> [1,2]. Was the ghprb plugin (responsible for triggering jobs based on new 
> PRs and comments) just disabled?
> 
> If that's the case then we have a full suite of PostCommits, but the only 
> precommit checks we have are GitHub Actions checks. These are pretty 
> thorough, but off the top of my head a decent amount is missing:
> - No PyLint, PyDoc precommits
> - We can't trigger 

Re: Failing beam_SeedJob

2022-06-03 Thread Alexey Romanenko
Thanks Valentyn, it may help! 

I’m wondering whether the seed job was broken by Pablo’s change, or whether it 
was the same issue as in this thread and was fixed simply by the Jenkins 
restart.

> On 3 Jun 2022, at 15:22, Valentyn Tymofieiev  wrote:
> 
> I am not aware of a straightforward way. I have been using this script: 
> https://gist.github.com/tvalentyn/08cdb2836d311c55ae6ec09a276aea2a 
> <https://gist.github.com/tvalentyn/08cdb2836d311c55ae6ec09a276aea2a> 
> +Pablo Estrada <mailto:pabl...@google.com> investigated this in 
> https://issues.apache.org/jira/browse/BEAM-14377 
> <https://issues.apache.org/jira/browse/BEAM-14377> but Pablo's changes were 
> rolled back because somehow seed job became broken.
> 
> On Fri, Jun 3, 2022 at 2:56 PM Alexey Romanenko  <mailto:aromanenko@gmail.com>> wrote:
> Yes, it’s ok now. 
> Thanks all!
> 
> Side question - is there a way to run a Jenkins job against a specific PR? I 
> tried to specify different git refs as an input parameter but it didn’t work 
> for me…
> 
> —
> Alexey 
> 
>> On 3 Jun 2022, at 05:02, Ahmet Altay > <mailto:al...@google.com>> wrote:
>> 
>> Seed job looks healthy again: 
>> https://ci-beam.apache.org/job/beam_SeedJob/9762/ 
>> <https://ci-beam.apache.org/job/beam_SeedJob/9762/> .
>> 
>> On Thu, Jun 2, 2022 at 10:24 AM Ryan Thompson > <mailto:ryanthomp...@google.com>> wrote:
>> I've filed a jira to infra and asked on the slack channel. Hopefully someone 
>> there has the power to restart the server.
>> 
>> https://issues.apache.org/jira/browse/INFRA-23335 
>> <https://issues.apache.org/jira/browse/INFRA-23335>
>> 
>> On Thu, Jun 2, 2022 at 12:41 PM Kenneth Knowles > <mailto:k...@apache.org>> wrote:
>> Commented on the Jira. Seems like it happens somewhat rarely, but is 
>> sometimes resolved by a restart, and sometimes has to do with some version 
>> mismatch issues. I did not find anything like a real root cause, just 
>> trial-and-error fixes. I'm not certain what may have occurred around May 19 
>> to this infrastructure. Hoping INFRA can help us sort it out.
>> 
>> Kenn
>> 
>> On Thu, Jun 2, 2022 at 9:15 AM Alexey Romanenko > <mailto:aromanenko@gmail.com>> wrote:
>> Thanks Ahmet and Ryan for taking a look!
>> 
>> I agree that the referenced commits seem unrelated; that is why I was 
>> puzzled by this.
>> 
>>> On 2 Jun 2022, at 17:56, Ahmet Altay >> <mailto:al...@google.com>> wrote:
>>> 
>>> 
>>> 
>>> On Thu, Jun 2, 2022 at 8:53 AM Ryan Thompson >> <mailto:ryanthomp...@google.com>> wrote:
>>> I asked in the slack channel to restart jenkins. I'm looking through the 
>>> past messages to see if there's someone there I can tag.
>>> 
>>> Thank you. Unfortunately, I do not believe we have a single expert we can 
>>> tag.
>>>  
>>> Am I right in understanding this service is managed by someone in apache 
>>> and not us?
>>> 
>>> It is mixed. ASF Infra works with a vendor and they run a hosted jenkins 
>>> instance. We control the configuration of the instance, and the worker 
>>> nodes.
>>>  
>>> 
>>> On Thu, Jun 2, 2022 at 11:47 AM Ahmet Altay >> <mailto:al...@google.com>> wrote:
>>> /cc @Kenneth Knowles <mailto:k...@google.com>
>>> On Thu, Jun 2, 2022 at 8:44 AM Ahmet Altay >> <mailto:al...@google.com>> wrote:
>>> I do not have a great idea but googling about the error, similar errors 
>>> were resolved by restarting jenkins. We could try that. We may need to ask 
>>> infra. ( @Ryan Thompson <mailto:ryanthomp...@google.com> - could you please 
>>> ask infra to restart jenkins?)
>>> 
>>> I do not think this issue is related to any change in the source control. 
>>> The commit referenced in the first failed job is an unrelated doc change. 
>>> 
>>> 
>>> On Thu, Jun 2, 2022 at 7:57 AM Alexey Romanenko >> <mailto:aromanenko@gmail.com>> wrote:
>>> I created a jira for this (not sure if it’s P0, but P1 for sure):
>>> https://issues.apache.org/jira/browse/BEAM-14548 
>>> <https://issues.apache.org/jira/browse/BEAM-14548>
>>> 
>>> Could someone, who has more knowledge than me in Beam 
>>> infrastructure/Jenkins, take a look, please?
>>> 
>>> —
>>> Alexey
>>> 
>>>> On 31 May 2022, at 17:08, Alexey Romanenko >>> <mailto:aromanenko@gm

Re: Failing beam_SeedJob

2022-06-03 Thread Alexey Romanenko
Yes, it’s ok now. 
Thanks all!

Side question - is there a way to run a Jenkins job against a specific PR? I 
tried to specify different git refs as an input parameter but it didn’t work 
for me…

—
Alexey 

> On 3 Jun 2022, at 05:02, Ahmet Altay  wrote:
> 
> Seed job looks healthy again: 
> https://ci-beam.apache.org/job/beam_SeedJob/9762/ 
> <https://ci-beam.apache.org/job/beam_SeedJob/9762/> .
> 
> On Thu, Jun 2, 2022 at 10:24 AM Ryan Thompson  <mailto:ryanthomp...@google.com>> wrote:
> I've filed a jira to infra and asked on the slack channel. Hopefully someone 
> there has the power to restart the server.
> 
> https://issues.apache.org/jira/browse/INFRA-23335 
> <https://issues.apache.org/jira/browse/INFRA-23335>
> 
> On Thu, Jun 2, 2022 at 12:41 PM Kenneth Knowles  <mailto:k...@apache.org>> wrote:
> Commented on the Jira. Seems like it happens somewhat rarely, but is 
> sometimes resolved by a restart, and sometimes has to do with some version 
> mismatch issues. I did not find anything like a real root cause, just 
> trial-and-error fixes. I'm not certain what may have occurred around May 19 
> to this infrastructure. Hoping INFRA can help us sort it out.
> 
> Kenn
> 
> On Thu, Jun 2, 2022 at 9:15 AM Alexey Romanenko  <mailto:aromanenko@gmail.com>> wrote:
> Thanks Ahmet and Ryan for taking a look!
> 
> I agree that the referenced commits seem unrelated; that is why I was 
> puzzled by this.
> 
>> On 2 Jun 2022, at 17:56, Ahmet Altay > <mailto:al...@google.com>> wrote:
>> 
>> 
>> 
>> On Thu, Jun 2, 2022 at 8:53 AM Ryan Thompson > <mailto:ryanthomp...@google.com>> wrote:
>> I asked in the slack channel to restart jenkins. I'm looking through the 
>> past messages to see if there's someone there I can tag.
>> 
>> Thank you. Unfortunately, I do not believe we have a single expert we can 
>> tag.
>>  
>> Am I right in understanding this service is managed by someone in apache and 
>> not us?
>> 
>> It is mixed. ASF Infra works with a vendor and they run a hosted jenkins 
>> instance. We control the configuration of the instance, and the worker nodes.
>>  
>> 
>> On Thu, Jun 2, 2022 at 11:47 AM Ahmet Altay > <mailto:al...@google.com>> wrote:
>> /cc @Kenneth Knowles <mailto:k...@google.com>
>> On Thu, Jun 2, 2022 at 8:44 AM Ahmet Altay > <mailto:al...@google.com>> wrote:
>> I do not have a great idea but googling about the error, similar errors were 
>> resolved by restarting jenkins. We could try that. We may need to ask infra. 
>> ( @Ryan Thompson <mailto:ryanthomp...@google.com> - could you please ask 
>> infra to restart jenkins?)
>> 
>> I do not think this issue is related to any change in the source control. 
>> The commit referenced in the first failed job is an unrelated doc change. 
>> 
>> 
>> On Thu, Jun 2, 2022 at 7:57 AM Alexey Romanenko > <mailto:aromanenko@gmail.com>> wrote:
>> I created a jira for this (not sure if it’s P0, but P1 for sure):
>> https://issues.apache.org/jira/browse/BEAM-14548 
>> <https://issues.apache.org/jira/browse/BEAM-14548>
>> 
>> Could someone, who has more knowledge than me in Beam 
>> infrastructure/Jenkins, take a look, please?
>> 
>> —
>> Alexey
>> 
>>> On 31 May 2022, at 17:08, Alexey Romanenko >> <mailto:aromanenko@gmail.com>> wrote:
>>> 
>>> The first failed job is https://ci-beam.apache.org/job/beam_SeedJob/9696/ 
>>> <https://ci-beam.apache.org/job/beam_SeedJob/9696/>
>>> It fails with this error (which does not say much):
>>> 
>>> Processing DSL script .test-infra/jenkins/job_00_seed.groovy
>>> Processing DSL script .test-infra/jenkins/job_CancelStaleDataflowJobs.groovy
>>> Processing DSL script 
>>> .test-infra/jenkins/job_CleanUpPrebuiltSDKImages.groovy
>>> Processing DSL script .test-infra/jenkins/job_Dependency_Check.groovy
>>> ERROR: java.io.IOException: Failed to persist config.xml
>>> 
>>>  and I don’t see any recent changes for these files.
>>> 
>>>> On 31 May 2022, at 16:24, Alexey Romanenko >>> <mailto:aromanenko@gmail.com>> wrote:
>>>> 
>>>> Hi everyone,
>>>> 
>>>> Jenkins job `beam_SeedJob` has been failing since May 19th. The last 
>>>> successful build [2] was 12 days ago.
>>>> Does anyone know the reason for this?
>>>> 
>>>> —
>>>> Alexey
>>>> 
>>>> 
>>>> [1] https://ci-beam.apache.org/job/beam_SeedJob/ 
>>>> <https://ci-beam.apache.org/job/beam_SeedJob/>
>>>> [2] https://ci-beam.apache.org/job/beam_SeedJob/lastSuccessfulBuild/ 
>>>> <https://ci-beam.apache.org/job/beam_SeedJob/lastSuccessfulBuild/>
>>>>
>>>> 
>>> 
>> 
> 



Re: Jira -> GitHub Issues Migration (This Friday)

2022-06-02 Thread Alexey Romanenko
+1 That would be very helpful for mapping!

> On 2 Jun 2022, at 17:48, Ahmet Altay  wrote:
> 
> Is it possible to add comments on the JIRAs with a link to the new 
> corresponding github issue?
> 
> On Thu, Jun 2, 2022 at 8:47 AM Danny McCormick  > wrote:
> Thanks for the feedback, I agree it would be good to keep that option open - 
> I updated the tool to write those to a file when we create an issue. I'll 
> share that after the migration.
> 
> Thanks,
> Danny
> 
> On Wed, Jun 1, 2022 at 7:03 PM Brian Hulette  > wrote:
> Thanks Danny. Regarding links to GitHub issues, if we could at least save off 
> a record of jira <-> issue mappings we could look at adding the links later. 
> I think it would be nice to have those links so that anyone landing in a jira 
> through a search or an old link can quickly find the current ticket, but I 
> don't think that needs to block the migration.
> 
> On Wed, Jun 1, 2022 at 7:05 AM Danny McCormick  > wrote:
> Hey Brian,
> 
> 1. Right now, the plan is to (1) turn on the issues tab, (2) make the JIRA 
> read only, (3) run the migration tool. Since the migration tool won't be run 
> until after Jiras are read only, there shouldn't be issues with making sure 
> everything gets captured.
> 2. That current ordering does mean it's difficult to add a link to the newly 
> created Issue, and I hadn't built in that feature. With that said, I will ask 
> Infra if they're able to put up a banner redirecting people to GitHub for the 
> Beam project - that should hopefully minimize some of the issues - and I'll 
> also look into updating the tool to do that in case the banner isn't doable. 
> I'm also planning on doing a few passes to update our docs and code comments 
> from Jiras to issues once the migration is done.
> 
> Thanks,
> Danny
> 
> On Tue, May 31, 2022 at 8:09 PM Brian Hulette  > wrote:
> Thanks Danny, it's great to see this happening!
> 
> A couple of questions:
> - Is there something we can do to remind people creating a jira that they 
> should create a bug instead (e.g. a template)? If not I suppose we can just 
> re-run the migration tool a few times up until jira creation is disabled to 
> make sure everything is captured.
> - Will your migration tooling comment on the original jira with a link to the 
> new issue in GitHub?
> 
> Brian
> 
> On Tue, May 31, 2022 at 9:57 AM Robert Bradshaw  > wrote:
> Thanks for finally making this happen.
> 
> On Tue, May 31, 2022 at 7:18 AM Sachin Agarwal  > wrote:
> >
> > Thank you Danny! This will help us a lot, especially with new contributors. 
> > Thanks so much!
> >
> > On Tue, May 31, 2022 at 4:10 AM Danny McCormick  > > wrote:
> >>
> >> Hey folks, this is a reminder that we will be migrating from Jira to 
> >> GitHub Issues this Friday (6/4). A few key details to keep in mind:
> >>
> >> 1. All active Jiras will get automatically migrated and assigned over the 
> >> course of the weekend.
> >> 2. Starting Friday (once the Issues tab is open), please stop creating 
> >> Jiras and start creating Issues instead. You should also reference issues 
> >> in your PRs and commits instead of Jiras. The Jira creation flow will 
> >> eventually be disabled.
> >> 3. If you encounter any issues that can't be resolved by looking at the 
> >> doc updates, please let me know and/or follow up in this thread.
> >>
> >> I'm looking forward to seeing how Issues can minimize friction for new 
> >> contributors and I'm hopeful that this will be a smooth transition. If you 
> >> have any last minute concerns let me know. For more context, see the 
> >> original thread on this topic.
> >>
> >> Thanks,
> >> Danny



Re: Failing beam_SeedJob

2022-06-02 Thread Alexey Romanenko
Thanks Ahmet and Ryan for taking a look!

I agree that the referenced commits seem unrelated; that is why I was 
puzzled by this.

> On 2 Jun 2022, at 17:56, Ahmet Altay  wrote:
> 
> 
> 
> On Thu, Jun 2, 2022 at 8:53 AM Ryan Thompson  <mailto:ryanthomp...@google.com>> wrote:
> I asked in the slack channel to restart jenkins. I'm looking through the past 
> messages to see if there's someone there I can tag.
> 
> Thank you. Unfortunately, I do not believe we have a single expert we can tag.
>  
> Am I right in understanding this service is managed by someone in apache and 
> not us?
> 
> It is mixed. ASF Infra works with a vendor and they run a hosted jenkins 
> instance. We control the configuration of the instance, and the worker nodes.
>  
> 
> On Thu, Jun 2, 2022 at 11:47 AM Ahmet Altay  <mailto:al...@google.com>> wrote:
> /cc @Kenneth Knowles <mailto:k...@google.com>
> On Thu, Jun 2, 2022 at 8:44 AM Ahmet Altay  <mailto:al...@google.com>> wrote:
> I do not have a great idea but googling about the error, similar errors were 
> resolved by restarting jenkins. We could try that. We may need to ask infra. 
> ( @Ryan Thompson <mailto:ryanthomp...@google.com> - could you please ask 
> infra to restart jenkins?)
> 
> I do not think this issue is related to any change in the source control. The 
> commit referenced in the first failed job is an unrelated doc change. 
> 
> 
> On Thu, Jun 2, 2022 at 7:57 AM Alexey Romanenko  <mailto:aromanenko@gmail.com>> wrote:
> I created a jira for this (not sure if it’s P0, but P1 for sure):
> https://issues.apache.org/jira/browse/BEAM-14548 
> <https://issues.apache.org/jira/browse/BEAM-14548>
> 
> Could someone, who has more knowledge than me in Beam infrastructure/Jenkins, 
> take a look, please?
> 
> —
> Alexey
> 
>> On 31 May 2022, at 17:08, Alexey Romanenko > <mailto:aromanenko@gmail.com>> wrote:
>> 
>> The first failed job is https://ci-beam.apache.org/job/beam_SeedJob/9696/ 
>> <https://ci-beam.apache.org/job/beam_SeedJob/9696/>
>> It fails with this error (which does not say much):
>> 
>> Processing DSL script .test-infra/jenkins/job_00_seed.groovy
>> Processing DSL script .test-infra/jenkins/job_CancelStaleDataflowJobs.groovy
>> Processing DSL script .test-infra/jenkins/job_CleanUpPrebuiltSDKImages.groovy
>> Processing DSL script .test-infra/jenkins/job_Dependency_Check.groovy
>> ERROR: java.io.IOException: Failed to persist config.xml
>> 
>>  and I don’t see any recent changes for these files.
>> 
>>> On 31 May 2022, at 16:24, Alexey Romanenko >> <mailto:aromanenko@gmail.com>> wrote:
>>> 
>>> Hi everyone,
>>> 
>>> Jenkins job `beam_SeedJob` has been failing since May 19th. The last 
>>> successful build [2] was 12 days ago.
>>> Does anyone know the reason for this?
>>> 
>>> —
>>> Alexey
>>> 
>>> 
>>> [1] https://ci-beam.apache.org/job/beam_SeedJob/ 
>>> <https://ci-beam.apache.org/job/beam_SeedJob/>
>>> [2] https://ci-beam.apache.org/job/beam_SeedJob/lastSuccessfulBuild/ 
>>> <https://ci-beam.apache.org/job/beam_SeedJob/lastSuccessfulBuild/>
>>> 
>>> 
>> 
> 



Re: Failing beam_SeedJob

2022-06-02 Thread Alexey Romanenko
I created a jira for this (not sure if it’s P0, but P1 for sure):
https://issues.apache.org/jira/browse/BEAM-14548 
<https://issues.apache.org/jira/browse/BEAM-14548>

Could someone, who has more knowledge than me in Beam infrastructure/Jenkins, 
take a look, please?

—
Alexey

> On 31 May 2022, at 17:08, Alexey Romanenko  wrote:
> 
> The first failed build is https://ci-beam.apache.org/job/beam_SeedJob/9696/
> It fails with this error (which doesn’t say much):
> 
> Processing DSL script .test-infra/jenkins/job_00_seed.groovy
> Processing DSL script .test-infra/jenkins/job_CancelStaleDataflowJobs.groovy
> Processing DSL script .test-infra/jenkins/job_CleanUpPrebuiltSDKImages.groovy
> Processing DSL script .test-infra/jenkins/job_Dependency_Check.groovy
> ERROR: java.io.IOException: Failed to persist config.xml
> 
> and I don’t see any recent changes to these files.
> 
>> On 31 May 2022, at 16:24, Alexey Romanenko wrote:
>> 
>> Hi everyone,
>> 
>> The Jenkins job `beam_SeedJob` [1] has been failing since May 19th. The last
>> successful build [2] was 12 days ago.
>> Does anyone know the reason for this?
>> 
>> —
>> Alexey
>> 
>> 
>> [1] https://ci-beam.apache.org/job/beam_SeedJob/
>> [2] https://ci-beam.apache.org/job/beam_SeedJob/lastSuccessfulBuild/
>>  
>> 
> 



Re: [CDAP IO] Failing R JavaPrecommit checks

2022-05-31 Thread Alexey Romanenko
I guess it’s related to adding a hard-coded version of the scala-library
dependency. Try making it “compileOnly” (as we do for the Spark runner, for
example). Also, there is no need to escalate it to BeamModulePlugin.groovy;
please keep it local to your module.
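
For illustration, a minimal sketch of what this could look like in the
module's own build.gradle. The Scala version below is a placeholder
assumption, not taken from the PR; use whichever version the module actually
compiles against.

```groovy
// build.gradle of the CDAP IO module only (not BeamModulePlugin.groovy).
dependencies {
  // compileOnly: visible at compile time, but not added to the runtime
  // classpath and not passed on to modules that depend on this one.
  // NOTE: the version is a hypothetical placeholder.
  compileOnly "org.scala-lang:scala-library:2.12.15"
}
```

With this scope, the runtime Scala version is supplied by whatever provides it
at execution time, so the declaration should not force a version onto other
modules' test classpaths.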

—
Alexey


> On 31 May 2022, at 23:17, Elizaveta Lomteva wrote:
> 
> Hi everyone,
> 
> In the “Add ReceiverBuilder for SparkReceiverIO” PR [1] for the CDAP IO
> connector project, the Java Precommit checks failed with errors unrelated to
> the changes in the PR:
> 
> 18:13:23 1: Task failed with an exception.
> 18:13:23 ---
> 18:13:23 * What went wrong:
> 18:13:23 Execution failed for task ':sdks:java:io:amazon-web-services:test'.
> 
> 18:13:23 2: Task failed with an exception.
> 18:13:23 ---
> 18:13:23 * What went wrong:
> 18:13:23 Execution failed for task ':sdks:java:io:amazon-web-services2:test'.
> 
> 18:13:23 3: Task failed with an exception.
> 18:13:23 ---
> 18:13:23 * What went wrong:
> 18:13:23 Execution failed for task ':runners:spark:3:test'.
> 
> 18:13:23 4: Task failed with an exception.
> 18:13:23 ---
> 18:13:23 * What went wrong:
> 18:13:23 Execution failed for task ':runners:flink:1.12:test'.
> 
> 18:13:23 5: Task failed with an exception.
> 18:13:23 ---
> 18:13:23 * What went wrong:
> 18:13:23 Execution failed for task ':runners:flink:1.13:test'.
> 
> 
> Does anyone know why this happens and how to fix it?
> 
> Regards,
> Elizaveta
> 
> [1] [BEAM-14101] [CdapIO] Add ReceiverBuilder for SparkReceiverIO PR 
> 


Re: [ANNOUNCE] New committer: Ke Wu

2022-05-31 Thread Alexey Romanenko
Congrats, Ke!

—
Alexey

> On 31 May 2022, at 18:09, Xinyu Liu wrote:
> 
> Congrats!
> 
> Xinyu
> 
> On Mon, May 30, 2022 at 7:46 AM Evan Galpin wrote:
> Congrats Ke!
> 
> - Evan
> 
> 
> On Mon, May 30, 2022 at 4:11 AM Jan Lukavský wrote:
> Congrats Ke!
> 
>  Jan
> 
> On 5/29/22 04:12, Yi Pan wrote:
>> Congrats, Ke!
>> 
>> -Yi
>> 
>> On Sat, May 28, 2022 at 6:57 PM Robert Burke wrote:
>> Congratulations!
>> Another place that runs the Go SDK ;)
>> 
>> On Fri, May 27, 2022, 3:49 PM Ahmet Altay wrote:
>> Hi all,
>> 
>> Please join me and the rest of the Beam PMC in welcoming a new committer: Ke 
>> Wu (kw2542@)
>> 
>> Ke has been contributing to Beam since 2020. Ke's contributions are mostly
>> focused on the SamzaRunner; as a result of Ke's efforts, Beam has a fully
>> featured, portable, supported SamzaRunner with happy users!
>> 
>> Considering these contributions, the Beam PMC trusts Ke with the
>> responsibilities of a Beam committer [1].
>> 
>> Thank you Ke!
>> 
>> Ahmet
>> 
>> [1] 
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>  
>> 



Re: Failing beam_SeedJob

2022-05-31 Thread Alexey Romanenko
The first failed build is https://ci-beam.apache.org/job/beam_SeedJob/9696/
It fails with this error (which doesn’t say much):

Processing DSL script .test-infra/jenkins/job_00_seed.groovy
Processing DSL script .test-infra/jenkins/job_CancelStaleDataflowJobs.groovy
Processing DSL script .test-infra/jenkins/job_CleanUpPrebuiltSDKImages.groovy
Processing DSL script .test-infra/jenkins/job_Dependency_Check.groovy
ERROR: java.io.IOException: Failed to persist config.xml

and I don’t see any recent changes to these files.

> On 31 May 2022, at 16:24, Alexey Romanenko wrote:
> 
> Hi everyone,
> 
> The Jenkins job `beam_SeedJob` [1] has been failing since May 19th. The last
> successful build [2] was 12 days ago.
> Does anyone know the reason for this?
> 
> —
> Alexey
> 
> 
> [1] https://ci-beam.apache.org/job/beam_SeedJob/
> [2] https://ci-beam.apache.org/job/beam_SeedJob/lastSuccessfulBuild/
>   
> 


