Re: [VOTE] Release 2.52.0, release candidate #5

2023-11-14 Thread Valentyn Tymofieiev via dev
+1 (binding).

Tested the Python SDK on a batch and a streaming pipeline. Verified that
the memory leak [1] no longer occurs and that the pyarrow hotfix is applied.
Sent an update to CHANGES.MD to call out both.

Thanks for doing the release and for your patience with all the RCs.

[1] https://github.com/apache/beam/issues/28246
[2] https://github.com/apache/beam/pull/29435

On Tue, Nov 14, 2023 at 1:27 PM Bruno Volpato via dev 
wrote:

> +1 (non-binding).
>
> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
> (Java SDK 11, Dataflow runner).
>
> Thanks Danny!
>
> On Mon, Nov 13, 2023 at 6:07 PM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> Hi everyone,
>> Please review and vote on the release candidate #5 for the version
>> 2.52.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>> count towards the final vote, but votes from all community members are
>> encouraged and helpful for finding regressions; you can either test your
>> own use cases or use cases from the validation sheet [10].
>>
>> The complete staging area is available for your review, which includes:
>>
>>    - GitHub Release notes [1]
>>    - the official Apache source release to be deployed to dist.apache.org
>>      [2], which is signed with the key with fingerprint D20316F712213422 [3]
>>    - all artifacts to be deployed to the Maven Central Repository [4]
>>    - source code tag "v2.52.0-RC5" [5]
>>    - website pull request listing the release [6], the blog post [6],
>>      and publishing the API reference manual [7]
>>    - Python artifacts are deployed along with the source release to the
>>      dist.apache.org [2] and PyPI [8].
>>    - Go artifacts and documentation are available at pkg.go.dev [9]
>>    - Validation sheet with a tab for 2.52.0 release to help with
>>      validation [10]
>>    - Docker images published to Docker Hub [11]
>>    - PR to run tests against release branch [12]
>>
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> For guidelines on how to try the release in your projects, check out our
>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>>
>> Thanks,
>> Danny
>>
>> [1] https://github.com/apache/beam/milestone/16
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1363/
>> [5] https://github.com/apache/beam/tree/v2.52.0-RC5
>> [6] https://github.com/apache/beam/pull/29331
>> [7] https://github.com/apache/beam-site/pull/655
>> [8] https://pypi.org/project/apache-beam/2.52.0rc5/
>> [9]
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC5/go/pkg/beam
>> [10]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
>> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>> [12] https://github.com/apache/beam/pull/29418
>>
>
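The adoption rule quoted above (majority approval with at least 3 PMC affirmative votes) can be sketched as a small tally. This is only an illustration: the `Vote` class and `vote_passes` function are hypothetical names, and reading "majority approval" as "more binding +1 votes than binding -1 votes" is an assumption, not something stated in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One vote on a release candidate (hypothetical structure)."""
    voter: str
    value: int      # +1 to approve, -1 to reject
    binding: bool   # True when cast by a PMC member


def vote_passes(votes, min_binding_plus_ones=3):
    """Return True when the binding votes satisfy the assumed adoption rule.

    Assumption: non-binding votes are informative only, and "majority
    approval" means binding +1 votes outnumber binding -1 votes.
    """
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value == 1)
    minus = sum(1 for v in binding if v.value == -1)
    return plus >= min_binding_plus_ones and plus > minus


if __name__ == "__main__":
    # Placeholder voters, not the actual tally for this release.
    sample = [
        Vote("pmc-a", 1, True),
        Vote("pmc-b", 1, True),
        Vote("pmc-c", 1, True),
        Vote("contributor-d", 1, False),
    ]
    print(vote_passes(sample))
```

Non-binding votes are ignored by the tally here, which matches the note that they are encouraged mainly for finding regressions.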


Re: [VOTE] Release 2.52.0, release candidate #5

2023-11-14 Thread Bruno Volpato via dev
+1 (non-binding).

Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates (Java
SDK 11, Dataflow runner).

Thanks Danny!

On Mon, Nov 13, 2023 at 6:07 PM Danny McCormick via dev 
wrote:

> Hi everyone,
> Please review and vote on the release candidate #5 for the version 2.52.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found. Only PMC member votes will
> count towards the final vote, but votes from all community members are
> encouraged and helpful for finding regressions; you can either test your
> own use cases or use cases from the validation sheet [10].
>
> The complete staging area is available for your review, which includes:
>
>    - GitHub Release notes [1]
>    - the official Apache source release to be deployed to dist.apache.org [2],
>      which is signed with the key with fingerprint D20316F712213422 [3]
>    - all artifacts to be deployed to the Maven Central Repository [4]
>    - source code tag "v2.52.0-RC5" [5]
>    - website pull request listing the release [6], the blog post [6], and
>      publishing the API reference manual [7]
>    - Python artifacts are deployed along with the source release to the
>      dist.apache.org [2] and PyPI [8].
>    - Go artifacts and documentation are available at pkg.go.dev [9]
>    - Validation sheet with a tab for 2.52.0 release to help with
>      validation [10]
>    - Docker images published to Docker Hub [11]
>    - PR to run tests against release branch [12]
>
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> For guidelines on how to try the release in your projects, check out our
> blog post at https://beam.apache.org/blog/validate-beam-release/.
>
> Thanks,
> Danny
>
> [1] https://github.com/apache/beam/milestone/16
> [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1363/
> [5] https://github.com/apache/beam/tree/v2.52.0-RC5
> [6] https://github.com/apache/beam/pull/29331
> [7] https://github.com/apache/beam-site/pull/655
> [8] https://pypi.org/project/apache-beam/2.52.0rc5/
> [9]
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC5/go/pkg/beam
> [10]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
> [12] https://github.com/apache/beam/pull/29418
>


Re: Upgrading Avro dependencies

2023-11-14 Thread Alexey Romanenko
Thanks! Please let me know if you need any help with this.

—
Alexey

> On 14 Nov 2023, at 17:52, John Casey wrote:
> 
> The vulnerability said to upgrade to 1.11.3, so I think that would be my 
> starting point.
> 
> 
> On Mon, Nov 13, 2023 at 12:23 PM Alexey Romanenko wrote:
>> 
>> 
>>> On 10 Nov 2023, at 19:23, John Casey wrote:
>>> 
>>> I guess I'm a bit confused as to why specifically generateTestAvroJava 
>>> seems to use the wrong version. I see our version specific generated code, 
>>> but this action appears to be inherited from the plugin, and is configured 
>>> with whichever avro version is provided. Given that I tried to just change 
>>> to 1.11.3, I'm confused as to why it's generating invalid java files for the
>>> provided avro version.
>>> 
>>> Unlike the classes generated out of the JavaExec you referenced, this 
>>> appears to only generate one version of the files.
>> 
>> It was supposed to generate files with a specific Avro version every time to 
>> run the same tests against this specific Avro version.
>> 
>>> It may be that we don't need this action, but it still seems to run, as we 
>>> depend on it in the applyAvroNature() action.
>> 
>> I have started to wonder whether we really still need this action.
>> 
>>> We could remove this entirely. The java exec only generates versions for 
>>> pre-configured test versions anyways
>> 
>> Right. The question is in how many places in Beam we need to generate these
>> files, and which version(s) of Avro to use.
>> 
>> —
>> Alexey
>> 
>>> 
>>> On Fri, Nov 10, 2023 at 12:53 PM Alexey Romanenko wrote:
>>>> Hi John,
>>>>
>>>> This old Avro version in Beam is a very long story. Briefly, since it was
>>>> initially tightly integrated into the Java SDK “core” module, it was not
>>>> possible to upgrade the Avro version without breaking changes for users
>>>> (because of some incompatible Avro changes, as you have noticed before).
>>>> So, we decided to extract Avro-related classes from Beam “core” to a
>>>> dedicated Avro extension [2] that supports and actually is tested with
>>>> different Avro versions. More details on this work are here [1].
>>>>
>>>> Regarding auto-generated classes: initially, we used a Gradle plugin for
>>>> that, but it is limited to only one Avro version per instance of the
>>>> plugin, so it was not possible to generate these classes with different
>>>> Avro versions. So, we do this with a special Gradle task (“JavaExec”)
>>>> that executes “org.apache.avro.tool.Main” and generates Avro classes for
>>>> every tested Avro version [3].
>>>>
>>>> We still keep the old Avro version 1.8.2 as the default dependency
>>>> version, but it will be overridden if users have a more recent one as a
>>>> project dependency in their classpath.
>>>>
>>>> I think we need to completely remove the Avro Gradle plugin (using the
>>>> “JavaExec” task to generate Avro classes with a provided Avro version
>>>> instead) and update the default Avro version to a more recent one, since
>>>> it is no longer part of Java “core”.
>>>>
>>>> Any thoughts?
>>>>
>>>> —
>>>> Alexey
>>>>
>>>>
>>>> [1] https://github.com/apache/beam/issues/24292
>>>> [2] https://github.com/apache/beam/tree/master/sdks/java/extensions/avro
>>>> [3]
>>>> https://github.com/apache/beam/blob/c713425e1ac2cdc3ec2ec264c9bf61f7356856bd/sdks/java/extensions/avro/build.gradle#L135
>>>>
>>>>
>>>>
>>>>> On 10 Nov 2023, at 18:05, John Casey via dev wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> There was a CVE detected in Avro 1.8.2 (CVE-2023-39410), so I'm trying to
>>>>> upgrade to avro 1.11.3.
>>>>>
>>>>> Unfortunately, it seems that our auto-generated Avro test classes aren't
>>>>> being generated properly with this new version. I've updated our avro
>>>>> generation plugin as well, but for whatever reason, it seems that the
>>>>> generated AvroTest file is being generated with references to classes
>>>>> that did exist in 1.8.2, but no longer exist in 1.11.3.
>>>>>
>>>>> It seems like our autogeneration is being run with the wrong avro
>>>>> version, but I can't seem to find where that would be configured.
>>>>>
>>>>> Here is the PR with my changes so far:
>>>>> https://github.com/apache/beam/pull/29390
>>>>>
>>>>> Is anyone familiar with what might be misconfigured here?
>>>>>
>>>>> John
>>>>
>>



Re: Hiding logging for beam playground examples

2023-11-14 Thread Robert Bradshaw via dev
+1 to at least setting the log level to higher than info. Some runner
logging (e.g. job started/done) may be useful.
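
A minimal sketch of the suggestion above, assuming the playground examples log through Python's standard `logging` module and that the runner messages reach the root logger. The helper name `quiet_runner_logs` is hypothetical and not part of the Beam API.

```python
import logging


def quiet_runner_logs(level=logging.WARNING):
    """Raise the root logger threshold so INFO-level output is hidden.

    Hypothetical helper for illustration; the default level is an assumption.
    """
    root = logging.getLogger()
    root.setLevel(level)
    return root


if __name__ == "__main__":
    quiet_runner_logs()
    logging.info("runner detail")      # suppressed: below WARNING
    logging.warning("job finished")    # still emitted
```

With this approach, messages worth keeping (such as job started/done) would only remain visible if they are logged at WARNING or above, so a per-logger threshold may be a better fit in practice.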

On Tue, Nov 14, 2023 at 9:37 AM Joey Tran wrote:
>
> Hi all,
>
> I just ran a workshop to demo Beam for people at my company, and there was a
> bit of confusion about whether the Beam Python playground examples were even
> working. It turned out they were just confused by all the runner logging
> that is output.
>
> Is this worth keeping? It seems like it'd be a common source of confusion for
> new users.
>
> Cheers,
> Joey


Hiding logging for beam playground examples

2023-11-14 Thread Joey Tran
Hi all,

I just ran a workshop to demo Beam for people at my company, and there was a
bit of confusion about whether the Beam Python playground examples were even
working. It turned out they were just confused by all the runner logging
that is output.

Is this worth keeping? It seems like it'd be a common source of confusion
for new users.

Cheers,
Joey


Re: The Current State of Beam Python Type Hinting

2023-11-14 Thread Robert Bradshaw via dev
Thanks for writing this up! Added some comments to the doc itself.

On Mon, Nov 13, 2023 at 11:01 PM Johanna Öjeling via dev <
dev@beam.apache.org> wrote:

> Thanks - well written! Interesting with the Any type, I learned something
> new. Added a comment.
>
> Johanna
>
> On Mon, Nov 13, 2023 at 6:02 PM Jack McCluskey via dev <
> dev@beam.apache.org> wrote:
>
>> Hey everyone,
>>
>> I put together a small doc explaining how Beam Python type hinting
>> works + where the module needs to go in the future with changes to Python
>> itself. This is over at
>> https://s.apache.org/beam-python-type-hinting-overview and I'll be
>> putting it into a few places for discoverability as well.
>>
>> Thanks,
>>
>> Jack McCluskey
>>
>> --
>>
>>
>> Jack McCluskey
>> SWE - DataPLS PLAT/ Dataflow ML
>> RDU
>> jrmcclus...@google.com
>>
>>
>>


Re: Upgrading Avro dependencies

2023-11-14 Thread John Casey via dev
The vulnerability said to upgrade to 1.11.3, so I think that would be my
starting point.


On Mon, Nov 13, 2023 at 12:23 PM Alexey Romanenko 
wrote:

>
>
> On 10 Nov 2023, at 19:23, John Casey wrote:
>
> I guess I'm a bit confused as to why specifically generateTestAvroJava
> seems to use the wrong version. I see our version specific generated code,
> but this action appears to be inherited from the plugin, and is configured
> with whichever avro version is provided. Given that I tried to just change
> to 1.11.3, I'm confused as to why it's generating invalid java files for the
> provided avro version.
>
> Unlike the classes generated out of the JavaExec you referenced, this
> appears to only generate one version of the files.
>
>
> It was supposed to generate files with a specific Avro version every time
> to run the same tests against this specific Avro version.
>
> It may be that we don't need this action, but it still seems to run, as we
> depend on it in the applyAvroNature() action.
>
>
> I have started to wonder whether we really still need this action.
>
> We could remove this entirely. The java exec only generates versions for
> pre-configured test versions anyways
>
>
> Right. The question is in how many places in Beam we need to generate these
> files, and which version(s) of Avro to use.
>
> —
> Alexey
>
>
> On Fri, Nov 10, 2023 at 12:53 PM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> Hi John,
>>
>> This old Avro version in Beam is a very long story. Briefly, since it was
>> initially tightly integrated into the Java SDK “core” module, it was not
>> possible to upgrade the Avro version without breaking changes for users
>> (because of some incompatible Avro changes, as you have noticed before).
>> So, we decided to extract Avro-related classes from Beam “core” to a
>> dedicated Avro extension [2] that supports and actually is tested with
>> different Avro versions. More details on this work are here [1].
>>
>> Regarding auto-generated classes: initially, we used a Gradle plugin for
>> that, but it is limited to only one Avro version per instance of the
>> plugin, so it was not possible to generate these classes with different
>> Avro versions. So, we do this with a special Gradle task (“JavaExec”) that
>> executes “org.apache.avro.tool.Main” and generates Avro classes for every
>> tested Avro version [3].
>>
>> We still keep the old Avro version 1.8.2 as the default dependency version,
>> but it will be overridden if users have a more recent one as a project
>> dependency in their classpath.
>>
>> I think we need to completely remove the Avro Gradle plugin (using the
>> “JavaExec” task to generate Avro classes with a provided Avro version
>> instead) and update the default Avro version to a more recent one, since
>> it is no longer part of Java “core”.
>>
>> Any thoughts?
>>
>> —
>> Alexey
>>
>>
>> [1] https://github.com/apache/beam/issues/24292
>> [2] https://github.com/apache/beam/tree/master/sdks/java/extensions/avro
>> [3]
>> https://github.com/apache/beam/blob/c713425e1ac2cdc3ec2ec264c9bf61f7356856bd/sdks/java/extensions/avro/build.gradle#L135
>>
>>
>>
>> On 10 Nov 2023, at 18:05, John Casey via dev wrote:
>>
>> Hi All,
>>
>> There was a CVE detected in Avro 1.8.2 (CVE-2023-39410), so I'm trying
>> to upgrade to avro 1.11.3.
>>
>> Unfortunately, it seems that our auto-generated Avro test classes aren't
>> being generated properly with this new version. I've updated our avro
>> generation plugin as well, but for whatever reason, it seems that the
>> generated AvroTest file is being generated with references to classes that
>> did exist in 1.8.2, but no longer exist in 1.11.3.
>>
>> It seems like our autogeneration is being run with the wrong avro
>> version, but I can't seem to find where that would be configured.
>>
>> Here is the PR with my changes so far:
>> https://github.com/apache/beam/pull/29390
>>
>> Is anyone familiar with what might be misconfigured here?
>>
>> John
>>
>>
>>
>


Beam High Priority Issue Report (47)

2023-11-14 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/29413 [Bug]: Can not use Avro over 1.8.2 
with Beam 2.52.0
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness 
doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/29076 [Failing Test]: Python ARM 
PostCommit failing after #28385
https://github.com/apache/beam/issues/29022 [Failing Test]: Python Github 
actions tests are failing due to update of pip 
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader 
provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28715 [Bug]: Python WriteToBigtable get 
stuck for large jobs due to client dead lock
https://github.com/apache/beam/issues/28410 Support new versions of pyarrow in 
apache-beam
https://github.com/apache/beam/issues/28383 [Failing Test]: 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28339 Fix failing 
"beam_PostCommit_XVR_GoUsingJava_Dataflow" job
https://github.com/apache/beam/issues/28326 Bug: 
apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/28142 [Bug]: [Go SDK] Memory seems to be 
leaking on 2.49.0 with Dataflow
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not 
working when using CreateDisposition.CREATE_IF_NEEDED 
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. 
PeriodicImpulse) running in Flink and polling using tracker.defer_remainder 
have checkpoint size growing indefinitely 
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use 
applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with 
inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: 
bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data