Re: [VOTE] Release 2.41.0, release candidate #2

2022-08-19 Thread Ritesh Ghorse via dev
+1 (non-binding) validated Go SDK QuickStart on Direct and Dataflow Runner

On Fri, Aug 19, 2022 at 11:23 PM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

> +1 (binding)
>
> Did the same validations as RC1.
>
> Thanks,
> Cham
>
> On Fri, Aug 19, 2022 at 4:36 AM Alexey Romanenko wrote:
>
>> +1 (binding)
>>
>> I tested it with  https://github.com/Talend/beam-samples/ - no
>> byte_buddy issue for now.
>> (Java SDK v8 & v11, Spark 3 runner).
>>
>> ---
>> Alexey
>>
>> On 19 Aug 2022, at 10:40, Jan Lukavský  wrote:
>>
>> +1 (non-binding)
>>
>> Validated Java SDK Pipelines on Flink Runner.
>>
>>  Jan
>> On 8/18/22 22:31, Kiley Sok via dev wrote:
>>
>> Hi everyone,
>> Please review and vote on the release candidate #2 for the version
>> 2.41.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if no issues are found.
>>
>> The complete staging area is available for your review, which includes:
>> * GitHub Release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org [2],
>> which is signed with the key with fingerprint
>> 4D5731CC0AA38097D091EB091E7B28884452AE5D [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.41.0-RC2" [5],
>> * website pull request listing the release [6], the blog post [6], and
>> publishing the API reference manual [7].
>> * Java artifacts were built with Gradle 7.4 and OpenJDK/Oracle JDK
>> 1.8.0_312.
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2] and PyPI[8].
>> * Validation sheet with a tab for 2.41.0 release to help with validation
>> [9].
>> * Docker images published to Docker Hub [10].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> For guidelines on how to try the release in your projects, check out our
>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>>
>> Thanks,
>> Release Manager
>>
>> [1] https://github.com/apache/beam/milestone/3
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.41.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1283/
>> [5] https://github.com/apache/beam/tree/v2.41.0-RC2
>> [6] https://github.com/apache/beam/pull/22706
>> [7] https://github.com/apache/beam-site/pull/633
>> [8] https://pypi.org/project/apache-beam/2.41.0rc2/
>> [9]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=331459080
>> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>
>>
>>
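
The adoption rule quoted above (majority approval, with at least 3 PMC affirmative votes) can be sketched as a small tally. This is only an illustration: the Vote class and vote_passes function are hypothetical names, and the sketch assumes that only binding (PMC) votes count toward the outcome.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1 to approve, -1 to reject
    binding: bool  # True when cast by a PMC member


def vote_passes(votes: Iterable[Vote], min_binding_approvals: int = 3) -> bool:
    """Assumed rule: count binding votes only; need >= 3 approvals and a majority."""
    binding = [v for v in votes if v.binding]
    approvals = sum(1 for v in binding if v.value > 0)
    rejections = sum(1 for v in binding if v.value < 0)
    return approvals >= min_binding_approvals and approvals > rejections


# Votes as they appear in this digest.
thread_votes = [
    Vote("Ritesh Ghorse", +1, binding=False),
    Vote("Chamikara Jayalath", +1, binding=True),
    Vote("Alexey Romanenko", +1, binding=True),
    Vote("Jan Lukavský", +1, binding=False),
]
passes_so_far = vote_passes(thread_votes)
```

With the four votes shown in this digest, only two are binding, so under these assumptions the threshold of three PMC approvals has not yet been reached.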


Re: [VOTE] Release 2.41.0, release candidate #2

2022-08-19 Thread Chamikara Jayalath via dev
+1 (binding)

Did the same validations as RC1.

Thanks,
Cham

On Fri, Aug 19, 2022 at 4:36 AM Alexey Romanenko wrote:

> +1 (binding)
>
> I tested it with  https://github.com/Talend/beam-samples/ - no byte_buddy
> issue for now.
> (Java SDK v8 & v11, Spark 3 runner).
>
> ---
> Alexey
>
> On 19 Aug 2022, at 10:40, Jan Lukavský  wrote:
>
> +1 (non-binding)
>
> Validated Java SDK Pipelines on Flink Runner.
>
>  Jan
> On 8/18/22 22:31, Kiley Sok via dev wrote:
>
> Hi everyone,
> Please review and vote on the release candidate #2 for the version 2.41.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found.
>
> The complete staging area is available for your review, which includes:
> * GitHub Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2],
> which is signed with the key with fingerprint
> 4D5731CC0AA38097D091EB091E7B28884452AE5D [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.41.0-RC2" [5],
> * website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle 7.4 and OpenJDK/Oracle JDK
> 1.8.0_312.
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI[8].
> * Validation sheet with a tab for 2.41.0 release to help with validation
> [9].
> * Docker images published to Docker Hub [10].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> For guidelines on how to try the release in your projects, check out our
> blog post at https://beam.apache.org/blog/validate-beam-release/.
>
> Thanks,
> Release Manager
>
> [1] https://github.com/apache/beam/milestone/3
> [2] https://dist.apache.org/repos/dist/dev/beam/2.41.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1283/
> [5] https://github.com/apache/beam/tree/v2.41.0-RC2
> [6] https://github.com/apache/beam/pull/22706
> [7] https://github.com/apache/beam-site/pull/633
> [8] https://pypi.org/project/apache-beam/2.41.0rc2/
> [9]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=331459080
> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>
>
>


Re: Easy Multi-language via a SchemaTransform-aware Expansion Service

2022-08-19 Thread Chamikara Jayalath via dev
Hi All,

Thanks for the comments so far. Seems like we generally agree on this
proposal.

Please see https://github.com/apache/beam/pull/22802 for a prototype
implementation that adds the following.

* Support for dynamically discovering and registering SchemaTransforms in
the Java expansion service.
* Support for dynamically discovering registered SchemaTransforms from the
Python side.
* Support for using SchemaTransforms in Python pipelines.

Feel free to add more comments to the doc and/or the PR.

Thanks,
Cham

On Mon, Aug 8, 2022 at 9:34 PM Chamikara Jayalath wrote:

> I think the *DiscoverSchemaTransform()* RPC introduced in this proposal
> and the ability to easily deploy/use available *SchemaTransforms* using
> an expansion service essentially provide the tooling necessary for
> implementing such a service. Such a service could even start up expansion
> services to discover/list transforms available in given artifacts (for
> example, jar files).
>
> Thanks,
> Cham
>
> On Mon, Aug 8, 2022 at 3:48 PM Byron Ellis  wrote:
>
>> I like that idea, sort of like Kafka’s Schema Service but for transforms?
>>
>> On Mon, Aug 8, 2022 at 2:45 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> This is a great idea. I would like to approach this from the
>>> perspective of making it easy to provide a catalog of well-defined
>>> transforms for use in expansion services from typical SDKs and also
>>> elsewhere (e.g. for documentation purposes, GUIs, etc.) Ideally
>>> everything about what a transform is (its config, documentation,
>>> expectations on inputs, etc.) can be specified programmatically in a
>>> way that's much easier to both author and consume than it is now.
>>>
>>> On Thu, Aug 4, 2022 at 6:51 PM Chamikara Jayalath via dev
>>>  wrote:
>>> >
>>> > Hi All,
>>> >
>>> > I believe we can make the multi-language pipelines offering [1] much
>>> > easier to use by updating the expansion service to be fully aware of
>>> > SchemaTransforms. Additionally this will make it easy to
>>> > register/discover/use transforms defined in one SDK from all other SDKs.
>>> > Specifically we could add the following features.
>>> >
>>> > * Expansion service can be used to easily initialize and expand
>>> > transforms without need for additional code.
>>> > * Expansion service can be used to easily discover already registered
>>> > transforms.
>>> > * Pipeline SDKs can generate user-friendly stub-APIs based on transforms
>>> > registered with an expansion service, eliminating the need to develop
>>> > language-specific wrappers.
>>> >
>>> > Please see here for my proposal:
>>> > https://s.apache.org/easy-multi-language
>>> >
>>> > Lemme know if you have any comments/questions/suggestions :)
>>> >
>>> > Thanks,
>>> > Cham
>>> >
>>> > [1]
>>> > https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines
>>> >
>>>
>>
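
The register/discover/stub idea discussed in this thread can be illustrated with a toy, in-memory model. This is not the Beam expansion service API and not the code in the linked PR; every name below (ToySchemaTransform, ToyExpansionService, make_stub, "toy:multiply:v1") is hypothetical, and a real service would exchange schemas over RPC rather than Python dicts.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ToySchemaTransform:
    """A transform described only by an identifier and a config schema."""
    identifier: str
    config_schema: Dict[str, type]  # config field name -> expected Python type
    build: Callable[..., Callable[[list], list]]


@dataclass
class ToyExpansionService:
    """Stand-in for a service that knows which transforms are registered."""
    _registry: Dict[str, ToySchemaTransform] = field(default_factory=dict)

    def register(self, transform: ToySchemaTransform) -> None:
        self._registry[transform.identifier] = transform

    def discover(self) -> Dict[str, Dict[str, type]]:
        # Similar in spirit to a discovery call: list identifiers and schemas.
        return {t.identifier: dict(t.config_schema) for t in self._registry.values()}

    def expand(self, identifier: str, **config: Any) -> Callable[[list], list]:
        transform = self._registry[identifier]
        unknown = set(config) - set(transform.config_schema)
        if unknown:
            raise ValueError(f"unknown config fields: {sorted(unknown)}")
        # Check every declared field is present and has the declared type.
        for name, expected in transform.config_schema.items():
            if name not in config:
                raise ValueError(f"missing config field: {name}")
            if not isinstance(config[name], expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        return transform.build(**config)


def make_stub(service: ToyExpansionService, identifier: str):
    """Generate a keyword-argument wrapper from a discovered schema."""
    schema = service.discover()[identifier]

    def stub(**config):
        return service.expand(identifier, **config)

    params = ", ".join(f"{k}: {v.__name__}" for k, v in schema.items())
    stub.__doc__ = f"{identifier}({params})"
    return stub


service = ToyExpansionService()
service.register(ToySchemaTransform(
    identifier="toy:multiply:v1",
    config_schema={"factor": int},
    build=lambda factor: (lambda values: [v * factor for v in values]),
))

multiply = make_stub(service, "toy:multiply:v1")
result = multiply(factor=3)([1, 2, 3])
```

The point of the sketch is that the caller never writes a wrapper for the transform by hand: the stub is derived from whatever schema the service reports.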


Re: Design Doc for Controlling Batching in RunInference

2022-08-19 Thread Andy Ye via dev
Thanks all for your feedback.

I've been looking into the batched DoFns, and will follow up on how we can
best interact with them.

On Mon, Aug 15, 2022 at 7:16 PM Robert Bradshaw  wrote:

> Thanks. I added some comments to the doc.
>
> I agree with Brian that it makes sense to figure out how this
> interacts with batched DoFns, as we'll want to migrate to that.
> (Perhaps they're already ready to migrate to as a first step?)
>
> On Fri, Aug 12, 2022 at 1:03 PM Brian Hulette via dev
>  wrote:
> >
> > Hi Andy,
> >
> > Thanks for writing this up! This seems like something that Batched DoFns
> > could help with. Could we make a BatchConverter [1] that represents the
> > necessary transformations here, and define RunInference as a Batched DoFn?
> > Note that the NumPy BatchConverter already enables users to specify a batch
> > dimension using a custom typehint, like NumpyArray[np.int64, (N, 10)] (the
> > N identifies the batch dimension) [2]. I think we could do something
> > similar, but with PyTorch types. It's likely we'd need to define our own
> > typehints though; I suspect PyTorch typehints aren't already parameterized
> > by size.
> >
> > Brian
> >
> >
> > [1]
> > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
> > [2]
> > https://github.com/apache/beam/blob/3173b503beaf30c4d32a4a39c709fd81e8161907/sdks/python/apache_beam/typehints/batch_test.py#L42
> >
> > On Fri, Aug 12, 2022 at 12:36 PM Andy Ye via dev wrote:
> >>
> >> Hi everyone,
> >>
> >> I've written up a design doc [1] on controlling batching in
> >> RunInference. I'd appreciate any feedback. Thanks!
> >>
> >> Summary:
> >> Add a custom stacking function to RunInference to enable users to
> >> control how they want their data to be stacked. This addresses issues
> >> regarding data that have existing batching dimensions, or different sizes.
> >>
> >> Best,
> >> Andy
> >>
> >> [1]
> >> https://docs.google.com/document/d/1l40rOTOEqrQAkto3r_AYq8S_L06dDgoZu-4RLKAE6bo/edit#
>
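
To make the two cases in the summary concrete (inputs of different sizes, and inputs that already carry a batch dimension), here is a minimal pure-Python sketch of what a user-supplied stacking function might do. It is not the RunInference API or the design in the doc; stack_with_padding and concat_existing_batches are hypothetical names, and nested lists stand in for tensors.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def stack_with_padding(examples: Sequence[Sequence[float]],
                       pad_value: float = 0.0) -> Matrix:
    """Stack 1-D examples of different lengths into one rectangular batch."""
    width = max((len(e) for e in examples), default=0)
    # Right-pad each example so every row has the same length.
    return [list(e) + [pad_value] * (width - len(e)) for e in examples]


def concat_existing_batches(batches: Sequence[Matrix]) -> Matrix:
    """Merge inputs that already have a leading batch dimension.

    Rows are concatenated along that dimension instead of adding a new one.
    """
    return [row for batch in batches for row in batch]


padded = stack_with_padding([[1.0, 2.0, 3.0], [4.0]])
merged = concat_existing_batches([[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]]])
```

Here padded is a 2 x 3 batch with the short example right-padded, and merged is a single 3 x 2 batch built from inputs of batch size 1 and 2. A default stacking step that always adds a new leading dimension would handle neither case well, which is the gap the proposal describes.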


Re: [VOTE] Release 2.41.0, release candidate #2

2022-08-19 Thread Alexey Romanenko
+1 (binding)

I tested it with https://github.com/Talend/beam-samples/ - no byte_buddy
issue for now.
(Java SDK v8 & v11, Spark 3 runner).

---
Alexey

> On 19 Aug 2022, at 10:40, Jan Lukavský  wrote:
> 
> +1 (non-binding)
> 
> Validated Java SDK Pipelines on Flink Runner.
> 
>  Jan
> 
> On 8/18/22 22:31, Kiley Sok via dev wrote:
>> Hi everyone,
>> Please review and vote on the release candidate #2 for the version 2.41.0,
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>> 
>> 
>> Reviewers are encouraged to test their own use cases with the release 
>> candidate, and vote +1 if no issues are found.
>> 
>> The complete staging area is available for your review, which includes:
>> * GitHub Release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org [2],
>> which is signed with the key with fingerprint
>> 4D5731CC0AA38097D091EB091E7B28884452AE5D [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.41.0-RC2" [5],
>> * website pull request listing the release [6], the blog post [6], and 
>> publishing the API reference manual [7].
>> * Java artifacts were built with Gradle 7.4 and OpenJDK/Oracle JDK 1.8.0_312.
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2] and PyPI [8].
>> * Validation sheet with a tab for 2.41.0 release to help with validation [9].
>> * Docker images published to Docker Hub [10].
>> 
>> The vote will be open for at least 72 hours. It is adopted by majority 
>> approval, with at least 3 PMC affirmative votes.
>> 
>> For guidelines on how to try the release in your projects, check out our 
>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>> 
>> Thanks,
>> Release Manager
>> 
>> [1] https://github.com/apache/beam/milestone/3
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.41.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4] https://repository.apache.org/content/repositories/orgapachebeam-1283/
>> [5] https://github.com/apache/beam/tree/v2.41.0-RC2
>> [6] https://github.com/apache/beam/pull/22706
>> [7] https://github.com/apache/beam-site/pull/633
>> [8] https://pypi.org/project/apache-beam/2.41.0rc2/
>> [9]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=331459080
>> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>> 



Beam High Priority Issue Report (72)

2022-08-19 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/22779 [Bug]: SpannerIO.readChangeStream() 
stops forwarding change records and starts continuously throwing (large number) 
of Operation ongoing errors 
https://github.com/apache/beam/issues/22773 [Bug]: ElasticsearchIO.Write fails 
when calling outputWithTimestamp()
https://github.com/apache/beam/issues/22749 [Bug]: Bytebuddy version update 
causes Invisible parameter type error
https://github.com/apache/beam/issues/22743 [Bug]: Test flake: 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImplTest.testInsertWithinRowCountLimits
https://github.com/apache/beam/issues/22642 [Bug]: Dataflow fails to drain a 
job when using BigQuery (java sdk v.2.38)
https://github.com/apache/beam/issues/22440 [Bug]: Python Batch Dataflow 
SideInput LoadTests failing
https://github.com/apache/beam/issues/22321 
PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing 
on jenkins
https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and 
fix known and discovered issues
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at 
getConnection() in WriteFn
https://github.com/apache/beam/issues/22283 [Bug]: Python Lots of fn runner 
test items cost exactly 5 seconds to run
https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer 
whenever the output timestamp is change
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 
failures parent bug
https://github.com/apache/beam/issues/21703 pubsublite.ReadWriteIT failing in 
beam_PostCommit_Java_DataflowV1 and V2
https://github.com/apache/beam/issues/21702 SpannerWriteIT failing in beam 
PostCommit Java V1
https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 
failing with a variety of flakes and errors
https://github.com/apache/beam/issues/21700 
--dataflowServiceOptions=use_runner_v2 is broken
https://github.com/apache/beam/issues/21696 Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions 
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with 
writeResult retry and write to error table
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing 
new AfterSynchronizedProcessingTime test
https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry
https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21468 
beam_PostCommit_Python_Examples_Dataflow failing
https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load 
tests failing
https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on 
failure for runners that have non-checkpointing shuffle
https://github.com/apache/beam/issues/21463 NPE in Flink Portable 
ValidatesRunner streaming suite
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in 
beam_PostCommit_Java_DataflowV2  
https://github.com/apache/beam/issues/21270 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2
https://github.com/apache/beam/issues/21268 Race between member variable being 
accessed due to leaking uninitialized state via OutboundObserverFactory
https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate 
BQ load job if a 503 error code is returned from googleapi
https://github.com/apache/beam/issues/21266 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite.
https://github.com/apache/beam/issues/21265 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://github.com/apache/beam/issues/21263 (Broken Pipe induced) Bricked 
Dataflow Pipeline 
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21261 

Re: [VOTE] Release 2.41.0, release candidate #2

2022-08-19 Thread Jan Lukavský

+1 (non-binding)

Validated Java SDK Pipelines on Flink Runner.

 Jan

On 8/18/22 22:31, Kiley Sok via dev wrote:

Hi everyone,
Please review and vote on the release candidate #2 for the version
2.41.0, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


Reviewers are encouraged to test their own use cases with the release 
candidate, and vote +1 if no issues are found.


The complete staging area is available for your review, which includes:
* GitHub Release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
4D5731CC0AA38097D091EB091E7B28884452AE5D [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.41.0-RC2" [5],
* website pull request listing the release [6], the blog post [6], and
publishing the API reference manual [7].
* Java artifacts were built with Gradle 7.4 and OpenJDK/Oracle JDK
1.8.0_312.
* Python artifacts are deployed along with the source release to the
dist.apache.org [2] and PyPI [8].
* Validation sheet with a tab for 2.41.0 release to help with
validation [9].
* Docker images published to Docker Hub [10].

The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.


For guidelines on how to try the release in your projects, check out 
our blog post at https://beam.apache.org/blog/validate-beam-release/.


Thanks,
Release Manager

[1] https://github.com/apache/beam/milestone/3
[2] https://dist.apache.org/repos/dist/dev/beam/2.41.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1283/
[5] https://github.com/apache/beam/tree/v2.41.0-RC2
[6] https://github.com/apache/beam/pull/22706
[7] https://github.com/apache/beam-site/pull/633
[8] https://pypi.org/project/apache-beam/2.41.0rc2/
[9] 
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=331459080
[10] https://hub.docker.com/search?q=apache%2Fbeam&type=image