Re: [PROPOSAL] Preparing for 2.51.0 Release

2023-09-13 Thread Robert Burke
Thanks Kenn!


On Wed, Sep 13, 2023, 6:20 PM Kenneth Knowles wrote:

> Hello Beam community!
>
> The next release (2.51.0) branch cut is scheduled for September 20, 2023,
> one week from today, according to the release calendar [1].
>
> I'd like to volunteer to perform this release. My plan is to cut the
> branch on that date, and cherrypick release-blocking fixes afterwards, if
> any.
>
> Please help me make sure the release goes smoothly by:
>
> - Making sure that any unresolved release-blocking issues for 2.51.0 have
> their "Milestone" marked as "2.51.0 Release" as soon as possible.
>
> - Reviewing the current release blockers [2] and removing the Milestone if
> they don't meet the criteria at [3]. There are currently 12 release
> blockers.
>
> Let me know if you have any comments/objections/questions.
>
> Thanks,
>
> Kenn
>
> [1]
> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
> [2] https://github.com/apache/beam/milestone/15
> [3] https://beam.apache.org/contribute/release-blocking/
>


[PROPOSAL] Preparing for 2.51.0 Release

2023-09-13 Thread Kenneth Knowles
Hello Beam community!

The next release (2.51.0) branch cut is scheduled for September 20, 2023,
one week from today, according to the release calendar [1].

I'd like to volunteer to perform this release. My plan is to cut the branch
on that date, and cherrypick release-blocking fixes afterwards, if any.

Please help me make sure the release goes smoothly by:

- Making sure that any unresolved release-blocking issues for 2.51.0 have
their "Milestone" marked as "2.51.0 Release" as soon as possible.

- Reviewing the current release blockers [2] and removing the Milestone if
they don't meet the criteria at [3]. There are currently 12 release
blockers.

Let me know if you have any comments/objections/questions.

Thanks,

Kenn

[1]
https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
[2] https://github.com/apache/beam/milestone/15
[3] https://beam.apache.org/contribute/release-blocking/


Re: Different Beam project launched

2023-09-13 Thread Kenneth Knowles
Thanks for bringing it up. We did the standard ASF process around name
collisions a few months ago.

Kenn

On Wed, Sep 13, 2023 at 2:46 PM Kerry Donny-Clark via dev <
dev@beam.apache.org> wrote:

> https://github.com/slai-labs/get-beam
>
> This seems to overlap with our branding/messaging on ML.
> Kerry
>


Different Beam project launched

2023-09-13 Thread Kerry Donny-Clark via dev
https://github.com/slai-labs/get-beam

This seems to overlap with our branding/messaging on ML.
Kerry


Re: Beam ML Use Cases - Google Summer of Code 2023

2023-09-13 Thread Danny McCormick via dev
Thanks for all your hard work this summer Reeba! I've really enjoyed
getting to work closely with you, and I know that Beam and its users are
better off because of your contributions.

Thanks,
Danny

On Wed, Sep 13, 2023 at 1:01 PM XQ Hu via dev wrote:

> The blog looks great! Thanks for doing this and I hope you have learned a
> lot! Thanks a lot to Danny for your support!
>
> On Wed, Sep 13, 2023 at 12:58 PM Reeba Qureshi wrote:
>
>> Hi everyone
>>
>> I have completed Google Summer of Code 2023 with Apache Beam, where I
>> worked on developing real-world ML use cases using Beam. Thank you Danny
>> for your constant support! I wrote a blog summarizing my journey, available
>> here.
>>
>> Here are the use cases I built during the summer:
>> 1. Batch Image Processing | GitHub
>> 2. Streaming Sentiment Analysis | GitHub
>> 3. Batch Speech Emotion Recognition | GitHub
>>
>> I had a great experience and look forward to contributing more.
>>
>> Thanks,
>> Reeba
>>
>


Re: Beam ML Use Cases - Google Summer of Code 2023

2023-09-13 Thread XQ Hu via dev
The blog looks great! Thanks for doing this and I hope you have learned a
lot! Thanks a lot to Danny for your support!

On Wed, Sep 13, 2023 at 12:58 PM Reeba Qureshi wrote:

> Hi everyone
>
> I have completed Google Summer of Code 2023 with Apache Beam, where I
> worked on developing real-world ML use cases using Beam. Thank you Danny
> for your constant support! I wrote a blog summarizing my journey, available
> here.
>
> Here are the use cases I built during the summer:
> 1. Batch Image Processing | GitHub
> 2. Streaming Sentiment Analysis | GitHub
> 3. Batch Speech Emotion Recognition | GitHub
>
> I had a great experience and look forward to contributing more.
>
> Thanks,
> Reeba
>


Re: Beam ML Use Cases - Google Summer of Code 2023

2023-09-13 Thread Reeba Qureshi
Hi everyone

I have completed Google Summer of Code 2023 with Apache Beam, where I
worked on developing real-world ML use cases using Beam. Thank you Danny
for your constant support! I wrote a blog summarizing my journey, available
here.

Here are the use cases I built during the summer:
1. Batch Image Processing | GitHub
2. Streaming Sentiment Analysis | GitHub
3. Batch Speech Emotion Recognition | GitHub

I had a great experience and look forward to contributing more.

Thanks,
Reeba


Re: Contribution of Asgarde: Error Handling for Beam?

2023-09-13 Thread Alexey Romanenko
I agree with Cham on these two options.

In the end, it would be great to have such functionality (error handling / DLQ)
integrated into the Beam core API, but that will certainly require some
technical discussions and reviews first, so it will take more time.

However, to make it available to users soon as part of the Beam distribution,
adding it as an extension looks very feasible to me.

—
Alexey

> On 12 Sep 2023, at 19:44, Chamikara Jayalath via dev wrote:
> 
> Thanks Mazlum, this sounds great. I think there are two ways we can proceed 
> if we decide to integrate the Asgarde library into Beam.
> 
> (1) Directly import the code into Beam without significant modifications 
> and/or a review (though we may add tests).
> 
> (2) Go through a design/code review to determine whether this is the best 
> approach for implementing error handling / DLQ in Beam transforms or whether 
> there are other alternatives/modifications to Asgarde we want to consider.
> 
> If we do (1) I prefer adding Asgarde as a separate Gradle module in Beam. We 
> can later integrate it into the core module after a design/code review.
> 
> Thanks,
> Cham
> 
> 
> 
> On Tue, Sep 12, 2023 at 10:26 AM Mazlum TOSUN wrote:
>> Hello Austin and everyone,
>> 
>> I am open for discussion.
>> 
>> My first intention with Asgarde was to help the Beam community, because the
>> dead letter queue is so important in Beam and in all data pipeline frameworks.
>> When I worked with Beam in production with my customers, we needed to catch
>> errors with side outputs and a dead letter queue.
>> 
>> This library really helped us keep the code less verbose while applying all
>> the error handling logic, which is error-prone and verbose if it is repeated.
>> 
>> As Kenneth said, my intention was to stay as close as possible to Beam, with
>> a Wrapper and a Failure Monad on top of a PCollection, to handle all the
>> code and complexity of try/catch blocks and side outputs.
>> 
>> For the governance, even if I am the creator of this library, what matters
>> most isn't me but the community, and helping the community.
>> If the best solution to help the community is including the library directly
>> in Beam, we can go in this direction, with of course your reviews and
>> recommendations.
>> 
>> Then the library will belong to the community and we will continue to 
>> improve it.
>> 
>> For the decision about the best place, I will comply with the majority.
>> 
>> Best regards,
>> 
>> Mazlum
>> 
>> On Mon, Sep 11, 2023 at 11:15 PM Austin Bennett wrote:
>>> @Mazlum TOSUN -- you and I have spoken a few times about this. It'd be
>>> good for you to comment here on list on any of your concerns with
>>> governance, and/or other thoughts. Ex: if you think contributing asgarde
>>> directly is the thing [ or perhaps expressing any interest in helping
>>> write/contribute the relevant functionality into beam ... it is possible
>>> that by adding the actual functionality into beam - like Kenn's mentioned
>>> 'other place' - we could make asgarde as a separate add-on obsolete ].
>>> 
>>> 
>>> 
>>> On Fri, Sep 8, 2023 at 8:55 AM Kenneth Knowles wrote:
>>>> For anyone who hasn't clicked over to Asgarde, my TL;DR description of it
>>>> is that it adds the "failure monad" aka "andThen" style error/result
>>>> handling on top of chaining of PCollections. So it is at a similar level
>>>> of abstraction to our basic transforms and generally useful for chaining
>>>> dead-letter side outputs. It is no more or less appropriate for the core
>>>> SDK than, say, the Project/Filter/Join transforms, or Watch, etc. If we
>>>> actually aspired to have a thin core with the accessories like that in
>>>> another place, then it should go to that other place.
>>>>
>>>> Kenn
>>>>
>>>> On Fri, Sep 8, 2023 at 11:24 AM Daniel Collins via dev
>>>> <dev@beam.apache.org> wrote:
>>>>> > until we *require* Asgarde on a core transform, it shouldn't be in the
>>>>> > main repo
>>>>>
>>>>> I don't think this is necessarily true if it solves end user use cases.
>>>>> If there is a specific transform that solves a specific use case, we
>>>>> could include it in the transforms folder for end-users, even if it isn't
>>>>> utilized in the I/Os at present. Hence the suggestion to take the most
>>>>> promising transforms and propose adding them with documentation, APIs and
>>>>> rationale.
>>>>>
>>>>> -Daniel
>>>>>
>>>>> On Fri, Sep 8, 2023 at 11:20 AM Robert Burke wrote:
>>>>>> I would say until we *require* Asgarde on a core transform, it shouldn't
>>>>>> be in the main repo.
>>>>>>
>>>>>> Incorporating something before there's a need for it is premature
>>>>>> abstraction. We can't do things because they *might* be useful. Let's
>>>>>> see concrete places where they are useful, or we're already having

Beam High Priority Issue Report (42)

2023-09-13 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/28383 [Failing Test]: 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28339 Fix failing 
"beam_PostCommit_XVR_GoUsingJava_Dataflow" job
https://github.com/apache/beam/issues/28326 Bug: 
apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/28168 [Bug]: BigQuery Storage Write API 
does not write with no complaint
https://github.com/apache/beam/issues/28142 [Bug]: [Go SDK] Memory seems to be 
leaking on 2.49.0 with Dataflow
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not 
working when using CreateDisposition.CREATE_IF_NEEDED 
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. 
PeriodicImpulse) running in Flink and polling using tracker.defer_remainder 
have checkpoint size growing indefinitely 
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use 
applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with 
inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: 
bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to 
SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit 
is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not 
reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in 
FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121