Re: [PROPOSAL] Preparing for 2.47.0 Release

2023-04-18 Thread Ahmet Altay via dev
Thank you for the update Jack!

On Tue, Apr 18, 2023 at 11:23 AM Jack McCluskey 
wrote:

> Quick update for everyone, the initial release blockers on the 2.47.0
> milestone have been resolved and the RC1 commit has been tagged. I'll be
> working on getting RC1 artifacts built now.
>
> On Thu, Apr 13, 2023 at 12:22 PM Ahmet Altay wrote:
>
>> Sounds good. Thank you. And if you need help please reach out.
>>
>> On Thu, Apr 13, 2023 at 6:29 AM Jack McCluskey 
>> wrote:
>>
>>> We're making good progress on finding and fixing bugs. Not quite to
>>> building a release candidate yet, but so far nothing that seems to be a
>>> difficult fix.
>>>
>>> On Wed, Apr 12, 2023 at 8:10 PM Ahmet Altay wrote:
>>>
>>>> Jack, how is the release coming along?
>>>>
>>>> On Tue, Apr 4, 2023 at 12:23 PM Jack McCluskey via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> Hey everyone,
>>>>>
>>>>> I need a PMC member's help adding my pubkey to
>>>>> https://dist.apache.org/repos/dist/release/beam/KEYS as well as
>>>>> adding PyPI user jrmccluskey to the maintainers of the Apache Beam
>>>>> package. These are the last steps I have to do to complete prep for
>>>>> the release.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jack McCluskey
>>>>>
>>>>> On Wed, Mar 22, 2023 at 11:38 AM Jack McCluskey <
>>>>> jrmcclus...@google.com> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> The next (2.47.0) release branch cut is scheduled for April 5th,
>>>>>> 2023, according to the release calendar [1].
>>>>>>
>>>>>> I will be performing this release. My plan is to cut the branch on
>>>>>> that date, and cherry-pick release-blocking fixes afterwards, if any.
>>>>>>
>>>>>> Please help me make sure the release goes smoothly by:
>>>>>> - Making sure that any unresolved release-blocking issues
>>>>>> for 2.47.0 have their "Milestone" marked as "2.47.0 Release"
>>>>>> as soon as possible.
>>>>>> - Reviewing the current release blockers [2] and removing the
>>>>>> Milestone if they don't meet the criteria at [3].
>>>>>>
>>>>>> Let me know if you have any comments/objections/questions.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jack McCluskey
>>>>>>
>>>>>> [1]
>>>>>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>>>>>> [2] https://github.com/apache/beam/milestone/10
>>>>>> [3] https://beam.apache.org/contribute/release-blocking/
>>>>>>
>>>>>> --
>>>>>> Jack McCluskey
>>>>>> SWE - DataPLS PLAT/ Dataflow ML
>>>>>> RDU
>>>>>> jrmcclus...@google.com


Re: [PROPOSAL] Preparing for 2.47.0 Release

2023-04-18 Thread Jack McCluskey via dev
Quick update for everyone, the initial release blockers on the 2.47.0
milestone have been resolved and the RC1 commit has been tagged. I'll be
working on getting RC1 artifacts built now.



[RFC] RunInference Pre/Postprocessing and DLQ UX

2023-04-18 Thread Danny McCormick via dev
Hey everyone, I put up a small design doc proposing a user experience for
adding dead-letter queue (DLQ) and pre/postprocessing map operation support
to RunInference. I'd appreciate any thoughts or comments.

https://docs.google.com/document/d/1hr1SaWraneB9dYSFyGA99JT44oKgGNhT70wz99lmdEU/edit?usp=sharing

Thanks,
Danny


Re: [DISCUSS] @Experimental, @Internal, @Stable, etc annotations

2023-04-18 Thread Robert Burke
While this thread is beginning to move off topic, I think any real Beam 3.0
effort should largely start with "what we know we're going to keep" and
what a refined, simplified surface looks like for each SDK. But I'm
sure there are some known things to cut too.

And that's ignoring any real breaking changes that would be healthy to make
for Beam.

But critically, what are the concrete benefits to pipeline authors in such
a move? Otherwise, it's self-serving churn.

On Tue, Apr 18, 2023, 9:08 AM Alexey Romanenko 
wrote:

>
> On 17 Apr 2023, at 21:14, Robert Burke wrote:
>
> +1 on how to iterate without a Beam 3.0
>
> Often that just means: write the new thing, "support both for a
> while", make it clear how to migrate to the new thing, and the next Major
> Version just drops everything that doesn't cut the mustard anymore.
>
>
> Exactly! If we all agree with this process of
> adding/deprecating/removing new/old APIs, then I think we need to add it to
> the Beam documentation to make it clear for developers and users (if it
> isn't there yet).
>
> The only issue here is that we don’t do Major releases often (v2.0.0 is
> dated 2017-05-17). I don't think we even have a public roadmap for that, and
> we have almost never discussed “what” Beam 3.x should be and, the most
> important question, “when” it will happen (sorry if I missed that).
>
> —
> Alexey
>
>
>
> On Mon, Apr 17, 2023, 11:54 AM Ahmet Altay via dev 
> wrote:
>
>> It sounds like there is agreement in eliminating the
>> experimental annotation. Should we stop using them in new code? Or should
>> we do a pass to remove those annotations?
>>
>> On Mon, Apr 17, 2023 at 11:24 AM Kenneth Knowles wrote:
>>
>>>
>>>
>>> On Mon, Apr 17, 2023 at 9:34 AM Kerry Donny-Clark via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> +1 to eliminating @Experimental as a Beam level annotation.
>>>> I think the main point is that if no one pays attention to such
>>>> annotations, then they are only noise and deliver negative value.

>>>
>>> Yes. Consider these two scenarios
>>>
>>> 1. We change an "experimental" API that is widely used. This causes
>>> pain for many users. We would probably not do it, and we would catch it in
>>> code review.
>>> 2. We change a non-"experimental" API that is fairly new. This applies
>>> to many APIs, since we rarely remember to annotate new APIs. This causes
>>> just minor pain for just a few users. TBH I would be OK with this. Rigidity
>>> in rejecting such changes just means your first draft is your final draft.
>>> Try that in any other endeavor and see how it works for you :-)
>>>
>>> And it is worse than noise - there are some users who do pay attention
>>> to the annotations and are not using things even though they are super
>>> safe. That was the main reason I started this thread. The rest of my
>>> proposal was just to try to recover some flexibility, but it seems too hard
>>> and there is no immediate consensus on how/if we could manage it.
>>>
>>> Kenn
>>>
>>> PS I do agree with Kerry's PS and would love to have that discussion.
>>> Perhaps separately, since it will start from square one either way. Every
>>> time someone says "Beam 3.0" we should really be thinking "how can we
>>> iterate". One big breaking version change doesn't work.
>>>
>>
>> +1 - Thinking about "How can we iterate" would allow us to build
>> something users want in shorter timelines.
>>
>>
>>>
>>>
>>>
>>>> Kerry
>>>>
>>>> PS: Kenn says "the point about the culture of stagnation came from my
>>>> recent experiences as code reviewer where there was some idea that we
>>>> couldn't change things even when they were plainly wrong and the change was
>>>> plainly a fix." This seems like a major point that deserves a more focused
>>>> discussion.
>>>>
>>>> On Fri, Apr 14, 2023 at 5:47 PM Chamikara Jayalath via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> I think we've been using the Java Experimental tags in two ways.
>>>>>
>>>>> * New APIs
>>>>> * Any APIs that use specific features identified by pre-defined
>>>>> experimental Kind types defined in [1] (for example, I/O connector APIs
>>>>> that use Beam Schemas).
>>>>>
>>>>> Removing the experimental tag has the effect of finalizing a number of
>>>>> APIs we've been reluctant to call stable (for example, Beam Schemas,
>>>>> portability, and metrics-related APIs). These APIs have been around for a
>>>>> long time and I don't see them changing, so this is probably the right
>>>>> thing to do. But I just wanted to call it out.
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/beam/blob/b9f27f9da2e63b564feecaeb593d7b12783192b0/sdks/java/core/src/main/java/org/apache/beam/sdk/annotations/Experimental.java#L48
>>>>>
>>>>> On Fri, Apr 14, 2023 at 1:26 PM Ahmet Altay via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 14, 2023 at 1:15 PM Kenneth Knowles
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Thanks for the discussion. Many good points. Probably just removing
>>>>>>> all the annota

Re: [DISCUSS] @Experimental, @Internal, @Stable, etc annotations

2023-04-18 Thread Alexey Romanenko

> On 17 Apr 2023, at 21:14, Robert Burke wrote:
> 
> +1 on how to iterate without a Beam 3.0
> 
> Often that just means: write the new thing, "support both for a while", make 
> it clear how to migrate to the new thing, and the next Major Version just 
> drops everything that doesn't cut the mustard anymore.

Exactly! If we all agree with this process of adding/deprecating/removing 
new/old APIs, then I think we need to add it to the Beam documentation to make 
it clear for developers and users (if it isn't there yet).

The only issue here is that we don’t do Major releases often (v2.0.0 is dated 
2017-05-17). I don't think we even have a public roadmap for that, and we have 
almost never discussed “what” Beam 3.x should be and, the most important 
question, “when” it will happen (sorry if I missed that). 

—
Alexey

> 
> 
> On Mon, Apr 17, 2023, 11:54 AM Ahmet Altay via dev wrote:
>> It sounds like there is agreement in eliminating the experimental 
>> annotation. Should we stop using them in new code? Or should we do a pass to 
>> remove those annotations?
>> 
>> On Mon, Apr 17, 2023 at 11:24 AM Kenneth Knowles wrote:
>>> 
>>> 
>>> On Mon, Apr 17, 2023 at 9:34 AM Kerry Donny-Clark via dev 
>>> <dev@beam.apache.org> wrote:
>>>> +1 to eliminating @Experimental as a Beam level annotation.
>>>> I think the main point is that if no one pays attention to such 
>>>> annotations, then they are only noise and deliver negative value. 
>>> 
>>> Yes. Consider these two scenarios
>>> 
>>> 1. We change an "experimental" API that is widely used. This causes pain 
>>> for many users. We would probably not do it, and we would catch it in code 
>>> review.
>>> 2. We change a non-"experimental" API that is fairly new. This applies to 
>>> many APIs, since we rarely remember to annotate new APIs. This causes just 
>>> minor pain for just a few users. TBH I would be OK with this. Rigidity in 
>>> rejecting such changes just means your first draft is your final draft. Try 
>>> that in any other endeavor and see how it works for you :-)
>>> 
>>> And it is worse than noise - there are some users who do pay attention to 
>>> the annotations and are not using things even though they are super safe. 
>>> That was the main reason I started this thread. The rest of my proposal was 
>>> just to try to recover some flexibility, but it seems too hard and there is 
>>> no immediate consensus on how/if we could manage it.
>>> 
>>> Kenn
>>> 
>>> PS I do agree with Kerry's PS and would love to have that discussion. 
>>> Perhaps separately, since it will start from square one either way. Every 
>>> time someone says "Beam 3.0" we should really be thinking "how can we 
>>> iterate". One big breaking version change doesn't work.
>> 
>> +1 - Thinking about "How can we iterate" would allow us to build something 
>> users want in shorter timelines.
>>  
>>> 
>>> 
>>> 
>>>> Kerry
>>>> 
>>>> PS: Kenn says "the point about the culture of stagnation came from my 
>>>> recent experiences as code reviewer where there was some idea that we 
>>>> couldn't change things even when they were plainly wrong and the change 
>>>> was plainly a fix." This seems like a major point that deserves a more 
>>>> focused discussion.
>>>> 
>>>> On Fri, Apr 14, 2023 at 5:47 PM Chamikara Jayalath via dev 
>>>> <dev@beam.apache.org> wrote:
>>>>> I think we've been using the Java Experimental tags in two ways.
>>>>> 
>>>>> * New APIs
>>>>> * Any APIs that use specific features identified by pre-defined 
>>>>> experimental Kind types defined in [1] (for example, I/O connector APIs 
>>>>> that use Beam Schemas). 
>>>>> 
>>>>> Removing the experimental tag has the effect of finalizing a number of 
>>>>> APIs we've been reluctant to call stable (for example, Beam Schemas, 
>>>>> portability, and metrics-related APIs). These APIs have been around for a 
>>>>> long time and I don't see them changing, so this is probably the right 
>>>>> thing to do. But I just wanted to call it out.
>>>>> 
>>>>> Thanks,
>>>>> Cham
>>>>> 
>>>>> [1] 
>>>>> https://github.com/apache/beam/blob/b9f27f9da2e63b564feecaeb593d7b12783192b0/sdks/java/core/src/main/java/org/apache/beam/sdk/annotations/Experimental.java#L48
>>>>> 
>>>>> On Fri, Apr 14, 2023 at 1:26 PM Ahmet Altay via dev wrote:
>>>>>> 
>>>>>> 
>>>>>> On Fri, Apr 14, 2023 at 1:15 PM Kenneth Knowles wrote:
>>>>>>> 
>>>>>>> Thanks for the discussion. Many good points. Probably just removing all 
>>>>>>> the annotations is a no-op to users, and will solve the "afraid to use 
>>>>>>> experimental features" problem.
>>>>>>> 
>>>>>>> Regarding stability, the capabilities of Java (and Python is much, much 
>>>>>>> worse) make it infeasible to produce quality software with the rule 
>>>>>>> "once it is public it is frozen forever". But on the other hand, there 
>>>>>>> isn't much of a practical alternative. Most projects just make breaking

Beam High Priority Issue Report (28)

2023-04-18 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/26264 [Bug]: Elevated serialized size for 
BigQueryTableSource causing IllegalArgumentException in WorkerCustomSources
https://github.com/apache/beam/issues/26126 [Failing Test]: 
beam_PostCommit_XVR_Samza permared validatesCrossLanguageRunnerGoUsingJava 
TestDebeziumIO_BasicRead
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in 
FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21645 
beam_PostCommit_XVR_GoUsingJava_Dataflow fails on some test transforms
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId