[Proposal] State and Timer Composites (Go SDK)
Hi everyone,
I have a proposal for an approach we could take for the Go SDK WRT State and Timers that might be of interest. No changes are required to the FnAPI, as this is an SDK-level proposal.
Proposal w/ example: https://github.com/apache/beam/issues/25894
The short version is to enable users (or Beam contributors) to produce better abstractions around state and timers, instead of requiring direct use of the primitives. Higher-level components would permit easier re-use of common patterns that would otherwise have to be replicated manually in a user's DoFn.
This is currently specific to the Go SDK, as we're finishing off timer support there, and enabling it would require a small, non-breaking change to how state is handled.
Please take a look, and let me know what you think (here or on the issue).
Robert Burke
Beam Go Busybody
DRAFT - Apache Beam Board Report - March 2023
Hi all,
I'm a bit late with notice here, but the next Beam board report is due ASAP. If you can add any notes in the next 24 hours or so, that would be great. Please help me draft it at https://s.apache.org/beam-draft-report-2023-03. I've opened edit access to anyone with the link to minimize friction in drafting.
Ideas:
- highlights from CHANGES.md
- interesting technical discussions
- integrations with other projects
- community events
- major user-facing additions/deprecations
Past reports are at https://whimsy.apache.org/board/minutes/Beam.html for examples. I will edit the final version from everyone's suggestions.
Thanks,
Kenn
Beam High Priority Issue Report (35)
This is your daily summary of Beam's current high priority issues that may need attention.
See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.
Unassigned P1 Issues:
https://github.com/apache/beam/issues/25860 [Bug]: ValueError: Invalid DisplayDataItem when using AsSingleton for side input.
https://github.com/apache/beam/issues/25675 [Bug]: Reenable GroupIntoBatchesTest.testWithShardedKeyInGlobalWindow: causes dataflow suite to be permared
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/24267 [Failing Test]: Timeout waiting to lock gradle
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently skips most of records without job fail
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22115 [Bug]: apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocesses is flaky
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output to Failed Inserts PCollection
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21121 apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
https://github.com/apache/beam/issues/21104 Flaky: apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower in-use IP address quota footprint.
P1 Issues with no update in the last week:
https://github.com/apache/beam/issues/25669 [Bug]: Different orderings of SchemaAwareExternalTransform() kwargs may result in misplaced arguments
https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true for unequal rows
https://github.com/apache/beam/issues/23848 Support for Python 3.11
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22969 Discrepancy in behavior of `DoFn.process()` when `yield` is combined with `return` statement, or vice versa
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21645 beam_PostCommit_XVR_GoUsingJava_Dataflow fails on some test transforms
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table destinations returns wrong tableId