[Announce] Beam 2.55.0 Release
We are happy to present the new 2.55.0 release of Beam. This release includes both improvements and new functionality. See https://beam.apache.org/get-started/downloads/ for this release.

For more information on changes in 2.55.0, check out the detailed release notes at https://github.com/apache/beam/milestone/19.

- Highlights
* The Python SDK now includes automatically generated wrappers for external Java transforms! (https://github.com/apache/beam/pull/29834)

- I/Os
* Added support for handling bad records in BigQueryIO (https://github.com/apache/beam/pull/30081).
  * Full support for the Storage Read and Write APIs
  * Partial support for File Loads (failures writing to files are supported; failures loading files into BQ are not)
  * No support for Extract or Streaming Inserts
* Added support for handling bad records in PubSubIO (https://github.com/apache/beam/pull/30372).
  * Handling schema mismatches is not supported, so enabling error handling when writing to Pub/Sub topics with schemas is not recommended
* The `--enableBundling` pipeline option for BigQueryIO DIRECT_READ is replaced by `--enableStorageReadApiV2`. Both were considered experimental and subject to change (Java) (https://github.com/apache/beam/issues/26354).

- New Features / Improvements
* Allow writing clustered and not time-partitioned BigQuery tables (Java) (https://github.com/apache/beam/pull/30094).
* Redis cache support added to RequestResponseIO and the Enrichment transform (Python) (https://github.com/apache/beam/pull/30307).
* Merged `sdks/java/fn-execution` and `runners/core-construction-java` into the main SDK. These artifacts were never meant for users, but note that they no longer exist. These are steps toward bringing portability into the core SDK alongside all other core functionality.
* Added a Vertex AI Feature Store handler for the Enrichment transform (Python) (https://github.com/apache/beam/pull/30388).
- Breaking Changes
* Arrow version was bumped to 15.0.0 from 5.0.0 (https://github.com/apache/beam/pull/30181).
* Go SDK users who build custom worker containers may run into issues with the move to distroless containers as a base (see Security Fixes).
  * The issue stems from distroless containers lacking additional tools that current custom container processes may rely on.
  * See https://beam.apache.org/documentation/runtime/environments/#from-scratch-go for instructions on building and using a custom container.
* The Python SDK has changed the default value of the `--max_cache_memory_usage_mb` pipeline option from 100 to 0. This option was first introduced in the 2.52.0 SDK. The change restores the behavior of the 2.51.0 SDK, which does not use the state cache. If your pipeline uses iterable side input views, consider increasing the cache size by setting the option manually (https://github.com/apache/beam/issues/30360).

- Deprecations
* N/A

- Bug fixes
* Fixed `SpannerIO.readChangeStream` to support propagating credentials from pipeline options to the `getDialect` calls for authenticating with Spanner (Java) (https://github.com/apache/beam/pull/30361).
* Reduced the number of HTTP requests in GCSIO function calls (Python) (https://github.com/apache/beam/pull/30205).

- Security Fixes
* The Go SDK base container image moved to distroless/base-nossl-debian12, reducing the vulnerable container surface to the kernel and glibc (https://github.com/apache/beam/pull/30011).

- Known Issues
* In Python pipelines, the logic that shuts down inactive bundle processors can hold the lock too aggressively, blocking acceptance of new work. Symptoms include slow or stuck long-running jobs. Fixed in 2.56.0 (https://github.com/apache/beam/pull/30679).

- List of Contributors
According to git shortlog, the following people contributed to the 2.55.0 release. Thank you to all contributors!
Ahmed Abualsaud, Anand Inguva, Andrew Crites, Andrey Devyatkin, Arun Pandian, Arvind Ram, Chamikara Jayalath, Chris Gray, Claire McGinty, Damon Douglas, Dan Ellis, Danny McCormick, Daria Bezkorovaina, Dima I, Edward Cui, Ferran Fernández Garrido, GStravinsky, Jan Lukavský, Jason Mitchell, JayajP, Jeff Kinard, Jeffrey Kinard, Kenneth Knowles, Mattie Fu, Michel Davit, Oleh Borysevych, Ritesh Ghorse, Ritesh Tarway, Robert Bradshaw, Robert Burke, Sam Whittle, Scott Strong, Shunping Huang, Steven van Rossum, Svetak Sundhar, Talat UYARER, Ukjae Jeong (Jay), Vitaly Terentyev, Vlado Djerek, Yi Hu, akashorabek, case-k, clmccart, dengwe1, dhruvdua, hardshah, johnjcasey, liferoad, martin trieu, tvalentyn

- Release Manager

--
Yi Hu, (he/him/his)
Software Engineer
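To illustrate why the `--max_cache_memory_usage_mb` default change under Breaking Changes matters for pipelines that repeatedly read iterable side inputs, here is a toy Python model. It is a hypothetical sketch, not Beam's state cache: `ToyStateCache`, `count_side_input_fetches`, the LRU eviction policy, and the 10 MB side-input size are all assumptions chosen for illustration. It only captures the stated behavior that a budget of 0 caches nothing, so every read triggers a fresh fetch. In a real pipeline the option is passed like any other pipeline option, for example `--max_cache_memory_usage_mb=100` (the previous default).

```python
from collections import OrderedDict


class ToyStateCache:
    """Hypothetical LRU cache with a memory budget in MB.

    A budget of 0 disables caching entirely. This is an illustrative model
    only, not Beam's actual state cache implementation.
    """

    def __init__(self, max_cache_memory_usage_mb):
        self.budget_bytes = max_cache_memory_usage_mb * 1024 * 1024
        self.used_bytes = 0
        self.entries = OrderedDict()  # key -> (value, size_bytes)
        self.fetches = 0  # how many times the (expensive) fetch ran

    def get(self, key, fetch, size_bytes):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key][0]
        self.fetches += 1
        value = fetch()
        if size_bytes <= self.budget_bytes:
            self.entries[key] = (value, size_bytes)
            self.used_bytes += size_bytes
            # Evict least recently used entries until back within budget.
            while self.used_bytes > self.budget_bytes:
                _, (_, evicted_size) = self.entries.popitem(last=False)
                self.used_bytes -= evicted_size
        return value


def count_side_input_fetches(max_cache_memory_usage_mb, num_elements=1000):
    """Simulate every element re-reading the same iterable side input."""
    cache = ToyStateCache(max_cache_memory_usage_mb)
    side_input_size = 10 * 1024 * 1024  # hypothetical 10 MB side input
    for _ in range(num_elements):
        cache.get("side_input", lambda: list(range(3)), side_input_size)
    return cache.fetches


print(count_side_input_fetches(0))    # 1000: nothing is cached
print(count_side_input_fetches(100))  # 1: fetched once, then served from cache
```

In this model, a budget of 0 makes each of the 1000 elements fetch the side input again, while a 100 MB budget fetches it once. The real cost in Beam depends on the runner and on how side inputs are materialized, which this sketch does not attempt to capture.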
Beam High Priority Issue Report (55)
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/30737 [Failing Test]: Playground PreCommit failing goLint
https://github.com/apache/beam/issues/30644 The Inference Python Benchmarks Dataflow job is flaky
https://github.com/apache/beam/issues/30612 The Playground CI Nightly job is flaky
https://github.com/apache/beam/issues/30606 The PostCommit Java Nexmark Dataflow job is flaky
https://github.com/apache/beam/issues/30530 The LoadTests Java GBK Smoke job is flaky
https://github.com/apache/beam/issues/30529 The PostCommit Java Sickbay job is flaky
https://github.com/apache/beam/issues/30527 The PostCommit Java IO Performance Tests job is flaky
https://github.com/apache/beam/issues/30526 The PerformanceTests xlang KafkaIO Python job is flaky
https://github.com/apache/beam/issues/30525 The PostCommit Python ValidatesContainer Dataflow With RC job is flaky
https://github.com/apache/beam/issues/30521 The LoadTests Go Combine Flink Batch job is flaky
https://github.com/apache/beam/issues/30520 The LoadTests Python Combine Flink Streaming job is flaky
https://github.com/apache/beam/issues/30519 The PostCommit XVR GoUsingJava Dataflow job is flaky
https://github.com/apache/beam/issues/30517 The PostCommit XVR Direct job is flaky
https://github.com/apache/beam/issues/30513 The PostCommit Python job is flaky
https://github.com/apache/beam/issues/30511 The LoadTests Python Smoke job is flaky
https://github.com/apache/beam/issues/30507 The LoadTests Go GBK Flink Batch job is flaky
https://github.com/apache/beam/issues/30506 The TypeScript Tests job is flaky
https://github.com/apache/beam/issues/30505 The PostRelease Nightly Snapshot job is flaky
https://github.com/apache/beam/issues/30504 The LoadTests Python Combine Dataflow Streaming job is flaky
https://github.com/apache/beam/issues/30503 The PostCommit Java ValidatesRunner Flink Java11 job is flaky
https://github.com/apache/beam/issues/30502 The LoadTests Go CoGBK Flink Batch job is flaky
https://github.com/apache/beam/issues/30498 [Bug]: Beam Sql is ignoring aliases fields in some situations which causes to huge data loss
https://github.com/apache/beam/issues/29971 [Bug]: FixedWindows not working for large Kafka topic
https://github.com/apache/beam/issues/29926 [Bug]: FileIO: lack of timeouts may cause the pipeline to get stuck indefinitely
https://github.com/apache/beam/issues/29902 [Bug]: Messages are not ACK on Pubsub starting Beam 2.52.0 on Flink Runner in detached mode
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28383 [Failing Test]: org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28326 Bug: apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not working when using CreateDisposition.CREATE_IF_NEEDED
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]