[Announce] Beam 2.55.0 Release

2024-03-26 Thread Yi Hu via dev
We are happy to present the new 2.55.0 release of Beam.
This release includes both improvements and new functionality.
See https://beam.apache.org/get-started/downloads/ for this release.

For more information on changes in 2.55.0, check out the detailed release
notes at https://github.com/apache/beam/milestone/19 .

- Highlights

* The Python SDK will now include automatically generated wrappers for
external Java transforms! (https://github.com/apache/beam/pull/29834)

- I/Os

* Added support for handling bad records to BigQueryIO (
https://github.com/apache/beam/pull/30081).
  * Full Support for Storage Read and Write APIs
  * Partial Support for File Loads (Failures writing to files supported,
failures loading files to BQ unsupported)
  * No Support for Extract or Streaming Inserts
* Added support for handling bad records to PubSubIO (
https://github.com/apache/beam/pull/30372).
  * Support is not available for handling schema mismatches, and enabling
error handling for writing to Pub/Sub topics with schemas is not recommended
* `--enableBundling` pipeline option for BigQueryIO DIRECT_READ is replaced
by `--enableStorageReadApiV2`. Both were considered experimental and
subject to change (Java) (https://github.com/apache/beam/issues/26354).

- New Features / Improvements

* Allow writing clustered and not time-partitioned BigQuery tables (Java) (
https://github.com/apache/beam/pull/30094).
* Redis cache support added to RequestResponseIO and Enrichment transform
(Python) (https://github.com/apache/beam/pull/30307).
* Merged `sdks/java/fn-execution` and `runners/core-construction-java` into
the main SDK. These artifacts were never meant for users, but note that
they no longer exist. These merges are steps toward bringing portability
into the core SDK alongside all other core functionality.
* Added Vertex AI Feature Store handler for Enrichment transform (Python) (
https://github.com/apache/beam/pull/30388).

- Breaking Changes

* Arrow version was bumped to 15.0.0 from 5.0.0 (
https://github.com/apache/beam/pull/30181).
* Go SDK users who build custom worker containers may run into issues with
the move to distroless containers as a base (see Security Fixes).
  * The issue stems from distroless containers lacking additional tools,
which current custom container processes may rely on.
  * See
https://beam.apache.org/documentation/runtime/environments/#from-scratch-go
for instructions on building and using a custom container.
* Python SDK has changed the default value for the
`--max_cache_memory_usage_mb` pipeline option from 100 to 0. This option
was first introduced in the 2.52.0 SDK version. This change restores the
behavior of the 2.51.0 SDK, which does not use the state cache. If your
pipeline uses iterable side input views, consider increasing the cache
size by setting the option manually. (
https://github.com/apache/beam/issues/30360).
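
For pipelines affected by the cache default change above, one way to restore
the previous behavior is to pass the option explicitly. The following is a
minimal sketch, not an official recipe: the helper name `with_state_cache`
is hypothetical, and the value 100 is simply the earlier default mentioned
above. It only builds the argument list, so it runs without Beam installed.

```python
from typing import List, Sequence

# Option name taken from the release note above.
CACHE_FLAG = "--max_cache_memory_usage_mb"


def with_state_cache(args: Sequence[str], cache_mb: int = 100) -> List[str]:
    """Return a copy of args with the cache-size flag appended.

    Hypothetical helper: an explicit user-supplied value is left untouched.
    """
    if cache_mb < 0:
        raise ValueError("cache_mb must be non-negative")
    for arg in args:
        if arg == CACHE_FLAG or arg.startswith(CACHE_FLAG + "="):
            return list(args)  # respect the existing setting
    return list(args) + [f"{CACHE_FLAG}={cache_mb}"]


pipeline_args = with_state_cache(["--runner=DirectRunner"], cache_mb=100)
print(pipeline_args)

# With Beam installed, the list can then be handed to the SDK as usual:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(pipeline_args)
```

The same effect can be had by adding `--max_cache_memory_usage_mb=<size>` to
the command line that launches the pipeline. The right size depends on the
workload.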

- Deprecations

* N/A

- Bug fixes

* Fixed `SpannerIO.readChangeStream` to support propagating credentials
from pipeline options to the `getDialect` calls for authenticating with
Spanner (Java) (https://github.com/apache/beam/pull/30361).
* Reduced the number of HTTP requests in GCSIO function calls (Python) (
https://github.com/apache/beam/pull/30205).

- Security Fixes

* Go SDK base container image moved to distroless/base-nossl-debian12,
reducing vulnerable container surface to kernel and glibc (
https://github.com/apache/beam/pull/30011).

- Known Issues

* In Python pipelines, when shutting down inactive bundle processors,
shutdown logic can overaggressively hold the lock, blocking acceptance of
new work. Symptoms of this issue include slowness or stuckness in
long-running jobs. Fixed in 2.56.0 (
https://github.com/apache/beam/pull/30679).

- List of Contributors

According to git shortlog, the following people contributed to the
2.55.0 release. Thank you to all contributors!

Ahmed Abualsaud

Anand Inguva

Andrew Crites

Andrey Devyatkin

Arun Pandian

Arvind Ram

Chamikara Jayalath

Chris Gray

Claire McGinty

Damon Douglas

Dan Ellis

Danny McCormick

Daria Bezkorovaina

Dima I

Edward Cui

Ferran Fernández Garrido

GStravinsky

Jan Lukavský

Jason Mitchell

JayajP

Jeff Kinard

Jeffrey Kinard

Kenneth Knowles

Mattie Fu

Michel Davit

Oleh Borysevych

Ritesh Ghorse

Ritesh Tarway

Robert Bradshaw

Robert Burke

Sam Whittle

Scott Strong

Shunping Huang

Steven van Rossum

Svetak Sundhar

Talat UYARER

Ukjae Jeong (Jay)

Vitaly Terentyev

Vlado Djerek

Yi Hu

akashorabek

case-k

clmccart

dengwe1

dhruvdua

hardshah

johnjcasey

liferoad

martin trieu

tvalentyn

Release Manager

-- 

Yi Hu, (he/him/his)

Software Engineer


Beam High Priority Issue Report (55)

2024-03-26 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/30737 [Failing Test]: Playground 
PreCommit failing goLint
https://github.com/apache/beam/issues/30644 The Inference Python Benchmarks 
Dataflow job is flaky
https://github.com/apache/beam/issues/30612 The Playground CI Nightly job is 
flaky
https://github.com/apache/beam/issues/30606 The PostCommit Java Nexmark 
Dataflow job is flaky
https://github.com/apache/beam/issues/30530 The LoadTests Java GBK Smoke job is 
flaky
https://github.com/apache/beam/issues/30529 The PostCommit Java Sickbay job is 
flaky
https://github.com/apache/beam/issues/30527 The PostCommit Java IO Performance 
Tests job is flaky
https://github.com/apache/beam/issues/30526 The PerformanceTests xlang KafkaIO 
Python job is flaky
https://github.com/apache/beam/issues/30525 The PostCommit Python 
ValidatesContainer Dataflow With RC job is flaky
https://github.com/apache/beam/issues/30521 The LoadTests Go Combine Flink 
Batch job is flaky
https://github.com/apache/beam/issues/30520 The LoadTests Python Combine Flink 
Streaming job is flaky
https://github.com/apache/beam/issues/30519 The PostCommit XVR GoUsingJava 
Dataflow job is flaky
https://github.com/apache/beam/issues/30517 The PostCommit XVR Direct job is 
flaky
https://github.com/apache/beam/issues/30513 The PostCommit Python job is flaky
https://github.com/apache/beam/issues/30511 The LoadTests Python Smoke job is 
flaky
https://github.com/apache/beam/issues/30507 The LoadTests Go GBK Flink Batch 
job is flaky
https://github.com/apache/beam/issues/30506 The TypeScript Tests job is flaky
https://github.com/apache/beam/issues/30505 The PostRelease Nightly Snapshot 
job is flaky
https://github.com/apache/beam/issues/30504 The LoadTests Python Combine 
Dataflow Streaming job is flaky
https://github.com/apache/beam/issues/30503 The PostCommit Java ValidatesRunner 
Flink Java11 job is flaky
https://github.com/apache/beam/issues/30502 The LoadTests Go CoGBK Flink Batch 
job is flaky
https://github.com/apache/beam/issues/30498 [Bug]: Beam Sql is ignoring aliases 
fields in some situations which causes to huge data loss
https://github.com/apache/beam/issues/29971 [Bug]: FixedWindows not working for 
large Kafka topic
https://github.com/apache/beam/issues/29926 [Bug]: FileIO: lack of timeouts may 
cause the pipeline to get stuck indefinitely
https://github.com/apache/beam/issues/29902 [Bug]: Messages are not ACK on 
Pubsub starting Beam 2.52.0 on Flink Runner in detached mode
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness 
doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader 
provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28383 [Failing Test]: 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28326 Bug: 
apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not 
working when using CreateDisposition.CREATE_IF_NEEDED 
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use 
applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with 
inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: 
bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]