Beam High Priority Issue Report (53)
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/31346 The PreCommit Java Debezium IO Direct job is flaky
https://github.com/apache/beam/issues/31297 [Failing Test]: PreCommit YAML Xlang Direct fails on GHA
https://github.com/apache/beam/issues/31254 [Failing Test]: Onnx inference unit tests are failing.
https://github.com/apache/beam/issues/31122 The PostCommit Go VR Flink job is flaky
https://github.com/apache/beam/issues/30757 [Bug]: Beam Playground scio examples cannot run
https://github.com/apache/beam/issues/30737 [Failing Test]: Playground PreCommit failing goLint
https://github.com/apache/beam/issues/30612 The Playground CI Nightly job is flaky
https://github.com/apache/beam/issues/30606 The PostCommit Java Nexmark Dataflow job is flaky
https://github.com/apache/beam/issues/30529 The PostCommit Java Sickbay job is flaky
https://github.com/apache/beam/issues/30527 The PostCommit Java IO Performance Tests job is flaky
https://github.com/apache/beam/issues/30526 The PerformanceTests xlang KafkaIO Python job is flaky
https://github.com/apache/beam/issues/30525 The PostCommit Python ValidatesContainer Dataflow With RC job is flaky
https://github.com/apache/beam/issues/30521 The LoadTests Go Combine Flink Batch job is flaky
https://github.com/apache/beam/issues/30520 The LoadTests Python Combine Flink Streaming job is flaky
https://github.com/apache/beam/issues/30519 The PostCommit XVR GoUsingJava Dataflow job is flaky
https://github.com/apache/beam/issues/30517 The PostCommit XVR Direct job is flaky
https://github.com/apache/beam/issues/30507 The LoadTests Go GBK Flink Batch job is flaky
https://github.com/apache/beam/issues/30506 The TypeScript Tests job is flaky
https://github.com/apache/beam/issues/30503 The PostCommit Java ValidatesRunner Flink Java11 job is flaky
https://github.com/apache/beam/issues/30502 The LoadTests Go CoGBK Flink Batch job is flaky
https://github.com/apache/beam/issues/29971 [Bug]: FixedWindows not working for large Kafka topic
https://github.com/apache/beam/issues/29926 [Bug]: FileIO: lack of timeouts may cause the pipeline to get stuck indefinitely
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28383 [Failing Test]: org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28326 Bug: apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not working when using CreateDisposition.CREATE_IF_NEEDED
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.c
default_sdk_harness_log_level multi-language support
Hi!

I am trying to adjust the log level for a Beam YAML pipeline that uses Kafka. Since Kafka is a multi-language transform, default_sdk_harness_log_level is not currently supported, which means I can't limit the logs (Kafka can be a bit too verbose sometimes). This could lead to increased logging costs if you use a cloud service such as Dataflow to run the pipeline.

Is there any initiative at the moment to support default_sdk_harness_log_level in multi-language transforms? Maybe some workaround?

I suppose that in terms of Beam YAML, we could pass these options to the schema providers somehow. However, I don't think that solution is ideal, as it would not address the issue for other people using multi-language transforms outside of Beam YAML.

Thanks!
Ferran
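
For context, here is a minimal sketch of how the option is usually supplied on the Python side, as a command-line style flag. The helper name `harness_log_flags` is hypothetical, and checking the label against the standard `logging` module is an assumption made for illustration, not Beam's own validation. As the message above describes, this setting reaches only the Python SDK harness and not the Java harness behind a multi-language transform such as Kafka, so it is not a workaround.

```python
import logging


def harness_log_flags(level="WARNING"):
    """Build the flag list that sets the default SDK harness log level.

    `harness_log_flags` is a hypothetical helper used only for illustration.
    """
    # logging.getLevelName maps a known label (e.g. "WARNING") to its integer
    # value; for an unknown label it returns a string such as "Level LOUD".
    if not isinstance(logging.getLevelName(level), int):
        raise ValueError(f"unknown log level: {level!r}")
    return [f"--default_sdk_harness_log_level={level}"]


flags = harness_log_flags("WARNING")
print(flags)

# With Apache Beam installed, the flags would typically be handed to the
# pipeline options, for example:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(flags)
```

The same flag can also be passed directly on the command line when launching the pipeline.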