Re: [Feature Proposal] Add ARM Support to Beam SDK Container Images
Thanks Celeste! I left a few comments. Overall I like the proposal, but I think that the open question "If Beam SDK containers are still released by the release manager, how should we integrate the multiarch containers into the current Beam container release process?" needs to be answered before I can be fully +1 on the proposal. Ideally this shouldn't create any special work for release managers (other than waiting a bit longer for the docker publish steps to finish).

Thanks,
Danny

On Tue, Jul 18, 2023 at 6:59 PM Valentyn Tymofieiev wrote:

> Hi Celeste,
>
> Thanks for the proposal and researching the options. Using multi-arch images seems like a good way to reduce the complexity associated with correctly selecting the architecture on the runner. It sounds like there may be implications for release process, which future release managers may need to be aware of, and there might be an increase in some test suites time now once we build ARM images.
>
> Left a few comments on the doc and happy to help with PR review when it is ready.
>
> bcc'ing a few folks who might have feedback or to whom this proposal might be of interest.
>
> Valentyn
>
> On Tue, Jul 18, 2023 at 3:12 PM Celeste Zeng wrote:
>
>> Hi everyone,
>>
>> My name is Celeste. I work for the GCP Dataflow team and I am trying to add ARM support to Beam SDK container images. The ultimate goal is to make the released Beam SDK container images become multi-arch images, which support both x86 and ARM. I compiled the following doc to include the feature overview, my proposed implementation plan, as well as testing plan. And I appreciate any feedback!
>>
>> https://docs.google.com/document/d/1ikbEJNsFH1D9HqiMqiVyyMlNpDgSqxXK22nUoetzW6I/edit?usp=sharing
>>
>> Also, please refer to the pull request to see proposed changes:
>> https://github.com/apache/beam/pull/27311
>>
>> Thanks a lot!
>>
>> Sincerely,
>> Celeste Zeng
>> celestezen...@gmail.com
Re: [Feature Proposal] Add ARM Support to Beam SDK Container Images
Thanks. Left a few comments on the doc. Looking forward to ARM support.
Beam High Priority Issue Report (36)
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with inequality filters
https://github.com/apache/beam/issues/27315 [Failing Test]: PubsubReadIT timeout pollForResultForDuration
https://github.com/apache/beam/issues/27314 [Failing Test]: bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27312 [Bug]: JmsIO create connection based on the number of threads
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21121 apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
https://github.com/apache/beam/issues/21104 Flaky: apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful

P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/27330 [Bug]: Python SDK crashes