Re: [Feature Proposal] Add ARM Support to Beam SDK Container Images

2023-07-19 Thread Danny McCormick via dev
Thanks Celeste! I left a few comments. Overall I like the proposal, but I
think that the open question "If Beam SDK containers are still released by
the release manager, how should we integrate the multiarch containers into
the current Beam container release process?" needs to be answered before I
can be fully +1 on the proposal. Ideally this shouldn't create any special
work for release managers (other than waiting a bit longer for the docker
publish steps to finish).

Thanks,
Danny

On Tue, Jul 18, 2023 at 6:59 PM Valentyn Tymofieiev wrote:

> Hi Celeste,
>
> Thanks for the proposal and for researching the options. Using multi-arch
> images seems like a good way to reduce the complexity of correctly
> selecting the architecture on the runner. It sounds like there may be
> implications for the release process that future release managers will
> need to be aware of, and some test suites may take longer to run once we
> build ARM images.
>
> Left a few comments on the doc and happy to help with PR review when it is
> ready.
>
> bcc'ing a few folks who might have feedback or to whom this proposal might
> be of interest.
>
> Valentyn
>
>
>
> On Tue, Jul 18, 2023 at 3:12 PM Celeste Zeng wrote:
>
>> Hi everyone,
>>
>> My name is Celeste. I work for the GCP Dataflow team and I am trying to
>> add ARM support to Beam SDK container images. The ultimate goal is to make
>> the released Beam SDK container images multi-arch images that support
>> both x86 and ARM. I compiled the following doc, which includes the
>> feature overview, my proposed implementation plan, and a testing plan.
>> I would appreciate any feedback!
>>
>>
>> https://docs.google.com/document/d/1ikbEJNsFH1D9HqiMqiVyyMlNpDgSqxXK22nUoetzW6I/edit?usp=sharing
>>
>> Also, please refer to the pull request to see proposed changes:
>> https://github.com/apache/beam/pull/27311
>>
>> Thanks a lot!
>>
>> Sincerely,
>> Celeste Zeng
>> celestezen...@gmail.com
>>
>


Re: [Feature Proposal] Add ARM Support to Beam SDK Container Images

2023-07-19 Thread Robert Bradshaw via dev
Thanks. Left a few comments on the doc. Looking forward to ARM support.



Beam High Priority Issue Report (36)

2023-07-19 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with 
inequality filters
https://github.com/apache/beam/issues/27315 [Failing Test]: PubsubReadIT 
timeout pollForResultForDuration
https://github.com/apache/beam/issues/27314 [Failing Test]: 
bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27312 [Bug]: JmsIO create connection 
based on the number of threads
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to 
SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit 
is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not 
reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in 
FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron 
regularly failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/27330 [Bug]: Python SDK crashes