Re: OpenJDK8 / OpenJDK11 container deprecation
This sounds reasonable to me as well. I've made swaps like this in the past, and the base image of each is probably a bigger factor than the JDK. The openjdk images were based on Debian 11. The default eclipse-temurin images are based on Ubuntu 22.04, with an Alpine option. Ubuntu is a Debian derivative, but the versions and package names aren't exact matches, and Ubuntu tends to update a little faster. For most users I don't think this will matter, but users building custom containers may need to make minor changes. The Alpine option will be much smaller (which could be a significant improvement) but would be a more significant change to the environment.

On Tue, Feb 7, 2023 at 5:18 PM Robert Bradshaw via dev wrote:
> Seems reasonable to me.
Re: OpenJDK8 / OpenJDK11 container deprecation
Seems reasonable to me.
OpenJDK8 / OpenJDK11 container deprecation
As per [1], the JDK8 and JDK11 containers that Apache Beam uses have not been built or supported since July 2022. I have filed [2] to track the resolution of this issue.

Judging by the issues linked from the deprecation notice [1], almost everyone is swapping to the eclipse-temurin container [3] as their base. The eclipse-temurin container is released under these licenses:

Apache License, Version 2.0
Eclipse Distribution License 1.0 (BSD)
Eclipse Public License 2.0
  - (Secondary) GNU General Public License, version 2 with OpenJDK Assembly Exception
  - (Secondary) GNU General Public License, version 2 with the GNU Classpath Exception

I propose that we swap all our containers to the eclipse-temurin containers [3].

I am open to other ideas, and it would be great to hear about your experience in any other projects where you have had to make a similar decision.

1: https://github.com/docker-library/openjdk/issues/505
2: https://github.com/apache/beam/issues/25371
3: https://hub.docker.com/_/eclipse-temurin
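To make the proposed swap concrete, here is a minimal sketch of rewriting the base image in a Dockerfile. The mapping from `openjdk:8` / `openjdk:11` to `eclipse-temurin:8` / `eclipse-temurin:11` is an assumption for illustration, as is the helper name `swap_base_images`; the tags should be checked against Docker Hub, and this is not taken from the Beam build.

```python
import re

# Assumed tag mapping, for illustration only; verify the tags on Docker Hub.
IMAGE_SWAPS = {
    "openjdk:8": "eclipse-temurin:8",
    "openjdk:11": "eclipse-temurin:11",
}

# Matches "FROM <image>" at the start of a line. A FROM line that carries a
# flag such as --platform is left unchanged by this simple pattern.
_FROM_LINE = re.compile(r"^([ \t]*FROM[ \t]+)(\S+)", re.IGNORECASE | re.MULTILINE)


def swap_base_images(dockerfile_text, swaps=IMAGE_SWAPS):
    """Return the Dockerfile text with known base images replaced."""

    def _replace(match):
        image = match.group(2)
        # Images that are not in the mapping are kept as they are.
        return match.group(1) + swaps.get(image, image)

    return _FROM_LINE.sub(_replace, dockerfile_text)
```

Only the `FROM` line changes here. Package installation steps in a custom container may still need adjusting, since package names and versions can differ between base images.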
Re: Python 3.11 support in Apache Beam
On Tue, Feb 7, 2023 at 2:35 PM Anand Inguva wrote:

> Yes, it is related to protobuf only. But I think the update of these
> dependencies are required for Python 3.11 since the newer versions have
> support for Python 3.11 wheels.

Assuming you refer to protobuf: yes, there are no wheels for 3.11 for protobuf==3.x.x, and that can cause friction. https://pypi.org/project/protobuf/3.20.3/#files

I would probably narrow the problem further to demonstrate which stubs are not being generated, and if the reason is not obvious we can also ask the protobuf maintainers for feedback.

Also, do we by chance need to update some other deps from https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt#L28-L33 for this to work?

Also, the tracking issue for protobuf 4 support in Beam: https://github.com/apache/beam/issues/24569.

> If we use older versions of these packages, then we have to depend on
> installing those packages on Python 3.11 from source distributions which is
> not desired.
>
> I am working parallely on that issue in a different PR
> https://github.com/apache/beam/pull/24599 but I think this issue should
> be a blocker for Python 3.11 update.
Re: Python 3.11 support in Apache Beam
Yes, it is related to protobuf only. But I think updating these dependencies is required for Python 3.11, since the newer versions ship Python 3.11 wheels. If we use older versions of these packages, then we have to install them on Python 3.11 from source distributions, which is not desired.

I am working on that issue in parallel in a different PR, https://github.com/apache/beam/pull/24599, but I think this issue should be a blocker for the Python 3.11 update.
Re: Python 3.11 support in Apache Beam
Hi Anand,

On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev wrote:

> Hi all,
>
> We are planning to work on adding support for Python 3.11[1] to Apache
> Beam Python SDK.
>
> As part of this effort, we are going to update the python build
> dependencies defined at [2].
>
> Right now, there is an error with the newer version of protobuf(4.21.11).
> It is not generating _urn files.
>
> It can be reproduced by
>
> 1. python setup.py sdist
> 2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
> 3. switch to python interpreter and run import apache_beam as beam

I think the error you are describing is related to protobuf 4, so the repro should focus on the portion where generation of stubs is happening. Presumably some stubs are not generated on protobuf 4 + Python 3.11?

> will lead to ImportError: cannot import name 'beam_runner_api_pb2_urns'
> from 'apache_beam.portability.api'. Running `python gen_protos.py` to
> forcefully generate files didn't help either.
>
> If you have encountered this error and found a resolution, please let me
> know(that would be super helpful).
>
> I am going to work on this soon. Please let me know if you want to
> collaborate.
>
> Thanks,
> Anand Inguva
>
> [1] https://github.com/apache/beam/pull/24721
> [2] https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt
Python 3.11 support in Apache Beam
Hi all,

We are planning to work on adding support for Python 3.11 [1] to the Apache Beam Python SDK.

As part of this effort, we are going to update the Python build dependencies defined at [2].

Right now, there is an error with the newer version of protobuf (4.21.11): it is not generating the `_urns` files.

It can be reproduced by:

1. python setup.py sdist
2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
3. Switch to a Python interpreter and run `import apache_beam as beam`

This leads to "ImportError: cannot import name 'beam_runner_api_pb2_urns' from 'apache_beam.portability.api'". Running `python gen_protos.py` to forcefully generate the files didn't help either.

If you have encountered this error and found a resolution, please let me know (that would be super helpful).

I am going to work on this soon. Please let me know if you want to collaborate.

Thanks,
Anand Inguva

[1] https://github.com/apache/beam/pull/24721
[2] https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt
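One way to narrow down the repro above is to check which generated modules are actually present in the installed package, without importing `apache_beam` itself (the import is what fails). The sketch below is only an illustration: the helper name `missing_generated_files` is hypothetical, and it assumes the generated modules are plain `.py` files inside a regular package directory.

```python
import importlib.util
from pathlib import Path


def missing_generated_files(top_package, subpackage_path, module_names):
    """Return the module names with no .py file under the given subpackage.

    Looking up a top-level package with find_spec does not execute the
    package's __init__.py, so this works even when importing it fails.
    """
    spec = importlib.util.find_spec(top_package)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"package {top_package!r} is not installed")

    missing = []
    for name in module_names:
        found = any(
            (Path(root) / subpackage_path / f"{name}.py").is_file()
            for root in spec.submodule_search_locations
        )
        if not found:
            missing.append(name)
    return missing
```

For the error in this thread, a call such as `missing_generated_files("apache_beam", "portability/api", ["beam_runner_api_pb2", "beam_runner_api_pb2_urns"])` would show whether only the `_urns` module is absent or the regular stubs are missing as well.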
Beam High Priority Issue Report (40)
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/25140 [Bug]: GenerateSequence is broken on SDF
https://github.com/apache/beam/issues/24971 [Bug]: Messages are not published when a connection is closed with JmsIO
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24655 [Bug]: Pipeline fusion should break at @RequiresStableInput boundary
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24367 [Bug]: workflow.tar.gz cannot be passed to flink runner
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/24267 [Failing Test]: Timeout waiting to lock gradle
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22969 Discrepancy in behavior of `DoFn.process()` when `yield` is combined with `return` statement, or vice versa
https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently skips most of records without job fail
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output to Failed Inserts PCollection
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21121 apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
https://github.com/apache/beam/issues/21104 Flaky: apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM on Flink
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower in-use IP address quota footprint.
https://github.com/apache/beam/issues/19241 Python Dataflow integration tests should export the pipeline Job ID and console output to Jenkins Test Result section

P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true for unequal rows
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/22115 [Bug]: apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocesses is flaky
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action