Re: OpenJDK8 / OpenJDK11 container deprecation

2023-02-07 Thread Andrew Pilloud via dev
This sounds reasonable to me as well.

I've made swaps like this in the past; the base image of each is probably a
bigger factor than the JDK. The openjdk images were based on Debian 11. The
default eclipse-temurin images are based on Ubuntu 22.04 with an alpine
option. Ubuntu is a Debian derivative, but the versions and package names
aren't exact matches, and Ubuntu tends to update a little faster. For most
users I don't think this will matter, but users building custom containers
may need to make minor changes. The alpine option will be much smaller
(which could be a significant improvement) but would be a more significant
change to the environment.
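
For anyone building custom containers, one way to see which base
distribution an image actually ships is to read its os-release file. The
following is a minimal sketch, not part of the Beam build; the helper names
`parse_os_release` and `describe_base` are hypothetical, and it assumes the
image provides the conventional /etc/os-release file.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip("'\"")
    return fields


def describe_base(text):
    """Return a short 'id version' string for the base distribution."""
    fields = parse_os_release(text)
    return f"{fields.get('ID', 'unknown')} {fields.get('VERSION_ID', 'unknown')}"
```

Inside a container, passing the contents of /etc/os-release to
`describe_base` gives a quick way to compare the old and new base images
before adjusting package names in a custom Dockerfile.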

On Tue, Feb 7, 2023 at 5:18 PM Robert Bradshaw via dev 
wrote:

> Seems reasonable to me.
>
> On Tue, Feb 7, 2023 at 4:19 PM Luke Cwik via user 
> wrote:
> >
> > As per [1], the JDK8 and JDK11 containers that Apache Beam uses have
> > stopped being built and supported since July 2022. I have filed [2] to
> > track the resolution of this issue.
> >
> > Based upon [1], almost everyone is swapping to the eclipse-temurin
> > container[3] as their base based upon the linked issues from the
> > deprecation notice[1]. The eclipse-temurin container is released under
> > these licenses:
> > Apache License, Version 2.0
> > Eclipse Distribution License 1.0 (BSD)
> > Eclipse Public License 2.0
> > - (Secondary) GNU General Public License, version 2 with OpenJDK
> > Assembly Exception
> > - (Secondary) GNU General Public License, version 2 with the GNU
> > Classpath Exception
> >
> > I propose that we swap all our containers to the eclipse-temurin
> > containers[3].
> >
> > Open to other ideas and also would be great to hear about your
> > experience in any other projects that you have had to make a similar
> > decision.
> >
> > 1: https://github.com/docker-library/openjdk/issues/505
> > 2: https://github.com/apache/beam/issues/25371
> > 3: https://hub.docker.com/_/eclipse-temurin
>


Re: OpenJDK8 / OpenJDK11 container deprecation

2023-02-07 Thread Robert Bradshaw via dev
Seems reasonable to me.

On Tue, Feb 7, 2023 at 4:19 PM Luke Cwik via user  wrote:
>
> As per [1], the JDK8 and JDK11 containers that Apache Beam uses have stopped 
> being built and supported since July 2022. I have filed [2] to track the 
> resolution of this issue.
>
> Based upon [1], almost everyone is swapping to the eclipse-temurin 
> container[3] as their base based upon the linked issues from the deprecation 
> notice[1]. The eclipse-temurin container is released under these licenses:
> Apache License, Version 2.0
> Eclipse Distribution License 1.0 (BSD)
> Eclipse Public License 2.0
> - (Secondary) GNU General Public License, version 2 with OpenJDK Assembly
> Exception
> - (Secondary) GNU General Public License, version 2 with the GNU Classpath
> Exception
>
> I propose that we swap all our containers to the eclipse-temurin 
> containers[3].
>
> Open to other ideas and also would be great to hear about your experience in 
> any other projects that you have had to make a similar decision.
>
> 1: https://github.com/docker-library/openjdk/issues/505
> 2: https://github.com/apache/beam/issues/25371
> 3: https://hub.docker.com/_/eclipse-temurin


OpenJDK8 / OpenJDK11 container deprecation

2023-02-07 Thread Luke Cwik via dev
As per [1], the JDK8 and JDK11 containers that Apache Beam uses have not
been built or supported since July 2022. I have filed [2] to track the
resolution of this issue.

Judging by the issues linked from the deprecation notice [1], almost
everyone is swapping to the eclipse-temurin container [3] as their base.
The eclipse-temurin container is released under these licenses:
Apache License, Version 2.0
Eclipse Distribution License 1.0 (BSD)
Eclipse Public License 2.0
- (Secondary) GNU General Public License, version 2 with OpenJDK Assembly
Exception
- (Secondary) GNU General Public License, version 2 with the GNU Classpath
Exception

I propose that we swap all our containers to the eclipse-temurin
containers[3].

I'm open to other ideas, and it would be great to hear about your experience
in any other projects where you have had to make a similar decision.

1: https://github.com/docker-library/openjdk/issues/505
2: https://github.com/apache/beam/issues/25371
3: https://hub.docker.com/_/eclipse-temurin


Re: Python 3.11 support in Apache Beam

2023-02-07 Thread Valentyn Tymofieiev via dev
On Tue, Feb 7, 2023 at 2:35 PM Anand Inguva  wrote:

> Yes, it is related to protobuf only. But I think the update of these
> dependencies is required for Python 3.11 since the newer versions have
> support for Python 3.11 wheels.
>
Assuming you are referring to protobuf: yes, there are no wheels for 3.11 for
protobuf==3.x.x, and that can cause friction.
https://pypi.org/project/protobuf/3.20.3/#files
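
One way to check which interpreter versions a release ships wheels for is
to look at the Python tag in each wheel filename on the files page. The
following is a rough sketch; the function names and the sample filenames
are hypothetical and only illustrate the standard wheel naming scheme.

```python
def wheel_python_tags(filename):
    """Return the set of Python tags encoded in a wheel filename.

    Wheel names follow
    {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    so the Python tag is the third field from the end.
    """
    if not filename.endswith(".whl"):
        return set()
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        return set()
    # Compressed tag sets such as "py2.py3" are separated by dots.
    return set(parts[-3].split("."))


def has_wheel_for(filenames, python_tag):
    """True if any file in the listing is a wheel carrying python_tag."""
    return any(python_tag in wheel_python_tags(name) for name in filenames)


# Hypothetical listing, for illustration only (not copied from PyPI).
sample_files = [
    "example_pkg-1.0.0-cp310-cp310-manylinux_2_17_x86_64.whl",
    "example_pkg-1.0.0-py2.py3-none-any.whl",
    "example_pkg-1.0.0.tar.gz",
]
print(has_wheel_for(sample_files, "cp310"))
print(has_wheel_for(sample_files, "cp311"))
```

A release with no cp311 wheel falls back to a pure-Python wheel or the
source distribution on Python 3.11, if either is published.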

I would probably narrow the problem further to demonstrate which stubs are
not being generated, and if the reason is not obvious, we can also ask for
feedback from the protobuf maintainers. Also, do we by chance need to update
some other deps from
https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt#L28-L33
for this to work?
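
To narrow down which stubs are missing, a small check of the generated
package directory could help. This is only a sketch: the function name is
hypothetical, and both the directory path and the list of expected module
names are assumptions the caller must supply (only
`beam_runner_api_pb2_urns` is named in the error reported in this thread).

```python
from pathlib import Path


def missing_generated_modules(package_dir, expected_modules):
    """Return the expected module names that have no .py file in package_dir."""
    present = {path.stem for path in Path(package_dir).glob("*.py")}
    return sorted(name for name in expected_modules if name not in present)
```

For example, calling it on the directory behind
`apache_beam.portability.api` after running the generation step, once with
protobuf 3 and once with protobuf 4, would show whether the two runs
produce different sets of files.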

Also: tracking issue for protobuf4 support in Beam:
https://github.com/apache/beam/issues/24569.

> If we use older versions of these packages, then we have to depend on
> installing those packages on Python 3.11 from source distributions which is
> not desired.
>
> I am working on that issue in parallel in a different PR
> https://github.com/apache/beam/pull/24599 but I think this issue should
> be a blocker for Python 3.11 update.
>
> On Tue, Feb 7, 2023 at 5:25 PM Valentyn Tymofieiev 
> wrote:
>
>> Hi Anand,
>>
>> On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev 
>> wrote:
>>
>>> Hi all,
>>>
>>> We are planning to work on adding support for Python 3.11[1] to Apache
>>> Beam Python SDK.
>>>
>>> As part of this effort, we are going to update the python build
>>> dependencies defined at [2].
>>>
>>> Right now, there is an error with the newer version of
>>> protobuf(4.21.11). It is not generating _urn files.
>>>
>>> It can be reproduced by
>>>
>>
>>> 1. python setup.py sdist
>>> 2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
>>> 3. switch to python interpreter and run import apache_beam as beam
>>>
>> I think the error you are describing is related to protobuf 4, so the
>> repro should focus on the portion where generation of stubs is happening.
>> Presumably some stubs are not generated on protobuf 4 + Python 3.11?
>>
>>
>>>
>>> will lead to *ImportError: cannot import name
>>> 'beam_runner_api_pb2_urns' from 'apache_beam.portability.api'.  *Running
>>> `python gen_protos.py` to forcefully generate files didn't help either.
>>>
>>> If you have encountered this error and found a resolution, please let me
>>> know(that would be super helpful).
>>>
>>> I am going to work on this soon. Please let me know if you want to
>>> collaborate.
>>>
>>> Thanks,
>>> Anand Inguva
>>>
>>> *[1] *https://github.com/apache/beam/pull/24721
>>> [2]
>>> https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt
>>>
>>


Re: Python 3.11 support in Apache Beam

2023-02-07 Thread Anand Inguva via dev
Yes, it is related to protobuf only. But I think the update of these
dependencies is required for Python 3.11 since the newer versions have
support for Python 3.11 wheels. If we use older versions of these packages,
then we have to depend on installing those packages on Python 3.11 from
source distributions which is not desired.

I am working on that issue in parallel in a different PR
https://github.com/apache/beam/pull/24599 but I think this issue should be
a blocker for Python 3.11 update.

On Tue, Feb 7, 2023 at 5:25 PM Valentyn Tymofieiev 
wrote:

> Hi Anand,
>
> On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev 
> wrote:
>
>> Hi all,
>>
>> We are planning to work on adding support for Python 3.11[1] to Apache
>> Beam Python SDK.
>>
>> As part of this effort, we are going to update the python build
>> dependencies defined at [2].
>>
>> Right now, there is an error with the newer version of protobuf(4.21.11).
>> It is not generating _urn files.
>>
>> It can be reproduced by
>>
>
>> 1. python setup.py sdist
>> 2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
>> 3. switch to python interpreter and run import apache_beam as beam
>>
> I think the error you are describing is related to protobuf 4, so the
> repro should focus on the portion where generation of stubs is happening.
> Presumably some stubs are not generated on protobuf 4 + Python 3.11?
>
>
>>
>> will lead to *ImportError: cannot import name 'beam_runner_api_pb2_urns'
>> from 'apache_beam.portability.api'.  *Running `python gen_protos.py` to
>> forcefully generate files didn't help either.
>>
>> If you have encountered this error and found a resolution, please let me
>> know(that would be super helpful).
>>
>> I am going to work on this soon. Please let me know if you want to
>> collaborate.
>>
>> Thanks,
>> Anand Inguva
>>
>> *[1] *https://github.com/apache/beam/pull/24721
>> [2]
>> https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt
>>
>


Re: Python 3.11 support in Apache Beam

2023-02-07 Thread Valentyn Tymofieiev via dev
Hi Anand,

On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev 
wrote:

> Hi all,
>
> We are planning to work on adding support for Python 3.11[1] to Apache
> Beam Python SDK.
>
> As part of this effort, we are going to update the python build
> dependencies defined at [2].
>
> Right now, there is an error with the newer version of protobuf(4.21.11).
> It is not generating _urn files.
>
> It can be reproduced by
>

> 1. python setup.py sdist
> 2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
> 3. switch to python interpreter and run import apache_beam as beam
>
I think the error you are describing is related to protobuf 4, so the repro
should focus on the portion where generation of stubs is happening.
Presumably some stubs are not generated on protobuf 4 + Python 3.11?


>
> will lead to *ImportError: cannot import name 'beam_runner_api_pb2_urns'
> from 'apache_beam.portability.api'.  *Running `python gen_protos.py` to
> forcefully generate files didn't help either.
>
> If you have encountered this error and found a resolution, please let me
> know(that would be super helpful).
>
> I am going to work on this soon. Please let me know if you want to
> collaborate.
>
> Thanks,
> Anand Inguva
>
> *[1] *https://github.com/apache/beam/pull/24721
> [2]
> https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt
>


Python 3.11 support in Apache Beam

2023-02-07 Thread Anand Inguva via dev
Hi all,

We are planning to work on adding support for Python 3.11 [1] to the Apache
Beam Python SDK.

As part of this effort, we are going to update the python build
dependencies defined at [2].

Right now, there is an error with the newer version of protobuf (4.21.11).
It is not generating the _urns files.

It can be reproduced with these steps:

1. python setup.py sdist
2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
3. switch to the Python interpreter and run `import apache_beam as beam`

This leads to *ImportError: cannot import name 'beam_runner_api_pb2_urns'
from 'apache_beam.portability.api'*. Running `python gen_protos.py` to
force generation of the files didn't help either.
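
The import check in step 3 can be scripted so that it runs in a fresh
interpreter each time. This is a minimal sketch under the assumption that
the package is installed for the interpreter being tested; the helper name
`try_import` is hypothetical.

```python
import subprocess
import sys


def try_import(module_name, python=sys.executable):
    """Import module_name in a fresh interpreter.

    Returns (ok, last_stderr_line); the last line of stderr is normally
    the exception message when the import fails.
    """
    result = subprocess.run(
        [python, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    lines = result.stderr.strip().splitlines()
    return result.returncode == 0, (lines[-1] if lines else "")
```

Calling `try_import("apache_beam")` against different interpreters or
virtual environments (via the `python` argument) makes it easier to compare
protobuf 3 and protobuf 4 installs side by side.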

If you have encountered this error and found a resolution, please let me
know (that would be super helpful).

I am going to work on this soon. Please let me know if you want to
collaborate.

Thanks,
Anand Inguva

[1] https://github.com/apache/beam/pull/24721
[2]
https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt


Beam High Priority Issue Report (40)

2023-02-07 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/25140 [Bug]: GenerateSequence is broken 
on SDF
https://github.com/apache/beam/issues/24971 [Bug]: Messages are not published 
when a connection is closed with JmsIO
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24655 [Bug]: Pipeline fusion should break 
at @RequiresStableInput boundary
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24367 [Bug]: workflow.tar.gz cannot be 
passed to flink runner
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/24267 [Failing Test]: Timeout waiting to 
lock gradle
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22969 Discrepancy in behavior of 
`DoFn.process()` when `yield` is combined with `return` statement, or vice versa
https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently 
skips most of records without job fail
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with 
grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM 
on Flink
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.
https://github.com/apache/beam/issues/19241 Python Dataflow integration tests 
should export the pipeline Job ID and console output to Jenkins Test Result 
section


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true 
for unequal rows
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/22115 [Bug]: 
apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocesses
 is flaky
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action