Re: Python 3.11 support in Apache Beam

2023-04-13 Thread Anand Inguva via dev
Thanks Ahmet. I will check them out soon.

On Thu, Apr 13, 2023 at 6:24 PM Ahmet Altay  wrote:

> I forgot to add the link; [1] was meant to be:
> https://docs.python.org/3/whatsnew/3.11.html#faster-cpython
>
> On Thu, Apr 13, 2023 at 10:17 AM Anand Inguva 
> wrote:
>
>> Yes Ahmet. That would be great.
>>
>> There are some load tests defined in the
>> https://github.com/apache/beam/blob/master/.test-infra which could be
>> useful for performance testing of Beam between 3.10 and 3.11. Do you
>> suggest any other tests?
>>
>
> I have not looked at the full list. I do not think we will see much
> improvement in I/O-bound pipelines, or in pipelines that already do most
> of their work in a C extension library. Maybe some of the load tests,
> like the ParDo load tests?
>
> If feasible, we could convert benchmarks to run on 3.11 and see which ones
> will see a larger improvement.
>
> Also apparently there is a potential regression of using up to 20% more
> memory (
> https://docs.python.org/3/whatsnew/3.11.html#will-cpython-3-11-use-more-memory).
> I wonder if that will negatively impact us. If feasible, it would be useful
> to understand that as well.
>
>
>>
>> On Wed, Apr 12, 2023 at 8:04 PM Ahmet Altay  wrote:
>>
>>> Thank you, this is great!
>>>
>>> Python 3.11 announcement had a claim about performance [1]:
>>>
>>> "CPython 3.11 is an average of 25% faster than CPython 3.10 as measured
>>> with the pyperformance benchmark suite, when compiled with GCC on Ubuntu
>>> Linux. Depending on your workload, the overall speedup could be 10-60%."
>>>
>>> Have we measured this in Beam? Are we seeing any benefits? If not, why?
>>> If yes, this would be a cool blog post as well.
>>>
>>> Ahmet
>>>
>>>
>>> On Wed, Apr 5, 2023 at 1:12 PM Anand Inguva via dev 
>>> wrote:
>>>
 Python 3.11 support has been merged at
 https://github.com/apache/beam/pull/26121 targeting Beam 2.47.0
 release.

 Please let me know if you have any questions.

 Thanks,
 Anand

 On Tue, Feb 21, 2023 at 6:04 PM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> Thanks a lot Anand. I'll take a look at the PRs.
>
> On Tue, Feb 21, 2023 at 1:56 PM Anand Inguva 
> wrote:
>
>> I was able to spin up a PR: https://github.com/apache/beam/pull/24599
>> that updates the build dependencies of Apache Beam.
>>
>> Several GCP dependencies needed to be updated as well. I covered them
>> in the PR: https://github.com/apache/beam/pull/24599
>>
>> On Thu, Feb 9, 2023 at 3:29 PM Anand Inguva 
>> wrote:
>>
>>> Yes, we may need to update all of them.
>>> I can add more information once I dig into the issue (most likely next
>>> week). I will comment on my findings on the issue:
>>> https://github.com/apache/beam/issues/24569 and will periodically
>>> update this thread.
>>>
>>> On Tue, Feb 7, 2023 at 5:47 PM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
 On Tue, Feb 7, 2023 at 2:35 PM Anand Inguva 
 wrote:

> Yes, it is related to protobuf only. But I think the update of
> these dependencies is required for Python 3.11 since the newer
> versions have support for Python 3.11 wheels.
>
 Assuming you refer to protobuf. Yes, there are no wheels for 3.10
 for protobuf==3.x.x and that can cause friction.
 https://pypi.org/project/protobuf/3.20.3/#files

 I would probably narrow the problem further to demonstrate which
 stubs are not being generated, and if the reason is not obvious we can also
 ask for feedback from protobuf maintainers. Also - do we by chance need to
 update some other deps from
 https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt#L28-L33
 for this to work?

 Also: tracking issue for protobuf4 support in Beam:
 https://github.com/apache/beam/issues/24569.

 If we use older versions of these packages, then we have to depend
> on installing those packages on Python 3.11 from source distributions,
> which is not desired.
>
> I am working on that issue in parallel in a different PR
> https://github.com/apache/beam/pull/24599 but I think this issue
> should be a blocker for Python 3.11 update.
>
> On Tue, Feb 7, 2023 at 5:25 PM Valentyn Tymofieiev <
> valen...@google.com> wrote:
>
>> Hi Anand,
>>
>> On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi all,
>>>
>>> We are planning to work on adding support for Python 3.11[1] to
>>> Apache Beam Python SDK.
>>>
>>> As part of this effort, we are going to update 

Re: Python 3.11 support in Apache Beam

2023-04-13 Thread Ahmet Altay via dev
I forgot to add the link; [1] was meant to be:
https://docs.python.org/3/whatsnew/3.11.html#faster-cpython

On Thu, Apr 13, 2023 at 10:17 AM Anand Inguva 
wrote:

> Yes Ahmet. That would be great.
>
> There are some load tests defined in the
> https://github.com/apache/beam/blob/master/.test-infra which could be
> useful for performance testing of Beam between 3.10 and 3.11. Do you
> suggest any other tests?
>

I have not looked at the full list. I do not think we will see much
improvement in I/O-bound pipelines, or in pipelines that already do most
of their work in a C extension library. Maybe some of the load tests,
like the ParDo load tests?

If feasible, we could convert benchmarks to run on 3.11 and see which ones
will see a larger improvement.

Also apparently there is a potential regression of using up to 20% more
memory (
https://docs.python.org/3/whatsnew/3.11.html#will-cpython-3-11-use-more-memory).
I wonder if that will negatively impact us. If feasible, it would be useful
to understand that as well.
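
One way to approach the comparison discussed above is a small micro-benchmark run once under each interpreter. The sketch below is only an illustration: `cpu_bound_step` is a hypothetical stand-in for pure-Python per-element work (it is not one of the Beam load tests), and `measure` and `speedup` are helper names invented here. It uses only the standard library `timeit` and `tracemalloc` modules. Note that `tracemalloc` reports memory allocated by Python, not whole-process memory, so it is at best a rough proxy for the memory question.

```python
import sys
import timeit
import tracemalloc


def cpu_bound_step(n=20_000):
    # Hypothetical stand-in for per-element work: pure-Python arithmetic,
    # the kind of code where interpreter speedups would be most visible.
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def measure(func, repeat=5, number=20):
    """Return (best seconds per call, peak traced bytes) for func."""
    # Take the minimum over several repeats to reduce scheduling noise.
    best = min(timeit.repeat(func, repeat=repeat, number=number)) / number
    # Measure peak Python-level allocations in a separate, untimed call.
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


def speedup(baseline_seconds, candidate_seconds):
    """Relative speedup of candidate over baseline (0.25 means 25% faster)."""
    return baseline_seconds / candidate_seconds - 1.0


if __name__ == "__main__":
    seconds, peak = measure(cpu_bound_step)
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"python {version}: {seconds * 1e3:.3f} ms/call, peak {peak} bytes")
```

Running the script under 3.10 and again under 3.11, then passing the two timings to `speedup`, gives a number comparable in spirit to the quoted pyperformance figure, though a real answer for Beam would need the actual load tests.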


>
> On Wed, Apr 12, 2023 at 8:04 PM Ahmet Altay  wrote:
>
>> Thank you, this is great!
>>
>> Python 3.11 announcement had a claim about performance [1]:
>>
>> "CPython 3.11 is an average of 25% faster than CPython 3.10 as measured
>> with the pyperformance benchmark suite, when compiled with GCC on Ubuntu
>> Linux. Depending on your workload, the overall speedup could be 10-60%."
>>
>> Have we measured this in Beam? Are we seeing any benefits? If not, why?
>> If yes, this would be a cool blog post as well.
>>
>> Ahmet
>>
>>
>> On Wed, Apr 5, 2023 at 1:12 PM Anand Inguva via dev 
>> wrote:
>>
>>> Python 3.11 support has been merged at
>>> https://github.com/apache/beam/pull/26121 targeting Beam 2.47.0
>>> release.
>>>
>>> Please let me know if you have any questions.
>>>
>>> Thanks,
>>> Anand
>>>
>>> On Tue, Feb 21, 2023 at 6:04 PM Valentyn Tymofieiev 
>>> wrote:
>>>
 Thanks a lot Anand. I'll take a look at the PRs.

 On Tue, Feb 21, 2023 at 1:56 PM Anand Inguva 
 wrote:

> I was able to spin up a PR: https://github.com/apache/beam/pull/24599
> that updates the build dependencies of Apache Beam.
>
> Several GCP dependencies needed to be updated as well. I covered them
> in the PR: https://github.com/apache/beam/pull/24599
>
> On Thu, Feb 9, 2023 at 3:29 PM Anand Inguva 
> wrote:
>
>> Yes, we may need to update all of them.
>> I can add more information once I dig into the issue (most likely next
>> week). I will comment on my findings on the issue:
>> https://github.com/apache/beam/issues/24569 and will periodically
>> update this thread.
>>
>> On Tue, Feb 7, 2023 at 5:47 PM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>>
>>> On Tue, Feb 7, 2023 at 2:35 PM Anand Inguva 
>>> wrote:
>>>
 Yes, it is related to protobuf only. But I think the update of
 these dependencies is required for Python 3.11 since the newer
 versions have support for Python 3.11 wheels.

>>> Assuming you refer to protobuf. Yes, there are no wheels for 3.10
>>> for protobuf==3.x.x and that can cause friction.
>>> https://pypi.org/project/protobuf/3.20.3/#files
>>>
>>> I would probably narrow the problem further to demonstrate which
>>> stubs are not being generated, and if the reason is not obvious we can also ask
>>> for feedback from protobuf maintainers. Also - do we by chance need to
>>> update some other deps from
>>> https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt#L28-L33
>>> for this to work?
>>>
>>> Also: tracking issue for protobuf4 support in Beam:
>>> https://github.com/apache/beam/issues/24569.
>>>
>>> If we use older versions of these packages, then we have to depend
 on installing those packages on Python 3.11 from source distributions,
 which is not desired.

 I am working on that issue in parallel in a different PR
 https://github.com/apache/beam/pull/24599 but I think this issue
 should be a blocker for Python 3.11 update.

 On Tue, Feb 7, 2023 at 5:25 PM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> Hi Anand,
>
> On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev <
> dev@beam.apache.org> wrote:
>
>> Hi all,
>>
>> We are planning to work on adding support for Python 3.11[1] to
>> Apache Beam Python SDK.
>>
>> As part of this effort, we are going to update the python build
>> dependencies defined at [2].
>>
>> Right now, there is an error with the newer version of
>> protobuf (4.21.11). It is not generating _urn files.
>>
>> It can be reproduced by
>>
>

Beam Dependency Check Report (2023-04-13)

2023-04-13 Thread Apache Jenkins Server


Re: Python 3.11 support in Apache Beam

2023-04-13 Thread Anand Inguva via dev
Yes Ahmet. That would be great.

There are some load tests defined in the
https://github.com/apache/beam/blob/master/.test-infra which could be
useful for performance testing of Beam between 3.10 and 3.11. Do you
suggest any other tests?

On Wed, Apr 12, 2023 at 8:04 PM Ahmet Altay  wrote:

> Thank you, this is great!
>
> Python 3.11 announcement had a claim about performance [1]:
>
> "CPython 3.11 is an average of 25% faster than CPython 3.10 as measured
> with the pyperformance benchmark suite, when compiled with GCC on Ubuntu
> Linux. Depending on your workload, the overall speedup could be 10-60%."
>
> Have we measured this in Beam? Are we seeing any benefits? If not, why? If
> yes, this would be a cool blog post as well.
>
> Ahmet
>
>
> On Wed, Apr 5, 2023 at 1:12 PM Anand Inguva via dev 
> wrote:
>
>> Python 3.11 support has been merged at
>> https://github.com/apache/beam/pull/26121 targeting Beam 2.47.0 release.
>>
>> Please let me know if you have any questions.
>>
>> Thanks,
>> Anand
>>
>> On Tue, Feb 21, 2023 at 6:04 PM Valentyn Tymofieiev 
>> wrote:
>>
>>> Thanks a lot Anand. I'll take a look at the PRs.
>>>
>>> On Tue, Feb 21, 2023 at 1:56 PM Anand Inguva 
>>> wrote:
>>>
 I was able to spin up a PR: https://github.com/apache/beam/pull/24599
 that updates the build dependencies of Apache Beam.

 Several GCP dependencies needed to be updated as well. I covered them
 in the PR: https://github.com/apache/beam/pull/24599

 On Thu, Feb 9, 2023 at 3:29 PM Anand Inguva 
 wrote:

> Yes, we may need to update all of them.
> I can add more information once I dig into the issue (most likely next
> week). I will comment on my findings on the issue:
> https://github.com/apache/beam/issues/24569 and will periodically
> update this thread.
>
> On Tue, Feb 7, 2023 at 5:47 PM Valentyn Tymofieiev <
> valen...@google.com> wrote:
>
>> On Tue, Feb 7, 2023 at 2:35 PM Anand Inguva 
>> wrote:
>>
>>> Yes, it is related to protobuf only. But I think the update of these
>>> dependencies is required for Python 3.11 since the newer versions have
>>> support for Python 3.11 wheels.
>>>
>> Assuming you refer to protobuf. Yes, there are no wheels for 3.10 for
>> protobuf==3.x.x and that can cause friction.
>> https://pypi.org/project/protobuf/3.20.3/#files
>>
>> I would probably narrow the problem further to demonstrate which
>> stubs are not being generated, and if the reason is not obvious we can also ask
>> for feedback from protobuf maintainers. Also - do we by chance need to
>> update some other deps from
>> https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt#L28-L33
>> for this to work?
>>
>> Also: tracking issue for protobuf4 support in Beam:
>> https://github.com/apache/beam/issues/24569.
>>
>> If we use older versions of these packages, then we have to depend on
>>> installing those packages on Python 3.11 from source distributions, which
>>> is not desired.
>>>
>>> I am working on that issue in parallel in a different PR
>>> https://github.com/apache/beam/pull/24599 but I think this issue
>>> should be a blocker for Python 3.11 update.
>>>
>>> On Tue, Feb 7, 2023 at 5:25 PM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
 Hi Anand,

 On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev <
 dev@beam.apache.org> wrote:

> Hi all,
>
> We are planning to work on adding support for Python 3.11[1] to
> Apache Beam Python SDK.
>
> As part of this effort, we are going to update the python build
> dependencies defined at [2].
>
> Right now, there is an error with the newer version of
> protobuf (4.21.11). It is not generating _urn files.
>
> It can be reproduced by
>

> 1. python setup.py sdist
> 2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
> 3. Switch to the Python interpreter and run `import apache_beam as beam`
>
 I think the error you are describing is related to protobuf 4, so
 the repro should focus on the portion where generation of stubs is
 happening. Presumably some stubs are not generated on protobuf 4 +
 Python 3.11?


>
> will lead to *ImportError: cannot import name
> 'beam_runner_api_pb2_urns' from 'apache_beam.portability.api'.*
> Running `python gen_protos.py` to forcefully generate files didn't help
> either.
>
> If you have encountered this error and found a resolution, please
> let me know (that would be super helpful).
>
> I am going to work on this soon. 
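
The last repro step above (importing the package and hitting the ImportError) could be narrowed to a check of the single missing name. The following is a minimal sketch under stated assumptions: `can_import_name` is a helper invented here for illustration, not part of Beam, and it only approximates what `from package import name` does (look for an attribute, then fall back to importing a submodule). The package and name in the usage note are taken from the error message quoted in this thread.

```python
import importlib


def can_import_name(package, name):
    """Approximate `from package import name`; return (ok, error message)."""
    try:
        module = importlib.import_module(package)
    except ImportError as exc:
        # The package itself is missing or failed to import.
        return False, str(exc)
    if hasattr(module, name):
        # The name is already an attribute of the imported package.
        return True, ""
    try:
        # Otherwise fall back to importing it as a submodule.
        importlib.import_module(f"{package}.{name}")
    except ImportError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    # Demonstration with the standard library so the script runs anywhere.
    print(can_import_name("os", "path"))
    print(can_import_name("os", "no_such_name_xyz"))
```

In an environment with the built SDK installed, calling `can_import_name("apache_beam.portability.api", "beam_runner_api_pb2_urns")` would show whether that one generated module is present, without importing all of `apache_beam`.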

Re: [PROPOSAL] Preparing for 2.47.0 Release

2023-04-13 Thread Ahmet Altay via dev
Sounds good. Thank you. And if you need help please reach out.

On Thu, Apr 13, 2023 at 6:29 AM Jack McCluskey 
wrote:

> We're making good progress on finding and fixing bugs. We're not quite
> ready to build an RC yet, but so far nothing seems to be a
> difficult fix.
>
> On Wed, Apr 12, 2023 at 8:10 PM Ahmet Altay  wrote:
>
>> Jack, how is the release coming along?
>>
>> On Tue, Apr 4, 2023 at 12:23 PM Jack McCluskey via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hey everyone,
>>>
>>> I need a PMC member's help adding my pubkey to
>>> https://dist.apache.org/repos/dist/release/beam/KEYS as well as adding
>>> PyPI user jrmccluskey to the maintainers of the Apache Beam package. These
>>> are the last steps I have to do to complete prep for the release.
>>>
>>> Thanks,
>>>
>>> Jack McCluskey
>>>
>>> On Wed, Mar 22, 2023 at 11:38 AM Jack McCluskey 
>>> wrote:
>>>
 Hey all,

 The next (2.47.0) release branch cut is scheduled for April 5th, 2023,
 according to
 the release calendar [1].

 I will be performing this release. My plan is to cut the branch on that
 date, and cherrypick release-blocking fixes afterwards, if any.

 Please help me make sure the release goes smoothly by:
 - Making sure that any unresolved release-blocking issues
 for 2.47.0 have their "Milestone" marked as "2.47.0 Release" as
 soon as possible.
 - Reviewing the current release blockers [2] and removing the Milestone
 if they don't meet the criteria at [3].

 Let me know if you have any comments/objections/questions.

 Thanks,

 Jack McCluskey

 [1]
 https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
 [2] https://github.com/apache/beam/milestone/10
 [3] https://beam.apache.org/contribute/release-blocking/

 --


 Jack McCluskey
 SWE - DataPLS PLAT/ Dataflow ML
 RDU
 jrmcclus...@google.com





Re: [PROPOSAL] Preparing for 2.47.0 Release

2023-04-13 Thread Jack McCluskey via dev
We're making good progress on finding and fixing bugs. We're not quite
ready to build an RC yet, but so far nothing seems to be a
difficult fix.

On Wed, Apr 12, 2023 at 8:10 PM Ahmet Altay  wrote:

> Jack, how is the release coming along?
>
> On Tue, Apr 4, 2023 at 12:23 PM Jack McCluskey via dev <
> dev@beam.apache.org> wrote:
>
>> Hey everyone,
>>
>> I need a PMC member's help adding my pubkey to
>> https://dist.apache.org/repos/dist/release/beam/KEYS as well as adding
>> PyPI user jrmccluskey to the maintainers of the Apache Beam package. These
>> are the last steps I have to do to complete prep for the release.
>>
>> Thanks,
>>
>> Jack McCluskey
>>
>> On Wed, Mar 22, 2023 at 11:38 AM Jack McCluskey 
>> wrote:
>>
>>> Hey all,
>>>
>>> The next (2.47.0) release branch cut is scheduled for April 5th, 2023,
>>> according to
>>> the release calendar [1].
>>>
>>> I will be performing this release. My plan is to cut the branch on that
>>> date, and cherrypick release-blocking fixes afterwards, if any.
>>>
>>> Please help me make sure the release goes smoothly by:
>>> - Making sure that any unresolved release-blocking issues
>>> for 2.47.0 have their "Milestone" marked as "2.47.0 Release" as
>>> soon as possible.
>>> - Reviewing the current release blockers [2] and removing the Milestone
>>> if they don't meet the criteria at [3].
>>>
>>> Let me know if you have any comments/objections/questions.
>>>
>>> Thanks,
>>>
>>> Jack McCluskey
>>>
>>> [1]
>>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>>> [2] https://github.com/apache/beam/milestone/10
>>> [3] https://beam.apache.org/contribute/release-blocking/
>>>
>>> --
>>>
>>>
>>> Jack McCluskey
>>> SWE - DataPLS PLAT/ Dataflow ML
>>> RDU
>>> jrmcclus...@google.com
>>>
>>>
>>>


Beam High Priority Issue Report (27)

2023-04-13 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/26251 [Failing Test]: 
beam_PreCommit_Python_Coverage_Commit is failing due to codecov package removal 
from Pypi
https://github.com/apache/beam/issues/26126 [Failing Test]: 
beam_PostCommit_XVR_Samza permared validatesCrossLanguageRunnerGoUsingJava 
TestDebeziumIO_BasicRead
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21645 
beam_PostCommit_XVR_GoUsingJava_Dataflow fails on some test transforms
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with 
grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId