Re: [VOTE] Release 2.56.0, release candidate #2

2024-05-01 Thread Valentyn Tymofieiev via dev
Thanks, Danny.

+1

Verified Python pipelines on Dataflow, checked that released containers
include fixes for a regression in GRPC IO that affected last release.

On Wed, May 1, 2024 at 11:30 AM Danny McCormick 
wrote:

> I added https://github.com/apache/beam/pull/31038#issuecomment-2088879104
> triaging the issues. The JPMS ones do seem to have been from the Dataflow
> containers not being published; they succeeded on retry. The remaining
> issues generally seem to be flakes or infrastructure issues.
>
> Thanks,
> Danny
>
> On Wed, May 1, 2024 at 1:42 PM Danny McCormick 
> wrote:
>
>> I'll go through and triage/rekick off some Dataflow suites and summarize
>> the results in the PR. The JPMS ones (and possibly others) might be because
>> I hadn't published Dataflow containers yet when I ran them. The log isn't
>> helpful, so I'll try to capture the actual job that failed.
>>
>> Thanks,
>> Danny
>>
>> On Wed, May 1, 2024 at 1:05 PM Valentyn Tymofieiev via dev <
>> dev@beam.apache.org> wrote:
>>
>>> What is the high-level summary of test failures on
>>> https://github.com/apache/beam/pull/31038 - are all issues
>>> preexisting/infra-related/already tracked?
>>>
>>> In particular, I noticed two failures in the Java JPMS test suite, which
>>> I hadn't come across before.
>>>
>>> On Wed, May 1, 2024 at 8:16 AM Ritesh Ghorse via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Ran a few python pipelines on Direct and Dataflow runner
>>>>
>>>> Thanks!
>>>>
>>>> On Wed, May 1, 2024 at 10:00 AM Jeff Kinard via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Validated Dataflow YamlTemplate using Java and Python transforms
>>>>> (xlang) and several Beam YAML tests.
>>>>>
>>>>> Thanks,
>>>>> Jeff
>>>>>
>>>>> On Sat, Apr 27, 2024 at 7:42 AM Danny McCormick via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>> Please review and vote on the release candidate #2 for the version
>>>>>> 2.56.0, as follows:
>>>>>> [ ] +1, Approve the release
>>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>>
>>>>>> Reviewers are encouraged to test their own use cases with the release
>>>>>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>>>>>> count towards the final vote, but votes from all community members are
>>>>>> encouraged and helpful for finding regressions; you can either test your
>>>>>> own use cases [13] or use cases from the validation sheet [10].
>>>>>>
>>>>>> The complete staging area is available for your review, which
>>>>>> includes:
>>>>>> * GitHub Release notes [1],
>>>>>> * the official Apache source release to be deployed to
>>>>>> dist.apache.org [2], which is signed with the key with fingerprint
>>>>>> D20316F712213422 [3],
>>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>>> * source code tag "v2.56.0-RC2" [5],
>>>>>> * website pull request listing the release [6], the blog post [6],
>>>>>> and publishing the API reference manual [7].
>>>>>> * Python artifacts are deployed along with the source release to the
>>>>>> dist.apache.org [2] and PyPI[8].
>>>>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>>>>> * Validation sheet with a tab for 2.56.0 release to help with
>>>>>> validation [10].
>>>>>> * Docker images published to Docker Hub [11].
>>>>>> * PR to run tests against release branch [12].
>>>>>>
>>>>>> The vote will be open for at least 72 hours. It is adopted by
>>>>>> majority approval, with at least 3 PMC affirmative votes.
>>>>>>
>>>>>> For guidelines on how to try the release in your projects, check out
>>>>>> our RC testing guide [13].
>>>>>>
>>>>>> Thanks,
>>>>>> Danny
>>>>>>
>>>>>> [1] https://github.com/apache/beam/milestone/20
>>>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.56.0/
>>>>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>>>> [4]
>>>>>> https://repository.apache.org/content/repositories/orgapachebeam-1377/
>>>>>> [5] https://github.com/apache/beam/tree/v2.56.0-RC2
>>>>>> [6] https://github.com/apache/beam/pull/31094
>>>>>> [7] https://github.com/apache/beam-site/pull/665
>>>>>> [8] https://pypi.org/project/apache-beam/2.56.0rc2/
>>>>>> [9]
>>>>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.56.0-RC2/go/pkg/beam
>>>>>> [10]
>>>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1992402651
>>>>>> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>>>>> [12] https://github.com/apache/beam/pull/31038
>>>>>> [13]
>>>>>> https://github.com/apache/beam/blob/master/contributor-docs/rc-testing-guide.md
>>>>>>
>>>>>
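
The adoption rule quoted in the vote email (majority approval, with at least 3 affirmative PMC votes) can be sketched as a small tally. This is only an illustration: the function name and the (value, is_binding) representation are assumptions, not part of any Beam or ASF tooling.

```python
def release_vote_passes(votes, min_binding_plus_ones=3):
    """Sketch of the rule: only binding (PMC) votes count.

    votes: iterable of (value, is_binding) pairs, where value is +1 or -1.
    """
    binding = [value for value, is_binding in votes if is_binding]
    plus = sum(1 for value in binding if value > 0)
    minus = sum(1 for value in binding if value < 0)
    # Needs enough binding +1s and more binding +1s than binding -1s.
    return plus >= min_binding_plus_ones and plus > minus


# Hypothetical tally: three binding +1s and two non-binding +1s.
example_votes = [(1, True), (1, True), (1, True), (1, False), (1, False)]
print(release_vote_passes(example_votes))
```

Non-binding votes are ignored by the tally, which matches the note that they are still helpful for finding regressions but do not count towards the result.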


Re: [VOTE] Release 2.56.0, release candidate #2

2024-05-01 Thread Valentyn Tymofieiev via dev
What is the high-level summary of test failures on
https://github.com/apache/beam/pull/31038 - are all issues
preexisting/infra-related/already tracked?

In particular, I noticed two failures in the Java JPMS test suite, which I
hadn't come across before.

On Wed, May 1, 2024 at 8:16 AM Ritesh Ghorse via dev 
wrote:

> +1 (non-binding)
>
> Ran a few python pipelines on Direct and Dataflow runner
>
> Thanks!
>
> On Wed, May 1, 2024 at 10:00 AM Jeff Kinard via dev 
> wrote:
>
>> +1 (non-binding)
>>
>> Validated Dataflow YamlTemplate using Java and Python transforms (xlang)
>> and several Beam YAML tests.
>>
>> Thanks,
>> Jeff
>>
>> On Sat, Apr 27, 2024 at 7:42 AM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi everyone,
>>> Please review and vote on the release candidate #2 for the version
>>> 2.56.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>> Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>>> count towards the final vote, but votes from all community members are
>>> encouraged and helpful for finding regressions; you can either test your
>>> own use cases [13] or use cases from the validation sheet [10].
>>>
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with fingerprint D20316F712213422 [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.56.0-RC2" [5],
>>> * website pull request listing the release [6], the blog post [6], and
>>> publishing the API reference manual [7].
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI[8].
>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>> * Validation sheet with a tab for 2.56.0 release to help with validation
>>> [10].
>>> * Docker images published to Docker Hub [11].
>>> * PR to run tests against release branch [12].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> For guidelines on how to try the release in your projects, check out our
>>> RC testing guide [13].
>>>
>>> Thanks,
>>> Danny
>>>
>>> [1] https://github.com/apache/beam/milestone/20
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.56.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1377/
>>> [5] https://github.com/apache/beam/tree/v2.56.0-RC2
>>> [6] https://github.com/apache/beam/pull/31094
>>> [7] https://github.com/apache/beam-site/pull/665
>>> [8] https://pypi.org/project/apache-beam/2.56.0rc2/
>>> [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.56.0-RC2/go/pkg/beam
>>> [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1992402651
>>> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>> [12] https://github.com/apache/beam/pull/31038
>>> [13]
>>> https://github.com/apache/beam/blob/master/contributor-docs/rc-testing-guide.md
>>>
>>


Re: Structured Logging in python

2024-04-15 Thread Valentyn Tymofieiev via dev
Thanks! I opened https://github.com/apache/beam/issues/30978 . Feel free to
self-assign when you have time to work on it.

On Mon, Apr 15, 2024 at 1:43 PM Geddy Schellevis 
wrote:

> The “custom_data” field didn’t work.
> I am happy to help with the implementation of this.
>
> On Mon, Apr 15, 2024 at 10:10 PM Ahmet Altay via dev <
> dev@beam.apache.org> wrote:
>
>> Is there an open github issue for this? Perhaps it would be a good
>> project to implement in Python, using the java version as a
>> reference implementation?
>>
>> On Thu, Apr 11, 2024 at 12:04 PM Udi Meiri  wrote:
>>
>>> Hi,
>>>
>>> I believe this wasn't implemented for Python (only Java). You can try
>>> adding structured data (extra keyword) under the key "custom_data" and that
>>> might work.
>>>
>>> On 2024/04/11 17:49:43 Valentyn Tymofieiev wrote:
>>> > Thanks for reaching out. There was a proposal a while back:
>>> > https://s.apache.org/beam-structured-logging
>>> >
>>> > /cc: @u...@apache.org - do you know the current status?
>>> >
>>> > Thanks a lot!
>>> >
>>> > On Thu, Apr 11, 2024 at 8:29 AM Geddy Schellevis <
>>> geddyschelle...@gmail.com>
>>> > wrote:
>>> >
>>> > > Hi all,
>>> > >
>>> > > I would like to know if it is possible to have structured logging in
>>> > > Dataflow.
>>> > > In the attached file, you can find the code that I am trying to run.
>>> > >
>>> > > I see the logs appearing in the GCP Logs Explorer, but I cannot see
>>> > > the extra fields.
>>> > >
>>> > > Best regards,
>>> > >
>>> >
>>>
>>
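
For readers unfamiliar with the "extra keyword" mentioned in this thread, here is a minimal sketch of how the standard-library logging module attaches extra fields to a log record. It only shows stdlib behaviour: the JsonFormatter class, the logger name, and the "severity" key are assumptions made for illustration, and, as reported above, the "custom_data" key did not surface as structured fields on Dataflow.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Illustrative formatter: renders a record as one JSON object."""

    def format(self, record):
        payload = {
            "severity": record.levelname,
            "message": record.getMessage(),
        }
        # Keys passed via `extra=` become attributes on the LogRecord.
        custom = getattr(record, "custom_data", None)
        if custom is not None:
            payload["custom_data"] = custom
        return json.dumps(payload)


def build_logger(stream):
    """Returns a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("structured_logging_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = build_logger(stream)
log.info("processed element", extra={"custom_data": {"order_id": 42}})
print(stream.getvalue().strip())
```

Whether a runner's log handler forwards such attributes is up to that handler, which is the gap the issue opened above is meant to track.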


Re: [Python SDK] Feedback for deferred side inputs + combiners

2024-04-11 Thread Valentyn Tymofieiev via dev
On Thu, Apr 11, 2024 at 1:00 PM Joey Tran  wrote:

> Thanks for the feedback!
>
> Sounds like it'll be a while before this is supported. Would it make sense
> to formally _not_ support this functionality by raising an early and clear
> exception if someone does try to use a side input with a combiner? As it
> stands now, the exception a user gets is very difficult to understand. My
> motivation for looking into this was a new user I was introducing to Beam
> ran into this issue and their first impression was that Beam had terrible
> error handling / messages.
>

In general, failing early and with a proper message is a good idea. If
something fails 100% of the time later, it is better to fail early.  If we
can't fail early, try to fail with a better error message. If some
functionality fails 90% of the time but works 10% of the time, we'd have to
be more careful to not accidentally introduce a breaking change for some
niche group of users. For combiners with side inputs, IIRC there was a
reference in BEAM-8400 <https://issues.apache.org/jira/browse/BEAM-8400> that
they might work in some cases but not other cases - I haven't verified
that.
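
A minimal sketch of the fail-early idea, assuming the check can be made when the transform is constructed. DeferredSideInput and CombinePerKeySketch are hypothetical stand-ins written for this illustration; they are not Beam classes, and the real check would have to respect the caveat above about cases that may currently work.

```python
class DeferredSideInput:
    """Hypothetical stand-in for a side input whose value is known only at run time."""

    def __init__(self, name):
        self.name = name


class CombinePerKeySketch:
    """Hypothetical wrapper that rejects deferred side inputs at construction time."""

    def __init__(self, combine_fn, *args, **kwargs):
        deferred = [
            arg for arg in list(args) + list(kwargs.values())
            if isinstance(arg, DeferredSideInput)
        ]
        if deferred:
            names = ", ".join(arg.name for arg in deferred)
            # Fail before any pipeline work starts, with an actionable message.
            raise NotImplementedError(
                "Deferred side inputs are not supported with combiners "
                f"(got: {names}). Consider computing the result in a ParDo instead.")
        self.combine_fn = combine_fn
        self.args = args
        self.kwargs = kwargs


try:
    CombinePerKeySketch(sum, DeferredSideInput("scale"))
except NotImplementedError as error:
    print(error)
```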



>
> Best,
> Joey
>
> On Thu, Apr 11, 2024 at 3:52 PM Valentyn Tymofieiev via dev <
> dev@beam.apache.org> wrote:
>
>> I took a look and mentioned the PR to a few folks. A couple of thoughts:
>> - We should avoid adding high-level functionality to Beam that works only
>> in batch.
>> - Supporting windowing/triggers would likely be non-trivial and is worth
>> considering early in the design.
>> - If you'd like to continue working on this, I would suggest starting a
>> document and gradually covering the following (with help/feedback from
>> others on this list):
>>    - the motivating example and available workarounds (if any)
>>    - background on the relevant aspects of the combiner implementation
>>    - the proposed approach and considered alternatives
>>    - any runner-specific considerations
>>
>> Thanks,
>> Valentyn
>>
>> On Fri, Mar 29, 2024 at 5:06 AM Joey Tran 
>> wrote:
>>
>>> I posted a PoC PR [1] for fixing deferred side inputs with combiners in
>>> the python SDK. Would someone be willing to take a look at it?
>>>
>>> I have it working but could use some feedback on where to take it next.
>>> It looks like bundle processor combiner operations don't currently support
>>> side inputs [2], so I added a conditional in `CombinePerKey` that checks
>>> whether it was instantiated with a side input and, if so, uses a
>>> ParDo-based version of the combiner. That way we can piggyback off the Do
>>> operation's implementation of side inputs rather than reimplementing it
>>> for the combiner operation.
>>>
>>> [1] https://github.com/apache/beam/pull/30743
>>> [2]
>>> https://github.com/apache/beam/blob/e3fee5156b3515f96dc5ba90ea2fbc6f6be2bd34/sdks/python/apache_beam/runners/worker/operations.py#L1146
>>>
>>


Re: [Python SDK] Feedback for deferred side inputs + combiners

2024-04-11 Thread Valentyn Tymofieiev via dev
I took a look and mentioned the PR to a few folks. A couple of thoughts:
- We should avoid adding high-level functionality to Beam that works only
in batch.
- Supporting windowing/triggers would likely be non-trivial and is worth
considering early in the design.
- If you'd like to continue working on this, I would suggest starting a
document and gradually covering the following (with help/feedback from
others on this list):
   - the motivating example and available workarounds (if any)
   - background on the relevant aspects of the combiner implementation
   - the proposed approach and considered alternatives
   - any runner-specific considerations

Thanks,
Valentyn

On Fri, Mar 29, 2024 at 5:06 AM Joey Tran  wrote:

> I posted a PoC PR [1] for fixing deferred side inputs with combiners in
> the python SDK. Would someone be willing to take a look at it?
>
> I have it working but could use some feedback on where to take it next. It
> looks like bundle processor combiner operations don't currently support
> side inputs [2], so I added a conditional in `CombinePerKey` that checks
> whether it was instantiated with a side input and, if so, uses a ParDo-based
> version of the combiner. That way we can piggyback off the Do operation's
> implementation of side inputs rather than reimplementing it for the
> combiner operation.
>
> [1] https://github.com/apache/beam/pull/30743
> [2]
> https://github.com/apache/beam/blob/e3fee5156b3515f96dc5ba90ea2fbc6f6be2bd34/sdks/python/apache_beam/runners/worker/operations.py#L1146
>


Re: tox issues in dev container

2024-04-05 Thread Valentyn Tymofieiev via dev
Could you please provide more info about how you create your environment?
Also what OS do you use?

On Fri, Apr 5, 2024 at 2:08 PM Joey Tran  wrote:

> Yeah that was the tox command I was running
>
> On Fri, Apr 5, 2024, 4:37 PM XQ Hu via dev  wrote:
>
>>
>> https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-LintandFormattingChecks
>>
>> This generally works well. Have you checked this?
>>
>> On Fri, Apr 5, 2024 at 4:07 PM Joey Tran 
>> wrote:
>>
>>> I think I might be doing something silly with my environment.
>>>
>>> I'm trying to lint using tox in a dev container, but running tox ends
>>> with this error:
>>> ```
>>> (env)  jtran@[Beam Build Env.]:~/beam {flatmapdefault} ]
>>> $ tox
>>>   File "/usr/lib/python3/dist-packages/tox/reporter.py", line 32, in __init__
>>>     self._reset(**kwargs)
>>>   File "/usr/lib/python3/dist-packages/tox/reporter.py", line 38, in _reset
>>>     self.tw = py.io.TerminalWriter()
>>> AttributeError: module 'py' has no attribute 'io'
>>> ```
>>>
>>> This is preventing me from linting (sorry to everyone on my PRs who keep
>>> seeing linting errors...)
>>>
>>> Any help here would be welcome. I've been struggling generally to get a
>>> stable dev environment working.
>>>
>>> Cheers,
>>> Joey
>>>
>>


Re: [VOTE] Patch Release 2.55.1, release candidate #2

2024-04-03 Thread Valentyn Tymofieiev via dev
Hi Danny,

Thanks for volunteering to do this patch release.

For review convenience, here are the diffs:
  - Diff of release branches:
https://github.com/apache/beam/compare/release-2.55.0...release-2.55.1
  - Diff of tags v2.55.0-RC3 and v2.55.1-RC2:
https://github.com/apache/beam/compare/v2.55.0-RC3...v2.55.1-RC2 . This one
is somewhat misleading: it looks as though there is a change in the version
naming pattern, but upon inspection of gradle.properties for each tag
individually, the pattern is the same and doesn't include dev/SNAPSHOT
suffixes.
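
The per-tag inspection of gradle.properties can be sketched as follows. This assumes the file text has already been obtained for each tag (for example with `git show <tag>:gradle.properties`). The key names and the sample contents below are hypothetical placeholders, not copied from the repository.

```python
def read_versions(properties_text):
    """Returns the entries of a gradle.properties text whose key ends in 'version'."""
    versions = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip().endswith("version"):
            versions[key.strip()] = value.strip()
    return versions


def has_dev_suffix(versions):
    """True if any version value carries a SNAPSHOT or dev suffix."""
    return any(
        value.endswith("-SNAPSHOT") or value.endswith(".dev")
        for value in versions.values())


# Hypothetical file contents, for illustration only.
SAMPLE = """
# release branch
version=2.55.1
sdk_version=2.55.1
"""

versions = read_versions(SAMPLE)
print(versions, has_dev_suffix(versions))
```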

> I put together a patch release per the conversation in
https://lists.apache.org/thread/kvq1wsj505pvopkq186dnvc0l6ryyfh0.

Note that 2.55.1 doesn't fix another known Python SDK issue that was
called out in that thread. That is fine with me; I am just calling out the
difference from the previous discussion.

Also note that the vote email doesn't include a PR running the postsubmit
test suites against the release branch. Given the diff, that's also fine,
since previous test runs didn't detect the breakage, but in general we
should include one for patch releases as well.

+1. Spot-checked some Python SDK artifacts and containers.

On Wed, Apr 3, 2024 at 8:08 AM Danny McCormick via dev 
wrote:

> Hi everyone,
>
> I put together a patch release per the conversation in
> https://lists.apache.org/thread/kvq1wsj505pvopkq186dnvc0l6ryyfh0.
>
> Please review and vote on the release candidate #2 (I messed up rc1) for
> the version 2.55.1, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found. Only PMC member votes will
> count towards the final vote, but votes from all community members are
> encouraged and helpful for finding regressions; you can either test your
> own use cases [9] or use cases from the validation sheet [7].
>
> The complete staging area is available for your review, which includes:
> * the official Apache source release to be deployed to dist.apache.org
> [1], which is signed with the key with fingerprint D20316F712213422 [2],
> * all artifacts to be deployed to the Maven Central Repository [3],
> * source code tag "v2.55.1-RC2" [4],
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [1] and PyPI[5].
> * Go artifacts and documentation are available at pkg.go.dev [6]
> * Validation sheet with a tab for 2.55.1 release to help with validation
> [7].
> * Docker images published to Docker Hub [8].
>
> This release does not include any website changes since it is addressing a
> single bug fix as discussed in
> https://lists.apache.org/thread/kvq1wsj505pvopkq186dnvc0l6ryyfh0.
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> For guidelines on how to try the release in your projects, check out our
> RC testing guide [9].
>
> Thanks,
> Danny
>
> [1] https://dist.apache.org/repos/dist/dev/beam/2.55.1/
> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
> [3] https://repository.apache.org/content/repositories/orgapachebeam-1375/
> [4] https://github.com/apache/beam/tree/v2.55.1-RC2
> [5] https://pypi.org/project/apache-beam/2.55.1rc2/
> [6]
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.55.1-RC2/go/pkg/beam
> [7]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=686075626
> [8] https://hub.docker.com/search?q=apache%2Fbeam&type=image
> [9]
> https://github.com/apache/beam/blob/master/contributor-docs/rc-testing-guide.md
>


Re: Patch release proposal

2024-03-28 Thread Valentyn Tymofieiev via dev
If we do a patch release for the Python SDK, let's also patch another known
issue for which a fix is available:
https://github.com/apache/beam/blob/master/CHANGES.md#known-issues-1

On Thu, Mar 28, 2024 at 8:01 AM Yi Hu via dev  wrote:

> 2.55.0 release manager here
>
> The patch itself [1] is trivial; however, the release process is not.
> There is little documentation of, or precedent for, a patch release
> process. I can imagine two options:
>
> 1. Do a full "2.55.1" release
>
> 2. Do a patch release only for Python SDK, that is
>   a. cherry-pick [1] into release-2.55.0 branch
>   b. tag a 2.55.1rc1 release candidate - note that the source code of
> release candidate (e.g. apache_beam/version.py) still reads 2.55.0. This
> ensures Python SDK picks up the Java expansion service / job server of
> existing version (2.55.0). We did it once for Go SDK (
> https://github.com/apache/beam/tree/sdks/v2.48.2)
>   c. Build the release candidate for Python wheels (also Python
> containers? Not sure if it is needed)
>   d. send out the RC for validation
>   e. finalize the release
>
> If we decide to do a patch release, I would prefer option 2, and I can take
> that on. However, if we decide to do a full release (or both Java and
> Python), I would suggest deferring to the next release cycle, as the
> release process itself could take ~10 days minimum even with a single RC.
>
> Besides, there should be a Beam YAML validation workflow, and it should be
> added to
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1368030253
>
>
> [1] https://github.com/apache/beam/pull/30780
>
> On Thu, Mar 28, 2024 at 10:22 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> +1 on a patch release - we've done a fair amount of work to make
>> releasing easier, and one of my hopes is that it will enable quick patches
>> like this. I'd vote we try to fix the underlying Java piece as well,
>> though; doing a patch release for one language shouldn't be significantly
>> cheaper than doing it for multiple languages.
>>
>> Thanks,
>> Danny
>>
>> On Wed, Mar 27, 2024 at 7:19 PM Robert Burke  wrote:
>>
>>> +1 to a targeted patch release.
>>>
>>> We did the same for the Go SDK a little while back. It would be good to
>>> see what's different for a different SDK.
>>>
>>> On Wed, Mar 27, 2024, 4:01 PM Robert Bradshaw via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Given the severity of the breakage, and the simplicity of the
>>>> workaround, I'm in favor of a patch release. I think we could do
>>>> Python-only, which would make the process even more lightweight.
>>>>
>>>> On Wed, Mar 27, 2024 at 3:48 PM Jeff Kinard
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Beam 2.55 was released with a bug that causes WriteToJson on Beam YAML
>>>>> to fail when using the Java variant. This also affects any user attempting
>>>>> to use the Xlang JsonWriteTransformProvider -
>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/json/src/main/java/org/apache/beam/sdk/io/json/providers/JsonWriteTransformProvider.java
>>>>>
>>>>> This is due to a change to
>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/json/build.gradle
>>>>> that removed a dependency on everit, which also removed it from being
>>>>> packaged into the expansion service JAR:
>>>>> beam-sdks-java-extensions-sql-expansion-service-2.55.0.jar
>>>>>
>>>>> There is a temporary fix to disable the provider in Beam YAML:
>>>>> https://github.com/apache/beam/pull/30777
>>>>>
>>>>> I think with the total loss of function, and a trivial fix, it is
>>>>> worth creating a patch release of Beam 2.55 to include this fix.
>>>>>
>>>>> - Jeff
>>>>>
>>>>>


Re: [VOTE] Release 2.55.0, release candidate #3

2024-03-22 Thread Valentyn Tymofieiev via dev
+1 (binding). Checked some of the released artifacts, release blog, and ran
a couple Python pipelines on Dataflow.


> * GitHub Release notes [1]

Is the link correct? It points to the milestone.

On Fri, Mar 22, 2024 at 1:10 PM Yi Hu via dev  wrote:

> +1 (non-binding)
>
> 1. Checked published Java artifacts
>
> 2. Tested with GCP IO performance benchmark
>
> 3. Tested with Java PostRelease workflow (including QuickstartJavaDirect,
> QuickstartJavaDataflow, QuickstartJavaSpark, QuickstartJavaTwister2,
> QuickstartJavaFlinkLocal, MobileGamingJavaDirect, GamingJavaDataflow,
> MobileGamingJavaDataflowBom) [2]
>
> [1]
> https://github.com/GoogleCloudPlatform/DataflowTemplates/tree/main/it/google-cloud-platform
> [2] https://github.com/apache/beam/pull/30721
>
> On Thu, Mar 21, 2024 at 10:59 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> +1 - validated some ML examples with the interactive runner
>>
>> Thanks,
>> Danny
>>
>> On Thu, Mar 21, 2024 at 9:21 AM Jan Lukavský  wrote:
>>
>>> +1 (binding)
>>>
>>> Tested Java SDK with FlinkRunner.
>>>
>>>  Jan
>>> On 3/20/24 22:40, Chamikara Jayalath via dev wrote:
>>>
>>> +1 (binding)
>>>
>>> Tested multi-lang Java/Python pipelines and upgrading BQ/Kafka
>>> transforms from 2.53.0 to 2.55.0 using the Transform Service.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Tue, Mar 19, 2024 at 2:10 PM XQ Hu via dev 
>>> wrote:
>>>
>>>> +1 (non-binding). Ran the simple ML pipeline without any issue:
>>>> https://github.com/google/dataflow-ml-starter/actions/runs/8349158153
>>>>
>>>> On Tue, Mar 19, 2024 at 11:55 AM Ritesh Ghorse via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> +1 (non-binding) - Ran a few python batch examples on Direct and
>>>>> Dataflow runner.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Tue, Mar 19, 2024 at 10:56 AM Yi Hu via dev
>>>>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>> Please review and vote on the release candidate #3 for the version
>>>>>> 2.55.0, as follows:
>>>>>>
>>>>>> [ ] +1, Approve the release
>>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>>
>>>>>>
>>>>>> Reviewers are encouraged to test their own use cases with the release
>>>>>> candidate, and vote +1 if no issues are found. Only PMC member votes
>>>>>> will count towards the final vote, but votes from all community members
>>>>>> are encouraged and helpful for finding regressions; you can either test
>>>>>> your own use cases [13] or use cases from the validation sheet [10].
>>>>>>
>>>>>> The complete staging area is available for your review, which
>>>>>> includes:
>>>>>> * GitHub Release notes [1],
>>>>>> * the official Apache source release to be deployed to
>>>>>> dist.apache.org [2], which is signed with the key with fingerprint
>>>>>> D20316F712213422 [3],
>>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>>> * source code tag "v2.55.0-RC3" [5],
>>>>>> * website pull request listing the release [6], the blog post [6],
>>>>>> and publishing the API reference manual [7].
>>>>>> * Python artifacts are deployed along with the source release to the
>>>>>> dist.apache.org [2] and PyPI [8].
>>>>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>>>>> * Validation sheet with a tab for 2.55.0 release to help with
>>>>>> validation [10].
>>>>>> * Docker images published to Docker Hub [11].
>>>>>> * PR to run tests against release branch [12].
>>>>>>
>>>>>> The vote will be open for at least 72 hours. It is adopted by
>>>>>> majority approval, with at least 3 PMC affirmative votes.
>>>>>>
>>>>>> For guidelines on how to try the release in your projects, check out
>>>>>> our RC testing guide [13].
>>>>>>
>>>>>> Thanks,
>>>>>> Release Manager
>>>>>>
>>>>>> [1] https://github.com/apache/beam/milestone/19
>>>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.55.0/
>>>>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>>>> [4]
>>>>>> https://repository.apache.org/content/repositories/orgapachebeam-1373/
>>>>>> [5] https://github.com/apache/beam/tree/v2.55.0-RC3
>>>>>> [6] https://github.com/apache/beam/pull/30607
>>>>>> [7] https://github.com/apache/beam-site/pull/661
>>>>>> [8] https://pypi.org/project/apache-beam/2.55.0rc3/
>>>>>> [9]
>>>>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.55.0-RC3/go/pkg/beam
>>>>>> [10]
>>>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1368030253
>>>>>> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>>>>> [12] https://github.com/apache/beam/pull/30569
>>>>>> [13]
>>>>>> https://github.com/apache/beam/blob/master/contributor-docs/rc-testing-guide.md
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Yi Hu, (he/him/his)
>>>>>>
>>>>>> Software Engineer
>>>>>>
>>>>>>
>>>>>>


Re: Python API: FlatMap default -> lambda x:x?

2024-03-21 Thread Valentyn Tymofieiev via dev
That's fair. If we change the default value, we can perhaps add error
handling logic so that (pcoll) | beam.Flatten() fails with an error that
recommends (pcoll) | beam.FlatMap(), instead of saying that the input is
not an iterable.
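
For anyone skimming the thread, the difference between the two operations can be shown with a plain-Python analogy. The functions below are stand-ins that operate on lists, not Beam transforms. The identity default on flat_map mirrors the proposal being discussed here and is not a statement about the current Beam API.

```python
from itertools import chain


def flatten(*collections):
    """Analogy for Flatten: merge several collections into one."""
    return list(chain(*collections))


def flat_map(collection, fn=lambda x: x):
    """Analogy for FlatMap: apply fn to each element and unpack what it returns."""
    return list(chain.from_iterable(fn(element) for element in collection))


print(flatten([1, 2], [3]))     # several collections in, one collection out
print(flat_map([[1, 2], [3]]))  # one collection of iterables in, elements out
```

With the identity default, calling flat_map on a collection of non-iterables such as [1, 2] fails with a TypeError about the element not being iterable. That is the kind of unhelpful message the suggested error handling is meant to improve.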

On Thu, Mar 21, 2024 at 3:41 PM Joey Tran  wrote:

> +1
>
> On Thu, Mar 21, 2024 at 6:30 PM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> I would be more comfortable with a default for FlatMap than overloading
>> Flatten in this way. Distinguishing between
>>
>> (pcoll,) | beam.Flatten()
>>
>> and
>>
>> (pcoll) | beam.Flatten()
>>
>> seems a bit error prone.
>>
>>
>> On Thu, Mar 21, 2024 at 2:23 PM Joey Tran 
>> wrote:
>>
>>> Ah, I misunderstood your original suggestion then. That makes sense
>>> then. I have already seen someone get a little confused about the names and
>>> surprised that Flatten doesn't do what FlatMap does.
>>>
>>> On Thu, Mar 21, 2024 at 5:20 PM Valentyn Tymofieiev 
>>> wrote:
>>>
>>>> Beam throws an error at submission time in Python if you pass a single
>>>> PCollection  to Flatten. The scenario you describe concerns a one-element
>>>> list.
>>>>
>>>> On Thu, Mar 21, 2024, 13:43 Joey Tran 
>>>> wrote:
>>>>
>>>>> I think it'd be quite surprising if beam.Flatten would become
>>>>> equivalent to FlatMap if passed only a single pcollection. One use case
>>>>> that would be broken by that is where someone might be flattening a
>>>>> variable number of pcollections, including possibly only one
>>>>> pcollection. In that case, that single pcollection suddenly gets
>>>>> FlatMapped.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Mar 21, 2024 at 4:36 PM Valentyn Tymofieiev via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> One possible alternative is to define beam.Flatten for a single
>>>>>> collection to be functionally equivalent to beam.FlatMap(lambda x: x),
>>>>>> but that would be a larger change and such behavior might need to be
>>>>>> consistent across SDKs and documented. Adding a default value is a
>>>>>> simpler change.
>>>>>>
>>>>>> I can also confirm that the usage
>>>>>>
>>>>>> |  'Flatten' >> beam.FlatMap(lambda x: x)
>>>>>>
>>>>>> is fairly common by inspecting uses of Beam internally.
>>>>>> On Thu, Mar 21, 2024 at 1:30 PM Robert Bradshaw via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>>
>>>>>>> IIRC, Java has Flatten.iterables() and Flatten.collections(), the
>>>>>>> first of which does what you want.
>>>>>>>
>>>>>>> Giving FlatMap a default arg of lambda x: x is an interesting idea.
>>>>>>> The only downside I see is a less clear error if one forgets to provide
>>>>>>> this (now mandatory) parameter, but maybe that's low enough to be worth
>>>>>>> the convenience?
>>>>>>>
>>>>>>> On Thu, Mar 21, 2024 at 12:02 PM Joey Tran <
>>>>>>> joey.t...@schrodinger.com> wrote:
>>>>>>>
>>>>>>>> That's not really the same thing, is it? `beam.Flatten` combines
>>>>>>>> two or more pcollections into a single pcollection while beam.FlatMap
>>>>>>>> unpacks iterables of elements (i.e. PCollection<Iterable<T>> ->
>>>>>>>> PCollection<T>)
>>>>>>>>
>>>>>>>> On Thu, Mar 21, 2024 at 2:57 PM Valentyn Tymofieiev via dev <
>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi, you can use beam.Flatten() instead.
>>>>>>>>>
>>>>>>>>> On Thu, Mar 21, 2024 at 10:55 AM Joey Tran <
>>>>>>>>> joey.t...@schrodinger.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey all,
>>>>>>>>>>
>>>>>>>>>> Using an identity function for FlatMap comes up more often than
>>>>>>>>>> using FlatMap without an identity function. Would it make sense to
>>>>>>>>>> use the identity function as a default?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>


Re: Python API: FlatMap default -> lambda x:x?

2024-03-21 Thread Valentyn Tymofieiev via dev
Beam throws an error at submission time in Python if you pass a single
PCollection  to Flatten. The scenario you describe concerns a one-element
list.
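
The distinction between a single PCollection and a one-element list (or tuple) comes down to ordinary Python syntax, which this short sketch illustrates. The list named pcoll is only a stand-in for a PCollection.

```python
pcoll = ["a", "b"]     # stand-in for a PCollection

single = (pcoll)       # parentheses only group: this is pcoll itself
one_tuple = (pcoll,)   # the trailing comma makes a one-element tuple
one_list = [pcoll]     # a one-element list

print(single is pcoll)
print(type(one_tuple).__name__, len(one_tuple))
print(type(one_list).__name__, len(one_list))
```

Since only the trailing comma separates the two spellings, a Flatten that behaved differently for each would be easy to misuse.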

On Thu, Mar 21, 2024, 13:43 Joey Tran  wrote:

> I think it'd be quite surprising if beam.Flatten would become equivalent
> to FlatMap if passed only a single pcollection. One use case that would be
> broken from that is cases where someone might be flattening a variable
> number of pcollections, including possibly only one pcollection. In that
> case, that single pcollection suddenly gets FlatMapped.
>
>
>
> On Thu, Mar 21, 2024 at 4:36 PM Valentyn Tymofieiev via dev <
> dev@beam.apache.org> wrote:
>
>> One possible alternative is to define beam.Flatten for a single
>> collection to be functionally equivalent to beam.FlatMap(lambda x: x), but
>> that would be a larger change and such behavior might need to be
>> consistent across SDKs and documented. Adding a default value is a simpler
>> change.
>>
>> I can also confirm that the usage
>>
>> |  'Flatten' >> beam.FlatMap(lambda x: x)
>>
>> is fairly common by inspecting uses of Beam internally.
>> On Thu, Mar 21, 2024 at 1:30 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> IIRC, Java has Flatten.iterables() and Flatten.collections(), the first
>>> of which does what you want.
>>>
>>> Giving FlatMap a default arg of lambda x: x is an interesting idea. The
>>> only downside I see is a less clear error if one forgets to provide this
>>> (now mandatory) parameter, but maybe that's low enough to be worth the
>>> convenience?
>>>
>>> On Thu, Mar 21, 2024 at 12:02 PM Joey Tran 
>>> wrote:
>>>
>>>> That's not really the same thing, is it? `beam.Flatten` combines two or
>>>> more pcollections into a single pcollection while beam.FlatMap unpacks
>>>> iterables of elements (i.e. PCollection<Iterable<T>> -> PCollection<T>)
>>>>
>>>> On Thu, Mar 21, 2024 at 2:57 PM Valentyn Tymofieiev via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> Hi, you can use beam.Flatten() instead.
>>>>>
>>>>> On Thu, Mar 21, 2024 at 10:55 AM Joey Tran 
>>>>> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> Using an identity function for FlatMap comes up more often than using
>>>>>> FlatMap without an identity function. Would it make sense to use the
>>>>>> identity function as a default?
>>>>>>
>>>>>>
>>>>>>
>>>>>>


Re: Python API: FlatMap default -> lambda x:x?

2024-03-21 Thread Valentyn Tymofieiev via dev
One possible alternative is to define beam.Flatten for a single collection
to be functionally equivalent to beam.FlatMap(lambda x: x), but that would
be a larger change and such behavior might need to be consistent across
SDKs and documented. Adding a default value is a simpler change.

I can also confirm that the usage

|  'Flatten' >> beam.FlatMap(lambda x: x)

is fairly common by inspecting uses of Beam internally.
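
A rough plain-Python sketch of what the proposed default would mean, again
with lists standing in for PCollections. `flat_map` and `identity` are
hypothetical names used only for illustration; this is not how Beam
implements FlatMap.

```python
def identity(x):
    return x


def flat_map(pcollection, fn=identity):
    """Model of a FlatMap whose callable defaults to the identity function."""
    return [out for element in pcollection for out in fn(element)]


# With the default, nested elements are unpacked without passing a lambda.
flat = flat_map([[1, 2], [3]])

# The downside mentioned in this thread: if the callable is forgotten and the
# elements are not iterable, the failure surfaces as a generic TypeError
# rather than as a "missing argument" error.
try:
    flat_map([1, 2, 3])
except TypeError as err:
    message = str(err)
```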
On Thu, Mar 21, 2024 at 1:30 PM Robert Bradshaw via dev 
wrote:

> IIRC, Java has Flatten.iterables() and Flatten.collections(), the first of
> which does what you want.
>
> Giving FlatMap a default arg of lambda x: x is an interesting idea. The
> only downside I see is a less clear error if one forgets to provide this
> (now mandatory) parameter, but maybe that's low enough to be worth the
> convenience?
>
> On Thu, Mar 21, 2024 at 12:02 PM Joey Tran 
> wrote:
>
>> That's not really the same thing, is it? `beam.Flatten` combines two or
>> more pcollections into a single pcollection while beam.FlatMap unpacks
>> iterables of elements (i.e. PCollection<Iterable<T>> -> PCollection<T>)
>>
>> On Thu, Mar 21, 2024 at 2:57 PM Valentyn Tymofieiev via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi, you can use beam.Flatten() instead.
>>>
>>> On Thu, Mar 21, 2024 at 10:55 AM Joey Tran 
>>> wrote:
>>>
>>>> Hey all,
>>>>
>>>> Using an identity function for FlatMap comes up more often than using
>>>> FlatMap without an identity function. Would it make sense to use the
>>>> identity function as a default?
>>>>
>>>>
>>>>
>>>>


Re: Python API: FlatMap default -> lambda x:x?

2024-03-21 Thread Valentyn Tymofieiev via dev
Actually, disregard that, Flatten is used in a different context to flatten
multiple collections.

On Thu, Mar 21, 2024 at 11:55 AM Valentyn Tymofieiev 
wrote:

> Hi, you can use beam.Flatten() instead.
>
> On Thu, Mar 21, 2024 at 10:55 AM Joey Tran 
> wrote:
>
>> Hey all,
>>
>> Using an identity function for FlatMap comes up more often than using
>> FlatMap without an identity function. Would it make sense to use the
>> identity function as a default?
>>
>>
>>
>>


Re: Python API: FlatMap default -> lambda x:x?

2024-03-21 Thread Valentyn Tymofieiev via dev
Hi, you can use beam.Flatten() instead.

On Thu, Mar 21, 2024 at 10:55 AM Joey Tran 
wrote:

> Hey all,
>
> Using an identity function for FlatMap comes up more often than using
> FlatMap without an identity function. Would it make sense to use the
> identity function as a default?
>
>
>
>


Re: Update confluent dependencies version in kafka io

2024-03-11 Thread Valentyn Tymofieiev via dev
Welcome to dev@, Maciej. I think as long as an upgrade doesn't cause
breaking changes for the users, there shouldn't be any concerns.

Having a dependency on a 5-year-old library, on the other hand, is a concern.
For the Python SDK, we try to upgrade to new major versions within a year after
they are released.


On Mon, Mar 11, 2024 at 3:09 PM XQ Hu via dev  wrote:

> This sounds great! Feel free to create an issue to track this work!
>
> On Mon, Mar 11, 2024 at 9:36 AM Maciej Szwaja via dev 
> wrote:
>
>> Hi,
>>
>> This is my first email to this list, hello everyone!
>>
>> I have a question regarding the confluent version dependency that beam,
>> or more specifically kafka io extension, is currently using. Tl;dr is that
>> I'd like to update the confluent library version to something more recent,
>> and was wondering if I could just open a PR and maybe create an issue for
>> that in github.
>>
>> More context:
>> Right now the confluent dependency version is set at 5.3.2, which is
>> quite old (5 years or so) - one feature that we recently discovered is
>> missing in this version is the support for configuring schema registry
>> client's ssl configuration - according to the (current) docs (
>> https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html)
>> there's a whole bunch of configuration keys starting with a "ssl." prefix
>> that dictate how a schema registry client might access the schema registry
>> - the issue is that support for those config keys on the client side had
>> only been added starting from version 5.4.0 IIUC (
>> https://github.com/confluentinc/schema-registry/pull/957). Without it,
>> the ssl has to be configured using standard JVM classes such as SSLContext
>> or HttpsURLConnection, which on top of being slightly unintuitive is also
>> undocumented - therefore I'd like to suggest updating the confluent libs
>> version to something more recent. I have already managed to update them to
>> a version as high as 7.6.0 (which is the most recent one, from what I
>> gather), and successfully run tests and build the extension - apart from
>> updating the confluent library version itself I had to additionally update
>> the avro code generation plugin as well as the avro dependency.
>>
>> Let me know what you think.
>>
>> Thanks,
>> Maciej
>>
>


Re: Issue building python SDK with M2 Mac

2024-03-08 Thread Valentyn Tymofieiev via dev
It sounds like the error might be happening while building the Python
wheels: the `-arch` parameter does not seem to be evaluated correctly
for your platform, so its value is omitted. I am not sure what is causing this.
I am also not sure which dependency generates that command line
(distutils/setuptools); perhaps updating it would help. Or maybe you could
try doing it in a Docker container.
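
A small diagnostic sketch, using only the standard library, that may help
narrow this down. It reports what the interpreter believes about the platform
and checks a compiler command line for an `-arch` flag with no value, which is
the pattern visible in the clang error below. The helper names are made up for
this example, and the idea that an empty or malformed `ARCHFLAGS` environment
variable could be involved is an assumption, not something confirmed in this
thread.

```python
import os
import platform
import sysconfig


def dangling_arch_positions(argv):
    """Return indices of '-arch' flags that are not followed by an arch name."""
    bad = []
    for i, token in enumerate(argv):
        if token == "-arch":
            nxt = argv[i + 1] if i + 1 < len(argv) else None
            if nxt is None or nxt.startswith("-"):
                bad.append(i)
    return bad


def build_environment_report():
    """Collect the values that typically feed into the compiler flags."""
    return {
        "machine": platform.machine(),              # e.g. arm64 on Apple silicon
        "platform": sysconfig.get_platform(),
        "cflags": sysconfig.get_config_var("CFLAGS"),
        "archflags_env": os.environ.get("ARCHFLAGS"),
    }


# Shortened stand-ins for the failing command and a healthy one.
failing = ["clang", "-isysroot", "/path/to/sdk", "-arch", "-I/some/include", "-c", "x.c"]
healthy = ["clang", "-arch", "arm64", "-c", "x.c"]

report = build_environment_report()
```

Here `dangling_arch_positions(failing)` flags the fourth token, while the
healthy command yields an empty list. Printing `report` inside the affected
virtualenv shows whether the interpreter itself reports an unexpected
architecture.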

On Fri, Mar 8, 2024 at 10:38 AM XQ Hu via dev  wrote:

> I do not have this problem on my M1: I cloned the repo, used conda to
> create a venv with Python 3.11, and installed it with pip
> install -e ".[gcp,test]". It installs numpy 1.26.4.
>
> On Thu, Mar 7, 2024 at 7:48 AM Joey Tran 
> wrote:
>
>> Hey all,
>>
>> I'm trying to get a beam python SDK dev environment going but I'm a bit
>> stuck. I'm just setting things up with a virtual env as specified in the
>> docs[1], but `pip install -e .[gcp,test]` ends with a clang error:
>>
>> ```
>>   clang -Wsign-compare -Wunreachable-code -fno-common -dynamic
>> -DNDEBUG -g -fwrapv -O3 -Wall -isysroot
>> /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -arch
>> -I/private/var/folders/n1/6qk3ljm97h32j1g7qg5s0prhgq/T/pip-build-env-0q14luhy/overlay/lib/python3.11/site-packages/numpy/core/include
>> -I/Users/jtran/repo/hjtran/beam/sdks/python/env/include
>> -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11
>> -c apache_beam/coders/coder_impl_row_encoders.c -o
>> /var/folders/n1/6qk3ljm97h32j1g7qg5s0prhgq/T/tmpro15g_f6.build-temp/apache_beam/coders/coder_impl_row_encoders.o
>>   clang: error: invalid arch name '-arch
>> -I/private/var/folders/n1/6qk3ljm97h32j1g7qg5s0prhgq/T/pip-build-env-0q14luhy/overlay/lib/python3.11/site-packages/numpy/core/include'
>>   Traceback (most recent call last):
>> File
>> "/private/var/folders/n1/6qk3ljm97h32j1g7qg5s0prhgq/T/pip-build-env-0q14luhy/overlay/lib/python3.11/site-packages/setuptools/_distutils/unixccompiler.py",
>> line 185, in _compile
>>   self.spawn(compiler_so + cc_args + [src, '-o', obj] +
>> extra_postargs)
>> File
>> "/private/var/folders/n1/6qk3ljm97h32j1g7qg5s0prhgq/T/pip-build-env-0q14luhy/overlay/lib/python3.11/site-packages/setuptools/_distutils/ccompiler.py",
>> line 1041, in spawn
>>   spawn(cmd, dry_run=self.dry_run, **kwargs)
>> File
>> "/private/var/folders/n1/6qk3ljm97h32j1g7qg5s0prhgq/T/pip-build-env-0q14luhy/overlay/lib/python3.11/site-packages/setuptools/_distutils/spawn.py",
>> line 70, in spawn
>>   raise DistutilsExecError(
>>   distutils.errors.DistutilsExecError: command '/usr/bin/clang'
>> failed with exit code 1
>>
>>   During handling of the above exception, another exception occurred:
>>
>> ```
>>
>> I'm pretty stumped as to how to go forward.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-VirtualEnvironmentSetup
>>
>


Re: [VOTE] Vendored Dependencies Release

2024-02-14 Thread Valentyn Tymofieiev via dev
+1 (binding)

On Wed, Feb 14, 2024 at 7:52 AM Kenneth Knowles  wrote:

> +1 (binding)
>
> On Wed, Feb 14, 2024 at 10:48 AM Robert Burke  wrote:
>
>> +1 (binding)
>>
>> On Wed, Feb 14, 2024, 7:35 AM Yi Hu via dev  wrote:
>>
>>> +1 (non-binding)
>>>
>>> checked artifact packages not leaking namespace (or under
>>> org.apache.beam.vendor.grpc.v1p60p1) and the tests in
>>> https://github.com/apache/beam/pull/30212
>>>
>>>
>>>
>>>
>>> On Tue, Feb 13, 2024 at 4:29 AM Sam Whittle  wrote:
>>>
 Hi,
 Sorry I missed that close step. Done!
 Sam

 On Mon, Feb 12, 2024 at 8:32 PM Yi Hu via dev 
 wrote:

> Hi,
>
> I am trying to open "
> https://repository.apache.org/content/repositories/orgapachebeam-1369/"
> but get "[id=orgapachebeam-1369] exists but is not exposed." It seems the
> staging repository needs to be closed to have it available to public: [1]
>
> [1]
> https://docs.google.com/document/d/1ztEoyGkqq9ie5riQxRtMuBu3vb6BUO91mSMn1PU0pDA/edit?disco=vHX80XE
>
> On Mon, Feb 12, 2024 at 1:44 PM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (binding)
>>
>> Thanks,
>> Cham
>>
>> On Fri, Feb 9, 2024 at 5:25 AM Sam Whittle 
>> wrote:
>>
>>> Please review the release of the following artifacts that we vendor,
>>> following the process [5]:
>>>
>>>  * beam-vendor-grpc-1-60-1:0.2
>>>
>>> Hi everyone,
>>>
>>> Please review and vote on the release candidate #1 for the version
>>> beam-vendor-grpc-1-60-1:0.2 as follows:
>>>
>>> [ ] +1, Approve the release
>>>
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> The complete staging area is available for your review, which
>>> includes:
>>>
>>> * the official Apache source release to be deployed to
>>> dist.apache.org [1], which is signed with the key with fingerprint
>>> FCFD152811BF1578 [2],
>>>
>>> * all artifacts to be deployed to the Maven Central Repository [3],
>>>
>>> * commit hash "2d08b32e674a1046ba7be0ae5f1e4b7b05b73488" [4].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by
>>> majority approval, with at least 3 PMC affirmative votes.
>>>
>>> Thanks,
>>>
>>> Sam
>>>
>>> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
>>>
>>> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>
>>> [3]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1369/
>>>
>>> [4]
>>> https://github.com/apache/beam/commit/2d08b32e674a1046ba7be0ae5f1e4b7b05b73488
>>>
>>> [5] https://s.apache.org/beam-release-vendored-artifacts
>>>
>>


Re: [ANNOUNCE] New Committer: Svetak Sundhar

2024-02-12 Thread Valentyn Tymofieiev via dev
Congrats, Svetak!

On Mon, Feb 12, 2024 at 11:20 AM Kenneth Knowles  wrote:

> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming a new committer:
> Svetak Sundhar (sve...@apache.org).
>
> Svetak has been with Beam since 2021. Svetak has contributed code to many
> areas of Beam, including notebooks, Beam Quest, dataframes, and IOs. We
> also want to especially highlight the effort Svetak has put into improving
> Beam's documentation, participating in release validation, and evangelizing
> Beam.
>
> Considering his contributions to the project over this timeframe, the Beam
> PMC trusts Svetak with the responsibilities of a Beam committer. [1]
>
> Thank you Svetak! And we are looking to see more of your contributions!
>
> Kenn, on behalf of the Apache Beam PMC
>
> [1]
>
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>


Re: [VOTE] Release 2.54.0, release candidate #2

2024-02-09 Thread Valentyn Tymofieiev via dev
+1.

Checked postcommit test results for Python SDK, and exercised a couple of
Dataflow scenarios.

On Thu, Feb 8, 2024, 14:07 Svetak Sundhar via dev 
wrote:

> +1 (Non-Binding)
>
> Tested with Python SDK on DirectRunner and Dataflow Runner
>
>
> Svetak Sundhar
>
>   Data Engineer
> s vetaksund...@google.com
>
>
>
> On Thu, Feb 8, 2024 at 12:45 PM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (binding)
>>
>> Tried out Java/Python multi-lang jobs and upgrading BQ/Kafka transforms
>> from 2.53.0 to 2.54.0 using the Transform Service.
>>
>> Thanks,
>> Cham
>>
>> On Wed, Feb 7, 2024 at 5:52 PM XQ Hu via dev  wrote:
>>
>>> +1 (non-binding)
>>>
>>> Validated with a simple RunInference Python pipeline:
>>> https://github.com/google/dataflow-ml-starter/actions/runs/7821639833/job/21339032997
>>>
>>> On Wed, Feb 7, 2024 at 7:10 PM Yi Hu via dev 
>>> wrote:
>>>
 +1 (non-binding)

 Validated with Dataflow Template:
 https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/1317

 Regards,

 On Wed, Feb 7, 2024 at 11:18 AM Ritesh Ghorse via dev <
 dev@beam.apache.org> wrote:

> +1 (non-binding)
>
> Ran a few batch and streaming examples for Python SDK on Dataflow
> Runner
>
> Thanks!
>
> On Wed, Feb 7, 2024 at 4:08 AM Jan Lukavský  wrote:
>
>> +1 (binding)
>>
>> Validated Java SDK with Flink runner.
>>
>>  Jan
>> On 2/7/24 06:23, Robert Burke via dev wrote:
>>
>> Hi everyone,
>> Please review and vote on the release candidate #2 for the version
>> 2.54.0,
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if
>> no issues are found. Only PMC member votes will count towards the
>> final
>> vote, but votes from all
>> community members are encouraged and helpful for finding regressions;
>> you
>> can either test your own
>> use cases [13] or use cases from the validation sheet [10].
>>
>> The complete staging area is available for your review, which
>> includes:
>> * GitHub Release notes [1],
>> * the official Apache source release to be deployed to
>> dist.apache.org [2],
>> which is signed with the key with fingerprint D20316F712213422 [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.54.0-RC2" [5],
>> * website pull request listing the release [6], the blog post [6], and
>> publishing the API reference manual [7].
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2] and PyPI[8].
>> * Go artifacts and documentation are available at pkg.go.dev [9]
>> * Validation sheet with a tab for 2.54.0 release to help with
>> validation
>> [10].
>> * Docker images published to Docker Hub [11].
>> * PR to run tests against release branch [12].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> For guidelines on how to try the release in your projects, check out
>> our RC
>> testing guide [13].
>>
>> Thanks,
>> Robert Burke
>> Beam 2.54.0 Release Manager
>>
>> [1] https://github.com/apache/beam/milestone/18?closed=1
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.54.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1368/
>> [5] https://github.com/apache/beam/tree/v2.54.0-RC2
>> [6] https://github.com/apache/beam/pull/30201
>> [7] https://github.com/apache/beam-site/pull/659
>> [8] https://pypi.org/project/apache-beam/2.54.0rc2/
>> [9]
>>
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.54.0-RC2/go/pkg/beam
>> [10]
>>
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=28763708
>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>> [12] https://github.com/apache/beam/pull/30104
>> [13]
>>
>> https://github.com/apache/beam/blob/master/contributor-docs/rc-testing-guide.md
>>
>>


Re: [PROPOSAL] Re-release vendor grpc

2024-02-07 Thread Valentyn Tymofieiev via dev
On Wed, Feb 7, 2024 at 1:34 AM Sam Whittle via dev 
wrote:

> Related to this, could a PMC member add my key to
> https://dist.apache.org/repos/dist/release/beam/KEYS?
>
Done, thanks.

> I've appended it to https://dist.apache.org/repos/dist/dev/beam/KEYS
> Thanks!
> Sam
>
> On Wed, Feb 7, 2024 at 12:15 AM Kenneth Knowles  wrote:
>
>> SGTM. Thanks for doing this!
>>
>> On Tue, Feb 6, 2024 at 5:20 PM Sam Whittle  wrote:
>>
>>> Hi everyone,
>>>
>>> I would like to volunteer to rerelease the Beam vendored grpc 1.60.1.
>>> The grpc version will be unchanged but additional jars
>>> 'io.grpc:grpc-services' and 'io.grpc:grpc-util' will be added due to [1]
>>> addressing [2]
>>>
>>> My plan is to follow the release process [3, 4], which involves
>>> preparing for the release, building a candidate, voting and finalizing the
>>> release. I plan on integrating the vendored artifact
>>> org.apache.beam:beam-vendor-grpc-1_60_1:0.2 into the 2.55.0 release.
>>>
>>> Please let me know if you have any comments/objections/questions.
>>>
>>> Thanks,
>>>
>>> Sam
>>>
>>> [1] https://github.com/apache/beam/pull/30196
>>> [2] https://github.com/apache/beam/issues/24835
>>> [3] https://github.com/apache/beam/tree/master/vendor
>>> [4]
>>> https://docs.google.com/document/d/1ztEoyGkqq9ie5riQxRtMuBu3vb6BUO91mSMn1PU0pDA/edit#heading=h.vhcuqlttpnog
>>>
>>


Fwd: Community over Code EU 2024 Travel Assistance Applications now open!

2024-01-26 Thread Valentyn Tymofieiev via dev
FYI.

---------- Forwarded message ---------

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code EU 2024 are now
open!

TAC will be supporting Community over Code EU, Bratislava, Slovakia,
June 3rd - 5th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, March 1st, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request), this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

When replying, please reply to travel-assista...@apache.org


Re: Google Artifact Registry detects critical vuln CVE-2023-45853 in beam dataflow

2024-01-24 Thread Valentyn Tymofieiev via dev
> Does the beam project generally attempt to address as many of these
> vulnerabilities?

Beam does not retroactively patch released container images, but we use the
latest available docker base images during each Beam release. Many
vulnerabilities concern software packages preinstalled in the Docker base
layer (currently we use Debian bookworm). Such packages are not necessarily
used over the course of running a Beam pipeline, so some attack vectors are
not applicable but of course it would depend on a particular vulnerability.

Note that Beam users can supply custom container images to use in their
pipeline. For example, one can create an image based on 'distroless'
distribution [1], which would significantly reduce the number of
preinstalled packages. For more information on customizing container
images, see [2] [3].

[1] https://github.com/GoogleContainerTools/distroless
[2] https://beam.apache.org/documentation/runtime/environments/
[3] https://cloud.google.com/dataflow/docs/guides/build-container-image

On Tue, Jan 23, 2024 at 1:30 PM 8 Gianfortoni <8...@tokentransit.com> wrote:

> Hi team,
>
> We recently started using the Google Artifact Registry's container
> scanning, and have been able to fix almost all critical vulnerabilities
> across our codebase. The one exception is the docker container created when
> we deploy our dataflow beam jobs.
>
> The "critical" vulnerability reported is
> https://security-tracker.debian.org/tracker/CVE-2023-45853, and we are
> using Apache Beam golang v2.53.0. I cannot tell whether this is something
> that is even easily fixable in the docker setup or whether beam is even
> affected by this issue.
>
> Has anyone else run into this issue? Would a beam dataflow job actually be
> affected or is this more relevant for someone actually running servers on
> this particular version of debian? Should we just be ignoring this
> "critical" vulnerability since it is just in the docker container for a
> couple of batch jobs? Does the beam project generally attempt to address as
> many of these vulnerabilities?
>
> Best,
> 8
> Token Transit
>


Re: Hiding logging for beam playground examples

2023-11-15 Thread Valentyn Tymofieiev via dev
I am also not familiar with Playground. I suspect you could try to make it
crash and maybe find a stacktrace? Setting logging could look like so:
https://github.com/apache/beam/blob/729c4de416b8252ec99f0a1253ac7af3023733df/sdks/python/apache_beam/examples/wordcount.py#L110
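
For reference, a minimal standard-library sketch of the kind of call involved.
As I understand it, the linked example sets the level on the root logger;
raising that level above INFO is one way the runner's INFO records could be
hidden. The helper name `quiet_runner_logs` and the choice of WARNING are
assumptions made for this example, not Playground's actual configuration.

```python
import logging


def quiet_runner_logs(level=logging.WARNING):
    """Raise the root logger threshold so that INFO records are dropped."""
    logging.getLogger().setLevel(level)


quiet_runner_logs()

# Loggers without their own level inherit the root threshold.
logger = logging.getLogger("example")
logger.info("hidden: below the WARNING threshold")
logger.warning("still shown")
```

Output written with plain `print`, as in the Map(print) examples, is not
affected by the logging level.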

On Wed, Nov 15, 2023 at 12:06 PM Joey Tran 
wrote:

> The motivating example does not use LogElements, just Map(print)
>
> https://beam.apache.org/documentation/transforms/python/aggregation/combineglobally/#example-2-combining-with-a-lambda-function
>
> Some examples of the extraneous logging:
> ```
> 2023-09-08 22:46:37,334 [INFO]   populate_data_channel_coders at 0x7ff2665e1a20> 
> 2023-09-08 22:46:37,336 [INFO] Creating state cache with size 104857600
> 2023-09-08 22:46:37,338 [INFO] Created Worker handler
>  object at 0x7ff2664c9870> for environment
> ref_Environment_default_environment_2 (beam:env:embedded_python:v1, b'')
> ```
>
> The example code itself doesn't set the log level in some playground code.
> Does anyone have a pointer to where? I'm not familiar
>
> On Wed, Nov 15, 2023 at 2:10 PM Valentyn Tymofieiev via dev <
> dev@beam.apache.org> wrote:
>
>> Are the examples using LogElements?
>> https://github.com/apache/beam/blob/2012107a0fa2bb3fedf1b5aedcb49445534b2dad/sdks/python/apache_beam/transforms/util.py#L1271
>>
>> Note that LogElements by default prints to stdout, but can be configured
>> to use a different logger. We could also change the default.
>>
>> On Tue, Nov 14, 2023 at 9:48 AM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 to at least setting the log level to higher than info. Some runner
>>> logging (e.g. job started/done) may be useful.
>>>
>>> On Tue, Nov 14, 2023 at 9:37 AM Joey Tran 
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I just had a workshop to demo beam for people at my company and there
>>> was a bit of confusion about whether the beam python playground examples
>>> were even working and it turned out they just got confused by all the
>>> runner logging that is output.
>>> >
>>> > Is this worth keeping? It seems like it'd be a common source of
>>> confusion for new users
>>> >
>>> > Cheers,
>>> > Joey
>>>
>>


Re: Hiding logging for beam playground examples

2023-11-15 Thread Valentyn Tymofieiev via dev
Are the examples using LogElements?
https://github.com/apache/beam/blob/2012107a0fa2bb3fedf1b5aedcb49445534b2dad/sdks/python/apache_beam/transforms/util.py#L1271

Note that LogElements by default prints to stdout, but can be configured to
use a different logger. We could also change the default.

On Tue, Nov 14, 2023 at 9:48 AM Robert Bradshaw via dev 
wrote:

> +1 to at least setting the log level to higher than info. Some runner
> logging (e.g. job started/done) may be useful.
>
> On Tue, Nov 14, 2023 at 9:37 AM Joey Tran 
> wrote:
> >
> > Hi all,
> >
> > I just had a workshop to demo beam for people at my company and there
> was a bit of confusion about whether the beam python playground examples
> were even working and it turned out they just got confused by all the
> runner logging that is output.
> >
> > Is this worth keeping? It seems like it'd be a common source of
> confusion for new users
> >
> > Cheers,
> > Joey
>


Re: [VOTE] Release 2.52.0, release candidate #5

2023-11-14 Thread Valentyn Tymofieiev via dev
+1 (binding).

Tested Python SDK on a batch and a streaming pipeline. Verified that the
memory leak[1] is no longer happening and pyarrow hotfix is applied. Sent
an update to CHANGES.MD to call out both.

Thanks for doing the release and patience with all the RCs.

[1] https://github.com/apache/beam/issues/28246
[2] https://github.com/apache/beam/pull/29435

On Tue, Nov 14, 2023 at 1:27 PM Bruno Volpato via dev 
wrote:

> +1 (non-binding).
>
> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
> (Java SDK 11, Dataflow runner).
>
> Thanks Danny!
>
> On Mon, Nov 13, 2023 at 6:07 PM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> Hi everyone,
>> Please review and vote on the release candidate #5 for the version
>> 2.52.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>> count towards the final vote, but votes from all community members are
>> encouraged and helpful for finding regressions; you can either test your
>> own use cases or use cases from the validation sheet [10].
>>
>> The complete staging area is available for your review, which includes:
>>
>>- GitHub Release notes [1]
>>- the official Apache source release to be deployed to dist.apache.org 
>> [2],
>>which is signed with the key with fingerprint D20316F712213422 [3]
>>- all artifacts to be deployed to the Maven Central Repository [4]
>>- source code tag "v2.52.0-RC5" [5]
>>- website pull request listing the release [6], the blog post [6],
>>and publishing the API reference manual [7]
>>- Python artifacts are deployed along with the source release to the
>>dist.apache.org [2] and PyPI[8].
>>- Go artifacts and documentation are available at pkg.go.dev [9]
>>- Validation sheet with a tab for 2.52.0 release to help with
>>validation [10]
>>- Docker images published to Docker Hub [11]
>>- PR to run tests against release branch [12]
>>
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> For guidelines on how to try the release in your projects, check out our
>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>>
>> Thanks,
>> Danny
>>
>> [1] https://github.com/apache/beam/milestone/16
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1363/
>> [5] https://github.com/apache/beam/tree/v2.52.0-RC5
>> [6] https://github.com/apache/beam/pull/29331
>> [7] https://github.com/apache/beam-site/pull/655
>> [8] https://pypi.org/project/apache-beam/2.52.0rc5/
>> [9]
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC5/go/pkg/beam
>> [10]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>> [12] https://github.com/apache/beam/pull/29418
>>
>


Re: [VOTE] Release 2.52.0, release candidate #3

2023-11-10 Thread Valentyn Tymofieiev via dev
As mentioned in another thread [1], there is a recently detected
vulnerability in pyarrow [2].

It appears to be a concern for Beam users that we can mitigate in the
upcoming release.

We can reassess early next week in case there is a revised assessment for
severity for this vulnerability. In the meantime I went ahead and created
an issue to track remediation in Beam and marked it as a blocker for 2.52.0
[3], and sent a PR to consider for master [4] and the release branch [5].

Thanks,
Valentyn

[1] https://lists.apache.org/thread/cdo18g6g7q1804yp2q5pwf8t7s1td8lv
[2] https://lists.apache.org/thread/yhy7tdfjf9hrl9vfrtzo8p2cyjq87v7n
[3] https://github.com/apache/beam/issues/29392
[4] https://github.com/apache/beam/pull/29396
[5] https://github.com/apache/beam/pull/29402


On Fri, Nov 10, 2023 at 12:56 PM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

> +1 (binding).
>
> Tested multi-lang Java/Python jobs.
>
> Thanks,
> Cham
>
> On Fri, Nov 10, 2023, 12:28 PM Svetak Sundhar via dev 
> wrote:
>
>> +1 Non Binding -- tested Python SDK batch.
>>
>>
>> Svetak Sundhar
>>
>>   Data Engineer
>> s vetaksund...@google.com
>>
>>
>>
>> On Fri, Nov 10, 2023 at 2:58 PM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> > Note: the release guide and blog post say
>>> the RC image has a tag "${RELEASE_VERSION}_rc{RC_NUM}", whereas the actual
>>> tags on Docker Hub are mostly "${RELEASE_VERSION}rc{RC_NUM}" without the
>>> "_" since 2.40.0. If this is the new standard we may want to update all
>>> places where this is stated?
>>>
>>> Yep, we should update! If you put up a PR I'm happy to approve :)
>>> otherwise I can loop it into my post release docs update.
>>>
>>> Thanks,
>>> Danny
>>>
>>> On Fri, Nov 10, 2023 at 2:00 PM Johanna Öjeling via dev <
>>> dev@beam.apache.org> wrote:
>>>
 +1 (non-binding)

 Tested the Go SDK on Dataflow with own use cases.

 Note: the release guide and blog post say
 the RC image has a tag "${RELEASE_VERSION}_rc{RC_NUM}", whereas the actual
 tags on Docker Hub are mostly "${RELEASE_VERSION}rc{RC_NUM}" without the
 "_" since 2.40.0. If this is the new standard we may want to update all
 places where this is stated?

 Johanna

 On Fri, Nov 10, 2023 at 5:56 PM Robert Bradshaw via dev <
 dev@beam.apache.org> wrote:

> +1 (binding)
>
> Artifacts and signatures look good, validated one of the Python wheels
> in a fresh install.
>
> On Fri, Nov 10, 2023 at 7:23 AM Alexey Romanenko
>  wrote:
> >
> > +1 (binding)
> >
> > Java SDK with Spark runner
> >
> > —
> > Alexey
> >
> > On 9 Nov 2023, at 16:44, Ritesh Ghorse via dev 
> wrote:
> >
> > +1 (non-binding)
> >
> > Validated Python SDK quickstart batch and streaming.
> >
> > Thanks!
> >
> > On Thu, Nov 9, 2023 at 9:25 AM Jan Lukavský  wrote:
> >>
> >> +1 (binding)
> >>
> >> Validated Java SDK with Flink runner on own use cases.
> >>
> >>  Jan
> >>
> >> On 11/9/23 03:31, Danny McCormick via dev wrote:
> >>
> >> Hi everyone,
> >> Please review and vote on the release candidate #3 for the version
> >> 2.52.0, as follows:
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific
> >> comments)
> >>
> >>
> >> Reviewers are encouraged to test their own use cases with the
> >> release candidate, and vote +1 if no issues are found. Only PMC member
> >> votes will count towards the final vote, but votes from all community
> >> members are encouraged and helpful for finding regressions; you can either
> >> test your own use cases or use cases from the validation sheet [10].
> >>
> >> The complete staging area is available for your review, which
> >> includes:
> >>
> >> GitHub Release notes [1]
> >> the official Apache source release to be deployed to
> >> dist.apache.org [2], which is signed with the key with fingerprint
> >> D20316F712213422 [3]
> >> all artifacts to be deployed to the Maven Central Repository [4]
> >> source code tag "v2.52.0-RC3" [5]
> >> website pull request listing the release [6], the blog post [6],
> >> and publishing the API reference manual [7]
> >> Python artifacts are deployed along with the source release to the
> >> dist.apache.org [2] and PyPI [8].
> >> Go artifacts and documentation are available at pkg.go.dev [9]
> >> 

Re: [Python SDK] PyArrow Critical Vulnerability

2023-11-10 Thread Valentyn Tymofieiev via dev
From https://pypi.org/project/pyarrow-hotfix/ :

pyarrow_hotfix must be imported in your application or library code for it
to take effect.
Just installing the package is not sufficient:

For Beam users, this means that the pipeline code must import this module
on every worker, for example by adding the import to DoFn.setup, or to the
main session (if the pipeline consists of a single file AND uses the dill
pickler with the --save_main_session flag).
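
As a rough illustration of the DoFn.setup option, here is a minimal sketch.
The names apply_pyarrow_hotfix and PyArrowHotfixMixin are hypothetical
helpers made up for this example, not part of Beam or pyarrow-hotfix; the
only assumption taken from the package page is that importing the
pyarrow_hotfix module is what applies the fix.

```python
import importlib


def apply_pyarrow_hotfix() -> bool:
    """Try to import pyarrow_hotfix; the import itself applies the fix.

    Returns True if the module was imported, False if it is not installed.
    """
    try:
        importlib.import_module("pyarrow_hotfix")
    except ImportError:
        return False
    return True


class PyArrowHotfixMixin:
    """Hypothetical mixin: list it before beam.DoFn in a DoFn's bases."""

    def setup(self):
        # Run the import on the worker, before any element is processed.
        apply_pyarrow_hotfix()
        # Chain to the next setup() in the MRO (e.g. beam.DoFn.setup), if any.
        parent_setup = getattr(super(), "setup", None)
        if callable(parent_setup):
            parent_setup()


# Intended use (needs apache_beam, so it is left as a comment here):
#
#   class ParseRows(PyArrowHotfixMixin, beam.DoFn):
#       def process(self, element):
#           yield element
```

This sketch silently continues when the package is missing from the worker
image. A pipeline that depends on the fix may prefer to raise in that case
so that the missing dependency is noticed.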

We will continue addressing this in
https://github.com/apache/beam/issues/29392.

On Fri, Nov 10, 2023 at 10:23 AM Valentyn Tymofieiev 
wrote:

> Hi Piotr, thanks for bringing this to the list.
>
> There is an FR to support pyarrow
> https://github.com/apache/beam/issues/28410 . I looked into it briefly in
> https://github.com/apache/beam/pull/28437 but saw some test failures and
> it has been on the back burner. Given the news about the vulnerability, it
> would make sense to prioritize this.
>
> I think we could decouple this from the 2.52.0 release since:
>   1) there is a workaround
>   2) new versions of pyarrow haven't been fully tested with Beam
>   3) Beam 2.52.0 fixes some other issues that are known to affect
> users, e.g. https://github.com/apache/beam/issues/28246
>
> From
> https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
> :
>   > If you cannot upgrade to PyArrow 14.0.1, you can use the
>   > pyarrow-hotfix package to disable the vulnerability on older versions of
>   > PyArrow. However, this is not a permanent solution, and you should upgrade
>   > to PyArrow 14.0.1 as soon as possible.
>
> We could consider adding pyarrow-hotfix to the containers for the 2.52.0
> release. CC: @Danny McCormick (release manager).
>
> Beam users can also install this additional dependency via one of the ways
> described in
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/ .
>
>
>
> On Fri, Nov 10, 2023 at 4:42 AM Wiśniowski Piotr <
> contact.wisniowskipi...@gmail.com> wrote:
>
>> Hi,
>>
>> A few days ago this vulnerability was detected:
>>
>> https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
>>
>> I do see that beam 2.51.0 does have `pyarrow<=12.0.0` in requirements.
>>
>> 1. Is there a reason for not allowing newer versions of pyarrow?
>>
>> 2. Is there any planned effort on updating this to `14.0.1`? Is it
>> possible to push the update to `2.52.0` beam release? I know the beam
>> release is almost there.
>>
>> Best
>>
>> Wiśniowski Piotr
>>
>>


Re: [Python SDK] PyArrow Critical Vulnerability

2023-11-10 Thread Valentyn Tymofieiev via dev
Hi Piotr, thanks for bringing this to the list.

There is an FR to support pyarrow https://github.com/apache/beam/issues/28410
. I looked into it briefly in https://github.com/apache/beam/pull/28437 but
saw some test failures and it has been on the back burner. Given the news
about the vulnerability, it would make sense to prioritize this.

I think we could decouple this from the 2.52.0 release since:
  1) there is a workaround
  2) new versions of pyarrow haven't been fully tested with Beam
  3) Beam 2.52.0 fixes some other issues that are known to affect users,
e.g. https://github.com/apache/beam/issues/28246

From
https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
:
  > If you cannot upgrade to PyArrow 14.0.1, you can use the pyarrow-hotfix
  > package to disable the vulnerability on older versions of PyArrow. However,
  > this is not a permanent solution, and you should upgrade to PyArrow 14.0.1
  > as soon as possible.

We could consider adding pyarrow-hotfix to the containers for the 2.52.0
release. CC: @Danny McCormick (release manager).

Beam users can also install this additional dependency via one of the ways
described in
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/ .



On Fri, Nov 10, 2023 at 4:42 AM Wiśniowski Piotr <
contact.wisniowskipi...@gmail.com> wrote:

> Hi,
>
> A few days ago this vulnerability was detected:
>
> https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
>
> I do see that beam 2.51.0 does have `pyarrow<=12.0.0` in requirements.
>
> 1. Is there a reason for not allowing newer versions of pyarrow?
>
> 2. Is there any planned effort on updating this to `14.0.1`? Is it
> possible to push the update to `2.52.0` beam release? I know the beam
> release is almost there.
>
> Best
>
> Wiśniowski Piotr
>
>


Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-06 Thread Valentyn Tymofieiev via dev
> PR to run tests against release branch [12].

https://github.com/apache/beam/pull/28663 is closed and the test signal is
no longer available. Did all the tests pass?

On Fri, Oct 6, 2023 at 5:32 AM Alexey Romanenko 
wrote:

> +1 (binding)
>
> —
> Alexey
>
> > On 5 Oct 2023, at 18:38, Jean-Baptiste Onofré  wrote:
> >
> > +1 (binding)
> >
> > Thanks !
> > Regards
> > JB
> >
> > On Tue, Oct 3, 2023 at 7:58 PM Kenneth Knowles  wrote:
> >>
> >> Hi everyone,
> >>
> >> Please review and vote on the release candidate #1 for the version
> 2.51.0, as follows:
> >>
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found. Only PMC member votes will
> count towards the final vote, but votes from all community members are
> encouraged and helpful for finding regressions; you can either test your
> own use cases or use cases from the validation sheet [10].
> >>
> >> The complete staging area is available for your review, which includes:
> >>
> >> GitHub Release notes [1],
> >> the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint  [3],
> >> all artifacts to be deployed to the Maven Central Repository [4],
> >> source code tag "v1.2.3-RC3" [5],
> >> website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7].
> >> Java artifacts were built with Gradle GRADLE_VERSION and OpenJDK/Oracle
> JDK JDK_VERSION.
> >> Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI[8].
> >> Go artifacts and documentation are available at pkg.go.dev [9]
> >> Validation sheet with a tab for 1.2.3 release to help with validation
> [10].
> >> Docker images published to Docker Hub [11].
> >> PR to run tests against release branch [12].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> >>
> >> For guidelines on how to try the release in your projects, check out
> our blog post at https://beam.apache.org/blog/validate-beam-release/.
> >>
> >> Thanks,
> >> Kenn
> >>
> >> [1] https://github.com/apache/beam/milestone/15
> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.51.0
> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1356/
> >> [5] https://github.com/apache/beam/tree/v2.51.0-RC1
> >> [6] https://github.com/apache/beam/pull/28800
> >> [7] https://github.com/apache/beam-site/pull/649
> >> [8] https://pypi.org/project/apache-beam/2.51.0rc1/
> >> [9]
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.51.0-RC1/go/pkg/beam
> >> [10]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=437054928
> >> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
> >> [12] https://github.com/apache/beam/pull/28663
>
>


Re: [LAZY CONSENSUS] Create separate repository for Swift SDK

2023-09-25 Thread Valentyn Tymofieiev via dev
On Mon, Sep 25, 2023 at 9:03 AM Kenneth Knowles  wrote:

> Hi all,
>
> I propose to unblock Byron's work by creating a new repository for the
> Beam Swift SDK. This will be the first of its kind, and break from
> tradition of having Beam be kind of a mini-mono-repo.
>
> Discussion of the Swift SDK and request for a separate repo is at
> https://lists.apache.org/thread/25tp4yoptqxzty8t4fqznqxc3cwklpss
>

Additional context (since there was a branching between dev and user
threads): https://lists.apache.org/thread/pc0s0953z6z09z597h1rwdskk2y00hmo
. From the first message: *the "Swift Way" would be to have it in its own
repo so that it can easily be used from the Swift Package Manager.*



> I have created this thread to clearly separate this one issue, and clearly
> record if we have consensus (or not).
>
> If no one has an objection or further discussion needed in 72 hours, it
> can be considered approved and I will create the repository. See
> https://community.apache.org/committers/lazyConsensus.html
>
> Kenn
>


Re: Suspected memory leak in Python Pubsub ReadFromPubsub

2023-08-30 Thread Valentyn Tymofieiev via dev
We have identified the leak. https://github.com/apache/beam/issues/28246
has the details and workarounds.

On Mon, Aug 28, 2023 at 9:57 AM Valentyn Tymofieiev 
wrote:

> This appears to be a recent issue also reported by others (e.g.
> https://github.com/apache/beam/issues/28142); it is being actively
> investigated. Therefore, memory fragmentation is unlikely to be the
> cause.
>
> On Tue, Aug 22, 2023 at 5:21 PM Valentyn Tymofieiev 
> wrote:
>
>> Hi, thanks for reaching out.
>>
>> I'd be curious to see whether the memory consumption patterns you observe
>> change if you switch the memory allocator library.
>>
>> For example, you could try to use a custom container, install jemalloc
>> and enable it. See:
>> https://beam.apache.org/documentation/runtime/environments ,
>> https://cloud.google.com/dataflow/docs/guides/using-custom-containers
>>
>> Your Dockerfile might look like the following:
>>
>> FROM apache/beam_python3.10_sdk:2.49.0
>>
>> # Install jemalloc, an alternative memory allocator.
>> RUN apt-get update \
>>   && apt-get install -y libjemalloc-dev
>>
>> ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
>>
>> # Set the entrypoint to the Apache Beam SDK launcher.
>> ENTRYPOINT ["/opt/apache/beam/boot"]
>>
>>
>> On Tue, Aug 22, 2023 at 10:42 AM Cheng Han Lee  wrote:
>>
>>> Hello!
>>>
>>> I'm an avid apache beam user (on Dataflow) and we use beam to stream
>>> blockchain data to various sinks. I recently noticed some memory issues
>>> across all our pipelines but have yet to be able to find the root cause and
>>> was hoping someone on your team might be able to help. If this isn't the
>>> right avenue for it, please let me know how I should reach out.
>>>
>>> The details are here in stackoverflow:
>>>
>>>
>>> https://stackoverflow.com/questions/76950068/memory-leak-in-apache-beam-python-readfrompubsub-io
>>>
>>> Thanks,
>>> Chenghan
>>> CTO | Allium
>>>
>>


Re: Suspected memory leak in Python Pubsub ReadFromPubsub

2023-08-28 Thread Valentyn Tymofieiev via dev
This appears to be a recent issue also reported by others (e.g.
https://github.com/apache/beam/issues/28142); it is being actively
investigated. Therefore, memory fragmentation is unlikely to be the
cause.

On Tue, Aug 22, 2023 at 5:21 PM Valentyn Tymofieiev 
wrote:

> Hi, thanks for reaching out.
>
> I'd be curious to see whether the memory consumption patterns you observe
> change if you switch the memory allocator library.
>
> For example, you could try to use a custom container, install jemalloc and
> enable it. See: https://beam.apache.org/documentation/runtime/environments
> , https://cloud.google.com/dataflow/docs/guides/using-custom-containers
>
> Your Dockerfile might look like the following:
>
> FROM apache/beam_python3.10_sdk:2.49.0
>
> # Install jemalloc, an alternative memory allocator.
> RUN apt-get update \
>   && apt-get install -y libjemalloc-dev
>
> ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
>
> # Set the entrypoint to the Apache Beam SDK launcher.
> ENTRYPOINT ["/opt/apache/beam/boot"]
>
>
> On Tue, Aug 22, 2023 at 10:42 AM Cheng Han Lee  wrote:
>
>> Hello!
>>
>> I'm an avid apache beam user (on Dataflow) and we use beam to stream
>> blockchain data to various sinks. I recently noticed some memory issues
>> across all our pipelines but have yet to be able to find the root cause and
>> was hoping someone on your team might be able to help. If this isn't the
>> right avenue for it, please let me know how I should reach out.
>>
>> The details are here in stackoverflow:
>>
>>
>> https://stackoverflow.com/questions/76950068/memory-leak-in-apache-beam-python-readfrompubsub-io
>>
>> Thanks,
>> Chenghan
>> CTO | Allium
>>
>


Re: [VOTE] Release 2.50.0, release candidate #2

2023-08-25 Thread Valentyn Tymofieiev via dev
+1

Verified that the issue detected in RC0 has been resolved. Successfully ran
a Python pipeline on ARM Dataflow workers.

Noted that Dataflow runner logs became less verbose as a result of
https://github.com/apache/beam/pull/27788. One line that I often pay
attention to no longer appears at the default INFO log level:

```
INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-26T03:45:35.126Z:
JOB_MESSAGE_DETAILED: All workers have finished the startup processes and
began to receive work requests.
```

The Dataflow service can be adjusted to compensate for this (internal
change: http://cl/560265419 ).

On Fri, Aug 25, 2023 at 3:05 PM Bruno Volpato via dev 
wrote:

> +1 (non-binding).
>
> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
> (Java SDK 11, Dataflow runner).
>
> Thanks Robert!
>
> On Thu, Aug 24, 2023 at 7:12 PM Robert Burke  wrote:
>
>> Two minor errata from the previous email:
>>
>> The validation spreadsheet link should be:
>>
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1014811464
>>
>> And the source code tag is: "v2.50.0-RC2"
>>
>> On 2023/08/24 23:09:23 Robert Burke wrote:
>> > Hi everyone,
>> > Please review and vote on the release candidate #2 for the version
>> 2.50.0,
>> > as follows:
>> > [ ] +1, Approve the release
>> > [ ] -1, Do not approve the release (please provide specific comments)
>> >
>> >
>> > Reviewers are encouraged to test their own use cases with the release
>> > candidate, and vote +1 if
>> > no issues are found. Only PMC member votes will count towards the final
>> > vote, but votes from all
>> > community members are encouraged and helpful for finding regressions; you
>> > can either test your own
>> > use cases or use cases from the validation sheet [10].
>> >
>> > Issues noted in RC1 vote proposal [13] have now been resolved.
>> >
>> > The staging area is available for your review, which includes:
>> > * GitHub Release notes [1],
>> > * the official Apache source release to be deployed to dist.apache.org
>> [2],
>> > which is signed with the key with fingerprint 02677FF4371A3756 (
>> > lostl...@apache.org) or D20316F712213422
>> > (GitHub Action automated) [3],
>> > * all artifacts to be deployed to the Maven Central Repository [4],
>> > * source code tag "v2.50.0-RC2" [5],
>> > * website pull request listing the release [6], the blog post [6], and
>> > publishing the API reference manual [7].
>> > * Java artifacts were built with Gradle 7.5.1 and OpenJDK
>> (Temurin)(build
>> > 1.8.0_382-b05).
>> > * Python artifacts are deployed along with the source release to the
>> > dist.apache.org [2] and PyPI[8].
>> > * Go artifacts and documentation are available at pkg.go.dev [9]
>> > * Validation sheet with a tab for 2.50.0 release to help with validation
>> > [10].
>> > * Docker images published to Docker Hub [11].
>> > * PR to run tests against release branch [12].
>> >
>> > The vote will be open for at least 72 hours. It is adopted by majority
>> > approval, with at least 3 PMC affirmative votes.
>> >
>> > For guidelines on how to try the release in your projects, check out our
>> > blog post at https://beam.apache.org/blog/validate-beam-release/.
>> >
>> > Thanks,
>> > Robert Burke
>> > Apache Beam 2.50.0 Release Manager
>> >
>> > [1] https://github.com/apache/beam/milestone/14
>> > [2] https://dist.apache.org/repos/dist/dev/beam/2.50.0/
>> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> > [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1355/
>> > [5] https://github.com/apache/beam/tree/v2.50.0-RC2
>> > [6] https://github.com/apache/beam/pull/28055
>> > [7] https://github.com/apache/beam-site/pull/648
>> > [8] https://pypi.org/project/apache-beam/2.50.0rc2/
>> > [9]
>> >
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.50.0-RC2/go/pkg/beam
>> > [10]
>> >
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1014811464
>> > [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>> > [12] https://github.com/apache/beam/pull/27962
>> > [13] https://lists.apache.org/thread/xgx49zshms7253lfx6d6lsnvwf7tyyfp
>> >
>>
>


Re: Suspected memory leak in Python Pubsub ReadFromPubsub

2023-08-22 Thread Valentyn Tymofieiev via dev
Hi, thanks for reaching out.

I'd be curious to see whether the memory consumption patterns you observe
change if you switch the memory allocator library.

For example, you could try to use a custom container, install jemalloc and
enable it. See: https://beam.apache.org/documentation/runtime/environments
, https://cloud.google.com/dataflow/docs/guides/using-custom-containers

Your Dockerfile might look like the following:

FROM apache/beam_python3.10_sdk:2.49.0

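# Install jemalloc, an alternative memory allocator.
RUN apt-get update \
  && apt-get install -y libjemalloc-dev

# Preload jemalloc into every process started in the container
# (this path applies to x86_64 Debian-based images).
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so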

# Set the entrypoint to the Apache Beam SDK launcher.
ENTRYPOINT ["/opt/apache/beam/boot"]
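
To confirm that the allocator switch actually took effect, a check along
these lines could be run inside the container, for example from
DoFn.setup. This is only a sketch: jemalloc_status is a hypothetical helper
name, and it assumes a Linux worker where /proc/self/maps is readable. It
simply looks for the substring "jemalloc" in LD_PRELOAD and in the list of
libraries mapped into the current process.

```python
import os


def jemalloc_status(environ=None, maps_path="/proc/self/maps"):
    """Return (requested, mapped) for the current process.

    requested: LD_PRELOAD mentions jemalloc.
    mapped:    a jemalloc library appears in the process memory map.
    """
    env = os.environ if environ is None else environ
    requested = "jemalloc" in env.get("LD_PRELOAD", "")
    try:
        with open(maps_path) as maps:
            mapped = any("jemalloc" in line for line in maps)
    except OSError:
        # No /proc on this platform, or the file is not readable.
        mapped = False
    return requested, mapped


if __name__ == "__main__":
    requested, mapped = jemalloc_status()
    print(f"LD_PRELOAD requests jemalloc: {requested}; mapped: {mapped}")
```

If requested is True but mapped is False, the library path in LD_PRELOAD
is probably wrong for the image's architecture.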


On Tue, Aug 22, 2023 at 10:42 AM Cheng Han Lee  wrote:

> Hello!
>
> I'm an avid apache beam user (on Dataflow) and we use beam to stream
> blockchain data to various sinks. I recently noticed some memory issues
> across all our pipelines but have yet to be able to find the root cause and
> was hoping someone on your team might be able to help. If this isn't the
> right avenue for it, please let me know how I should reach out.
>
> The details are here in stackoverflow:
>
>
> https://stackoverflow.com/questions/76950068/memory-leak-in-apache-beam-python-readfrompubsub-io
>
> Thanks,
> Chenghan
> CTO | Allium
>


Re: [VOTE] Release 2.50.0, release candidate #1

2023-08-21 Thread Valentyn Tymofieiev via dev
I tried running a Dataflow Python pipeline on RC1 and got an error:

Pipeline construction environment and pipeline runtime environment are not
compatible. If you use a custom container image, check that the Python
interpreter minor version and the Apache Beam version in your image match
the versions used at pipeline construction time. Submission environment:
beam:version:sdk_base:apache/beam_python3.11_sdk:2.50.0rc1. Runtime
environment: beam:version:sdk_base:apache/beam_python3.11_sdk:2.50.0.
Worker ID: beamapp-valentyn-08220117-08211817-m76c-harness-v38w

Opened https://github.com/apache/beam/issues/28084 to track.
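
To make the mismatch concrete, the two environment strings from the error
can be compared directly. The sketch below uses hypothetical helper names
(sdk_base_tag, environments_match) and assumes the check amounts to an
exact comparison of the two strings; the actual check in Beam or Dataflow
may be implemented differently.

```python
PREFIX = "beam:version:sdk_base:"


def sdk_base_tag(capability: str) -> str:
    """Return the image tag at the end of a 'beam:version:sdk_base:' string."""
    if not capability.startswith(PREFIX):
        raise ValueError(f"not an sdk_base capability: {capability!r}")
    image = capability[len(PREFIX):]
    # The tag is whatever follows the last ':' in the image reference.
    _, _, tag = image.rpartition(":")
    return tag


def environments_match(submission: str, runtime: str) -> bool:
    # Assumption: compatibility means the two strings are identical.
    return submission == runtime


submission = "beam:version:sdk_base:apache/beam_python3.11_sdk:2.50.0rc1"
runtime = "beam:version:sdk_base:apache/beam_python3.11_sdk:2.50.0"

print(sdk_base_tag(submission), sdk_base_tag(runtime))  # prints: 2.50.0rc1 2.50.0
print(environments_match(submission, runtime))  # prints: False
```

Under that assumption, an RC SDK that reports "2.50.0rc1" at submission
time never matches a container that reports "2.50.0", even though both
come from the same release candidate.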


On Mon, Aug 21, 2023 at 10:02 AM Robert Burke  wrote:

> Hi Beamers,
>
> Today I'm working on the aforementioned gaps in this RC blocking.
>
> However, it's still valuable to validate and vote on the remainder of the
> RC in order to ensure a timely 2.50.0 release, and finding whether we'll
> need an RC2 or not.
>
> Robert Burke
> Apache Beam 2.50.0 Release Manager
>
> On 2023/08/18 00:58:00 Robert Burke wrote:
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version
> 2.50.0,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > Reviewers are encouraged to test their own use cases with the release
> > candidate, and vote +1 if
> > no issues are found. Only PMC member votes will count towards the final
> > vote, but votes from all
> > community members are encouraged and helpful for finding regressions; you
> > can either test your own
> > use cases or use cases from the validation sheet [10].
> >
> > Additional notes about this RC:
> >
> > * There were issues in starting Dataflow clones portable containers to
> > Google Container Repository and Google Artifact Registry, so those images
> > may not yet be available at those locations, which may impact starting
> jobs
> > with the RC against Google Cloud Dataflow.
> >   * This may be worked around by explicitly setting the portable
> container
> > to use with the --sdkContainerImage flag for Java, or the
> > --environment_config flag for Python and Go.
> > * Due to an issue with my build environment, there were issues producing
> > two artifacts for this RC.
> >   * The Typescript SDK container has not yet been built or pushed. As an
> > experimental SDK this is not a release blocker. However, one will
> > eventually be published. In the meantime, the 2.49.0 container should be
> > sufficient.
> >   * Due to an issue with my build environment, the PyDocs are not
> currently
> > part of the Documentation PR update.  This will block the final release
> of
> > 2.50.0
> >   * The current plan is to improve the GitHub Actions for releases to
> > be able to provide these artifacts, instead of performing a local fix to
> > my environment, to simplify further releases.
> >
> >
> > The staging area is available for your review, which includes:
> > * GitHub Release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> [2],
> > which is signed with the key with fingerprint 02677FF4371A3756 (
> > lostl...@apache.org)  or D20316F712213422
> > (GitHub Action automated) [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "v2.50.0-RC1" [5],
> > * website pull request listing the release [6], the blog post [6], and
> > publishing the API reference manual [7].
> > * Java artifacts were built with Gradle 7.5.1 and OpenJDK (Temurin)(build
> > 1.8.0_382-b05).
> > * Python artifacts are deployed along with the source release to the
> > dist.apache.org [2] and PyPI[8].
> > * Go artifacts and documentation are available at pkg.go.dev [9]
> > * Validation sheet with a tab for 2.50.0 release to help with validation
> > [10].
> > * Docker images published to Docker Hub [11].
> > * PR to run tests against release branch [12].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > For guidelines on how to try the release in your projects, check out our
> > blog post at https://beam.apache.org/blog/validate-beam-release/.
> >
> > Thanks,
> > Robert Burke
> > Apache Beam 2.50.0 Release Manager
> >
> > [1] https://github.com/apache/beam/milestone/14
> > [2] https://dist.apache.org/repos/dist/dev/beam/2.50.0/
> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1353/
> > [5] https://github.com/apache/beam/tree/v2.50.0-RC1
> > [6] https://github.com/apache/beam/pull/28055
> > [7] https://github.com/apache/beam-site/pull/647
> > [8] https://pypi.org/project/apache-beam/2.50.0rc1/
> > [9]
> >
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.50.0-RC1/go/pkg/beam
> > [10]
> >
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=
> .
> > ..
> > [11] 

Re: [RFC] Bootloader Buffered Logging

2023-08-16 Thread Valentyn Tymofieiev via dev
Thanks, Jack! Left some comments, looking forward to this work!

On Wed, Aug 16, 2023 at 10:31 AM Robert Burke  wrote:

> I've added some comments but generally +1 on this.
>
> A later change might be able to build from this to ensure the various
> STDErr and STDOut logs from the SDK harness executions are always plumbed
> as described.
>
> But that would take more thought since other incidental logs from the
> users worker binary (sic) might be misconstrued as serious when they were
> largely benign noise previously ignored (since they were invisible).
>
> On Wed, Aug 16, 2023, 9:57 AM Jack McCluskey via dev 
> wrote:
>
>> Hey everyone,
>>
>> I've written a small design doc around implementing some buffered logging
>> for the Beam boot.go scripts that is available at
>> https://s.apache.org/beam-buffered-logging. This should help surface
>> errors that occur during worker set-up (like issues with dependency
>> installation) that tend to be logged improperly at INFO.
>>
>> Thanks,
>>
>> Jack McCluskey
>>
>> --
>>
>>
>> Jack McCluskey
>> SWE - DataPLS PLAT/ Dataflow ML
>> RDU
>> jrmcclus...@google.com
>>
>>
>>


Re: [RFC] Model Per Key RunInference

2023-07-27 Thread Valentyn Tymofieiev via dev
Thanks Danny! The narrative is well structured and easy to follow. I
encourage more folks to take a look. I left a couple of comments, mostly
about plans for memory management.

On Thu, Jul 20, 2023 at 7:47 AM Danny McCormick via dev 
wrote:

> Hey everyone! Today, many users have pipelines that choose a single model
> for inference from 100s or 1000s of models based on properties of the data.
> Unfortunately, RunInference does not support this use case. I put
> together a proposal for RunInference that allows a single keyed
> RunInference transform to serve a different model for each key. I'd
> appreciate any thoughts or comments!
>
>
> https://docs.google.com/document/d/1kj3FyWRbJu1KhViX07Z0Gk0MU0842jhYRhI-DMhhcv4/edit?usp=sharing
>
> Thanks,
> Danny
>


Re: [Feature Proposal] Add ARM Support to Beam SDK Container Images

2023-07-18 Thread Valentyn Tymofieiev via dev
Hi Celeste,

Thanks for the proposal and researching the options. Using multi-arch
images seems like a good way to reduce the complexity associated with
correctly selecting the architecture on the runner. It sounds like there
may be implications for the release process, which future release managers
may need to be aware of, and the run time of some test suites might
increase once we build ARM images.

Left a few comments on the doc and happy to help with PR review when it is
ready.

bcc'ing a few folks who might have feedback or to whom this proposal might
be of interest.

Valentyn



On Tue, Jul 18, 2023 at 3:12 PM Celeste Zeng 
wrote:

> Hi everyone,
>
> My name is Celeste. I work for the GCP Dataflow team and I am trying to
> add ARM support to Beam SDK container images. The ultimate goal is to make
> the released Beam SDK container images become multi-arch images, which
> support both x86 and ARM. I compiled the following doc to include the
> feature overview, my proposed implementation plan, as well as testing plan.
> And I appreciate any feedback!
>
>
> https://docs.google.com/document/d/1ikbEJNsFH1D9HqiMqiVyyMlNpDgSqxXK22nUoetzW6I/edit?usp=sharing
>
> Also, please refer to the pull request to see proposed changes:
> https://github.com/apache/beam/pull/27311
>
> Thanks a lot!
>
> Sincerely,
> Celeste Zeng
> celestezen...@gmail.com
>


Re: [VOTE] Release 2.49.0, release candidate #2

2023-07-14 Thread Valentyn Tymofieiev via dev
+1. Tested a few python pipelines on Dataflow Runner V1 and Runner V2.



On Thu, Jul 13, 2023 at 12:54 PM Svetak Sundhar via dev 
wrote:

> +1 (Non-Binding)
>
> Python quickstart Dataflow runner.
>
>
> Svetak Sundhar
>
>   Data Engineer
> s vetaksund...@google.com
>
>
>
> On Thu, Jul 13, 2023 at 5:03 AM Jan Lukavský  wrote:
>
>> +1 (binding)
>>
>> Tested Java SDK with FlinkRunner.
>>
>>  Jan
>> On 7/13/23 02:30, Bruno Volpato via dev wrote:
>>
>> +1 (non-binding).
>>
>> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
>> (Java SDK 11, Dataflow runner).
>>
>> Thanks Yi!
>>
>> On Tue, Jul 11, 2023 at 4:23 PM Yi Hu via dev 
>> wrote:
>>
>>> Hi everyone,
>>> Please review and vote on the release candidate #2 for the version
>>> 2.49.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if
>>> no issues are found. Only PMC member votes will count towards the final
>>> vote, but votes from all
>>> community members are encouraged and helpful for finding regressions; you
>>> can either test your own
>>> use cases or use cases from the validation sheet [10].
>>>
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with
>>> fingerprint either CB6974C8170405CB (y...@apache.org) or
>>> D20316F712213422 (GitHub Action automated) [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.49.0-RC2" [5],
>>> * website pull request listing the release [6], the blog post [6], and
>>> publishing the API reference manual [7].
>>> * Java artifacts were built with Gradle GRADLE_VERSION and
>>> OpenJDK/Oracle JDK JDK_VERSION.
>>>
>> nit: versions were missing.

> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI [8].
>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>> * Validation sheet with a tab for 2.49.0 release to help with validation
>>> [10].
>>> * Docker images published to Docker Hub [11].
>>> * PR to run tests against release branch [12].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> For guidelines on how to try the release in your projects, check out our
>>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>>>
>>> Thanks,
>>> Release Manager
>>>
>>> [1] https://github.com/apache/beam/milestone/13
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.49.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1349/
>>> [5] https://github.com/apache/beam/tree/v2.49.0-RC2
>>> [6] https://github.com/apache/beam/pull/27374 (unchanged since RC1)
>>> [7] https://github.com/apache/beam-site/pull/646  (unchanged since RC1)
>>> [8] https://pypi.org/project/apache-beam/2.49.0rc2/
>>> [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.49.0-RC2/go/pkg/beam
>>> [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=934901728
>>> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>> [12] https://github.com/apache/beam/pull/27307
>>>
>>> --
>>>
>>> Yi Hu, (he/him/his)
>>>
>>> Software Engineer
>>>
>>>
>>>


Re: Best patterns for a polling transform

2023-06-22 Thread Valentyn Tymofieiev via dev
> The below code runs fine with a single worker but with multiple workers
there are duplicate values.
> I’m using TimeDomain.WATERMARK here due to it simply not working when
using REAL_TIME. The docs seem to suggest REAL_TIME would be the way to do
this, however there seems to be no guarantee that a REAL_TIME callback will
run.

It seems that you are using the Python direct runner for experimentation.
Streaming support in the Python direct runner is currently rather limited
(see https://github.com/apache/beam/issues/21987), so it is possible that the
direct runner doesn't implement the streaming semantics correctly. We should
identify whether this is a problem in the SDK or in the DirectRunner
implementation, and file issues accordingly. Streaming direct runner issues
are tracked under that same umbrella issue. I would also experiment with
FlinkRunner or DataflowRunner. Streaming semantics should also be consistent
across SDKs, so different behavior between the Python and Java SDKs would
point to an SDK bug.
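
To compare behavior across runners as suggested above, one option is to keep
the pipeline code fixed and vary only the options. This is a minimal sketch,
assuming the standard --runner and --streaming pipeline flags; the helper name
runner_args is hypothetical, and the Dataflow project, region and bucket
values are placeholders, not real resources.

```python
def runner_args(runner, streaming=True, extra=None):
    """Build the command-line flags for one run of the same pipeline."""
    args = ['--runner={}'.format(runner)]
    if streaming:
        args.append('--streaming')
    # Extra options are emitted in sorted order so runs are reproducible.
    for key, value in sorted((extra or {}).items()):
        args.append('--{}={}'.format(key, value))
    return args


# Placeholder values: replace with your own project, region and bucket.
CANDIDATES = {
    'DirectRunner': {},
    'FlinkRunner': {},
    'DataflowRunner': {
        'project': 'my-project',
        'region': 'us-central1',
        'temp_location': 'gs://my-bucket/tmp',
    },
}

for name, extra in CANDIDATES.items():
    print(name, runner_args(name, extra=extra))
```

Each list can be passed to PipelineOptions(...) when constructing the
pipeline, so the only difference between runs is the runner itself.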


On Thu, Jun 22, 2023 at 10:00 AM Chad Dombrova  wrote:

> I’m also interested in the answer to this.  This is essential for reading
> from many types of data sources.
>
>
> On Tue, Jun 20, 2023 at 2:57 PM Sam Bourne  wrote:
>
>> +dev to see if anyone has any suggestions.
>>
>> On Fri, Jun 16, 2023 at 5:46 PM Sam Bourne  wrote:
>>
>>> Hello beam community!
>>>
>>> I’m having trouble coming up with the best pattern to *eagerly* poll.
>>> By eagerly, I mean that elements should be consumed and yielded as soon as
>>> possible. There are a handful of experiments that I’ve tried and my latest
>>> attempt using the timer API seems quite promising, but is operating in a
>>> way that I find rather unintuitive. My solution was to create a sort of
>>> recursive timer callback - which I found one example of within the beam
>>> test code.
>>>
>>> I have a few questions:
>>>
>>> 1) The below code runs fine with a single worker but with multiple
>>> workers there are duplicate values. It seems that the callback and snapshot
>>> of the state is provided to multiple workers and the number of duplications
>>> increases with the number of workers. Is this due to the values being
>>> provided to timer.set?
>>>
>>> 2) I’m using TimeDomain.WATERMARK here due to it simply not working
>>> when using REAL_TIME. The docs seem to suggest REAL_TIME would be the way
>>> to do this, however there seems to be no guarantee that a REAL_TIME
>>> callback will run. In this sample setting the timer to REAL_TIME will
>>> simply not ever fire the callback. Interestingly, if you call timer.set
>>> with any value less than the current time.time(), then the callback will
>>> run, however it seems to fire immediately regardless of the value (and in
>>> this sample will actually raise an AssertionError).
>>>
>>> I’m happy for suggestions!
>>> -Sam
>>>
>>> import random
>>> import threading
>>>
>>> import apache_beam as beam
>>> import apache_beam.coders as coders
>>> import apache_beam.transforms.combiners as combiners
>>> import apache_beam.transforms.userstate as userstate
>>> import apache_beam.utils.timestamp as timestamp
>>> from apache_beam.options.pipeline_options import PipelineOptions
>>>
>>>
>>> class Log(beam.PTransform):
>>>
>>>     lock = threading.Lock()
>>>
>>>     @classmethod
>>>     def _log(cls, element, label):
>>>         with cls.lock:
>>>             # This just colors the print in terminal
>>>             print('\033[1m\033[92m{}\033[0m : {!r}'.format(label, element))
>>>         return element
>>>
>>>     def expand(self, pcoll):
>>>         return pcoll | beam.Map(self._log, self.label)
>>>
>>>
>>> class EagerProcess(beam.DoFn):
>>>
>>>     BUFFER_STATE = userstate.BagStateSpec('buffer', coders.PickleCoder())
>>>     POLL_TIMER = userstate.TimerSpec('timer', beam.TimeDomain.WATERMARK)
>>>
>>>     def process(
>>>         self,
>>>         element,
>>>         buffer=beam.DoFn.StateParam(BUFFER_STATE),
>>>         timer=beam.DoFn.TimerParam(POLL_TIMER),
>>>     ):
>>>         _, item = element
>>>
>>>         for i in range(item):
>>>             buffer.add(i)
>>>
>>>         timer.set(timestamp.Timestamp.now() + timestamp.Duration(seconds=10))
>>>
>>>     @userstate.on_timer(POLL_TIMER)
>>>     def flush(
>>>         self,
>>>         buffer=beam.DoFn.StateParam(BUFFER_STATE),
>>>         timer=beam.DoFn.TimerParam(POLL_TIMER),
>>>     ):
>>>         cache = buffer.read()
>>>         buffer.clear()
>>>
>>>         requeue = False
>>>         for item in cache:
>>>             if random.random() < 0.1:
>>>                 yield item
>>>             else:

Re: [VOTE] Release 2.47.0, release candidate #3

2023-05-10 Thread Valentyn Tymofieiev via dev
+1.

Checked Python streaming wordcount, Dataflow containers and some
test results running on the RC that I care about.
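
For readers following the tally: the rule quoted below ("majority approval,
with at least 3 PMC affirmative votes") can be sketched as a small function.
This is an illustration under the assumption that only binding (PMC) votes
count toward the result; the function name and the example tally are made up
and do not reflect the votes cast in this thread.

```python
def release_vote_passes(votes):
    """votes: iterable of (value, is_binding) pairs; value is +1, 0 or -1."""
    binding = [value for value, is_binding in votes if is_binding]
    plus = sum(1 for value in binding if value == 1)
    minus = sum(1 for value in binding if value == -1)
    # At least three binding +1 votes, and more binding +1 than -1.
    return plus >= 3 and plus > minus


# Hypothetical tally: three binding +1, one non-binding +1, one non-binding -1.
example = [(1, True), (1, True), (1, True), (1, False), (-1, False)]
print(release_vote_passes(example))
```

Non-binding votes are still valuable as validation signal, they just do not
change the outcome of this check.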

On Wed, May 10, 2023 at 3:22 PM Ritesh Ghorse via dev 
wrote:

> +1 (non-binding)
>
> Validated Go SDK Quickstart on Direct and Dataflow runner
>
> On Wed, May 10, 2023 at 4:23 AM Jan Lukavský  wrote:
>
>> +1 (binding)
>>
>> Tested with Java SDK and FlinkRunner.
>>
>>  Jan
>> On 5/9/23 08:44, Chamikara Jayalath via dev wrote:
>>
>> Verified that new containers are valid. Changing my vote to +1
>>
>> Thanks for fixing this Jack.
>>
>> - Cham
>>
>> On Mon, May 8, 2023 at 2:05 PM Jack McCluskey 
>> wrote:
>>
>>> I've spent the day putting together an environment on a debian bullseye
>>> container to re-build containers with a matching Glibc version. The Java,
>>> Go, Python, and Typescript containers have all been re-built and pushed to
>>> Docker Hub. The underlying code did not change, which fortunately means we
>>> can dodge having to build an RC4 to fix this issue.
>>>
>>> The GCR copy of the Go container has already been updated, while the
>>> Java and Python containers are currently being copied over.
>>>
>>> On Mon, May 8, 2023 at 11:16 AM Robert Bradshaw 
>>> wrote:
>>>
>>>> Thanks for catching this. This does seem severe enough that we need to
>>>> fix it before the release.
>>>>
>>>> On Sat, May 6, 2023 at 10:15 PM Chamikara Jayalath via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> Seems like Python SDK harness containers built for the current RC are
>>>>> broken. Please see https://github.com/apache/beam/issues/26576 for
>>>>> updates.
>>>>>
>>>>> -1 for the current vote due to this.
>>>>>
>>>>> Seems like this can be addressed by reverting
>>>>> https://github.com/apache/beam/pull/26054 and re-building the
>>>>> containers.
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>> On Sat, May 6, 2023 at 8:00 AM Svetak Sundhar <
>>>>> svetaksund...@google.com> wrote:
>>>>>
>>>>>> +1 (Non-Binding)
>>>>>>
>>>>>> I tested Python Quick Start on Dataflow Runner as well
>>>>>>
>>>>>>
>>>>>>
>>>>>> Svetak Sundhar
>>>>>>
>>>>>>   Technical Solutions Engineer, Data
>>>>>> svetaksund...@google.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, May 6, 2023 at 4:44 AM Chamikara Jayalath via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>>
>>>>>>> I'm seeing a regression when running Java x-lang jobs using the RC.
>>>>>>> Created https://github.com/apache/beam/issues/26576.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Cham
>>>>>>>
>>>>>>> On Fri, May 5, 2023 at 11:11 PM Austin Bennett
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1 ( non-binding )
>>>>>>>>
>>>>>>>> On Fri, May 5, 2023 at 10:49 PM Jean-Baptiste Onofré <
>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>
>>>>>>>>> +1 (binding)
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On Fri, May 5, 2023 at 4:52 AM Jack McCluskey via dev <
>>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi everyone,
>>>>>>>>>>
>>>>>>>>>> Please review and vote on the release candidate #3 for the
>>>>>>>>>> version 2.47.0, as follows:
>>>>>>>>>> [ ] +1, Approve the release
>>>>>>>>>> [ ] -1, Do not approve the release (please provide specific
>>>>>>>>>> comments)
>>>>>>>>>>
>>>>>>>>>> Reviewers are encouraged to test their own use cases with the
>>>>>>>>>> release candidate, and vote +1 if no issues are found. *Non-PMC
>>>>>>>>>> members are allowed and encouraged to vote. Please help validate the
>>>>>>>>>> release for your use case!*
>>>>>>>>>>
>>>>>>>>>> The complete staging area is available for your review, which
>>>>>>>>>> includes:
>>>>>>>>>> * GitHub Release notes [1],
>>>>>>>>>> * the official Apache source release to be deployed to
>>>>>>>>>> dist.apache.org [2], which is signed with the key with
>>>>>>>>>> fingerprint DF3CBA4F3F4199F4 [3],
>>>>>>>>>> * all artifacts to be deployed to the Maven Central Repository
>>>>>>>>>> [4],
>>>>>>>>>> * source code tag "v2.47.0-RC3" [5],
>>>>>>>>>> * website pull request listing the release [6], the blog post
>>>>>>>>>> [6], and publishing the API reference manual [7].
>>>>>>>>>> * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle
>>>>>>>>>> JDK 8.0.322.
>>>>>>>>>> * Python artifacts are deployed along with the source release to
>>>>>>>>>> the dist.apache.org [2] and PyPI[8].
>>>>>>>>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>>>>>>>>> * Validation sheet with a tab for 2.47.0 release to help with
>>>>>>>>>> validation [10].
>>>>>>>>>> * Docker images published to Docker Hub [11].
>>>>>>>>>> * PR to run tests against release branch [12].
>>>>>>>>>>
>>>>>>>>>> The vote will be open for at least 72 hours. It is adopted by
>>>>>>>>>> majority approval, with at least 3 PMC affirmative votes.
>>>>>>>>>>
>>>>>>>>>> The GCR copies of the FnAPI containers are rolling out now, they
>>>>>>>>>> should be out within the next 8 hours or so.
>>>>>>>>>>
>>>>>>>>>> For guidelines on how to try the release in your projects, check
>>>>>>>>>> out our blog post at /blog/validate-beam-release/.
>>>>>>>>>>
>>>>>>>>>>

Re: [PROPOSAL] Preparing for 2.48.0 Release

2023-05-09 Thread Valentyn Tymofieiev via dev
> Absent a compelling reason otherwise, my view would be to just stick with
the statement of dropping it as soon as it goes out of support

This is the process we agreed upon last time we discussed the version
support policy on dev@.
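
To make the timing concrete, here is the date arithmetic behind the question,
using only figures quoted in this thread (branch cut on May 17th, a 6-week
cadence, Python 3.7 end of support on June 27th 2023). It is a rough sketch
that assumes the cadence holds exactly; the release calendar remains the
source of truth, and the function name is illustrative.

```python
from datetime import date, timedelta

CUT_2_48 = date(2023, 5, 17)    # 2.48.0 branch cut, per the release calendar
CADENCE = timedelta(weeks=6)    # release branch cut cadence
PY37_EOL = date(2023, 6, 27)    # Python 3.7 end of support


def first_cut_on_or_after(start, cadence, deadline):
    """Return the first cut date that is not earlier than the deadline."""
    cut = start
    while cut < deadline:
        cut += cadence
    return cut


next_cut = CUT_2_48 + CADENCE
print(next_cut)                                             # 2023-06-28
print(first_cut_on_or_after(CUT_2_48, CADENCE, PY37_EOL))   # 2023-06-28
```

Under these assumptions the cut after 2.48.0 lands one day after the
end-of-support date, which is consistent with dropping 3.7 "as soon as it
goes out of support, but not sooner".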

On Fri, May 5, 2023 at 6:18 PM Robert Bradshaw via dev 
wrote:

> On Fri, May 5, 2023 at 6:27 AM Anand Inguva via dev 
> wrote:
> >
> > >> Is there a significant gain in dropping 3.7 support before the cut?
> >
> > No, I think it is just a matter of how soon we want to do it.
>
> Absent a compelling reason otherwise, my view would be to just stick
> with the statement of dropping it as soon as it goes out of support,
> but not sooner. Yeah, it's only a matter of a couple of weeks, but the
> subsequent release isn't far behind.
>
> > On Thu, May 4, 2023 at 12:11 PM Ritesh Ghorse 
> wrote:
> >>
> >> +1 to get target for 2.48.0
> >>
> >> On Thu, May 4, 2023 at 11:33 AM Jack McCluskey via dev <
> dev@beam.apache.org> wrote:
> >>>
> >>> I'd suggest shooting for 2.48.0 so we're ahead of the end-of-support
> date. We're also supporting 5 different Python versions in 2.47.0, it's
> probably for the best to try and pare that down.
> >>>
> >>> On Thu, May 4, 2023 at 11:25 AM Anand Inguva via dev <
> dev@beam.apache.org> wrote:
> >>>>
> >>>> Thanks Ritesh!!
> >>>>
> >>>> Python 3.7 support is going to end on June 27th 2023. Beam 2.48.0 may
> get released ~1-2 weeks earlier of that date.
> >>>>
> >>>> My question here is should we target 2.48.0 or 2.49.0 to stop
> supporting Python 3.7 for beam?
> >>>>
> >>>> Thanks,
> >>>> Anand
> >>>>
> >>>> On Wed, May 3, 2023 at 10:25 PM Jeff Zhang  wrote:
> >>>>>
> >>>>> Thank you!  @Ahmet Altay
> >>>>>
> >>>>> On Thu, May 4, 2023 at 10:17 AM Ahmet Altay
> wrote:
> >>>>>>
> >>>>>> It is every 6 weeks. There is also a published calendar for release
> branch cut dates:
> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com=America/Los_Angeles
> >>>>>>
> >>>>>> On Wed, May 3, 2023 at 10:13 PM Jeff Zhang
> wrote:
> >>>>>>>
> >>>>>>> I just saw another thread about the vote of 2.47.0 release, just
> curious to know what is beam's release cadence, is it monthly?
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, May 4, 2023 at 1:58 AM Kenneth Knowles
> wrote:
> >>>>>>>>
> >>>>>>>> Excellent, thank you!
> >>>>>>>>
> >>>>>>>> On Wed, May 3, 2023 at 7:21 AM Ahmet Altay via dev <
> dev@beam.apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>> Thank you Ritesh!
> >>>>>>>>>
> >>>>>>>>> On Wed, May 3, 2023 at 10:00 AM Ritesh Ghorse via dev <
> dev@beam.apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hey everyone,
> >>>>>>>>>>
> >>>>>>>>>> The next release branch 2.48.0 cut is scheduled for May 17th,
> according to
> >>>>>>>>>> the release calendar [1].
> >>>>>>>>>>
> >>>>>>>>>> I would like to volunteer myself to do this release. I'll cut
> the branch on the scheduled date, and cherrypick release-blocking fixes
> later.
> >>>>>>>>>>
> >>>>>>>>>> Please help me make sure the release goes smoothly by:
> >>>>>>>>>> - Making sure that any unresolved release blocking issues for
> 2.48.0 have their "Milestone" marked as "2.48.0 Release".
> >>>>>>>>>> - Reviewing the current release blockers [2] and remove the
> Milestone if they don't meet the criteria at [3].
> >>>>>>>>>>
> >>>>>>>>>> [1]
> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com
> >>>>>>>>>> [2] https://github.com/apache/beam/milestone/12
> >>>>>>>>>> [3] https://beam.apache.org/contribute/release-blocking/
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Regards,
> >>>>>>>>>> Ritesh Ghorse
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best Regards
> >>>>>>>
> >>>>>>> Jeff Zhang
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best Regards
> >>>>>
> >>>>> Jeff Zhang
>


Re: [DISCUSS] Dependency management in Apache Beam Python SDK

2023-05-02 Thread Valentyn Tymofieiev via dev
Hi All,

just wanted to give a quick update on the effort discussed here:

The action items from the retrospective are tracked in
https://github.com/apache/beam/issues/25652.

Many outdated dependencies were updated in
https://github.com/apache/beam/pull/24599 by +Anand Inguva. The remaining
older libraries (dill, apitools, pandas) are non-trivial, well known, and
tracked separately, and they have also seen progress: in particular,
@Bjorn Pedersen is working on removing aspects of the apitools dependency,
and I took a stab at updating dill. We are starting to actively test Beam
against pre-released versions of our dependencies (you may have seen threads
from Anand about it), and I wrote some guidelines for Python SDK maintainers
on dependency management in
https://docs.google.com/document/d/1euZogGjbW4VZNJMFrA5AL1keR5gZO5l45H9b9CoQ0SI/edit,
which I plan to merge into the Beam website and/or wiki. Feel free to take a
look, especially if you are committing code to the Python SDK.

Once again thanks to everyone who provided feedback so far.

Valentyn

On Fri, Aug 26, 2022 at 3:40 PM Kerry Donny-Clark 
wrote:

> Jarek, I really appreciate you sharing your experience and expertise here.
> I think Beam would benefit from adopting some of these practices.
> Kerry
>
> On Fri, Aug 26, 2022, 7:35 AM Jarek Potiuk  wrote:
>
>>
>>> I'm curious Jarek, does Airflow take any dependencies on popular
>>> libraries like pandas, numpy, pyarrow, scipy, etc... which users are likely
>>> to have their own dependency on? I think these dependencies are challenging
>>> in a different way than the client libraries - ideally we would support a
>>> wide version range so as not to require users to upgrade those libraries in
>>> lockstep with Beam. However in some cases our dependency is pretty tight
>>> (e.g. the DataFrame API's dependency on pandas), so we need to make sure to
>>> explicitly test with multiple different versions. Does Airflow have any
>>> similar issues?
>>>
>>
>> Yes we do (all of those I think :) ). Complete set of all our deps can be
>> found here
>> https://github.com/apache/airflow/blob/constraints-main/constraints-3.9.txt
>> (continuously updated and we have different sets for different python
>> versions).
>>
>> We took a rather interesting and unusual approach (more details in my
>> talk) - mainly because Airflow is both an application to install (for
>> users) and library to use (for DAG authors) and both have contradicting
>> expectations (installation stability versus flexibility in
>> upgrading/downgrading dependencies). Our approach is really smart in making
>> sure water and fire play well with each other.
>>
>> Most of those dependencies are coming from optional extras (list of all
>> extras here:
>> https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html).
>> More often than not the "problematic" dependencies you mention are
>> transitive dependencies through some client libraries we use (for example
>> Apache Beam SDK is a big contributor to those :).
>>
>> Airflow "core" itself has far less dependencies
>> https://github.com/apache/airflow/blob/constraints-main/constraints-no-providers-3.9.txt
>> (175 currently) and we actively made sure that all "pandas" of this world
>> are only optional extra deps.
>>
>> Now - the interesting thing is that we use "constraints" (the links with
>> dependencies that I posted are those constraints) to pin versions of
>> the dependencies that are "golden" - i.e. we test those continuously in our
>> CI and we automatically upgrade the constraints when all the unit and
>> integration tests pass.
>> There is a little bit of complexity and sometimes conflicts to handle (as
>> `pip` has to find the right set of deps that will work for all our optional
>> extras), but eventually we have really one "golden" set of constraints at
>> any moment in time main (or v2-x branch - we have a separate set for each
>> branch) that we are dealing with. And this is the only "set" of dependency
>> versions that Airflow gets tested with. Note - these are *constraints *not
>> *requirements *- that makes a whole world of difference.
>>
>> Then when we release airflow, we "freeze" the constraints with the
>> version tag. We know they work because all our tests pass with them in CI.
>>
>> Then we communicate to our users (and we use it in our Docker image) that
>> the only "supported" way of installing airflow is with using `pip` and
>> constraints
>> https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html.
>> And we do not support poetry, pipenv - we leave it up to users to handle
>> them (until poetry/pipenv will support constraints - which we are waiting
>> for and there is an issue where I explained  why it is useful). It looks
>> like that `pip install "apache-airflow==2.3.4" --constraint "
>> https://raw.githubusercontent.com/apache/airflow/constraints-2.3.4/constraints-3.9.txt"`
>> (different 
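
The constraints-based install described in the quoted message can be sketched
as a small helper that builds the pip command for a given Airflow and Python
version. The URL pattern and the --constraint flag are taken from the example
above; the function names are hypothetical, and whether a constraints file
exists for a particular version pair is not checked here.

```python
def constraints_url(airflow_version, python_version):
    """URL of the frozen constraints file for one Airflow/Python pair."""
    return (
        'https://raw.githubusercontent.com/apache/airflow/'
        'constraints-{}/constraints-{}.txt'.format(airflow_version, python_version)
    )


def pip_install_command(airflow_version, python_version):
    """Argument list for a pinned, constraints-based install."""
    return [
        'pip', 'install',
        'apache-airflow=={}'.format(airflow_version),
        '--constraint', constraints_url(airflow_version, python_version),
    ]


print(' '.join(pip_install_command('2.3.4', '3.9')))
```

Because constraints only cap versions of packages that end up being
installed, the same file works regardless of which optional extras a user
selects, which is the difference from requirements that the message stresses.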

Re: [VOTE] Release 2.47.0, release candidate #1

2023-04-26 Thread Valentyn Tymofieiev via dev
Thanks, Jack!

re [12]:

I am seeing some test errors - have they been investigated?
Also, did all test suites run? I think I am not seeing output of some of
the suites, like

Run Python Dataflow V2 ValidatesRunner



On Wed, Apr 26, 2023 at 9:14 PM Jack McCluskey via dev 
wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 2.47.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found.
>
> The complete staging area is available for your review, which includes:
> * GitHub Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint DF3CBA4F3F4199F4 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.47.0-RC1" [5],
> * website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle JDK
> 8.0.322.
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI[8].
> * Go artifacts and documentation are available at pkg.go.dev [9]
> * Validation sheet with a tab for 2.47.0 release to help with validation
> [10].
> * Docker images published to Docker Hub [11].
> * PR to run tests against release branch [12].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> For guidelines on how to try the release in your projects, check out our
> blog post at /blog/validate-beam-release/.
>
> *Note: Dataflow containers for Java are still being finalized. I will
> follow up once that is completed; however, this should not block validation
> for other SDKs and runners. *
>
> Thanks,
>
> Jack McCluskey
>
> [1] https://github.com/apache/beam/milestone/10
> [2] https://dist.apache.org/repos/dist/dev/beam/2.47.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1309/
> [5] https://github.com/apache/beam/tree/v2.47.0-RC1
> [6] https://github.com/apache/beam/pull/26439
> [7] https://github.com/apache/beam-site/pull/644
> [8] https://pypi.org/project/apache-beam/2.47.0rc1/
> [9]
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.47.0-RC1/go/pkg/beam
> [10]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=.
> ..
> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
> [12] https://github.com/apache/beam/pull/26152
>


>
> --
>
>
> Jack McCluskey
> SWE - DataPLS PLAT/ Dataflow ML
> RDU
> jrmcclus...@google.com
>
>
>


Re: [ANNOUNCE] New committer: Anand Inguva

2023-04-21 Thread Valentyn Tymofieiev via dev
Congratulations!

On Fri, Apr 21, 2023 at 8:19 PM Jan Lukavský  wrote:

> Congrats Anand!
> On 4/21/23 20:05, Robert Burke wrote:
>
> Congratulations Anand!
>
> On Fri, Apr 21, 2023, 10:55 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> Woohoo, congrats Anand! This is very well deserved!
>>
>> On Fri, Apr 21, 2023 at 1:54 PM Chamikara Jayalath 
>> wrote:
>>
>>> Hi all,
>>>
>>> Please join me and the rest of the Beam PMC in welcoming a new committer:
>>> Anand Inguva (ananding...@apache.org)
>>>
>>> Anand has been contributing to Apache Beam for more than a year and
>>> authored and reviewed more than 100 PRs. Anand has been a core contributor
>>> to Beam Python SDK and drove the efforts to support Python 3.10 and Python
>>> 3.11.
>>>
>>> Considering their contributions to the project over this timeframe, the
>>> Beam PMC trusts Anand with the responsibilities of a Beam committer. [1]
>>>
>>> Thank you Anand! And we are looking to see more of your contributions!
>>>
>>> Cham, on behalf of the Apache Beam PMC
>>>
>>> [1]
>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>
>>


Re: [Python SDK] Use pre-released dependencies for Beam python unit testing

2023-04-12 Thread Valentyn Tymofieiev via dev
I think a case-in-point dependency that would benefit from this testing is
grpcio, which publishes pre-releases, has broken us before, and has had
multiple released versions yanked: https://pypi.org/project/grpcio/#history .

We can look at how grpcio affected Beam previously. A couple of issues:

- https://github.com/grpc/grpc/issues/30446 -- affected XLang tests
- https://github.com/apache/beam/issues/23734 -- affected MacOS suites
- https://github.com/apache/beam/issues/22159 -- (not detected by us, but
potentially could have affected a performance test).

I'm afraid a dedicated suite may not give us the test coverage we need to
catch regressions at the RC stage.
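
To illustrate what installing with --pre changes, here is a simplified sketch
of version selection with and without pre-releases, and with yanked versions
excluded. It is not pip's resolver: the parser handles only plain X.Y.Z
versions with an optional a/b/rc suffix, and the version numbers in the
example are made up rather than taken from grpcio's history.

```python
import re

_VERSION = re.compile(r'^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$')
_PRE_ORDER = {'a': 0, 'b': 1, 'rc': 2}
_FINAL = 3  # final releases sort after any pre-release of the same version


def parse(version):
    """Return a sortable key: (release tuple, (stage, number))."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError('unsupported version: {!r}'.format(version))
    release = tuple(int(part) for part in match.group(1).split('.'))
    if match.group(2) is None:
        return release, (_FINAL, 0)
    return release, (_PRE_ORDER[match.group(2)], int(match.group(3)))


def is_prerelease(version):
    return parse(version)[1][0] < _FINAL


def pick(versions, allow_pre=False, yanked=()):
    """Pick the newest candidate, skipping yanked and (optionally) pre-releases."""
    candidates = [
        v for v in versions
        if v not in yanked and (allow_pre or not is_prerelease(v))
    ]
    return max(candidates, key=parse) if candidates else None


# Hypothetical release history for some dependency.
history = ['1.50.0', '1.51.0rc1', '1.51.0', '1.52.0rc1']
print(pick(history))                     # stable-only selection
print(pick(history, allow_pre=True))     # what a --pre style install would see
print(pick(history, yanked={'1.51.0'}))  # falls back when a release is yanked
```

The point for test coverage is that the pre-release suite exercises a version
the stable suite will not see until it is published as final.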

On Wed, Apr 12, 2023 at 10:19 AM Yi Hu via dev  wrote:

> Thanks Anand,
>
> This would be very helpful to avoid experiencing multiple time (
> https://s.apache.org/beam-python-dependencies-pm). One thing to note is
> that Beam Jenkins CI is experiencing many issues recently, mostly due to
> that multiple Jenkins plugins does not scale (draining GitHub API call
> limit; disk usage, etc) so more PreCommit may add more pressures to Jenkins
> if going ahead with Option 1. As we have started GitHub Action migration,
> is it considered to add these new tests to GitHub Action?
>
> Best,
> Yi
>
> On Wed, Apr 12, 2023 at 10:46 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> Thanks for doing this Anand, I'm +1 on option 1 as well - I think having
>> the clear signal of the normal suite succeeding and the prerelease one
>> failing would be helpful and there shouldn't be too much additional code
>> necessary. That makes it really easy to treat the prerelease suite as a (at
>> least temporary) signal on needing upper bounds on our dependencies.
>>
>> Thanks,
>> Danny
>>
>> On Wed, Apr 12, 2023 at 12:36 AM Anand Inguva via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi all,
>>>
>>> For Apache Beam Python we are considering using pre-released
>>> dependencies for unit testing by using the --pre flag to install
>>> pre-released dependencies of packages.
>>>
>>> We believe that using pre-released dependencies may help us to identify
>>> and resolve bugs more quickly, and to take advantage of new features or bug
>>> fixes that are not yet available in stable releases. However, we also
>>> understand that using pre-released dependencies may introduce new risks and
>>> challenges, including potential code duplication and stability issues.
>>>
>>> Before proceeding, we wanted to get your feedback on this approach.
>>>
>>> 1. Create a new PreCommit test suite and a PostCommit test suite that
>>> runs tests by installing pre-released dependencies.
>>>
>>> Pros:
>>>
>>>- stable and pre-released test suites are separate and it will be
>>>easier to debug if the pre-released test suite fails.
>>>
>>> Cons:
>>>
>>>- More test infra code to maintain. More tests to monitor.
>>>
>>>
>>> 2. Make use of the current PreCommit and PostCommit test suite and
>>> modify it so that it installs pre-released dependencies.
>>>
>>> Pros:
>>>
>>>- Less infra code and less tests to monitor.
>>>
>>> Cons:
>>>
>>>- Leads to noisy test signals if the pre-release candidate is
>>>unstable.
>>>
>>> I am in favor of approach 1 since this approach would ensure that any
>>> issues encountered during pre-release testing do not impact the stable
>>> release environment, and vice versa.
>>>
>>> If you have experience or done any testing work using pre-released
>>> dependencies, please let me know if you took any different approaches. It
>>> will be really helpful.
>>>
>>> Thanks,
>>> Anand
>>>
>>


Re: [Python SDK] Use pre-released dependencies for Beam python unit testing

2023-04-12 Thread Valentyn Tymofieiev via dev
2. Make use of the current PreCommit and PostCommit test suite and modify
it so that it installs pre-released dependencies.

> Leads to noisy test signals if the pre-release candidate is unstable.

I am in favor of option 2 since it's a simple solution that is easy to
implement and try out. The disadvantage rests on an assumption
that pre-released candidates would be unstable, which may not be the case.
We could try this and pivot if we find this creates too much noise. @Jarek
Potiuk - curious, from your experience with Airflow
dependency management and testing, which option do you use (if you have a
similar scenario)?
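
For comparison, the signal that separate suites (option 1) would provide can
be written as a small decision table. This is only a sketch of the reasoning
in the quoted replies; the labels are my own reading, not an agreed triage
policy, and the function name is hypothetical. With option 2 only the
pre-release result exists, so the first two rows cannot be told apart from
the last two without rerunning against stable dependencies.

```python
def classify(stable_passed, prerelease_passed):
    """Interpret the pair of results from a stable and a pre-release suite."""
    if stable_passed and prerelease_passed:
        return 'ok'
    if stable_passed and not prerelease_passed:
        # Only the upcoming dependency versions break us.
        return 'suspect upcoming dependency release'
    if not stable_passed and prerelease_passed:
        return 'stable-only failure (possibly fixed upstream, or a flake)'
    return 'suspect Beam change or infrastructure'


for stable in (True, False):
    for pre in (True, False):
        print(stable, pre, '->', classify(stable, pre))
```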

On Wed, Apr 12, 2023 at 7:45 AM Danny McCormick via dev 
wrote:

> Thanks for doing this Anand, I'm +1 on option 1 as well - I think having
> the clear signal of the normal suite succeeding and the prerelease one
> failing would be helpful and there shouldn't be too much additional code
> necessary. That makes it really easy to treat the prerelease suite as a (at
> least temporary) signal on needing upper bounds on our dependencies.
>
> Thanks,
> Danny
>
> On Wed, Apr 12, 2023 at 12:36 AM Anand Inguva via dev 
> wrote:
>
>> Hi all,
>>
>> For Apache Beam Python we are considering using pre-released dependencies
>> for unit testing by using the --pre flag to install pre-released
>> dependencies of packages.
>>
>> We believe that using pre-released dependencies may help us to identify
>> and resolve bugs more quickly, and to take advantage of new features or bug
>> fixes that are not yet available in stable releases. However, we also
>> understand that using pre-released dependencies may introduce new risks and
>> challenges, including potential code duplication and stability issues.
>>
>> Before proceeding, we wanted to get your feedback on this approach.
>>
>> 1. Create a new PreCommit test suite and a PostCommit test suite that
>> runs tests by installing pre-released dependencies.
>>
>> Pros:
>>
>>- stable and pre-released test suites are separate and it will be
>>easier to debug if the pre-released test suite fails.
>>
>> Cons:
>>
>>- More test infra code to maintain. More tests to monitor.
>>
>>
>> 2. Make use of the current PreCommit and PostCommit test suite and modify
>> it so that it installs pre-released dependencies.
>>
>> Pros:
>>
>>- Less infra code and less tests to monitor.
>>
>> Cons:
>>
>>- Leads to noisy test signals if the pre-release candidate is
>>unstable.
>>
>> I am in favor of approach 1 since this approach would ensure that any
>> issues encountered during pre-release testing do not impact the stable
>> release environment, and vice versa.
>>
>> If you have experience or done any testing work using pre-released
>> dependencies, please let me know if you took any different approaches. It
>> will be really helpful.
>>
>> Thanks,
>> Anand
>>
>


Re: [VOTE] Release 2.46.0, release candidate #1

2023-03-07 Thread Valentyn Tymofieiev via dev
+1. Verified the composition of Python containers and ran Python pipelines
on Dataflow runner v1 and runner v2.

On Tue, Mar 7, 2023 at 4:11 PM Ritesh Ghorse via dev 
wrote:

> +1 (non-binding)
> Validated Go SDK quickstart on direct and dataflow runner
>
> On Tue, Mar 7, 2023 at 10:54 AM Alexey Romanenko 
> wrote:
>
>> +1 (binding)
>>
>> Tested with  https://github.com/Talend/beam-samples/
>> (Java SDK v8/v11/v17, Spark 3.x runner).
>>
>> ---
>> Alexey
>>
>> On 7 Mar 2023, at 07:38, Ahmet Altay via dev  wrote:
>>
>> +1 (binding) - I validated python quickstarts on direct & dataflow
>> runners.
>>
>> Thank you for doing the release!
>>
>> On Sat, Mar 4, 2023 at 8:01 AM Chamikara Jayalath via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 (binding)
>>>
>>> Validated multi-language Java and Python pipelines.
>>>
>>> On Fri, Mar 3, 2023 at 1:59 PM Danny McCormick via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> > I have encountered a failure in a Python pipeline running with Runner
>>>> v1:
>>>>
>>>> > RuntimeError: Beam SDK base version 2.46.0 does not match Dataflow
>>>> Python worker version 2.45.0. Please check Dataflow worker startup logs and
>>>> make sure that correct version of Beam SDK is installed.
>>>>
>>>> > We should understand why Python ValidatesRunner tests (which have
>>>> passed)  didn't catch this error.
>>>>
>>>> > This can be remediated in Dataflow containers without  changes to the
>>>> release candidate.
>>>>
>>>> Good catch! I've kicked off a release to fix this, it should be done
>>>> later this evening - I won't be available when it completes, but I would
>>>> expect it to be around 5:00 PST.
>>>>
>>>> On Fri, Mar 3, 2023 at 3:49 PM Danny McCormick <
>>>> dannymccorm...@google.com> wrote:
>>>>
>>>>> Hey Reuven, could you provide some more context on the bug/why it is
>>>>> important? Does it meet the standard in
>>>>> https://beam.apache.org/contribute/release-guide/#7-triage-release-blocking-issues-in-github
>>>>> ?
>>>>>
>>>>> The release branch was cut last Wednesday, so that is why it is not
>>>>> included.
>>>>>
>>>>
>>> Seems like this was a revert of a previous commit that was also not
>>> included in the 2.46.0 release branch (
>>> https://github.com/apache/beam/pull/25627) ?
>>>
>>> If so we might not need a new RC but good to confirm.
>>>
>>> Thanks,
>>> Cham
>>>
>>>
>>>>> On Fri, Mar 3, 2023 at 3:24 PM Reuven Lax  wrote:
>>>>>
>>>>>> If possible, I would like to see if we could include
>>>>>> https://github.com/apache/beam/pull/25642 as we believe this bug has
>>>>>> been impacting multiple users. This was merged 4 days ago, but this RC 
>>>>>> cut
>>>>>> does not seem to include it.
>>>>>>
>>>>>> On Fri, Mar 3, 2023 at 12:18 PM Valentyn Tymofieiev via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>>
>>>>>>> I have encountered a failure in a Python pipeline running with
>>>>>>> Runner v1:
>>>>>>>
>>>>>>> RuntimeError: Beam SDK base version 2.46.0 does not match Dataflow
>>>>>>> Python worker version 2.45.0. Please check Dataflow worker startup logs 
>>>>>>> and
>>>>>>> make sure that correct version of Beam SDK is installed.
>>>>>>>
>>>>>>> We should understand why Python ValidatesRunner tests (which have
>>>>>>> passed)  didn't catch this error.
>>>>>>>
>>>>>>> This can be remediated in Dataflow containers without  changes to
>>>>>>> the release candidate.
>>>>>>>
>>>>>>> On Fri, Mar 3, 2023 at 11:22 AM Robert Bradshaw via dev <
>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>
>>>>>>>> +1 (binding).
>>>>>>>>
>>>>>>>> I verified that the artifacts and signatures all look good, all the
>>>>>>>> containers are pushed, and tested some pipelines with a fresh

Re: [VOTE] Release 2.46.0, release candidate #1

2023-03-03 Thread Valentyn Tymofieiev via dev
I have encountered a failure in a Python pipeline running with Runner v1:

RuntimeError: Beam SDK base version 2.46.0 does not match Dataflow Python
worker version 2.45.0. Please check Dataflow worker startup logs and make
sure that correct version of Beam SDK is installed.

We should understand why Python ValidatesRunner tests (which have passed)
didn't catch this error.

This can be remediated in Dataflow containers without changes to the
release candidate.
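
For reference, the failure above is a base-version comparison: the SDK that
submits the job reports 2.46.0 while the worker container still carries
2.45.0. A minimal sketch of such a check, assuming release-candidate and dev
suffixes are ignored. The names `base_version` and `check_versions` are
hypothetical; this is not the Dataflow worker's actual code:

```python
import re


def base_version(version: str) -> str:
    """Return the leading MAJOR.MINOR.PATCH part, dropping suffixes like 'rc1'."""
    match = re.match(r"\d+\.\d+\.\d+", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return match.group(0)


def check_versions(sdk_version: str, worker_version: str) -> None:
    """Raise if the SDK and the worker disagree on the base version (hypothetical helper)."""
    sdk_base = base_version(sdk_version)
    worker_base = base_version(worker_version)
    if sdk_base != worker_base:
        raise RuntimeError(
            f"Beam SDK base version {sdk_base} does not match "
            f"worker version {worker_base}.")
```

Under these assumptions an RC build such as 2.46.0rc1 passes against a 2.46.0
worker, while 2.46.0 against 2.45.0 raises.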

On Fri, Mar 3, 2023 at 11:22 AM Robert Bradshaw via dev 
wrote:

> +1 (binding).
>
> I verified that the artifacts and signatures all look good, all the
> containers are pushed, and tested some pipelines with a fresh install
> from one of the Python wheels.
>
> On Fri, Mar 3, 2023 at 11:13 AM Danny McCormick
>  wrote:
> >
> > > The released artifacts seem to be missing the last commit at
> > >
> https://github.com/apache/beam/commit/c528eab18b32342daed53b750fe330d30c7e5224
> > > . Is this essential to the release, or just useful for validating it?
> >
> > It's strictly a test infrastructure change, it has no functional impact.
> For context, the changes included were from
> https://github.com/apache/beam/pull/25661 and
> https://github.com/apache/beam/pull/25654, both were keeping integration
> tests from running correctly.
>
> Thanks.
>
> > On Fri, Mar 3, 2023 at 2:09 PM Robert Bradshaw 
> wrote:
> >>
> >> The released artifacts seem to be missing the last commit at
> >>
> https://github.com/apache/beam/commit/c528eab18b32342daed53b750fe330d30c7e5224
> >> . Is this essential to the release, or just useful for validating it?
> >>
> >> On Fri, Mar 3, 2023 at 11:02 AM Danny McCormick
> >>  wrote:
> >> >
> >> > Thanks for calling that out, and thanks for helping me fix it! We
> should be all set now
> >> >
> >> > On Fri, Mar 3, 2023 at 1:38 PM Robert Bradshaw 
> wrote:
> >> >>
> >> >> It appears your public key is not published in
> >> >> https://dist.apache.org/repos/dist/release/beam/KEYS .
> >> >>
> >> >> On Fri, Mar 3, 2023 at 8:33 AM Anand Inguva via dev <
> dev@beam.apache.org> wrote:
> >> >> >
> >> >> > +1 (non-binding)
> >> >> > Tested python wordcount quick start
> https://beam.apache.org/get-started/quickstart-py/ on Direct Runner and
> Dataflow Runner.
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >> > On Fri, Mar 3, 2023 at 11:21 AM Bruno Volpato via dev <
> dev@beam.apache.org> wrote:
> >> >> >>
> >> >> >> +1 (non-binding)
> >> >> >>
> >> >> >> Tested with
> https://github.com/GoogleCloudPlatform/DataflowTemplates (Java SDK 11,
> Dataflow runner).
> >> >> >>
> >> >> >>
> >> >> >> Thanks Danny!
> >> >> >>
> >> >> >> On Thu, Mar 2, 2023 at 5:16 PM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
> >> >> >>>
> >> >> >>> Hi everyone,
> >> >> >>> Please review and vote on release candidate #1 for the version
> >> >> >>> 2.46.0, as follows:
> >> >> >>> [ ] +1, Approve the release
> >> >> >>> [ ] -1, Do not approve the release (please provide specific comments)
> >> >> >>>
> >> >> >>> Reviewers are encouraged to test their own use cases with the release
> >> >> >>> candidate, and vote +1 if no issues are found.
> >> >> >>>
> >> >> >>> The complete staging area is available for your review, which includes:
> >> >> >>> * GitHub Release notes [1],
> >> >> >>> * the official Apache source release to be deployed to dist.apache.org
> >> >> >>> [2], which is signed with the key with fingerprint
> >> >> >>> FC383FCDE7D7E86699954EF2509872C8031C4DFB [3],
> >> >> >>> * all artifacts to be deployed to the Maven Central Repository [4],
> >> >> >>> * source code tag "v2.46.0-RC1" [5],
> >> >> >>> * website pull request listing the release [6], the blog post [6], and
> >> >> >>> publishing the API reference manual [7].
> >> >> >>> * Java artifacts were built with Gradle GRADLE_VERSION and
> >> >> >>> OpenJDK/Oracle JDK JDK_VERSION.
> >> >> >>> * Python artifacts are deployed along with the source release to the
> >> >> >>> dist.apache.org [2] and PyPI[8].
> >> >> >>> * Go artifacts and documentation are available at pkg.go.dev [9]
> >> >> >>> * Validation sheet with a tab for 2.46.0 release to help with
> >> >> >>> validation [10].
> >> >> >>> * Docker images published to Docker Hub [11].
> >> >> >>> * PR to run tests against release branch [12].
> >> >> >>>
> >> >> >>> The vote will be open for at least 72 hours. It is adopted by majority
> >> >> >>> approval, with at least 3 PMC affirmative votes.
> >> >> >>>
> >> >> >>> For guidelines on how to try the release in your projects, check out
> >> >> >>> our blog post at /blog/validate-beam-release/.
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Danny
> >> >> >>>
> >> >> >>> [1] https://github.com/apache/beam/milestone/9
> >> >> >>> [2] https://dist.apache.org/repos/dist/dev/beam/2.46.0/
> >> >> >>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >> >> >>> [4]
> >> >> >>> https://repository.apache.org/content/repositories/orgapachebeam-1306/
> >> >> >>> [5] https://github.com/apache/beam/tree/v2.46.0-RC1
> >> >> >>> [6] https://github.com/apache/beam/pull/25693
> >> >> >>> [7] https://github.com/apache/beam-site/pull/641
> >> >> >>> [8] https://pypi.org/project/apache-beam/2.46.0rc1/
> >> >> >>> [9]
> >> >> >>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.46.0-RC1/go/pkg/beam
> >> >> >>> [10]
> >> >> >>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=247587190
> >> >> >>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
> >> >> >>> [12] 

Dependabot questions

2023-02-27 Thread Valentyn Tymofieiev via dev
I noticed that human-readable dependency reports are not being generated.
Can this functionality be replaced with Dependabot?

Does Dependabot provide a view of what is currently outdated from its
standpoint?

Also, I noticed that some dependencies are outdated, yet not updated by
Dependabot, possibly because a prior update PR was silenced. Is it
possible to see which dependencies are currently opted out?


Thanks!


Re: Beam Release 2.46

2023-02-23 Thread Valentyn Tymofieiev via dev
Thanks for the update!

I'd like to suggest that we include in the release voting email template a
link to a PR that runs all tests against the release branch. I think we
used to include it, but I haven't seen it in recent voting threads.

Thanks,
Valentyn

On Thu, Feb 23, 2023 at 9:28 AM Danny McCormick via dev 
wrote:

> I cut the release branch yesterday at commit
> 4ce8eeda19699cc64ae8cf310267a478cfe9e4b8.
> There is currently one open release-blocking issue:
>
> - #25601: [Failing Test]: Python PostCommit failing due to duplicate
> AvroSchemaIO autoservice
>  - @Alexey Romanenko, @Yi Hu, and Moritz Mack have been working together
> on resolving that issue (Alexey has a PR up here - #25611).
>
> Once that blocker is resolved/cherry picked, I will work on generating the
> first release candidate
>
> Thanks,
> Danny
>
>
>
>
> On Wed, Feb 15, 2023 at 10:05 AM Danny McCormick <
> dannymccorm...@google.com> wrote:
>
>> > Do you mind if I shadow you while you do this?
>>
>> Sure!
>>
>> On Tue, Feb 14, 2023 at 12:32 PM Damon Douglas 
>> wrote:
>>
>>> Hello Danny,
>>>
>>> Do you mind if I shadow you while you do this?
>>>
>>> Best,
>>>
>>> Damon
>>>
>>> On Thu, Feb 9, 2023 at 3:17 PM Kenneth Knowles  wrote:
>>>
 Excellent! Keep that release train rolling.

 On Thu, Feb 9, 2023 at 9:28 AM Ahmet Altay via dev 
 wrote:

> Thank you Danny!
>
> On Wed, Feb 8, 2023 at 6:46 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> Hey everyone, I would like to volunteer myself to do the 2.46.0
>> release.
>>
>> I will cut the branch Feb 22 [1], and cherrypick any blocking fixes
>> afterwards. Please review the current release blockers [2] and remove the
>> 2.46 milestone if they don't meet the criteria at [3].
>>
>> Thanks,
>> Danny
>>
>> [1]
>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>> [2] https://github.com/apache/beam/milestone/9
>> [3] https://beam.apache.org/contribute/release-blocking/
>>
>


Re: Python 3.11 support in Apache Beam

2023-02-21 Thread Valentyn Tymofieiev via dev
Thanks a lot Anand. I'll take a look at the PRs.

On Tue, Feb 21, 2023 at 1:56 PM Anand Inguva  wrote:

> I was able to spin up a PR: https://github.com/apache/beam/pull/24599
> that updates the build dependencies of Apache Beam.
>
> Several GCP dependencies needed to be updated as well. I covered them in
> the PR: https://github.com/apache/beam/pull/24599
>
> On Thu, Feb 9, 2023 at 3:29 PM Anand Inguva 
> wrote:
>
>> Yes, we may need to update all of them.
>> I can add more information once I dig into the issue (most likely next
>> week). I will comment on my findings on the issue:
>> https://github.com/apache/beam/issues/24569 and will periodically update
>> this thread.
>>
>> On Tue, Feb 7, 2023 at 5:47 PM Valentyn Tymofieiev 
>> wrote:
>>
>>> On Tue, Feb 7, 2023 at 2:35 PM Anand Inguva 
>>> wrote:
>>>
 Yes, it is related to protobuf only. But I think the update of these
 dependencies are required for Python 3.11 since the newer versions have
 support for Python 3.11 wheels.

>>> Assuming you refer to protobuf. Yes, there are no wheels for 3.10 for
>>> protobuf==3.x.x and that can cause friction.
>>> https://pypi.org/project/protobuf/3.20.3/#files
>>>
>>> I would probably narrow the problem further to demonstrate which stubs
>>> are not being generated, and if reason not obvious we can also ask for
>>> feedback from protobuf maintainers. Also - do we by chance need to
>>> update some other deps from
>>> https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt#L28-L33
>>> for this to work?
>>>
>>> Also: tracking issue for protobuf4 support in Beam:
>>> https://github.com/apache/beam/issues/24569.
>>>
>>> If we use older versions of these packages, then we have to depend on
 installing those packages on Python 3.11 from source distributions which is
 not desired.

 I am working in parallel on that issue in a different PR
 https://github.com/apache/beam/pull/24599 but I think this issue
 should be a blocker for Python 3.11 update.

 On Tue, Feb 7, 2023 at 5:25 PM Valentyn Tymofieiev 
 wrote:

> Hi Anand,
>
> On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev <
> dev@beam.apache.org> wrote:
>
>> Hi all,
>>
>> We are planning to work on adding support for Python 3.11[1] to
>> Apache Beam Python SDK.
>>
>> As part of this effort, we are going to update the python build
>> dependencies defined at [2].
>>
>> Right now, there is an error with the newer version of
>> protobuf(4.21.11). It is not generating _urn files.
>>
>> It can be reproduced by
>>
>
>> 1. python setup.py sdist
>> 2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
>> 3. switch to python interpreter and run import apache_beam as beam
>>
> I think the error you are describing is related to protobuf 4, so the
> repro should focus on the portion where generation of stubs is happening.
> Presumably some stubs are not generated on protobuf 4 + Python 3.11?
>
>
>>
>> will lead to *ImportError: cannot import name
>> 'beam_runner_api_pb2_urns' from 'apache_beam.portability.api'.  *Running
>> `python gen_protos.py` to forcefully generate files didn't help either.
>>
>> If you have encountered this error and found a resolution, please let
>> me know(that would be super helpful).
>>
>> I am going to work on this soon. Please let me know if you want to
>> collaborate.
>>
>> Thanks,
>> Anand Inguva
>>
>> *[1] *https://github.com/apache/beam/pull/24721
>> [2]
>> https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt
>>
>


Re: Python 3.11 support in Apache Beam

2023-02-07 Thread Valentyn Tymofieiev via dev
On Tue, Feb 7, 2023 at 2:35 PM Anand Inguva  wrote:

> Yes, it is related to protobuf only. But I think the update of these
> dependencies are required for Python 3.11 since the newer versions have
> support for Python 3.11 wheels.
>
Assuming you refer to protobuf. Yes, there are no wheels for 3.10 for
protobuf==3.x.x and that can cause friction.
https://pypi.org/project/protobuf/3.20.3/#files

I would probably narrow the problem further to demonstrate which stubs are
not being generated, and if the reason is not obvious we can also ask for
feedback from protobuf maintainers. Also - do we by chance need to update
some other deps from
https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt#L28-L33
for this to work?

Also: tracking issue for protobuf4 support in Beam:
https://github.com/apache/beam/issues/24569.

> If we use older versions of these packages, then we have to depend on
> installing those packages on Python 3.11 from source distributions which is
> not desired.
>
> I am working in parallel on that issue in a different PR
> https://github.com/apache/beam/pull/24599 but I think this issue should
> be a blocker for Python 3.11 update.
>
> On Tue, Feb 7, 2023 at 5:25 PM Valentyn Tymofieiev 
> wrote:
>
>> Hi Anand,
>>
>> On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev 
>> wrote:
>>
>>> Hi all,
>>>
>>> We are planning to work on adding support for Python 3.11[1] to Apache
>>> Beam Python SDK.
>>>
>>> As part of this effort, we are going to update the python build
>>> dependencies defined at [2].
>>>
>>> Right now, there is an error with the newer version of
>>> protobuf(4.21.11). It is not generating _urn files.
>>>
>>> It can be reproduced by
>>>
>>
>>> 1. python setup.py sdist
>>> 2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
>>> 3. switch to python interpreter and run import apache_beam as beam
>>>
>> I think the error you are describing is related to protobuf 4, so the
>> repro should focus on the portion where generation of stubs is happening.
>> Presumably some stubs are not generated on protobuf 4 + Python 3.11?
>>
>>
>>>
>>> will lead to *ImportError: cannot import name
>>> 'beam_runner_api_pb2_urns' from 'apache_beam.portability.api'.  *Running
>>> `python gen_protos.py` to forcefully generate files didn't help either.
>>>
>>> If you have encountered this error and found a resolution, please let me
>>> know(that would be super helpful).
>>>
>>> I am going to work on this soon. Please let me know if you want to
>>> collaborate.
>>>
>>> Thanks,
>>> Anand Inguva
>>>
>>> *[1] *https://github.com/apache/beam/pull/24721
>>> [2]
>>> https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt
>>>
>>


Re: Python 3.11 support in Apache Beam

2023-02-07 Thread Valentyn Tymofieiev via dev
Hi Anand,

On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev 
wrote:

> Hi all,
>
> We are planning to work on adding support for Python 3.11[1] to Apache
> Beam Python SDK.
>
> As part of this effort, we are going to update the python build
> dependencies defined at [2].
>
> Right now, there is an error with the newer version of protobuf(4.21.11).
> It is not generating _urn files.
>
> It can be reproduced by
>

> 1. python setup.py sdist
> 2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
> 3. switch to python interpreter and run import apache_beam as beam
>
I think the error you are describing is related to protobuf 4, so the repro
should focus on the portion where generation of stubs is happening.
Presumably some stubs are not generated on protobuf 4 + Python 3.11?
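
One way to narrow this down without triggering the ImportError is to look at
the installed package on disk instead of importing it. A rough sketch,
assuming the generated modules live as plain .py files under the package
directory; the helper name `missing_generated_files` is made up for
illustration:

```python
import importlib.util
from pathlib import Path


def missing_generated_files(package, relative_dir, expected_files):
    """List expected files that are absent from an installed package.

    The package is located through its import spec, so its __init__ is not
    executed and a broken import inside the package does not get in the way.
    """
    spec = importlib.util.find_spec(package)  # top-level lookup only
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"Package {package!r} is not installed")
    package_dir = Path(list(spec.submodule_search_locations)[0])
    target_dir = package_dir / relative_dir
    return [name for name in expected_files
            if not (target_dir / name).is_file()]
```

For the case in this thread it could be called with "apache_beam",
"portability/api", and file names such as "beam_runner_api_pb2_urns.py" (the
file name is inferred from the ImportError, so treat it as an assumption).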


>
> will lead to *ImportError: cannot import name 'beam_runner_api_pb2_urns'
> from 'apache_beam.portability.api'.  *Running `python gen_protos.py` to
> forcefully generate files didn't help either.
>
> If you have encountered this error and found a resolution, please let me
> know(that would be super helpful).
>
> I am going to work on this soon. Please let me know if you want to
> collaborate.
>
> Thanks,
> Anand Inguva
>
> *[1] *https://github.com/apache/beam/pull/24721
> [2]
> https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt
>


Re: Subscribe

2023-01-24 Thread Valentyn Tymofieiev via dev
Hello Alan,

To subscribe to the list, you should send an email to
dev-subscr...@beam.apache.org instead.

Best,
Valentyn

On Tue, Jan 24, 2023 at 5:19 PM Alan Zhang via dev 
wrote:

>


Re: [VOTE] Release 2.44.0, release candidate #1

2023-01-11 Thread Valentyn Tymofieiev via dev
+1. I validated that Dataflow and Beam Python containers include necessary
dependencies of Apache Beam and did additional validation (see inline).

On Wed, Jan 11, 2023 at 12:48 AM Ahmet Altay  wrote:

> I validated python quick starts (direct, dataflow) X (batch, streaming). I
> ran into an issue with the dataflow batch case, running the wordcount with
> the standard:
>
> python -m apache_beam.examples.wordcount \
> --output  \
> --staging_location  \
> --temp_location \
> --runner DataflowRunner \
> --job_name wordcount-$USER \
> --project  \
> --num_workers 1 \
> --region us-central1 \
> --sdk_location apache-beam-2.44.0.zip
>
> results in:
>
> "/usr/local/lib/python3.10/site-packages/dataflow_worker/shuffle.py", line
> 589, in __enter__ raise RuntimeError(_PYTHON_310_SHUFFLE_ERROR_MESSAGE)
> RuntimeError: This pipeline requires Dataflow Runner v2 in order to run
> with currently used version of Apache Beam on Python 3.10+. Please verify
> that the Dataflow Runner v2 is not disabled in the pipeline options or
> enable it explicitly via: --dataflow_service_option=use_runner_v2.
> Alternatively, downgrade to Python 3.9 to use Dataflow Runner v1.
>
> Questions:
> - I am not explicitly opting out of runner v2, and this is a standard
> wordcount example, I expected it to just work.
>
You are most likely using a google-internal project for which Runner v2 is
explicitly disabled, to enable Runner v1 test coverage within Google. I
can repro this error as well (maybe on the same project as you), but don't
repro it on other projects, such as apache-beam-testing. Runner v1 is not
supported on Python 3.10 (this is documented).

Such behavior is working as intended (WAI) as far as Beam is concerned; the
difference is due to configuration details in Dataflow.
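
For projects where Runner v2 is not disabled, the rule in the error message
can be sketched as below. The option string is taken from the error text
quoted above; the helper `with_runner_v2_if_needed` is hypothetical, and it
will not help on a project configured to force Runner v1:

```python
import sys

# Option string as quoted in the Dataflow error message.
RUNNER_V2_OPTION = "--dataflow_service_option=use_runner_v2"


def with_runner_v2_if_needed(argv, python_version=None):
    """Return a copy of argv that requests Runner v2 on Python 3.10+."""
    version = tuple(python_version or sys.version_info[:2])
    args = list(argv)
    # Runner v1 is not supported on Python 3.10+, so ask for v2 explicitly.
    if version >= (3, 10) and RUNNER_V2_OPTION not in args:
        args.append(RUNNER_V2_OPTION)
    return args
```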



>
> Then I tried to add --dataflow_service_option=use_runner_v2 to the above
> wordcount command, which results in the following error:
>
> "message": "Dataflow Runner v2 requires a valid FnApi job, Please
> resubmit your job with a valid configuration. Note that if using Templates,
> you may need to regenerate your template with the '--use_runner_v2'."
>
> Maybe I am doing something wrong and it is an error on my end. It would be
> good for someone else with python experience to check this.
>
> /cc @Valentyn Tymofieiev 
>
> Ahmet
>
>
>
>
> On Tue, Jan 10, 2023 at 10:54 AM Kenneth Knowles  wrote:
>
>> I have published a new maven staging repository:
>> https://repository.apache.org/content/repositories/orgapachebeam-1290/
>>
>> It looks like it has everything, though I did not automate a check. At
>> least there were no errors during publish which I ran with --no-parallel
>> overnight, and some specific things that were missing from
>> orgapachebeam-1289 are present.
>>
>> I will restart the 72 hour waiting period, since the RC is only now
>> usable.
>>
>> Kenn
>>
>> On Mon, Jan 9, 2023 at 6:51 PM Kenneth Knowles  wrote:
>>
>>> I have discovered that many pom files are missing from the nexus
>>> repository. I should be able to re-publish a new one. It will take some
>>> time as this is one of the longest-running processes.
>>>
>>> On Mon, Jan 9, 2023 at 1:42 PM Kenneth Knowles  wrote:
>>>
 Correction: this is release candidate #1.

 On Mon, Jan 9, 2023 at 1:25 PM Kenneth Knowles  wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #3 for the version
> 2.44.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if
> no issues are found.
>
> The complete staging area is available for your review, which includes:
> * GitHub Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint 6ED551A8AE02461C [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.44.0-RC1" [5],
> * website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle 7.5.1 and OpenJDK 1.8.0_232.
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI [8].
> * Go artifacts and documentation are available at pkg.go.dev [9]
> (waiting on these to appear)
> * Validation sheet with a tab for 2.44.0 release to help with
> validation [10].
> * Docker images published to Docker Hub [11].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> For guidelines on how to try the release in your projects, check out
> our blog post at /blog/validate-beam-release/.
>
> Thanks,
> Kenn
>
> [1] https://github.com/apache/beam/milestone/7
> [2] 

Re: [VOTE] Release 2.43.0, release candidate #1

2022-11-10 Thread Valentyn Tymofieiev via dev
-1.
It looks like the format of Python wheels has changed.
We should update the stager code and python container entrypoint code,
otherwise we will have a 2 min pipeline start time regression on some
runners.
Opened https://github.com/apache/beam/issues/24110
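
To illustrate why filename-based wheel handling is sensitive to format
changes: under PEP 427 a wheel name is
{name}-{version}(-{build})-{python tag}-{abi tag}-{platform tag}.whl, and each
tag may be a dot-separated set. The sketch below only parses such a name; it
is not the stager or container entrypoint code, and the example filename in
the note after it is made up:

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"Not a wheel: {filename!r}")
    parts = filename[:-len(".whl")].split("-")
    # 5 parts without a build tag, 6 parts with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"Unexpected wheel filename: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        # Each tag field may hold several tags joined by '.'.
        "python_tags": python_tag.split("."),
        "abi_tags": abi_tag.split("."),
        "platform_tags": platform_tag.split("."),
    }
```

A name like
apache_beam-2.43.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
yields two platform tags, so code that expects exactly one platform string
would fail to find a matching wheel and fall back to a slower install path.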

On Thu, Nov 10, 2022 at 11:10 AM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

> Thanks folks.
>
> Blocking issues were https://github.com/apache/beam/issues/24065 and
> https://github.com/apache/beam/pull/24041.
>
> I'll build RC2 when fixes are cherry-picked.
>
> This vote is now closed.
>
> - Cham
>
> On Thu, Nov 10, 2022 at 11:03 AM Anand Inguva 
> wrote:
>
>> +1 (non-binding) validated Python SDK QuickStart, Beam RunInference
>> examples on Direct and Dataflow Runner. Also, verified the Python 3.10
>> artifacts.
>>
>>
>> On Wed, Nov 9, 2022 at 1:40 PM Chamikara Jayalath via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Ack. There's another potential cherry-pick here:
>>> https://github.com/apache/beam/pull/24041
>>>
>>> This should not prevent validation against RC1 for any other potential
>>> regressions.
>>>
>>> I'll build a RC2 when cherry-picks are in.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Wed, Nov 9, 2022 at 9:30 AM Ritesh Ghorse via dev <
>>> dev@beam.apache.org> wrote:
>>>
 The Dataframe wrapper in Go SDK is failing because of
 https://github.com/apache/beam/issues/24065. I have a PR here
  to unblock the release.
 The current PR allows Dataframe wrapper to work as expected but proper fix
 should be added while merging RunInference wrapper.

 Thanks,
 Ritesh


 On Wed, Nov 9, 2022 at 8:40 AM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> +1 (binding)
>
> Tested with  https://github.com/Talend/beam-samples/
> (Java SDK v8 & v11, Spark 3 runner).
>
> ---
> Alexey
>
> On 9 Nov 2022, at 01:38, Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
> Hi everyone,
> Please review and vote on the release candidate #1 for the version
> 2.43.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if
> no issues are found.
>
> The complete staging area is available for your review, which includes:
> * GitHub Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint
> 40C61FBE1761E5DB652A1A780CCD5EB2A718A56E [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.43.0-RC1" [5],
> * website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle 7.5.1 and openjdk version
> 1.8.0_181-google-v7.
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI[8].
> * Go artifacts and documentation are available at pkg.go.dev [9]
> * Validation sheet with a tab for 2.43.0 release to help with
> validation [10].
> * Docker images published to Docker Hub [11].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> For guidelines on how to try the release in your projects, check out
> our blog post at https://beam.apache.org/blog/validate-beam-release/.
>
> Thanks,
> Cham
>
> [1] https://github.com/apache/beam/milestone/5
> [2] https://dist.apache.org/repos/dist/dev/beam/2.43.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1287/
> [5] https://github.com/apache/beam/tree/v2.43.0-RC1
> [6] https://github.com/apache/beam/pull/24044
> [7] https://github.com/apache/beam-site/pull/635
> [8] https://pypi.org/project/apache-beam/2.43.0rc1/
> [9]
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.43.0-RC1/go/pkg/beam
> [10]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1310009119
> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>
>
>


Re: [ANNOUNCE] New committer: Yi Hu

2022-11-09 Thread Valentyn Tymofieiev via dev
I am with the Beam PMC on this, congratulations and very well deserved, Yi!

On Wed, Nov 9, 2022 at 11:08 AM Byron Ellis via dev 
wrote:

> Congratulations!
>
> On Wed, Nov 9, 2022 at 11:00 AM Pablo Estrada via dev 
> wrote:
>
>> +1 thanks Yi : D
>>
>> On Wed, Nov 9, 2022 at 10:47 AM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congrats Yi! I've really appreciated the ways you've consistently taken
>>> responsibility for improving our team's infra and working through sharp
>>> edges in the codebase that others have ignored. This is definitely well
>>> deserved!
>>>
>>> Thanks,
>>> Danny
>>>
>>> On Wed, Nov 9, 2022 at 1:37 PM Anand Inguva via dev 
>>> wrote:
>>>
 Congratulations Yi!

 On Wed, Nov 9, 2022 at 1:35 PM Ritesh Ghorse via dev <
 dev@beam.apache.org> wrote:

> Congratulations Yi!
>
> On Wed, Nov 9, 2022 at 1:34 PM Ahmed Abualsaud via dev <
> dev@beam.apache.org> wrote:
>
>> Congrats Yi!
>>
>> On Wed, Nov 9, 2022 at 1:33 PM Sachin Agarwal via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congratulations Yi!
>>>
>>> On Wed, Nov 9, 2022 at 10:32 AM Kenneth Knowles 
>>> wrote:
>>>
 Hi all,

 Please join me and the rest of the Beam PMC in welcoming a new
 committer: Yi Hu (y...@apache.org)

 Yi started contributing to Beam in early 2022. Yi's contributions
 are very diverse! I/Os, performance tests, Jenkins, support for Schema
 logical types. Not only code but a very large amount of code review. 
 Yi is
 also noted for picking up smaller issues that normally would be left 
 on the
 backburner and filing issues that he finds rather than ignoring them.

 Considering their contributions to the project over this timeframe,
 the Beam PMC trusts Yi with the responsibilities of a Beam committer. 
 [1]

 Thank you Yi! And we are looking to see more of your contributions!

 Kenn, on behalf of the Apache Beam PMC

 [1]

 https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer

>>>


Re: github reviewer help / tips

2022-11-08 Thread Valentyn Tymofieiev via dev
I use the Notifier for GitHub Chrome extension.
On Tue, Nov 8, 2022 at 10:29 AM Sachin Agarwal via dev 
wrote:

> Hey folks,
>
> I've found myself repeatedly being very untimely in providing reviews on
> PRs where I've been added as a reviewer.  (Mea culpa and thank you for your
> understanding to those who have tagged me and emailed me to nudge me along.)
>
> Does anyone have any great tips about how to be super on top of things in
> the Beam repos?  Any Github experts who can get my SLA from three weeks to
> a day or so would be great.
>
> Many thanks in advance -
>
> Cheers,
> Sachin
>


Re: Pipeline portable proto visualization

2022-11-07 Thread Valentyn Tymofieiev via dev
Thanks a lot, sounds like that would help avoid reinventing the wheel.

On Mon, Nov 7, 2022 at 9:28 AM Robert Bradshaw  wrote:

> I've got one I use in Python too (including drilling down into
> composites). It's a portable runner. I should clean it up and make it
> generally available.
>
> On Mon, Nov 7, 2022 at 9:25 AM Robert Burke  wrote:
> >
> > The Go SDK has a "dot" runner to visualize pipeline protos as a dot
> graph but it's not set up as a portable runner. Probably wouldn't be
> too hard to get it to operate as a standalone tool, for someone motivated
> enough.
> >
> > On Mon, Nov 7, 2022, 9:19 AM Valentyn Tymofieiev via dev <
> dev@beam.apache.org> wrote:
> >>
> >> I'd like to visualize a DAG for a Beam portable pipeline, from a .pb
> file or a textproto representation.
> >>
> >> Is some runner's UI readily available to make it possible (without
> executing the job)? I was thinking perhaps Apache Hop integration (if we
> have one) might be able to do that.
> >>
> >> If not, it should be fairly simple to convert the pipeline to dot
> format and use graphviz.
> >>
> >> Thanks,
> >> Valentyn
>


Pipeline portable proto visualization

2022-11-07 Thread Valentyn Tymofieiev via dev
I'd like to visualize a DAG for a Beam portable pipeline, from a .pb file
or a textproto representation.

Is some runner's UI readily available to make it possible (without
executing the job)? I was thinking perhaps Apache Hop integration (if we
have one) might be able to do that.

If not, it should be fairly simple to convert the pipeline to dot format
and use graphviz.
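
A rough sketch of that conversion, assuming the transforms have already been
read out of the pipeline proto into a plain dict of the form
{transform id: {"inputs": [pcollection ids], "outputs": [pcollection ids]}}.
The dict layout and the function name `pipeline_to_dot` are placeholders, and
composite transforms are not handled:

```python
def pipeline_to_dot(transforms):
    """Render a transform graph as Graphviz DOT text.

    An edge is drawn from the transform that produces a PCollection to each
    transform that consumes it, labeled with the PCollection id.
    """
    # Map each PCollection id to the transform that outputs it.
    producers = {}
    for transform_id, transform in transforms.items():
        for pcollection in transform["outputs"]:
            producers[pcollection] = transform_id

    lines = ["digraph pipeline {"]
    for transform_id in transforms:
        lines.append(f'  "{transform_id}";')
    for transform_id, transform in transforms.items():
        for pcollection in transform["inputs"]:
            producer = producers.get(pcollection)
            if producer is not None:
                lines.append(
                    f'  "{producer}" -> "{transform_id}" [label="{pcollection}"];')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be saved to a file and rendered with the graphviz
`dot` tool, for example `dot -Tsvg pipeline.dot -o pipeline.svg`.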

Thanks,
Valentyn


Re: [VOTE] Release 2.42.0, release candidate #2

2022-10-14 Thread Valentyn Tymofieiev via dev
+1 based on the prior validation I did and the RC1-RC2 delta.

On Fri, Oct 14, 2022 at 10:22 AM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

> +1 (binding)
>
> Thanks,
> Cham
>
> On Fri, Oct 14, 2022 at 5:43 AM Alexey Romanenko 
> wrote:
>
>> +1 (binding)
>>
>> Tested with  https://github.com/Talend/beam-samples/
>> (Java SDK v8 & v11, Spark 3 runner).
>>
>> ---
>> Alexey
>>
>> On 14 Oct 2022, at 05:17, Ahmet Altay via dev 
>> wrote:
>>
>> +1 (binding)
>>
>> Tested python quickstart examples on the direct runner. Thank you!
>>
>> On Thu, Oct 13, 2022 at 5:35 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 (binding)
>>>
>>> Validated release artifacts and signatures. Tested a Python pipeline
>>> on a clean install.
>>>
>>> On Thu, Oct 13, 2022 at 1:22 PM Ritesh Ghorse via dev
>>>  wrote:
>>> >
>>> > +1 (non-binding)
>>> > Validated Go SDK Quickstart on Direct and Dataflow runner.
>>> >
>>> > Thanks,
>>> > Ritesh Ghorse
>>> >
>>> > On Thu, Oct 13, 2022 at 4:01 PM Pablo Estrada via dev <
>>> dev@beam.apache.org> wrote:
>>> >>
>>> >> +1 (binding)
>>> >>
>>> >> I've validated local/unit tests for existing dataflow templates. They
>>> look good!
>>> >> Best
>>> >> -P.
>>> >>
>>> >> On Thu, Oct 13, 2022 at 10:41 AM Ning Kang via dev <
>>> dev@beam.apache.org> wrote:
>>> >>>
>>> >>> +1 Thank you, Robert!
>>> >>>
>>> >>> On Thu, Oct 13, 2022 at 12:47 AM Robert Burke 
>>> wrote:
>>> 
>>>  Hi everyone,
>>>  Please review and vote on the release candidate #2 for the version
>>> 2.42.0, as follows:
>>>  [ ] +1, Approve the release
>>>  [ ] -1, Do not approve the release (please provide specific
>>> comments)
>>> 
>>>  Reviewers are encouraged to test their own use cases with the
>>> release candidate, and vote +1 if no issues are found.
>>> 
>>>  The complete staging area is available for your review, which
>>> includes:
>>>  * GitHub Release notes [1],
>>>  * the official Apache source release to be deployed to
>>> dist.apache.org [2], which is signed with the key with fingerprint
>>> A52F5C83BAE26160120EC25F3D56ACFBFB2975E1 [3],
>>>  * all artifacts to be deployed to the Maven Central Repository [4],
>>>  * source code tag "v2.42.0-RC2" [5],
>>>  * website pull request listing the release [6], the blog post [6],
>>> and publishing the API reference manual [7].
>>>  * Java artifacts were built with Gradle 7.5.1 and AdoptOpen JDK
>>> 1.8.0_292.
>>>  * Python artifacts are deployed along with the source release to
>>> the dist.apache.org [2] and PyPI [8]
>>>  * Go Package information and SDK RC [9]
>>>  * Validation sheet with a tab for 2.42.0 release to help with
>>> validation [10].
>>>  * Docker images published to Docker Hub [11]. (Soon)
>>> 
>>>  The vote will be open for at least 72 hours. It is adopted by
>>> majority approval, with at least 3 PMC affirmative votes.
>>> 
>>>  Updates from RC1 include a fix to SpannerIO backlog estimation [12]
>>> and a fix to the BigQueryIO interpretation of coders on an internal flatten
>>> [13]. Otherwise, previous validation should be unaffected.
>>> 
>>>  For guidelines on how to try the release in your projects, check
>>> out our blog post at https://beam.apache.org/blog/validate-beam-release/
>>> .
>>> 
>>>  Thanks,
>>>  Robert Burke
>>>  2.42.0 Release Manager
>>> 
>>>  [1] https://github.com/apache/beam/milestone/4
>>>  [2] https://dist.apache.org/repos/dist/dev/beam/2.42.0/
>>>  [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>  [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1286/
>>>  [5] https://github.com/apache/beam/tree/v2.42.0-RC2
>>>  [6] https://github.com/apache/beam/pull/23406
>>>  [7] https://github.com/apache/beam-site/pull/634
>>>  [8] https://pypi.org/project/apache-beam/2.42.0rc2/
>>>  [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.42.0-RC2/go/pkg/beam
>>>  [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=265602293
>>>  [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>>  [12] https://github.com/apache/beam/issues/23494
>>>  [13] https://github.com/apache/beam/issues/23561
>>> 
>>>
>>
>>

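The adoption rule quoted in the vote email above (majority approval, with at least 3 PMC affirmative votes) can be sketched as a small tally. This is only an illustration, not tooling used by the release manager: the `Vote` type and `release_vote_passes` function are hypothetical names, the sketch assumes that only binding votes count toward the outcome, and the vote without a "(binding)" label is treated as non-binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1 to approve, -1 to reject
    binding: bool  # True when the vote is marked "(binding)"


def release_vote_passes(votes, min_binding_approvals=3):
    """Return True when binding +1 votes reach the minimum and outnumber binding -1 votes."""
    approvals = sum(1 for v in votes if v.binding and v.value > 0)
    objections = sum(1 for v in votes if v.binding and v.value < 0)
    return approvals >= min_binding_approvals and approvals > objections


# Votes as they appear in this part of the RC2 thread. Treating the
# unlabeled "+1" as non-binding is an assumption made for this sketch.
rc2_votes = [
    Vote("Ahmet Altay", +1, True),
    Vote("Robert Bradshaw", +1, True),
    Vote("Ritesh Ghorse", +1, False),
    Vote("Pablo Estrada", +1, True),
    Vote("Ning Kang", +1, False),
]

print(release_vote_passes(rc2_votes))
```

Non-binding votes are kept in the list so that the record stays complete, even though this sketch ignores them when deciding the outcome.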

Re: [VOTE] Release 2.42.0, release candidate #1

2022-10-03 Thread Valentyn Tymofieiev via dev
I validated that Dataflow and Beam Python containers have dependencies that
match Beam requirements.

I came across https://github.com/apache/beam/pull/23200 - there are failed
tests and I don't see test results for Python PostCommit suites. Do you
know the status of both?

Minor nit: there is a missing substitution in "* Java artifacts were built
with Gradle GRADLE_VERSION and OpenJDK/Oracle JDK JDK_VERSION."

Thanks!



On Mon, Oct 3, 2022 at 7:21 AM Ritesh Ghorse via dev 
wrote:

> +1 (non-binding)
> Validated Go SDK Quickstart on Direct and Dataflow runner
>
>
> On Mon, Oct 3, 2022 at 9:38 AM Alexey Romanenko 
> wrote:
>
>> +1 (binding)
>>
>> Tested with https://github.com/Talend/beam-samples/
>> (Java SDK v8 & v11, Spark 3 runner).
>>
>> ---
>> Alexey
>>
>> On 3 Oct 2022, at 14:32, Chamikara Jayalath via dev 
>> wrote:
>>
>> +1 (binding)
>>
>> Verified checksums and signatures of artifacts.
>> Validated some multi-language pipelines.
>>
>> Thanks,
>> Cham
>>
>> On Thu, Sep 29, 2022 at 6:12 PM Robert Burke via dev 
>> wrote:
>>
>>> Hi everyone,
>>> Please review and vote on the release candidate #1 for the version
>>> 2.42.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>> Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if no issues are found.
>>>
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org [2],
>>> which is signed with the key with fingerprint
>>> A52F5C83BAE26160120EC25F3D56ACFBFB2975E1 [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.42.0-RC1" [5],
>>> * website pull request listing the release [6], the blog post [6], and
>>> publishing the API reference manual [7].
>>> * Java artifacts were built with Gradle GRADLE_VERSION and
>>> OpenJDK/Oracle JDK JDK_VERSION.
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI [8]
>>> * Go Package information and SDK RC [9]
>>> * Validation sheet with a tab for 2.42.0 release to help with validation
>>> [10].
>>> * Docker images published to Docker Hub [11].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> For guidelines on how to try the release in your projects, check out our
>>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>>>
>>> Thanks,
>>> Robert Burke
>>> 2.42.0 Release Manager
>>>
>>> [1] https://github.com/apache/beam/milestone/4
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.42.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1285/
>>> [5] https://github.com/apache/beam/tree/v2.42.0-RC1
>>> [6] https://github.com/apache/beam/pull/23406
>>> [7] https://github.com/apache/beam-site/pull/634
>>> [8] https://pypi.org/project/apache-beam/2.42.0rc1/
>>> [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.42.0-RC1/go/pkg/beam
>>>
>>> [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=265602293
>>> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>>
>>>
>>

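Several votes in this thread mention verifying checksums of the release artifacts. A minimal sketch of the checksum half of that check follows. It assumes each artifact is accompanied by a `.sha512` file whose first whitespace-separated field is the hex digest; the function names are hypothetical, and signature verification against the KEYS file is a separate step not shown here.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(artifact_path, checksum_path):
    """Compare an artifact's digest with the first field of its checksum file.

    Assumption: the checksum file looks like "<hex digest>  <file name>".
    """
    fields = Path(checksum_path).read_text().split()
    if not fields:
        return False  # An empty checksum file cannot match anything.
    return sha512_of(artifact_path) == fields[0].lower()
```

Reading in chunks keeps memory use flat for large source archives.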

Re: [DISCUSS] Dependency management in Apache Beam Python SDK

2022-08-25 Thread Valentyn Tymofieiev via dev
Hi Jarek,

Thanks a lot for the detailed feedback and for sharing the Airflow story;
this is exactly what I was hoping to hear in response from the mailing list!

600+ dependencies is very impressive, so I'd be happy to chat more and
learn from your experience.

On Wed, Aug 24, 2022 at 5:50 AM Jarek Potiuk  wrote:

> Comment (from a bit of an outsider)
>
> Fantastic document Valentyn.
>
> Very, very insightful and interesting. We feel a lot of the same pain in
> Apache Airflow (actually even more, because we have not 20 but 620+
> dependencies), but we are also a bit more advanced in how we manage
> the dependencies. Some of the ideas you had there are already
> tested and tried in Airflow, and some are a bit different, but we can
> definitely share "principles", and we are a little higher in the "supply
> chain" (i.e. Apache Beam Python SDK is our dependency).
>
> I left some suggestions and some comments describing in detail what the
> same problems look like in Airflow and how we addressed them (if we did),
> and I am happy to participate in further discussions. I am "the dependency
> guy" in Airflow and happy to share my experiences and help to work out some
> problems, especially problems coming from using multiple
> google-client libraries and diamond dependencies (we are just now dealing
> with a similar issue, where we will likely have to do a massive update of
> several of our clients, hopefully with the involvement of the Composer team).
> I'd also love to be involved in a joint discussion with the google client
> team to work out some common expectations that we can rely on when we
> define our future upgrade strategy for google clients.
>
> I will watch it here and be happy to spend quite some time on helping to
> hash it out.
>
> BTW, you can also watch the talk I gave last year at PyWaw about "Managing
> Python dependencies at Scale"
> https://www.youtube.com/watch?v=_SjMdQLP30s&t=2549s where I explain the
> approach we took, the reasoning behind it, etc.
>
> J.
>
>
> On Wed, Aug 24, 2022 at 2:45 AM Valentyn Tymofieiev via dev <
> dev@beam.apache.org> wrote:
>
>> Hi everyone,
>>
>> Recently, several issues [1-3] have highlighted outage risks and
>> developer inconveniences due to dependency management practices in Beam
>> Python.
>>
>> With dependabot and other tooling that we have integrated with Beam, one
>> of the missing pieces seems to be having a clear guideline of how we should
>> be specifying requirements for our dependencies and when and how we should
>> be updating them to have a sustainable process.
>>
>> As a conversation starter, I put together a retrospective
>> <https://docs.google.com/document/d/1gxQF8mciRYgACNpCy1wlR7TBa8zN-Tl6PebW-U8QvBk/edit?resourcekey=0-XcHRyFh4KRPkA0GsdUmU3g#>[4]
>> covering a recent incident and would like to get community opinions on the
>> open questions.
>>
>> In particular, if you have experience managing dependencies for other
>> Python libraries with rich dependency chains, knowledge of available
>> tooling or first-hand experience dealing with other dependency issues in
>> Beam, your input would be greatly appreciated.
>>
>> Thanks,
>> Valentyn
>>
>> [1] https://github.com/apache/beam/issues/22218
>> [2] https://github.com/apache/beam/pull/22550#issuecomment-1217348455
>> [3] https://github.com/apache/beam/issues/22533
>> [4]
>> https://docs.google.com/document/d/1gxQF8mciRYgACNpCy1wlR7TBa8zN-Tl6PebW-U8QvBk/edit
>>
>


[DISCUSS] Dependency management in Apache Beam Python SDK

2022-08-23 Thread Valentyn Tymofieiev via dev
Hi everyone,

Recently, several issues [1-3] have highlighted outage risks and developer
inconveniences due to dependency management practices in Beam Python.

With dependabot and other tooling that we have integrated with Beam, one
of the missing pieces seems to be having a clear guideline of how we should
be specifying requirements for our dependencies and when and how we should
be updating them to have a sustainable process.

As a conversation starter, I put together a retrospective [4] covering a
recent incident and would like to get community opinions on the open
questions.

In particular, if you have experience managing dependencies for other
Python libraries with rich dependency chains, knowledge of available
tooling or first-hand experience dealing with other dependency issues in
Beam, your input would be greatly appreciated.

Thanks,
Valentyn

[1] https://github.com/apache/beam/issues/22218
[2] https://github.com/apache/beam/pull/22550#issuecomment-1217348455
[3] https://github.com/apache/beam/issues/22533
[4]
https://docs.google.com/document/d/1gxQF8mciRYgACNpCy1wlR7TBa8zN-Tl6PebW-U8QvBk/edit