Re: [VOTE] Release 2.56.0, release candidate #2

2024-04-30 Thread Chamikara Jayalath via dev
+1 (binding)

Validated multi-lang Java/Python and the transform upgrade feature.

Thanks,
Cham

On Mon, Apr 29, 2024 at 12:57 AM Jan Lukavský  wrote:

> +1 (binding).
>
> Tested Java SDK with Flink runner.
>
>  Jan
> On 4/28/24 15:32, XQ Hu via dev wrote:
>
> +1 (non-binding). Tested it using the dataflow ML pipeline:
> https://github.com/google/dataflow-ml-starter/actions/runs/8862170843/job/24334816481
>
> On Sat, Apr 27, 2024 at 7:42 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> Hi everyone,
>> Please review and vote on the release candidate #2 for the version
>> 2.56.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>> Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>> count towards the final vote, but votes from all community members are
>> encouraged and helpful for finding regressions; you can either test your
>> own use cases [13] or use cases from the validation sheet [10].
>>
>> The complete staging area is available for your review, which includes:
>> * GitHub Release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with fingerprint D20316F712213422 [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.56.0-RC2" [5],
>> * website pull request listing the release [6], the blog post [6], and
>> publishing the API reference manual [7].
>> * Python artifacts are deployed along with the source release to
>> dist.apache.org [2] and PyPI [8].
>> * Go artifacts and documentation are available at pkg.go.dev [9]
>> * Validation sheet with a tab for 2.56.0 release to help with validation
>> [10].
>> * Docker images published to Docker Hub [11].
>> * PR to run tests against release branch [12].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> For guidelines on how to try the release in your projects, check out our
>> RC testing guide [13].
>>
>> Thanks,
>> Danny
>>
>> [1] https://github.com/apache/beam/milestone/20
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.56.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1377/
>> [5] https://github.com/apache/beam/tree/v2.56.0-RC2
>> [6] https://github.com/apache/beam/pull/31094
>> [7] https://github.com/apache/beam-site/pull/665
>> [8] https://pypi.org/project/apache-beam/2.56.0rc2/
>> [9]
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.56.0-RC2/go/pkg/beam
>> [10]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1992402651
>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>> [12] https://github.com/apache/beam/pull/31038
>> [13]
>> https://github.com/apache/beam/blob/master/contributor-docs/rc-testing-guide.md
>>
>
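The acceptance rule stated in these vote emails (only binding PMC votes count; the vote needs majority approval with at least 3 binding +1 votes) can be expressed as a small check. This is a minimal sketch, assuming "majority approval" means more binding +1 than binding -1 votes, as in the general ASF release voting policy. The `Vote` type and `release_vote_passes` function are hypothetical, and the 72-hour minimum voting window is not modeled.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True only for PMC members


def release_vote_passes(votes: Iterable[Vote]) -> bool:
    """Hypothetical check: >= 3 binding +1 votes and more binding +1 than -1."""
    binding_values = [v.value for v in votes if v.binding]
    approvals = binding_values.count(1)
    rejections = binding_values.count(-1)
    return approvals >= 3 and approvals > rejections


if __name__ == "__main__":
    votes = [
        Vote("pmc-a", +1, True),
        Vote("pmc-b", +1, True),
        Vote("contributor-c", +1, False),  # non-binding: helpful, but not counted
    ]
    print(release_vote_passes(votes))  # False: only 2 binding +1 votes
```

Non-binding votes are still valuable for finding regressions, which is why the sketch keeps them in the input and simply filters them out of the tally.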


Re: Patch release proposal

2024-03-28 Thread Chamikara Jayalath via dev
On Thu, Mar 28, 2024 at 8:57 AM Jan Lukavský  wrote:

> +1 to either doing full release or deferring to 2.56.0.
>

+1. Given that the validation/testing required for SDKs that are not updated
should be minimal, I don't think a full release will be much more overhead
than releasing just the Python SDK. I also believe this is a good
opportunity to identify any friction in performing a patch release.

- Cham



>  Jan
> On 3/28/24 16:52, Yi Hu via dev wrote:
>
> > Just releasing Python can break multi-lang by default (unless expansion
> service is overridden manually) since we match versions across languages
> when picking the default expansion service.
>
> Yes, that's why I proposed "the source code of release candidate (e.g.
> apache_beam/version.py) still reads 2.55.0." Anyway, it seems a full
> release is preferred, as it reduces the risk of breakages.
>
> On Thu, Mar 28, 2024 at 11:38 AM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>>
>>
>> On Thu, Mar 28, 2024 at 8:36 AM Chamikara Jayalath 
>> wrote:
>>
>>> Just releasing Python can break multi-lang by default (unless expansion
>>> service is overridden manually) since we match versions across languages
>>> when picking the default expansion service.
>>>
>>>
>>> https://github.com/apache/beam/blob/2f8854a3e34f31c1cc034f95ad36f317abc906ff/sdks/python/apache_beam/utils/subprocess_server.py#L42
>>>
>>
>> Correct link:
>> https://github.com/apache/beam/blob/2f8854a3e34f31c1cc034f95ad36f317abc906ff/sdks/python/apache_beam/utils/subprocess_server.py#L352
>>
>>
>>>
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Thu, Mar 28, 2024 at 8:26 AM Danny McCormick via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> > The patch itself [1] is trivial; however, the release process is not
>>>> trivial. There is little documentation or practice for a patch release
>>>> process. I could imagine two options:
>>>>
>>>> I think there's not a ton of documentation because we haven't done it,
>>>> but all the release workflows were authored in such a way that they should
>>>> "just work", outside of cutting the release branch itself. So the workflow
>>>> should be almost identical to the existing one, but with several steps
>>>> skipped (cherry picks, beam website, most validation). Notably, this
>>>> shouldn't be any easier/harder if we're doing it for one language or all 3.
>>>>
>>>> I can take that on if needed.
>>>>
>>>> > Besides, there should be a Beam YAML validation workflow added to
>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1368030253
>>>>
>>>> > If we do a patch release for the Python SDK, let's also patch another
>>>> known issue for which a fix is available:
>>>> https://github.com/apache/beam/blob/master/CHANGES.md#known-issues-1
>>>>
>>>> +1 to both of these
>>>>
>>>> On Thu, Mar 28, 2024 at 11:25 AM Yi Hu via dev 
>>>> wrote:
>>>>
>>>>> Thanks Valentyn for raising this. In this case, Python containers will
>>>>> also be included. Unlike PyPI wheels, Docker tags can be overwritten, so
>>>>> the containers can stay at 2.55.0.
>>>>>
>>>>> On Thu, Mar 28, 2024 at 11:15 AM Valentyn Tymofieiev <
>>>>> valen...@google.com> wrote:
>>>>>
>>>>>> If we do a patch release for the Python SDK, let's also patch another
>>>>>> known issue for which a fix is available:
>>>>>> https://github.com/apache/beam/blob/master/CHANGES.md#known-issues-1
>>>>>>
>>>>>> On Thu, Mar 28, 2024 at 8:01 AM Yi Hu via dev 
>>>>>> wrote:
>>>>>>
>>>>>>> 2.55.0 release manager here
>>>>>>>
>>>>>>> The patch itself [1] is trivial; however, the release process is not
>>>>>>> trivial. There is little documentation or practice for a patch release
>>>>>>> process. I could imagine two options:
>>>>>>>
>>>>>>> 1. Do a full "2.55.1" release
>>>>>>>
>>>>>>> 2. Do a patch release only for Python SDK, that is
>>>>>>>   a. cherry-pick [1] into release-2.55.0 branch
>>>>>>>   b. tag a 2.55.1rc1 release candidate

Re: Patch release proposal

2024-03-28 Thread Chamikara Jayalath via dev
On Thu, Mar 28, 2024 at 8:36 AM Chamikara Jayalath 
wrote:

> Just releasing Python can break multi-lang by default (unless expansion
> service is overridden manually) since we match versions across languages
> when picking the default expansion service.
>
>
> https://github.com/apache/beam/blob/2f8854a3e34f31c1cc034f95ad36f317abc906ff/sdks/python/apache_beam/utils/subprocess_server.py#L42
>

Correct link:
https://github.com/apache/beam/blob/2f8854a3e34f31c1cc034f95ad36f317abc906ff/sdks/python/apache_beam/utils/subprocess_server.py#L352


>
>
> Thanks,
> Cham
>
> On Thu, Mar 28, 2024 at 8:26 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> > The patch itself [1] is trivial; however, the release process is not
>> trivial. There is little documentation or practice for a patch release
>> process. I could imagine two options:
>>
>> I think there's not a ton of documentation because we haven't done it,
>> but all the release workflows were authored in such a way that they should
>> "just work", outside of cutting the release branch itself. So the workflow
>> should be almost identical to the existing one, but with several steps
>> skipped (cherry picks, beam website, most validation). Notably, this
>> shouldn't be any easier/harder if we're doing it for one language or all 3.
>>
>> I can take that on if needed.
>>
>> > Besides, there should be a Beam YAML validation workflow added to
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1368030253
>>
>> > If we do a patch release for the Python SDK, let's also patch another known
>> issue for which a fix is available:
>> https://github.com/apache/beam/blob/master/CHANGES.md#known-issues-1
>>
>> +1 to both of these
>>
>> On Thu, Mar 28, 2024 at 11:25 AM Yi Hu via dev 
>> wrote:
>>
>>> Thanks Valentyn for raising this. In this case, Python containers will
>>> also be included. Unlike PyPI wheels, Docker tags can be overwritten, so
>>> the containers can stay at 2.55.0.
>>>
>>> On Thu, Mar 28, 2024 at 11:15 AM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
>>>> If we do a patch release for the Python SDK, let's also patch another known
>>>> issue for which a fix is available:
>>>> https://github.com/apache/beam/blob/master/CHANGES.md#known-issues-1
>>>>
>>>> On Thu, Mar 28, 2024 at 8:01 AM Yi Hu via dev 
>>>> wrote:
>>>>
>>>>> 2.55.0 release manager here
>>>>>
>>>>> The patch itself [1] is trivial; however, the release process is not
>>>>> trivial. There is little documentation or practice for a patch release
>>>>> process. I could imagine two options:
>>>>>
>>>>> 1. Do a full "2.55.1" release
>>>>>
>>>>> 2. Do a patch release only for Python SDK, that is
>>>>>   a. cherry-pick [1] into release-2.55.0 branch
>>>>>   b. tag a 2.55.1rc1 release candidate - note that the source code of
>>>>> release candidate (e.g. apache_beam/version.py) still reads 2.55.0. This
>>>>> ensures Python SDK picks up the Java expansion service / job server of
>>>>> existing version (2.55.0). We did it once for Go SDK (
>>>>> https://github.com/apache/beam/tree/sdks/v2.48.2)
>>>>>   c. Build the release candidate for Python wheels (also Python
>>>>> containers? Not sure if it is needed)
>>>>>   d. send out the RC for validation
>>>>>   e. finalize the release
>>>>>
>>>>> If we decide to do a patch release, I would prefer option 2. I can
>>>>> take that on if we decide to do it. However, if we decide to do a full
>>>>> release (or both Java and Python), I would suggest deferring to the next
>>>>> release cycle, as the release process itself could take ~10 days minimum,
>>>>> assuming a single RC.
>>>>>
>>>>> Besides, there should be a Beam YAML validation workflow added to
>>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1368030253
>>>>>
>>>>>
>>>>> [1] https://github.com/apache/beam/pull/30780
>>>>>
>>>>> On Thu, Mar 28, 2024 at 10:22 AM Danny McCormick via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> +1 on a patch release - we've done a fair amount of work to make
>>>>>> releasing e

Re: Patch release proposal

2024-03-28 Thread Chamikara Jayalath via dev
Just releasing Python can break multi-lang by default (unless expansion
service is overridden manually) since we match versions across languages
when picking the default expansion service.

https://github.com/apache/beam/blob/2f8854a3e34f31c1cc034f95ad36f317abc906ff/sdks/python/apache_beam/utils/subprocess_server.py#L42
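To make the version-matching concern concrete: if the Python SDK derives the default expansion-service jar from its own version string, a Python-only 2.55.1 whose version.py said 2.55.1 would look for a Java artifact that was never published. The sketch below is a simplified, hypothetical illustration of that lookup, not the code in subprocess_server.py. The helper names are made up; it only assumes the standard Maven repository layout (group path / artifact / version / artifact-version.jar).

```python
MAVEN_CENTRAL = "https://repo.maven.apache.org/maven2"


def artifact_id_from_gradle_target(gradle_target: str) -> str:
    """Map e.g. ':sdks:java:io:expansion-service:shadowJar' to
    'beam-sdks-java-io-expansion-service' (hypothetical helper)."""
    parts = gradle_target.strip(":").split(":")
    if parts[-1] in ("shadowJar", "jar"):  # drop the Gradle task name
        parts = parts[:-1]
    return "beam-" + "-".join(parts)


def default_expansion_jar_url(gradle_target: str, sdk_version: str,
                              repo: str = MAVEN_CENTRAL) -> str:
    """Build the URL of the Java jar whose version equals the Python SDK
    version, i.e. the cross-language version matching described above."""
    artifact_id = artifact_id_from_gradle_target(gradle_target)
    return (f"{repo}/org/apache/beam/{artifact_id}/{sdk_version}/"
            f"{artifact_id}-{sdk_version}.jar")


if __name__ == "__main__":
    target = ":sdks:java:io:expansion-service:shadowJar"
    # version.py left at 2.55.0: resolves to the already-published Java jar.
    print(default_expansion_jar_url(target, "2.55.0"))
    # version.py bumped to 2.55.1 with no Java release: this URL would point
    # at an artifact that does not exist.
    print(default_expansion_jar_url(target, "2.55.1"))
```

Passing an explicit expansion service (for example, the expansion_service argument accepted by Python cross-language wrappers such as ReadFromKafka) bypasses this kind of default lookup, which is the manual override mentioned above.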

Thanks,
Cham

On Thu, Mar 28, 2024 at 8:26 AM Danny McCormick via dev 
wrote:

> > The patch itself [1] is trivial; however, the release process is not
> trivial. There is little documentation or practice for a patch release
> process. I could imagine two options:
>
> I think there's not a ton of documentation because we haven't done it, but
> all the release workflows were authored in such a way that they should
> "just work", outside of cutting the release branch itself. So the workflow
> should be almost identical to the existing one, but with several steps
> skipped (cherry picks, beam website, most validation). Notably, this
> shouldn't be any easier/harder if we're doing it for one language or all 3.
>
> I can take that on if needed.
>
> > Besides, there should be a Beam YAML validation workflow added to
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1368030253
>
> > If we do a patch release for the Python SDK, let's also patch another known
> issue for which a fix is available:
> https://github.com/apache/beam/blob/master/CHANGES.md#known-issues-1
>
> +1 to both of these
>
> On Thu, Mar 28, 2024 at 11:25 AM Yi Hu via dev 
> wrote:
>
>> Thanks Valentyn for raising this. In this case, Python containers will
>> also be included. Unlike PyPI wheels, Docker tags can be overwritten, so
>> the containers can stay at 2.55.0.
>>
>> On Thu, Mar 28, 2024 at 11:15 AM Valentyn Tymofieiev 
>> wrote:
>>
>>> If we do a patch release for the Python SDK, let's also patch another known
>>> issue for which a fix is available:
>>> https://github.com/apache/beam/blob/master/CHANGES.md#known-issues-1
>>>
>>> On Thu, Mar 28, 2024 at 8:01 AM Yi Hu via dev 
>>> wrote:
>>>
>>>> 2.55.0 release manager here
>>>>
>>>> The patch itself [1] is trivial; however, the release process is not
>>>> trivial. There is little documentation or practice for a patch release
>>>> process. I could imagine two options:
>>>>
>>>> 1. Do a full "2.55.1" release
>>>>
>>>> 2. Do a patch release only for the Python SDK, that is:
>>>>   a. cherry-pick [1] into release-2.55.0 branch
>>>>   b. tag a 2.55.1rc1 release candidate - note that the source code of
>>>> the release candidate (e.g. apache_beam/version.py) still reads 2.55.0.
>>>> This ensures the Python SDK picks up the Java expansion service / job
>>>> server of the existing version (2.55.0). We did it once for the Go SDK (
>>>> https://github.com/apache/beam/tree/sdks/v2.48.2)
>>>>   c. Build the release candidate for Python wheels (also Python
>>>> containers? Not sure if it is needed)
>>>>   d. send out the RC for validation
>>>>   e. finalize the release
>>>>
>>>> If we decide to do a patch release, I would prefer option 2. I can take
>>>> that on if we decide to do it. However, if we decide to do a full release
>>>> (or both Java and Python), I would suggest deferring to the next release
>>>> cycle, as the release process itself could take ~10 days minimum,
>>>> assuming a single RC.
>>>>
>>>> Besides, there should be a Beam YAML validation workflow added to
>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1368030253
>>>>
>>>>
>>>> [1] https://github.com/apache/beam/pull/30780
>>>>
>>>> On Thu, Mar 28, 2024 at 10:22 AM Danny McCormick via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> +1 on a patch release - we've done a fair amount of work to make
>>>>> releasing easier, and one of my hopes is that it will enable quick patches
>>>>> like this. I'd vote we try to fix the underlying Java piece as well,
>>>>> though; doing a patch release for one language shouldn't be significantly
>>>>> cheaper than doing it for multiple languages.
>>>>>
>>>>> Thanks,
>>>>> Danny
>>>>>
>>>>> On Wed, Mar 27, 2024 at 7:19 PM Robert Burke
>>>>> wrote:
>>>>>
>>>>>> +1 to a targeted patch release.
>>>>>>
>>>>>> We did the same for the Go SDK a little while back. It would be good
>>>>>> to see what's different for a different SDK.
>>>>>>
>>>>>> On Wed, Mar 27, 2024, 4:01 PM Robert Bradshaw via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>>
>>>>>>> Given the severity of the breakage, and the simplicity of the
>>>>>>> workaround, I'm in favor of a patch release. I think we could do
>>>>>>> Python-only, which would make the process even more lightweight.
>>>>>>>
>>>>>>> On Wed, Mar 27, 2024 at 3:48 PM Jeff Kinard
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Beam 2.55 was released with a bug that causes WriteToJson on Beam
>>>>>>>> YAML to fail when using the Java variant. This also affects any user
>>>>>>>> attempting to use the Xlang JsonWriteTransformProvider -
>>>>>>>>

Re: [VOTE] Release 2.55.0, release candidate #3

2024-03-20 Thread Chamikara Jayalath via dev
+1 (binding)

Tested multi-lang Java/Python pipelines and upgrading BQ/Kafka transforms
from 2.53.0 to 2.55.0 using the Transform Service.

Thanks,
Cham

On Tue, Mar 19, 2024 at 2:10 PM XQ Hu via dev  wrote:

> +1 (non-binding). Ran the simple ML pipeline without any issue:
> https://github.com/google/dataflow-ml-starter/actions/runs/8349158153
>
> On Tue, Mar 19, 2024 at 11:55 AM Ritesh Ghorse via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (non-binding) - Ran a few python batch examples on Direct and Dataflow
>> runner.
>>
>> Thanks!
>>
>> On Tue, Mar 19, 2024 at 10:56 AM Yi Hu via dev 
>> wrote:
>>
>>> Hi everyone,
>>> Please review and vote on the release candidate #3 for the version
>>> 2.55.0, as follows:
>>>
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>>> count towards the final vote, but votes from all community members are
>>> encouraged and helpful for finding regressions; you can either test your
>>> own use cases [13] or use cases from the validation sheet [10].
>>>
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with fingerprint D20316F712213422 [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.55.0-RC3" [5],
>>> * website pull request listing the release [6], the blog post [6], and
>>> publishing the API reference manual [7].
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI [8].
>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>> * Validation sheet with a tab for 2.55.0 release to help with validation
>>> [10].
>>> * Docker images published to Docker Hub [11].
>>> * PR to run tests against release branch [12].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> For guidelines on how to try the release in your projects, check out our
>>> RC testing guide [13].
>>>
>>> Thanks,
>>> Release Manager
>>>
>>> [1] https://github.com/apache/beam/milestone/19
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.55.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1373/
>>> [5] https://github.com/apache/beam/tree/v2.55.0-RC3
>>> [6] https://github.com/apache/beam/pull/30607
>>> [7] https://github.com/apache/beam-site/pull/661
>>> [8] https://pypi.org/project/apache-beam/2.55.0rc3/
>>> [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.55.0-RC3/go/pkg/beam
>>> [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1368030253
>>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>> [12] https://github.com/apache/beam/pull/30569
>>> [13]
>>> https://github.com/apache/beam/blob/master/contributor-docs/rc-testing-guide.md
>>>
>>>
>>> --
>>>
>>> Yi Hu, (he/him/his)
>>>
>>> Software Engineer
>>>
>>>
>>>
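The vote emails name the same release candidate in two forms: the Git tag (e.g. "v2.55.0-RC3" [5]) and the PyPI version (e.g. "2.55.0rc3" [8]). A minimal sketch of that mapping, assuming RC tags always follow the vX.Y.Z-RCn pattern seen in these threads; the function name is hypothetical.

```python
import re

# Pattern of the RC tags used in these threads, e.g. "v2.55.0-RC3".
_RC_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)-RC(\d+)$")


def pypi_version_for_rc_tag(tag: str) -> str:
    """Convert a Git RC tag into the PEP 440 pre-release version on PyPI."""
    match = _RC_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a release-candidate tag: {tag!r}")
    major, minor, patch, rc = match.groups()
    return f"{major}.{minor}.{patch}rc{rc}"


if __name__ == "__main__":
    version = pypi_version_for_rc_tag("v2.55.0-RC3")
    # Pinning the exact pre-release version lets pip install the RC.
    print(f"pip install apache-beam=={version}")
```

The Go module version in [9] keeps the tag form (v2.55.0-RC3) unchanged, so only the Python artifact needs this translation.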


Re: [API PROPOSAL] PTransform.getURN, toProto, etc, for Java

2024-02-14 Thread Chamikara Jayalath via dev
On Wed, Feb 14, 2024 at 10:28 AM Kenneth Knowles  wrote:

> Hi all,
>
> TL;DR I want to add some API like PTransform.getURN, toProto and
> fromProto, etc. to the Java SDK. I want to do this so that making a
> PTransform support portability is a natural part of writing the transform
> and not a totally separate thing with tons of boilerplate.
>
> What do you think?
>

+1. Currently, users have to look in two different places: one for defining
the transform and another for defining the portable representation of the
transform (URN, toProto, etc.). Given that we are fully committed to
portability, it is much easier to move these to a single interface.


>
> I think a particular API can be sorted out most easily in code (which I
> will prepare after gathering some feedback).
>

I think we basically want to move the API defined in the
TransformPayloadTranslator (or something similar to that) to the PTransform
class.
https://github.com/apache/beam/blob/bfa26a4d907d844aed4b938f88142ed0fc82c90f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java#L597

The Python SDK already defines to_runner_api/from_runner_api methods in the
PTransform class.
https://github.com/apache/beam/blob/bfa26a4d907d844aed4b938f88142ed0fc82c90f/sdks/python/apache_beam/transforms/ptransform.py#L747

I would also like to call out the newly added toConfigRow/fromConfigRow
interface methods (which capture the configuration a PTransform was
constructed with); I think these should also move to the PTransform class.
https://github.com/apache/beam/blob/bfa26a4d907d844aed4b938f88142ed0fc82c90f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java#L634
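A rough Python sketch of the proposed shape, where a transform carries its own URN and payload conversion instead of relying on a separately registered translator. Everything here is hypothetical and simplified: the class, decorator, and URN names are made up, and JSON stands in for the protobuf payload / config Row. In the real Python SDK the corresponding hooks live on PTransform itself (to_runner_api/from_runner_api).

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Type

# Hypothetical registry mapping a URN to the transform class that owns it.
_REGISTRY: Dict[str, Type["PortableTransform"]] = {}


class PortableTransform:
    """Hypothetical base class: each transform knows its URN and payload."""

    URN: str = ""

    def get_urn(self) -> str:
        return self.URN

    def to_payload(self) -> bytes:
        # Stand-in for toProto/toConfigRow: serialize the dataclass fields.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_payload(cls, payload: bytes) -> "PortableTransform":
        # Stand-in for fromProto/fromConfigRow: rebuild from the fields.
        return cls(**json.loads(payload.decode("utf-8")))


def register_urn(urn: str):
    """Class decorator that attaches `urn` to the class and registers it."""
    def wrap(cls):
        cls.URN = urn
        _REGISTRY[urn] = cls
        return cls
    return wrap


def from_portable(urn: str, payload: bytes) -> PortableTransform:
    """Look up the class for `urn` and let it decode its own payload."""
    return _REGISTRY[urn].from_payload(payload)


@register_urn("beam:transform:example:write_json:v1")  # hypothetical URN
@dataclass
class WriteJson(PortableTransform):
    path: str
    num_shards: int = 0


if __name__ == "__main__":
    original = WriteJson(path="/tmp/out", num_shards=2)
    restored = from_portable(original.get_urn(), original.to_payload())
    print(restored == original)
```

The design choice mirrors the thread: the URN and the payload conversion sit next to the transform's constructor parameters, so there is one place to look when defining a portable transform.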

Thanks,
Cham


> We already have all the translation logic written, and porting a couple
> transforms to it will ensure the API has everything we need. We can refer
> to Python and Go for API ideas as well.
>
> Lots of context below, but you can skip it...
>
> -
>
> When we first created the portability framework, we wanted the SDKs to be
> "standalone" and not depend on portability. We wanted portability to be an
> optional plugin that users could opt in to. That is totally the opposite
> now. We want portability to be the main place where Beam is defined, and
> then SDKs make that available in language idiomatic ways.
>
> Also when we first created the framework, we were experimenting with
> different serialization approaches and we wanted to be independent of
> protobuf and gRPC if we could. But now we are pretty committed and it would
> be a huge lift to use anything else.
>
> Finally, at the time we created the portability framework, we designed it
> to allow composites to have URNs and well-defined specs, rather than just
> be language-specific subgraphs, but we didn't really plan to make this easy.
>
> For all of the above, most users depend on portability and on proto. So
> separating them is not useful and just creates LOTS of boilerplate and
> friction for making new well-defined transforms.
>
> Kenn
>


Re: [ANNOUNCE] New Committer: Svetak Sundhar

2024-02-13 Thread Chamikara Jayalath via dev
Congrats Svetak!

On Tue, Feb 13, 2024 at 4:39 PM Svetak Sundhar via dev 
wrote:

> Thanks everyone!! Looking forward to the continued collaboration :)
>
>
> Svetak Sundhar
>
>   Data Engineer
> svetaksund...@google.com
>
>
>
> On Mon, Feb 12, 2024 at 9:58 PM Byron Ellis via dev 
> wrote:
>
>> Congrats Svetak!
>>
>> On Mon, Feb 12, 2024 at 6:57 PM Shunping Huang via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congratulations, Svetak!
>>>
>>> On Mon, Feb 12, 2024 at 9:50 PM XQ Hu via dev 
>>> wrote:
>>>
>>>> Great job, Svetak! Thanks for all your contributions to Beam!!!
>>>>
>>>> On Mon, Feb 12, 2024 at 4:44 PM Valentyn Tymofieiev via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> Congrats, Svetak!
>>>>>
>>>>> On Mon, Feb 12, 2024 at 11:20 AM Kenneth Knowles
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>>>> committer: Svetak Sundhar (sve...@apache.org).
>>>>>>
>>>>>> Svetak has been with Beam since 2021. Svetak has contributed code to
>>>>>> many areas of Beam, including notebooks, Beam Quest, dataframes, and IOs.
>>>>>> We also want to especially highlight the effort Svetak has put into
>>>>>> improving Beam's documentation, participating in release validation, and
>>>>>> evangelizing Beam.
>>>>>>
>>>>>> Considering his contributions to the project over this timeframe, the
>>>>>> Beam PMC trusts Svetak with the responsibilities of a Beam committer. [1]
>>>>>>
>>>>>> Thank you Svetak! And we are looking forward to seeing more of your
>>>>>> contributions!
>>>>>>
>>>>>> Kenn, on behalf of the Apache Beam PMC
>>>>>>
>>>>>> [1]
>>>>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>>
>>>>>


Re: [VOTE] Vendored Dependencies Release

2024-02-12 Thread Chamikara Jayalath via dev
+1 (binding)

Thanks,
Cham

On Fri, Feb 9, 2024 at 5:25 AM Sam Whittle  wrote:

> Please review the release of the following artifacts that we vendor,
> following the process [5]:
>
>  * beam-vendor-grpc-1-60-1:0.2
>
> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version
> beam-vendor-grpc-1-60-1:0.2 as follows:
>
> [ ] +1, Approve the release
>
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
>
> * the official Apache source release to be deployed to dist.apache.org
> [1], which is signed with the key with fingerprint FCFD152811BF1578 [2],
>
> * all artifacts to be deployed to the Maven Central Repository [3],
>
> * commit hash "2d08b32e674a1046ba7be0ae5f1e4b7b05b73488" [4].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
>
> Sam
>
> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
>
> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
>
> [3] https://repository.apache.org/content/repositories/orgapachebeam-1369/
>
> [4]
> https://github.com/apache/beam/commit/2d08b32e674a1046ba7be0ae5f1e4b7b05b73488
>
> [5] https://s.apache.org/beam-release-vendored-artifacts
>


Re: [VOTE] Release 2.54.0, release candidate #2

2024-02-08 Thread Chamikara Jayalath via dev
+1 (binding)

Tried out Java/Python multi-lang jobs and upgrading BQ/Kafka transforms
from 2.53.0 to 2.54.0 using the Transform Service.

Thanks,
Cham

On Wed, Feb 7, 2024 at 5:52 PM XQ Hu via dev  wrote:

> +1 (non-binding)
>
> Validated with a simple RunInference Python pipeline:
> https://github.com/google/dataflow-ml-starter/actions/runs/7821639833/job/21339032997
>
> On Wed, Feb 7, 2024 at 7:10 PM Yi Hu via dev  wrote:
>
>> +1 (non-binding)
>>
>> Validated with Dataflow Template:
>> https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/1317
>>
>> Regards,
>>
>> On Wed, Feb 7, 2024 at 11:18 AM Ritesh Ghorse via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Ran a few batch and streaming examples for Python SDK on Dataflow Runner
>>>
>>> Thanks!
>>>
>>> On Wed, Feb 7, 2024 at 4:08 AM Jan Lukavský  wrote:
>>>
>>>> +1 (binding)
>>>>
>>>> Validated Java SDK with Flink runner.
>>>>
>>>>  Jan
>>>> On 2/7/24 06:23, Robert Burke via dev wrote:
>>>>
>>>> Hi everyone,
>>>> Please review and vote on the release candidate #2 for the version
>>>> 2.54.0, as follows:
>>>> [ ] +1, Approve the release
>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>
>>>>
>>>> Reviewers are encouraged to test their own use cases with the release
>>>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>>>> count towards the final vote, but votes from all community members are
>>>> encouraged and helpful for finding regressions; you can either test your
>>>> own use cases [13] or use cases from the validation sheet [10].
>>>>
>>>> The complete staging area is available for your review, which includes:
>>>> * GitHub Release notes [1],
>>>> * the official Apache source release to be deployed to dist.apache.org
>>>> [2], which is signed with the key with fingerprint D20316F712213422 [3],
>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>> * source code tag "v2.54.0-RC2" [5],
>>>> * website pull request listing the release [6], the blog post [6], and
>>>> publishing the API reference manual [7].
>>>> * Python artifacts are deployed along with the source release to
>>>> dist.apache.org [2] and PyPI [8].
>>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>>> * Validation sheet with a tab for 2.54.0 release to help with validation
>>>> [10].
>>>> * Docker images published to Docker Hub [11].
>>>> * PR to run tests against release branch [12].
>>>>
>>>> The vote will be open for at least 72 hours. It is adopted by majority
>>>> approval, with at least 3 PMC affirmative votes.
>>>>
>>>> For guidelines on how to try the release in your projects, check out our
>>>> RC testing guide [13].
>>>>
>>>> Thanks,
>>>> Robert Burke
>>>> Beam 2.54.0 Release Manager
>>>>
>>>> [1] https://github.com/apache/beam/milestone/18?closed=1
>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.54.0/
>>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>> [4]
>>>> https://repository.apache.org/content/repositories/orgapachebeam-1368/
>>>> [5] https://github.com/apache/beam/tree/v2.54.0-RC2
>>>> [6] https://github.com/apache/beam/pull/30201
>>>> [7] https://github.com/apache/beam-site/pull/659
>>>> [8] https://pypi.org/project/apache-beam/2.54.0rc2/
>>>> [9]
>>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.54.0-RC2/go/pkg/beam
>>>> [10]
>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=28763708
>>>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>>> [12] https://github.com/apache/beam/pull/30104
>>>> [13]
>>>> https://github.com/apache/beam/blob/master/contributor-docs/rc-testing-guide.md




Re: [VOTE] Vendored Dependencies Release

2024-01-22 Thread Chamikara Jayalath via dev
+1 (binding).

Thanks,
Cham

On Mon, Jan 22, 2024 at 7:40 AM Yi Hu via dev  wrote:

> > Notably, the vendored artifact has no impact on the repo until the
> version used is also bumped.
>
> That is correct. The change only takes effect once a PR actually bumps the
> version, like https://github.com/apache/beam/pull/29976
>
> On Mon, Jan 22, 2024 at 10:11 AM Kenneth Knowles  wrote:
>
>> Notably, the vendored artifact has no impact on the repo until the
>> version used is also bumped, right? So the release is very low stakes.
>>
>> Kenn
>>
>> On Fri, Jan 19, 2024 at 4:55 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Thanks.
>>>
>>> +1
>>>
>>>
>>> On Fri, Jan 19, 2024 at 1:24 PM Yi Hu  wrote:
>>>
>>>> The process I have been following is [1]. I have also suggested edits
>>>> to the voting email template to include the self-link. However, can
>>>> anyone edit this doc so the change can be made? Otherwise, it might be
>>>> better to migrate this doc to
>>>> https://github.com/apache/beam/tree/master/contributor-docs
>>>>
>>>> [1] https://s.apache.org/beam-release-vendored-artifacts
>>>>
>>>> On Thu, Jan 18, 2024 at 2:56 PM Robert Bradshaw via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> Could you explain the process you used to produce these artifacts?
>>>>>
>>>>> On Thu, Jan 18, 2024 at 11:23 AM Kenneth Knowles
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> On Wed, Jan 17, 2024 at 6:03 PM Yi Hu via dev
>>>>>> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>>
>>>>>>> Please review the release of the following artifacts that we vendor:
>>>>>>>
>>>>>>>  * beam-vendor-grpc-1_60_1
>>>>>>>
>>>>>>>
>>>>>>> Please review and vote on the release candidate #1 for the version
>>>>>>> 0.1, as follows:
>>>>>>>
>>>>>>> [ ] +1, Approve the release
>>>>>>>
>>>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>>>
>>>>>>>
>>>>>>> The complete staging area is available for your review, which includes:
>>>>>>>
>>>>>>> * the official Apache source release to be deployed to
>>>>>>> dist.apache.org [1], which is signed with the key with fingerprint
>>>>>>> 8935B943A188DE65 [2],
>>>>>>>
>>>>>>> * all artifacts to be deployed to the Maven Central Repository [3],
>>>>>>>
>>>>>>> * commit hash "52b4a9cb58e486745ded7d53a5b6e2d2312e9551" [4].
>>>>>>>
>>>>>>> The vote will be open for at least 72 hours. It is adopted by
>>>>>>> majority approval, with at least 3 PMC affirmative votes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Release Manager
>>>>>>>
>>>>>>> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
>>>>>>>
>>>>>>> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>>>>>
>>>>>>> [3]
>>>>>>> https://repository.apache.org/content/repositories/orgapachebeam-1366/
>>>>>>>
>>>>>>> [4]
>>>>>>> https://github.com/apache/beam/commits/52b4a9cb58e486745ded7d53a5b6e2d2312e9551/
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Yi Hu, (he/him/his)
>>>>>>>
>>>>>>> Software Engineer
>>>>>>>
>>>>>>>
>>>>>>>


Re: Re: [YAML] ReadFromKafka with yaml

2024-01-11 Thread Chamikara Jayalath via dev
To use "ReadFromKafka" from Flink, you additionally need to
specify the pipeline option "--experiments=use_deprecated_read", I believe. This
is due to a known issue: https://github.com/apache/beam/issues/20979
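A minimal sketch of the workaround above. It assumes the pipeline is launched from Python with a plain argument list, and the helper name is hypothetical. Only the flag itself comes from this thread.

```python
def with_flink_kafka_workaround(argv):
    """Return a copy of argv that also requests the deprecated-read experiment.

    Hypothetical helper: it only edits the argument list that would later be
    handed to the pipeline options parser, and it does not add the flag twice.
    """
    flag = "--experiments=use_deprecated_read"
    args = list(argv)
    if flag not in args:
        args.append(flag)
    return args


print(with_flink_kafka_workaround(["--runner=FlinkRunner"]))
```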

Thanks,
Cham

On Wed, Jan 10, 2024 at 9:56 PM Yarden BenMoshe  wrote:

> Thanks for the detailed answer.
> I forgot to mention that I am using FlinkRunner in my setup. Will this
> work with this runner as well?
>
>
> On 2024/01/10 13:34:28 Ferran Fernández Garrido wrote:
> > Hi Yarden,
> >
> > If you are using Dataflow as a runner, you can already use
> > ReadFromKafka (introduced originally in version 2.52). Dataflow will
> > handle the expansion service automatically, so you don't have to do
> > anything.
> >
> > If you want to run it locally for development purposes, you'll have to
> > build the Docker image. You can check out the project and run:
> >
> > ./gradlew :sdks:java:container:java8:docker \
> >   -Pdocker-repository-root=$DOCKER_ROOT -Pdocker-tag=latest
> > (where $DOCKER_ROOT is the repository location)
> >
> > Then, for instance, if you want to run your custom Docker image in
> > Dataflow, you could do this:
> >
> > (Build the Python SDK with "python setup.py sdist" to get
> > apache-beam-2.53.0.dev0.tar.gz.)
> >
> > You'll have to build the expansion service that Kafka uses (in case
> > you've changed something in KafkaIO):
> >
> > ./gradlew :sdks:java:io:expansion-service:build
> >
> > python3 -m apache_beam.yaml.main --runner=DataflowRunner \
> >   --project=project_id --region=region --temp_location=temp_location \
> >   --pipeline_spec_file=yaml_pipeline.yml \
> >   --staging_location=staging_location \
> >   --sdk_location="path/apache-beam-2.53.0.dev0.tar.gz" \
> >   --sdk_harness_container_image_overrides=".*java.*,$DOCKER_ROOT:latest" \
> >   --streaming
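The --sdk_harness_container_image_overrides value above pairs a regex with a replacement image. A minimal sketch of how such an override could be resolved, assuming "<regex>,<replacement image>" entries where the first regex found in the original image name wins. The matching details and the image names are illustrative assumptions, not the runner's actual code.

```python
import re


def apply_image_overrides(image, overrides):
    """Return the replacement container image for `image`, if any override matches.

    Assumption: each override is "<regex>,<replacement image>", and the first
    regex found anywhere in the original image name wins.
    """
    for entry in overrides:
        pattern, replacement = entry.split(",", 1)
        if re.search(pattern, image):
            return replacement
    return image  # no override applies; keep the default image


# "my-registry" stands in for $DOCKER_ROOT; both image names are examples.
overrides = [".*java.*,my-registry/beam_java_sdk:latest"]
print(apply_image_overrides("apache/beam_java8_sdk:2.53.0", overrides))
print(apply_image_overrides("apache/beam_python3.11_sdk:2.53.0", overrides))
```

With this reading, only the Java expansion/harness image is swapped for the locally built one, and the Python harness keeps its default image.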
> >
> > This is an example of how to read JSON events from Kafka in Beam YAML:
> >
> > - type: ReadFromKafka
> >   config:
> >     topic: 'TOPIC_NAME'
> >     format: JSON
> >     bootstrap_servers: 'BOOTSTRAP_SERVERS'
> >     schema: 'JSON_SCHEMA'
> >
> > Best,
> > Ferran
> >
> > El mié, 10 ene 2024 a las 14:11, Yarden BenMoshe
> > () escribió:
> > >
> > > Hi,
> > >
> > > I am trying to consume a Kafka topic using the ReadFromKafka transform.
> > >
> > > If I got it right, since ReadFromKafka is originally written in Java,
> an expansion service is needed and the default environment is set to DOCKER,
> and in the current implementation I can see that the expansion service field
> is not adjustable (I'm not able to pass it as part of the transform's config).
> > > Is there currently a way to use ReadFromKafka from a pipeline written with
> the YAML API? If so, an explanation would be much appreciated.
> > >
> > > I saw there's a workaround suggested online of using
> Docker-in-Docker, but I would prefer to avoid it.
> > >
> > > Thanks
> > > Yarden
> >
>


Re: [PROPOSAL] Upgrade vendor grpc

2024-01-11 Thread Chamikara Jayalath via dev
Sounds good and thanks for doing this :)

- Cham

On Thu, Jan 11, 2024 at 8:06 AM Yi Hu via dev  wrote:

> Hi everyone,
>
> I would like to volunteer to upgrade the Beam vendored grpc, as requested
> by the GitHub Issue [1]. The last update was in Apr 2023 [2]. Since then,
> vulnerabilities have been found in its dependencies, as well as potential
> OOM issues (see [1]); the upgrade will also include grpc-alts [2].
>
> My plan is to follow the release process [3, 4], which involves preparing
> for the release, building a candidate, voting and finalizing the release.
> The vendored artifact is then targeted for integration in Beam v2.54.0
> onwards (cut date Jan 24, 2024).
>
> Please let me know if you have any comments/objections/questions.
>
> Thanks,
>
> Yi
>
> [1] https://github.com/apache/beam/issues/29861
> [2] https://github.com/apache/beam/issues/25746
> [3] https://github.com/apache/beam/tree/master/vendor
> [4]
> https://docs.google.com/document/d/1ztEoyGkqq9ie5riQxRtMuBu3vb6BUO91mSMn1PU0pDA/edit#heading=h.vhcuqlttpnog
> --
>
> Yi Hu, (he/him/his)
>
> Software Engineer
>
>
>


Re: [VOTE] Release 2.53.0, release candidate #2

2024-01-03 Thread Chamikara Jayalath via dev
+1 (binding)

Validated Java/Python x-lang jobs.

- Cham

On Tue, Jan 2, 2024 at 7:35 AM Jack McCluskey via dev 
wrote:

> Happy New Year, everyone!
>
> Now that we're through the holidays I just wanted to bump the voting
> thread so we can keep the RC moving.
>
> Thanks,
>
> Jack McCluskey
>
> On Fri, Dec 29, 2023 at 11:58 AM Johanna Öjeling via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (non-binding).
>>
>> Tested Go SDK with Dataflow on own use cases.
>>
>> On Fri, Dec 29, 2023 at 2:57 AM Yi Hu via dev 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Tested with Beam GCP IOs benchmarking (
>>> https://github.com/GoogleCloudPlatform/DataflowTemplates/tree/main/it/google-cloud-platform
>>> )
>>>
>>> On Thu, Dec 28, 2023 at 11:36 AM Svetak Sundhar via dev <
>>> dev@beam.apache.org> wrote:
>>>
 +1 (non binding)

 Tested with Healthcare notebooks.


 Svetak Sundhar

   Data Engineer
 svetaksund...@google.com



 On Thu, Dec 28, 2023 at 3:52 AM Jan Lukavský  wrote:

> +1 (binding)
>
> Tested Java SDK with Flink Runner.
>
>  Jan
> On 12/27/23 14:13, Danny McCormick via dev wrote:
>
> +1 (non-binding)
>
> Tested with some example ML notebooks.
>
> Thanks,
> Danny
>
> On Tue, Dec 26, 2023 at 6:41 PM XQ Hu via dev 
> wrote:
>
>> +1 (non-binding)
>>
>> Tested with the simple RunInference pipeline:
>> https://github.com/google/dataflow-ml-starter/actions/runs/7332832875/job/19967521369
>>
>> On Tue, Dec 26, 2023 at 3:29 PM Jack McCluskey via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Happy holidays everyone,
>>>
>>> Please review and vote on the release candidate #2 for the version
>>> 2.53.0, as follows:
>>>
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>> Reviewers are encouraged to test their own use cases with the
>>> release candidate, and vote +1 if no issues are found. Only PMC member
>>> votes will count towards the final vote, but votes from all community
>>> members are encouraged and helpful for finding regressions; you can 
>>> either
>>> test your own use cases [13] or use cases from the validation sheet 
>>> [10].
>>>
>>> The complete staging area is available for your review, which
>>> includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to
>>> dist.apache.org [2], which is signed with the key with fingerprint
>>> DF3CBA4F3F4199F4 (D20316F712213422 if automated) [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.53.0-RC2" [5],
>>> * website pull request listing the release [6], the blog post [6],
>>> and publishing the API reference manual [7].
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI[8].
>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>> * Validation sheet with a tab for 2.53.0 release to help with
>>> validation [10].
>>> * Docker images published to Docker Hub [11].
>>> * PR to run tests against release branch [12].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by
>>> majority approval, with at least 3 PMC affirmative votes.
>>>
>>> For guidelines on how to try the release in your projects, check out
>>> our RC testing guide [13].
>>>
>>> Thanks,
>>>
>>> Jack McCluskey
>>>
>>> [1] https://github.com/apache/beam/milestone/17
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.53.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1365/
>>> [5] https://github.com/apache/beam/tree/v2.53.0-RC2
>>> [6] https://github.com/apache/beam/pull/29856
>>> [7] https://github.com/apache/beam-site/pull/657
>>> [8] https://pypi.org/project/apache-beam/2.53.0rc2/
>>> [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.53.0-RC2/go/pkg/beam
>>> [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1290249774
>>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>> [12] https://github.com/apache/beam/pull/29758
>>> [13]
>>> https://github.com/apache/beam/blob/master/contributor-docs/rc-testing-guide.md
>>>
>>>
>>> --
>>>
>>>
>>> Jack McCluskey
>>> SWE - DataPLS PLAT/ Dataflow ML
>>> RDU
>>> jrmcclus...@google.com
>>>
>>>
>>>


Re: How do side inputs relate to stage fusion?

2023-12-15 Thread Chamikara Jayalath via dev
Created related feature request https://github.com/apache/beam/issues/29789

We have to put more thought into exactly how to come up with merged
environments that do not result in conflicts. I prefer trying to
automatically do this on the SDK side instead of pushing the complexity to
the user (for example, isolating dependencies within the same environment
using classloaders for Java).

Thanks,
Cham

On Fri, Dec 15, 2023 at 1:36 PM Joey Tran  wrote:

> Yeah, we already have `ResourceHint.get_merged_value(cls, outer_value,
> inner_value)` for reconciling resources within a composite. In the future,
> we could possibly just have another similar method and have the environment
> merging logic hook into that.
>
> On Fri, Dec 15, 2023 at 3:53 PM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> There is definitely a body of future work in intelligently merging
>> compatible-but-not-equal environments. (Dataflow does this for example.)
>> Defining/detecting compatibility is not always easy, but sometimes is, and
>> we should at least cover those cases and grow them over time.
>>
>> On Fri, Dec 15, 2023 at 5:57 AM Joey Tran 
>> wrote:
>>
>>> Yeah I can confirm for the python runners (based on my reading of the
>>> translations.py [1]) that only identical environments are merged together.
>>>
>>> The funny thing is that we _originally_ implemented this hint as an
>>> annotation but then changed it to a hint because it semantically felt more
>>> correct. I think we might go back to that since the environment merging
>>> logic isn't too flexible / easy to customize. Our type of hint is a bit
>>> unlike other hints anyways. Unlike resources like MinRam, these resources
>>> are additive (e.g. you can merge an environment that requires license A and
>>> an environment that requires license B into an environment that requires
>>> both A and B)
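The additive merge described above (license A plus license B yields an environment needing both) can be sketched as follows. LicenseHint is a hypothetical, standalone class that only mimics the shape of the get_merged_value(cls, outer_value, inner_value) classmethod mentioned in this thread; the byte encoding of values is also an assumption.

```python
class LicenseHint:
    """Hypothetical hint whose values are comma-separated license names.

    Unlike a max-style hint such as a minimum RAM value, merging is additive:
    the result requires every license needed by either transform.
    """

    @staticmethod
    def parse(value):
        # Values are assumed to be UTF-8 bytes such as b"licenseA,licenseB".
        return {name for name in value.decode("utf-8").split(",") if name}

    @classmethod
    def get_merged_value(cls, outer_value, inner_value):
        merged = cls.parse(outer_value) | cls.parse(inner_value)
        return ",".join(sorted(merged)).encode("utf-8")


print(LicenseHint.get_merged_value(b"licenseA", b"licenseB"))  # b'licenseA,licenseB'
```

Sorting the merged names keeps the encoded value deterministic, so two environments that need the same set of licenses compare as identical.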
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/5fb4db31994d7c2c1e04d32a4b153bc83d739f36/sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py#L4
>>>
>>> On Fri, Dec 15, 2023 at 8:43 AM Robert Burke  wrote:
>>>
 That would do it. We got so tunnel visioned on side inputs we missed
 that!

 IIRC the python local runner and Prism both only fuse transforms in
 identical environments together. So any environmental diffs will prevent
 fusion.

 Runners as a rule are usually free to ignore/manage hints as they like.
 Transform annotations might be an alternative, but how those are managed
 would be more SDK specific.

 On Fri, Dec 15, 2023, 5:21 AM Joey Tran 
 wrote:

> I figured out my issue. I thought side inputs were breaking up my
> pipeline but after experimenting with my transforms I now realize what was
> actually breaking it up was different transform environments that weren't
> considered compatible.
>
> We have a custom resource hint (for specifying whether a transform
> needs access to some software license) that we use with our transforms and
> that's what was preventing the fusion I was expecting. I'm looking into
> how to make these hints mergeable now.
>
> On Thu, Dec 14, 2023 at 7:46 PM Robert Burke 
> wrote:
>
>> Building on what Robert Bradshaw has said, basically, if these fusion
>> breaks don't exist, the pipeline can live lock, because the side input is
>> unable to finish computing for a given input element's window.
>>
>> I have recently added fusion to the Go Prism runner based on the
>> Python side input semantics, and I was surprised that there are basically
>> two rules for fusion: the side input one, and one for handling stateful
>> processing.
>>
>>
>> This code here is the greedy fusion algorithm that Python uses, but less
>> set-based, so it might be easier to follow:
>> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/preprocess.go#L513
>>
>> From the linked code comment:
>>
>> Side Inputs: A transform S consuming a PCollection as a side input can't
>> be fused with the transform P that produces that PCollection. Further,
>> no transform S+ descended from S can be fused with transform P.
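To make the side input rule concrete, here is a small sketch (not Prism's or the Python runner's actual code). It takes a hypothetical adjacency-list representation of the pipeline graph and computes which transforms may not be fused with a producer P: every side-input consumer S of P, plus everything downstream of S.

```python
from collections import deque


def unfusable_with(producer, main_edges, side_input_edges):
    """Return the transforms that may not be fused with `producer`.

    main_edges: {transform: [consumers of its output as a main input]}
    side_input_edges: {transform: [consumers of its output as a side input]}
    """
    blocked = set()
    queue = deque(side_input_edges.get(producer, []))  # the S transforms
    while queue:
        transform = queue.popleft()
        if transform in blocked:
            continue
        blocked.add(transform)
        # Anything descended from S (via either kind of edge) is S+.
        queue.extend(main_edges.get(transform, []))
        queue.extend(side_input_edges.get(transform, []))
    return blocked


main_edges = {"Read": ["P", "S"], "P": ["Format"], "S": ["Write"]}
side_input_edges = {"P": ["S"]}
print(sorted(unfusable_with("P", main_edges, side_input_edges)))  # ['S', 'Write']
```

"Format" reads P's output as a main input, so it stays fusable with P, while S and its descendant "Write" are excluded.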
>>
>> Ideally I'll add visual representations of the graphs in the test
>> suite here, that validates the side input dependency logic:
>>
>>
>> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/preprocess_test.go#L398
>>
>> (Note, that test doesn't validate expected fusion results, Prism is a
>> work in progress).
>>
>>
>> As for the Stateful rule, this is largely an implementation
>> convenience for runners to ensure correct execution.
>> If your pipeline also uses Stateful transforms, or SplittableDoFns,
>> those are usually relegated to the root of a fused stage, and are not
>> fused with each other.

Re: [VOTE] Release 2.52.0, release candidate #3

2023-11-10 Thread Chamikara Jayalath via dev
+1 (binding).

Tested multi-lang Java/Python jobs.

Thanks,
Cham

On Fri, Nov 10, 2023, 12:28 PM Svetak Sundhar via dev 
wrote:

> +1 Non Binding -- tested Python SDK batch.
>
>
> Svetak Sundhar
>
>   Data Engineer
> svetaksund...@google.com
>
>
>
> On Fri, Nov 10, 2023 at 2:58 PM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> > Note: the release guide
>> 
>>  and blog post
>> 
>>  say
>> the RC image has a tag "${RELEASE_VERSION}_rc{RC_NUM}", whereas the actual
>> tags on Docker Hub are mostly "${RELEASE_VERSION}rc{RC_NUM}" without the
>> "_" since 2.40.0. If this is the new standard we may want to update all
>> places where this is stated?
>>
>> Yep, we should update! If you put up a PR I'm happy to approve :)
>> otherwise I can loop it into my post release docs update.
>>
>> Thanks,
>> Danny
>>
>> On Fri, Nov 10, 2023 at 2:00 PM Johanna Öjeling via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Tested the Go SDK on Dataflow with own use cases.
>>>
>>> Note: the release guide
>>> 
>>>  and blog post
>>> 
>>>  say
>>> the RC image has a tag "${RELEASE_VERSION}_rc{RC_NUM}", whereas the actual
>>> tags on Docker Hub are mostly "${RELEASE_VERSION}rc{RC_NUM}" without the
>>> "_" since 2.40.0. If this is the new standard we may want to update all
>>> places where this is stated?
>>>
>>> Johanna
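A tiny sketch of the two RC tag formats discussed above. The function name is hypothetical; the format strings come straight from the note: the documented one uses "_rc", the observed Docker Hub tags use "rc".

```python
def rc_image_tag(release_version, rc_num, documented_style=False):
    """Build an RC image tag.

    documented_style=True follows "${RELEASE_VERSION}_rc{RC_NUM}" (as in the
    release guide); the default follows the "${RELEASE_VERSION}rc{RC_NUM}"
    form reported on Docker Hub since 2.40.0.
    """
    separator = "_rc" if documented_style else "rc"
    return f"{release_version}{separator}{rc_num}"


print(rc_image_tag("2.52.0", 3))                         # 2.52.0rc3
print(rc_image_tag("2.52.0", 3, documented_style=True))  # 2.52.0_rc3
```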
>>>
>>> On Fri, Nov 10, 2023 at 5:56 PM Robert Bradshaw via dev <
>>> dev@beam.apache.org> wrote:
>>>
 +1 (binding)

 Artifacts and signatures look good, validated one of the Python wheels
 in a fresh install.

 On Fri, Nov 10, 2023 at 7:23 AM Alexey Romanenko
  wrote:
 >
 > +1 (binding)
 >
 > Java SDK with Spark runner
 >
 > —
 > Alexey
 >
 > On 9 Nov 2023, at 16:44, Ritesh Ghorse via dev 
 wrote:
 >
 > +1 (non-binding)
 >
 > Validated Python SDK quickstart batch and streaming.
 >
 > Thanks!
 >
 > On Thu, Nov 9, 2023 at 9:25 AM Jan Lukavský  wrote:
 >>
 >> +1 (binding)
 >>
 >> Validated Java SDK with Flink runner on own use cases.
 >>
 >>  Jan
 >>
 >> On 11/9/23 03:31, Danny McCormick via dev wrote:
 >>
 >> Hi everyone,
 >> Please review and vote on the release candidate #3 for the version
 2.52.0, as follows:
 >> [ ] +1, Approve the release
 >> [ ] -1, Do not approve the release (please provide specific comments)
 >>
 >>
 >> Reviewers are encouraged to test their own use cases with the
 release candidate, and vote +1 if no issues are found. Only PMC member
 votes will count towards the final vote, but votes from all community
 members are encouraged and helpful for finding regressions; you can either
 test your own use cases or use cases from the validation sheet [10].
 >>
 >> The complete staging area is available for your review, which
 includes:
 >>
 >> GitHub Release notes [1]
 >> the official Apache source release to be deployed to dist.apache.org
 [2], which is signed with the key with fingerprint D20316F712213422 [3]
 >> all artifacts to be deployed to the Maven Central Repository [4]
 >> source code tag "v2.52.0-RC3" [5]
 >> website pull request listing the release [6], the blog post [6], and
 publishing the API reference manual [7]
 >> Python artifacts are deployed along with the source release to the
 dist.apache.org [2] and PyPI[8].
 >> Go artifacts and documentation are available at pkg.go.dev [9]
 >> Validation sheet with a tab for 2.52.0 release to help with
 validation [10]
 >> Docker images published to Docker Hub [11]
 >> PR to run tests against release branch [12]
 >>
 >>
 >> The vote will be open for at least 72 hours. It is adopted by
 majority approval, with at least 3 PMC affirmative votes.
 >>
 >> For guidelines on how to try the release in your projects, check out
 our blog post at https://beam.apache.org/blog/validate-beam-release/.
 >>
 >> Thanks,
 >> Danny
 >>
 >> [1] https://github.com/apache/beam/milestone/16
 >> [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
 >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
 >> [4]
 https://repository.apache.org/content/repositories/orgapachebeam-1361/
 >> [5] https://github.com/apache/beam/tree/v2.52.0-RC3
 >> [6] https://github.com/apache/beam/pull/29331
 >> [7] https://github.com/apache/beam-site/pull/653
 >> [8] 

Re: [Discuss] Idea to increase RC voting participation

2023-10-24 Thread Chamikara Jayalath via dev
+1 for going by the commits since this is what matters at the end of the
day. Also, many issues may not get tagged correctly for a given release due
to either the contributor not tagging the issue or due to commits for the
issue spanning multiple Beam releases.

For example,

For all commits in a given release RC:
  * If we find a GitHub issue for the commit: add a notice to the GitHub
    issue.
  * Else: add the notice to a generic issue for the release, including tags
    for the commit ID, the PR author, and the committer who merged the PR.
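The routing rule above can be sketched as a small function. The Commit record, its fields, and the sample data are hypothetical; fetching commits and posting comments are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Commit:
    sha: str
    pr_author: str
    merged_by: str
    issue: Optional[int] = None  # linked GitHub issue number, if one was found


def route_notices(commits: List[Commit]) -> Tuple[Dict[int, List[str]], List[str]]:
    """Split commits into per-issue notices and lines for a generic release issue."""
    per_issue: Dict[int, List[str]] = {}
    generic: List[str] = []
    for commit in commits:
        if commit.issue is not None:
            per_issue.setdefault(commit.issue, []).append(commit.sha)
        else:
            generic.append(
                f"{commit.sha}: PR by @{commit.pr_author}, merged by @{commit.merged_by}"
            )
    return per_issue, generic


commits = [
    Commit("a1b2c3d", "alice", "bob", issue=1234),  # hypothetical data
    Commit("e4f5a6b", "carol", "bob"),
]
per_issue, generic = route_notices(commits)
print(per_issue)  # {1234: ['a1b2c3d']}
print(generic)    # ['e4f5a6b: PR by @carol, merged by @bob']
```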

Thanks,
Cham




On Mon, Oct 23, 2023 at 11:49 AM Danny McCormick via dev <
dev@beam.apache.org> wrote:

> I'd probably vote to include both the issue filer and the contributor. It
> is pretty equally straightforward - one way to do this would be using all
> issues related to that release's milestone and extracting the issue author
> and the issue closer.
>
> This does leave out the (unfortunately sizable) set of contributions that
> don't have an associated issue; if we're worried about that, we could
> always fall back to anyone with a commit in the last release who doesn't
> have an associated issue (aka what I thought we were initially proposing
> and what I think Airflow does today).
>
> I'm pretty much +1 on any sort of automation here, and it certainly can
> come in stages :)
>
> On Mon, Oct 23, 2023 at 1:50 PM Johanna Öjeling via dev <
> dev@beam.apache.org> wrote:
>
>> Yes that's a good point to include also those who created the issue.
>>
>> On Mon, Oct 23, 2023, 19:18 Robert Bradshaw via dev 
>> wrote:
>>
>>> On Mon, Oct 23, 2023 at 7:26 AM Danny McCormick via dev <
>>> dev@beam.apache.org> wrote:
>>>
 So to summarize, I think there's broad consensus (or at least lazy
 consensus) around the following:

 - (1) Updating our release email/guidelines to be more specific about
 what we mean by release validation/how to be helpful during this process.
 This includes both encouraging validation within each user's own code base
 and encouraging people to document/share their process of validation and
 link it in the release spreadsheet.
 - (2) Doing something like what Airflow does (#29424
 ) and creating an
 issue asking people who have contributed to the current release to help
 validate their changes.

 I'm also +1 on doing both of these. The first bit (updating our
 guidelines) is relatively easy - it should just require updating
 https://github.com/apache/beam/blob/master/contributor-docs/release-guide.md#vote-and-validate-the-release-candidate
 .

 I took a look at the second piece (copying what Airflow does) to see if
 we could just copy their automation, but it looks like it's tied to
 airflow breeze
 
 (their repo-specific automation tooling), so we'd probably need to build
 the automation ourselves. It shouldn't be terrible, basically we'd want a
 GitHub Action that compares the current release tag with the last release
 tag, grabs all the commits in between, parses them to get the author, and
 creates an issue with that data, but it does represent more effort than
 just updating a markdown file. There might even be an existing Action that
 can help with this, I haven't looked too hard.
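The "grab all the commits in between and get the author" step could start from the output of something like git log --format='%H%x09%an' v2.52.0..v2.53.0, which roughly lists commits reachable from the new tag but not the old one, one "<sha><TAB><author name>" per line. A minimal parsing sketch under that assumption (mapping author names to GitHub handles would be a separate step):

```python
def authors_from_git_log(log_text):
    """Collect distinct author names from "<sha>\t<author>" lines."""
    authors = set()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _sha, author = line.split("\t", 1)
        authors.add(author.strip())
    return sorted(authors)


sample = "a1b2c3d\tAlice\ne4f5a6b\tBob\nc7d8e9f\tAlice\n"
print(authors_from_git_log(sample))  # ['Alice', 'Bob']
```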

>>>
>>> I was thinking along the lines of a script that would scrape the issues
>>> resolved in a given release and add a comment to them noting that the
>>> change is in release N and encouraging (with clear instructions) how this
>>> can be validated. Creating a "validate this release" issue with all
>>> "contributing" participants could be an interesting way to do this as well.
>>> (I think it'd be valuable to get those who filed the issue, not just those
>>> who fixed it, to validate.)
>>>
>>>
 As our next release manager, I'm happy to review PRs for either of
 these if anyone wants to volunteer to help out. If not, I'm happy to update
 the guidelines, but I probably won't have time to add the commit inspection
 tooling (I'm planning on throwing any extra time towards continuing to
 automate release candidate creation which is currently a more impactful
 problem IMO). I would very much like it if both of these things happened
 though :)

 Thanks,
 Danny

 On Mon, Oct 23, 2023 at 10:05 AM XQ Hu  wrote:

> +1. This is a great idea to try. @Danny McCormick
>  FYI as our next release manager.
>
> On Wed, Oct 18, 2023 at 2:30 PM Johanna Öjeling via dev <
> dev@beam.apache.org> wrote:
>
>> When I have contributed to Apache Airflow, they have tagged all
>> contributors concerned in a GitHub issue when the RC is available and 
>> asked
>> us to validate it. Example: #29424
>> 

Re: [ANNOUNCE] New Committer: Byron Ellis

2023-10-16 Thread Chamikara Jayalath via dev
Congrats Byron!

On Mon, Oct 16, 2023 at 9:32 AM Kenneth Knowles  wrote:

> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming a new
> committer: Byron Ellis (b...@apache.org).
>
> Byron has been with Beam for over a year now. You may all know him as the
> guy who just decided to write a Swift SDK :-). In addition to that big
> contribution Byron has also fixed plenty of bugs, prototyped DBT-style
> pipeline authoring, and participated in our collective decision-making
> process.
>
> Considering his contributions to the project over this timeframe, the
> Beam PMC trusts Byron with the responsibilities of a Beam committer. [1]
>
> Thank you Byron! And we are looking forward to seeing more of your contributions!
>
> Kenn, on behalf of the Apache Beam PMC
>
> [1]
>
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>


Re: [ANNOUNCE] New Committer: Sam Whittle

2023-10-16 Thread Chamikara Jayalath via dev
Congrats Sam!

On Mon, Oct 16, 2023 at 9:32 AM Kenneth Knowles  wrote:

> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming a new
> committer: Sam Whittle (scwhit...@apache.org).
>
> Sam has been contributing to Beam since 2016! In particular, he
> specializes in streaming and the Dataflow Java worker but his contributions
> expand naturally from there to the Java SDK, IOs, and even a bit of Python
> :-). Sam has contributed a ton of code over the years and is generous in
> code review and sharing his expertise.
>
> Considering his contributions to the project over this timeframe, the
> Beam PMC trusts Sam with the responsibilities of a Beam committer. [1]
>
> Thank you Sam! And we are looking forward to seeing more of your contributions!
>
> Kenn, on behalf of the Apache Beam PMC
>
> [1]
>
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>


Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-13 Thread Chamikara Jayalath via dev
On Thu, Oct 12, 2023 at 4:59 PM Robert Bradshaw  wrote:

> OK, so how about this for a concrete proposal:
>
> sink:
>   type: WriteToParquet
>   config:
>     path: "/beam/filesystem/{record.my_col}-{timestamp.year}{timestamp.month}{timestamp.day}"
>

This is an example, right? So basically the path can be parameterized
using the fields of the Beam schema of input elements ?


>     suffix: ".parquet"
>
> The eventual path would be . The suffix
> would be optional, and there could be a default for the specific file
> format. A file format could inspect a provided suffix like ".csv.gz" to
> infer compression as well.
>
> Note that this doesn't have any special indicators for being dynamic other
> than the {}'s. Also, my_col would be written as part of the data (but we
> could add an extra "elide" config parameter that takes a list of columns to
> exclude if desired).
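One way to read the {record.my_col} and {timestamp.year} placeholders in the proposal is as Python str.format attribute lookups. A minimal sketch, assuming the element is exposed as an object with attribute access and its event time as a datetime (both illustrative choices, not a Beam API):

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def expand_path(template, record, timestamp):
    """Expand {record.<field>} and {timestamp.<attr>} placeholders via str.format."""
    return template.format(record=record, timestamp=timestamp)


template = (
    "/beam/filesystem/{record.my_col}-"
    "{timestamp.year}{timestamp.month:02d}{timestamp.day:02d}"
)
row = SimpleNamespace(my_col="us")
ts = datetime(2023, 10, 5, tzinfo=timezone.utc)
print(expand_path(template, row, ts))  # /beam/filesystem/us-20231005
```

The :02d specs matter. Without them, 2023-01-15 and 2023-11-05 would both render as "2023115", so the timestamp part of such a template probably needs zero padding (or explicit separators) by default.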
>

I think this is already dynamic given that the path can be
parameterized using the input. For example,

path: "/beam/filesystem/{record.destination_col}/-{timestamp.year}{timestamp.month}{timestamp.day}"


> We could call this "prefix" rather than path. (Path is symmetric with
> reading, but prefix is a bit more direct.) Anyone want to voice
> their opinion here?
>

I'm fine with either.

Thanks,
Cham


>
>
>
>
> On Wed, Oct 11, 2023 at 9:01 AM Chamikara Jayalath 
> wrote:
>
>>
>>
>> On Wed, Oct 11, 2023 at 6:55 AM Kenneth Knowles  wrote:
>>
>>> So, top-posting because the threading got to be a lot for me and I think
>>> it forked a bit too... I may even be restating something someone said, so
>>> apologies for that.
>>>
>>> Very very good point about *required* parameters where if you don't use
>>> them then you will end up with two writers writing to the same file. The
>>> easiest example to work with might be if you omitted SHARD_NUM so all
>>> shards end up clobbering the same file.
>>>
>>> I think there's a unifying perspective between prefix/suffix and the
>>> need to be sure to include critical sharding variables. Essentially it is
>>> my point about it being a "big data fileset". It is perhaps unrealistic but
>>> ideally the user names the big data fileset and then the mandatory other
>>> pieces are added outside of their control. For example if I name my big
>>> data fileset "foo" then that implicitly means that "foo" consists of all
>>> the files named "foo/${SHARD_NUM}-of-${SHARD_TOTAL}". And yes now that I
>>> re-read I see you basically said the same thing. In some cases the required
>>> fields will include $WINDOW, $KEY, and $PANE_INDEX, yes? Even though the
>>> user can think of it as a textual template, if we can use a library that
>>> yields an abstract syntax tree for the expression we can easily check these
>>> requirements in a robust way - or we could do it in a non-robust way by
>>> string-scraping ourselves.
>>>
>>
>> Yes. I think we are talking about the same thing. Users should not have
>> full control over the filename since that could lead to conflicts and data
>> loss when data is being written in parallel from multiple workers. Users
>> can refer to the big data fileset being written using the glob "/**".
>> In addition users have control over the filename  and 
>> (file extension, for example) which can be useful for some downstream
>> use-cases. The rest of the filename will be filled in by the SDK (window, pane
>> etc.) to make sure that the files written by different workers do not
>> conflict.
>>
>> Thanks,
>> Cham
>>
>>
>>>
>>> We actually are very close to this in FileIO. I think the interpretation
>>> of "prefix" is that it is the filename "foo" as above, and "suffix" is
>>> really something like ".txt" that you stick on the end of everything for
>>> whatever reason.
>>>
>>> Kenn
>>>
>>> On Tue, Oct 10, 2023 at 7:12 PM Robert Bradshaw via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> On Tue, Oct 10, 2023 at 4:05 PM Chamikara Jayalath <
>>>> chamik...@google.com> wrote:
>>>>
>>>>>
>>>>> On Tue, Oct 10, 2023 at 4:02 PM Robert Bradshaw 
>>>>> wrote:
>>>>>
>>>>>> On Tue, Oct 10, 2023 at 3:53 PM Chamikara Jayalath <
>>>>>> chamik...@google.com> wrote:
>>>>>>

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-11 Thread Chamikara Jayalath via dev
On Wed, Oct 11, 2023 at 6:55 AM Kenneth Knowles  wrote:

> So, top-posting because the threading got to be a lot for me and I think
> it forked a bit too... I may even be restating something someone said, so
> apologies for that.
>
> Very very good point about *required* parameters where if you don't use
> them then you will end up with two writers writing to the same file. The
> easiest example to work with might be if you omitted SHARD_NUM so all
> shards end up clobbering the same file.
>
> I think there's a unifying perspective between prefix/suffix and the need
> to be sure to include critical sharding variables. Essentially it is my
> point about it being a "big data fileset". It is perhaps unrealistic but
> ideally the user names the big data fileset and then the mandatory other
> pieces are added outside of their control. For example if I name my big
> data fileset "foo" then that implicitly means that "foo" consists of all
> the files named "foo/${SHARD_NUM}-of-${SHARD_TOTAL}". And yes now that I
> re-read I see you basically said the same thing. In some cases the required
> fields will include $WINDOW, $KEY, and $PANE_INDEX, yes? Even though the
> user can think of it as a textual template, if we can use a library that
> yields an abstract syntax tree for the expression we can easily check these
> requirements in a robust way - or we could do it in a non-robust way by
> string-scraping ourselves.
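As a sketch of the "parse rather than string-scrape" idea: Python's string.Formatter().parse() tokenizes a {}-style template, so required placeholders can be checked against parsed field names. The {} syntax (instead of ${...}) and the field names shard_num and shard_total are assumptions for illustration.

```python
from string import Formatter

REQUIRED_FIELDS = {"shard_num", "shard_total"}  # hypothetical placeholder names


def missing_required_fields(template, required=REQUIRED_FIELDS):
    """Return the required placeholders that the template never references."""
    used = set()
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        if field_name:
            # "record.col" or "items[0]" -> keep only the top-level name.
            used.add(field_name.split(".")[0].split("[")[0])
    return required - used


print(missing_required_fields("foo/{shard_num}-of-{shard_total}"))  # set()
print(missing_required_fields("foo/{window}-{shard_total}"))        # {'shard_num'}
```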
>

Yes. I think we are talking about the same thing. Users should not have
full control over the filename since that could lead to conflicts and data
loss when data is being written in parallel from multiple workers. Users
can refer to the big data fileset being written using the glob "/**".
In addition users have control over the filename  and 
(file extension, for example) which can be useful for some downstream
use-cases. The rest of the filename will be filled in by the SDK (window, pane
etc.) to make sure that the files written by different workers do not
conflict.
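A minimal sketch of that split: the user controls the prefix and suffix, and the sink fills in the middle. The exact layout (window label, "pane-N", zero-padded "NNNNN-of-NNNNN") is a hypothetical one, only loosely modeled on FileIO-style default naming.

```python
def shard_filename(prefix, suffix, window, pane_index, shard_num, shard_total):
    """Join a user-controlled prefix/suffix with sink-controlled middle parts.

    window, pane_index, shard_num and shard_total come from the sink, so
    parallel writers never target the same file.
    """
    return (
        f"{prefix}-{window}-pane-{pane_index}-"
        f"{shard_num:05d}-of-{shard_total:05d}{suffix}"
    )


names = [
    shard_filename("/beam/dest/out", ".parquet", "2023-10-11", 0, i, 3)
    for i in range(3)
]
print(names[0])  # /beam/dest/out-2023-10-11-pane-0-00000-of-00003.parquet
```

Because every file starts with the user's prefix and ends with the suffix, a glob on those parts still matches the whole fileset.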

Thanks,
Cham


>
> We actually are very close to this in FileIO. I think the interpretation
> of "prefix" is that it is the filename "foo" as above, and "suffix" is
> really something like ".txt" that you stick on the end of everything for
> whatever reason.
>
> Kenn
>
> On Tue, Oct 10, 2023 at 7:12 PM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> On Tue, Oct 10, 2023 at 4:05 PM Chamikara Jayalath 
>> wrote:
>>
>>>
>>> On Tue, Oct 10, 2023 at 4:02 PM Robert Bradshaw 
>>> wrote:
>>>
>>>> On Tue, Oct 10, 2023 at 3:53 PM Chamikara Jayalath <
>>>> chamik...@google.com> wrote:
>>>>
>>>>>
>>>>> On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax  wrote:
>>>>>
>>>>>> I suspect some simple pattern templating would solve most use cases.
>>>>>> We probably would want to support timestamp formatting (e.g. $ $M $D)
>>>>>> as well.
>>>>>>
>>>>>> On Tue, Oct 10, 2023 at 3:35 PM Robert Bradshaw 
>>>>>> wrote:
>>>>>>
>>>>>>> On Mon, Oct 9, 2023 at 3:09 PM Chamikara Jayalath <
>>>>>>> chamik...@google.com> wrote:
>>>>>>>
>>>>>>>> I would say:
>>>>>>>>
>>>>>>>> sink:
>>>>>>>>   type: WriteToParquet
>>>>>>>>   config:
>>>>>>>>     path: /beam/filesystem/dest
>>>>>>>>     prefix:
>>>>>>>>     suffix:
>>>>>>>>
>>>>>>>> Underlying SDK will add the middle part of the file names to make
>>>>>>>> sure that files generated by various bundles/windows/shards do not 
>>>>>>>> conflict.
>>>>>>>>
>>>>>>>
>>>>>>> What's the relationship between path and prefix? Is path the
>>>>>>> directory part of the full path, or does prefix precede it?
>>>>>>>
>>>>>>
>>>>> prefix would be the first part of the file name so each shard will be
>>>>> named.
>>>>> /--
>>>>>
>>>>> This is similar to what we do in existing SDKs. For example, Java
>>>>> FileIO,
>>>>>
>>>>>
>>>>> https://github.com/apache/beam/blob/65eaf45026e9eeb61a9e05412488e5858faec6de/sdks/java/core/src/main/java/org/ap

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Chamikara Jayalath via dev
On Tue, Oct 10, 2023 at 4:02 PM Robert Bradshaw  wrote:

> On Tue, Oct 10, 2023 at 3:53 PM Chamikara Jayalath 
> wrote:
>
>>
>> On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax  wrote:
>>
>>> I suspect some simple pattern templating would solve most use cases. We
>>> probably would want to support timestamp formatting (e.g. $ $M $D) as
>>> well.
>>>
>>> On Tue, Oct 10, 2023 at 3:35 PM Robert Bradshaw 
>>> wrote:
>>>
>>>> On Mon, Oct 9, 2023 at 3:09 PM Chamikara Jayalath 
>>>> wrote:
>>>>
>>>>> I would say:
>>>>>
>>>>> sink:
>>>>>   type: WriteToParquet
>>>>>   config:
>>>>>     path: /beam/filesystem/dest
>>>>>     prefix: 
>>>>>     suffix: 
>>>>>
>>>>> Underlying SDK will add the middle part of the file names to make sure
>>>>> that files generated by various bundles/windows/shards do not conflict.
>>>>>
>>>>
>>>> What's the relationship between path and prefix? Is path the
>>>> directory part of the full path, or does prefix precede it?
>>>>
>>>
>> prefix would be the first part of the file name so each shard will be
>> named.
>> /--
>>
>> This is similar to what we do in existing SDKs. For example, Java FileIO,
>>
>>
>> https://github.com/apache/beam/blob/65eaf45026e9eeb61a9e05412488e5858faec6de/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L187
>>
>
> Yeah, although there's no distinction between path and prefix.
>

Ah, for FileIO, path comes from the "to" call.

https://github.com/apache/beam/blob/65eaf45026e9eeb61a9e05412488e5858faec6de/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L1125


>
>
>>>>
>>>>> This will satisfy the vast majority of use-cases I believe. Fully
>>>>> customizing the file pattern sounds like a more advanced use case that can
>>>>> be left for "real" SDKs.
>>>>>
>>>>
>>>> Yea, we don't have to do everything.
>>>>
>>>>
>>>>> For dynamic destinations, I think just making the "path" component
>>>>> support  a lambda that is parameterized by the input should be adequate
>>>>> since this allows customers to direct files written to different
>>>>> destination directories.
>>>>>
>>>>> sink:
>>>>>   type: WriteToParquet
>>>>>   config:
>>>>>     path: 
>>>>>     prefix: 
>>>>>     suffix: 
>>>>>
>>>>> I'm not sure what would be the best way to specify a lambda here
>>>>> though. Maybe a regex or the name of a Python callable ?
>>>>>
>>>>
>>>> I'd rather not require Python for a pure Java pipeline, but some kind
>>>> of a pattern template may be sufficient here.
>>>>
>>>>
>>>>> On Mon, Oct 9, 2023 at 2:06 PM Robert Bradshaw via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> On Mon, Oct 9, 2023 at 1:49 PM Reuven Lax  wrote:
>>>>>>
>>>>>>> Just FYI - the reason why names (including prefixes) in
>>>>>>> DynamicDestinations were parameterized via a lambda instead of just having
>>>>>>> the user add it via MapElements is performance. We discussed something
>>>>>>> along the lines of what you are suggesting (essentially having the user
>>>>>>> create a KV where the key contained the dynamic information). The problem
>>>>>>> was that the size of the generated filepath was often much larger
>>>>>>> (sometimes by 2 OOM) than the information in the record, and there was a
>>>>>>> desire to avoid record blowup. e.g. the record might contain a single
>>>>>>> integer userid, and the filepath prefix would then be
>>>>>>> /long/path/to/output/users/. This was especially bad in cases where the
>>>>>>> data had to be shuffled, and the existing dynamic destinations method
>>>>>>> allowed extracting the filepath only _after_  the shuffle.
>>>>>>>
>>>>>>
>>>>>> That is a considera

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Chamikara Jayalath via dev
On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax  wrote:

> I suspect some simple pattern templating would solve most use cases. We
> probably would want to support timestamp formatting (e.g. $ $M $D) as
> well.
>
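A minimal sketch of the kind of pattern templating Reuven suggests, using Python's str.format against the window start time. The placeholder names ({year}, {month}, {day}, {hour}, {prefix}) and the helper render_path are hypothetical; Beam YAML does not define this syntax, and the actual token names in the thread are not preserved here.

```python
from datetime import datetime, timezone


def render_path(template, window_start, **fields):
    """Expand a path template using fields of the window start time.

    The placeholder names are hypothetical, chosen only for illustration.
    """
    return template.format(
        year=window_start.year,
        month=window_start.month,
        day=window_start.day,
        hour=window_start.hour,
        **fields,
    )


start = datetime(2023, 10, 10, 15, 30, tzinfo=timezone.utc)
print(render_path("/beam/out/{year}/{month:02d}/{day:02d}/{prefix}", start, prefix="events"))
# /beam/out/2023/10/10/events
```

Format specs such as {month:02d} give zero-padded directory names without any extra templating language.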
> On Tue, Oct 10, 2023 at 3:35 PM Robert Bradshaw 
> wrote:
>
>> On Mon, Oct 9, 2023 at 3:09 PM Chamikara Jayalath 
>> wrote:
>>
>>> I would say:
>>>
>>> sink:
>>>   type: WriteToParquet
>>>   config:
>>>     path: /beam/filesystem/dest
>>>     prefix: 
>>>     suffix: 
>>>
>>> Underlying SDK will add the middle part of the file names to make sure
>>> that files generated by various bundles/windows/shards do not conflict.
>>>
>>
>> What's the relationship between path and prefix? Is path the
>> directory part of the full path, or does prefix precede it?
>>
>
prefix would be the first part of the file name so each shard will be named.
/--

This is similar to what we do in existing SDKs. For example, Java FileIO,

https://github.com/apache/beam/blob/65eaf45026e9eeb61a9e05412488e5858faec6de/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L187


>
>>
>>> This will satisfy the vast majority of use-cases I believe. Fully
>>> customizing the file pattern sounds like a more advanced use case that can
>>> be left for "real" SDKs.
>>>
>>
>> Yea, we don't have to do everything.
>>
>>
>>> For dynamic destinations, I think just making the "path" component
>>> support  a lambda that is parameterized by the input should be adequate
>>> since this allows customers to direct files written to different
>>> destination directories.
>>>
>>> sink:
>>>   type: WriteToParquet
>>>   config:
>>>     path: 
>>>     prefix: 
>>>     suffix: 
>>>
>>> I'm not sure what would be the best way to specify a lambda here though.
>>> Maybe a regex or the name of a Python callable ?
>>>
>>
>> I'd rather not require Python for a pure Java pipeline, but some kind of
>> a pattern template may be sufficient here.
>>
>>
>>> On Mon, Oct 9, 2023 at 2:06 PM Robert Bradshaw via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> On Mon, Oct 9, 2023 at 1:49 PM Reuven Lax  wrote:
>>>>
>>>>> Just FYI - the reason why names (including prefixes) in
>>>>> DynamicDestinations were parameterized via a lambda instead of just having
>>>>> the user add it via MapElements is performance. We discussed something
>>>>> along the lines of what you are suggesting (essentially having the user
>>>>> create a KV where the key contained the dynamic information). The problem
>>>>> was that the size of the generated filepath was often much larger
>>>>> (sometimes by 2 OOM) than the information in the record, and there was a
>>>>> desire to avoid record blowup. e.g. the record might contain a single
>>>>> integer userid, and the filepath prefix would then be
>>>>> /long/path/to/output/users/. This was especially bad in cases where the
>>>>> data had to be shuffled, and the existing dynamic destinations method
>>>>> allowed extracting the filepath only _after_  the shuffle.
>>>>>
>>>>
>>>> That is a consideration I hadn't thought much of, thanks for
>>>> bringing this up.
>>>>
>>>>
>>>>> Now there may not be any good way to keep this benefit in a
>>>>> declarative approach such as YAML (or at least a good easy way - we could
>>>>> always allow the user to pass in a SQL expression to extract the filename
>>>>> from the record!), but we should keep in mind that this might mean that
>>>>> YAML-generated pipelines will be less efficient for certain use cases.
>>>>>
>>>>
>>>> Yep, it's not as straightforward to do in a declarative way. I would
>>>> like to avoid mixing UDFs (with their associated languages and execution
>>>> environments) if possible. Though I'd like the performance of a
>>>> "straightforward" YAML pipeline to be that which one can get writing
>>>> straight-line Java (and possibly better, if we can leverage the structure
>>>> of schemas everywhere) this is not an absolute requirement for all
>>>> features.
>>>>
>>>> I wonder if 

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-09 Thread Chamikara Jayalath via dev
I would say:

sink:
  type: WriteToParquet
  config:
    path: /beam/filesystem/dest
    prefix: 
    suffix: 

Underlying SDK will add the middle part of the file names to make sure that
files generated by various bundles/windows/shards do not conflict.

This will satisfy the vast majority of use-cases I believe. Fully
customizing the file pattern sounds like a more advanced use case that can
be left for "real" SDKs.

For dynamic destinations, I think just making the "path" component support
a lambda that is parameterized by the input should be adequate since this
allows customers to direct files written to different destination
directories.

sink:
  type: WriteToParquet
  config:
    path: 
    prefix: 
    suffix: 

I'm not sure what would be the best way to specify a lambda here though.
Maybe a regex or the name of a Python callable ?
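A minimal sketch of the routing step, assuming (hypothetically) that the "path" parameter could name a user callable that maps each record to a directory. destination_for and group_by_destination are illustrative names, not an existing Beam YAML or SDK API, and the thread had not settled on a callable versus a regex or template.

```python
from collections import defaultdict


def destination_for(record):
    """Hypothetical user callable: choose an output directory per record."""
    return f"/beam/out/country={record['country']}"


def group_by_destination(records, path_fn):
    """Bucket records by destination directory.

    A real sink would then write each bucket using the shared prefix/suffix
    naming; only the routing step is shown here.
    """
    buckets = defaultdict(list)
    for record in records:
        buckets[path_fn(record)].append(record)
    return dict(buckets)


records = [
    {"country": "us", "v": 1},
    {"country": "lk", "v": 2},
    {"country": "us", "v": 3},
]
routed = group_by_destination(records, destination_for)
for directory in sorted(routed):
    print(directory, [r["v"] for r in routed[directory]])
```

Within each directory the usual prefix/suffix naming would still apply, so files from different bundles would not collide.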

Thanks,
Cham


On Mon, Oct 9, 2023 at 2:06 PM Robert Bradshaw via dev 
wrote:

> On Mon, Oct 9, 2023 at 1:49 PM Reuven Lax  wrote:
>
>> Just FYI - the reason why names (including prefixes) in
>> DynamicDestinations were parameterized via a lambda instead of just having
>> the user add it via MapElements is performance. We discussed something
>> along the lines of what you are suggesting (essentially having the user
>> create a KV where the key contained the dynamic information). The problem
>> was that the size of the generated filepath was often much larger
>> (sometimes by 2 OOM) than the information in the record, and there was a
>> desire to avoid record blowup. e.g. the record might contain a single
>> integer userid, and the filepath prefix would then be
>> /long/path/to/output/users/. This was especially bad in cases where the
>> data had to be shuffled, and the existing dynamic destinations method
>> allowed extracting the filepath only _after_  the shuffle.
>>
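A back-of-the-envelope sketch of the record blowup Reuven describes. It assumes, hypothetically, that the userid would otherwise travel as an 8-byte integer, that the full path is attached to every record before a shuffle, and that the path follows the illustrative template below. Real paths (bucket names, dates, longer prefixes) are usually longer, which is how the gap can grow toward the orders of magnitude mentioned above.

```python
import struct

RECORDS = 1_000_000  # hypothetical number of records shuffled


def bytes_per_record(user_id, path_template):
    """Return (key-only bytes, full-path bytes) for one record's destination."""
    key_only = len(struct.pack(">q", user_id))  # userid encoded as an int64
    full_path = len(path_template.format(user_id).encode("utf-8"))
    return key_only, full_path


key_only, full_path = bytes_per_record(42, "/long/path/to/output/users/{}")
print(key_only, full_path)  # 8 29
print((full_path - key_only) * RECORDS)  # 21000000 extra bytes shuffled
```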
>
> That is a consideration I hadn't thought much of, thanks for bringing this
> up.
>
>
>> Now there may not be any good way to keep this benefit in a
>> declarative approach such as YAML (or at least a good easy way - we could
>> always allow the user to pass in a SQL expression to extract the filename
>> from the record!), but we should keep in mind that this might mean that
>> YAML-generated pipelines will be less efficient for certain use cases.
>>
>
> Yep, it's not as straightforward to do in a declarative way. I would like
> to avoid mixing UDFs (with their associated languages and execution
> environments) if possible. Though I'd like the performance of a
> "straightforward" YAML pipeline to be that which one can get writing
> straight-line Java (and possibly better, if we can leverage the structure
> of schemas everywhere) this is not an absolute requirement for all
> features.
>
> I wonder if separating out a constant prefix vs. the dynamic stuff could
> be sufficient to mitigate the blow-up of pre-computing this in most cases
> (especially in the context of a larger pipeline). Alternatively, rather
> than just a sharding pattern, one could have a full filepattern that
> includes format parameters for dynamically computed bits as well as the
> shard number, windowing info, etc. (There are pros and cons to this.)
>
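A minimal sketch of separating the constant prefix from the dynamic part. Only the small dynamic key travels with each record through the (simulated) shuffle, and the full path is materialized once per output file afterwards. CONSTANT_PREFIX, the part-NNNNN-of-NNNNN layout, and all helper names are hypothetical; plain lists and dicts stand in for PCollections and GroupByKey.

```python
from collections import defaultdict

# Hypothetical constant part of the destination, fixed when the sink is built.
CONSTANT_PREFIX = "/long/path/to/output/users/"


def key_by_dynamic_part(records):
    """Before the shuffle: key each record by its small dynamic value only."""
    return [(record["user_id"], record) for record in records]


def group_by_key(keyed):
    """Stand-in for the shuffle (GroupByKey) step."""
    groups = defaultdict(list)
    for key, record in keyed:
        groups[key].append(record)
    return groups


def file_path(user_id, shard, num_shards, suffix=".json"):
    """After the shuffle: build the full path once per file, not per record."""
    return f"{CONSTANT_PREFIX}{user_id}/part-{shard:05d}-of-{num_shards:05d}{suffix}"


records = [{"user_id": 7, "x": 1}, {"user_id": 9, "x": 2}, {"user_id": 7, "x": 3}]
groups = group_by_key(key_by_dynamic_part(records))
paths = {user_id: file_path(user_id, 0, 1) for user_id in groups}
print(paths[7])  # /long/path/to/output/users/7/part-00000-of-00001.json
```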
>
>> On Mon, Oct 9, 2023 at 12:37 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Currently the various file writing configurations take a single
>>> parameter, path, which indicates where the (sharded) output should be
>>> placed. In other words, one can write something like
>>>
>>>   pipeline:
>>>     ...
>>>     sink:
>>>       type: WriteToParquet
>>>       config:
>>>         path: /beam/filesystem/dest
>>>
>>> and one gets files like "/beam/filesystem/dest-X-of-N"
>>>
>>> Of course, in practice file writing is often much more complicated than
>>> this (especially when it comes to Streaming). For reference, I've included
>>> links to our existing offerings in the various SDKs below. I'd like to
>>> start a discussion about what else should go in the "config" parameter and
>>> how it should be expressed in YAML.
>>>
>>> The primary concern is around naming. This can generally be split into
>>> (1) the prefix, which must be provided by the users, (2) the sharding
>>> information, which includes both shard counts (e.g. the -X-of-N suffix) and
>>> windowing information (for streaming pipelines), whose formatting we may
>>> want to let the user customize, and (3) a suffix like .json or .avro that
>>> is useful for both humans and tooling and can often be inferred but should
>>> allow customization as well.
>>>
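A minimal sketch of point (3): infer a suffix from the file format, but let an explicitly configured suffix (including an empty one) win. The DEFAULT_SUFFIXES table and resolve_suffix are hypothetical and not taken from Beam.

```python
# Hypothetical default suffixes per format; not taken from Beam.
DEFAULT_SUFFIXES = {"parquet": ".parquet", "json": ".json", "avro": ".avro", "text": ".txt"}


def resolve_suffix(file_format, suffix=None):
    """Use the user's suffix if given (even ""), else infer one from the format."""
    if suffix is not None:
        return suffix
    return DEFAULT_SUFFIXES.get(file_format, "")


print(resolve_suffix("avro"))              # .avro
print(resolve_suffix("avro", ".avro.gz"))  # .avro.gz
print(repr(resolve_suffix("csv")))         # ''
```

Checking "is not None" rather than truthiness is what lets a user deliberately ask for no suffix at all.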
>>> An interesting case is that of dynamic destinations, where the prefix
>>> (or other parameters) may be functions of the records
>>> themselves. (I am excluding the case where the format itself is
>>> variable--such cases are probably better handled by explicitly partitioning
>>> the data and doing multiple writes, as this introduces significant
>>> complexities and the set of 

Re: CoderProviderRegistrar class not found

2023-10-09 Thread Chamikara Jayalath via dev
On Thu, Oct 5, 2023 at 2:05 PM L. C.  wrote:

> I'm getting a class-not-found error while running the word count example on
> Dataproc 2.1 with Beam 2.50.0. The class exists in the jar. Does
> anyone know how to resolve this?
>
> This is a list of dependency versions:
> 2.50.0
>
> v2-rev20230520-2.0.0
> 2.0.0
> 32.1.2-jre
> 2.1
> 2.14.1
> 2.10.10
> 4.13.1
> 2.4.1
> 26.22.0
> 3.7.0
> 1.6.0
> 3.0.2
> 3.1.0
> 3.7.7
> v1-rev20220904-2.0.0
> 1.7.30
> 3.2.2
> 2.10.2
> 3.0.0-M5
> 0.1
> beam-runners-flink-1.16
>
>
> I used this to build a shaded jar:
> $ mvn compile -Pspark-runner package
>
> Here's the stack trace:
>

Given that this raised NoClassDefFoundError (and
not ClassNotFoundException), it's possible that the class initialization
failed. Is there another exception before this one (maybe at the first
occurrence of NoClassDefFoundError)?
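One quick check on a shaded jar is whether the registrar interface and the providers listed in its service file are actually packaged. Java's ServiceLoader reads provider class names from META-INF/services files, one name per line, with '#' starting a comment. The sketch below (inspect_jar is a hypothetical helper) only checks presence in the jar. It cannot detect class-initialization failures of the kind suspected above, or duplicate or mismatched copies elsewhere on the runtime classpath.

```python
import io
import zipfile

# Provider-configuration file and interface class named in the stack trace.
SERVICE_FILE = "META-INF/services/org.apache.beam.sdk.coders.CoderProviderRegistrar"
INTERFACE_CLASS = "org/apache/beam/sdk/coders/CoderProviderRegistrar.class"


def inspect_jar(jar):
    """Check a jar (path or file-like object) for the registrar interface and
    for the provider classes its service file lists."""
    with zipfile.ZipFile(jar) as archive:
        entries = set(archive.namelist())
        providers = []
        if SERVICE_FILE in entries:
            for line in archive.read(SERVICE_FILE).decode("utf-8").splitlines():
                name = line.split("#", 1)[0].strip()  # '#' starts a comment
                if name:
                    providers.append(name)
    missing = [p for p in providers if p.replace(".", "/") + ".class" not in entries]
    return {
        "interface_present": INTERFACE_CLASS in entries,
        "providers": providers,
        "missing_providers": missing,
    }


# Demo on an in-memory jar; for a real build, pass the shaded jar's path instead.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr(SERVICE_FILE, "# registrars\ncom.example.MyRegistrar\n")
    demo.writestr(INTERFACE_CLASS, b"")
print(inspect_jar(buffer))
```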


> Waiting for job output...
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/beam/sdk/coders/CoderProviderRegistrar
> at java.base/java.lang.ClassLoader.defineClass1(Native Method)
> at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
> at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
> at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
> at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
> at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
> at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
> at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:576)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
> at java.base/java.lang.Class.forName0(Native Method)
> at java.base/java.lang.Class.forName(Class.java:398)
> at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.nextProviderClass(ServiceLoader.java:1210)
> at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1221)
> at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1265)
> at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1300)
> at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1385)
> at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Iterators.addAll(Iterators.java:366)
> at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Lists.newArrayList(Lists.java:146)
> at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Lists.newArrayList(Lists.java:132)
> at org.apache.beam.sdk.coders.CoderRegistry.(CoderRegistry.java:168)
> at org.apache.beam.sdk.Pipeline.getCoderRegistry(Pipeline.java:334)
> at org.apache.beam.sdk.values.PCollection.finishSpecifyingOutput(PCollection.java:94)
> at org.apache.beam.sdk.runners.TransformHierarchy.setOutput(TransformHierarchy.java:173)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:546)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:479)
> at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:44)
> at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:175)
> at org.apache.beam.sdk.io.Read$Bounded.expand(Read.java:150)
> at org.apache.beam.sdk.io.Read$Bounded.expand(Read.java:134)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:545)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:496)
> at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:56)
> at org.apache.beam.sdk.io.TextIO$Read.expand(TextIO.java:413)
> at org.apache.beam.sdk.io.TextIO$Read.expand(TextIO.java:275)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:545)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:496)
> at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:56)
> at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:190)
> at org.apache.beam.examples.WordCount.runWordCount(WordCount.java:201)
> at org.apache.beam.examples.WordCount.main(WordCount.java:213)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> 

Re: [ANNOUNCE] New PMC Member: Alex Van Boxel

2023-10-04 Thread Chamikara Jayalath
Congrats Alex!

On Wed, Oct 4, 2023 at 1:43 AM Jan Lukavský  wrote:

> Congrats Alex!
> On 10/4/23 10:29, Alexey Romanenko wrote:
>
> Congrats Alex, very well deserved!
>
> —
> Alexey
>
> On 4 Oct 2023, at 00:38, Austin Bennett 
>  wrote:
>
> Thanks for all you do, @Alex Van Boxel  !
>
> On Tue, Oct 3, 2023 at 12:50 PM Ahmed Abualsaud via dev <
> dev@beam.apache.org> wrote:
>
>> Congratulations!
>>
>> On Tue, Oct 3, 2023 at 3:48 PM Byron Ellis via dev 
>> wrote:
>>
>>> Congrats!
>>>
>>> On Tue, Oct 3, 2023 at 12:40 PM Danielle Syse via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Congratulations Alex!! Definitely well deserved!

 On Tue, Oct 3, 2023 at 2:57 PM Ahmet Altay via dev 
 wrote:

> Congratulations Alex! Well deserved!
>
> On Tue, Oct 3, 2023 at 11:54 AM Ritesh Ghorse via dev <
> dev@beam.apache.org> wrote:
>
>> Congratulations Alex!
>>
>> On Tue, Oct 3, 2023 at 2:54 PM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congrats Alex, this is well deserved!
>>>
>>> On Tue, Oct 3, 2023 at 2:50 PM Jack McCluskey via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Congrats, Alex!

 On Tue, Oct 3, 2023 at 2:49 PM XQ Hu via dev 
 wrote:

> Congratulations, Alex!
>
> On Tue, Oct 3, 2023 at 2:40 PM Kenneth Knowles 
> wrote:
>
>> Hi all,
>>
>> Please join me and the rest of the Beam PMC in welcoming Alex Van
>> Boxel  as our newest PMC member.
>>
>> Alex has been with Beam since 2016, very early in the life of the
>> project. Alex has contributed code, design ideas, and perhaps most
>> importantly been a huge part of organizing Beam Summits, and of course
>> presenting at them as well. Alex really brings the ASF community spirit to
>> Beam.
>>
>> Congratulations Alex and thanks for being a part of Apache Beam!
>>
>> Kenn, on behalf of the Beam PMC (which now includes Alex)
>>
>
>


Re: [ANNOUNCE] New PMC Member: Robert Burke

2023-10-04 Thread Chamikara Jayalath
Congrats Rebo!

On Wed, Oct 4, 2023 at 1:42 AM Jan Lukavský  wrote:

> Congrats Robert!
> On 10/4/23 10:29, Alexey Romanenko wrote:
>
> Congrats Robert, very well deserved!
>
> —
> Alexey
>
> On 4 Oct 2023, at 00:39, Austin Bennett 
>  wrote:
>
> Thanks for all you do @Robert Burke  !
>
> On Tue, Oct 3, 2023 at 12:53 PM Ahmed Abualsaud 
> wrote:
>
>> Congrats Rebo!
>>
>> On 2023/10/03 18:39:47 Kenneth Knowles wrote:
>> > Hi all,
>> >
>> > Please join me and the rest of the Beam PMC in welcoming Robert Burke <
>> > lostl...@apache.org> as our newest PMC member.
>> >
>> > Robert has been a part of the Beam community since 2017. He is our resident
>> > Gopher, producing the Go SDK and most recently the local, portable, Prism
>> > runner. Robert has presented on Beam many times, having written not just
>> > core Beam code but quite interesting pipelines too :-)
>> >
>> > Congratulations Robert and thanks for being a part of Apache Beam!
>> >
>> > Kenn, on behalf of the Beam PMC (which now includes Robert)
>> >
>>
>
>


Re: [ANNOUNCE] New PMC Member: Valentyn Tymofieiev

2023-10-04 Thread Chamikara Jayalath
Congrats Valentyn!

On Wed, Oct 4, 2023 at 1:42 AM Jan Lukavský  wrote:

> Congrats Valentyn!
> On 10/4/23 10:26, Alexey Romanenko wrote:
>
> Congrats Valentyn, very well deserved!
>
> —
> Alexey
>
> On 4 Oct 2023, at 00:39, Austin Bennett 
>  wrote:
>
> Thanks for everything @Valentyn Tymofieiev  !
>
> On Tue, Oct 3, 2023 at 12:53 PM Ahmed Abualsaud 
> wrote:
>
>> Congrats Valentyn!
>>
>> On 2023/10/03 18:39:49 Kenneth Knowles wrote:
>> > Hi all,
>> >
>> > Please join me and the rest of the Beam PMC in welcoming Valentyn
>> > Tymofieiev  as our newest PMC member.
>> >
>> > Valentyn has been contributing to Beam since 2017. Notable highlights
>> > include his work on the Python SDK and also in our container management.
>> > Valentyn also is involved in many discussions around Beam's infrastructure
>> > and community processes. If you look through Valentyn's history, you will
>> > see an abundance of the most critical maintenance work that is the beating
>> > heart of any project.
>> >
>> > Congratulations Valentyn and thanks for being a part of Apache Beam!
>> >
>> > Kenn, on behalf of the Beam PMC (which now includes Valentyn)
>> >
>>
>
>


Re: User-facing website vs. contributor-facing website

2023-09-21 Thread Chamikara Jayalath via dev
I might be wrong, but I think of the wiki as a more volatile and less reliable
place than the website (any committer can update it without a review, and we
do that quite often). I think things in the contribution guide are key to a
healthy Beam community, so I'd like them to be in a more stable place that
gets reviewed appropriately when updated.

Thanks,
Cham

On Thu, Sep 21, 2023 at 9:14 AM Danny McCormick via dev 
wrote:

> +1 on moving the release guide. I'd argue that everything under the
> `contribute` tag other than the main page (
> https://beam.apache.org/contribute/) and the link to CONTRIBUTING.md
>  makes more
> sense on the wiki (we can keep the section with the sidebar links just
> redirecting to the wiki). I don't think it makes sense to move anything
> else, but the contributing section is inherently "dev focused".
>
> Thanks,
> Danny
>
> On Thu, Sep 21, 2023 at 11:58 AM Kenneth Knowles  wrote:
>
>> Hello!
>>
>> I am reviving a discussion that began at
>> https://lists.apache.org/thread/w4g8xpg4215nlq86hxbd6n3q7jfnylny when we
>> started our Confluence wiki and has even been revived once before.
>>
>> The conclusion of that thread was basically "yes, let us separate the
>> contributor-facing stuff to a different site". It also was the boot up of
>> the Confluence wiki but I want to not discuss tech/hosting for this thread.
>> I want to focus on the issue of having a separate user-facing website vs a
>> contributor-facing website. Some things like issue priorities are
>> user-and-dev facing and they require review for changes and should stay on
>> the user site. I also don't want to get into those more complex cases.
>>
>> We are basically in a halfway state today because I didn't have enough
>> volunteer time to finish everything and I did not wrangle enough help.
>>
>> So now I am release manager and encountering the docs more closely again.
>> The release docs really blend stuff.
>>
>>   - The main release guide is on the website.
>>  - Some steps, though, are GitHub Issues that we push along from one release
>> milestone to the next.
>>  - The actual technical bits to do the steps are sometimes on the
>> confluence wiki
>>  - I expect I will also be touching README files in various folders of
>> the repo
>>
>> So I just want to make some more steps, and I wanted to ask the community
>> for their current thoughts. I think one big step could be to move the
>> release guide itself to the dev site, which is currently the wiki.
>>
>> What do you think? Are there any other areas of the website that you
>> think could just move to the wiki today?
>>
>> Kenn
>>
>> p.s. Some time in the past I saw an upper right corner fold (like
>> https://www.istockphoto.com/illustrations/paper-corner-fold) that took
>> you to the dev site that looked the same with different color scheme. That
>> was fun :-)
>>
>


Re: [Request for Feedback] Swift SDK Prototype

2023-09-20 Thread Chamikara Jayalath via dev
on I
>>> believe?
>>>
>>> On Fri, Aug 25, 2023 at 2:04 PM Byron Ellis 
>>> wrote:
>>>
>>>> Okay, after a brief detour through "get this working in the Flink
>>>> Portable Runner" I think I have something pretty workable.
>>>>
>>>> PInput and POutput can actually be structs rather than protocols, which
>>>> simplifies things quite a bit. It also allows us to use them with property
>>>> wrappers for a SwiftUI-like experience if we want when defining DoFns
>>>> (which is what I was originally intending to use them for). That also means
>>>> the function signature you use for closures would match full-fledged DoFn
>>>> definitions for the most part which is satisfying.
>>>>
>>>>
>>>>
>>>> On Thu, Aug 24, 2023 at 5:55 PM Byron Ellis 
>>>> wrote:
>>>>
>>>>> Okay, I tried a couple of different things.
>>>>>
>>>>> Implicitly passing the timestamp and window during iteration did not
>>>>> go well. While physically possible, it introduces an invisible side effect
>>>>> into loop iteration, which confused me when I tried to use it, even though I
>>>>> implemented it. Also, I'm pretty sure there'd end up being some sort of
>>>>> race condition nightmare continuing down that path.
>>>>>
>>>>> What I decided to do instead was the following:
>>>>>
>>>>> 1. Rename the existing "pardo" functions to "pstream" and require that
>>>>> they always emit a window and timestamp along with their value. This
>>>>> eliminates the side effect but lets us keep iteration in a bundle where
>>>>> that might be convenient. For example, in my cheesy GCS implementation it
>>>>> means that I can keep an OAuth token around for the lifetime of the bundle
>>>>> as a local variable, which is convenient. It's a bit more typing for users
>>>>> of pstream, but the expectation here is that if you're using pstream
>>>>> functions You Know What You Are Doing and most people won't be using it
>>>>> directly.
>>>>>
>>>>> 2. Introduce a new set of pardo functions (I didn't do all of them
>>>>> yet, but enough to test the functionality and decide I liked it) which take
>>>>> a function signature of (any PInput, any POutput).
>>>>> PInput takes the (InputType,Date,Window) tuple and converts it into a
>>>>> struct with friendlier names. Not strictly necessary, but makes the code
>>>>> nicer to read I think. POutput introduces emit functions that optionally
>>>>> allow you to specify a timestamp and a window. If you don't for either one
>>>>> it will take the timestamp and/or window of the input.
>>>>>
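A language-neutral sketch of the pardo/POutput design described above, written in Python rather than Swift, so none of these names are the Swift SDK's actual API. PInput gives friendlier names to the (value, timestamp, window) tuple, and POutput.emit falls back to the input's timestamp and window when they are not supplied. Windows are plain string labels here purely for illustration. map and flatMap are then expressed in terms of pardo, mirroring what the thread describes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Iterable, List, Optional, Tuple

Element = Tuple[Any, datetime, str]  # (value, timestamp, window label)


@dataclass(frozen=True)
class PInput:
    """Friendlier names for the (value, timestamp, window) tuple."""
    value: Any
    timestamp: datetime
    window: str


class POutput:
    """Collects emitted elements; timestamp and window default to the input's."""

    def __init__(self, current: PInput):
        self._current = current
        self.emitted: List[Element] = []

    def emit(self, value: Any, timestamp: Optional[datetime] = None,
             window: Optional[str] = None) -> None:
        self.emitted.append((
            value,
            self._current.timestamp if timestamp is None else timestamp,
            self._current.window if window is None else window,
        ))


def pardo(elements: Iterable[Element], fn: Callable[[PInput, POutput], None]) -> List[Element]:
    """Call fn once per element with an input view and an output emitter."""
    results: List[Element] = []
    for value, timestamp, window in elements:
        current = PInput(value, timestamp, window)
        output = POutput(current)
        fn(current, output)
        results.extend(output.emitted)
    return results


def map_elements(elements, fn):
    """map() in terms of pardo: exactly one output per input."""
    return pardo(elements, lambda inp, out: out.emit(fn(inp.value)))


def flat_map(elements, fn):
    """flatMap() in terms of pardo: zero or more outputs per input."""
    def body(inp: PInput, out: POutput) -> None:
        for item in fn(inp.value):
            out.emit(item)
    return pardo(elements, body)


t = datetime(2023, 8, 24, 12, 0)
print(flat_map([("a b", t, "w1")], str.split))
```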
>>>>> Using that was pretty pleasant, so I think we should
>>>>> continue down that path. If you'd like to see it in use, I reimplemented
>>>>> map() and flatMap() in terms of this new pardo functionality.
>>>>>
>>>>> Code has been pushed to the branch/PR if you're interested in taking a
>>>>> look.
>>>>>
>>>>> On Thu, Aug 24, 2023 at 2:15 PM Byron Ellis 
>>>>> wrote:
>>>>>
>>>>>> Gotcha, I think there's a fairly easy solution to link input and
>>>>>> output streams. Let me try it out... might even be possible to have both
>>>>>> element and stream-wise closure pardos. Definitely possible to have that at
>>>>>> the DoFn level (called SerializableFn in the SDK because I want to
>>>>>> use @DoFn as a macro)
>>>>>>
>>>>>> On Thu, Aug 24, 2023 at 1:09 PM Robert Bradshaw 
>>>>>> wrote:
>>>>>>
>>>>>>> On Thu, Aug 24, 2023 at 12:58 PM Chamikara Jayalath <
>>>>>>> chamik...@google.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 24, 2023 at 12:27 PM Robert Bradshaw <
>>>>>>>> rober...@google.com> wrote:
>>>>>>>>
>>>>>>>>&

Re: Contribution of Asgarde: Error Handling for Beam?

2023-09-12 Thread Chamikara Jayalath via dev
Thanks Mazlum, this sounds great. I think there are two ways we can proceed
if we decide to integrate the Asgarde library into Beam.

(1) Directly import the code into Beam without significant modifications
and/or a review (though we may add tests).

(2) Go through a design/code review to determine whether this is the best
approach for implementing error handling / DLQ in Beam transforms or
whether there are other alternatives/modifications to Asgarde we want to
consider.

If we do (1), I'd prefer adding Asgarde as a separate Gradle module in Beam.
We can later integrate it into the core module after a design/code review.

Thanks,
Cham



On Tue, Sep 12, 2023 at 10:26 AM Mazlum TOSUN 
wrote:

> Hello Austin and everyone,
>
> I am open to discussion.
>
> My first intention with Asgarde was to help the Beam community, because
> dead letter queues are so important in Beam and in all data pipeline
> frameworks.
> When I worked with Beam in production with my customers, we needed to
> catch errors with side outputs and a dead letter queue.
>
> This library really helped us keep the code less verbose while applying
> all the error handling logic, which is error-prone and verbose when
> repeated.
>
> As Kenneth said, my intention was to stay as close as possible to Beam,
> with a wrapper and a Failure Monad on top of a PCollection, to handle all
> the code and complexity of try/catch blocks and side outputs.
>
> As for governance, even though I am the creator of this library, what
> matters most isn't me but the community and helping it.
> If the best way to help the community is to include the library
> directly in Beam, we can go in this direction, with your reviews
> and recommendations, of course.
>
> Then the library will belong to the community and we will continue to
> improve it.
>
> As for deciding on the best place, I will go along with the majority.
>
> Best regards,
>
> Mazlum
>
> On Mon, Sep 11, 2023 at 11:15 PM Austin Bennett  wrote:
>
>> @Mazlum TOSUN  --  you and I have spoken a few
>> times about this. It'd be good for you to comment here on the list about any
>> of your concerns with governance, and/or other thoughts. For example, whether
>> you think contributing Asgarde directly is the thing [or perhaps expressing
>> any interest in helping write/contribute the relevant functionality into
>> Beam ... it is possible that by adding the actual functionality into Beam
>> (like Kenn's mentioned 'other place') we could make Asgarde as a separate
>> add-on obsolete].
>>
>>
>>
>> On Fri, Sep 8, 2023 at 8:55 AM Kenneth Knowles  wrote:
>>
>>> For anyone who hasn't clicked over to Asgarde, my TL;DR description of
>>> it is that it adds the "failure monad" aka "andThen" style error/result
>>> handling on top of chaining of PCollections. So it is at a similar level of
>>> abstraction as our basic transforms and generally useful for chaining
>>> dead-letter side outputs. It is no more or less appropriate for the core
>>> SDK than, say, the Project/Filter/Join transforms, or Watch, etc. If we
>>> actually aspired to have a thin core with the accessories like that in
>>> another place, then it should go to that other place.
>>>
>>> Kenn
>>>
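To make the "failure monad" / "andThen" description above concrete, here is a minimal sketch in plain Java with no Beam dependency. The `Result` type, its `andThen` method, and the step names are hypothetical and are not Asgarde's or Beam's API; the sketch only shows the chaining semantics, where a failing step short-circuits the rest of the chain and its error is collected as a dead letter instead of being thrown. In an actual Beam pipeline each step would be a transform and the failures would be routed to a side output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, minimal "failure monad" sketch; not Asgarde's or Beam's API.
public final class FailureMonadSketch {

    /** Holds either a successful value or a captured failure description. */
    static final class Result<T> {
        final T value;        // set on success
        final String failure; // set on failure

        private Result(T value, String failure) {
            this.value = value;
            this.failure = failure;
        }

        static <T> Result<T> ok(T value) { return new Result<>(value, null); }

        static <T> Result<T> failed(String failure) { return new Result<>(null, failure); }

        boolean isOk() { return failure == null; }

        /** Applies the next step only on success; a thrown exception becomes a failure. */
        <R> Result<R> andThen(String stepName, Function<T, R> step) {
            if (!isOk()) {
                return failed(failure); // short-circuit: keep the earlier failure unchanged
            }
            try {
                return ok(step.apply(value));
            } catch (RuntimeException e) {
                return failed(stepName + " failed on '" + value + "': " + e.getMessage());
            }
        }
    }

    /** Runs a three-step chain per input; returns [outputs, deadLetters]. */
    static List<List<String>> process(List<String> inputs) {
        List<String> outputs = new ArrayList<>();
        List<String> deadLetters = new ArrayList<>();
        for (String input : inputs) {
            Result<String> r = Result.ok(input)
                .andThen("parse", Integer::parseInt)
                .andThen("invert", n -> 100 / n)
                .andThen("format", n -> "value=" + n);
            if (r.isOk()) {
                outputs.add(r.value);
            } else {
                deadLetters.add(r.failure);
            }
        }
        return List.of(outputs, deadLetters);
    }

    public static void main(String[] args) {
        List<List<String>> result = process(List.of("4", "abc", "0"));
        System.out.println("outputs: " + result.get(0));
        System.out.println("dead letters: " + result.get(1));
    }
}
```

The design point being illustrated is that the per-step try/catch lives in one place (`andThen`), so pipeline code only declares the steps.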
>>> On Fri, Sep 8, 2023 at 11:24 AM Daniel Collins via dev <
>>> dev@beam.apache.org> wrote:
>>>
 > until we *require* Asgarde on a core transform, it shouldn't be in the
 main repo

 I don't think this is necessarily true if it solves end user use cases.
 If there is a specific transform that solves a specific use case, we could
 include it in the transforms folder for end-users, even if it isn't
 utilized in the I/Os at present. Hence the suggestion to take the most
 promising transforms and propose adding them with documentation, APIs, and
 rationale.

 -Daniel

 On Fri, Sep 8, 2023 at 11:20 AM Robert Burke 
 wrote:

> I would say until we *require* Asgarde on a core transform, it
> shouldn't be in the main repo.
>
> Incorporating something before there's a need for it is premature
> abstraction. We can't do things because they *might* be useful. Let's see
> concrete places where they are useful, or where a similar need is already
> being solved a different way.
>
> Beam is complicated by itself, and we do encourage multiple ways of
> solving problems, but that says to me that having an out of repo ecosystem
> is the right path, rather than incorporation.
>
> On Fri, Sep 8, 2023, 8:14 AM Daniel Collins via dev <
> dev@beam.apache.org> wrote:
>
>> I think there are a lot of interesting and relatively isolated
>> components of the project; it might make sense to write per-transform
>> one-pagers for isolated things like the most useful pieces (just basically
>> copying the documentation and justifying the API) instead of doing a
>> one-shot import or having it live forever in an external project.
>>
>> -Daniel
>>
>> On Fri, Sep 

Upgrade transforms without upgrading the pipeline

2023-08-29 Thread Chamikara Jayalath via dev
Hi All,

We recently announced the availability of the Beam Transform Service [1].
One of the features that this will allow us to do is upgrading transforms
of pipelines to new Beam versions without upgrading the full pipeline [2].

I authored a PoC PR that addresses this for Java SDK:
https://github.com/apache/beam/pull/28210

This introduces new pipeline options (in ExternalTranslationOptions) that
allow Beam users to control exactly which transforms to upgrade and to
which Beam version. More specifically,

* transformsToOverride: this accepts a list of URNs that uniquely
identify the transforms to upgrade.
* transformServiceBeamVersion: this takes a new Beam version. The
implementation will automatically start up a Transform Service for this Beam
version and will upgrade the transforms identified in the
'transformsToOverride' option to this version.
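As a rough sketch of how the two options listed above might be supplied, the Java snippet below only assembles the command-line arguments and does not touch Beam. The URN and version are hypothetical placeholders, the comma-separated syntax for the list-valued option is an assumption, and the option names are the ones proposed in the PoC PR, which may change before it is merged.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: builds arguments for the options proposed in the PoC PR.
// Option names follow the proposal; the URN/version values are hypothetical.
public final class UpgradeArgsSketch {

    static String[] upgradeArgs(List<String> urnsToOverride, String beamVersion) {
        if (urnsToOverride.isEmpty()) {
            throw new IllegalArgumentException("at least one transform URN is required");
        }
        List<String> args = new ArrayList<>();
        // Assumption: the list-valued option accepts a comma-separated value.
        args.add("--transformsToOverride=" + String.join(",", urnsToOverride));
        args.add("--transformServiceBeamVersion=" + beamVersion);
        return args.toArray(new String[0]);
    }

    public static void main(String[] a) {
        String[] args = upgradeArgs(
            List.of("beam:transform:example:hypothetical_read:v1"), // hypothetical URN
            "2.51.0");                                               // hypothetical target version
        for (String arg : args) {
            System.out.println(arg);
        }
        // In a real pipeline, arguments like these would typically be parsed with
        // PipelineOptionsFactory.fromArgs(args).withValidation().create().
    }
}
```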

To implement this, I'm extending the existing "TransformPayloadTranslator"
[3] interface so that transform construction can be performed using a
construction schema (schema-aware transforms already do part of this, but
this change also allows us to upgrade existing transforms that do not take
PCollection as input and output).
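The idea of constructing a transform from a construction schema can be illustrated with a toy model: the transform's constructor arguments are recorded under stable field names (a `Map` standing in for a schema'd Beam `Row`), and a newer implementation rebuilds the transform from those fields, defaulting any it does not find. Everything here (`ReadFromQueue`, `toConfigRow`, `fromConfigRow`, the field names) is hypothetical and is not the actual `TransformPayloadTranslator` API; it only sketches the round trip that makes an upgrade possible.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of construction via a config "row"; all names are hypothetical.
public final class ConstructionSchemaSketch {

    /** Hypothetical transform configured by a topic and a batch size. */
    record ReadFromQueue(String topic, int maxBatchSize) {}

    /** "Old SDK" side: capture construction arguments under stable field names. */
    static Map<String, Object> toConfigRow(ReadFromQueue t) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("topic", t.topic());
        row.put("max_batch_size", t.maxBatchSize());
        return row;
    }

    /** "Upgraded" side: rebuild the transform from the recorded fields. */
    static ReadFromQueue fromConfigRow(Map<String, Object> row) {
        String topic = (String) row.get("topic");
        // A field missing from an older config falls back to a default.
        int maxBatchSize = (Integer) row.getOrDefault("max_batch_size", 100);
        return new ReadFromQueue(topic, maxBatchSize);
    }

    public static void main(String[] args) {
        ReadFromQueue original = new ReadFromQueue("orders", 500);
        Map<String, Object> row = toConfigRow(original);
        ReadFromQueue rebuilt = fromConfigRow(row);
        System.out.println(row + " -> " + rebuilt + ", equal=" + original.equals(rebuilt));
    }
}
```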

Please take a look and let me know if you have any comments (here or in the
PR).

Thanks,
Cham

[1] https://lists.apache.org/thread/j0bhcsn7dvdv4wch5rb1z1qbnxmt70r9
[2] https://github.com/apache/beam/issues/27943
[3]
https://github.com/apache/beam/pull/28210/files#diff-58ff54e017947d68975c0c1ce419545c500112afe9b6718b2f5935cb971702dbL512


Re: [VOTE] Release 2.50.0, release candidate #2

2023-08-28 Thread Chamikara Jayalath via dev
+1 (binding)

Validated by running some multi-lang jobs.

Thanks,
Cham

On Mon, Aug 28, 2023 at 10:40 AM Yi Hu via dev  wrote:

> +1 (non-binding)
>
> Verified Java IO load tests (TextIO, BigQuery, Bigtable) on Dataflow
> runner (legacy and V2) using https://github.com/apache/beam/tree/master/it
>
> On Mon, Aug 28, 2023 at 1:13 PM Ahmet Altay via dev 
> wrote:
>
>> +1 (binding).
>>
>> I validated python quick starts on direct and dataflow runners. Thank you
>> for working on the release!
>>
>> On Mon, Aug 28, 2023 at 8:48 AM Robert Burke  wrote:
>>
>>> Good morning!
>>>
>>> RC2 validation and vote is still open!
>>>
>>> On Sun, Aug 27, 2023, 1:28 PM XQ Hu via dev  wrote:
>>>
 +1
 Ran the simple Dataflow ML GPU batch job using
 https://github.com/google/dataflow-ml-starter with Python 2.50.0rc2 to
 validate the RC works well.

 On Sat, Aug 26, 2023 at 12:16 AM Valentyn Tymofieiev via dev <
 dev@beam.apache.org> wrote:

> +1
>
> Verified that the issue detected in RC0 has been resolved.
> Successfully ran a Python pipeline on ARM Dataflow workers.
>
> Noted that Dataflow runner logs became less verbose as a result of
> https://github.com/apache/beam/pull/27788. One line that I often pay
> attention to no longer appears at the default INFO log level:
>
> ```
> INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-26T03:45:35.126Z:
> JOB_MESSAGE_DETAILED: All workers have finished the startup processes and
> began to receive work requests.
> ```
>
> The Dataflow service can be adjusted to compensate for this (internal
> change: http://cl/560265419 ).
>
> On Fri, Aug 25, 2023 at 3:05 PM Bruno Volpato via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (non-binding).
>>
>> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
>> (Java SDK 11, Dataflow runner).
>>
>> Thanks Robert!
>>
>> On Thu, Aug 24, 2023 at 7:12 PM Robert Burke 
>> wrote:
>>
>>> Two minor errata from the previous email:
>>>
>>> The validation spreadsheet link should be:
>>>
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1014811464
>>>
>>> And the source code tag is: "v2.50.0-RC2"
>>>
>>> On 2023/08/24 23:09:23 Robert Burke wrote:
>>> > Hi everyone,
>>> > Please review and vote on the release candidate #2 for the version
>>> > 2.50.0, as follows:
>>> > [ ] +1, Approve the release
>>> > [ ] -1, Do not approve the release (please provide specific comments)
>>> >
>>> > Reviewers are encouraged to test their own use cases with the release
>>> > candidate, and vote +1 if no issues are found. Only PMC member votes
>>> > will count towards the final vote, but votes from all community members
>>> > are encouraged and helpful for finding regressions; you can either test
>>> > your own use cases or use cases from the validation sheet [10].
>>> >
>>> > Issues noted in RC1 vote proposal [13] have now been resolved.
>>> >
>>> > The staging area is available for your review, which includes:
>>> > * GitHub Release notes [1],
>>> > * the official Apache source release to be deployed to dist.apache.org
>>> > [2], which is signed with the key with fingerprint 02677FF4371A3756
>>> > (lostl...@apache.org) or D20316F712213422 (GitHub Action automated) [3],
>>> > * all artifacts to be deployed to the Maven Central Repository [4],
>>> > * source code tag "v2.50.0-RC2" [5],
>>> > * website pull request listing the release [6], the blog post [6], and
>>> > publishing the API reference manual [7].
>>> > * Java artifacts were built with Gradle 7.5.1 and OpenJDK (Temurin)
>>> > (build 1.8.0_382-b05).
>>> > * Python artifacts are deployed along with the source release to the
>>> > dist.apache.org [2] and PyPI [8].
>>> > * Go artifacts and documentation are available at pkg.go.dev [9]
>>> > * Validation sheet with a tab for 2.50.0 release to help with validation
>>> > [10].
>>> > * Docker images published to Docker Hub [11].
>>> > * PR to run tests against release branch [12].
>>> >
>>> > The vote will be open for at least 72 hours. It is adopted by majority
>>> > approval, with at least 3 PMC affirmative votes.
>>> >
>>> > For guidelines on how to try the release in your projects, check out our
>>> > blog post at https://beam.apache.org/blog/validate-beam-release/.
>>> >
>>> > Thanks,
>>> > Robert Burke
>>> > Apache Beam 2.50.0 Release Manager
>>> >
>>> > [1] https://github.com/apache/beam/milestone/14
>>> > [2] https://dist.apache.org/repos/dist/dev/beam/2.50.0/
>>> > [3] 

Re: [ANNOUNCE] New committer: Ahmed Abualsaud

2023-08-24 Thread Chamikara Jayalath via dev
Congrats Ahmed!!

On Thu, Aug 24, 2023 at 4:06 PM Bruno Volpato via dev 
wrote:

> Congratulations, Ahmed!
>
> Very well deserved!
>
>
> On Thu, Aug 24, 2023 at 6:09 PM XQ Hu via dev  wrote:
>
>> Congratulations, Ahmed!
>>
>> On Thu, Aug 24, 2023, 5:49 PM Ahmet Altay via dev 
>> wrote:
>>
>>> Hi all,
>>>
>>> Please join me and the rest of the Beam PMC in welcoming a new
>>> committer: Ahmed Abualsaud (ahmedabuals...@apache.org).
>>>
>>> Ahmed has been part of the Beam community since January 2022, working
>>> mostly on IO connectors, and has made a large number of contributions to
>>> make Beam IOs more usable, performant, and reliable. At the same time,
>>> Ahmed was active on the user list and at the Beam Summit, helping users by
>>> sharing his knowledge.
>>>
>>> Considering their contributions to the project over this timeframe, the
>>> Beam PMC trusts Ahmed with the responsibilities of a Beam committer. [1]
>>>
>>> Thank you Ahmed! And we are looking forward to seeing more of your contributions!
>>>
>>> Ahmet, on behalf of the Apache Beam PMC
>>>
>>> [1]
>>>
>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>
>>>


Re: [Request for Feedback] Swift SDK Prototype

2023-08-24 Thread Chamikara Jayalath via dev
ut and errors are both
>>>>> output streams. In theory you can have as many output streams as you like,
>>>>> though at the moment there's a compiler bug in the new type pack feature
>>>>> that limits it to "as many as I felt like supporting". Presumably this will
>>>>> get fixed before the official 5.9 release, which will probably be in the
>>>>> October timeframe if history is any guide.)
>>>>>
>>>>> If you had parameterization you wanted to send, that would look like
>>>>> pardo("Parameter") { param,filenames,output,error in ... } where "param"
>>>>> would take on the value of "Parameter." All of this is being typechecked at
>>>>> compile time, BTW.
>>>>>
>>>>>
>>>>> The (filename,_,_) is a tuple destructuring construct like you have in ES6
>>>>> and other languages, where "_" is Swift for "ignore." In this case
>>>>> PCollectionStreams have an element signature of (Of,Date,Window), so you can
>>>>> optionally extract the timestamp and the window if you want to manipulate
>>>>> them somehow.
>>>>>
>>>>> That said it would also be natural to provide elementwise pardos---
>>>>> that would probably mean having explicit type signatures in the closure. I
>>>>> had that at one point, but it felt less natural the more I used it. I'm
>>>>> also slowly working towards adding a more "traditional" DoFn implementation
>>>>> approach where you implement the DoFn as an object type. In that case it
>>>>> would be very very easy to support both by having a default stream
>>>>> implementation call the equivalent of processElement. To make that
>>>>> performant I need to implement an @DoFn macro and I just haven't gotten to
>>>>> it yet.
>>>>>
>>>>> It's a bit more work and I've been prioritizing implementing composite
>>>>> and external transforms for the reasons you suggest. :-) I've got the
>>>>> basics of a composite transform (there's an equivalent wordcount example)
>>>>> and am hooking it into the pipeline generation, which should also give me
>>>>> everything I need to successfully hook in external transforms as well. That
>>>>> will give me the jump on IOs as you say. I can also treat the pipeline
>>>>> itself as a composite transform which lets me get rid of the Pipeline {
>>>>> pipeline in ... } and just instead have things attach themselves to the
>>>>> pipeline implicitly.
>>>>>
>>>>> That said, there are some interesting IO possibilities that would be
>>>>> Swift native. In particular, I've been looking at the native Swift
>>>>> binding for DuckDB (which is C++ based). DuckDB is SQL based but not
>>>>> distributed in the same way as, say, Beam SQL... but it would allow for SQL
>>>>> statements on individual files with projection pushdown supported for
>>>>> things like Parquet, which could have some cool and performant data lake
>>>>> applications. I'll probably do a couple of the simpler IOs as
>>>>> well---there's a Swift AWS SDK binding that's pretty good that would give
>>>>> me S3, and there's a Cloud auth library as well that makes it pretty easy to
>>>>> work with GCS.
>>>>>
>>>>> In any case, I'm updating the branch as I find a minute here and
>>>>> there.
>>>>>
>>>>> Best,
>>>>> B
>>>>>
>>>>> On Wed, Aug 23, 2023 at 5:02 PM Robert Bradshaw 
>>>>> wrote:
>>>>>
>>>>>> Neat.
>>>>>>
>>>>>> Nothing like writing an SDK to actually understand how the FnAPI
>>>>>> works :). I like the use of groupBy. I have to admit I'm a bit mystified by
>>>>>> the syntax for parDo (I don't know Swift at all, which is probably tripping
>>>>>> me up). The addition of external (cross-language) transforms could let you
>>>

Re: [Request for Feedback] Swift SDK Prototype

2023-08-17 Thread Chamikara Jayalath via dev
Thanks Byron. This sounds great. I wonder if there is interest in a Swift SDK
from folks currently subscribed to the +user list.

On Wed, Aug 16, 2023 at 6:53 PM Byron Ellis via dev 
wrote:

> Hello everyone,
>
> A couple of months ago I decided that I wanted to really understand how
> the Beam FnApi works and how it interacts with the Portable Runner. For me,
> at least, that usually means writing some code so I can see things
> happening in a debugger. To really prove to myself that I understood what
> was going on, I decided I couldn't use an existing SDK language, since
> there would be the temptation to read some code and convince myself that I
> actually understood it.
>
> One thing led to another and it turns out that to get a minimal FnApi
> integration going you end up writing a fair bit of an SDK. So I decided to
> take things to a point where I had an SDK that could execute a word count
> example via a portable runner backend. I've now reached that point and
> would like to submit my prototype SDK to the list for feedback.
>
> It's currently living in a branch on my fork here:
>
> https://github.com/byronellis/beam/tree/swift-sdk/sdks/swift
>
> At the moment it runs via the most recent Xcode beta using Swift 5.9 on
> Intel Macs, but should also work using beta builds of 5.9 for Linux running
> on Intel hardware. I haven't had a chance to try it on ARM hardware and
> make sure all of the endian checks are complete. The
> "IntegrationTests.swift" file contains a word count example that reads some
> local files (as well as a missing file to exercise DLQ functionality) and
> outputs counts through two separate group by operations to get it past the
> "map reduce" size of pipeline. I've tested it against the Python Portable
> Runner. Since my goal was to learn FnApi there is no Direct Runner at this
> time.
>
> I've shown it to a couple of folks already and incorporated some of that
> feedback (for example, pardo was originally called dofn when defining
> pipelines). In general I've tried to make the API as "Swift-y" as
> possible, hence the heavy reliance on closures; and while there aren't yet
> composite PTransforms, there are the beginnings of what would be needed for a
> SwiftUI-like declarative API for creating them.
>
> There are of course a ton of missing bits still to be implemented, like
> counters, metrics, windowing, state, timers, etc.
>

This should be fine, and we can get the code documented without these
features. I think support for composites and adding an external transform
(see Java, Python, Go, TypeScript) to add support for multi-lang will bring
in a lot of features (for example, I/O connectors) for free.


>
> Any and all feedback welcome and happy to submit a PR if folks are
> interested, though the "Swift Way" would be to have it in its own repo so
> that it can easily be used from the Swift Package Manager.
>

+1 for creating a PR (maybe as a draft initially). Also, it'll be easier to
comment on a PR :)

- Cham



>
> Best,
> B
>
>
>


[ANNOUNCE] Transform Service

2023-08-10 Thread Chamikara Jayalath via dev
Hi All,

We recently added a Docker Compose based service named Transform Service to
Beam.

The Transform Service includes a number of transforms released with Beam and
provides a single endpoint for accessing them via Beam's multi-language
pipelines framework.

I've updated Beam Java/Python SDKs to automatically use this service to
expand cross-language transforms used by multi-lang pipelines
when possible. This means that Beam pipelines can use cross-language
transforms without installing other language runtimes if they have Docker
(and Docker Compose, which comes with Docker) available locally at job
submission. Go SDK updates are in development.

Users also have the option to manually start up a Transform Service with
utilities provided with Beam SDKs if needed.

For more details regarding the Transform Service, please see the
documentation here.

A list of transforms currently included with the Transform Service is
available here.

Please see here for a previous discussion on this, and please let me know
if you have any questions.

Thanks,
Cham


Re: [VOTE] Vendored Dependency guava 32.1.2-jre Release

2023-08-04 Thread Chamikara Jayalath via dev
+1 (binding)

Verified the signature and checksum of artifacts.
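The checksum half of that verification can be sketched in Java with the standard `MessageDigest` API: compute the SHA-512 digest of the downloaded artifact and compare it with the published `.sha512` file. This assumes the checksum file uses the `sha512sum`-style "hash  filename" layout; the signature half is normally done separately with GPG against the project's published KEYS file and is not covered here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: verify an artifact against a sha512sum-style checksum line.
public final class ChecksumCheckSketch {

    /** Returns the lowercase hex SHA-512 digest of a file, read in chunks. */
    static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b)); // %x on a Byte prints it as unsigned
        }
        return hex.toString();
    }

    /** Compares against the first token of a "hash  filename" checksum line. */
    static boolean matches(Path file, String checksumLine)
            throws IOException, NoSuchAlgorithmException {
        String expected = checksumLine.trim().split("\\s+")[0].toLowerCase();
        return sha512Hex(file).equals(expected);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: ChecksumCheckSketch <artifact> <artifact.sha512>");
            return;
        }
        String line = Files.readString(Path.of(args[1]));
        System.out.println(matches(Path.of(args[0]), line) ? "checksum OK" : "checksum MISMATCH");
    }
}
```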

Thanks,
Cham

On Thu, Aug 3, 2023 at 12:02 PM Yi Hu via dev  wrote:

> Hi everyone,
>
>
> Please review the release of the following artifacts that we vendor:
>
>
> * beam-vendor-guava-32_1_2-jre
>
>
> Please review and vote on the release candidate 1 for the version 0.1, as
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * the official Apache source release to be deployed to dist.apache.org
> [1], which is signed with the key with fingerprint 170405CB [2],
> * all artifacts to be deployed to the Maven Central Repository [3],
> * commit hash "ef2ca7a" [4],
>
> * Testing PR on the vendored dependency [5]
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
> [3] https://repository.apache.org/content/repositories/orgapachebeam-1350/
> [4]
> https://github.com/apache/beam/commit/ef2ca7aa49ec75c3ab4e3a94b7ad8162e1c81c1e
> [5] https://github.com/apache/beam/pull/27825
>
> Regards,
> Yi
>
> --
>
> Yi Hu, (he/him/his)
>
> Software Engineer
>
>
>


Re: [RFC] Throttle Time Counters

2023-07-31 Thread Chamikara Jayalath via dev
Thanks for writing this. +1 for standardizing (and documenting) these
metrics.

- Cham

On Thu, Jul 27, 2023 at 1:51 PM Yasha Ravindra via dev 
wrote:

> Hello everyone,
>
> Throttle time counters were introduced to give clients the option to
> self-regulate when the service is overwhelmed by requests.
> Currently, we have different namespaces for each IO and language. We
> would instead like to have a dedicated namespace for this counter.
> We have put together a proposal for this and would appreciate any
> feedback or comments.
>
>
> https://docs.google.com/document/d/1hUufb3L5jURGeFLaQKeQbPlYK-B2wbaLHtqNVETSOOk/edit?usp=sharing
>
>
> Thank you
>
> Warm Regards,
> Yasha Ravindra
>


Re: ByteBuddy ClassLoadingStrategy.Default.INJECTION vs getClassLoadingStrategy

2023-07-21 Thread Chamikara Jayalath via dev
Seems like https://github.com/apache/beam/pull/27606 fixes this (at least
for me locally on Java 11). +Liam Miller-Cushon 

On Fri, Jul 21, 2023 at 9:51 AM Reuven Lax via dev 
wrote:

> Curious why these failing tests didn't block submission.
>

Sounds like we don't run (all?) unit tests for the affected Java versions?

Thanks,
Cham


>
> For now rollback seems to be the simplest option. However is there a path
> forward on Java 11, or is our model irretrievably broken on Java 11?
>
> On Fri, Jul 21, 2023 at 8:57 AM Kenneth Knowles  wrote:
>
>> This is a tricky situation that I don't know how to resolve best. Here
>> are some pieces of information I know:
>>
>> 1. The reason we put certain generated classes in the same package is
>> because of Java's package-private access restriction. If they are in
>> another package the generated wrapper won't be able to invoke the needed
>> functions. I know this applies to a generated DoFnInvoker. I don't know if
>> it applies here.
>>
>> 2. The current status for Beam is that Beam itself is only
>> expected/required to be able to build with Java 8 and/or produce Java 8
>> compatible bytecode, but users should be able to use it with their own Java
>> 11 or Java 17 code. This makes the testing scenario a bit tricky. We do
>> have tests that model this scenario but they did not catch this I guess.
>>
>> On Mon, Jul 17, 2023 at 1:19 AM Damon Douglas 
>> wrote:
>>
>>> Good day, everyone,
>>>
>>> For clarity, I organize the following into situation, background,
>>> assessment, and proposal.
>>>
>>> Best,
>>>
>>> Damon
>>>
>>> -
>>>
>>> Situation
>>>
>>> Issue #26981 reports an IllegalArgumentException associated with the
>>> ByteBuddy dependency throwing the message " must be defined in
>>> the same package as "[1]. I personally discovered this
>>> error blocking my own Schema-related tests.
>>>
>>> Background
>>>
>>> *1. PR #25578 introduced the error*
>>>
>>> As Issue #26981 reports[1], the error seems to be introduced with 2.48.
>>> Comparing v2.47.0 and v2.48.0[2] reveals that PR #25578 may have introduced
>>> this breaking change[3]. Said PR replaced ByteBuddy's
>>> ClassLoadingStrategy.Default.INJECTION[4] with getClassLoadingStrategy[5].
>>>
>>> *2. Reverting PR #25578 resolves the error*
>>>
>>> To test this hypothesis, I cloned 41e6628 and ran:
>>>
>>> ./gradlew :sdks:java:core:check
>>>
>>> revealing several failing tests (see *Failing :sdks:java:core:check at
>>> 41e6628* below), some of which contained the
>>> familiar IllegalArgumentException " must be defined in the same
>>> package as " message.
>>>
>>> After reverting changes found in #25578, the failing tests and the
>>> IllegalArgumentException were resolved.
>>>
>>> *3. Code related to PR #25578 has a back and forth history*
>>>
>>> There seems to be a back-and-forth history[6] of removing and restoring
>>> ByteBuddy's ClassLoadingStrategy.Default.INJECTION and
>>> getClassLoadingStrategy, most recently in PR #25578. That PR's motivation
>>> is to prepare Beam for Java 17 compatibility, which explains the
>>> re-introduction of the breaking changes.
>>>
>>> *4. PR #25578 GitHub Actions checks pass*
>>>
>>> Examining the GitHub actions run reveals that PR #25578 checks
>>> passed[7]. However, examining the setup[8] more closely reveals that Java
>>> tests are executed using Java Version 8. The same is true in the
>>> latest 41e6628 commit[9].  To test whether the version of Java drives Issue
>>> #26981's error, I submitted a draft PR[10] with the version of Java set to
>>> 11 and found that the same errors resulted[11] as I found on my machine
>>> using the same Java version.
>>>
>>> Assessment
>>>
>>> My main impression is that:
>>>
>>>1. checks did not reveal PR #25578's breaking changes[7] because the
>>>environment[8] used Java 8 instead of 11
>>>2. the back and forth removal and addition of PR #25578's changes
>>>does not solve current and future Java version compatibilities
>>>
>>> Proposal
>>>
>>> May we consider:
>>>
>>>1. If not already planned, set
>>>.github/actions/setup-self-hosted-action/action.yml's Java version[12] to
>>>11.
>>>2. arriving at a consensus regarding PR #25578's breaking changes
>>>and what we need to do today and in the future; I don't have anything
>>>practical to propose or recommend
>>>
>>> References
>>>
>>>1. https://github.com/apache/beam/issues/26981
>>>2. https://github.com/apache/beam/compare/v2.47.0...v2.48.0
>>>3. https://github.com/apache/beam/pull/25578
>>>4.
>>>
>>> https://javadoc.io/static/net.bytebuddy/byte-buddy/1.12.23/net/bytebuddy/dynamic/loading/ClassLoadingStrategy.Default.html#INJECTION
>>>5.
>>>
>>> https://github.com/apache/beam/blob/68e19a596a5d0136ba4592be01888f487463c2f3/sdks/java/core/src/main/java/org/apache/beam/sdk/util/ByteBuddyUtils.java#L32
>>>6.
>>>
>>> 

Re: [VOTE] Release 2.49.0, release candidate #2

2023-07-18 Thread Chamikara Jayalath via dev
Thanks Yi. PMC finalization should be done.

- Cham

On Tue, Jul 18, 2023 at 6:51 AM Yi Hu via dev  wrote:

> Sorry to bother everyone,
>
> It appears the email delivery issue still exists. In case you did not
> receive the last emails, check the complete thread here:
> https://lists.apache.org/thread/r7r5q5mq7rqjrfbf8nj90smrdkss0sbf
>
> Still waiting for PMC finalization, mainly deploying the source release from
> staging (https://dist.apache.org/repos/dist/dev/beam/2.49.0/) to release
> (will be https://dist.apache.org/repos/dist/release/beam/2.49.0/). Thanks!
>
> Regards,
> Yi
>
> On Mon, Jul 17, 2023 at 10:27 AM Yi Hu  wrote:
>
>> Could a PMC member please help finalize the release (
>> https://beam.apache.org/contribute/release-guide/#pmc-only-finalization),
>> mainly deploying the source release from staging (
>> https://dist.apache.org/repos/dist/dev/beam/2.49.0/) to release (will be
>> https://dist.apache.org/repos/dist/release/beam/2.49.0/). Thanks!
>>
>>
>> On Mon, Jul 17, 2023 at 7:28 AM Yi Hu  wrote:
>>
>>> I'm happy to announce that we have unanimously approved this release.
>>>
>>> There are 8 approving votes, 4 of which are binding:
>>> * approver 1: Jan Lukavský
>>> * approver 2: Robert Bradshaw
>>> * approver 3: Chamikara Jayalath
>>> * approver 4: Ahmet Altay
>>>
>>> There are no disapproving votes.
>>>
>>> Thanks everyone!
>>>
>>> Note: there is an ongoing issue where some reply emails do not get
>>> delivered to certain email addresses (like Gmail). Check the complete
>>> thread here: https://lists.apache.org/thread/r7r5q5mq7rqjrfbf8nj90smrdkss0sbf
>>>
>>>


Re: [VOTE] Release 2.49.0, release candidate #2

2023-07-13 Thread Chamikara Jayalath via dev
+1 (binding)

Validated Java/Python multi-lang scenarios and updated the spreadsheet.

Thanks,
Cham

On Thu, Jul 13, 2023 at 12:54 PM Svetak Sundhar via dev 
wrote:

> +1 (Non-Binding)
>
> Python quickstart Dataflow runner.
>
>
> Svetak Sundhar
>
>   Data Engineer
> svetaksund...@google.com
>
>
>
> On Thu, Jul 13, 2023 at 5:03 AM Jan Lukavský  wrote:
>
>> +1 (binding)
>>
>> Tested Java SDK with FlinkRunner.
>>
>>  Jan
>> On 7/13/23 02:30, Bruno Volpato via dev wrote:
>>
>> +1 (non-binding).
>>
>> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
>> (Java SDK 11, Dataflow runner).
>>
>> Thanks Yi!
>>
>> On Tue, Jul 11, 2023 at 4:23 PM Yi Hu via dev 
>> wrote:
>>
>>> Hi everyone,
>>> Please review and vote on the release candidate #2 for the version
>>> 2.49.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>>> count towards the final vote, but votes from all community members are
>>> encouraged and helpful for finding regressions; you can either test your
>>> own use cases or use cases from the validation sheet [10].
>>>
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with
>>> fingerprint either CB6974C8170405CB (y...@apache.org) or
>>> D20316F712213422 (GitHub Action automated) [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.49.0-RC2" [5],
>>> * website pull request listing the release [6], the blog post [6], and
>>> publishing the API reference manual [7].
>>> * Java artifacts were built with Gradle GRADLE_VERSION and
>>> OpenJDK/Oracle JDK JDK_VERSION.
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI [8].
>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>> * Validation sheet with a tab for 2.49.0 release to help with validation
>>> [10].
>>> * Docker images published to Docker Hub [11].
>>> * PR to run tests against release branch [12].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> For guidelines on how to try the release in your projects, check out our
>>> blog post at /blog/validate-beam-release/.
>>>
>>> Thanks,
>>> Release Manager
>>>
>>> [1] https://github.com/apache/beam/milestone/13
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.49.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1349/
>>> [5] https://github.com/apache/beam/tree/v2.49.0-RC2
>>> [6] https://github.com/apache/beam/pull/27374 (unchanged since RC1)
>>> [7] https://github.com/apache/beam-site/pull/646  (unchanged since RC1)
>>> [8] https://pypi.org/project/apache-beam/2.49.0rc2/
>>> [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.49.0-RC2/go/pkg/beam
>>> [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=934901728
>>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>> [12] https://github.com/apache/beam/pull/27307
>>>
>>> --
>>>
>>> Yi Hu, (he/him/his)
>>>
>>> Software Engineer
>>>
>>>
>>>


Re: [VOTE] Release 2.48.0 release candidate #2

2023-05-30 Thread Chamikara Jayalath via dev
Nvm, I was running the Kafka cluster and the job in two different projects.
It's working as expected.

+1 (binding) for the release.

Thanks,
Cham


On Tue, May 30, 2023 at 6:09 PM Chamikara Jayalath 
wrote:

> I'm seeing a potential regression when running Python x-lang Kafka jobs on
> Dataflow.
>
>
> https://pantheon.corp.google.com/dataflow/jobs/us-central1/2023-05-30_16_31_32-1219154560944228293;step=;mainTab=JOB_GRAPH;bottomTab=JOB_LOGS;logsSeverity=INFO;graphView=0?project=google.com:clouddfe=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))
>
> "Topic kafka_taxirides_realtime not present in metadata after 6 ms"
>
> Currently not sure if this is due to my Kafka cluster setup or not.
>
> Thanks,
> Cham
>
>
>
>
>
> On Tue, May 30, 2023 at 5:52 PM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (binding)
>>
>> On Tue, May 30, 2023 at 5:42 PM Robert Bradshaw 
>> wrote:
>>
>>> On Tue, May 30, 2023 at 2:01 PM Ritesh Ghorse via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Thanks Danny and Jack! Dataflow containers are up!
>>>>
>>>> Only PMC votes count but feel free to test your use cases and vote on
>>>> this thread!
>>>>
>>>
>>> While we need at least 3 affirmative PMC votes to formally do a release,
>>> it is definitely the case that all votes are valuable input and are taken
>>> into consideration when deciding to do so.
>>>
>>>
>>>> On Tue, May 30, 2023 at 11:26 AM Alexey Romanenko <
>>>> aromanenko@gmail.com> wrote:
>>>>
>>>>> +1 (binding)
>>>>>
>>>>> Tested with  https://github.com/Talend/beam-samples/
>>>>> (Java SDK v8/v11/v17, Spark 3.x runner).
>>>>>
>>>>> On 27 May 2023, at 19:38, Bruno Volpato via dev 
>>>>> wrote:
>>>>>
>>>>> I was able to check that containers are all there and complete
>>>>> my validation.
>>>>>
>>>>> +1 (non-binding).
>>>>>
>>>>> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates (Java
>>>>> SDK 11, Dataflow runner).
>>>>>
>>>>>
>>>>> Thanks Ritesh and Danny!
>>>>>
>>>>> On Fri, May 26, 2023 at 10:09 AM Danny McCormick via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> It looks like some Dataflow containers didn't get published, so some
>>>>>> jobs using the legacy runner (runner v2 disabled) will fail. I kicked off
>>>>>> the container release, so that should hopefully be available later today.
>>>>>>
>>>>>> Thanks,
>>>>>> Danny
>>>>>>
>>>>>> On Thu, May 25, 2023 at 11:19 PM Ritesh Ghorse via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>> Please review and vote on the release candidate #2 for the version
>>>>>>> 2.48.0, as follows:
>>>>>>> [ ] +1, Approve the release
>>>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>>>
>>>>>>>
>>>>>>> Reviewers are encouraged to test their own use cases with the
>>>>>>> release candidate, and vote +1 if no issues are found. Only PMC member
>>>>>>> votes will count towards the final vote, but votes from all community
>>>>>>> members are encouraged and helpful for finding regressions; you can
>>>>>>> either test your own use cases or use cases from the validation sheet [10].
>>>>>>>
>>>>>>> The complete staging area is available for your review, which
>>>>>>> includes:
>>>>>>> * GitHub Release notes [1],
>>>>>>> * the official Apache source release to be deployed to
>>>>>>> dist.apache.org [2], which is signed with the key with fingerprint
>>>>>>> E4C74BEC861570F5A3E44E46280A0AC32DBAE62B [3],
>>>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>>>> * source code tag "v2.48.0-RC2" [5],
>>>>>>> * website pull request listing the release [6], the blog post [6],
>>>>>>> and publishing the API refer

Re: [VOTE] Release 2.48.0 release candidate #2

2023-05-30 Thread Chamikara Jayalath via dev
I'm seeing a potential regression when running Python x-lang Kafka jobs on
Dataflow.

https://pantheon.corp.google.com/dataflow/jobs/us-central1/2023-05-30_16_31_32-1219154560944228293;step=;mainTab=JOB_GRAPH;bottomTab=JOB_LOGS;logsSeverity=INFO;graphView=0?project=google.com:clouddfe=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))

"Topic kafka_taxirides_realtime not present in metadata after 6 ms"

Currently not sure if this is due to my Kafka cluster setup or not.

Thanks,
Cham





On Tue, May 30, 2023 at 5:52 PM Robert Bradshaw via dev 
wrote:

> +1 (binding)
>
> On Tue, May 30, 2023 at 5:42 PM Robert Bradshaw 
> wrote:
>
>> On Tue, May 30, 2023 at 2:01 PM Ritesh Ghorse via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Thanks Danny and Jack! Dataflow containers are up!
>>>
>>> Only PMC votes count but feel free to test your use cases and vote on
>>> this thread!
>>>
>>
>> While we need at least 3 affirmative PMC votes to formally do a release,
>> it is definitely the case that all votes are valuable input and are taken
>> into consideration when deciding to do so.
>>
>>
>>> On Tue, May 30, 2023 at 11:26 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 +1 (binding)

 Tested with  https://github.com/Talend/beam-samples/
 (Java SDK v8/v11/v17, Spark 3.x runner).

 On 27 May 2023, at 19:38, Bruno Volpato via dev 
 wrote:

 I was able to check that containers are all there and complete
 my validation.

 +1 (non-binding).

 Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates (Java
 SDK 11, Dataflow runner).


 Thanks Ritesh and Danny!

 On Fri, May 26, 2023 at 10:09 AM Danny McCormick via dev <
 dev@beam.apache.org> wrote:

> It looks like some Dataflow containers didn't get published, so some
> jobs using the legacy runner (runner v2 disabled) will fail. I kicked off
> the container release, so that should hopefully be available later today.
>
> Thanks,
> Danny
>
> On Thu, May 25, 2023 at 11:19 PM Ritesh Ghorse via dev <
> dev@beam.apache.org> wrote:
>
>> Hi everyone,
>> Please review and vote on the release candidate #2 for the version
>> 2.48.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>> count towards the final vote, but votes from all community members are
>> encouraged and helpful for finding regressions; you can either test your
>> own use cases or use cases from the validation sheet [10].
>>
>> The complete staging area is available for your review, which
>> includes:
>> * GitHub Release notes [1],
>> * the official Apache source release to be deployed to
>> dist.apache.org [2], which is signed with the key with fingerprint
>> E4C74BEC861570F5A3E44E46280A0AC32DBAE62B [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.48.0-RC2" [5],
>> * website pull request listing the release [6], the blog post [6],
>> and publishing the API reference manual [7] (to be generated).
>> * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle JDK
>> 8.0.322.
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2] and PyPI[8].
>> * Go artifacts and documentation are available at pkg.go.dev [9]
>> * Validation sheet with a tab for 2.48.0 release to help with
>> validation [10].
>> * Docker images published to Docker Hub [11].
>> * PR to run tests against release branch [12].
>>
>> The vote will be open for at least 72 hours. It is adopted by
>> majority approval, with at least 3 PMC affirmative votes.
>>
>> For guidelines on how to try the release in your projects, check out
>> our blog post at /blog/validate-beam-release/.
>>
>> *NOTE: Dataflow containers for Python are not finalized yet (likely
>> to happen on Tuesday). I will follow up on this thread once that is done.
>> Feel free to test it on other runners until then. *
>>
>> Thanks,
>> Ritesh Ghorse
>>
>> [1] https://github.com/apache/beam/milestone/12
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.48.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1346/
>> [5] https://github.com/apache/beam/tree/v2.48.0-RC2
>> [6] https://github.com/apache/beam/pull/26903
>> [7] https://github.com/apache/beam-site/pull/645
>> [8] https://pypi.org/project/apache-beam/2.48.0rc2/
>> [9]
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.48.0-RC2/go/pkg/beam
>> 
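
The adoption rule stated in the vote call above ("majority approval, with at least 3 PMC affirmative votes") can be expressed as a small tally check. This is a minimal sketch, assuming the standard Apache definition of majority approval (at least three binding +1 votes and more binding +1 than binding -1 votes). The `Vote` class and `passes` method are hypothetical names used only for illustration; non-binding votes are left out of the formal count, even though, as noted in the thread, they still inform the decision.

```java
import java.util.List;

public class ReleaseVoteTally {
  // One vote on a release candidate: +1, 0, or -1, plus whether it is binding
  // (cast by a PMC member). Hypothetical type, for illustration only.
  public static final class Vote {
    final int value;
    final boolean binding;

    public Vote(int value, boolean binding) {
      this.value = value;
      this.binding = binding;
    }
  }

  // Majority approval: at least three binding +1 votes and more binding +1
  // than binding -1 votes. Non-binding votes do not change the formal result.
  public static boolean passes(List<Vote> votes) {
    long plus = votes.stream().filter(v -> v.binding && v.value > 0).count();
    long minus = votes.stream().filter(v -> v.binding && v.value < 0).count();
    return plus >= 3 && plus > minus;
  }

  public static void main(String[] args) {
    // Three binding +1 votes and one non-binding +1 vote.
    List<Vote> votes = List.of(
        new Vote(1, true), new Vote(1, true), new Vote(1, false), new Vote(1, true));
    System.out.println(passes(votes)); // prints "true"
  }
}
```

With only two binding +1 votes the check fails no matter how many non-binding +1 votes are added, which matches the "at least 3 PMC affirmative votes" requirement.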

Re: Proposal to reduce the steps to make a Java transform portable

2023-05-30 Thread Chamikara Jayalath via dev
Input/output PCollection types at least have to be portable Beam types [1]
for cross-language to work.

I think we restricted schema-aware transforms to PCollection<Row> since Row
was expected to be an efficient replacement for arbitrary portable Beam
types (not sure how true that is in practice currently).

Thanks,
Cham

[1]
https://github.com/apache/beam/blob/b9730952a7abf60437ee85ba2df6dd30556d6560/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L829

On Tue, May 30, 2023 at 1:54 PM Byron Ellis  wrote:

> Is it actually necessary for a PTransform that is configured via the
> Schema mechanism to also be one that uses RowCoder? Those strike me as two
> separate concerns and unnecessarily limiting.
>
> On Tue, May 30, 2023 at 1:29 PM Chamikara Jayalath 
> wrote:
>
>> +1 for the simplification.
>>
>> On Tue, May 30, 2023 at 12:33 PM Robert Bradshaw 
>> wrote:
>>
>>> Yeah. Essentially one needs to do (1) name the arguments and (2) implement
>>> the transform. Hopefully (1) could be done in a concise way that allows for
>>> easy consumption from both Java and cross-language.
>>>
>>
>> +1 but I think the hard part today is to convert existing PTransforms to
>> be schema-aware transform compatible (for example, change input/output
>> types and make sure parameters take Beam Schema compatible types). But this
>> makes sense for new transforms.
>>
>>
>>
>>> On Tue, May 30, 2023 at 12:25 PM Byron Ellis 
>>> wrote:
>>>
>>>> Or perhaps the other way around? If you have a Schema we can
>>>> auto-generate the associated builder on the PTransform? Either way, more
>>>> DRY.
>>>>
>>>> On Tue, May 30, 2023 at 10:59 AM Robert Bradshaw via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> +1 to this simplification; it's a historical artifact that provides no
>>>>> value.
>>>>>
>>>>> I would love it if we also looked into ways to auto-generate the
>>>>> SchemaTransformProvider (e.g. via introspection if a transform takes a
>>>>> small number of arguments, or uses the standard builder pattern...),
>>>>> ideally with something as simple as adding a decorator to the PTransform
>>>>> itself.
>>>>>
>>>>>
>>>>> On Tue, May 30, 2023 at 7:42 AM Ahmed Abualsaud via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> Hey everyone,
>>>>>>
>>>>>> I was looking at how we use SchemaTransforms in our expansion
>>>>>> service. From what I see, there may be a redundant step in developing
>>>>>> SchemaTransforms. Currently, we have 3 pieces:
>>>>>> - SchemaTransformProvider [1]
>>>>>> - A configuration object
>>>>>> - SchemaTransform [2]
>>>>>>
>>>>>> The API is generally used like this:
>>>>>> 1. The SchemaTransformProvider takes a configuration object and
>>>>>> returns a SchemaTransform
>>>>>> 2. The SchemaTransform is used to build a PTransform according to the
>>>>>> configuration
>>>>>>
>>>>>> In these steps, the SchemaTransform class seems unnecessary. We can
>>>>>> combine the two steps if we have SchemaTransformProvider return the
>>>>>> PTransform directly.
>>>>>>
>>>>>> We can then remove the SchemaTransform class as it will be obsolete.
>>>>>>> This should be safe to do; the only place it's used in our API is here [3],
>>>>>>> and that can be simplified if we make this change (we'd just trim
>>>>>>> `.buildTransform()` off the end as `provider.from(configRow)` will
>>>>>>> directly return the PTransform).
>>>>>>
>>>>>> I'd like to first mention that I was not involved in the design
>>>>>>> process of this API so I may be missing some information on why it was set
>>>>>>> up this way.
>>>>>>
>>>>>> A few developers already raised questions about how there's seemingly
>>>>>> unnecessary boilerplate involved in making a Java transform portable. I
>>>>>>> wasn't involved in the design process of this API so I may be missing some
>>>>>>> information, but my assumption is this was designed to follow the patt

Re: Proposal to reduce the steps to make a Java transform portable

2023-05-30 Thread Chamikara Jayalath via dev
+1 for the simplification.

On Tue, May 30, 2023 at 12:33 PM Robert Bradshaw 
wrote:

> Yeah. Essentially one needs to do (1) name the arguments and (2) implement
> the transform. Hopefully (1) could be done in a concise way that allows for
> easy consumption from both Java and cross-language.
>

+1 but I think the hard part today is to convert existing PTransforms to be
schema-aware transform compatible (for example, change input/output types
and make sure parameters take Beam Schema compatible types). But this makes
sense for new transforms.



> On Tue, May 30, 2023 at 12:25 PM Byron Ellis 
> wrote:
>
>> Or perhaps the other way around? If you have a Schema we can
>> auto-generate the associated builder on the PTransform? Either way, more
>> DRY.
>>
>> On Tue, May 30, 2023 at 10:59 AM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 to this simplification; it's a historical artifact that provides no
>>> value.
>>>
>>> I would love it if we also looked into ways to auto-generate the
>>> SchemaTransformProvider (e.g. via introspection if a transform takes a
>>> small number of arguments, or uses the standard builder pattern...),
>>> ideally with something as simple as adding a decorator to the PTransform
>>> itself.
>>>
>>>
>>> On Tue, May 30, 2023 at 7:42 AM Ahmed Abualsaud via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Hey everyone,

 I was looking at how we use SchemaTransforms in our expansion service.
 From what I see, there may be a redundant step in developing
 SchemaTransforms. Currently, we have 3 pieces:
 - SchemaTransformProvider [1]
 - A configuration object
 - SchemaTransform [2]

 The API is generally used like this:
 1. The SchemaTransformProvider takes a configuration object and returns
 a SchemaTransform
 2. The SchemaTransform is used to build a PTransform according to the
 configuration

 In these steps, the SchemaTransform class seems unnecessary. We can
 combine the two steps if we have SchemaTransformProvider return the
 PTransform directly.

 We can then remove the SchemaTransform class as it will be obsolete.
 This should be safe to do; the only place it's used in our API is here [3],
 and that can be simplified if we make this change (we'd just trim
 `.buildTransform()` off the end as `provider.from(configRow)` will
 directly return the PTransform).

 I'd like to first mention that I was not involved in the design process
 of this API so I may be missing some information on why it was set up this
 way.

 A few developers already raised questions about how there's seemingly
 unnecessary boilerplate involved in making a Java transform portable. I
 wasn't involved in the design process of this API so I may be missing some
 information, but my assumption is this was designed to follow the pattern
 of the previous iteration of this API (SchemaIO): SchemaIOProvider [4] ->
 SchemaIO [5] -> PTransform. However, with the newer
 SchemaTransformProvider API, we dropped a few methods and reduced the
 SchemaTransform class to have a generic buildTransform() method. See the
 example of PubsubReadSchemaTransformProvider [6], where the
 SchemaTransform interface and buildTransform method are implemented
 just to satisfy the requirement that SchemaTransformProvider::from
 return a SchemaTransform.

 I'm bringing this up because if we are looking to encourage
 contribution to cross-language use cases, we should make it simpler and
 less convoluted to develop portable transforms.

 There are a number of SchemaTransforms already developed, but applying
 these changes to them should be straightforward. If people think this is a
 good idea, I can open a PR and implement them.

 Best,
 Ahmed

 [1]
 https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.java
 [2]
 https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransform.java
 [3]
 https://github.com/apache/beam/blob/d7ded3f07064919c202c81a2c786910e20a834f9/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceSchemaTransformProvider.java#L138
 [4]
 https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIOProvider.java
 [5]
 https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIO.java
 [6]
 https://github.com/apache/beam/blob/ed1a297904d5f5c743a6aca1a7648e3fb8f02e18/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubReadSchemaTransformProvider.java#L133-L137

>>>
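
The two-step flow described in the proposal (the provider returns a SchemaTransform, which then builds the PTransform) and the proposed one-step flow can be modeled in plain Java. This is a toy sketch, not Beam's API: `ToyTransform`, `ToySchemaTransform`, `CurrentProvider`, `ProposedProvider`, and `prefixer` are hypothetical stand-ins, with a `Map<String, String>` playing the role of the configuration Row and a string function playing the role of a PTransform.

```java
import java.util.Map;
import java.util.function.Function;

public class SchemaTransformSketch {
  // Hypothetical stand-in for a PTransform: maps an input to an output.
  public interface ToyTransform extends Function<String, String> {}

  // Current shape: provider -> intermediate wrapper -> transform.
  public interface ToySchemaTransform {
    ToyTransform buildTransform();
  }

  public interface CurrentProvider {
    ToySchemaTransform from(Map<String, String> config);
  }

  // Proposed shape: the provider returns the transform directly.
  public interface ProposedProvider {
    ToyTransform from(Map<String, String> config);
  }

  // Hypothetical transform configured by a "prefix" entry.
  static ToyTransform prefixer(Map<String, String> config) {
    String prefix = config.getOrDefault("prefix", "");
    return input -> prefix + input;
  }

  // The wrapper adds a hop but no behavior: it only defers the call to prefixer.
  public static final CurrentProvider CURRENT = config -> () -> prefixer(config);
  public static final ProposedProvider PROPOSED = SchemaTransformSketch::prefixer;

  public static void main(String[] args) {
    Map<String, String> config = Map.of("prefix", "beam-");
    String viaWrapper = CURRENT.from(config).buildTransform().apply("row");
    String direct = PROPOSED.from(config).apply("row");
    System.out.println(viaWrapper + " " + direct); // prints "beam-row beam-row"
  }
}
```

Both paths yield the same transform; the proposal removes only the intermediate `buildTransform()` hop, which is why the expansion-service call site mentioned in the proposal would only need to drop that call.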


Re: [Release-2.48.0] Uploading images to dockerhub

2023-05-24 Thread Chamikara Jayalath via dev
Thanks Danny!

On Wed, May 24, 2023 at 9:04 AM Danny McCormick 
wrote:

> I pushed the missing containers as well, so we should be good to go.
>
> I also created a cherry pick PR, that only needs to be merged if we make
> it to RC2 though (hopefully not!) -
> https://github.com/apache/beam/pull/26871
>
> Thanks,
> Danny
>
> On Wed, May 24, 2023 at 11:48 AM Danny McCormick <
> dannymccorm...@google.com> wrote:
>
>> https://github.com/apache/beam/pull/26869 should fix
>>
>> On Wed, May 24, 2023 at 11:45 AM Chamikara Jayalath 
>> wrote:
>>
>>>
>>>
>>> On Wed, May 24, 2023 at 8:41 AM Danny McCormick <
>>> dannymccorm...@google.com> wrote:
>>>
>>>> This has finished. One thing I did notice: we requested that Infra
>>>> create repos for some transform service docker images (context
>>>> <https://issues.apache.org/jira/browse/INFRA-24629>), but those didn't
>>>> seem to get published. @Chamikara Jayalath  should
>>>> they be added to
>>>> https://github.com/apache/beam/blob/dcf0c0f88054e149a34bb39510bdcab0581da982/build.gradle.kts#L595
>>>> or are they not getting published yet?
>>>>
>>>
>>> Ah, yes.  I'll update that target.
>>>
>>>>
>>>> We can manually publish them this time if needed. I also don't think
>>>> this should block us from getting an RC out for validation though since it
>>>> just breaks a small set of use cases which we can validate once the
>>>> containers are pushed if needed.
>>>>
>>>
>>> It's great if you can manually push. Agree that this should not be
>>> a blocker. Repositories were already created by INFRA.
>>>
>>> Thanks,
>>> Cham
>>>
>>>
>>>> Thanks,
>>>> Danny
>>>>
>>>> On Wed, May 24, 2023 at 9:35 AM Danny McCormick <
>>>> dannymccorm...@google.com> wrote:
>>>>
>>>>> I'm currently in the process of publishing the containers, I will let
>>>>> you know when it completes.
>>>>>
>>>>> Thanks,
>>>>> Danny
>>>>>
>>>>> On Tue, May 23, 2023 at 8:09 PM Ritesh Ghorse via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> Hey everyone,
>>>>>>
>>>>>> I'm at the stage of pushing docker containers to the apache
>>>>>> repository of [Dockerhub](
>>>>>> https://hub.docker.com/search?q=apache%2Fbeam=image). Since I'm
>>>>>> not a part of the `beammaintainers` group, I'm getting permission denied.
>>>>>>
>>>>>> For the people in the `beammaintainers` group, please let me know how
>>>>>> to proceed with this one.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>


Re: [Release-2.48.0] Uploading images to dockerhub

2023-05-24 Thread Chamikara Jayalath via dev
On Wed, May 24, 2023 at 8:41 AM Danny McCormick 
wrote:

> This has finished. One thing I did notice: we requested that Infra create
> repos for some transform service docker images (context
> <https://issues.apache.org/jira/browse/INFRA-24629>), but those didn't
> seem to get published. @Chamikara Jayalath  should
> they be added to
> https://github.com/apache/beam/blob/dcf0c0f88054e149a34bb39510bdcab0581da982/build.gradle.kts#L595
> or are they not getting published yet?
>

Ah, yes.  I'll update that target.

>
> We can manually publish them this time if needed. I also don't think this
> should block us from getting an RC out for validation though since it just
> breaks a small set of use cases which we can validate once the containers
> are pushed if needed.
>

It's great if you can manually push. Agree that this should not be
a blocker. Repositories were already created by INFRA.

Thanks,
Cham


> Thanks,
> Danny
>
> On Wed, May 24, 2023 at 9:35 AM Danny McCormick 
> wrote:
>
>> I'm currently in the process of publishing the containers, I will let you
>> know when it completes.
>>
>> Thanks,
>> Danny
>>
>> On Tue, May 23, 2023 at 8:09 PM Ritesh Ghorse via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hey everyone,
>>>
>>> I'm at the stage of pushing docker containers to the apache repository
>>> of [Dockerhub](https://hub.docker.com/search?q=apache%2Fbeam=image).
>>> Since I'm not a part of the `beammaintainers` group, I'm getting permission
>>> denied.
>>>
>>> For the people in the `beammaintainers` group, please let me know how to
>>> proceed with this one.
>>>
>>> Thanks!
>>>
>>


Re: [VOTE] Release 2.47.0, release candidate #3

2023-05-09 Thread Chamikara Jayalath via dev
Verified that the new containers are valid. Changing my vote to +1.

Thanks for fixing this Jack.

- Cham

On Mon, May 8, 2023 at 2:05 PM Jack McCluskey 
wrote:

> I've spent the day putting together an environment on a Debian Bullseye
> container to re-build containers with a matching glibc version. The Java,
> Go, Python, and Typescript containers have all been re-built and pushed to
> Docker Hub. The underlying code did not change, which fortunately means we
> can dodge having to build an RC4 to fix this issue.
>
> The GCR copy of the Go container has already been updated, while the Java
> and Python containers are currently being copied over.
>
> On Mon, May 8, 2023 at 11:16 AM Robert Bradshaw 
> wrote:
>
>> Thanks for catching this. This does seem severe enough that we need to
>> fix it before the release.
>>
>> On Sat, May 6, 2023 at 10:15 PM Chamikara Jayalath via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Seems like Python SDK harness containers built for the current RC are
>>> broken. Please see https://github.com/apache/beam/issues/26576 for
>>> updates.
>>>
>>> -1 for the current vote due to this.
>>>
>>> Seems like this can be addressed by reverting
>>> https://github.com/apache/beam/pull/26054 and re-building the
>>> containers.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Sat, May 6, 2023 at 8:00 AM Svetak Sundhar 
>>> wrote:
>>>
>>>> +1 (Non-Binding)
>>>>
>>>> I tested Python Quick Start on Dataflow Runner as well
>>>>
>>>>
>>>>
>>>> Svetak Sundhar
>>>>
>>>>   Technical Solutions Engineer, Data
>>>> svetaksund...@google.com
>>>>
>>>>
>>>>
>>>> On Sat, May 6, 2023 at 4:44 AM Chamikara Jayalath via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> I'm seeing a regression when running Java x-lang jobs using the RC.
>>>>> Created https://github.com/apache/beam/issues/26576.
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>> On Fri, May 5, 2023 at 11:11 PM Austin Bennett 
>>>>> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> On Fri, May 5, 2023 at 10:49 PM Jean-Baptiste Onofré 
>>>>>> wrote:
>>>>>>
>>>>>>> +1 (binding)
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On Fri, May 5, 2023 at 4:52 AM Jack McCluskey via dev <
>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> Please review and vote on the release candidate #3 for the version
>>>>>>>> 2.47.0, as follows:
>>>>>>>> [ ] +1, Approve the release
>>>>>>>> [ ] -1, Do not approve the release (please provide specific
>>>>>>>> comments)
>>>>>>>>
>>>>>>>> Reviewers are encouraged to test their own use cases with the
>>>>>>>> release candidate, and vote +1 if no issues are found. *Non-PMC
>>>>>>>> members are allowed and encouraged to vote. Please help validate the
>>>>>>>> release for your use case!*
>>>>>>>>
>>>>>>>> The complete staging area is available for your review, which
>>>>>>>> includes:
>>>>>>>> * GitHub Release notes [1],
>>>>>>>> * the official Apache source release to be deployed to
>>>>>>>> dist.apache.org [2], which is signed with the key with fingerprint
>>>>>>>> DF3CBA4F3F4199F4 [3],
>>>>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>>>>> * source code tag "v2.47.0-RC3" [5],
>>>>>>>> * website pull request listing the release [6], the blog post [6],
>>>>>>>> and publishing the API reference manual [7].
>>>>>>>> * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle
>>>>>>>> JDK 8.0.322.
>>>>>>>> * Python artifacts are deployed along with the source release to
>>>>>>>> the dist.apache.org [2] and PyPI[8].

Re: [VOTE] Release 2.47.0, release candidate #3

2023-05-06 Thread Chamikara Jayalath via dev
Seems like Python SDK harness containers built for the current RC are
broken. Please see https://github.com/apache/beam/issues/26576 for updates.

-1 for the current vote due to this.

Seems like this can be addressed by reverting
https://github.com/apache/beam/pull/26054 and re-building the containers.

Thanks,
Cham

On Sat, May 6, 2023 at 8:00 AM Svetak Sundhar 
wrote:

> +1 (Non-Binding)
>
> I tested Python Quick Start on Dataflow Runner as well
>
>
>
> Svetak Sundhar
>
>   Technical Solutions Engineer, Data
> svetaksund...@google.com
>
>
>
> On Sat, May 6, 2023 at 4:44 AM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>> I'm seeing a regression when running Java x-lang jobs using the RC.
>> Created https://github.com/apache/beam/issues/26576.
>>
>> Thanks,
>> Cham
>>
>> On Fri, May 5, 2023 at 11:11 PM Austin Bennett  wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Fri, May 5, 2023 at 10:49 PM Jean-Baptiste Onofré 
>>> wrote:
>>>
>>>> +1 (binding)
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On Fri, May 5, 2023 at 4:52 AM Jack McCluskey via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Please review and vote on the release candidate #3 for the version
>>>>> 2.47.0, as follows:
>>>>> [ ] +1, Approve the release
>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>
>>>>> Reviewers are encouraged to test their own use cases with the release
>>>>> candidate, and vote +1 if no issues are found. *Non-PMC members are
>>>>> allowed and encouraged to vote. Please help validate the release for your
>>>>> use case!*
>>>>>
>>>>> The complete staging area is available for your review, which includes:
>>>>> * GitHub Release notes [1],
>>>>> * the official Apache source release to be deployed to dist.apache.org [2],
>>>>> which is signed with the key with fingerprint DF3CBA4F3F4199F4 [3],
>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>> * source code tag "v2.47.0-RC3" [5],
>>>>> * website pull request listing the release [6], the blog post [6], and
>>>>> publishing the API reference manual [7].
>>>>> * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle JDK
>>>>> 8.0.322.
>>>>> * Python artifacts are deployed along with the source release to the
>>>>> dist.apache.org [2] and PyPI[8].
>>>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>>>> * Validation sheet with a tab for 2.47.0 release to help with
>>>>> validation [10].
>>>>> * Docker images published to Docker Hub [11].
>>>>> * PR to run tests against release branch [12].
>>>>>
>>>>> The vote will be open for at least 72 hours. It is adopted by majority
>>>>> approval, with at least 3 PMC affirmative votes.
>>>>>
>>>>> The GCR copies of the FnAPI containers are rolling out now; they
>>>>> should be out within the next 8 hours or so.
>>>>>
>>>>> For guidelines on how to try the release in your projects, check out
>>>>> our blog post at /blog/validate-beam-release/.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jack McCluskey
>>>>>
>>>>> [1] https://github.com/apache/beam/milestone/10
>>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.47.0/
>>>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>>> [4]
>>>>> https://repository.apache.org/content/repositories/orgapachebeam-1322/
>>>>> [5] https://github.com/apache/beam/tree/v2.47.0-RC3
>>>>> [6] https://github.com/apache/beam/pull/26439
>>>>> [7] https://github.com/apache/beam-site/pull/644
>>>>> [8] https://pypi.org/project/apache-beam/2.47.0rc3/
>>>>> [9]
>>>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.47.0-RC3/go/pkg/beam
>>>>> [10]
>>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=...
>>>>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>>>> [12] https://github.com/apache/beam/pull/26152
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> Jack McCluskey
>>>>> SWE - DataPLS PLAT/ Dataflow ML
>>>>> RDU
>>>>> jrmcclus...@google.com
>>>>>
>>>>>
>>>>>


Re: [VOTE] Release 2.47.0, release candidate #3

2023-05-06 Thread Chamikara Jayalath via dev
I'm seeing a regression when running Java x-lang jobs using the RC. Created
https://github.com/apache/beam/issues/26576.

Thanks,
Cham

On Fri, May 5, 2023 at 11:11 PM Austin Bennett  wrote:

> +1 (non-binding)
>
> On Fri, May 5, 2023 at 10:49 PM Jean-Baptiste Onofré 
> wrote:
>
>> +1 (binding)
>>
>> Regards
>> JB
>>
>> On Fri, May 5, 2023 at 4:52 AM Jack McCluskey via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi everyone,
>>>
>>> Please review and vote on the release candidate #3 for the version
>>> 2.47.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>> Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if no issues are found. *Non-PMC members are
>>> allowed and encouraged to vote. Please help validate the release for your
>>> use case!*
>>>
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org [2],
>>> which is signed with the key with fingerprint DF3CBA4F3F4199F4 [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.47.0-RC3" [5],
>>> * website pull request listing the release [6], the blog post [6], and
>>> publishing the API reference manual [7].
>>> * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle JDK
>>> 8.0.322.
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI[8].
>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>> * Validation sheet with a tab for 2.47.0 release to help with validation
>>> [10].
>>> * Docker images published to Docker Hub [11].
>>> * PR to run tests against release branch [12].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> The GCR copies of the FnAPI containers are rolling out now; they should
>>> be out within the next 8 hours or so.
>>>
>>> For guidelines on how to try the release in your projects, check out our
>>> blog post at /blog/validate-beam-release/.
>>>
>>> Thanks,
>>>
>>> Jack McCluskey
>>>
>>> [1] https://github.com/apache/beam/milestone/10
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.47.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1322/
>>> [5] https://github.com/apache/beam/tree/v2.47.0-RC3
>>> [6] https://github.com/apache/beam/pull/26439
>>> [7] https://github.com/apache/beam-site/pull/644
>>> [8] https://pypi.org/project/apache-beam/2.47.0rc3/
>>> [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.47.0-RC3/go/pkg/beam
>>> [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=...
>>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>> [12] https://github.com/apache/beam/pull/26152
>>>
>>> --
>>>
>>>
>>> Jack McCluskey
>>> SWE - DataPLS PLAT/ Dataflow ML
>>> RDU
>>> jrmcclus...@google.com
>>>
>>>
>>>


Re: [VOTE] Release 2.47.0, release candidate #1

2023-04-27 Thread Chamikara Jayalath via dev
I tried to run a Java multi-lang pipeline and it's failing due to the
following error during worker setup.

Error syncing pod, skipping" err="failed to \"StartContainer\" for
\"sdk-1-0\" with ImagePullBackOff: \"Back-off pulling image \\\"
gcr.io/cloud-dataflow/v1beta3/beam_python3.8_sdk:2.47.0\\\"\""
pod="default/df-runinferenceexample-chami-04271607-gwf8-harness-vj8w"
podUID=37d8de0a068391920b98dce559c4886f

Are these containers not yet available for testing on Dataflow?

Thanks,
Cham

On Thu, Apr 27, 2023 at 2:17 PM Robert Bradshaw via dev 
wrote:

> The artifacts and signatures all look good, and I validated a couple of
> Python pipelines in a fresh install.
>
> Assuming all the tests (including the Dataflow ones) pass (modulo the two
> mentioned above; seems a fair justification to not block on those) I'm +1
> (binding) on this release.
>
> On Wed, Apr 26, 2023 at 12:39 PM Jack McCluskey via dev <
> dev@beam.apache.org> wrote:
>
>> There's also a good chance that newer test suites haven't been included
>> in mass_comment.py (
>> https://github.com/apache/beam/blob/master/release/src/main/scripts/mass_comment.py)
>> and as a result they were not executed.
>>
>> On Wed, Apr 26, 2023 at 3:29 PM Jack McCluskey 
>> wrote:
>>
>>> The Dataflow CrossLanguageValidatesRunner GoUsingJava tests have been
>>> broken for quite some time (https://github.com/apache/beam/issues/21645),
>>> and the Kafka issue is tied to a test timeout that John Casey has fixed but
>>> that didn't get cherry-picked. It just fell through the cracks while
>>> waiting on tests to pass, and conversations with them led to the conclusion
>>> that we would just get it into an RC2 if necessary, since it affects how the
>>> tests run, not how the code under test functions.
>>>
>>> The tests still marked "pending" passed but did not get updated on the
>>> GitHub side back when Jenkins was straining under load. I'm guessing those
>>> builds have since been deleted under our new retention policy to
>>> alleviate the Jenkins OOM issues. I will try to re-run those for the sake
>>> of having clear and obvious results.
>>>
>>> On Wed, Apr 26, 2023 at 3:23 PM Valentyn Tymofieiev 
>>> wrote:
>>>
 Thanks, Jack!

 re [12]:

 I am seeing some test errors - have they been investigated?
 Also, did all test suites run? I think I am not seeing output of some
 of the suites, like

 Run Python Dataflow V2 ValidatesRunner



 On Wed, Apr 26, 2023 at 9:14 PM Jack McCluskey via dev <
 dev@beam.apache.org> wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version
> 2.47.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found.
>
> The complete staging area is available for your review, which includes:
> * GitHub Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint DF3CBA4F3F4199F4 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.47.0-RC1" [5],
> * website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle JDK
> 8.0.322.
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI[8].
> * Go artifacts and documentation are available at pkg.go.dev [9]
> * Validation sheet with a tab for 2.47.0 release to help with
> validation [10].
> * Docker images published to Docker Hub [11].
> * PR to run tests against release branch [12].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
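
The adoption rule above can be written as a small predicate. This is a minimal sketch following the ASF definition of majority approval (at least three binding +1 votes and more binding +1 than binding -1 votes); the `Vote` type and the voter names are hypothetical, and the 72-hour minimum voting window is not modeled.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True for PMC members


def majority_approved(votes: Iterable[Vote]) -> bool:
    """ASF majority approval: >= 3 binding +1 votes and more binding +1s than -1s."""
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    return plus >= 3 and plus > minus


votes = [
    Vote("pmc-a", +1, binding=True),
    Vote("pmc-b", +1, binding=True),
    Vote("contributor-c", +1, binding=False),  # encouraged, but not counted
]
print(majority_approved(votes))  # prints False: only two binding +1 votes
```

Non-binding votes are kept in the input because they are still useful for finding regressions; they simply do not affect the outcome.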
>
> For guidelines on how to try the release in your projects, check out
> our blog post at /blog/validate-beam-release/.
>
> *Note: Dataflow containers for Java are still being finalized. I will
> follow up once that is completed; however, this should not block
> validation for other SDKs and runners.*
>
> Thanks,
>
> Jack McCluskey
>
> [1] https://github.com/apache/beam/milestone/10
> [2] https://dist.apache.org/repos/dist/dev/beam/2.47.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1309/
> [5] https://github.com/apache/beam/tree/v2.47.0-RC1
> [6] https://github.com/apache/beam/pull/26439
> [7] https://github.com/apache/beam-site/pull/644
> [8] https://pypi.org/project/apache-beam/2.47.0rc1/
> [9]
> 

Re: [ANNOUNCE] New committer: Damon Douglas

2023-04-24 Thread Chamikara Jayalath via dev
Congrats Damon!

On Mon, Apr 24, 2023 at 1:03 PM Ahmet Altay via dev 
wrote:

> Congratulations Damon!
>
> On Mon, Apr 24, 2023 at 1:00 PM Robert Burke  wrote:
>
>> Congratulations Damon!!!
>>
>> On Mon, Apr 24, 2023, 12:52 PM Kenneth Knowles  wrote:
>>
>>> Hi all,
>>>
>>> Please join me and the rest of the Beam PMC in welcoming a new
>>> committer: Damon Douglas (damondoug...@apache.org)
>>>
>>> Damon has contributed widely: Beam Katas, playground, infrastructure,
>>> and many IO connectors. Damon does lots of code review in addition to code.
>>> (yes, you can review code as a non-committer!)
>>>
>>> Considering their contributions to the project over this timeframe, the
>>> Beam PMC trusts Damon with the responsibilities of a Beam committer. [1]
>>>
>>> Thank you Damon! And we are looking forward to seeing more of your
>>> contributions!
>>>
>>> Kenn, on behalf of the Apache Beam PMC
>>>
>>> [1]
>>>
>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>
>>


Re: Testing a pipeline with external transform

2023-04-24 Thread Chamikara Jayalath via dev
On Mon, Apr 24, 2023 at 11:06 AM Sahith Nallapareddy 
wrote:

> Hello,
>
> Ah, I missed that, thank you! I am assuming that for Java jobs with a Java
> external transform, we run a Java job service?
>

No, the job service is tied to the runner, not to the pipeline SDK. The Java
quickstart guide gives instructions for starting a job service for the
Python portable runner and using it to run Java jobs.

- Cham
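
Because the job service belongs to the runner, a Java pipeline can be submitted to the Python portable runner's job service. A minimal sketch of the two pieces, assuming the module path and flags used in the Java multi-language quickstart [1] (verify them against that guide); port 8099 is an arbitrary choice and the helper names are hypothetical:

```python
import shlex
import sys
from typing import List

JOB_PORT = 8099  # arbitrary local port for this sketch


def python_job_service_command(port: int = JOB_PORT) -> List[str]:
    """Command that starts the Python SDK's local job service (portable runner)."""
    return [
        sys.executable, "-m",
        "apache_beam.runners.portability.local_job_service_main",
        "-p", str(port),
    ]


def java_portable_runner_args(port: int = JOB_PORT) -> List[str]:
    """Options a Java pipeline passes to submit itself to that job service."""
    return ["--runner=PortableRunner", f"--jobEndpoint=localhost:{port}"]


if __name__ == "__main__":
    # Start the first command in one terminal, then run the Java pipeline
    # with the printed options in another.
    print(shlex.join(python_job_service_command()))
    print(" ".join(java_portable_runner_args()))
```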



>
> Thanks,
>
> Sahith
>
> On Mon, Apr 24, 2023 at 2:01 PM Chamikara Jayalath 
> wrote:
>
>> Have you looked into the quickstart guides (Java [1], Python [2])? They also
>> give instructions for running with the DirectRunner.
>>
>> Thanks,
>> Cham
>>
>> [1]
>> https://beam.apache.org/documentation/sdks/java-multi-language-pipelines/
>> [2]
>> https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/
>>
>> On Mon, Apr 24, 2023 at 10:57 AM Sahith Nallapareddy via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hello,
>>>
>>> I was wondering what the best way is to test a pipeline with an external
>>> transform. We were trying to use the DirectRunner, but it seemed to hang on
>>> the external transform step. Could someone point me to where in the Beam
>>> code this is done, or give some pointers on how to test a pipeline with an
>>> External step in it? I tried searching a bit but was unsuccessful; maybe I
>>> was looking in the wrong place.
>>>
>>> Thanks,
>>>
>>> Sahith
>>>
>>


Re: Testing a pipeline with external transform

2023-04-24 Thread Chamikara Jayalath via dev
Have you looked into the quickstart guides (Java [1], Python [2])? They also
give instructions for running with the DirectRunner.

Thanks,
Cham

[1]
https://beam.apache.org/documentation/sdks/java-multi-language-pipelines/
[2]
https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/

On Mon, Apr 24, 2023 at 10:57 AM Sahith Nallapareddy via dev <
dev@beam.apache.org> wrote:

> Hello,
>
> I was wondering what the best way is to test a pipeline with an external
> transform. We were trying to use the DirectRunner, but it seemed to hang on
> the external transform step. Could someone point me to where in the Beam
> code this is done, or give some pointers on how to test a pipeline with an
> External step in it? I tried searching a bit but was unsuccessful; maybe I
> was looking in the wrong place.
>
> Thanks,
>
> Sahith
>


[ANNOUNCE] New committer: Anand Inguva

2023-04-21 Thread Chamikara Jayalath
Hi all,

Please join me and the rest of the Beam PMC in welcoming a new committer: Anand
Inguva (ananding...@apache.org)

Anand has been contributing to Apache Beam for more than a year and
authored and reviewed more than 100 PRs. Anand has been a core contributor
to Beam Python SDK and drove the efforts to support Python 3.10 and Python
3.11.

Considering their contributions to the project over this timeframe, the
Beam PMC trusts Anand with the responsibilities of a Beam committer. [1]

Thank you Anand! And we are looking forward to seeing more of your
contributions!

Cham, on behalf of the Apache Beam PMC

[1]
https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer


Re: [VOTE] Vendored Dependencies Release

2023-04-17 Thread Chamikara Jayalath via dev
+1

Thanks,
Cham

On Mon, Apr 17, 2023 at 11:04 AM Kenneth Knowles  wrote:

> +1
>
> On Fri, Apr 14, 2023 at 1:30 PM Yi Hu via dev  wrote:
>
>> Please review the release of the following artifacts that we vendor:
>>
>>  * beam-vendor-grpc-1_54_0
>>
>>
>>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #1 for the version 0.1,
>> as follows:
>>
>> [ ] +1, Approve the release
>>
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> The complete staging area is available for your review, which includes:
>>
>> * the official Apache source release to be deployed to dist.apache.org
>> [1], which is signed with the key with fingerprint
>> 2011EC936303D9A1DB662EE1CB6974C8170405CB [2],
>>
>> * all artifacts to be deployed to the Maven Central Repository [3],
>>
>> * commit hash "a38d9b94a738e4c488e7339ae3710fd5e1dc119e" [4],
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>>
>> Release Manager
>>
>> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
>>
>> [2] https://dist.apache.org/repos/dist/dev/beam/KEYS
>>
>> [3]
>> https://repository.apache.org/content/repositories/orgapachebeam-1308/
>>
>> [4]
>> https://github.com/apache/beam/commit/a38d9b94a738e4c488e7339ae3710fd5e1dc119e
>>
>>
>> --
>>
>> Yi Hu, (he/him/his)
>>
>> Software Engineer
>>
>> 919-641-8436 <(919)%20641-8436>
>>
>>


Re: [DISCUSS] @Experimental, @Internal, @Stable, etc annotations

2023-04-14 Thread Chamikara Jayalath via dev
I think we've been using the Java Experimental tags in two ways.

* New APIs
* Any APIs that use specific features identified by the predefined
experimental Kind types in [1] (for example, I/O connector APIs that use
Beam Schemas).

Removing the experimental tag has the effect of finalizing a number of APIs
we've been reluctant to call stable (for example, Beam Schemas, portability,
and metrics-related APIs). These APIs have been around for a long time and I
don't see them changing, so this is probably the right thing to do. But I
just wanted to call it out.

Thanks,
Cham

[1]
https://github.com/apache/beam/blob/b9f27f9da2e63b564feecaeb593d7b12783192b0/sdks/java/core/src/main/java/org/apache/beam/sdk/annotations/Experimental.java#L48

On Fri, Apr 14, 2023 at 1:26 PM Ahmet Altay via dev 
wrote:

>
>
> On Fri, Apr 14, 2023 at 1:15 PM Kenneth Knowles  wrote:
>
>>
>> Thanks for the discussion. Many good points. Probably just removing all
>> the annotations is a no-op for users, and will solve the "afraid to use
>> experimental features" problem.
>>
>> Regarding stability, the capabilities of Java (and Python is much much
>> worse) make it infeasible to produce quality software with the rule "once
>> it is public it is frozen forever". But on the other hand, there isn't much
>> of a practical alternative. Most projects just make breaking changes at
>> minor releases quite often, in my experience. I don't want to follow that
>> pattern, for sure.
>>
>> Regarding Danny's comment of not seeing this culture - check out any of
>> our more mature IOs, which all have very high cyclomatic complexity due to
>> never being significantly refactored. Adhering to in-place state
>> compatibility for update instead of focusing on blue/green deployment is
>> also a culprit here. I don't have examples in mind, but the point about the
>> culture of stagnation came from my recent experiences as code
>> reviewer where there was some idea that we couldn't change things even when
>> they were plainly wrong and the change was plainly a fix.
>>
>> Often, it comes from corners like triggered side inputs where we simply
>> never had a clear concept and so bringing things into alignment with a spec
>> will break someone, by necessity. To be clear: I have not received pushback
>> on that one (yet). Some other examples are
>> https://s.apache.org/finishing-triggers-drop-data (breaking change
>> necessary to eliminate data loss risk)
>> https://github.com/apache/beam/issues/20528 (fix was too slow because we
>> were hesitant to commit a breaking fix)
>> https://github.com/apache/beam/pull/8134#pullrequestreview-218592801
>> (left unsafe API in place, applied doc-only fix).
>>
>> But indeed, of all the issues I raised, the customer concern with
>> `@Experimental` was the most important. We have had a few threads about it
>> in the past, too, and it hasn't gotten better.
>>
>>  1. It does not have the intended effect (making users OK with evolving
>> APIs and behavior to allow us to reach a high level of quality)
>>  2. It has an unintended effect (making users afraid to use things which
>> they should be happy to use)
>>  3. We don't use it consistently (many less-safe things are not
>> experimental, many totally stable things are experimental)
>>
>> Because of 3, if we don't have a feasible way to move to
>> "evolving/unstable by default" in a way that users know and are OK with,
>> then 1 is impossible. And so the only way to fix 2 is to just eliminate the
>> annotation approach entirely and go with language conventions.
>>
>
> +1 to eliminating @Experimental as a Beam-level annotation. That is the
> simplest approach that will get us to a consistent state, and it will align
> our goals and intentions with users'.
>
>
>>
>> Kenn
>>
>> On Wed, Apr 12, 2023 at 5:10 PM Ahmet Altay via dev 
>> wrote:
>>
>>> I agree with Alexey and Byron.
>>> 1. We do not have any concrete evidence of our users paying attention to
>>> any of those annotations. Experimental APIs that have been in that state
>>> for a long while are good examples. A possible exception is the deprecated
>>> annotation. My preference would be to simplify annotations to nothing
>>> (stable enough for use and will evolve in a backward-compatible way), and
>>> maybe deprecated annotations.
>>> 2. If you all think that the Experimental annotation is needed, Byron's
>>> suggestion (more or less what we do today) but with some concrete life
>>> cycle definitions of those annotations would be useful to our users. (An
>>> example could be: experimental APIs either need to graduate or be removed
>>> in X releases.)
>>>
>>>
>>>
>>> On Tue, Apr 4, 2023 at 9:01 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 Great and long-awaited topic to discuss.

 My personal opinion based on what I saw on different open-source
 projects is that all such annotations, like @Experimental or @Stable, are
 not useful over time, and are even rather useless and misleading. What
 

Re: A user-deployable Beam Transform Service

2023-03-28 Thread Chamikara Jayalath via dev
Hi All,

I've developed a version of this service using Docker Compose and it's
available here: https://github.com/apache/beam/pull/26023

Currently it consists of a controller container and a single expansion
service container (Java) but I hope to add a Python expansion service
container to this as well.

This can be used to easily start a service that hosts Beam transforms.

Once started, this service can be used by as many pipelines as needed to
expand/discover portable transforms available in Beam.

Specifically, for multi-language pipelines, this has the following benefits.

* No need to install runtimes for other languages when running pipelines in
a given language.
* No need to download external artifacts (for example, shaded expansion
service jar files). They will be served using local artifacts included in
the container. Also, in the future we can modify/optimize the set of
dependencies without updating the wrappers.
* No need to deal with multiple endpoints. A single endpoint serves all
expansion services.

I propose adding this to Beam and updating multi-language wrappers to use
this service when Docker is available in the system.

Please let me know if you have any comments or questions.

Thanks,
Cham
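
The proposed wrapper behavior ("use this service when Docker is available") boils down to an endpoint choice. A minimal sketch; the function name, the placeholder endpoint, and the way Docker is detected (looking for the `docker` binary on `PATH`, which does not prove the daemon is running) are all assumptions, not what the wrappers actually do:

```python
import shutil
from typing import Callable, Optional

# Placeholder address for the Docker Compose based transform service; the
# port is hypothetical, not a Beam default.
TRANSFORM_SERVICE_ENDPOINT = "localhost:12345"


def choose_expansion_endpoint(
    local_expansion_service: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Prefer the single transform-service endpoint when Docker looks installed."""
    if which("docker") is not None:
        # Docker is on PATH, so the controller and expansion-service
        # containers could be started with Docker Compose.
        return TRANSFORM_SERVICE_ENDPOINT
    # Otherwise keep today's behavior: a per-language expansion service.
    return local_expansion_service


print(choose_expansion_endpoint("localhost:8097"))
```

Injecting `which` keeps the decision testable without Docker installed.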

On Fri, Feb 10, 2023 at 4:00 PM Luke Cwik  wrote:

> Seems like a useful thing to me and will make it easier for Beam users
> overall.
>
> On Fri, Feb 10, 2023 at 3:56 PM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> Thanks. I added some comments to the doc.
>>
>> On Mon, Feb 6, 2023 at 1:33 PM Chamikara Jayalath via dev
>>  wrote:
>> >
>> > Hi All,
>> >
>> > Beam PTransforms are currently primarily identified as operations in a
>> pipeline that perform specific tasks. PTransform implementations were
>> traditionally linked to specific Beam SDKs.
>> >
>> > With the advent of portability framework, multi-language pipelines, and
>> expansion services that can be used to build/expand and discover
>> transforms, we have an opportunity to make this more general and
>> re-introduce Beam PTransforms as computation units that can serve any
>> use-case that needs to discover or use Beam transforms. For example, any
>> Beam SDK that runs a pipeline using a portable Beam runner should be able
>> to use a transform offered through an expansion service irrespective of the
>> implementation SDK of the transform or the pipeline.
>> >
>> > I believe we can make such use-cases much easier to manage by
>> introducing a user-deployable service that encapsulates existing Beam
>> expansion services in the form of a Kubernetes cluster. The service will
>> offer a single gRPC endpoint and will include Beam expansion services
>> developed in different languages. Any Beam pipeline, irrespective of the
>> pipeline SDK, should be able to use any transform offered by the service.
>> >
>> > This will also offer a way to make multi-language pipeline execution,
>> which currently relies on locally downloaded large dependencies and locally
>> started expansion service processes, more robust.
>> >
>> > I have written a proposal for implementing such a service and it's
>> available at https://s.apache.org/beam-transform-service.
>> >
>> > Please take a look and let me know if you have any comments or
>> questions.
>> >
>> > Thanks,
>> > Cham
>>
>


Re: Proposal: Structured Logging

2023-03-08 Thread Chamikara Jayalath via dev
Thanks for the proposal. Added some comments.

- Cham

On Wed, Mar 8, 2023 at 1:59 PM Udi Meiri  wrote:

> Hi all,
> I have written a proposal for Structured Logging in Beam:
> https://s.apache.org/beam-structured-logging
>
> Please LMK what you think. Any comments welcome here or in the doc.
>
> - Udi
>


Re: [VOTE] Release 2.46.0, release candidate #1

2023-03-04 Thread Chamikara Jayalath via dev
+1 (binding)

Validated multi-language Java and Python pipelines.

On Fri, Mar 3, 2023 at 1:59 PM Danny McCormick via dev 
wrote:

> > I have encountered a failure in a Python pipeline running with Runner
> v1:
>
> > RuntimeError: Beam SDK base version 2.46.0 does not match Dataflow
> Python worker version 2.45.0. Please check Dataflow worker startup logs and
> make sure that correct version of Beam SDK is installed.
>
> > We should understand why Python ValidatesRunner tests (which have
> passed) didn't catch this error.
>
> > This can be remediated in Dataflow containers without changes to the
> release candidate.
>
> Good catch! I've kicked off a release to fix this; it should be done later
> this evening. I won't be available when it completes, but I would expect
> it to be around 5:00 PST.
>
> On Fri, Mar 3, 2023 at 3:49 PM Danny McCormick 
> wrote:
>
>> Hey Reuven, could you provide some more context on the bug/why it is
>> important? Does it meet the standard in
>> https://beam.apache.org/contribute/release-guide/#7-triage-release-blocking-issues-in-github
>> ?
>>
>> The release branch was cut last Wednesday, so that is why it is not
>> included.
>>
>
It seems like this was a revert of a previous commit that was also not
included in the 2.46.0 release branch
(https://github.com/apache/beam/pull/25627)?

If so, we might not need a new RC, but it would be good to confirm.

Thanks,
Cham


>> On Fri, Mar 3, 2023 at 3:24 PM Reuven Lax  wrote:
>>
>>> If possible, I would like to see if we could include
>>> https://github.com/apache/beam/pull/25642 as we believe this bug has
>>> been impacting multiple users. This was merged 4 days ago, but this RC cut
>>> does not seem to include it.
>>>
>>> On Fri, Mar 3, 2023 at 12:18 PM Valentyn Tymofieiev via dev <
>>> dev@beam.apache.org> wrote:
>>>
 I have encountered a failure in a Python pipeline running with Runner
 v1:

 RuntimeError: Beam SDK base version 2.46.0 does not match Dataflow
 Python worker version 2.45.0. Please check Dataflow worker startup logs and
 make sure that correct version of Beam SDK is installed.

 We should understand why Python ValidatesRunner tests (which have
 passed) didn't catch this error.

 This can be remediated in Dataflow containers without changes to the
 release candidate.
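
The error compares the "base version" of the installed SDK (the release number with any pre-release suffix such as `rc1` dropped) against the worker's version. A stdlib-only sketch of that kind of comparison, assuming this is roughly what the check does (it is not Dataflow's code; the function names are hypothetical). `packaging.version.Version(...).base_version` is a more robust way to compute the base version.

```python
import re


def base_version(version: str) -> str:
    """Keep only the leading release number, e.g. '2.46.0rc1' -> '2.46.0'."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return match.group(0)


def check_worker_matches_sdk(sdk_version: str, worker_version: str) -> None:
    """Raise if the SDK and worker base versions differ."""
    sdk, worker = base_version(sdk_version), base_version(worker_version)
    if sdk != worker:
        raise RuntimeError(
            f"Beam SDK base version {sdk} does not match worker version {worker}"
        )


check_worker_matches_sdk("2.46.0rc1", "2.46.0")  # passes: same base version
```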

 On Fri, Mar 3, 2023 at 11:22 AM Robert Bradshaw via dev <
 dev@beam.apache.org> wrote:

> +1 (binding).
>
> I verified that the artifacts and signatures all look good, all the
> containers are pushed, and tested some pipelines with a fresh install
> from one of the Python wheels.
>
> On Fri, Mar 3, 2023 at 11:13 AM Danny McCormick
>  wrote:
> >
> > > The released artifacts seem to be missing the last commit at
> > >
> https://github.com/apache/beam/commit/c528eab18b32342daed53b750fe330d30c7e5224
> > > . Is this essential to the release, or just useful for validating
> it?
> >
> > It's strictly a test infrastructure change, it has no functional
> impact. For context, the changes included were from
> https://github.com/apache/beam/pull/25661 and
> https://github.com/apache/beam/pull/25654, both were keeping
> integration tests from running correctly.
>
> Thanks.
>
> > On Fri, Mar 3, 2023 at 2:09 PM Robert Bradshaw 
> wrote:
> >>
> >> The released artifacts seem to be missing the last commit at
> >>
> https://github.com/apache/beam/commit/c528eab18b32342daed53b750fe330d30c7e5224
> >> . Is this essential to the release, or just useful for validating
> it?
> >>
> >> On Fri, Mar 3, 2023 at 11:02 AM Danny McCormick
> >>  wrote:
> >> >
> >> > Thanks for calling that out, and thanks for helping me fix it! We
> should be all set now
> >> >
> >> > On Fri, Mar 3, 2023 at 1:38 PM Robert Bradshaw <
> rober...@google.com> wrote:
> >> >>
> >> >> It appears your public key is not published in
> >> >> https://dist.apache.org/repos/dist/release/beam/KEYS .
> >> >>
> >> >> On Fri, Mar 3, 2023 at 8:33 AM Anand Inguva via dev <
> dev@beam.apache.org> wrote:
> >> >> >
> >> >> > +1 (non-binding)
> >> >> > Tested Python wordcount quickstart
> https://beam.apache.org/get-started/quickstart-py/ on Direct Runner
> and Dataflow Runner.
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >> > On Fri, Mar 3, 2023 at 11:21 AM Bruno Volpato via dev <
> dev@beam.apache.org> wrote:
> >> >> >>
> >> >> >> +1 (non-binding)
> >> >> >>
> >> >> >> Tested with
> https://github.com/GoogleCloudPlatform/DataflowTemplates (Java SDK
> 11, Dataflow runner).
> >> >> >>
> >> >> >>
> >> >> >> Thanks Danny!
> >> >> >>
> >> >> >> On Thu, Mar 2, 2023 at 5:16 PM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
> >> >> >>>
> >> >> >>> Hi everyone,
> >> >> >>> Please review and 

Re: [ANNOUNCE] New PMC Member: Jan Lukavský

2023-02-16 Thread Chamikara Jayalath via dev
Congrats Jan!

On Thu, Feb 16, 2023 at 8:35 AM John Casey via dev 
wrote:

> Thanks Jan!
>
> On Thu, Feb 16, 2023 at 11:11 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> Congratulations!
>>
>> On Thu, Feb 16, 2023 at 11:09 AM Reza Rokni via dev 
>> wrote:
>>
>>> Congratulations!
>>>
>>> On Thu, Feb 16, 2023 at 7:47 AM Robert Burke  wrote:
>>>
 Congratulations!

 On Thu, Feb 16, 2023, 7:44 AM Danielle Syse via dev <
 dev@beam.apache.org> wrote:

> Congrats, Jan! That's awesome news. Thank you for your continued
> contributions!
>
> On Thu, Feb 16, 2023 at 10:42 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> Hi all,
>>
>> Please join me and the rest of the Beam PMC in welcoming Jan Lukavský
>>  as our newest PMC member.
>>
>> Jan has been a part of the Beam community and a long-time contributor
>> since 2018 in many significant ways, including code contributions in
>> different areas, participating in technical discussions, advocating for
>> users, giving a talk at Beam Summit and even writing one of the few Beam
>> books!
>>
>> Congratulations Jan and thanks for being a part of Apache Beam!
>>
>> ---
>> Alexey
>
>


Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-12 Thread Chamikara Jayalath via dev
+1 (binding)

Tried several Java and Python multi-language pipelines.

Thanks,
Cham

On Fri, Feb 10, 2023 at 1:52 PM Luke Cwik via dev 
wrote:

> +1
>
> Validated release artifact signatures and verified the Java Flink and
> Spark quickstarts.
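
Signature checks are done with GPG against the KEYS file. A companion check validators often run is comparing each artifact with its published SHA-512 checksum. A minimal sketch, assuming the checksum file uses the `sha512sum` layout (hex digest first, then the file name); the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-512 digest of a file, read in chunks so large archives fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path) -> bool:
    """Compare the artifact with a checksum file whose first token is the digest."""
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

Usage would be something like `checksum_matches(Path("some-release.zip"), Path("some-release.zip.sha512"))`, where both file names are placeholders for whatever the staging area actually contains.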
>
> On Fri, Feb 10, 2023 at 9:27 AM John Casey via dev 
> wrote:
>
>> Addendum to the above email.
>>
>> Java artifacts were built with Gradle 7.5.1 and OpenJDK 1.8.0_362
>>
>> On Fri, Feb 10, 2023 at 11:14 AM John Casey 
>> wrote:
>>
>>> Hi everyone,
>>> Please review and vote on the release candidate #1 for the version
>>> 2.45.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if no issues are found.
>>>
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with fingerprint 921F35F5EC5F5DDE [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.45.0-RC1" [5],
>>> * website pull request listing the release [6], the blog post [6], and
>>> publishing the API reference manual [7].
>>> * Java artifacts were built with Gradle 7.5.1 and OpenJDK 1.8.0_362.
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI[8].
>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>> * Validation sheet with a tab for 2.45.0 release to help with validation
>>> [10].
>>> * Docker images published to Docker Hub [11].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> For guidelines on how to try the release in your projects, check out our
>>> blog post at /blog/validate-beam-release/.
>>>
>>> Thanks,
>>> John Casey
>>>
>>> [1] https://github.com/apache/beam/milestone/8
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.45.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1293/
>>> [5] https://github.com/apache/beam/tree/v2.45.0-RC1
>>> [6] https://github.com/apache/beam/pull/25407
>>> [7] https://github.com/apache/beam-site/pull/640
>>> [8] https://pypi.org/project/apache-beam/2.45.0rc1/
>>> [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.45.0-RC1/go/pkg/beam
>>> [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2030665842
>>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>>
>>


Re: [fyi][discuss] Making jackson-dataformat-yaml a provided/optional dependency

2023-02-06 Thread Chamikara Jayalath via dev
Thanks. Added some comments to the PR.

- Cham


On Mon, Feb 6, 2023 at 9:29 AM Pablo Estrada via dev 
wrote:

> It's worth mentioning that neither of the libraries
> (jackson-dataformat-yaml + snakeyaml) has a newer version without the
> CVE.
> -P.
>
> On Mon, Feb 6, 2023 at 9:19 AM Pablo Estrada  wrote:
>
>> Hi all,
>> I am proposing that we make the jackson-dataformat-yaml dependency
>> optional in our expansion service module[1]. This is because it depends on
>> SnakeYAML, and there is a known CVE for it[2].
>>
>> It seems that given the way we use SnakeYAML, the CVE is not feasible to
>> exploit[2], but this will not stop tooling/user policies from being
>> alerted, so it may be convenient to simply make the dependency optional.
>>
>> I looked around for documentation on this code path (loading an allow
>> list for the expansion service's classpath), but it's not very widely
>> documented, so this feature may only be used by Beam devs, and not much by
>> Beam users.
>>
>> Thoughts on making the dependency optional?
>> Thanks!
>> -P.
>>
>> [1] https://github.com/apache/beam/pull/25350
>> [2] https://github.com/snakeyaml/snakeyaml#cve
>>
>


A user-deployable Beam Transform Service

2023-02-06 Thread Chamikara Jayalath via dev
Hi All,

Beam PTransforms are currently primarily identified as operations in a
pipeline that perform specific tasks. PTransform implementations were
traditionally linked to specific Beam SDKs.

With the advent of the portability framework, multi-language pipelines,
and expansion services that can be used to build/expand and discover
transforms, we have an opportunity to make this more general and
re-introduce Beam PTransforms as computation units that can serve any
use-case that needs to discover or use Beam transforms. For example, any
Beam SDK that runs a pipeline using a portable Beam runner should be able
to use a transform offered through an expansion service irrespective of the
implementation SDK of the transform or the pipeline.

I believe we can make such use-cases much easier to manage by introducing a
user-deployable service that encapsulates existing Beam expansion services
in the form of a Kubernetes cluster. The service will offer a single gRPC
endpoint and will include Beam expansion services developed in different
languages. Any Beam pipeline, irrespective of the pipeline SDK, should be
able to use any transform offered by the service.

This will also offer a way to make multi-language pipeline execution, which
currently relies on locally downloaded large dependencies and locally
started expansion service processes, more robust.

I have written a proposal for implementing such a service and it's
available at https://s.apache.org/beam-transform-service.

Please take a look and let me know if you have any comments or questions.

Thanks,
Cham
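
The core idea of one gRPC endpoint fronting expansion services written in several languages can be pictured as a lookup from a transform's URN to the service that implements it. This is a minimal sketch of that dispatch only: the URNs, host names and ports are hypothetical, and a real service would forward the expansion request over gRPC rather than just return an address.

```python
from typing import Dict

# Hypothetical backends behind the single front-end endpoint.
EXPANSION_BACKENDS: Dict[str, str] = {
    "java": "java-expansion-service:8097",
    "python": "python-expansion-service:8097",
}

# Illustrative placeholder URNs, not transforms Beam actually ships.
URN_OWNERS: Dict[str, str] = {
    "beam:transform:example:java_read:v1": "java",
    "beam:transform:example:python_inference:v1": "python",
}


def route_expansion(urn: str) -> str:
    """Return the backend a single endpoint would forward this expansion to.

    The caller's SDK plays no part in the lookup, which is what lets any
    pipeline SDK use any transform the service offers.
    """
    language = URN_OWNERS.get(urn)
    if language is None:
        raise LookupError(f"no expansion service offers {urn!r}")
    return EXPANSION_BACKENDS[language]


print(route_expansion("beam:transform:example:java_read:v1"))
# prints "java-expansion-service:8097"
```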


Re: Adding configurable gRPC channels to External transform

2023-01-25 Thread Chamikara Jayalath via dev
On Tue, Jan 24, 2023 at 1:44 PM Sahith Nallapareddy via dev <
dev@beam.apache.org> wrote:

> Hello,
>
> I made a PR, https://github.com/apache/beam/pull/25151, to add
> configurable gRPC channel to the ArtifactRetrievalService stub. We are
> hosting our external transforms in different environments and we were
> trying to host them in Google Cloud Run. This requires that the gRPC calls
> use TransportSecurity and include additional authorization in each call.
> This was possible to implement for the Expand stub, but not the
> ArtifactRetrieval stub. Who would be the best people to request reviews
> from? This is my naive implementation, and I am very open to changing how
> it should be implemented!
>

Thanks for the contribution. I can review.

- Cham


>
> Thanks,
>
> Sahith
>


Re: Refactor Kubernetes Kafka External load balancer dependency in tests

2023-01-23 Thread Chamikara Jayalath via dev
+1 for reducing access to what's required by tests.

Thanks,
Cham

On Mon, Jan 23, 2023 at 10:30 AM Yi Hu via dev  wrote:

> Hi Damon,
>
> Thanks for the proposal! Our k8s infrastructure has long been
> under-maintained. I agree that public IP exposure is not necessary, and it
> has triggered security alerts quite often... Would like to help if needed.
>
> Best,
> Yi
>
> On Mon, Jan 23, 2023 at 1:25 PM Damon Douglas via dev 
> wrote:
>
>> Hello Everyone,
>>
>> I would like to share with the community a proposal [1] to refactor a
>> Kubernetes Kafka External Load Balancer dependency in tests.  It fixes
>> [2].  The referenced document summarizes the situation, background,
>> assessment, and recommendation.
>>
>> Best,
>>
>> Damon
>>
>> *References*
>>
>> 1. [Public facing][Beam][issue/25119] Refactor Kubernetes Kafka External
>> load balancer dependency in tests
>> 
>> 2. https://github.com/apache/beam/issues/25119
>>
>


Re: [VOTE] Release 2.44.0, release candidate #1

2023-01-11 Thread Chamikara Jayalath via dev
+1 (binding)

Tested several Multi-language cases.

Thanks,
Cham

On Wed, Jan 11, 2023 at 9:59 AM Jan Lukavský  wrote:

> +1 (non-binding)
>
> Tested Java SDK with Flink runner.
>
> Thanks,
>
>  Jan
> On 1/11/23 17:27, Bruno Volpato via dev wrote:
>
> +1 (non-binding)
>
> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates (Java
> SDK 11, Dataflow runner).
>
>
> Thanks!
>
> On Wed, Jan 11, 2023 at 11:08 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> +1 (binding)
>>
>> Tested with  https://github.com/Talend/beam-samples/
>> (Java SDK v8/v11/v17, Spark 3 runner).
>>
>> ---
>> Alexey
>>
>> On 11 Jan 2023, at 16:53, Ritesh Ghorse via dev 
>> wrote:
>>
>> +1 (non-binding)
>> Validated Go Dataframe Transform wrapper on Dataflow runner and Go SDK
>> quickstart on Direct and Dataflow Runner.
>>
>> Thanks!
>>
>> On Wed, Jan 11, 2023 at 12:51 AM Anand Inguva via dev <
>> dev@beam.apache.org> wrote:
>>
>>> I ran the Python word count on DirectRunner and Dataflow Runner.
>>>
>>> Steps:
>>> 1. pip install --pre apache_beam in a fresh virtualenv.
>>> 2. Run the command Ahmet provided, but without the --sdk_location
>>> argument.
>>>
>>> The job was successful.
>>>
>>> On Tue, Jan 10, 2023 at 6:48 PM Ahmet Altay via dev 
>>> wrote:
>>>
 I validated python quick starts (direct, dataflow) X (batch,
 streaming). I ran into an issue with the dataflow batch case, running the
 wordcount with the standard:

 python -m apache_beam.examples.wordcount \
 --output  \
 --staging_location  \
 --temp_location \
 --runner DataflowRunner \
 --job_name wordcount-$USER \
 --project  \
 --num_workers 1 \
 --region us-central1 \
 --sdk_location apache-beam-2.44.0.zip

 results in:

 "/usr/local/lib/python3.10/site-packages/dataflow_worker/shuffle.py",
 line 589, in __enter__ raise
 RuntimeError(_PYTHON_310_SHUFFLE_ERROR_MESSAGE) RuntimeError: This pipeline
 requires Dataflow Runner v2 in order to run with currently used version of
 Apache Beam on Python 3.10+. Please verify that the Dataflow Runner v2 is
 not disabled in the pipeline options or enable it explicitly via:
 --dataflow_service_option=use_runner_v2. Alternatively, downgrade to Python
 3.9 to use Dataflow Runner v1.

 Questions:
 - I am not explicitly opting out of Runner v2, and this is a standard
 wordcount example, so I expected it to just work.

 Then I tried to add --dataflow_service_option=use_runner_v2 to the
 above wordcount command, which results in the following error:

 "message": "Dataflow Runner v2 requires a valid FnApi job, Please
 resubmit your job with a valid configuration. Note that if using Templates,
 you may need to regenerate your template with the '--use_runner_v2'."

 Maybe I am doing something wrong and it is an error on my end. It would
 be good for someone else with python experience to check this.

 /cc @Valentyn Tymofieiev 

 Ahmet




 On Tue, Jan 10, 2023 at 10:54 AM Kenneth Knowles 
 wrote:

> I have published a new maven staging repository:
> https://repository.apache.org/content/repositories/orgapachebeam-1290/
>
> It looks like it has everything, though I did not automate a check. At
> least there were no errors during publish which I ran with --no-parallel
> overnight, and some specific things that were missing from
> orgapachebeam-1289 are present.
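Kenn notes that the publish was not checked automatically for missing pom files. A minimal sketch of such a check, assuming the standard Maven repository layout (`group/with/slashes/artifact/version/artifact-version.pom`). The function names are hypothetical, and a real check would need the full list of modules the release is expected to publish.

```python
import urllib.error
import urllib.request
from typing import Iterable, List, Set, Tuple

Coordinate = Tuple[str, str, str]  # (groupId, artifactId, version)


def pom_path(group_id: str, artifact_id: str, version: str) -> str:
    """Relative path of a pom file in the standard Maven repository layout."""
    group_path = group_id.replace(".", "/")
    return f"{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.pom"


def missing_poms(expected: Iterable[Coordinate], present_paths: Set[str]) -> List[Coordinate]:
    """Return the expected coordinates whose pom path is not among present_paths."""
    return [coord for coord in expected if pom_path(*coord) not in present_paths]


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Send an HTTP HEAD request; any HTTP error status (e.g. 404) counts as missing."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return False
```

For example, each expected coordinate could be probed with `url_exists("https://repository.apache.org/content/repositories/orgapachebeam-1290/" + pom_path(...))`, and the paths that exist collected into the set passed to `missing_poms`. Network failures other than HTTP status errors (such as a DNS failure raising `urllib.error.URLError`) propagate to the caller.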
>
> I will restart the 72 hour waiting period, since the RC is only now
> usable.
>
> Kenn
>
> On Mon, Jan 9, 2023 at 6:51 PM Kenneth Knowles 
> wrote:
>
>> I have discovered that many pom files are missing from the nexus
>> repository. I should be able to re-publish a new one. It will take some
>> time as this is one of the longest-running processes.
>>
>> On Mon, Jan 9, 2023 at 1:42 PM Kenneth Knowles 
>> wrote:
>>
>>> Correction: this is release candidate #1.
>>>
>>> On Mon, Jan 9, 2023 at 1:25 PM Kenneth Knowles 
>>> wrote:
>>>
 Hi everyone,

 Please review and vote on the release candidate #3 for the version
 2.44.0, as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific
 comments)

 Reviewers are encouraged to test their own use cases with the
 release candidate, and vote +1 if
 no issues are found.

 The complete staging area is available for your review, which
 includes:
 * GitHub Release notes [1],
 * the official Apache source release to be deployed to
 dist.apache.org [2], which is signed with the key with fingerprint
 6ED551A8AE02461C [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag 

Re: Testing Multilanguage Pipelines?

2023-01-10 Thread Chamikara Jayalath via dev
On Wed, Dec 28, 2022 at 7:25 PM Byron Ellis via dev 
wrote:

> Thanks for the tips, folks! Took a bit of doing, but I got Java -> Python
> -> Java working without Docker being involved in the process (getting it
> working with Docker being involved wasn't so bad... though it didn't do
> what I wanted with respect to collecting results). Removing Docker appears
> to let me collect the results back on the Java side via Beam SQL's
> TestTable, which then lets me inspect the results for test validation
> purposes.
>

Great!
FWIW, commands for running locally with DirectRunner are also documented
here:
https://beam.apache.org/documentation/sdks/java-multi-language-pipelines/#run-the-java-pipeline

Thanks,
Cham


>
> In case anyone else is feeling similarly foolish, here's what ended up
> working:
>
>
> https://github.com/byronellis/beam/blob/structured-pipeline-definitions/sdks/java/extensions/spd/src/test/java/org/apache/beam/sdk/extensions/spd/StructuredPipelineExecutionTest.java
>
> It ain't pretty, but it gets the job done.
>
> Best,
> B
>
>
>
> On Wed, Dec 28, 2022 at 10:42 AM Robert Bradshaw 
> wrote:
>
>> On Wed, Dec 28, 2022 at 10:09 AM Byron Ellis 
>> wrote:
>> >
>> > On Wed, Dec 28, 2022 at 9:49 AM Robert Bradshaw 
>> wrote:
>> >>
>> >> On Wed, Dec 28, 2022 at 4:56 AM Danny McCormick via dev
>> >>  wrote:
>> >> >
>> >> > > Given the increasing importance of multi language pipelines, it
>> does seem that we should expand the capabilities of the DirectRunner or
>> just go all in on FlinkRunner for testing and local / small scale
>> development
>> >> >
>> >> > +1 - anecdotally, I've found local testing of multi-language
>> pipelines to be tricky, and have had multiple conversations with others who
>> have run into similar challenges in multiple contexts (both users and
>> people working on the project).
>> >>
>> >> I generally do all my testing against the Python runner which works
>> >> well. This is, of course, more natural for Python pipelines using
>> >> other languages, but when I was working on typescript which uses
>> >> cross-language even more heavily I just made it auto-start the python
>> >> runner just like the expansion services are auto-started which works
>> >> quite well. (The auto-started runner is just a plain-old portable
>> >> runner speaking the runner API, so no additional support is required
>> >> on the source side once it's started. And if you're already trying to
>> >> use dataframes and/or ML, you need to have Python available anyway.)
>> >>
>> >> We could consider bundling it as a docker image to reduce the required
>> >> dependency set, but we'd have to solve the docker-in-docker issue to
>> >> do that.
>> >>
>> >> I really think it's important to make cross-language a first-class
>> >> citizen: the end user should not care most of the time whether the
>> >> pipelines they use are native or not.
>> >
>> >
>> > Thanks! That's helpful. In this case getting the Python runner to
>> auto-start sounds like the most straightforward option for testing. After
>> all it's explicitly to provide Python initiated from Java so Python is
>> already going to be around and running (and in fact the test auto-starts
>> the Python expansion service already to get the graph in the first place)
>> and the deps are already going to be there.
>>
>> Yep.
>>
>> > I'm personally on the fence about Docker in these sorts of situations.
>> Yes, it makes life easier for the most part but gets complicated quickly.
>> It's also not an option for everyone.
>>
>> For sure. I think it'd be good to have various alternative packaging
>> of expansion services as different people will have different setups
>> (e.g. a Crostini Go developer is more likely to have docker than java,
>> but it's probably just the opposite for a java developer on windows).
>> This is what I did for the yaml thing. Note that nominally docker is
>> required for running a cross-language pipeline, so that makes it a
>> more natural option there. (Technically, at least for development, you
>> can have the host SDK process vend itself as a worker in LOOPBACK
>> mode, and if you pass the directEmbedDockerPython=true option to the
>> portable Python runner, it will inline the Python operations rather
>> than firing up a Docker worker for them, assuming, of course, that the
>> versions match.)
>>
>> > I'll give things a shot and report back (if you have an example of
>> auto-starting the Python runner that'd be cool too---if I get inspired I
>> might try to add that to the Python extensions in Java since right now they
>> don't actually appear to be exercising the runner itself based on the TODOs)
>>
>> In typescript the runner is started up as
>>
>>
>> PythonService.forModule("apache_beam.runners.portability.local_job_service_main",
>> ["--port", "{{PORT}}"])
>>
>> which is very similar to how the expansion service is started up
>>
>>
>>  
>> PythonService.forModule("apache_beam.runners.portability.expansion_service_main",
>> 
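The auto-start idea described above can be sketched in Python with only the standard library: pick a free port, substitute it for the `{{PORT}}` placeholder, and launch the module with the current interpreter. The helper names are hypothetical; this is not how the TypeScript SDK's `PythonService` is implemented, and it does not wait for the service to become ready.

```python
import socket
import subprocess
import sys
from typing import List, Tuple


def pick_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("localhost", 0))
        return sock.getsockname()[1]


def build_command(module: str, args: List[str], port: int) -> List[str]:
    """Build `python -m <module> <args>` with every {{PORT}} placeholder filled in."""
    filled = [arg.replace("{{PORT}}", str(port)) for arg in args]
    return [sys.executable, "-m", module, *filled]


def start_python_service(module: str, args: List[str]) -> Tuple[subprocess.Popen, int]:
    """Launch the module as a subprocess; the caller must terminate() it when done."""
    port = pick_free_port()
    proc = subprocess.Popen(build_command(module, args, port))
    return proc, port
```

For example, `start_python_service("apache_beam.runners.portability.local_job_service_main", ["--port", "{{PORT}}"])` returns the process and the port it was told to listen on; call `proc.terminate()` when finished. Binding to port 0 and then releasing it leaves a small window in which another process could take the port.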

Re: [Proposal] Adopt a Beam I/O Standard

2022-12-15 Thread Chamikara Jayalath via dev
On Thu, Dec 15, 2022, 8:33 AM Alexey Romanenko 
wrote:

> Cham, do you remember what was a reason to not finalise that doc?
>

I think this is a continuation of those docs (so we are trying to finalize
them), but Herman can probably explain better.


> Personally, I find having such standards very useful (if they stay flexible
> over time, of course), especially for new developers and PR reviewers,
> and it’d be great to finally have such doc as a part of contribution guide.
>

+1

Thanks,
Cham

>
> —
> Alexey
>
> On 13 Dec 2022, at 04:32, Chamikara Jayalath via dev 
> wrote:
>
> Yeah, I don't think we either finalized or documented (on the website) the
> previous iteration. This doc seems to contain details from the documents
> shared in the previous iteration.
>
> Thanks,
> Cham
>
>
>
> On Mon, Dec 12, 2022 at 6:49 PM Robert Burke  wrote:
>
>> I think ultimately: until the docs are clearly available on the Beam site
>> itself, it's not documentation. See also, design docs, previous emails, and
>> similar.
>>
>> On Mon, Dec 12, 2022, 6:07 PM Andrew Pilloud via dev 
>> wrote:
>>
>>> I believe the previous iteration was here:
>>> https://lists.apache.org/thread/3o8glwkn70kqjrf6wm4dyf8bt27s52hk
>>>
>>> The associated docs are:
>>> https://s.apache.org/beam-io-api-standard-documentation
>>> https://s.apache.org/beam-io-api-standard
>>>
>>> This is missing all the relational stuff that was in those docs. Is
>>> this another attempt starting from the beginning?
>>>
>>> Andrew
>>>
>>>
>>> On Mon, Dec 12, 2022 at 9:57 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
>>>> Thanks for writing this!
>>>>
>>>> IIRC, a similar design doc was sent for review here a while ago. Is
>>>> this just an updated version or a new one?
>>>>
>>>> —
>>>> Alexey
>>>>
>>>> On 11 Dec 2022, at 15:16, Herman Mak via dev 
>>>> wrote:
>>>>
>>>> Hello Everyone,
>>>>
>>>> *TLDR*
>>>>
>>>> Should we adopt a set of standards that Connector I/Os should adhere
>>>> to?
>>>> Attached is a first version of a Beam I/O Standards guideline that
>>>> includes opinionated best practices across important components of a
>>>> Connector I/O, namely Documentation, Development and Testing.
>>>>
>>>> *The Long Version*
>>>>
>>>> Apache Beam is a unified open-source programming model for both batch
>>>> and streaming. It runs on multiple platform runners and integrates with
>>>> over 50 services using individually developed I/O Connectors
>>>> <https://beam.apache.org/documentation/io/connectors/>.
>>>>
>>>> Given that Apache Beam connectors are written by many different
>>>> developers and at varying points in time, they vary in syntax style,
>>>> documentation completeness and testing done. For a new adopter of Apache
>>>> Beam, that can definitely cause some uncertainty.
>>>>
>>>> So should we adopt a set of standards that Connector I/Os should adhere
>>>> to?
>>>> Attached is a first version, in Doc format, of a Beam I/O Standards
>>>> guideline that includes opinionated best practices across important
>>>> components of a Connector I/O, namely Documentation, Development and
>>>> Testing. And the aim is to incorporate this into the documentation and to
>>>> have it referenced as standards for new Connector I/Os (and ideally have
>>>> existing Connectors upgraded over time). If it looks helpful, the immediate
>>>> next step is that we can convert it into a .md as a PR into the Beam repo!
>>>>
>>>> Thanks and looking forward to feedback and discussion,
>>>>
>>>>  [PUBLIC] Beam I/O Standards
>>>> <https://docs.google.com/document/d/1BCTpSZDUjK90hYZjcn8aAnPd9vuRfj8YU1j3mpSgRwI/edit?usp=drive_web>
>>>>
>>>> Herman Mak |  Customer Engineer, Hong Kong, Google Cloud |
>>>> herman...@google.com |  +852-3923-5417 <+852%203923%205417>
>>>>
>>>>
>>>>
>>>>
>


Re: A Declarative API for Apache Beam

2022-12-14 Thread Chamikara Jayalath via dev
+1 for these proposals and agree that these will simplify and demystify
Beam for many new users. I think when combined with the x-lang/Schema-Aware
transform binding, these might end up being adequate solutions for many
production use-cases as well (unless users need to define custom
composites, I/O connectors, etc.).

Also, thanks for providing prototype implementations with examples.

- Cham


On Wed, Dec 14, 2022 at 3:01 PM Sachin Agarwal via dev 
wrote:

> To build on Kenn's point, if we leverage existing stuff like dbt, we get
> access to a ready-made community that can help drive both adoption and
> incremental innovation by bringing more folks to Beam.
>
> On Wed, Dec 14, 2022 at 2:57 PM Kenneth Knowles  wrote:
>
>> 1. I love the idea. Back in the early days people talked about an "XML
>> SDK" or "JSON SDK" or "YAML SDK" and it didn't really make sense at the
>> time. Portability and specifically cross-language schema transforms gives
>> the right infrastructure so this is the perfect time: unique names (URNs)
>> for transforms and explicit lists of parameters they require.
>>
>> 2. I like the idea of re-using some existing thing like dbt if it is
>> pretty much what we were going to do anyhow. I don't think we should hold
>> ourselves back. I also don't think we'll gain anything in terms of
>> implementation. But at least it could fast-forward our design process
>> because we simply don't have to make most of the decisions because they are
>> made for us.
>>
>>
>>
>> On Wed, Dec 14, 2022 at 2:44 PM Byron Ellis via dev 
>> wrote:
>>
>>> And I guess also a PR for completeness to make it easier to find going
>>> forward instead of my random repo:
>>> https://github.com/apache/beam/pull/24670
>>>
>>> On Wed, Dec 14, 2022 at 2:37 PM Byron Ellis 
>>> wrote:
>>>
 Since Robert opened that can of worms (and we happened to talk about it
 yesterday)... :-)

 I figured I'd also share my start on a "port" of dbt to the Beam SDK.
 This would be complementary as it doesn't really provide a way of
 specifying a pipeline, more orchestrating and packaging a complex
 pipeline---dbt itself supports SQL and Python Dataframes, which both seem
 like reasonable things for Beam and it wouldn't be a stretch to include
 something like the format above. Though in my head I had imagined people
 would tend to write composite transforms in the SDK of their choosing that
 are then exposed at this layer. I decided to go with dbt as it also
 provides a number of nice "quality of life" features for its users like
 documentation, validation, environments and so on.

 I did a really quick proof-of-viability implementation here:
 https://github.com/byronellis/beam/tree/structured-pipeline-definitions

 And you can see a really simple pipeline that reads a seed file
 (TextIO), runs it through a couple of SQLTransforms and then drops it out
 to a logger via a simple DoFn here:
 https://github.com/byronellis/beam/tree/structured-pipeline-definitions/sdks/java/extensions/spd/src/test/resources/simple_pipeline

 I've also heard a rumor that there might be a textproto-based
 representation floating around too :-)

 Best,
 B





 On Wed, Dec 14, 2022 at 2:21 PM Damon Douglas via dev <
 dev@beam.apache.org> wrote:

> Hello Robert,
>
> I'm replying to say that I've been waiting for something like this
> ever since I started learning Beam and I'm grateful you are pushing this
> forward.
>
> Best,
>
> Damon
>
> On Wed, Dec 14, 2022 at 2:05 PM Robert Bradshaw 
> wrote:
>
>> While Beam provides powerful APIs for authoring sophisticated data
>> processing pipelines, it often still has too high a barrier for
>> getting started and authoring simple pipelines. Even setting up the
>> environment, installing the dependencies, and setting up the project
>> can be an overwhelming amount of boilerplate for some (though
>> https://beam.apache.org/blog/beam-starter-projects/ has gone a long
>> way in making this easier). At the other extreme, the Dataflow project
>> has the notion of templates which are pre-built Beam pipelines that
>> can be easily launched from the command line, or even from your
>> browser, but they are fairly restrictive, limited to pre-assembled
>> pipelines taking a small number of parameters.
>>
>> The idea of creating a yaml-based description of pipelines has come up
>> several times in several contexts and this last week I decided to code
>> up what it could look like. Here's a proposal.
>>
>> pipeline:
>>   - type: chain
>>     transforms:
>>       - type: ReadFromText
>>         args:
>>           file_pattern: "wordcount.yaml"
>>       - type: PyMap
>>         fn: "str.lower"
>>       - type: PyFlatMap
>>         fn: "import re\nlambda 
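As I read the example, a `type: chain` block feeds each transform's output into the next one. A toy, in-memory Python model of just that chaining behavior, assuming plain lists stand in for PCollections and an in-memory list replaces `ReadFromText`. This is not the proposal's implementation; the handler names are hypothetical, and because the original `PyFlatMap` function is cut off above, the usage below substitutes `str.split`.

```python
from typing import Callable, Dict, Iterable, List


def _py_map(fn: Callable, elements: List) -> List:
    # PyMap: exactly one output per input element.
    return [fn(e) for e in elements]


def _py_flat_map(fn: Callable, elements: List) -> List:
    # PyFlatMap: fn returns an iterable whose items are flattened into the output.
    return [out for e in elements for out in fn(e)]


HANDLERS: Dict[str, Callable] = {"PyMap": _py_map, "PyFlatMap": _py_flat_map}


def run_chain(elements: Iterable, transforms: List[dict]) -> List:
    """Apply each transform to the previous transform's output, in order."""
    current = list(elements)
    for spec in transforms:
        handler = HANDLERS.get(spec["type"])
        if handler is None:
            raise ValueError(f"unsupported transform type: {spec['type']!r}")
        # The YAML gives fn as Python source text. eval() only handles expressions
        # (not "import re\n..." statements) and trusts its input: local experiments only.
        fn = eval(spec["fn"])
        current = handler(fn, current)
    return current
```

For example, `run_chain(["Hello World", "Beam YAML"], [{"type": "PyMap", "fn": "str.lower"}, {"type": "PyFlatMap", "fn": "str.split"}])` lowercases each line and then splits it into words.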

Re: [Beam Playground] Local Development Environment: Kubernetes vs Docker Compose

2022-12-12 Thread Chamikara Jayalath via dev
For these kinds of decisions, I'd write a short doc with pros and cons and
suggest an option. We can further discuss in the doc or dev list if needed.
If there's a significant disagreement we could even go for a vote in the
dev list but usually we do not get to that (and go by lazy consensus [1]).

BTW we had a very similar discussion previously regarding using one of
these systems for hosting datastores for Beam I/O testing.
https://lists.apache.org/thread/r0gn5fzp6zy6c277r1sqvb4o9rc45rxf

Thanks,
Cham

[1] https://community.apache.org/committers/lazyConsensus.html

On Mon, Dec 12, 2022 at 11:16 AM Damon Douglas via dev 
wrote:

> Hello Everyone,
>
> *Even if this is your first day learning Beam, please feel welcome to
> vote.*
>
> *Please answer the single-question form with your preference* for
> Kubernetes [1] versus Docker Compose [2] for local development of the Beam
> Playground [3]. The form provides both short and long versions of the
> explanation, if needed.
>
> *https://forms.gle/GBZZ9nCzj5EvXVgQ8*
>
> Thank you for your time and help.
>
> Best,
>
> Damon
>
> *References*:
>
> 1. Kubernetes - an open-source system for automating deployment, scaling,
> and management of containerized applications.
> See https://kubernetes.io/
> 2. Docker Compose - a tool for defining and running multi-container Docker
> applications.
> See https://docs.docker.com/compose/
> 3. Beam Playground - a full stack web application to execute Apache Beam
> snippets in a modern browser.
> See https://play.beam.apache.org/
>


Re: [Proposal] Adopt a Beam I/O Standard

2022-12-12 Thread Chamikara Jayalath via dev
Yeah, I don't think we either finalized or documented (on the website) the
previous iteration. This doc seems to contain details from the documents
shared in the previous iteration.

Thanks,
Cham



On Mon, Dec 12, 2022 at 6:49 PM Robert Burke  wrote:

> I think ultimately: until the docs are clearly available on the Beam site
> itself, it's not documentation. See also, design docs, previous emails, and
> similar.
>
> On Mon, Dec 12, 2022, 6:07 PM Andrew Pilloud via dev 
> wrote:
>
>> I believe the previous iteration was here:
>> https://lists.apache.org/thread/3o8glwkn70kqjrf6wm4dyf8bt27s52hk
>>
>> The associated docs are:
>> https://s.apache.org/beam-io-api-standard-documentation
>> https://s.apache.org/beam-io-api-standard
>>
>> This is missing all the relational stuff that was in those docs. Is
>> this another attempt starting from the beginning?
>>
>> Andrew
>>
>>
>> On Mon, Dec 12, 2022 at 9:57 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Thanks for writing this!
>>>
>>> IIRC, a similar design doc was sent for review here a while ago. Is
>>> this just an updated version or a new one?
>>>
>>> —
>>> Alexey
>>>
>>> On 11 Dec 2022, at 15:16, Herman Mak via dev 
>>> wrote:
>>>
>>> Hello Everyone,
>>>
>>> *TLDR*
>>>
>>> Should we adopt a set of standards that Connector I/Os should adhere to?
>>> Attached is a first version of a Beam I/O Standards guideline that
>>> includes opinionated best practices across important components of a
>>> Connector I/O, namely Documentation, Development and Testing.
>>>
>>> *The Long Version*
>>>
>>> Apache Beam is a unified open-source programming model for both batch
>>> and streaming. It runs on multiple platform runners and integrates with
>>> over 50 services using individually developed I/O Connectors.
>>>
>>> Given that Apache Beam connectors are written by many different
>>> developers and at varying points in time, they vary in syntax style,
>>> documentation completeness and testing done. For a new adopter of Apache
>>> Beam, that can definitely cause some uncertainty.
>>>
>>> So should we adopt a set of standards that Connector I/Os should adhere
>>> to?
>>> Attached is a first version, in Doc format, of a Beam I/O Standards
>>> guideline that includes opinionated best practices across important
>>> components of a Connector I/O, namely Documentation, Development and
>>> Testing. And the aim is to incorporate this into the documentation and to
>>> have it referenced as standards for new Connector I/Os (and ideally have
>>> existing Connectors upgraded over time). If it looks helpful, the immediate
>>> next step is that we can convert it into a .md as a PR into the Beam repo!
>>>
>>> Thanks and looking forward to feedback and discussion,
>>>
>>>  [PUBLIC] Beam I/O Standards
>>> 
>>>
>>> Herman Mak |  Customer Engineer, Hong Kong, Google Cloud |
>>> herman...@google.com |  +852-3923-5417 <+852%203923%205417>
>>>
>>>
>>>
>>>


Re: Easy Multi-language via a SchemaTransform-aware Expansion Service

2022-11-23 Thread Chamikara Jayalath via dev
Hi All,

The implementation of https://s.apache.org/easy-multi-language (with the
dynamic API for Python) was merged and should be available with Beam
2.44.0: https://github.com/apache/beam/pull/23413

Thanks,
Cham

On Fri, Aug 19, 2022 at 3:35 PM Chamikara Jayalath 
wrote:

> Hi All,
>
> Thanks for the comments so far. Seems like we generally agree on this
> proposal.
>
> Please see https://github.com/apache/beam/pull/22802 for a prototype
> implementation that adds the following.
>
> * Support for dynamically discovering and registering SchemaTransforms in
> the Java expansion service.
> * Support for dynamically discovering registered SchemaTransforms from the
> Python side.
> * Support for using SchemaTransforms in Python pipelines.
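To make the register / discover / generate-a-stub flow concrete, here is a toy in-memory Python model. It is hypothetical and is not the Beam expansion service or its Python API: `TransformSpec` and `ToyRegistry` are made-up names, every config field is treated as required, and real SchemaTransforms describe their configuration with a full Beam schema rather than a tuple of field names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class TransformSpec:
    """What a (hypothetical) service advertises about one registered transform."""
    identifier: str
    config_fields: Tuple[str, ...]  # every field is treated as required here


class ToyRegistry:
    """In-memory stand-in for an expansion service that knows its transforms."""

    def __init__(self) -> None:
        self._specs: Dict[str, TransformSpec] = {}

    def register(self, spec: TransformSpec) -> None:
        self._specs[spec.identifier] = spec

    def discover(self) -> List[TransformSpec]:
        """Analogue of a discovery call: list every registered transform."""
        return sorted(self._specs.values(), key=lambda s: s.identifier)

    def make_stub(self, identifier: str) -> Callable[..., dict]:
        """Generate a keyword-argument constructor from the advertised fields."""
        spec = self._specs[identifier]

        def stub(**config) -> dict:
            unknown = sorted(set(config) - set(spec.config_fields))
            missing = sorted(set(spec.config_fields) - set(config))
            if unknown or missing:
                raise TypeError(f"{identifier}: unknown={unknown}, missing={missing}")
            # A real SDK would build an external transform here; the toy returns the request.
            return {"identifier": identifier, "config": dict(config)}

        return stub
```

The point of the design is that the stub is generated from what the service advertises, so no hand-written, language-specific wrapper is needed for each transform.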
>
> Feel free to add more comments to the doc and/or the PR.
>
> Thanks,
> Cham
>
>
>
>
>
>
>
> On Mon, Aug 8, 2022 at 9:34 PM Chamikara Jayalath 
> wrote:
>
>> I think the *DiscoverSchemaTransform()* RPC introduced in this proposal
>> and the ability to easily deploy/use available *SchemaTransforms* using
>> an expansion service essentially provide the tooling necessary for
>> implementing such a service. Such a service could even start up expansion
>> services to discover/list transforms available in given artifacts (for
>> example, jar files).
>>
>> Thanks,
>> Cham
>>
>> On Mon, Aug 8, 2022 at 3:48 PM Byron Ellis  wrote:
>>
>>> I like that idea, sort of like Kafka’s Schema Service but for transforms?
>>>
>>> On Mon, Aug 8, 2022 at 2:45 PM Robert Bradshaw via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> This is a great idea. I would like to approach this from the
>>>> perspective of making it easy to provide a catalog of well-defined
>>>> transforms for use in expansion services from typical SDKs and also
>>>> elsewhere (e.g. for documentation purposes, GUIs, etc.) Ideally
>>>> everything about what a transform is (its config, documentation,
>>>> expectations on inputs, etc.) can be specified programmatically in a
>>>> way that's much easier to both author and consume than it is now.
>>>>
>>>> On Thu, Aug 4, 2022 at 6:51 PM Chamikara Jayalath via dev
>>>>  wrote:
>>>> >
>>>> > Hi All,
>>>> >
>>>> > I believe we can make the multi-language pipelines offering [1] much
>>>> easier to use by updating the expansion service to be fully aware of
>>>> SchemaTransforms. Additionally this will make it easy to
>>>> register/discover/use transforms defined in one SDK from all other SDKs.
>>>> Specifically we could add the following features.
>>>> >
>>>> > The expansion service can be used to easily initialize and expand
>>>> transforms without the need for additional code.
>>>> > The expansion service can be used to easily discover already registered
>>>> transforms.
>>>> > Pipeline SDKs can generate user-friendly stub-APIs based on
>>>> transforms registered with an expansion service, eliminating the need to
>>>> develop language-specific wrappers.
>>>> >
>>>> > Please see here for my proposal:
>>>> https://s.apache.org/easy-multi-language
>>>> >
>>>> > Lemme know if you have any comments/questions/suggestions :)
>>>> >
>>>> > Thanks,
>>>> > Cham
>>>> >
>>>> > [1]
>>>> https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines
>>>> >
>>>>
>>>


[ANNOUNCE] Apache Beam 2.43.0 Released

2022-11-18 Thread Chamikara Jayalath via dev
The Apache Beam team is pleased to announce the release of version 2.43.0.

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org

You can download the release here:

https://beam.apache.org/get-started/downloads/

This release includes bug fixes, features, and improvements detailed on the
Beam blog: https://beam.apache.org/blog/beam-2.43.0/ and the GitHub
release page https://github.com/apache/beam/releases/tag/v2.43.0

Thanks to everyone who contributed to this release, and we hope you enjoy
using Beam 2.43.0.

-- Cham, on behalf of The Apache Beam team


Re: [PROPOSAL] Preparing for Apache Beam 2.43.0 Release

2022-11-17 Thread Chamikara Jayalath via dev
Thanks Kenn.
BTW the correct milestone for the 2.44.0 release should be this one:
https://github.com/apache/beam/milestone/7

- Cham


On Thu, Nov 17, 2022 at 9:12 AM Ahmet Altay via dev 
wrote:

> Thank you Kenn! :)
>
> On Wed, Nov 16, 2022 at 12:45 PM Kenneth Knowles  wrote:
>
>> Hi all,
>>
>> The 2.44.0 release cut is scheduled for Nov 30th [1]. I'd like to
>> volunteer to do this release.
>>
>> As usual, my plan would be to cut right on that date and cherry
>> pick critical fixes.
>>
>> Help me and the release by:
>> - Making sure that any unresolved release blocking issues for 2.44.0 have
>> their "Milestone" marked as "2.44.0 Release" [2].
>> - Reviewing the current release blockers [2] and removing the Milestone
>> if they don't meet the criteria at [3].
>>
>> Kenn
>>
>> [1]
>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com
>> [2] https://github.com/apache/beam/milestone/5
>> [3] https://beam.apache.org/contribute/release-blocking/
>>
>> Kenn
>>
>


[RESULT] [VOTE] Release 2.43.0, release candidate #2

2022-11-17 Thread Chamikara Jayalath via dev
I'm happy to announce that we have unanimously approved this release.

There are 7 approving votes, 5 of which are binding:
* Alexey Romanenko
* Jean-Baptiste Onofré
* Pablo Estrada
* Ahmet Altay
* Chamikara Jayalath

There are no disapproving votes.
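The vote emails state that a release vote stays open for at least 72 hours and is adopted by majority approval with at least 3 PMC (binding) affirmative votes. A minimal Python sketch of that check, assuming the ASF reading of majority approval (at least three binding +1 votes and more +1 than -1 votes). Counting only binding votes on both sides is an assumption of this sketch, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

Vote = Tuple[str, int, bool]  # (voter, +1 or -1, is_binding)


def release_vote_passes(
    votes: Iterable[Vote],
    opened: datetime,
    closed: datetime,
    min_binding_plus: int = 3,
    min_open: timedelta = timedelta(hours=72),
) -> bool:
    """Check the '72 hours, majority approval, at least 3 PMC +1' rule."""
    if closed - opened < min_open:
        return False  # closed before the minimum voting window elapsed
    votes = list(votes)
    binding_plus = sum(1 for _, value, binding in votes if binding and value > 0)
    binding_minus = sum(1 for _, value, binding in votes if binding and value < 0)
    return binding_plus >= min_binding_plus and binding_plus > binding_minus
```

Non-binding votes are accepted as input but do not affect the outcome here, matching the tally above, where 7 approving votes were cast but the 5 binding ones are what carry the release.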

Thanks everyone!


Re: [VOTE] Release 2.43.0, release candidate #2

2022-11-17 Thread Chamikara Jayalath via dev
Thanks everybody for voting. This vote is now closed. I'll tally the
results in a separate email.

- Cham

On Thu, Nov 17, 2022 at 5:34 AM Chamikara Jayalath 
wrote:

> +1 (binding)
>
> I ran the validations mentioned in the release guide and updated the
> spreadsheet.
>
> Thanks,
> Cham
>
> On Tue, Nov 15, 2022 at 11:15 AM Ahmet Altay  wrote:
>
>> +1 (binding). - I validated the python quick starts on direct runner.
>>
>> Thank you!
>>
>> On Tue, Nov 15, 2022 at 9:51 AM Jean-Baptiste Onofré 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> Regards
>>> JB
>>>
>>> On Sun, Nov 13, 2022 at 3:52 PM Chamikara Jayalath via dev
>>>  wrote:
>>> >
>>> > Hi everyone,
>>> > Please review and vote on the release candidate #2 for the version
>>> 2.43.0, as follows:
>>> > [ ] +1, Approve the release
>>> > [ ] -1, Do not approve the release (please provide specific comments)
>>> >
>>> >
>>> > Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if
>>> > no issues are found.
>>> >
>>> > The complete staging area is available for your review, which includes:
>>> > * GitHub Release notes [1],
>>> > * the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with fingerprint
>>> 40C61FBE1761E5DB652A1A780CCD5EB2A718A56E [3],
>>> > * all artifacts to be deployed to the Maven Central Repository [4],
>>> > * source code tag "v2.43.0-RC2" [5],
>>> > * website pull request listing the release [6], the blog post [6], and
>>> publishing the API reference manual [7].
>>> > * Java artifacts were built with Gradle 7.5.1 and openjdk version
>>> 1.8.0_181-google-v7.
>>> > * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI[8].
>>> > * Go artifacts and documentation are available at pkg.go.dev [9]
>>> > * Validation sheet with a tab for 2.43.0 release to help with
>>> validation [10].
>>> > * Docker images published to Docker Hub [11].
>>> >
>>> > The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>> >
>>> > For guidelines on how to try the release in your projects, check out
>>> our blog post at https://beam.apache.org/blog/validate-beam-release/.
>>> >
>>> > Thanks,
>>> > Cham
>>> >
>>> > [1] https://github.com/apache/beam/milestone/5
>>> > [2] https://dist.apache.org/repos/dist/dev/beam/2.43.0/
>>> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> > [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1288/
>>> > [5] https://github.com/apache/beam/tree/v2.43.0-RC2
>>> > [6] https://github.com/apache/beam/pull/24044
>>> > [7] https://github.com/apache/beam-site/pull/636
>>> > [8] https://pypi.org/project/apache-beam/2.43.0rc2/
>>> > [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.43.0-RC2/go/pkg/beam
>>> > [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1310009119
>>> > [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>>
>>


Re: [VOTE] Release 2.43.0, release candidate #2

2022-11-17 Thread Chamikara Jayalath via dev
+1 (binding)

I ran the validations mentioned in the release guide and updated the
spreadsheet.

Thanks,
Cham

On Tue, Nov 15, 2022 at 11:15 AM Ahmet Altay  wrote:

> +1 (binding). - I validated the python quick starts on direct runner.
>
> Thank you!
>
> On Tue, Nov 15, 2022 at 9:51 AM Jean-Baptiste Onofré 
> wrote:
>
>> +1 (binding)
>>
>> Regards
>> JB
>>
>> On Sun, Nov 13, 2022 at 3:52 PM Chamikara Jayalath via dev
>>  wrote:
>> >
>> > Hi everyone,
>> > Please review and vote on the release candidate #2 for the version
>> 2.43.0, as follows:
>> > [ ] +1, Approve the release
>> > [ ] -1, Do not approve the release (please provide specific comments)
>> >
>> >
>> > Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if no issues are found.
>> >
>> > The complete staging area is available for your review, which includes:
>> > * GitHub Release notes [1],
>> > * the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with fingerprint
>> 40C61FBE1761E5DB652A1A780CCD5EB2A718A56E [3],
>> > * all artifacts to be deployed to the Maven Central Repository [4],
>> > * source code tag "v2.43.0-RC2" [5],
>> > * website pull request listing the release [6], the blog post [6], and
>> publishing the API reference manual [7].
>> > * Java artifacts were built with Gradle 7.5.1 and openjdk version
>> 1.8.0_181-google-v7.
>> > * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2] and PyPI[8].
>> > * Go artifacts and documentation are available at pkg.go.dev [9]
>> > * Validation sheet with a tab for 2.43.0 release to help with
>> validation [10].
>> > * Docker images published to Docker Hub [11].
>> >
>> > The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>> >
>> > For guidelines on how to try the release in your projects, check out
>> our blog post at https://beam.apache.org/blog/validate-beam-release/.
>> >
>> > Thanks,
>> > Cham
>> >
>> > [1] https://github.com/apache/beam/milestone/5
>> > [2] https://dist.apache.org/repos/dist/dev/beam/2.43.0/
>> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> > [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1288/
>> > [5] https://github.com/apache/beam/tree/v2.43.0-RC2
>> > [6] https://github.com/apache/beam/pull/24044
>> > [7] https://github.com/apache/beam-site/pull/636
>> > [8] https://pypi.org/project/apache-beam/2.43.0rc2/
>> > [9]
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.43.0-RC2/go/pkg/beam
>> > [10]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1310009119
>> > [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>
>


Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread Chamikara Jayalath via dev
On Tue, Nov 15, 2022 at 12:52 PM Ahmed Abualsaud 
wrote:

>> Schema-aware transforms are not restricted to I/Os. An arbitrary transform
>> can be a SchemaTransform. Also, the Read/Write designation does not map to
>> an arbitrary transform. Perhaps we should try to make this more generic?
>>
>
> Agreed, I suggest keeping everything on the left side of the name unique
> to the transform, so that the right side is consistently SchemaTransform
> | SchemaTransformProvider | SchemaTransformConfiguration. What do others
> think?
>

SGTM. I don't think we should enforce class names, but it's good to
have a recommendation.


>
>> Also, probably what's more important is that the identifier of the
>> SchemaTransformProvider is unique.
>>
>> FWIW, we came up with a similar generic URN naming scheme for
>> cross-language transforms:
>> https://beam.apache.org/documentation/programming-guide/#1314-defining-a-urn
>
>
> The URN convention in that link looks good. It may be a good idea to
> replace "transform" with "schematransform" in the URN in this case to make a
> distinction, i.e.,
> beam:schematransform:org.apache.beam:kafka_read_with_metadata:v1. I will
> mention this in the other thread when I go over the comments in the
> Supporting SchemaTransforms doc [1].
>

+1 for replacing "transform" with "schematransform" to prevent URN
conflicts (even though these are not exactly in the same category).

Thanks,
Cham


>
> [1] Supporting existing connectors with SchemaTrans...
> https://docs.google.com/document/d/1qW9O3VxdGxUM887TdwhD1iH9AdNbpu0_wXbCGvFP0OM/edit?usp=drive_web
>
>
> On Tue, Nov 15, 2022 at 3:41 PM John Casey via dev 
> wrote:
>
>> One distinction here is the difference between the URN for a provider /
>> transform, and the class name in Java.
>>
>> We should have a standard for both, but they are distinct.
>>
>> On Tue, Nov 15, 2022 at 3:39 PM Chamikara Jayalath via dev <
>> dev@beam.apache.org> wrote:
>>
>>>
>>>
>>> On Tue, Nov 15, 2022 at 11:50 AM Damon Douglas via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Hello Everyone,
>>>>
>>>> Do we like the following Java class naming convention for
>>>> SchemaTransformProviders [1]?  The proposal is:
>>>>
>>>> (Read|Write)SchemaTransformProvider
>>>>
>>>>
>>>> *For those new to Beam, even if this is your first day, consider
>>>> yourselves a welcome contributor to this conversation.  Below are
>>>> definitions/references and a suggested learning guide to understand this
>>>> email.*
>>>>
>>>> Explanation
>>>>
>>>> The prefix identifies the Beam I/O [2], and Read or Write identifies a
>>>> read or write PTransform, respectively.
>>>>
>>>
>>> Schema-aware transforms are not restricted to I/Os. An arbitrary
>>> transform can be a SchemaTransform. Also, the Read/Write designation does
>>> not map to an arbitrary transform. Perhaps we should try to make this more
>>> generic?
>>>
>>> Also, probably what's more important is that the identifier of the
>>> SchemaTransformProvider is unique, not the class name (the latter is
>>> guaranteed to be unique if we follow the Java package naming guidelines).
>>>
>>> FWIW, we came up with a similar generic URN naming scheme for
>>> cross-language transforms:
>>> https://beam.apache.org/documentation/programming-guide/#1314-defining-a-urn
>>>
>>> Thanks,
>>> Cham
>>>
>>>
>>>> For example, a SchemaTransformProvider [1] for BigQueryIO.Write [7]
>>>> would be named:
>>>>
>>>> BigQueryWriteSchemaTransformProvider
>>>>
>>>>
>>>> And a SchemaTransformProvider for PubsubIO.Read [8] would be
>>>> named:
>>>>
>>>> PubsubReadSchemaTransformProvider
>>>>
>>>>
>>>> Definitions/References
>>>>
>>>> [1] *SchemaTransformProvider*: A way for us to instantiate Beam IO
>>>> transforms using a language-agnostic configuration.
>>>> SchemaTransformProvider builds a SchemaTransform [3] from a Beam Row [4] that
>>>> functions as the configuration of that SchemaTransformProvider.
>>>>
>>>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html
>>>>
>>>> [2] *Beam I/O*: PTransform for readi
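The two naming ideas raised in this thread (a unique left-hand part followed by the suffix SchemaTransform, SchemaTransformProvider, or SchemaTransformConfiguration, and a URN of the form beam:schematransform:<namespace>:<name>:v<version>) can be illustrated with a small sketch. The helpers `class_names` and `schematransform_urn` are hypothetical and not part of any Beam SDK; Python's `str.isidentifier()` is only an approximation of Java identifier rules.

```python
from typing import Dict

# Right-hand suffixes suggested in this thread; the left-hand part is unique
# to the transform (for example "BigQueryWrite" or "PubsubRead").
SUFFIXES = ("SchemaTransform", "SchemaTransformProvider", "SchemaTransformConfiguration")


def class_names(base: str) -> Dict[str, str]:
    """Hypothetical helper: map each suffix to the recommended Java class name."""
    # isidentifier() approximates Java identifier rules for ASCII names.
    if not base.isidentifier() or not base[0].isupper():
        raise ValueError(f"expected an UpperCamelCase identifier, got {base!r}")
    return {suffix: base + suffix for suffix in SUFFIXES}


def schematransform_urn(namespace: str, name: str, version: int = 1) -> str:
    """Hypothetical helper: build a URN shaped like the one proposed here,
    e.g. beam:schematransform:org.apache.beam:kafka_read_with_metadata:v1."""
    # Colons separate URN components, so they must not appear inside one.
    for part in (namespace, name):
        if not part or ":" in part:
            raise ValueError(f"invalid URN component: {part!r}")
    return f"beam:schematransform:{namespace}:{name}:v{version}"


print(class_names("BigQueryWrite")["SchemaTransformProvider"])
# BigQueryWriteSchemaTransformProvider
print(schematransform_urn("org.apache.beam", "kafka_read_with_metadata"))
# beam:schematransform:org.apache.beam:kafka_read_with_metadata:v1
```

Keeping the suffix fixed and the prefix free avoids baking Read/Write into the convention, which matches the point that arbitrary transforms can be SchemaTransforms.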

Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread Chamikara Jayalath via dev
On Tue, Nov 15, 2022 at 1:38 PM Reuven Lax via dev 
wrote:

> Out of curiosity, several IOs (including PubSub) already do support
> schemas. Are you planning on modifying those?
>

Schema-aware Transform is an overloaded term. I think this is about the
implementations of the following.
https://docs.google.com/document/d/1B-pxOjIA8Znl99nDRFEQMfr7VG91MZGfki2BPanjjZA/edit


>
> On Tue, Nov 15, 2022 at 11:50 AM Damon Douglas via dev <
> dev@beam.apache.org> wrote:
>
>> Hello Everyone,
>>
>> Do we like the following Java class naming convention for
>> SchemaTransformProviders [1]?  The proposal is:
>>
>> (Read|Write)SchemaTransformProvider
>>
>>
>> *For those new to Beam, even if this is your first day, consider
>> yourselves a welcome contributor to this conversation.  Below are
>> definitions/references and a suggested learning guide to understand this
>> email.*
>>
>> Explanation
>>
>> The prefix identifies the Beam I/O [2], and Read or Write identifies a
>> read or write PTransform, respectively.
>>
>> For example, a SchemaTransformProvider [1] for BigQueryIO.Write [7]
>> would be named:
>>
>> BigQueryWriteSchemaTransformProvider
>>
>>
>> And a SchemaTransformProvider for PubsubIO.Read [8] would be
>> named:
>>
>> PubsubReadSchemaTransformProvider
>>
>>
>> Definitions/References
>>
>> [1] *SchemaTransformProvider*: A way for us to instantiate Beam IO
>> transforms using a language-agnostic configuration.
>> SchemaTransformProvider builds a SchemaTransform [3] from a Beam Row [4] that
>> functions as the configuration of that SchemaTransformProvider.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html
>>
>> [2] *Beam I/O*: PTransform for reading from or writing to sources and
>> sinks.
>> https://beam.apache.org/documentation/programming-guide/#pipeline-io
>>
>> [3] *SchemaTransform*: An interface containing a buildTransform method
>> that returns a PCollectionRowTuple [6] to PCollectionRowTuple PTransform.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransform.html
>>
>> [4] *Row*: A Beam Row is a generic element of data whose properties are
>> defined by a Schema[5].
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/Row.html
>>
>> [5] *Schema*: A description of expected field names and their data types.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/Schema.html
>>
>> [6] *PCollectionRowTuple*: A tuple of PCollections of Beam Rows [4],
>> each tagged by a String name, used as a single PInput or POutput.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollectionRowTuple.html
>>
>> [7] *BigQueryIO.Write*: A PTransform for writing Beam elements to a
>> BigQuery table.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html
>>
>> [8] *PubsubIO.Read*: A PTransform for reading from Pub/Sub and emitting
>> message payloads into a PCollection.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html
>>
>> Suggested Learning/Reading to understand this email
>>
>> 1. https://beam.apache.org/documentation/programming-guide/#overview
>> 2. https://beam.apache.org/documentation/programming-guide/#transforms
>> (Up to 4.1)
>> 3. https://beam.apache.org/documentation/programming-guide/#pipeline-io
>> 4. https://beam.apache.org/documentation/programming-guide/#schemas
>>
>


Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread Chamikara Jayalath via dev
On Tue, Nov 15, 2022 at 11:50 AM Damon Douglas via dev 
wrote:

> Hello Everyone,
>
> Do we like the following Java class naming convention for
> SchemaTransformProviders [1]?  The proposal is:
>
> (Read|Write)SchemaTransformProvider
>
>
> *For those new to Beam, even if this is your first day, consider
> yourselves a welcome contributor to this conversation.  Below are
> definitions/references and a suggested learning guide to understand this
> email.*
>
> Explanation
>
> The prefix identifies the Beam I/O [2], and Read or Write identifies a
> read or write PTransform, respectively.
>

Schema-aware transforms are not restricted to I/Os. An arbitrary transform
can be a SchemaTransform. Also, the Read/Write designation does not map to
an arbitrary transform. Perhaps we should try to make this more generic?

Also, probably what's more important is that the identifier of the
SchemaTransformProvider is unique, not the class name (the latter is
guaranteed to be unique if we follow the Java package naming guidelines).

FWIW, we came up with a similar generic URN naming scheme for
cross-language transforms:
https://beam.apache.org/documentation/programming-guide/#1314-defining-a-urn

Thanks,
Cham


> For example, a SchemaTransformProvider [1] for BigQueryIO.Write [7]
> would be named:
>
> BigQueryWriteSchemaTransformProvider
>
>
> And a SchemaTransformProvider for PubsubIO.Read [8] would be
> named:
>
> PubsubReadSchemaTransformProvider
>
>
> Definitions/References
>
> [1] *SchemaTransformProvider*: A way for us to instantiate Beam IO
> transforms using a language-agnostic configuration.
> SchemaTransformProvider builds a SchemaTransform [3] from a Beam Row [4] that
> functions as the configuration of that SchemaTransformProvider.
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html
>
> [2] *Beam I/O*: PTransform for reading from or writing to sources and
> sinks.
> https://beam.apache.org/documentation/programming-guide/#pipeline-io
>
> [3] *SchemaTransform*: An interface containing a buildTransform method
> that returns a PCollectionRowTuple [6] to PCollectionRowTuple PTransform.
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransform.html
>
> [4] *Row*: A Beam Row is a generic element of data whose properties are
> defined by a Schema[5].
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/Row.html
>
> [5] *Schema*: A description of expected field names and their data types.
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/Schema.html
>
> [6] *PCollectionRowTuple*: A tuple of PCollections of Beam Rows [4],
> each tagged by a String name, used as a single PInput or POutput.
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollectionRowTuple.html
>
> [7] *BigQueryIO.Write*: A PTransform for writing Beam elements to a
> BigQuery table.
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html
>
> [8] *PubsubIO.Read*: A PTransform for reading from Pub/Sub and emitting
> message payloads into a PCollection.
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html
>
> Suggested Learning/Reading to understand this email
>
> 1. https://beam.apache.org/documentation/programming-guide/#overview
> 2. https://beam.apache.org/documentation/programming-guide/#transforms
> (Up to 4.1)
> 3. https://beam.apache.org/documentation/programming-guide/#pipeline-io
> 4. https://beam.apache.org/documentation/programming-guide/#schemas
>


[VOTE] Release 2.43.0, release candidate #2

2022-11-13 Thread Chamikara Jayalath via dev
Hi everyone,
Please review and vote on the release candidate #2 for the version 2.43.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


Reviewers are encouraged to test their own use cases with the release
candidate, and vote +1 if no issues are found.

The complete staging area is available for your review, which includes:
* GitHub Release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
40C61FBE1761E5DB652A1A780CCD5EB2A718A56E [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.43.0-RC2" [5],
* website pull request listing the release [6], the blog post [6], and
publishing the API reference manual [7].
* Java artifacts were built with Gradle 7.5.1 and openjdk version
1.8.0_181-google-v7.
* Python artifacts are deployed along with the source release to the
dist.apache.org [2] and PyPI[8].
* Go artifacts and documentation are available at pkg.go.dev [9]
* Validation sheet with a tab for 2.43.0 release to help with validation
[10].
* Docker images published to Docker Hub [11].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

For guidelines on how to try the release in your projects, check out our
blog post at https://beam.apache.org/blog/validate-beam-release/.

Thanks,
Cham

[1] https://github.com/apache/beam/milestone/5
[2] https://dist.apache.org/repos/dist/dev/beam/2.43.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1288/
[5] https://github.com/apache/beam/tree/v2.43.0-RC2
[6] https://github.com/apache/beam/pull/24044
[7] https://github.com/apache/beam-site/pull/636
[8] https://pypi.org/project/apache-beam/2.43.0rc2/
[9]
https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.43.0-RC2/go/pkg/beam
[10]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1310009119
[11] https://hub.docker.com/search?q=apache%2Fbeam=image
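The adoption rule stated above (open for at least 72 hours, majority approval, at least 3 PMC affirmative votes) can be sketched as a small tally function. This is an illustration under stated assumptions, not Apache or Beam tooling: `Vote` and `release_approved` are hypothetical, only a voter's latest vote is counted, and "majority" is assumed to compare binding +1 votes against binding -1 votes. The sample votes are the three binding +1 votes cast on RC2 in this thread; time zones are ignored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Vote:
    voter: str
    value: int     # +1, 0 or -1
    binding: bool  # True for PMC members


def release_approved(votes: List[Vote], opened: datetime, now: datetime) -> bool:
    """Hypothetical tally: open at least 72 hours, majority approval,
    and at least 3 binding (PMC) +1 votes."""
    if now - opened < timedelta(hours=72):
        return False  # the voting window has not elapsed yet
    # Assumption: if someone votes more than once, only their latest vote counts.
    latest = {v.voter: v for v in votes}
    binding = [v for v in latest.values() if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    # Assumption: "majority" compares binding +1 votes against binding -1 votes.
    return plus >= 3 and plus > minus


# The three binding +1 votes cast on 2.43.0 RC2 in this thread.
votes = [
    Vote("Ahmet Altay", 1, True),
    Vote("Jean-Baptiste Onofré", 1, True),
    Vote("Chamikara Jayalath", 1, True),
]
opened = datetime(2022, 11, 13, 15, 52)  # time zone ignored in this sketch
print(release_approved(votes, opened, datetime(2022, 11, 17, 12, 0)))  # True
```

Non-binding votes are still recorded but do not affect the result, which mirrors the "(binding)" / "(non-binding)" labels voters use in this thread.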


Re: bhulette stepping back (for now)

2022-11-12 Thread Chamikara Jayalath via dev
Good luck with your next endeavor, Brian! Thanks for all your contributions
to Beam (and hopefully more in the future when you have time :-) )

- Cham

On Fri, Nov 11, 2022 at 10:47 PM Moritz Mack  wrote:

> Also, thanks so much for all the great and thorough reviews! That was
> always much appreciated!
>
> All the best, Brian
>
>
>
> On 11.11.22, 23:23, "Ahmet Altay via dev"  wrote:
>
>
>
> Thank you for everything Brian!
>
>
>
> On Fri, Nov 11, 2022 at 11:27 AM Austin Bennett  wrote:
>
> Thanks for everything you've done, @bhule...@apache.org!
>
>
>
> On Fri, Nov 11, 2022 at 11:01 AM Pablo Estrada via dev <
> dev@beam.apache.org> wrote:
>
> I promised I wouldn't cry so I won't. Cya!
>
>
>
> On Fri, Nov 11, 2022 at 10:46 AM Robin Qiu via dev 
> wrote:
>
> Thanks for your contribution Brian! Hope you enjoy your new team!
>
>
>
> Best,
>
> Robin
>
>
>
> On Fri, Nov 11, 2022 at 10:27 AM Kenneth Knowles  wrote:
>
> Your contributions have been huge. You will be missed! But have a fabulous
> time with BigQuery. And thank you so much for letting us know [1]
>
>
>
> Kenn
>
>
>
> [1] See "stepping down considerately" from
> https://www.apache.org/foundation/policies/conduct.html
> 
>
>
>
> On Thu, Nov 10, 2022 at 4:00 PM Brian Hulette  wrote:
>
> Hi dev@beam,
>
>
>
> I just wanted to let the community know that I will be stepping back from
> Beam development for now. I'm switching to a different team within Google
> next week - I will be working on BigQuery.
>
>
>
> I'm removing myself from automated code review assignments [1], and won't
> actively monitor the beam lists anymore. That being said, I'm happy to
> contribute to discussions or code reviews when it would be particularly
> helpful, e.g. for anything relating to DataFrames/Schemas/SQL. I can always
> be reached at bhule...@apache.org, and @TheNeuralBit [2] on GitHub.
>
>
>
> Brian
>
>
>
> [1] https://github.com/apache/beam/pull/24108
> 
>
> [2] https://github.com/TheNeuralBit
> 
>
>
>
>


Re: [ANNOUNCE] New committer: Yi Hu

2022-11-11 Thread Chamikara Jayalath via dev
Congrats Yi!

On Thu, Nov 10, 2022 at 10:48 AM Kerry Donny-Clark via dev <
dev@beam.apache.org> wrote:

> Great job Yi! I am happy to see your contributions recognized.
>
> On Thu, Nov 10, 2022 at 11:52 AM Yi Hu via dev 
> wrote:
>
>> Thank you all for your help over time. I am glad to contribute to and
>> help the community.
>>
>> Best,
>> Yi
>>
>> On Thu, Nov 10, 2022 at 11:29 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Congratulations! Well deserved!
>>>
>>> —
>>> Alexey
>>>
>>> On 9 Nov 2022, at 21:01, Tomo Suzuki via dev 
>>> wrote:
>>>
>>> Congratulations!
>>>
>>> On Wed, Nov 9, 2022 at 3:00 PM John Casey via dev 
>>> wrote:
>>>
 Congrats! This is well deserved, Yi!

 On Wed, Nov 9, 2022 at 2:58 PM Austin Bennett <
 whatwouldausti...@gmail.com> wrote:

> Congrats, and Thanks, Yi!
>
> On Wed, Nov 9, 2022 at 11:24 AM Valentyn Tymofieiev via dev <
> dev@beam.apache.org> wrote:
>
>> I am with the Beam PMC on this, congratulations and very well
>> deserved, Yi!
>>
>> On Wed, Nov 9, 2022 at 11:08 AM Byron Ellis via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congratulations!
>>>
>>> On Wed, Nov 9, 2022 at 11:00 AM Pablo Estrada via dev <
>>> dev@beam.apache.org> wrote:
>>>
 +1 thanks Yi : D

 On Wed, Nov 9, 2022 at 10:47 AM Danny McCormick via dev <
 dev@beam.apache.org> wrote:

> Congrats Yi! I've really appreciated the ways you've consistently
> taken responsibility for improving our team's infra and working through
> sharp edges in the codebase that others have ignored. This is definitely
> well deserved!
>
> Thanks,
> Danny
>
> On Wed, Nov 9, 2022 at 1:37 PM Anand Inguva via dev <
> dev@beam.apache.org> wrote:
>
>> Congratulations Yi!
>>
>> On Wed, Nov 9, 2022 at 1:35 PM Ritesh Ghorse via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congratulations Yi!
>>>
>>> On Wed, Nov 9, 2022 at 1:34 PM Ahmed Abualsaud via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Congrats Yi!

 On Wed, Nov 9, 2022 at 1:33 PM Sachin Agarwal via dev <
 dev@beam.apache.org> wrote:

> Congratulations Yi!
>
> On Wed, Nov 9, 2022 at 10:32 AM Kenneth Knowles <
> k...@apache.org> wrote:
>
>> Hi all,
>>
>> Please join me and the rest of the Beam PMC in welcoming a
>> new committer: Yi Hu (y...@apache.org)
>>
>> Yi started contributing to Beam in early 2022. Yi's contributions
>> are very diverse: I/Os, performance tests, Jenkins, and support for
>> Schema logical types. Not only code, but also a very large amount of
>> code review. Yi is also noted for picking up smaller issues that would
>> normally be left on the back burner, and for filing issues that he finds
>> rather than ignoring them.
>>
>> Considering their contributions to the project over this timeframe,
>> the Beam PMC trusts Yi with the responsibilities of a Beam committer. [1]
>>
>> Thank you Yi! And we are looking forward to seeing more of your
>> contributions!
>>
>> Kenn, on behalf of the Apache Beam PMC
>>
>> [1]
>>
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>
>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>>
>>>


Re: [VOTE] Release 2.43.0, release candidate #1

2022-11-10 Thread Chamikara Jayalath via dev
Ack. Thanks for finding this.

- Cham

On Thu, Nov 10, 2022 at 5:42 PM Valentyn Tymofieiev 
wrote:

> -1.
> It looks like the format of Python wheels has changed.
> We should update the stager code and the Python container entrypoint code;
> otherwise, we will have a 2-minute pipeline start-time regression on some
> runners.
> Opened https://github.com/apache/beam/issues/24110
>
> On Thu, Nov 10, 2022 at 11:10 AM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>> Thanks folks.
>>
>> Blocking issues were https://github.com/apache/beam/issues/24065 and
>> https://github.com/apache/beam/pull/24041.
>>
>> I'll build RC2 when fixes are cherry-picked.
>>
>> This vote is now closed.
>>
>> - Cham
>>
>> On Thu, Nov 10, 2022 at 11:03 AM Anand Inguva 
>> wrote:
>>
>>> +1 (non-binding). Validated the Python SDK quickstart and the Beam
>>> RunInference examples on the Direct and Dataflow runners. Also verified
>>> the Python 3.10 artifacts.
>>>
>>>
>>> On Wed, Nov 9, 2022 at 1:40 PM Chamikara Jayalath via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Ack. There's another potential cherry-pick here:
>>>> https://github.com/apache/beam/pull/24041
>>>>
>>>> This should not prevent validation against RC1 for any other potential
>>>> regressions.
>>>>
>>>> I'll build an RC2 when cherry-picks are in.
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>> On Wed, Nov 9, 2022 at 9:30 AM Ritesh Ghorse via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> The DataFrame wrapper in the Go SDK is failing because of
>>>>> https://github.com/apache/beam/issues/24065. I have a PR here
>>>>> <https://github.com/apache/beam/pull/24066> to unblock the release.
>>>>> The current PR allows the DataFrame wrapper to work as expected, but a
>>>>> proper fix should be added when merging the RunInference wrapper.
>>>>>
>>>>> Thanks,
>>>>> Ritesh
>>>>>
>>>>>
>>>>> On Wed, Nov 9, 2022 at 8:40 AM Alexey Romanenko <
>>>>> aromanenko@gmail.com> wrote:
>>>>>
>>>>>> +1 (binding)
>>>>>>
>>>>>> Tested with https://github.com/Talend/beam-samples/
>>>>>> (Java SDK on Java 8 & 11, Spark 3 runner).
>>>>>>
>>>>>> ---
>>>>>> Alexey
>>>>>>
>>>>>> On 9 Nov 2022, at 01:38, Chamikara Jayalath via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>>
>>>>>> Hi everyone,
>>>>>> Please review and vote on the release candidate #1 for the version
>>>>>> 2.43.0, as follows:
>>>>>> [ ] +1, Approve the release
>>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>>
>>>>>>
>>>>>> Reviewers are encouraged to test their own use cases with the release
>>>>>> candidate, and vote +1 if no issues are found.
>>>>>>
>>>>>> The complete staging area is available for your review, which
>>>>>> includes:
>>>>>> * GitHub Release notes [1],
>>>>>> * the official Apache source release to be deployed to
>>>>>> dist.apache.org [2], which is signed with the key with fingerprint
>>>>>> 40C61FBE1761E5DB652A1A780CCD5EB2A718A56E [3],
>>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>>> * source code tag "v2.43.0-RC1" [5],
>>>>>> * website pull request listing the release [6], the blog post [6],
>>>>>> and publishing the API reference manual [7].
>>>>>> * Java artifacts were built with Gradle 7.5.1 and openjdk version
>>>>>> 1.8.0_181-google-v7.
>>>>>> * Python artifacts are deployed along with the source release to the
>>>>>> dist.apache.org [2] and PyPI[8].
>>>>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>>>>> * Validation sheet with a tab for 2.43.0 release to help with
>>>>>> validation [10].
>>>>>> * Docker images published to Docker Hub [11].
>>>>>>
>>>>>> The vote will be open for at least 72 hours. It is adopted by
>>>>>> majority approval, with at least 3 PMC affirmative votes.
>>>>>>
>>>>>> For guidelines on how to try the release in your projects, check out
>>>>>> our blog post at https://beam.apache.org/blog/validate-beam-release/.
>>>>>>
>>>>>> Thanks,
>>>>>> Cham
>>>>>>
>>>>>> [1] https://github.com/apache/beam/milestone/5
>>>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.43.0/
>>>>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>>>> [4]
>>>>>> https://repository.apache.org/content/repositories/orgapachebeam-1287/
>>>>>> [5] https://github.com/apache/beam/tree/v2.43.0-RC1
>>>>>> [6] https://github.com/apache/beam/pull/24044
>>>>>> [7] https://github.com/apache/beam-site/pull/635
>>>>>> [8] https://pypi.org/project/apache-beam/2.43.0rc1/
>>>>>> [9]
>>>>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.43.0-RC1/go/pkg/beam
>>>>>> [10]
>>>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1310009119
>>>>>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>>>>>
>>>>>>
>>>>>>


Re: [VOTE] Release 2.43.0, release candidate #1

2022-11-10 Thread Chamikara Jayalath via dev
Thanks folks.

Blocking issues were https://github.com/apache/beam/issues/24065 and
https://github.com/apache/beam/pull/24041.

I'll build RC2 when fixes are cherry-picked.

This vote is now closed.

- Cham

On Thu, Nov 10, 2022 at 11:03 AM Anand Inguva 
wrote:

> +1 (non-binding). Validated the Python SDK quickstart and the Beam
> RunInference examples on the Direct and Dataflow runners. Also verified
> the Python 3.10 artifacts.
>
>
> On Wed, Nov 9, 2022 at 1:40 PM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>> Ack. There's another potential cherry-pick here:
>> https://github.com/apache/beam/pull/24041
>>
>> This should not prevent validation against RC1 for any other potential
>> regressions.
>>
>> I'll build an RC2 when cherry-picks are in.
>>
>> Thanks,
>> Cham
>>
>> On Wed, Nov 9, 2022 at 9:30 AM Ritesh Ghorse via dev 
>> wrote:
>>
>>> The DataFrame wrapper in the Go SDK is failing because of
>>> https://github.com/apache/beam/issues/24065. I have a PR here
>>> <https://github.com/apache/beam/pull/24066> to unblock the release. The
>>> current PR allows the DataFrame wrapper to work as expected, but a proper
>>> fix should be added when merging the RunInference wrapper.
>>>
>>> Thanks,
>>> Ritesh
>>>
>>>
>>> On Wed, Nov 9, 2022 at 8:40 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
>>>> +1 (binding)
>>>>
>>>> Tested with https://github.com/Talend/beam-samples/
>>>> (Java SDK on Java 8 & 11, Spark 3 runner).
>>>>
>>>> ---
>>>> Alexey
>>>>
>>>> On 9 Nov 2022, at 01:38, Chamikara Jayalath via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>> Hi everyone,
>>>> Please review and vote on the release candidate #1 for the version
>>>> 2.43.0, as follows:
>>>> [ ] +1, Approve the release
>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>
>>>>
>>>> Reviewers are encouraged to test their own use cases with the release
>>>> candidate, and vote +1 if no issues are found.
>>>>
>>>> The complete staging area is available for your review, which includes:
>>>> * GitHub Release notes [1],
>>>> * the official Apache source release to be deployed to dist.apache.org
>>>> [2], which is signed with the key with fingerprint
>>>> 40C61FBE1761E5DB652A1A780CCD5EB2A718A56E [3],
>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>> * source code tag "v2.43.0-RC1" [5],
>>>> * website pull request listing the release [6], the blog post [6], and
>>>> publishing the API reference manual [7].
>>>> * Java artifacts were built with Gradle 7.5.1 and openjdk version
>>>> 1.8.0_181-google-v7.
>>>> * Python artifacts are deployed along with the source release to the
>>>> dist.apache.org [2] and PyPI[8].
>>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>>> * Validation sheet with a tab for 2.43.0 release to help with
>>>> validation [10].
>>>> * Docker images published to Docker Hub [11].
>>>>
>>>> The vote will be open for at least 72 hours. It is adopted by majority
>>>> approval, with at least 3 PMC affirmative votes.
>>>>
>>>> For guidelines on how to try the release in your projects, check out
>>>> our blog post at https://beam.apache.org/blog/validate-beam-release/.
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>> [1] https://github.com/apache/beam/milestone/5
>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.43.0/
>>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>> [4]
>>>> https://repository.apache.org/content/repositories/orgapachebeam-1287/
>>>> [5] https://github.com/apache/beam/tree/v2.43.0-RC1
>>>> [6] https://github.com/apache/beam/pull/24044
>>>> [7] https://github.com/apache/beam-site/pull/635
>>>> [8] https://pypi.org/project/apache-beam/2.43.0rc1/
>>>> [9]
>>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.43.0-RC1/go/pkg/beam
>>>> [10]
>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1310009119
>>>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>>>
>>>>
>>>>


Re: [VOTE] Release 2.43.0, release candidate #1

2022-11-09 Thread Chamikara Jayalath via dev
Ack. There's another potential cherry-pick here:
https://github.com/apache/beam/pull/24041

This should not prevent validation against RC1 for any other potential
regressions.

I'll build an RC2 when cherry-picks are in.

Thanks,
Cham

On Wed, Nov 9, 2022 at 9:30 AM Ritesh Ghorse via dev 
wrote:

> The DataFrame wrapper in the Go SDK is failing because of
> https://github.com/apache/beam/issues/24065. I have a PR here
> <https://github.com/apache/beam/pull/24066> to unblock the release. The
> current PR allows the DataFrame wrapper to work as expected, but a proper
> fix should be added when merging the RunInference wrapper.
>
> Thanks,
> Ritesh
>
>
> On Wed, Nov 9, 2022 at 8:40 AM Alexey Romanenko 
> wrote:
>
>> +1 (binding)
>>
>> Tested with https://github.com/Talend/beam-samples/
>> (Java SDK on Java 8 & 11, Spark 3 runner).
>>
>> ---
>> Alexey
>>
>> On 9 Nov 2022, at 01:38, Chamikara Jayalath via dev 
>> wrote:
>>
>> Hi everyone,
>> Please review and vote on the release candidate #1 for the version
>> 2.43.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if no issues are found.
>>
>> The complete staging area is available for your review, which includes:
>> * GitHub Release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with fingerprint
>> 40C61FBE1761E5DB652A1A780CCD5EB2A718A56E [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.43.0-RC1" [5],
>> * website pull request listing the release [6], the blog post [6], and
>> publishing the API reference manual [7].
>> * Java artifacts were built with Gradle 7.5.1 and openjdk version
>> 1.8.0_181-google-v7.
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2] and PyPI [8].
>> * Go artifacts and documentation are available at pkg.go.dev [9]
>> * Validation sheet with a tab for 2.43.0 release to help with validation
>> [10].
>> * Docker images published to Docker Hub [11].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> For guidelines on how to try the release in your projects, check out our
>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>>
>> Thanks,
>> Cham
>>
>> [1] https://github.com/apache/beam/milestone/5
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.43.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1287/
>> [5] https://github.com/apache/beam/tree/v2.43.0-RC1
>> [6] https://github.com/apache/beam/pull/24044
>> [7] https://github.com/apache/beam-site/pull/635
>> [8] https://pypi.org/project/apache-beam/2.43.0rc1/
>> [9]
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.43.0-RC1/go/pkg/beam
>> [10]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1310009119
>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>
>>
>>


[VOTE] Release 2.43.0, release candidate #1

2022-11-08 Thread Chamikara Jayalath via dev
Hi everyone,
Please review and vote on the release candidate #1 for the version 2.43.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


Reviewers are encouraged to test their own use cases with the release
candidate, and vote +1 if no issues are found.

The complete staging area is available for your review, which includes:
* GitHub Release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
40C61FBE1761E5DB652A1A780CCD5EB2A718A56E [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.43.0-RC1" [5],
* website pull request listing the release [6], the blog post [6], and
publishing the API reference manual [7].
* Java artifacts were built with Gradle 7.5.1 and openjdk version
1.8.0_181-google-v7.
* Python artifacts are deployed along with the source release to the
dist.apache.org [2] and PyPI [8].
* Go artifacts and documentation are available at pkg.go.dev [9]
* Validation sheet with a tab for 2.43.0 release to help with validation
[10].
* Docker images published to Docker Hub [11].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

For guidelines on how to try the release in your projects, check out our
blog post at https://beam.apache.org/blog/validate-beam-release/.

Thanks,
Cham

[1] https://github.com/apache/beam/milestone/5
[2] https://dist.apache.org/repos/dist/dev/beam/2.43.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1287/
[5] https://github.com/apache/beam/tree/v2.43.0-RC1
[6] https://github.com/apache/beam/pull/24044
[7] https://github.com/apache/beam-site/pull/635
[8] https://pypi.org/project/apache-beam/2.43.0rc1/
[9]
https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.43.0-RC1/go/pkg/beam
[10]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1310009119
[11] https://hub.docker.com/search?q=apache%2Fbeam=image
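Reviewers validating the staged source release [2] usually check both its checksum and its signature against the KEYS file [3]. Below is a minimal Python sketch of the checksum half only, assuming the artifact ships with a `.sha512` file whose first whitespace-separated token is the hex digest. GPG signature verification is not shown, and the function names are hypothetical, not part of any Beam tooling.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(checksum_file: Path) -> str:
    """Read the expected digest from a checksum file.

    Assumes the common "<hex digest>  <file name>" layout, i.e. the digest
    is the first whitespace-separated token of the file.
    """
    return checksum_file.read_text().split()[0].lower()


def verify_artifact(artifact: Path, checksum_file: Path) -> bool:
    """Return True if the artifact's SHA-512 matches the checksum file."""
    return sha512_of(artifact) == expected_digest(checksum_file)
```

For example, `verify_artifact(Path("apache-beam-2.43.0-source-release.zip"), Path("apache-beam-2.43.0-source-release.zip.sha512"))`, where both file names are hypothetical placeholders for whatever was downloaded from [2]. A checksum file in a different layout would need a different parser.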


Re: [PROPOSAL] Preparing for Apache Beam 2.43.0 Release

2022-11-08 Thread Chamikara Jayalath via dev
External artifacts have been built. I'm working on getting the Dataflow
containers and documentation set up, so the RC should be out for review/vote soon :)

On Tue, Nov 8, 2022 at 12:22 PM Ahmet Altay  wrote:

> Any progress on the RC? Any blockers we can help with?
>
> On Fri, Nov 4, 2022 at 9:05 PM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>> Update:
>>
>> RC creation is still ongoing. I hope to get it out for review early next
>> week.
>>
>> Thanks,
>> Cham
>>
>>
>>
>> On Fri, Nov 4, 2022 at 9:14 AM Ahmet Altay  wrote:
>>
>>> Thank you Cham!
>>>
>>> On Thu, Nov 3, 2022 at 10:54 PM Chamikara Jayalath 
>>> wrote:
>>>
>>>> The fix for the blocking issue was cherry-picked today, so I hope to build
>>>> RC1 tomorrow.
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>> On Thu, Nov 3, 2022 at 8:19 PM Ahmet Altay  wrote:
>>>>
>>>>> How is the release coming along? Do you need any help?
>>>>>
>>>>> On Mon, Oct 31, 2022 at 1:58 PM Chamikara Jayalath via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> Update:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I've been validating the release branch by running all Jenkins test
>>>>>> suites on it (as required by the release guide). This revealed two new
>>>>>> potential issues. I added these to the release milestone [1]. Please
>>>>>> comment on these issues if you are familiar with the errors (for example,
>>>>>> if they are known issues from a previous release). We can continue the
>>>>>> release once these are resolved or moved out of the 2.43.0 release
>>>>>> milestone.
>>>>>>
>>>>>> Thanks,
>>>>>> Cham
>>>>>>
>>>>>> [1] https://github.com/apache/beam/milestone/5
>>>>>>
>>>>>> On Wed, Oct 26, 2022 at 12:42 PM Chamikara Jayalath <
>>>>>> chamik...@google.com> wrote:
>>>>>>
>>>>>>> Update:
>>>>>>>
>>>>>>> All blocking issues have either been addressed or pushed to the next
>>>>>>> release. I'll go ahead and create the first RC.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Cham
>>>>>>>
>>>>>>> On Thu, Oct 20, 2022 at 9:41 AM Chamikara Jayalath <
>>>>>>> chamik...@google.com> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> The release branch was cut:
>>>>>>>> https://github.com/apache/beam/tree/release-2.43.0
>>>>>>>>
>>>>>>>> We currently have three open blockers in the release milestone:
>>>>>>>> https://github.com/apache/beam/milestone/5
>>>>>>>>
>>>>>>>> I'll look into cherry-picking fixes for these and hopefully
>>>>>>>> creating an RC early next week.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Cham
>>>>>>>>
>>>>>>>> On Wed, Oct 5, 2022 at 3:25 PM Ahmet Altay 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1 - Thank you Cham!
>>>>>>>>>
>>>>>>>>> On Wed, Oct 5, 2022 at 1:38 PM Chamikara Jayalath via dev <
>>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> The next (2.43.0) release branch cut is scheduled for October
>>>>>>>>>> 19th, according to the release calendar [1].
>>>>>>>>>>
>>>>>>>>>> I would like to volunteer myself to do this release. My plan is
>>>>>>>>>> to cut the branch on that date, and cherry-pick release-blocking fixes
>>>>>>>>>> afterwards, if any.
>>>>>>>>>>
>>>>>>>>>> Please help me make sure the release goes smoothly by:
>>>>>>>>>> - Making sure that any unresolved release-blocking issues for
>>>>>>>>>> 2.43.0 have their "Milestone" marked as "2.43.0 Release"
>>>>>>>>>> [2] as soon as possible.
>>>>>>>>>> - Reviewing the current release blockers [2] and removing the
>>>>>>>>>> Milestone if they don't meet the criteria at [3].
>>>>>>>>>>
>>>>>>>>>> Let me know if you have any comments/objections/questions.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Cham
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com
>>>>>>>>>> [2] https://github.com/apache/beam/milestone/5
>>>>>>>>>> [3] https://beam.apache.org/contribute/release-blocking/
>>>>>>>>>>
>>>>>>>>>


Re: [PROPOSAL] Preparing for Apache Beam 2.43.0 Release

2022-11-04 Thread Chamikara Jayalath via dev
Update:

RC creation is still ongoing. I hope to get it out for review early next
week.

Thanks,
Cham



On Fri, Nov 4, 2022 at 9:14 AM Ahmet Altay  wrote:

> Thank you Cham!
>
> On Thu, Nov 3, 2022 at 10:54 PM Chamikara Jayalath 
> wrote:
>
>> The fix for the blocking issue was cherry-picked today, so I hope to build
>> RC1 tomorrow.
>>
>> Thanks,
>> Cham
>>
>> On Thu, Nov 3, 2022 at 8:19 PM Ahmet Altay  wrote:
>>
>>> How is the release coming along? Do you need any help?
>>>
>>> On Mon, Oct 31, 2022 at 1:58 PM Chamikara Jayalath via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Update:
>>>>
>>>> Hi All,
>>>>
>>>> I've been validating the release branch by running all Jenkins test
>>>> suites on it (as required by the release guide). This revealed two new
>>>> potential issues. I added these to the release milestone [1]. Please
>>>> comment on these issues if you are familiar with the errors (for example,
>>>> if they are known issues from a previous release). We can continue the
>>>> release once these are resolved or moved out of the 2.43.0 release
>>>> milestone.
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>> [1] https://github.com/apache/beam/milestone/5
>>>>
>>>> On Wed, Oct 26, 2022 at 12:42 PM Chamikara Jayalath <
>>>> chamik...@google.com> wrote:
>>>>
>>>>> Update:
>>>>>
>>>>> All blocking issues have either been addressed or pushed to the next
>>>>> release. I'll go ahead and create the first RC.
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>> On Thu, Oct 20, 2022 at 9:41 AM Chamikara Jayalath <
>>>>> chamik...@google.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> The release branch was cut:
>>>>>> https://github.com/apache/beam/tree/release-2.43.0
>>>>>>
>>>>>> We currently have three open blockers in the release milestone:
>>>>>> https://github.com/apache/beam/milestone/5
>>>>>>
>>>>>> I'll look into cherry-picking fixes for these and hopefully creating
>>>>>> an RC early next week.
>>>>>>
>>>>>> Thanks,
>>>>>> Cham
>>>>>>
>>>>>> On Wed, Oct 5, 2022 at 3:25 PM Ahmet Altay  wrote:
>>>>>>
>>>>>>> +1 - Thank you Cham!
>>>>>>>
>>>>>>> On Wed, Oct 5, 2022 at 1:38 PM Chamikara Jayalath via dev <
>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> The next (2.43.0) release branch cut is scheduled for October
>>>>>>>> 19th, according to the release calendar [1].
>>>>>>>>
>>>>>>>> I would like to volunteer myself to do this release. My plan is to
>>>>>>>> cut the branch on that date, and cherry-pick release-blocking fixes
>>>>>>>> afterwards, if any.
>>>>>>>>
>>>>>>>> Please help me make sure the release goes smoothly by:
>>>>>>>> - Making sure that any unresolved release-blocking issues for 2.43.0
>>>>>>>> have their "Milestone" marked as "2.43.0 Release" [2] as
>>>>>>>> soon as possible.
>>>>>>>> - Reviewing the current release blockers [2] and removing the
>>>>>>>> Milestone if they don't meet the criteria at [3].
>>>>>>>>
>>>>>>>> Let me know if you have any comments/objections/questions.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Cham
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com
>>>>>>>> [2] https://github.com/apache/beam/milestone/5
>>>>>>>> [3] https://beam.apache.org/contribute/release-blocking/
>>>>>>>>
>>>>>>>


Re: [ANNOUNCE] New committer: Ritesh Ghorse

2022-11-04 Thread Chamikara Jayalath via dev
Congrats, Ritesh!

On Fri, Nov 4, 2022 at 9:34 AM John Casey via dev 
wrote:

> Congrats!
>
> On Fri, Nov 4, 2022 at 10:36 AM Ahmed Abualsaud via dev <
> dev@beam.apache.org> wrote:
>
>> Congrats Ritesh!
>>
>> On Fri, Nov 4, 2022 at 10:29 AM Andy Ye via dev 
>> wrote:
>>
>>> Congrats Ritesh!
>>>
>>> On Fri, Nov 4, 2022 at 9:26 AM Kerry Donny-Clark via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Congratulations Ritesh, I'm happy to see your hard work and community
 spirit recognized!

 On Fri, Nov 4, 2022 at 10:16 AM Jack McCluskey via dev <
 dev@beam.apache.org> wrote:

> Congrats Ritesh!
>
> On Thu, Nov 3, 2022 at 10:12 PM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> Congrats Ritesh! This is definitely well deserved!
>>
>> On Thu, Nov 3, 2022 at 8:08 PM Robert Burke 
>> wrote:
>>
>>> Woohoo! Well done Ritesh! :D
>>>
>>> On Thu, Nov 3, 2022, 5:04 PM Anand Inguva via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Congratulations Ritesh.

 On Thu, Nov 3, 2022 at 7:51 PM Yi Hu via dev 
 wrote:

> Congratulations Ritesh!
>
> On Thu, Nov 3, 2022 at 7:23 PM Byron Ellis via dev <
> dev@beam.apache.org> wrote:
>
>> Congratulations!
>>
>> On Thu, Nov 3, 2022 at 4:21 PM Austin Bennett <
>> whatwouldausti...@gmail.com> wrote:
>>
>>> Congratulations, and Thanks @riteshgho...@apache.org!
>>>
>>> On Thu, Nov 3, 2022 at 4:17 PM Sachin Agarwal via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Congrats Ritesh!

 On Thu, Nov 3, 2022 at 4:16 PM Kenneth Knowles 
 wrote:

> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming a new
> committer: Ritesh Ghorse (riteshgho...@apache.org)
>
> Ritesh started contributing to Beam in mid-2021 and has
> contributed immensely to bringing the Go SDK to fruition, in addition to
> contributions to Java and Python and release validation.
>
> Considering their contributions to the project over this
> timeframe, the Beam PMC trusts Ritesh with the responsibilities of a Beam
> committer. [1]
>
> Thank you Ritesh! And we are looking forward to seeing more of your
> contributions!
>
> Kenn, on behalf of the Apache Beam PMC
>
> [1]
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>



Re: [PROPOSAL] Preparing for Apache Beam 2.43.0 Release

2022-11-03 Thread Chamikara Jayalath via dev
The fix for the blocking issue was cherry-picked today, so I hope to build
RC1 tomorrow.

Thanks,
Cham

On Thu, Nov 3, 2022 at 8:19 PM Ahmet Altay  wrote:

> How is the release coming along? Do you need any help?
>
> On Mon, Oct 31, 2022 at 1:58 PM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>> Update:
>>
>> Hi All,
>>
>> I've been validating the release branch by running all Jenkins test
>> suites on it (as required by the release guide). This revealed two new
>> potential issues. I added these to the release milestone [1]. Please
>> comment on these issues if you are familiar with the errors (for example,
>> if they are known issues from a previous release). We can continue the
>> release once these are resolved or moved out of the 2.43.0 release
>> milestone.
>>
>> Thanks,
>> Cham
>>
>> [1] https://github.com/apache/beam/milestone/5
>>
>> On Wed, Oct 26, 2022 at 12:42 PM Chamikara Jayalath 
>> wrote:
>>
>>> Update:
>>>
>>> All blocking issues have either been addressed or pushed to the next
>>> release. I'll go ahead and create the first RC.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Thu, Oct 20, 2022 at 9:41 AM Chamikara Jayalath 
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> The release branch was cut:
>>>> https://github.com/apache/beam/tree/release-2.43.0
>>>>
>>>> We currently have three open blockers in the release milestone:
>>>> https://github.com/apache/beam/milestone/5
>>>>
>>>> I'll look into cherry-picking fixes for these and hopefully creating an
>>>> RC early next week.
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>> On Wed, Oct 5, 2022 at 3:25 PM Ahmet Altay  wrote:
>>>>
>>>>> +1 - Thank you Cham!
>>>>>
>>>>> On Wed, Oct 5, 2022 at 1:38 PM Chamikara Jayalath via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> The next (2.43.0) release branch cut is scheduled for October 19th,
>>>>>> according to the release calendar [1].
>>>>>>
>>>>>> I would like to volunteer myself to do this release. My plan is to
>>>>>> cut the branch on that date, and cherry-pick release-blocking fixes
>>>>>> afterwards, if any.
>>>>>>
>>>>>> Please help me make sure the release goes smoothly by:
>>>>>> - Making sure that any unresolved release-blocking issues for 2.43.0
>>>>>> have their "Milestone" marked as "2.43.0 Release" [2] as soon as
>>>>>> possible.
>>>>>> - Reviewing the current release blockers [2] and removing the
>>>>>> Milestone if they don't meet the criteria at [3].
>>>>>>
>>>>>> Let me know if you have any comments/objections/questions.
>>>>>>
>>>>>> Thanks,
>>>>>> Cham
>>>>>>
>>>>>> [1]
>>>>>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com
>>>>>> [2] https://github.com/apache/beam/milestone/5
>>>>>> [3] https://beam.apache.org/contribute/release-blocking/
>>>>>>
>>>>>


Re: Support existing IOs with Schema Transforms

2022-11-03 Thread Chamikara Jayalath via dev
Thanks for writing this. I added some comments. We should also consider
documenting the Schema-Aware Transform API and the SchemaTransform authoring
process in the Beam programming guide.

- Cham

On Thu, Nov 3, 2022 at 12:56 PM Ritesh Ghorse via dev 
wrote:

> Thanks for writing this! Having IOs that support SchemaTransforms would make
> it easier to write multi-language wrappers in the Go SDK.
>
> On Thu, Nov 3, 2022 at 2:36 PM Sachin Agarwal via dev 
> wrote:
>
>> I think this is a great idea: making as many existing IOs as possible
>> available to developers in any language is a huge win (and helps reduce the
>> need to re-implement IOs on a language-by-language basis going forward).
>>
>> On Thu, Nov 3, 2022 at 11:25 AM Ahmed Abualsaud via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi all,
>>>
>>> There has been an effort to add SchemaTransform capabilities to our
>>> connectors to facilitate the use of multi-lang pipelines. I've drafted a
>>> document below that provides guidelines and examples of how to support IOs
>>> with SchemaTransforms. Please take a look and share your thoughts and
>>> suggestions!
>>>
>>>  Supporting existing connectors with SchemaTrans...
>>> 
>>>
>>>
>>> Best,
>>> Ahmed
>>>
>>


Re: [PROPOSAL] Preparing for Apache Beam 2.43.0 Release

2022-10-31 Thread Chamikara Jayalath via dev
Update:

Hi All,

I've been validating the release branch by running all Jenkins test suites
on it (as required by the release guide). This revealed two new
potential issues. I added these to the release milestone [1]. Please
comment on these issues if you are familiar with the errors (for example,
if they are known issues from a previous release). We can continue the
release once these are resolved or moved out of the 2.43.0 release
milestone.

Thanks,
Cham

[1] https://github.com/apache/beam/milestone/5

On Wed, Oct 26, 2022 at 12:42 PM Chamikara Jayalath 
wrote:

> Update:
>
> All blocking issues have either been addressed or pushed to the next
> release. I'll go ahead and create the first RC.
>
> Thanks,
> Cham
>
> On Thu, Oct 20, 2022 at 9:41 AM Chamikara Jayalath 
> wrote:
>
>> Hi All,
>>
>> The release branch was cut:
>> https://github.com/apache/beam/tree/release-2.43.0
>>
>> We currently have three open blockers in the release milestone:
>> https://github.com/apache/beam/milestone/5
>>
>> I'll look into cherry-picking fixes for these and hopefully creating an RC
>> early next week.
>>
>> Thanks,
>> Cham
>>
>> On Wed, Oct 5, 2022 at 3:25 PM Ahmet Altay  wrote:
>>
>>> +1 - Thank you Cham!
>>>
>>> On Wed, Oct 5, 2022 at 1:38 PM Chamikara Jayalath via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Hi all,
>>>>
>>>> The next (2.43.0) release branch cut is scheduled for October 19th,
>>>> according to the release calendar [1].
>>>>
>>>> I would like to volunteer myself to do this release. My plan is to cut
>>>> the branch on that date, and cherry-pick release-blocking fixes afterwards,
>>>> if any.
>>>>
>>>> Please help me make sure the release goes smoothly by:
>>>> - Making sure that any unresolved release-blocking issues for 2.43.0
>>>> have their "Milestone" marked as "2.43.0 Release" [2] as soon as
>>>> possible.
>>>> - Reviewing the current release blockers [2] and removing the Milestone
>>>> if they don't meet the criteria at [3].
>>>>
>>>> Let me know if you have any comments/objections/questions.
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>> [1]
>>>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com
>>>> [2] https://github.com/apache/beam/milestone/5
>>>> [3] https://beam.apache.org/contribute/release-blocking/
>>>>
>>>


Re: [PROPOSAL] Preparing for Apache Beam 2.43.0 Release

2022-10-26 Thread Chamikara Jayalath via dev
Update:

All blocking issues have either been addressed or pushed to the next
release. I'll go ahead and create the first RC.

Thanks,
Cham

On Thu, Oct 20, 2022 at 9:41 AM Chamikara Jayalath 
wrote:

> Hi All,
>
> The release branch was cut:
> https://github.com/apache/beam/tree/release-2.43.0
>
> We currently have three open blockers in the release milestone:
> https://github.com/apache/beam/milestone/5
>
> I'll look into cherry-picking fixes for these and hopefully creating an RC
> early next week.
>
> Thanks,
> Cham
>
> On Wed, Oct 5, 2022 at 3:25 PM Ahmet Altay  wrote:
>
>> +1 - Thank you Cham!
>>
>> On Wed, Oct 5, 2022 at 1:38 PM Chamikara Jayalath via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi all,
>>>
>>> The next (2.43.0) release branch cut is scheduled for October 19th,
>>> according to the release calendar [1].
>>>
>>> I would like to volunteer myself to do this release. My plan is to cut
>>> the branch on that date, and cherry-pick release-blocking fixes afterwards,
>>> if any.
>>>
>>> Please help me make sure the release goes smoothly by:
>>> - Making sure that any unresolved release-blocking issues for 2.43.0
>>> have their "Milestone" marked as "2.43.0 Release" [2] as soon as
>>> possible.
>>> - Reviewing the current release blockers [2] and removing the Milestone
>>> if they don't meet the criteria at [3].
>>>
>>> Let me know if you have any comments/objections/questions.
>>>
>>> Thanks,
>>> Cham
>>>
>>> [1]
>>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com
>>> [2] https://github.com/apache/beam/milestone/5
>>> [3] https://beam.apache.org/contribute/release-blocking/
>>>
>>


Re: [PROPOSAL] Preparing for Apache Beam 2.43.0 Release

2022-10-20 Thread Chamikara Jayalath via dev
Hi All,

The release branch was cut:
https://github.com/apache/beam/tree/release-2.43.0

We currently have three open blockers in the release milestone:
https://github.com/apache/beam/milestone/5

I'll look into cherry-picking fixes for these and hopefully creating an RC
early next week.

Thanks,
Cham

On Wed, Oct 5, 2022 at 3:25 PM Ahmet Altay  wrote:

> +1 - Thank you Cham!
>
> On Wed, Oct 5, 2022 at 1:38 PM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>> Hi all,
>>
>> The next (2.43.0) release branch cut is scheduled for October 19th,
>> according to the release calendar [1].
>>
>> I would like to volunteer myself to do this release. My plan is to cut
>> the branch on that date, and cherry-pick release-blocking fixes afterwards,
>> if any.
>>
>> Please help me make sure the release goes smoothly by:
>> - Making sure that any unresolved release-blocking issues for 2.43.0
>> have their "Milestone" marked as "2.43.0 Release" [2] as soon as
>> possible.
>> - Reviewing the current release blockers [2] and removing the Milestone
>> if they don't meet the criteria at [3].
>>
>> Let me know if you have any comments/objections/questions.
>>
>> Thanks,
>> Cham
>>
>> [1]
>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com
>> [2] https://github.com/apache/beam/milestone/5
>> [3] https://beam.apache.org/contribute/release-blocking/
>>
>


Re: [VOTE] Release 2.42.0, release candidate #2

2022-10-14 Thread Chamikara Jayalath via dev
+1 (binding)

Thanks,
Cham

On Fri, Oct 14, 2022 at 5:43 AM Alexey Romanenko 
wrote:

> +1 (binding)
>
> Tested with https://github.com/Talend/beam-samples/
> (Java SDK v8 & v11, Spark 3 runner).
>
> ---
> Alexey
>
> On 14 Oct 2022, at 05:17, Ahmet Altay via dev  wrote:
>
> +1 (binding)
>
> Tested Python quickstart examples on the direct runner. Thank you!
>
> On Thu, Oct 13, 2022 at 5:35 PM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (binding)
>>
>> Validated release artifacts and signatures. Tested a Python pipeline
>> on a clean install.
>>
>> On Thu, Oct 13, 2022 at 1:22 PM Ritesh Ghorse via dev
>>  wrote:
>> >
>> > +1 (non-binding)
>> > Validated the Go SDK Quickstart on the Direct and Dataflow runners.
>> >
>> > Thanks,
>> > Ritesh Ghorse
>> >
>> > On Thu, Oct 13, 2022 at 4:01 PM Pablo Estrada via dev <
>> dev@beam.apache.org> wrote:
>> >>
>> >> +1 (binding)
>> >>
>> >> I've validated local/unit tests for existing Dataflow templates. They
>> >> look good!
>> >> Best
>> >> -P.
>> >>
>> >> On Thu, Oct 13, 2022 at 10:41 AM Ning Kang via dev <
>> dev@beam.apache.org> wrote:
>> >>>
>> >>> +1 Thank you, Robert!
>> >>>
>> >>> On Thu, Oct 13, 2022 at 12:47 AM Robert Burke 
>> wrote:
>> 
>>  Hi everyone,
>>  Please review and vote on the release candidate #2 for the version
>> 2.42.0, as follows:
>>  [ ] +1, Approve the release
>>  [ ] -1, Do not approve the release (please provide specific comments)
>> 
>>  Reviewers are encouraged to test their own use cases with the
>> release candidate, and vote +1 if no issues are found.
>> 
>>  The complete staging area is available for your review, which
>> includes:
>>  * GitHub Release notes [1],
>>  * the official Apache source release to be deployed to
>> dist.apache.org [2], which is signed with the key with fingerprint
>> A52F5C83BAE26160120EC25F3D56ACFBFB2975E1 [3],
>>  * all artifacts to be deployed to the Maven Central Repository [4],
>>  * source code tag "v2.42.0-RC2" [5],
>>  * website pull request listing the release [6], the blog post [6],
>> and publishing the API reference manual [7].
>>  * Java artifacts were built with Gradle 7.5.1 and AdoptOpenJDK
>> 1.8.0_292.
>>  * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2] and PyPI [8]
>>  * Go Package information and SDK RC [9]
>>  * Validation sheet with a tab for 2.42.0 release to help with
>> validation [10].
>>  * Docker images published to Docker Hub [11]. (Soon)
>> 
>>  The vote will be open for at least 72 hours. It is adopted by
>> majority approval, with at least 3 PMC affirmative votes.
>> 
>>  Updates from RC1 include a fix to SpannerIO backlog estimation [12]
>> and a fix to the BigQueryIO interpretation of coders on an internal flatten
>> [13]. Otherwise, previous validation should be unaffected.
>> 
>>  For guidelines on how to try the release in your projects, check out
>> our blog post at https://beam.apache.org/blog/validate-beam-release/.
>> 
>>  Thanks,
>>  Robert Burke
>>  2.42.0 Release Manager
>> 
>>  [1] https://github.com/apache/beam/milestone/4
>>  [2] https://dist.apache.org/repos/dist/dev/beam/2.42.0/
>>  [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>  [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1286/
>>  [5] https://github.com/apache/beam/tree/v2.42.0-RC2
>>  [6] https://github.com/apache/beam/pull/23406
>>  [7] https://github.com/apache/beam-site/pull/634
>>  [8] https://pypi.org/project/apache-beam/2.42.0rc2/
>>  [9]
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.42.0-RC2/go/pkg/beam
>>  [10]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=265602293
>>  [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>  [12] https://github.com/apache/beam/issues/23494
>>  [13] https://github.com/apache/beam/issues/23561
>> 
>>
>
>


Re: Java + Python Xlang pipeline

2022-10-09 Thread Chamikara Jayalath via dev
By default, it will use Docker. You can try to change the default
environment type using the option [1], but I'm not sure whether other
environment types will work for Flink Java x-lang pipelines.

Thanks,
Cham

[1]
https://github.com/apache/beam/blob/b94cff209cc8d1ae61cc916ff6b0b68561dc34c8/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L52
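The option in [1] controls the Java SDK's default environment. As a minimal sketch (not an official Beam utility), the hypothetical Python helper below assembles Java pipeline flags for overriding it, assuming the `--defaultEnvironmentType` and `--defaultEnvironmentConfig` flags of `PortablePipelineOptions` and the environment type names DOCKER, PROCESS, EXTERNAL and LOOPBACK. Whether a non-Docker type actually works for a Flink Java x-lang pipeline is exactly the open question above.

```python
# Environment type names assumed to be accepted by Beam portable runners.
KNOWN_ENVIRONMENT_TYPES = {"DOCKER", "PROCESS", "EXTERNAL", "LOOPBACK"}


def java_environment_args(env_type, env_config=None):
    """Build Java pipeline flags that override the default SDK environment.

    Hypothetical helper; assumes the --defaultEnvironmentType and
    --defaultEnvironmentConfig flags from Beam Java's PortablePipelineOptions.
    """
    env_type = env_type.upper()
    if env_type not in KNOWN_ENVIRONMENT_TYPES:
        raise ValueError(f"unknown environment type: {env_type}")
    args = [f"--defaultEnvironmentType={env_type}"]
    # For EXTERNAL, the config is assumed to be the worker pool's address.
    if env_config is not None:
        args.append(f"--defaultEnvironmentConfig={env_config}")
    return args
```

For instance, `java_environment_args("external", "localhost:50000")` would point the Java side at an already-running worker pool, where `localhost:50000` is a hypothetical address.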

On Fri, Oct 7, 2022 at 10:26 PM Xiao Ma  wrote:

> Thank you very much for the reply and explanation. For the Java Beam
> SDK, can it be started as a worker pool, like the Python worker pool with
> the --worker_pool option? Or does the Java SDK lack the external environment
> type, so that it has to be started as a Docker container?
>
> Thank you.
>
> Mark
>
> On Sat, Oct 8, 2022 at 12:08 AM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>>
>>
>> On Fri, Oct 7, 2022 at 6:29 PM Xiao Ma  wrote:
>>
>>> Hello,
>>>
>>> I would like to run a pipeline with Java as the main language and a Python
>>> transform embedded. The Beam pipeline runs on a Flink cluster.
>>> Currently, I can run it with a taskmanager + Java worker pool and a Python
>>> worker pool. Is there a way to run the Java code on the task
>>> manager directly and keep the Python worker pool?
>>>
>>> Current: taskmanager + java worker pool + python worker pool
>>> Desired: taskmanager + python worker pool
>>>
>>
>> Generally this is not possible. If the transform has to be executed on
>> the SDK side, the runner usually sets up an environment (for example, a
>> Docker container) with the corresponding SDK and executes the bundles with
>> the transform using the Beam Fn API.  Runners can choose to override this
>> by executing the transform within the runner itself, but you'll have to
>> modify the Flink runner to do this.
>>
>> Thanks,
>> Cham
>>
>>
>>>
>>> Thank you very much.
>>>
>>> *Mark Ma*
>>>
>>> --
> Xiao Ma
> Geotab
> Software Developer, Data Engineering | B.Sc, M.Sc
>


Re: Java + Python Xlang pipeline

2022-10-07 Thread Chamikara Jayalath via dev
On Fri, Oct 7, 2022 at 6:29 PM Xiao Ma  wrote:

> Hello,
>
> I would like to run a pipeline with Java as the main language and a Python
> transform embedded. The Beam pipeline runs on a Flink cluster.
> Currently, I can run it with a taskmanager + Java worker pool and a Python
> worker pool. Is there a way to run the Java code on the task
> manager directly and keep the Python worker pool?
>
> Current: taskmanager + java worker pool + python worker pool
> Desired: taskmanager + python worker pool
>

Generally this is not possible. If the transform has to be executed on the
SDK side, the runner usually sets up an environment (for example, a Docker
container) with the corresponding SDK and executes the bundles with the
transform using the Beam Fn API.  Runners can choose to override this by
executing the transform within the runner itself, but you'll have to modify
the Flink runner to do this.

Thanks,
Cham


>
> Thank you very much.
>
> *Mark Ma*
>
>


[PROPOSAL] Preparing for Apache Beam 2.43.0 Release

2022-10-05 Thread Chamikara Jayalath via dev
Hi all,

The next (2.43.0) release branch cut is scheduled for October 19th,
according to the release calendar [1].

I would like to volunteer myself to do this release. My plan is to cut the
branch on that date, and cherry-pick release-blocking fixes afterwards, if
any.

Please help me make sure the release goes smoothly by:
- Making sure that any unresolved release-blocking issues for 2.43.0
have their "Milestone" marked as "2.43.0 Release" [2] as soon as possible.
- Reviewing the current release blockers [2] and removing the Milestone if
they don't meet the criteria at [3].

Let me know if you have any comments/objections/questions.

Thanks,
Cham

[1]
https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com
[2] https://github.com/apache/beam/milestone/5
[3] https://beam.apache.org/contribute/release-blocking/

