Re: Getting started with Apache Beam Quest

2024-06-21 Thread Svetak Sundhar via dev
Hi Ronobir,

The quest is free now. Could you navigate to
https://www.cloudskillsboost.google/catalog?qlcampaign=3l-event-90,
search for "Getting Started with Apache Beam", and try again?

Let me know if you run into any issues,


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Tue, Jun 11, 2024 at 8:23 PM Svetak Sundhar 
wrote:

> Hi Ronobir,
>
> We're working on enabling the free code again. I'll follow up again here
> once it's enabled.
>
> Thanks for your patience!
>
>
> Svetak Sundhar
>
>   Data Engineer
> s vetaksund...@google.com
>
>
>
> On Thu, May 30, 2024 at 4:57 PM Ronobir Das  wrote:
>
>> Hey All,
>>
>> Are you guys still running the Apache Beam Quest promotion for a free
>> access code? I'm just starting to learn about Dataflow and Beam and while
>> taking the Coursera course series on Dataflow I was looking at Youtube
>> materials on Apache Beam. I saw the Apache Beam youtube channel and the
>> Beam Quest 2023 video which led me to the blog post.
>> When I try to click on the link for the access code and sign in to my
>> Google Cloud account, after I go to the Getting Started with Apache Beam
>> quest it's saying it will cost 21 credits so I thought I would ask if you
>> guys are still running the promo.
>>
>> Thank you so much!
>>
>> Cordially,
>> Ron
>>
>> -
>> Ronobir Das
>>
>>


Re: Getting started with Apache Beam Quest

2024-06-11 Thread Svetak Sundhar via dev
Hi Ronobir,

We're working on enabling the free code again. I'll follow up again here
once it's enabled.

Thanks for your patience!


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Thu, May 30, 2024 at 4:57 PM Ronobir Das  wrote:

> Hey All,
>
> Are you guys still running the Apache Beam Quest promotion for a free
> access code? I'm just starting to learn about Dataflow and Beam and while
> taking the Coursera course series on Dataflow I was looking at Youtube
> materials on Apache Beam. I saw the Apache Beam youtube channel and the
> Beam Quest 2023 video which led me to the blog post.
> When I try to click on the link for the access code and sign in to my
> Google Cloud account, after I go to the Getting Started with Apache Beam
> quest it's saying it will cost 21 credits so I thought I would ask if you
> guys are still running the promo.
>
> Thank you so much!
>
> Cordially,
> Ron
>
> -
> Ronobir Das
>
>


Re: Beam + Google Summer of Code 2024

2024-05-02 Thread Svetak Sundhar via dev
Welcome and welcome (back)!


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Thu, May 2, 2024 at 2:06 AM Reeba Qureshi  wrote:

> Hello everyone!
>
> I'm really excited to be working with Apache Beam again! Looking forward
> to it!
>
> Thanks,
> Reeba
>
> On Thu, 2 May, 2024, 10:04 Ayush Pandey,  wrote:
>
>> Hi Danny,
>>
>> Thank you for the kind introduction. I really look forward to
>> collaborating with and learning from this amazing community.
>>
>>
>> Best Regards,
>> Ayush
>>
>>
>> On Wed, 1 May 2024 at 14:40, XQ Hu  wrote:
>>
>>> Welcome to Beam!
>>>
>>> On Wed, May 1, 2024 at 4:13 PM Danny McCormick via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> It's my pleasure to announce 2 contributors have been accepted as GSoC
>>>> students for Beam this year!
>>>>
>>>> Ayush Pandey will be working on a project to implement RAG example
>>>> pipelines using Beam [1]. This will be a really valuable addition to Beam's
>>>> ML offering, showing how users can leverage things like MLTransform and
>>>> Enrichment for interacting with LLMs. @Jack McCluskey and I will be
>>>> mentoring Ayush for this project.
>>>>
>>>> Reeba Qureshi will be working on adding new features to Beam Yaml,
>>>> including onboarding new IOs and ML transforms [2]. This will help more
>>>> fully round out our growing Yaml offering and should make low code
>>>> pipelines even more attainable. Reeba also was a GSoC contributor last year
>>>> [3] and we're really excited to have her back! @Jeff Kinard and I will be
>>>> mentoring Reeba for this project.
>>>>
>>>> Welcome to the community Ayush, and welcome back Reeba!
>>>>
>>>> Thanks,
>>>> Danny
>>>>
>>>> [1]
>>>> https://docs.google.com/document/d/1M_8fvqKVBi68hQo_x1AMQ8iEkzeXTcSl0CwTH00cr80/edit#heading=h.mp9iumh7r8v
>>>> [2]
>>>> https://docs.google.com/document/d/1vXj1qhy0Asiosn3gFDgYVKYQs3Lsyj972klSv5_hfG8/edit
>>>> [3] https://lists.apache.org/thread/5yb0jr41xg1xonlxr97p0o06mnk3ktbb
>>>>
>>>


Re: [ANNOUNCE] New Committer: Svetak Sundhar

2024-02-13 Thread Svetak Sundhar via dev
Thanks everyone!! Looking forward to the continued collaboration :)


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Mon, Feb 12, 2024 at 9:58 PM Byron Ellis via dev 
wrote:

> Congrats Svetak!
>
> On Mon, Feb 12, 2024 at 6:57 PM Shunping Huang via dev <
> dev@beam.apache.org> wrote:
>
>> Congratulations, Svetak!
>>
>> On Mon, Feb 12, 2024 at 9:50 PM XQ Hu via dev 
>> wrote:
>>
>>> Great job, Svetak! Thanks for all your contributions to Beam!!!
>>>
>>> On Mon, Feb 12, 2024 at 4:44 PM Valentyn Tymofieiev via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Congrats, Svetak!
>>>>
>>>> On Mon, Feb 12, 2024 at 11:20 AM Kenneth Knowles
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>>> committer: Svetak Sundhar (sve...@apache.org).
>>>>>
>>>>> Svetak has been with Beam since 2021. Svetak has contributed code to
>>>>> many areas of Beam, including notebooks, Beam Quest, dataframes, and IOs.
>>>>> We also want to especially highlight the effort Svetak has put into
>>>>> improving Beam's documentation, participating in release validation, and
>>>>> evangelizing Beam.
>>>>>
>>>>> Considering his contributions to the project over this timeframe, the
>>>>> Beam PMC trusts Svetak with the responsibilities of a Beam committer. [1]
>>>>>
>>>>> Thank you Svetak! And we are looking to see more of your contributions!
>>>>>
>>>>> Kenn, on behalf of the Apache Beam PMC
>>>>>
>>>>> [1]
>>>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>



Re: [VOTE] Release 2.54.0, release candidate #2

2024-02-08 Thread Svetak Sundhar via dev
+1 (Non-Binding)

Tested with Python SDK on DirectRunner and Dataflow Runner


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Thu, Feb 8, 2024 at 12:45 PM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

> +1 (binding)
>
> Tried out Java/Python multi-lang jobs and upgrading BQ/Kafka transforms
> from 2.53.0 to 2.54.0 using the Transform Service.
>
> Thanks,
> Cham
>
> On Wed, Feb 7, 2024 at 5:52 PM XQ Hu via dev  wrote:
>
>> +1 (non-binding)
>>
>> Validated with a simple RunInference Python pipeline:
>> https://github.com/google/dataflow-ml-starter/actions/runs/7821639833/job/21339032997
>>
>> On Wed, Feb 7, 2024 at 7:10 PM Yi Hu via dev  wrote:
>>
>>> +1 (non-binding)
>>>
>>> Validated with Dataflow Template:
>>> https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/1317
>>>
>>> Regards,
>>>
>>> On Wed, Feb 7, 2024 at 11:18 AM Ritesh Ghorse via dev <
>>> dev@beam.apache.org> wrote:
>>>
 +1 (non-binding)

 Ran a few batch and streaming examples for Python SDK on Dataflow Runner

 Thanks!

 On Wed, Feb 7, 2024 at 4:08 AM Jan Lukavský  wrote:

> +1 (binding)
>
> Validated Java SDK with Flink runner.
>
>  Jan
> On 2/7/24 06:23, Robert Burke via dev wrote:
>
> Hi everyone,
> Please review and vote on the release candidate #2 for the version
> 2.54.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found. Only PMC member votes will
> count towards the final vote, but votes from all community members are
> encouraged and helpful for finding regressions; you can either test your
> own use cases [13] or use cases from the validation sheet [10].
>
> The complete staging area is available for your review, which includes:
> * GitHub Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which is signed with the key with fingerprint D20316F712213422 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.54.0-RC2" [5],
> * website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI[8].
> * Go artifacts and documentation are available at pkg.go.dev [9]
> * Validation sheet with a tab for 2.54.0 release to help with
> validation
> [10].
> * Docker images published to Docker Hub [11].
> * PR to run tests against release branch [12].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> For guidelines on how to try the release in your projects, check out
> our RC
> testing guide [13].
>
> Thanks,
> Robert Burke
> Beam 2.54.0 Release Manager
>
> [1] https://github.com/apache/beam/milestone/18?closed=1
> [2] https://dist.apache.org/repos/dist/dev/beam/2.54.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1368/
> [5] https://github.com/apache/beam/tree/v2.54.0-RC2
> [6] https://github.com/apache/beam/pull/30201
> [7] https://github.com/apache/beam-site/pull/659
> [8] https://pypi.org/project/apache-beam/2.54.0rc2/
> [9]
>
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.54.0-RC2/go/pkg/beam
> [10]
>
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=28763708
> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
> [12] https://github.com/apache/beam/pull/30104
> [13]
>
> https://github.com/apache/beam/blob/master/contributor-docs/rc-testing-guide.md
>
>
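The adoption rule quoted above (majority approval, with at least 3 PMC affirmative votes, and only PMC votes counting towards the result) can be sketched as a small tally function. This is an illustration only, not Beam or ASF tooling: the `Vote` type, the `vote_passes` name, and the voter names in the usage note are hypothetical, and reading "majority approval" as "more binding +1 than binding -1 votes" is an assumption. The 72-hour minimum voting window is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str     # hypothetical identifier for whoever cast the vote
    value: int     # +1 to approve, -1 to reject
    binding: bool  # True when the voter is a PMC member


def vote_passes(votes, min_binding_plus_ones=3):
    """Return True when the binding votes satisfy the quoted adoption rule."""
    # Non-binding votes are welcome feedback but do not count in the tally.
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    # Assumption: "majority approval" means more binding +1 than binding -1.
    return plus >= min_binding_plus_ones and plus > minus
```

For example, two binding +1 votes plus any number of non-binding +1 votes would not yet pass under this reading; a third binding +1 would.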


Re: [VOTE] Release 2.53.0, release candidate #2

2023-12-28 Thread Svetak Sundhar via dev
+1 (non binding)

Tested with Healthcare notebooks.


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Thu, Dec 28, 2023 at 3:52 AM Jan Lukavský  wrote:

> +1 (binding)
>
> Tested Java SDK with Flink Runner.
>
>  Jan
> On 12/27/23 14:13, Danny McCormick via dev wrote:
>
> +1 (non-binding)
>
> Tested with some example ML notebooks.
>
> Thanks,
> Danny
>
> On Tue, Dec 26, 2023 at 6:41 PM XQ Hu via dev  wrote:
>
>> +1 (non-binding)
>>
>> Tested with the simple RunInference pipeline:
>> https://github.com/google/dataflow-ml-starter/actions/runs/7332832875/job/19967521369
>>
>> On Tue, Dec 26, 2023 at 3:29 PM Jack McCluskey via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Happy holidays everyone,
>>>
>>> Please review and vote on the release candidate #2 for the version
>>> 2.53.0, as follows:
>>>
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>> Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>>> count towards the final vote, but votes from all community members are
>>> encouraged and helpful for finding regressions; you can either test your
>>> own use cases [13] or use cases from the validation sheet [10].
>>>
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org [2],
>>> which is signed with the key with fingerprint DF3CBA4F3F4199F4
>>> (D20316F712213422 if automated) [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.53.0-RC2" [5],
>>> * website pull request listing the release [6], the blog post [6], and
>>> publishing the API reference manual [7].
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI[8].
>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>> * Validation sheet with a tab for 2.53.0 release to help with validation
>>> [10].
>>> * Docker images published to Docker Hub [11].
>>> * PR to run tests against release branch [12].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> For guidelines on how to try the release in your projects, check out our
>>> RC testing guide [13].
>>>
>>> Thanks,
>>>
>>> Jack McCluskey
>>>
>>> [1] https://github.com/apache/beam/milestone/17
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.53.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1365/
>>> [5] https://github.com/apache/beam/tree/v2.53.0-RC2
>>> [6] https://github.com/apache/beam/pull/29856
>>> [7] https://github.com/apache/beam-site/pull/657
>>> [8] https://pypi.org/project/apache-beam/2.53.0rc2/
>>> [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.53.0-RC2/go/pkg/beam
>>> [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1290249774
>>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>> [12] https://github.com/apache/beam/pull/29758
>>> [13]
>>> https://github.com/apache/beam/blob/master/contributor-docs/rc-testing-guide.md
>>>
>>>
>>> --
>>>
>>>
>>> Jack McCluskey
>>> SWE - DataPLS PLAT/ Dataflow ML
>>> RDU
>>> jrmcclus...@google.com
>>>
>>>
>>>


Re: Free access code for Beam Quest

2023-12-12 Thread Svetak Sundhar via dev
Hi Jose,

Thanks for reaching out. The free access code is currently paused and will
resume in early January. I'll follow up here once it has resumed.

As for using the access code: once you create an account on Qwiklabs,
you'll be able to launch the labs in the quest with credits or with tokens
-- select "with credits".

Thanks, and do let me know if you have any questions,


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Tue, Dec 12, 2023 at 12:14 PM Jose Valencia 
wrote:

> Hi!
>
> I attended the Beam College sessions in October 2023 but I didn't have the
> chance to do the Apache Beam Quest "Getting Started with Apache Beam" labs.
>
> We were advised to go to Getting started with Apache Beam: An open source
> proficiency credential sponsored by Google Cloud
>  and get an Access
> Code there. However the link "Access Code" actually is
> https://www.cloudskillsboost.google/catalog?qlcampaign=1h-swiss-19 where
> a list of courses is shown and I can't find a code.
>
> When I go into the "Getting Started with Apache Beam" with this gmail
> account and try to start a course I'm asked to buy tokens.
>
> Is it possible to get a Free Access Code? How would I use it?
>
> Thanks!
> Jose L. Valencia
>


Re: [Discuss] Idea to increase RC voting participation

2023-12-05 Thread Svetak Sundhar via dev
o those who created the
>>>>>>>>>> issue.
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 23, 2023, 19:18 Robert Bradshaw via dev <
>>>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 23, 2023 at 7:26 AM Danny McCormick via dev <
>>>>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> So to summarize, I think there's broad consensus (or at least
>>>>>>>>>>>> lazy consensus) around the following:
>>>>>>>>>>>>
>>>>>>>>>>>> - (1) Updating our release email/guidelines to be more specific
>>>>>>>>>>>> about what we mean by release validation/how to be helpful during
>>>>>>>>>>>> this process. This includes both encouraging validation within each
>>>>>>>>>>>> user's own code base and encouraging people to document/share their
>>>>>>>>>>>> process of validation and link it in the release spreadsheet.
>>>>>>>>>>>> - (2) Doing something like what Airflow does (#29424
>>>>>>>>>>>> <https://github.com/apache/airflow/issues/29424>) and creating
>>>>>>>>>>>> an issue asking people who have contributed to the current release
>>>>>>>>>>>> to help validate their changes.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm also +1 on doing both of these. The first bit (updating our
>>>>>>>>>>>> guidelines) is relatively easy - it should just require updating
>>>>>>>>>>>> https://github.com/apache/beam/blob/master/contributor-docs/release-guide.md#vote-and-validate-the-release-candidate
>>>>>>>>>>>> .
>>>>>>>>>>>>
>>>>>>>>>>>> I took a look at the second piece (copying what Airflow does)
>>>>>>>>>>>> to see if we could just copy their automation, but it looks like
>>>>>>>>>>>> it's tied to airflow breeze
>>>>>>>>>>>> <https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/provider_issue_TEMPLATE.md.jinja2>
>>>>>>>>>>>> (their repo-specific automation tooling), so we'd probably need to
>>>>>>>>>>>> build the automation ourselves. It shouldn't be terrible: basically
>>>>>>>>>>>> we'd want a GitHub Action that compares the current release tag with
>>>>>>>>>>>> the last release tag, grabs all the commits in between, parses them
>>>>>>>>>>>> to get the author, and creates an issue with that data. It does
>>>>>>>>>>>> represent more effort than just updating a markdown file, though.
>>>>>>>>>>>> There might even be an existing Action that can help with this; I
>>>>>>>>>>>> haven't looked too hard.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I was thinking along the lines of a script that would scrape the
>>>>>>>>>>> issues resolved in a given release and add a comment to them noting
>>>>>>>>>>> that the change is in release N and encouraging (with clear
>>>>>>>>>>> instructions) how this can be validated. Creating a "validate this
>>>>>>>>>>> release" issue with all "contributing" participants could be an
>>>>>>>>>>> interesting way to do this
>>>>
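
The automation described in the quoted thread (compare the current release tag with the previous one, collect the commits in between, extract the authors, and turn that into an issue) could start from something like the sketch below. It is not existing Beam tooling: the function names and the issue wording are hypothetical, and actually filing the issue (for example from a GitHub Action) is deliberately left out. It assumes `git` is on the PATH and that both tags exist in the local clone.

```python
import subprocess
from collections import OrderedDict


def parse_log(log_text):
    """Split tab-separated `git log` output into (author, subject) pairs."""
    pairs = []
    for line in log_text.splitlines():
        if "\t" in line:
            author, subject = line.split("\t", 1)
            pairs.append((author.strip(), subject.strip()))
    return pairs


def commits_between(previous_tag, current_tag, repo_dir="."):
    """Return (author, subject) pairs for commits in previous_tag..current_tag."""
    # %an is the author name, %x09 a literal tab, %s the commit subject.
    result = subprocess.run(
        ["git", "log", "--no-merges", "--format=%an%x09%s",
         f"{previous_tag}..{current_tag}"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return parse_log(result.stdout)


def issue_body(release, pairs):
    """Group commit subjects by author and render a Markdown checklist."""
    by_author = OrderedDict()
    for author, subject in pairs:
        by_author.setdefault(author, []).append(subject)
    lines = [f"Please help validate your changes in Beam {release}:", ""]
    for author, subjects in by_author.items():
        lines.append(f"- [ ] {author}")
        lines.extend(f"  - {s}" for s in subjects)
    return "\n".join(lines)
```

Keeping the parsing and rendering separate from the `git` call means the text handling can be checked without a repository; mapping author names to GitHub handles for mentions would need an extra step not shown here.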

Fwd: [VOTE] Release 2.52.0, release candidate #5

2023-11-17 Thread Svetak Sundhar via dev
Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



-- Forwarded message -
From: Danny McCormick via dev 
Date: Mon, Nov 13, 2023 at 6:07 PM
Subject: [VOTE] Release 2.52.0, release candidate #5
To: dev 
Cc: Danny McCormick 


Hi everyone,
Please review and vote on the release candidate #5 for the version 2.52.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


Reviewers are encouraged to test their own use cases with the release
candidate, and vote +1 if no issues are found. Only PMC member votes will
count towards the final vote, but votes from all community members are
encouraged and helpful for finding regressions; you can either test your
own use cases or use cases from the validation sheet [10].

The complete staging area is available for your review, which includes:

   - GitHub Release notes [1]
   - the official Apache source release to be deployed to dist.apache.org [2],
   which is signed with the key with fingerprint D20316F712213422 [3]
   - all artifacts to be deployed to the Maven Central Repository [4]
   - source code tag "v2.52.0-RC5" [5]
   - website pull request listing the release [6], the blog post [6], and
   publishing the API reference manual [7]
   - Python artifacts are deployed along with the source release to the
   dist.apache.org [2] and PyPI[8].
   - Go artifacts and documentation are available at pkg.go.dev [9]
   - Validation sheet with a tab for 2.52.0 release to help with validation
   [10]
   - Docker images published to Docker Hub [11]
   - PR to run tests against release branch [12]


The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

For guidelines on how to try the release in your projects, check out our
blog post at https://beam.apache.org/blog/validate-beam-release/.

Thanks,
Danny

[1] https://github.com/apache/beam/milestone/16
[2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1363/
[5] https://github.com/apache/beam/tree/v2.52.0-RC5
[6] https://github.com/apache/beam/pull/29331
[7] https://github.com/apache/beam-site/pull/655
[8] https://pypi.org/project/apache-beam/2.52.0rc5/
[9]
https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC5/go/pkg/beam
[10]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
[11] https://hub.docker.com/search?q=apache%2Fbeam=image
[12] https://github.com/apache/beam/pull/29418


Re: [VOTE] Release 2.52.0, release candidate #5

2023-11-16 Thread Svetak Sundhar via dev
+1 (non binding)

Validated on Python use cases.


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Wed, Nov 15, 2023 at 8:52 AM Jan Lukavský  wrote:

> +1 (binding)
>
> Validated Java SDK with Flink runner on own use cases.
>
>   Jan
>
> On 11/15/23 11:35, Jean-Baptiste Onofré wrote:
> > +1 (binding)
> >
> > Quickly tested Java SDK and checked the legal part (hash, signatures,
> headers).
> >
> > Regards
> > JB
> >
> > On Tue, Nov 14, 2023 at 12:06 AM Danny McCormick via dev
> >  wrote:
> >> Hi everyone,
> >> Please review and vote on the release candidate #5 for the version
> 2.52.0, as follows:
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >>
> >> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found. Only PMC member votes will
> count towards the final vote, but votes from all community members are
> encouraged and helpful for finding regressions; you can either test your
> own use cases or use cases from the validation sheet [10].
> >>
> >> The complete staging area is available for your review, which includes:
> >>
> >> GitHub Release notes [1]
> >> the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint D20316F712213422 [3]
> >> all artifacts to be deployed to the Maven Central Repository [4]
> >> source code tag "v2.52.0-RC5" [5]
> >> website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7]
> >> Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI[8].
> >> Go artifacts and documentation are available at pkg.go.dev [9]
> >> Validation sheet with a tab for 2.52.0 release to help with validation
> [10]
> >> Docker images published to Docker Hub [11]
> >> PR to run tests against release branch [12]
> >>
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> >>
> >> For guidelines on how to try the release in your projects, check out
> our blog post at https://beam.apache.org/blog/validate-beam-release/.
> >>
> >> Thanks,
> >> Danny
> >>
> >> [1] https://github.com/apache/beam/milestone/16
> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1363/
> >> [5] https://github.com/apache/beam/tree/v2.52.0-RC5
> >> [6] https://github.com/apache/beam/pull/29331
> >> [7] https://github.com/apache/beam-site/pull/655
> >> [8] https://pypi.org/project/apache-beam/2.52.0rc5/
> >> [9]
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC5/go/pkg/beam
> >> [10]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
> >> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
> >> [12] https://github.com/apache/beam/pull/29418
>


Re: [VOTE] Release 2.52.0, release candidate #3

2023-11-10 Thread Svetak Sundhar via dev
+1 Non Binding -- tested Python SDK batch.


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Fri, Nov 10, 2023 at 2:58 PM Danny McCormick via dev 
wrote:

> > Note: the release guide and blog post say the RC image has a tag
> > "${RELEASE_VERSION}_rc{RC_NUM}", whereas the actual tags on Docker Hub
> > are mostly "${RELEASE_VERSION}rc{RC_NUM}" without the "_" since 2.40.0.
> > If this is the new standard we may want to update all places where this
> > is stated?
> Yep, we should update! If you put up a PR I'm happy to approve :)
> otherwise I can loop it into my post release docs update.
>
> Thanks,
> Danny
>
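
The naming mismatch discussed here is easier to see side by side. The sketch below derives the release-candidate identifiers that appear in these vote threads (the Git tag form "v2.52.0-RC3", the PyPI form "2.52.0rc3", and the Docker tag with or without the underscore). The function name and the `legacy_docker_tag` flag are hypothetical, introduced only for illustration; the formats themselves are taken from the messages in this archive.

```python
def rc_identifiers(release_version, rc_num, legacy_docker_tag=False):
    """Build the RC identifiers used across a Beam release vote.

    legacy_docker_tag=True yields the underscore form that the release guide
    and blog post were said to describe; the default matches what the thread
    reports as the actual Docker Hub tags since 2.40.0.
    """
    separator = "_" if legacy_docker_tag else ""
    return {
        "git_tag": f"v{release_version}-RC{rc_num}",
        "pypi_version": f"{release_version}rc{rc_num}",
        "docker_tag": f"{release_version}{separator}rc{rc_num}",
    }
```

Under these assumptions, `rc_identifiers("2.52.0", 3)["docker_tag"]` gives `2.52.0rc3`, while passing `legacy_docker_tag=True` gives `2.52.0_rc3`.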
> On Fri, Nov 10, 2023 at 2:00 PM Johanna Öjeling via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (non-binding)
>>
>> Tested the Go SDK on Dataflow with own use cases.
>>
>> Note: the release guide and blog post say the RC image has a tag
>> "${RELEASE_VERSION}_rc{RC_NUM}", whereas the actual tags on Docker Hub
>> are mostly "${RELEASE_VERSION}rc{RC_NUM}" without the "_" since 2.40.0.
>> If this is the new standard we may want to update all places where this
>> is stated?
>>
>> Johanna
>>
>> On Fri, Nov 10, 2023 at 5:56 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 (binding)
>>>
>>> Artifacts and signatures look good, validated one of the Python wheels
>>> in a fresh install.
>>>
>>> On Fri, Nov 10, 2023 at 7:23 AM Alexey Romanenko
>>>  wrote:
>>> >
>>> > +1 (binding)
>>> >
>>> > Java SDK with Spark runner
>>> >
>>> > —
>>> > Alexey
>>> >
>>> > On 9 Nov 2023, at 16:44, Ritesh Ghorse via dev 
>>> wrote:
>>> >
>>> > +1 (non-binding)
>>> >
>>> > Validated Python SDK quickstart batch and streaming.
>>> >
>>> > Thanks!
>>> >
>>> > On Thu, Nov 9, 2023 at 9:25 AM Jan Lukavský  wrote:
>>> >>
>>> >> +1 (binding)
>>> >>
>>> >> Validated Java SDK with Flink runner on own use cases.
>>> >>
>>> >>  Jan
>>> >>
>>> >> On 11/9/23 03:31, Danny McCormick via dev wrote:
>>> >>
>>> >> Hi everyone,
>>> >> Please review and vote on the release candidate #3 for the version
>>> 2.52.0, as follows:
>>> >> [ ] +1, Approve the release
>>> >> [ ] -1, Do not approve the release (please provide specific comments)
>>> >>
>>> >>
>>> >> Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>>> count towards the final vote, but votes from all community members are
>>> encouraged and helpful for finding regressions; you can either test your
>>> own use cases or use cases from the validation sheet [10].
>>> >>
>>> >> The complete staging area is available for your review, which
>>> includes:
>>> >>
>>> >> GitHub Release notes [1]
>>> >> the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with fingerprint D20316F712213422 [3]
>>> >> all artifacts to be deployed to the Maven Central Repository [4]
>>> >> source code tag "v2.52.0-RC3" [5]
>>> >> website pull request listing the release [6], the blog post [6], and
>>> publishing the API reference manual [7]
>>> >> Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI[8].
>>> >> Go artifacts and documentation are available at pkg.go.dev [9]
>>> >> Validation sheet with a tab for 2.52.0 release to help with
>>> validation [10]
>>> >> Docker images published to Docker Hub [11]
>>> >> PR to run tests against release branch [12]
>>> >>
>>> >>
>>> >> The vote will be open for at least 72 hours. It is adopted by
>>> majority approval, with at least 3 PMC affirmative votes.
>>> >>
>>> >> For guidelines on how to try the release in your projects, check out
>>> our blog post at https://beam.apache.org/blog/validate-beam-release/.
>>> >>
>>> >> Thanks,
>>> >> Danny
>>> >>
>>> >> [1] https://github.com/apache/beam/milestone/16
>>> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
>>> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> >> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1361/
>>> >> [5] https://github.com/apache/beam/tree/v2.52.0-RC3
>>> >> [6] https://github.com/apache/beam/pull/29331
>>> >> [7] https://github.com/apache/beam-site/pull/653
>>> >> [8] https://pypi.org/project/apache-beam/2.52.0rc2/
>>> >> [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC3/go/pkg/beam
>>> >> [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
>>> >> [11] 

Re: [VOTE] Release 2.52.0, release candidate #2

2023-11-08 Thread Svetak Sundhar via dev
Thanks, Danny!

@all: Reminder that if there's anything you think is worth documenting
while RC testing, please feel free to add it here.

We can then use it to update
https://github.com/apache/beam/blob/master/contributor-docs/release-guide.md#vote-and-validate-the-release-candidate
.

Thanks,


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Wed, Nov 8, 2023 at 9:04 AM Jean-Baptiste Onofré  wrote:

> +1 (binding)
>
> Regards
> JB
>
> On Wed, Nov 8, 2023 at 12:24 AM Danny McCormick via dev
>  wrote:
> >
> > Hi everyone,
> > Please review and vote on the release candidate #2 for the version
> 2.52.0, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found. Only PMC member votes will
> count towards the final vote, but votes from all community members are
> encouraged and helpful for finding regressions; you can either test your
> own use cases or use cases from the validation sheet [10].
> >
> > The complete staging area is available for your review, which includes:
> >
> > GitHub Release notes [1]
> > the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint D20316F712213422 [3]
> > all artifacts to be deployed to the Maven Central Repository [4]
> > source code tag "v2.52.0-RC2" [5]
> > website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7]
> > Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI[8].
> > Go artifacts and documentation are available at pkg.go.dev [9]
> > Validation sheet with a tab for 2.52.0 release to help with validation
> [10]
> > Docker images published to Docker Hub [11]
> > PR to run tests against release branch [12]
> >
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> >
> > For guidelines on how to try the release in your projects, check out our
> blog post at https://beam.apache.org/blog/validate-beam-release/.
> >
> > Thanks,
> > Danny
> >
> > [1] https://github.com/apache/beam/milestone/16
> > [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1360/
> > [5] https://github.com/apache/beam/tree/v2.52.0-RC2
> > [6] https://github.com/apache/beam/pull/29331
> > [7] https://github.com/apache/beam-site/pull/652
> > [8] https://pypi.org/project/apache-beam/2.52.0rc2/
> > [9]
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC2/go/pkg/beam
> > [10]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
> > [11] https://hub.docker.com/search?q=apache%2Fbeam=image
> > [12] https://github.com/apache/beam/pull/29319
>


Re: Credentials Rotation Failure on IO-Datastores cluster

2023-10-31 Thread Svetak Sundhar via dev
I took a quick look -- the error is the following:

*22:17:26* ERROR: (gcloud.container.clusters.update) ResponseError:
code=400, message=Operation
operation-1698804621818-e9c8fe33-d4a2-44cd-86aa-9c4e09dea259 is
currently upgrading cluster io-datastores. Please wait and try again
once it is done.




This is different than the last time this error happened
(https://lists.apache.org/thread/xw2hx8yycpfmhf64w0vyt96r0d8zwnyg)


I noticed node pool pool-1 was still updating when this error was
sent, so I think it should succeed now.


Should we retrigger the seed job manually?
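
Since the error only says the cluster is busy and asks to wait and try again, one option is to wrap the failing step in a bounded retry. The sketch below is a generic helper, not part of the Jenkins job or of gcloud: `retry_while_busy`, its parameters, and the way the busy condition is detected (matching the error text) are all assumptions made for illustration.

```python
import time


def retry_while_busy(action, is_busy_error, attempts=5, delay_seconds=60.0,
                     sleep=time.sleep):
    """Call action(); retry only while it fails with a 'busy' error.

    is_busy_error(exc) decides whether an exception means "operation still in
    progress". Any other error, or a busy error on the last attempt, is
    re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            if not is_busy_error(exc) or attempt == attempts:
                raise
            # Wait before the next attempt so the upgrade can finish.
            sleep(delay_seconds)
```

A caller could pass a function that runs the credentials rotation and a predicate such as `lambda e: "currently upgrading cluster" in str(e)`; whether automatic retries are preferable to retriggering the seed job manually is a separate judgment call.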



Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Tue, Oct 31, 2023 at 10:17 PM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Something went wrong during the automatic credentials rotation for
> IO-Datastores Cluster, performed at Wed Nov 01 00:52:45 UTC 2023. It may be
> necessary to check the state of the cluster certificates. For further
> details refer to the following links:
>  * Failing job:
> https://ci-beam.apache.org/job/Rotate%20IO-Datastores%20Cluster%20Credentials/
>  * Job configuration:
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_IODatastoresCredentialsRotation.groovy
>  * Cluster URL:
> https://pantheon.corp.google.com/kubernetes/clusters/details/us-central1-a/io-datastores/details?mods=dataflow_dev=apache-beam-testing
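
The error quoted in the reply above is a transient conflict: the update was rejected because another operation was still running on the cluster. A minimal sketch of waiting and retrying in that situation follows. It is not the Jenkins job's actual logic; `retry_while_busy`, `run_update`, and the marker string are hypothetical names chosen for illustration, and the caller is assumed to supply a function that runs the real update command and returns its exit code and output.

```python
import time
from typing import Callable, Tuple

# Assumption: this substring identifies the transient "cluster is busy" error
# quoted in the thread. Adjust it if the real message differs.
BUSY_MARKER = "Please wait and try again"


class ClusterBusyError(RuntimeError):
    """Raised when the cluster was still busy after every attempt."""


def retry_while_busy(
    run_update: Callable[[], Tuple[int, str]],
    attempts: int = 5,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry a cluster update while it fails with the transient busy error.

    run_update is a hypothetical callable returning (exit_code, output).
    Any failure that is not the busy error is raised immediately.
    """
    last_output = ""
    for attempt in range(1, attempts + 1):
        exit_code, output = run_update()
        if exit_code == 0:
            return output
        if BUSY_MARKER not in output:
            # Not the transient case: surface the error instead of retrying.
            raise RuntimeError(output)
        last_output = output
        if attempt < attempts:
            sleep(delay_seconds)
    raise ClusterBusyError(last_output)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a manual retrigger of the job, as suggested in the reply, achieves the same effect by hand.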


Re: [Discuss] Idea to increase RC voting participation

2023-10-23 Thread Svetak Sundhar via dev
han having a more
>>>> general outline of how to install the RC and things to look out for when
>>>> testing.
>>>>
>>>> On Tue, Oct 17, 2023 at 12:55 PM Austin Bennett 
>>>> wrote:
>>>>
>>>>> Great effort.  I'm also interested in streamlining releases -- so if
>>>>> there are a lot of manual tests that could be automated, it would be great
>>>>> to discover and then look to address them.
>>>>>
>>>>> On Tue, Oct 17, 2023 at 8:47 AM Robert Bradshaw via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> I would also strongly suggest that people try out the release against
>>>>>> their own codebases. This has the benefit of ensuring the release won't
>>>>>> break your own code when it goes out, and stress-tests the new code
>>>>>> against real-world pipelines. (Ideally our own tests are all passing,
>>>>>> and this validation is automated as much as possible (though ensuring it
>>>>>> matches our documentation and works in a clean environment still has
>>>>>> value), but there's a lot of code and uses out there that we don't have
>>>>>> access to during normal Beam development.)
>>>>>>
>>>>>> On Tue, Oct 17, 2023 at 8:21 AM Svetak Sundhar via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I’ve participated in RC testing for a few releases and have observed
>>>>>>> a bit of a knowledge gap in how releases can be tested. Given that Beam
>>>>>>> encourages contributors to vote on RCs regardless of tenure, and that
>>>>>>> voting on an RC is a relatively low-effort, high-leverage way to
>>>>>>> influence the release of the library, I propose the following:
>>>>>>>
>>>>>>> During the vote for the next release, voters can document the
>>>>>>> process they followed in a separate document, and add the link in
>>>>>>> column G here
>>>>>>> <https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=437054928>.
>>>>>>> One step further could be recording a screencast of running the test
>>>>>>> and attaching a link to it.
>>>>>>>
>>>>>>> We can keep repeating this through releases until we have
>>>>>>> documentation for many of the different tests. We can then add these
>>>>>>> docs into the repo.
>>>>>>>
>>>>>>> I’m proposing this because I’ve gathered the following feedback from
>>>>>>> colleagues who are tangentially involved with Beam: they are
>>>>>>> interested in participating in release validation, but don’t know how
>>>>>>> to get started.
>>>>>>> Happy to hear other suggestions too, if there are any to address the
>>>>>>> above.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>
>>>>>>> Svetak Sundhar
>>>>>>>
>>>>>>>   Data Engineer
>>>>>>> s vetaksund...@google.com
>>>>>>>
>>>>>>>


[Discuss] Idea to increase RC voting participation

2023-10-17 Thread Svetak Sundhar via dev
Hi all,

I’ve participated in RC testing for a few releases and have observed a bit
of a knowledge gap in how releases can be tested. Given that Beam
encourages contributors to vote on RCs regardless of tenure, and that
voting on an RC is a relatively low-effort, high-leverage way to influence
the release of the library, I propose the following:

During the vote for the next release, voters can document the process they
followed in a separate document, and add the link in column G here
<https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=437054928>.
One step further could be recording a screencast of running the test and
attaching a link to it.

We can keep repeating this through releases until we have documentation for
many of the different tests. We can then add these docs into the repo.

I’m proposing this because I’ve gathered the following feedback from
colleagues who are tangentially involved with Beam: they are interested in
participating in release validation, but don’t know how to get started.
Happy to hear other suggestions too, if there are any to address the
above.

Thanks,


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com


Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-03 Thread Svetak Sundhar via dev
+1 (non-binding)

Tested Python Direct Runner and Dataflow Runner as well.

On the spreadsheet, I came across "Dataflow v1 (until 2.49.0, inclusive)",
and do not fully understand what this means.

Does this mean
(1) we shouldn't be testing on Dataflow runner v1 for releases after 2.49 or
(2) make sure we test on runner v1 for this release?

Thanks in advance for the clarification,



Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Tue, Oct 3, 2023 at 2:14 PM Danny McCormick via dev 
wrote:

> +1 (non-binding)
>
> Tested python/ML execution with
> https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_huggingface.ipynb
> (interactive runner) and
> https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/automatic_model_refresh.ipynb
> (Dataflow runner).
>
> Thanks,
> Danny
>
> On Tue, Oct 3, 2023 at 1:58 PM Kenneth Knowles  wrote:
>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #1 for the version
>> 2.51.0, as follows:
>>
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>> Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>> count towards the final vote, but votes from all community members are
>> encouraged and helpful for finding regressions; you can either test your
>> own use cases or use cases from the validation sheet [10].
>>
>> The complete staging area is available for your review, which includes:
>>
>>- GitHub Release notes [1],
>>- the official Apache source release to be deployed to dist.apache.org
>>[2], which is signed with the key with fingerprint  [3],
>>- all artifacts to be deployed to the Maven Central Repository [4],
>>- source code tag "v1.2.3-RC3" [5],
>>- website pull request listing the release [6], the blog post [6],
>>and publishing the API reference manual [7].
>>- Java artifacts were built with Gradle GRADLE_VERSION and
>>OpenJDK/Oracle JDK JDK_VERSION.
>>- Python artifacts are deployed along with the source release to the
>>dist.apache.org [2] and PyPI[8].
>>- Go artifacts and documentation are available at pkg.go.dev [9]
>>- Validation sheet with a tab for 1.2.3 release to help with
>>validation [10].
>>- Docker images published to Docker Hub [11].
>>- PR to run tests against release branch [12].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> For guidelines on how to try the release in your projects, check out our
>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>>
>> Thanks,
>> Kenn
>>
>> [1] https://github.com/apache/beam/milestone/15
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.51.0
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1356/
>> [5] https://github.com/apache/beam/tree/v2.51.0-RC1
>> [6] https://github.com/apache/beam/pull/28800
>> [7] https://github.com/apache/beam-site/pull/649
>> [8] https://pypi.org/project/apache-beam/2.51.0rc1/
>> [9]
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.51.0-RC1/go/pkg/beam
>> [10]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=437054928
>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>> [12] https://github.com/apache/beam/pull/28663
>>
>
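
For anyone following along with the Python SDK: the links in the vote email show that the source tag `v2.51.0-RC1` [5] corresponds to the PyPI version `2.51.0rc1` [8]. A small sketch of that mapping is below, useful when scripting a clean-environment install of a release candidate. The function names are hypothetical and the tag pattern is an assumption based only on the tags that appear in these threads.

```python
import re

# Assumption: RC tags look like "v<major>.<minor>.<patch>-RC<n>", as in the
# vote emails (for example "v2.51.0-RC1").
_RC_TAG = re.compile(r"v(\d+\.\d+\.\d+)-RC(\d+)")


def rc_tag_to_pypi_version(tag: str) -> str:
    """Convert a release-candidate git tag to its PyPI version string."""
    match = _RC_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a release-candidate tag: {tag!r}")
    base_version, rc_number = match.groups()
    return f"{base_version}rc{rc_number}"


def pip_install_command(tag: str) -> str:
    """Build the pip command that pins the RC published for this tag."""
    return f"pip install apache-beam=={rc_tag_to_pypi_version(tag)}"


if __name__ == "__main__":
    print(pip_install_command("v2.51.0-RC1"))
```

Pinning the exact pre-release version lets pip select the release candidate in a fresh virtual environment, which fits the "works in a clean environment" check discussed earlier on the list.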


Dicom IO De-ID Transform

2023-08-31 Thread Svetak Sundhar via dev
Hi Beam Community,

I have written up a design for implementing de-identification in our
existing Dicom IO (Java SDK), and would appreciate any thoughts or feedback!

A tl;dr on why this is important:

DICOM is the international standard to communicate and manage medical
images and data. A common need in healthcare is to "de-identify" the data,
or scrub it of PHI. The Google Cloud Healthcare API supports de-identifying
DICOM stores, and this proposal seeks to leverage that.

I've prototyped a PR for this.

Thanks,


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com


Re: [PROPOSAL] Design Doc template for PTransforms

2023-08-24 Thread Svetak Sundhar via dev
Thanks Kenn! I put up a PR to add this reference to the Beam website.

I'm actively using the template now, and one piece of feedback I have is to
include a section for prototyped work. Maybe a table that allows the user
to post links to Github prototype PRs for the options they are proposing?

Thanks,


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Thu, Aug 24, 2023 at 11:15 AM Kerry Donny-Clark via dev <
dev@beam.apache.org> wrote:

> Thanks Kenn! I think this would be a great community resource. While we
> don't want to enforce usage, perhaps we could introduce tooling to check
> basic compliance and raise a warning. Similar to a linter or test suite.
> Kerry
>
> On Thu, Aug 24, 2023 at 10:25 AM Kenneth Knowles  wrote:
>
>> Hi all,
>>
>> Based on some work I've been doing internally, I put together a public
>> version of a design doc template for PTransforms.
>>
>> https://s.apache.org/ptransform-design-doc
>>
>> A major goal is to be explicit about important questions that make a
>> transform robust:
>>
>>  - what are "all" the parameters to a transform?
>>  - how could a transform fail?
>>  - how could we monitor or measure the transform?
>>  - how could we use a transform in a new context like YAML or a new SDK?
>>
>> All of these together add up to a PTransform being a more self-contained
>> piece of software that can be understood and used in novel ways, instead of
>> just defined by the code and behavior that may accrete over time tightly
>> coupled to the SDK it was written with.
>>
>> LMK what you think. Of course, I can't force anyone to use it or not use
>> it, except for my team internal to my employer :-)
>>
>> Kenn
>>
>


Re: Mechanism for "Beam Website Feedback"

2023-08-14 Thread Svetak Sundhar via dev
Hi Ahmet,

I'm +1 on the idea. One clarifying question:

Do you propose that when feedback is sent, it gets forwarded to the dev
list? If not, we will need to ensure that the backend (e.g., a Google Sheet)
is monitored.



Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Mon, Aug 14, 2023 at 4:29 PM Ahmet Altay via dev 
wrote:

> Hi all,
>
> We regularly get emails with "Subject: Beam Website Feedback"; they are
> filtered out before they reach this mailing list. I believe the reason is
> that people have some feedback to share, but clicking the feedback button
> opens the default email application, which surprises them, so they close it.
> Sometimes they close it by sending us an empty email.
>
> My proposal: we can replace the feedback button with a simple embedded
> form (a text box + submit button). We can use something like Google Forms
> to implement this without making a more complex backend change.
>
> For reference, this is a partial and it is implemented here [1].
>
> What do you think?
>
> Ahmet
>
> [1]
> https://github.com/apache/beam/blob/bbaa7ebd3eec614832d76cfc577858638a96a11d/website/www/site/layouts/partials/feedback.html#L21
>


Re: Why there are no certifications for Apache Beam

2023-07-27 Thread Svetak Sundhar via dev
Hello Abhishek,

Thank you for your interest. We do have a skill badge you can earn with
Apache Beam (https://www.cloudskillsboost.google/quests/310).

You can use this access code to earn it for free: search for "Getting
Started with Apache Beam", and then complete the labs. Please reach out if
you have any questions or run into any blockers.

Thanks,


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Thu, Jul 27, 2023 at 3:08 PM Abhishek Patre 
wrote:

> Hello Team,
>
> I hope this email finds you well. I was wondering if you could provide
> some clarity on the availability of certification for Apache Beam. I
> apologize if this isn't the appropriate mailing list for this inquiry.
> Having a certification option for Apache Beam would be incredibly
> beneficial in terms of showcasing our skills and expertise.
>
> Thank you for your time and assistance.
>
> Regards
> Abhishek Patre
>


Re: [VOTE] Release 2.49.0, release candidate #2

2023-07-13 Thread Svetak Sundhar via dev
+1 (Non-Binding)

Tested the Python quickstart on the Dataflow runner.


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Thu, Jul 13, 2023 at 5:03 AM Jan Lukavský  wrote:

> +1 (binding)
>
> Tested Java SDK with FlinkRunner.
>
>  Jan
> On 7/13/23 02:30, Bruno Volpato via dev wrote:
>
> +1 (non-binding).
>
> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
> (Java SDK 11, Dataflow runner).
>
> Thanks Yi!
>
> On Tue, Jul 11, 2023 at 4:23 PM Yi Hu via dev  wrote:
>
>> Hi everyone,
>> Please review and vote on the release candidate #2 for the version
>> 2.49.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>> count towards the final vote, but votes from all community members are
>> encouraged and helpful for finding regressions; you can either test your
>> own use cases or use cases from the validation sheet [10].
>>
>> The complete staging area is available for your review, which includes:
>> * GitHub Release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with
>> fingerprint either CB6974C8170405CB (y...@apache.org) or D20316F712213422
>> (GitHub Action automated) [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.49.0-RC2" [5],
>> * website pull request listing the release [6], the blog post [6], and
>> publishing the API reference manual [7].
>> * Java artifacts were built with Gradle GRADLE_VERSION and OpenJDK/Oracle
>> JDK JDK_VERSION.
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2] and PyPI [8].
>> * Go artifacts and documentation are available at pkg.go.dev [9]
>> * Validation sheet with a tab for 2.49.0 release to help with validation
>> [10].
>> * Docker images published to Docker Hub [11].
>> * PR to run tests against release branch [12].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> For guidelines on how to try the release in your projects, check out our
>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>>
>> Thanks,
>> Release Manager
>>
>> [1] https://github.com/apache/beam/milestone/13
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.49.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1349/
>> [5] https://github.com/apache/beam/tree/v2.49.0-RC2
>> [6] https://github.com/apache/beam/pull/27374 (unchanged since RC1)
>> [7] https://github.com/apache/beam-site/pull/646  (unchanged since RC1)
>> [8] https://pypi.org/project/apache-beam/2.49.0rc2/
>> [9]
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.49.0-RC2/go/pkg/beam
>> [10]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=934901728
>> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>> [12] https://github.com/apache/beam/pull/27307
>>
>> --
>>
>> Yi Hu, (he/him/his)
>>
>> Software Engineer
>>
>>
>>


Re: [DISCUSS] Enable Github Discussions?

2023-07-06 Thread Svetak Sundhar via dev
Thanks all for your perspectives, and Alexey for pointing out that we must
keep the dev and user lists regardless. Looks like we don't need to take
any further action here.

For the sake of discussion, I'd be curious to hear perspectives from those
who are part of the Airflow community or another community where both
user/dev lists and GH Discussions are used. Specifically, is there any
sort of value add, or is it just a maintenance burden?



Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Wed, Jul 5, 2023 at 10:21 AM Jack McCluskey via dev 
wrote:

> Also going to be -1 on this one, I'm not sure we pick anything up from
> adding a forum apart from adding another place that needs to be checked.
>
> On Tue, Jul 4, 2023 at 4:03 AM Jan Lukavský  wrote:
>
>> -1
>>
>> Totally agree with Byron and Alexey.
>>
>>  Jan
>> On 7/3/23 21:18, Byron Ellis via dev wrote:
>>
>> -1. This just leads to needless fragmentation not to mention being at the
>> mercy of a specific technology provider.
>>
>> On Mon, Jul 3, 2023 at 11:39 AM XQ Hu via dev 
>> wrote:
>>
>>> +1 with GH discussion.
>>> If Airflow can do this https://github.com/apache/airflow/discussions, I
>>> think we can do this as well.
>>>
>>> On Mon, Jul 3, 2023 at 9:51 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
>>>> -1
>>>> I understand that for some people, who maybe are not very familiar with
>>>> ASF and its “Apache Way” [1], it may sound a bit obsolete, but mailing lists
>>>> are one of the key things of every ASF project, which Apache Beam is. Having
>>>> user@, dev@ and commits@ lists is required for an ASF project to
>>>> maintain open discussions that are publicly accessible and archived in
>>>> the same way for all ASF projects.
>>>>
>>>> I just wanted to remind everyone that a key motto at the Apache Software Foundation is:
>>>>   *“If it didn't happen on the mailing list, it didn't happen.”*
>>>>
>>>> —
>>>> Alexey
>>>>
>>>> [1] https://apache.org/theapacheway/index.html
>>>>
>>>> On 1 Jul 2023, at 19:54, Anand Inguva via dev 
>>>> wrote:
>>>>
>>>> +1 for GitHub discussions as well. But I am also a little concerned about
>>>> multiple places for discussions. As Danny said, if we have a good plan for
>>>> how to move forward and how/when to archive the current mailing list, that
>>>> would be great.
>>>>
>>>> Thanks,
>>>> Anand
>>>>
>>>> On Sat, Jul 1, 2023, 3:21 AM Damon Douglas 
>>>> wrote:
>>>>
>>>>> I'm very strong +1 for replacing the use of Email with GitHub
>>>>> Discussions. Thank you for bringing this up.
>>>>>
>>>>> On Fri, Jun 30, 2023 at 7:38 AM Danny McCormick via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> Thanks for starting this discussion!
>>>>>>
>>>>>> I'm a weak -1 for this proposal. While I think that GH Discussions
>>>>>> can be a good forum, I think most of the things that Discussions do are
>>>>>> covered by some combination of the dev/user lists and GitHub issues, and
>>>>>> the net outcome of this will be creating one more forum to pay attention
>>>>>> to. I know in the past we've had a hard time keeping up with Stack
>>>>>> Overflow questions for a similar reason. With that said, I'm not opposed
>>>>>> to trying it out and experimenting as long as we have (a) clear criteria
>>>>>> for understanding if the change is effective or not (can be subjective),
>>>>>> (b) a clear idea of when we'd revisit the discussion, and (c) a clear
>>>>>> path to roll back the decision without it being *too* much work (this
>>>>>> might mean something like disabling future discussions and keeping the
>>>>>> history or somehow moving the history to the dev or user list). If we do
>>>>>> this, I also think we should update
>>>>>> https://beam.apache.org/community/contact-us/
>>>>>> with a clear taxonomy of what goes where (this is what I'm unsure of
>>>>>> today).
>>>>>>
>>>>>> FWIW, if we were proposing cutting either the user list or both the
>>>>>> user and

[DISCUSS] Enable Github Discussions?

2023-06-30 Thread Svetak Sundhar via dev
Hi all,

I wanted to start a discussion to gauge interest in enabling GitHub
Discussions in Apache Beam.


Pros:

+ GH Discussions allows folks to get unblocked on small/medium
implementation blockers (Google employees can often get this help by
scheduling a call with teammates, whereas there is a larger barrier for
non-Google employees to get this help).

+ On the above point, more visibility into the development blockers that
others have previously faced.

+ GH Discussions is more discoverable and approachable for new users and
contributors.

+ A centralized place to have discussions. Long term, it makes sense to
eventually fully migrate to GH Discussions.


Cons:

- For a period of time when we use both the dev list and GH Discussions,
context can be confusing.

- Anything else?


To be clear, I’m not advocating that we move off the dev list immediately.
I propose that over time we slowly start moving discussions over to GH
discussions, utilizing things such as the poll feature.


I am aware that the Airflow project uses both GH Discussions [1] and a
dev@ list [2] today.


[1] https://github.com/apache/airflow/discussions

[2] https://lists.apache.org/list.html?d...@airflow.apache.org

Thanks,


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com


[DISCUSS] Enable Github Discussions?

2023-06-29 Thread Svetak Sundhar via dev
Hi all,

I wanted to start a discussion to gauge interest in enabling GitHub
Discussions in Apache Beam.


Pros:

+ GH Discussions allows folks to get unblocked on small/medium
implementation blockers (Google employees can often get this help by
scheduling a call with teammates, whereas there is a larger barrier for
non-Google employees to get this help).

+ On the above point, more visibility into the development blockers that
others have previously faced.

+ GH Discussions is more discoverable and approachable for new users and
contributors.


Cons:

- When using both the dev list and GH Discussions, context can be
confusing.

- Anything else?


To be clear, I’m not advocating that we move off of the dev list. I propose
that over time we keep dev@ to vote on releases, and discuss the future of
Beam. GH Discussions would simply serve as a forum for troubleshooting, and
discussing smaller scale issues.


I am aware that the Airflow project uses both GH Discussions [1] and a
dev@ list [2] today.


[1] https://github.com/apache/airflow/discussions

[2] https://lists.apache.org/list.html?d...@airflow.apache.org


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com


Re: [Launch Announcement] Beam Quest

2023-06-28 Thread Svetak Sundhar via dev
Hi,

Could you elaborate on your issue here? Are you running into an error?

Could you try the following?

1) Click on the free access code
2) Create an account
3) Navigate to "Getting Started with Apache Beam"
4) Launch the lab with credits

Let me know if you have any questions or run into any issues,


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Wed, Jun 28, 2023 at 2:15 AM Komi Marc ATSOU 
wrote:

> Is the free access code working for anyone? It is not working for
> me.
>
> On Fri, Jun 9, 2023 at 7:12 PM Ahmet Altay via user
> wrote:
>
>> Thank you Svetak! I would encourage everyone to try out, and get the
>> badges :)
>>
>> On Fri, Jun 9, 2023 at 7:03 AM Svetak Sundhar via user <
>> u...@beam.apache.org> wrote:
>>
>>> Hi Beam Community,
>>>
>>> We're excited to launch the "Getting Started with Apache Beam" Quest.
>>> This quest provides a completion badge that can be shared on social
>>> media (such as LinkedIn and Twitter) upon completion of four Qwiklabs.
>>>
>>> These labs venture into various concepts of Beam in the Java and Python
>>> SDKs (that many of you have developed), and should take less than 7 hours
>>> to complete. I've written about it on our Beam Blog; we are offering this
>>> free of charge until July 8.
>>>
>>> Please share the information with whomever you think may be interested,
>>> and please share on social media once you obtain your badge. Additionally,
>>> if you have any feedback on the labs, please contact me directly at
>>> svetaksund...@google.com -- we plan to have these labs evolve over time!
>>>
>>> I look forward to discussing this more at Beam Summit next week.
>>>
>>> As this was one of GCP's first OSS quests, there were many people
>>> instrumental in making this possible.
>>>
>>> Thanks to:
>>> -Danielle Syse
>>> -Ajay Hemnani
>>> -Joellen Saunders
>>> -Grzegorz Wierzchows
>>> -Ahmet Altay
>>> -XQ Hu
>>> -Jenny Palomino
>>> -Svetak Sundhar
>>> -Shunping Huang
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Svetak Sundhar
>>>
>>>   Data Engineer
>>> s vetaksund...@google.com
>>>
>>>
>
> --
>
> *ATSOU Komi Marc*
> *Cel : +33 7 58 92 30 35 <+33%207%2058%2092%2030%2035>*
> *Skype : bi-ayefo*
>


Re: DRAFT - Apache Beam Board Report - June 2023

2023-06-10 Thread Svetak Sundhar via dev
Added an item, thanks Kenn!


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Sat, Jun 10, 2023 at 1:01 AM Jean-Baptiste Onofré 
wrote:

> It looks good to me, thanks !
>
> Regards
> JB
>
> On Fri, Jun 9, 2023 at 11:28 PM Kenneth Knowles  wrote:
> >
> > Hi all,
> >
> > The next Beam board report is due next Wednesday, June 14. Please help
> me to draft it at https://s.apache.org/beam-draft-report-2023-06.
> >
> > Ideas:
> >
> >  - highlights from CHANGES.md
> >  - interesting technical discussions
> >  - integrations with other projects
> >  - community events
> >  - major user facing addition/deprecation
> >  - stuff that will be presented at Beam Summit next week :-)
> >
> > Past reports are at https://whimsy.apache.org/board/minutes/Beam.html
> for examples.
> >
> > I will edit the final version from everyone's suggestions.
> >
> > Thanks,
> >
> > Kenn
> >
> >
>


[Launch Announcement] Beam Quest

2023-06-09 Thread Svetak Sundhar via dev
Hi Beam Community,

We're excited to launch the "Getting Started with Apache Beam" Quest.
This quest provides a completion badge that can be shared on social media
(such as LinkedIn and Twitter) upon completion of four Qwiklabs.

These labs venture into various concepts of Beam in the Java and Python SDKs
(that many of you have developed), and should take less than 7 hours to
complete. I've written about it on our Beam Blog; we are offering this free
of charge until July 8.

Please share the information with whomever you think may be interested, and
please share on social media once you obtain your badge. Additionally, if
you have any feedback on the labs, please contact me directly at
svetaksund...@google.com -- we plan to have these labs evolve over time!

I look forward to discussing this more at Beam Summit next week.

As this was one of GCP's first OSS quests, there were many people
instrumental in making this possible.

Thanks to:
-Danielle Syse
-Ajay Hemnani
-Joellen Saunders
-Grzegorz Wierzchows
-Ahmet Altay
-XQ Hu
-Jenny Palomino
-Svetak Sundhar
-Shunping Huang

Thanks,



Svetak Sundhar

  Data Engineer
s vetaksund...@google.com


Re: [VOTE] Release 2.48.0 release candidate #2

2023-05-31 Thread Svetak Sundhar via dev
+1 (non-binding) (Python)


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Wed, May 31, 2023 at 1:50 AM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

> Nvm, I was running the Kafka cluster and the job in two different
> projects. It's working as expected.
>
> +1 (binding) for the release.
>
> Thanks,
> Cham
>
>
> On Tue, May 30, 2023 at 6:09 PM Chamikara Jayalath 
> wrote:
>
>> I'm seeing a potential regression when running Python x-lang Kafka jobs
>> on Dataflow.
>>
>>
>> https://pantheon.corp.google.com/dataflow/jobs/us-central1/2023-05-30_16_31_32-1219154560944228293;step=;mainTab=JOB_GRAPH;bottomTab=JOB_LOGS;logsSeverity=INFO;graphView=0?project=google.com:clouddfe=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))
>>
>> "Topic kafka_taxirides_realtime not present in metadata after 6 ms"
>>
>> Currently not sure if this is due to my Kafka cluster setup or not.
>>
>> Thanks,
>> Cham
>>
>>
>>
>>
>>
>> On Tue, May 30, 2023 at 5:52 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 (binding)
>>>
>>> On Tue, May 30, 2023 at 5:42 PM Robert Bradshaw 
>>> wrote:
>>>
 On Tue, May 30, 2023 at 2:01 PM Ritesh Ghorse via dev <
 dev@beam.apache.org> wrote:

> Thanks Danny and Jack! Dataflow containers are up!
>
> Only PMC votes count but feel free to test your use cases and vote on
> this thread!
>

 While we need at least 3 affirmative PMC votes to formally do a
 release, it is definitely the case that all votes are valuable input and
 are taken into consideration when deciding to do so.


> On Tue, May 30, 2023 at 11:26 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> +1 (binding)
>>
>> Tested with  https://github.com/Talend/beam-samples/
>> (Java SDK v8/v11/v17, Spark 3.x runner).
>>
>> On 27 May 2023, at 19:38, Bruno Volpato via dev 
>> wrote:
>>
>> I was able to check that containers are all there and complete
>> my validation.
>>
>> +1 (non-binding).
>>
>> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates 
>> (Java
>> SDK 11, Dataflow runner).
>>
>>
>> Thanks Ritesh and Danny!
>>
>> On Fri, May 26, 2023 at 10:09 AM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> It looks like some Dataflow containers didn't get published, so some
>>> jobs using the legacy runner (runner v2 disabled) will fail. I kicked 
>>> off
>>> the container release, so that should hopefully be available later 
>>> today.
>>>
>>> Thanks,
>>> Danny
>>>
>>> On Thu, May 25, 2023 at 11:19 PM Ritesh Ghorse via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Hi everyone,
 Please review and vote on the release candidate #2 for the version
 2.48.0, as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific
 comments)


 Reviewers are encouraged to test their own use cases with the
 release candidate, and vote +1 if no issues are found. Only PMC member
 votes will count towards the final vote, but votes from all community
 members are encouraged and helpful for finding regressions; you can 
 either
 test your own use cases or use cases from the validation sheet [10].

 The complete staging area is available for your review, which
 includes:
 * GitHub Release notes [1],
 * the official Apache source release to be deployed to
 dist.apache.org [2], which is signed with the key with fingerprint
 E4C74BEC861570F5A3E44E46280A0AC32DBAE62B [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "v2.48.0-RC2" [5],
 * website pull request listing the release [6], the blog post [6],
 and publishing the API reference manual [7] (to be generated).
 * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle
 JDK 8.0.322.
 * Python artifacts are deployed along with the source release to
 the dist.apache.org [2] and PyPI[8].
 * Go artifacts and documentation are available at pkg.go.dev [9]
 * Validation sheet with a tab for 2.48.0 release to help with
 validation [10].
 * Docker images published to Docker Hub [11].
 * PR to run tests against release branch [12].

 The vote will be open for at least 72 hours. It is adopted by
 majority approval, with at least 3 PMC affirmative votes.

 For guidelines on how to try the release in your projects, check
 out our blog post at /blog/validate-beam-release/.

 *NOTE: Dataflow containers for Python are not finalized yet (likely
 to happen on Tuesday). I will follow up on this thread once that is
 

Re: [VOTE] Release 2.47.0, release candidate #3

2023-05-06 Thread Svetak Sundhar via dev
+1 (Non-Binding)

I tested Python Quick Start on Dataflow Runner as well



Svetak Sundhar

  Technical Solutions Engineer, Data
s vetaksund...@google.com



On Sat, May 6, 2023 at 4:44 AM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

> I'm seeing a regression when running Java x-lang jobs using the RC.
> Created https://github.com/apache/beam/issues/26576.
>
> Thanks,
> Cham
>
> On Fri, May 5, 2023 at 11:11 PM Austin Bennett  wrote:
>
>> +1 ( non-binding )
>>
>> On Fri, May 5, 2023 at 10:49 PM Jean-Baptiste Onofré 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> Regards
>>> JB
>>>
>>> On Fri, May 5, 2023 at 4:52 AM Jack McCluskey via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Hi everyone,

 Please review and vote on the release candidate #3 for the version
 2.47.0, as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific comments)

 Reviewers are encouraged to test their own use cases with the release
 candidate, and vote +1 if no issues are found. *Non-PMC members are
 allowed and encouraged to vote. Please help validate the release for your
 use case!*

 The complete staging area is available for your review, which includes:
 * GitHub Release notes [1],
 * the official Apache source release to be deployed to dist.apache.org [2],
 which is signed with the key with fingerprint DF3CBA4F3F4199F4 [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "v2.47.0-RC3" [5],
 * website pull request listing the release [6], the blog post [6], and
 publishing the API reference manual [7].
 * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle JDK
 8.0.322.
 * Python artifacts are deployed along with the source release to the
 dist.apache.org [2] and PyPI[8].
 * Go artifacts and documentation are available at pkg.go.dev [9]
 * Validation sheet with a tab for 2.47.0 release to help with
 validation [10].
 * Docker images published to Docker Hub [11].
 * PR to run tests against release branch [12].

 The vote will be open for at least 72 hours. It is adopted by majority
 approval, with at least 3 PMC affirmative votes.

 The GCR copies of the FnAPI containers are rolling out now; they should
 be out within the next 8 hours or so.

 For guidelines on how to try the release in your projects, check out
 our blog post at /blog/validate-beam-release/.

 Thanks,

 Jack McCluskey

 [1] https://github.com/apache/beam/milestone/10
 [2] https://dist.apache.org/repos/dist/dev/beam/2.47.0/
 [3] https://dist.apache.org/repos/dist/release/beam/KEYS
 [4]
 https://repository.apache.org/content/repositories/orgapachebeam-1322/
 [5] https://github.com/apache/beam/tree/v2.47.0-RC3
 [6] https://github.com/apache/beam/pull/26439
 [7] https://github.com/apache/beam-site/pull/644
 [8] https://pypi.org/project/apache-beam/2.47.0rc3/
 [9]
 https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.47.0-RC3/go/pkg/beam
 [10]
 https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=.
 ..
 [11] https://hub.docker.com/search?q=apache%2Fbeam=image
 [12] https://github.com/apache/beam/pull/26152

 --


 Jack McCluskey
 SWE - DataPLS PLAT/ Dataflow ML
 RDU
 jrmcclus...@google.com
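
The adoption rule quoted in this thread ("majority approval, with at least 3
PMC affirmative votes") can be sketched as a small tally. This is only an
illustration: it assumes that only PMC (binding) votes count toward the
result and that "majority" means more binding +1 votes than binding -1
votes. The `Vote` class, the `tally` function, and the voter names are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1 to approve, -1 to reject
    binding: bool   # assumed True only for PMC members


def tally(votes, min_binding_approvals=3):
    """Return (passed, binding_plus, binding_minus) for a release vote.

    Assumption: the vote passes when there are at least
    `min_binding_approvals` binding +1 votes and more binding +1 votes
    than binding -1 votes. Non-binding votes are ignored in the result.
    """
    binding_plus = sum(1 for v in votes if v.binding and v.value > 0)
    binding_minus = sum(1 for v in votes if v.binding and v.value < 0)
    passed = (binding_plus >= min_binding_approvals
              and binding_plus > binding_minus)
    return passed, binding_plus, binding_minus


# Hypothetical ballots; the names are placeholders, not real voters.
ballots = [
    Vote("pmc-a", +1, True),
    Vote("pmc-b", +1, True),
    Vote("contributor-x", +1, False),
]
print(tally(ballots))
```

With only two binding approvals the sketch reports the vote as not yet
adopted, however many non-binding +1 votes have come in.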





Re: Starter projects for Beam

2023-04-28 Thread Svetak Sundhar via dev
Hi Tariq,

Thanks for your interest! A good starting point is the list of good first issues:
https://github.com/apache/beam/labels/good%20first%20issue?page=2=is%3Aopen+label%3A%22good+first+issue%22
.

Feel free to assign an issue to yourself and put up a PR/ask any needed
questions when ready.

Thanks,


Svetak Sundhar

  Technical Solutions Engineer, Data
s vetaksund...@google.com



On Fri, Apr 28, 2023 at 2:17 PM Tariq Hasan  wrote:

> Hello,
>
> I am reaching out as a new entrant into the Apache Beam project.
>
> As a developer with a few years of experience, I was looking to grow my
> passion around software development through open-source contributions.
>
> With Apache Beam, I am quite interested in working across multiple areas,
> including but not limited to Java and Python SDKs and the various runners
> and transforms on the roadmap.
>
> I was reaching out here for some guidance with regards to starter projects
> that could be a viable starting point.
>
> If anyone can offer suggestions on possible scope to contribute to the
> project and resources to get going, that would be very helpful.
>
> Sincerely,
>
> Tariq Hasan
>
>


Re: Jenkins Flakes

2023-04-11 Thread Svetak Sundhar via dev
+1 to the proposal.

Regarding the "(and not guaranteed to work)" part: if the memory issues
still persist after this change, do we restore the normal retention limit
and look for another fix, or do we never restore the normal retention
limit?


Svetak Sundhar

  Technical Solutions Engineer, Data
s vetaksund...@google.com



On Tue, Apr 11, 2023 at 10:34 AM Jack McCluskey via dev 
wrote:

> +1 for getting Jenkins back into a happier state. Getting release blockers
> resolved ahead of building an RC has been severely hindered by Jenkins not
> picking up tests or running them properly.
>
> On Tue, Apr 11, 2023 at 10:24 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> *tl;dr - I want to temporarily reduce the number of builds that we retain
>> to reduce pressure on Jenkins*
>>
>> Hey everyone, over the past few days our Jenkins runs have been
>> particularly flaky across the board, with errors like the following showing
>> up all over the place [1]:
>>
>> java.nio.file.FileSystemException: 
>> /home/jenkins/jenkins-home/jobs/beam_PreCommit_Python_Phrase/builds/3352/changelog.xml:
>>  No space left on device [2]
>>
>>
>> These errors indicate that we're out of space on the Jenkins master node.
>> After some digging (thanks @Yi Hu, @Ahmet Altay, and @Bruno Volpato for
>> contributing), we've determined that at least one large contributing issue
>> is that some of our builds are eating up too much space. For example, our
>> beam_PreCommit_Java_Commit build is taking up 28GB of space by itself (this
>> is just one example).
>>
>> @Yi Hu found one change around code coverage that is
>> likely heavily contributing to the problem and rolled that back [3]. We can
>> continue to find other contributing factors here.
>>
>> In the meantime, to get us back to healthy *I propose that we reduce the
>> number of builds that we are retaining to 40 for all jobs that are using a
>> large amount of storage (>5GB)*. This will hopefully allow us to return
>> Jenkins to a normal functioning state, though it will do so at the cost of
>> a significant amount of build history (right now, for example,
>> beam_PreCommit_Java_Commit is at 400 retained builds). We could restore the
>> normal retention limit once the underlying problem is resolved. Given that
>> this is irreversible (and not guaranteed to work), I wanted to gather
>> feedback before doing this. Personally, I rarely use builds that old, but
>> others may feel differently.
>>
>> Please let me know if you have any objections or support for this
>> proposal.
>>
>> Thanks,
>> Danny
>>
>> [1] Tracking issue: https://github.com/apache/beam/issues/26197
>> [2] Example run with this error:
>> https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/3352/console
>> [3] Rollback PR: https://github.com/apache/beam/pull/26199
>>
>
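
To decide which jobs fall under the ">5GB" part of the proposal in this
thread, one option is to measure each job directory on the Jenkins node. The
sketch below is only an illustration: the function names are made up, the
root path is the one that appears in the error message above, and it assumes
that on-disk size per job directory is the right measure. It only reports
candidates and does not change any retention setting.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def jobs_over_threshold(jobs_root, threshold_bytes):
    """Return {job_name: size_bytes} for job directories above the threshold."""
    large = {}
    for entry in sorted(os.listdir(jobs_root)):
        job_dir = os.path.join(jobs_root, entry)
        if os.path.isdir(job_dir):
            size = dir_size_bytes(job_dir)
            if size > threshold_bytes:
                large[entry] = size
    return large


if __name__ == "__main__":
    # Path from the error message in the thread; 5 GB is the proposed cutoff.
    root = "/home/jenkins/jenkins-home/jobs"
    if os.path.isdir(root):
        for job, size in jobs_over_threshold(root, 5 * 1024**3).items():
            print(f"{job}: {size / 1024**3:.1f} GB, candidate for 40 retained builds")
```

Running something like this before and after the change would also show
whether the reduced retention actually frees the expected space.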


Re: Regarding Project proposal review and feedback

2023-04-02 Thread Svetak Sundhar via dev
Hi Siddharth,
I left some comments as well on the sentiment analysis proposal.

Thanks,


Svetak Sundhar

  Technical Solutions Engineer, Data
s vetaksund...@google.com



On Sun, Apr 2, 2023 at 1:58 PM Anand Inguva via dev 
wrote:

> I left some comments on the sentiment analysis proposal.
>
> Thanks,
> Anand
>
> On Thu, Mar 30, 2023 at 9:59 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> Thanks Siddharth! I left some comments on the sentiment analysis
>> proposal. I am probably not the best person to comment on the Flink
>> DataStream API one, though.
>>
>> Thanks,
>> Danny
>>
>> On Fri, Mar 24, 2023 at 11:53 PM Siddharth Aryan <
>> siddhartharyan...@gmail.com> wrote:
>>
>>> Hello,
>>> I am Siddharth Aryan, an undergrad, and I am looking for someone who
>>> can help review my proposals and give me feedback on them, which will
>>> help me create a good proposal.
>>> Here, I am attaching both of my project proposals:
>>> >Sentimental Analysis Pipeline with the help of Machine Learning:
>>>
>>> https://docs.google.com/document/d/1U6zcXAWsDCrWlbf14f5VlLqPZFucwXR48tD7mrERW-g/edit?usp=sharing
>>>
>>> >Integrating Apache Beam with Flink Datastream API:
>>>
>>> https://docs.google.com/document/d/1sQEe9eVuoHX9QWS9Zj5wVl7MLmfk7QO09pjZOsk-TFY/edit?usp=sharing
>>>
>>> Best Regards
>>> Siddharth Aryan
>>>
>>> GitHub: https://github.com/nervoussidd
>>>
>>


Re: Custom Windowing

2023-03-22 Thread Svetak Sundhar via dev
Hi Sahith,

It isn't immediately obvious to me what the error might be, though I was
able to sift through the stacktrace and find areas of the codebase that it
touches (
https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java#L412
).

Perhaps we can schedule a call to look into the code and learn more about
what might be going wrong? Alternatively, you could file a GCP Support
ticket, which would give us access to look into the Dataflow job and see
if we can find any more evidence of what might be going wrong.

Thanks,


Svetak Sundhar

  Technical Solutions Engineer, Data
s vetaksund...@google.com



On Fri, Mar 17, 2023 at 1:02 PM Sahith Nallapareddy via dev <
dev@beam.apache.org> wrote:

> Hello,
>
> We are working on writing a custom windowing function. The functionality
> is similar to the one described in this book
> https://www.oreilly.com/library/view/streaming-systems/9781491983867/ch04.html
> (the bounded sessions and per-key fixed windows are what we are trying).
> However, we are not sure what is wrong with our implementation, as we run
> into this error in Dataflow: Error message from worker:
> java.lang.IllegalStateException:
> [2023-03-14T21:28:46.639Z..2023-03-14T21:58:46.639Z) is in more than one
> state address window set
>
> Can anyone explain what this error means and how we can reproduce it? We
> have tests set up and they pass fine; this only appears in Dataflow.
>
>
> Full stack trace:
>
> java.lang.IllegalStateException:
> [2023-03-14T21:28:54.817Z..2023-03-14T21:43:54.817Z) is in more than one
> state address window set at
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:588)
> at
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.MergingActiveWindowSet.checkInvariants(MergingActiveWindowSet.java:335)
> at
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.MergingActiveWindowSet.persist(MergingActiveWindowSet.java:89)
> at
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.persist(ReduceFnRunner.java:385)
> at
> org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:98)
> at
> org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:43)
> at
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:121)
> at
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner.java:73)
> at
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80)
> at
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:137)
> at
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
> at
> org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
> at
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:212)
> at
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:163)
> at
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:83)
> at
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1445)
> at
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1100(StreamingDataflowWorker.java:165)
> at
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$7.run(StreamingDataflowWorker.java:1120)
> at
> org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor.lambda$executeLockHeld$0(BoundedQueueExecutor.java:133)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
>
> Thanks,
>
> Sahith
>
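
As a rough illustration of what the message in this thread appears to be
checking: the stack trace fails in MergingActiveWindowSet.checkInvariants,
and the wording suggests that, for merging windows, each active window owns
a set of "state address" windows and no state address window may belong to
more than one such set. The sketch below is a simplified model in Python,
not Beam's implementation. The function name and the integer window bounds
are made up, and it does not diagnose the custom WindowFn in question.

```python
def find_shared_state_windows(active_to_state):
    """Return state address windows that appear under more than one active window.

    `active_to_state` maps an active window to the state address windows
    whose state it owns. Windows are modelled as (start, end) tuples.
    Each result is (state_window, first_owner, second_owner).
    """
    seen = {}
    shared = []
    for active, state_windows in active_to_state.items():
        for state_window in state_windows:
            if state_window in seen and seen[state_window] != active:
                shared.append((state_window, seen[state_window], active))
            else:
                seen[state_window] = active
    return shared


# Consistent bookkeeping: every state address window has exactly one owner.
ok = {(0, 30): [(0, 15), (10, 30)], (40, 50): [(40, 50)]}

# Inconsistent bookkeeping: (10, 30) is claimed by two active windows, which
# is the kind of situation the error message seems to describe.
bad = {(0, 30): [(0, 15), (10, 30)], (10, 45): [(10, 30), (40, 45)]}

print(find_shared_state_windows(ok))
print(find_shared_state_windows(bad))
```

A unit test that builds the same overlapping windows as the failing job and
runs them through the custom merge logic might be one way to look for a
window that ends up in two merge results.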


Re: [ANNOUNCE] New committer: John Casey

2022-07-29 Thread Svetak Sundhar via dev
Congrats John!!


Svetak Sundhar

  Technical Solutions Engineer, Data
s vetaksund...@google.com



On Fri, Jul 29, 2022 at 5:02 PM Ritesh Ghorse via dev 
wrote:

> Congratulations John!
>
> On Fri, Jul 29, 2022 at 5:01 PM Ahmed Abualsaud via dev <
> dev@beam.apache.org> wrote:
>
>> Congrats John, what a great addition!
>>
>> On Fri, Jul 29, 2022 at 4:56 PM Kerry Donny-Clark via dev <
>> dev@beam.apache.org> wrote:
>>
>>> John, you have made a huge impact on the many, many users of Kafka and
>>> other IOs. This is great recognition of your commitment to Beam.
>>> Kerry
>>>
>>> On Fri, Jul 29, 2022 at 4:46 PM Byron Ellis via dev 
>>> wrote:
>>>
 Congratulations John!

 On Fri, Jul 29, 2022 at 1:09 PM Danny McCormick via dev <
 dev@beam.apache.org> wrote:

> Congrats John and welcome! This is well deserved!
>
> On Fri, Jul 29, 2022 at 4:07 PM Kenneth Knowles 
> wrote:
>
>> Hi all,
>>
>> Please join me and the rest of the Beam PMC in welcoming
>> a new committer: John Casey (johnca...@apache.org)
>>
>> John started contributing to Beam in late 2021. John has quickly
>> become our resident expert on KafkaIO - identifying bugs, making
>> enhancements, helping users - in addition to a variety of other
>> contributions.
>>
>> Considering his contributions to the project over this timeframe, the
>> Beam PMC trusts John with the responsibilities of a Beam committer. [1]
>>
>> Thank you John! We are looking forward to seeing more of your contributions!
>>
>> Kenn, on behalf of the Apache Beam PMC
>>
>> [1]
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>