Re: Beam Summit Status Report - 2/2

2022-02-03 Thread Kenneth Knowles
Hi Danielle,

Would it be possible in the future to inline the content of the status
report to the email? This way it will be archived with the mailing list.
Also it will make it more likely that casual readers of the list may skim
it.

Kenn

On Wed, Feb 2, 2022 at 3:39 PM Danielle Syse wrote:

> Hi everyone,
>
> I hope you're all having a great week thus far. Attached below is the
> updated Beam Summit Status Report from today's meeting. Let me know if you
> have any comments, questions, or concerns.
>
> We're also looking for speakers for our Beam Summit! Please fill out the
> following form to help contribute: https://bit.ly/3o2D9FL
>
> 2/2 Status Report: https://bit.ly/3IAzR4u
>
> Thank you,
>
> Danielle Syse
>
>


Re: Beam Java starter project template

2022-02-03 Thread Kenneth Knowles
I'm convinced on all points. My main motivation was to keep it simple. But
of course we should keep it simple for users, not us :-)

I can take on the task of asking about MIT license and requesting the repos
be created. Not sure if it needs my level of privileges but I'm happy to do
it anyhow.

Kenn

On Wed, Feb 2, 2022 at 10:30 AM Robert Bradshaw wrote:

> On Wed, Feb 2, 2022 at 10:12 AM David Cavazos wrote:
> >
> > MIT is much more permissive, but I also don't have any problems changing
> it to Apache license. In any case, how about we create the following repos?
>
> For these starter projects, we don't want to encumber any users of
> these templates with any particular licensing requirements (right?)
> and we don't even care about attribution. We want these to be pretty
> much as close to public domain as possible. That's not what the Apache
> licence does. (If it's even relevant, a good argument could likely be
> made for de minimis or fair use, but I think it's best to be explicit
> about this.) Perhaps this'd be a good question for Apache legal?
>
> > apache/beam-starter-java
> > apache/beam-starter-python
> > apache/beam-starter-go
> > apache/beam-starter-kotlin
> > apache/beam-starter-scala
> >
> > We'll start by populating the Java one which is the most pressing one
> and the one that is ready, but the rest should be simpler.
> >
> > +David Huntsperger, tldr; these are minimal starter projects for every
> language. Once we have Java, Python and Go, it might be a good idea to
> change the quickstarts to use these instead of the word count. There is
> already a dedicated word count walkthrough so I think that is already
> covered.
> >
> > If we all agree on the repo names, who can help us create them?
> >
> > On Thu, Jan 27, 2022 at 12:58 PM Robert Bradshaw 
> wrote:
> >>
> >> On Tue, Jan 18, 2022 at 6:17 AM Kenneth Knowles 
> wrote:
> >> >
> >> > Agree with Luke here. "Just git clone and go" is a big part of it.
> >> >
> >> > But also, "I simply don't know what one would put in a Python repo
> other than a bare setup.py that lists a dependency on apache_beam" is
> answered by David's initial email and his repo, namely:
> >> >
> >> >  - GitHub Actions configuration
> >> >  - README.md
> >> >  - example that already runs
> >>
> >> OK, fair enough.
> >>
> >> >  - LICENSE (notably you've got it as MIT but to be part of Apache
> software it needs to be ASL2)
> >>
> >> On the topic of licence, it's a bit tricky because one doesn't want to
> >> bind the users of such a template as being a derivative work of a
> >> too-restrictive licence. The licence of the template itself should
> >> generally be very permissive.
> >>
> >> > On Fri, Jan 14, 2022 at 2:34 PM Luke Cwik  wrote:
> >> >>
> >> >> I think for consistency it makes sense to users to be told to
> checkout this git repo for the language of your choice and run. Some repos
> will have more/less than others when it comes to setup necessary.
> >> >>
> >> >> On Fri, Jan 14, 2022 at 2:26 PM Robert Bradshaw 
> wrote:
> >> >>>
> >> >>> +1 for doing this for Java, as setting up a project there is quite
> >> >>> complicated. I simply don't know what one would put in a Python repo
> >> >>> other than a bare setup.py that lists a dependency on
> >> >>> apache_beam. We don't have recommendations on file layout, etc. more
> >> >>> than that (though there's plenty of generic advice to be found out
> >> >>> there on the topic). I have a hunch Go is similar, and JavaScript
> >> >>> would be as well (npm install apache-beam and your package.json file
> >> >>> gets updated).
> >> >>>
> >> >>> On Fri, Jan 14, 2022 at 2:17 PM Luke Cwik  wrote:
> >> >>> >
> >> >>> > There are several examples already within the Beam repo found in:
> >> >>> > https://github.com/apache/beam/tree/master/examples
> >> >>> > https://github.com/apache/beam/tree/master/sdks/go/examples
> >> >>> >
> https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples
> >> >>> >
> >> >>> >
> >> >>> > On Fri, Jan 14, 2022 at 11:07 AM Sachin Agarwal <
> sachi...@google.com> wrote:
> >> >>> >>
> >> >>> >> I'd love to do something other than Wordcount just for
> novelty/freshness but agreed with the suggestion that having an example in
> each quickstart would be ideal.
> >> >>> >>
> >> >>> >> On Fri, Jan 14, 2022 at 11:06 AM David Huntsperger <
> dhuntsper...@google.com> wrote:
> >> >>> >>>
> >> >>> >>> + 1 to a separate repo for each language.
> >> >>> >>>
> >> >>> >>> Would it make sense to include the Wordcount example in each
> repo? I know that makes the repos less minimal, but we could rewrite the
> quickstarts around these repos instead of the current Wordcount examples.
> Or maybe we don't need to use the Wordcount example in the quickstarts...
> >> >>> >>>
> >> >>> >>> On Wed, Jan 12, 2022 at 1:54 PM David Cavazos <
> dcava...@google.com> wrote:
> >> >>> >>>>
> >> >>> >>>> I agree with dropping the archetypes. Less maintenance is
> preferable, and the github r

Re: Timestamp Verification when Outputting from FinishBundleContext Vs. ProcessContext

2022-02-03 Thread Kenneth Knowles
One reason is that FinishBundleContext has no `elem` to compare against,
and `elem` is what is used here:
https://github.com/apache/beam/blob/15048929495ad66963b528d5bd71eb7b4a844c96/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L440

That might be the whole reason.

Still, your point is a good one that the sort of validation is different.
The check for timestamp skew dates back to even before Beam and has been a
source of various troubles. It is also probably obsolete. The purpose was
really to make sure elements did not fall into time intervals that would be
dropped immediately. Now that element dropping is associated with expired
windows, it may be entirely obsolete or just ready for some updating. +Lara
Schmidt is the person who has looked at this in detail most recently, I
think.
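
For readers following along, the kind of check under discussion can be
sketched as below. This is a stand-alone illustration in plain Java (using
java.time rather than Beam's Joda types), not the SimpleDoFnRunner code. The
class and method names are hypothetical, and the rule shown (an output
timestamp may not be earlier than the input element's timestamp minus an
allowed skew) is a simplified reading of the skew check. It also shows why
the check needs an input element: with no element timestamp there is nothing
to measure skew against.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative model of a timestamp-skew check; names are hypothetical, not Beam's. */
final class TimestampSkewSketch {
  private TimestampSkewSketch() {}

  /**
   * Returns true when outputTimestamp is no earlier than
   * (elementTimestamp - allowedSkew). The check needs an input element timestamp.
   */
  static boolean isWithinAllowedSkew(
      Instant elementTimestamp, Instant outputTimestamp, Duration allowedSkew) {
    Instant lowerBound = elementTimestamp.minus(allowedSkew);
    return !outputTimestamp.isBefore(lowerBound);
  }

  public static void main(String[] args) {
    Instant element = Instant.parse("2022-02-02T08:00:00Z");
    Instant earlier = element.minusSeconds(30);
    // 30 seconds before the element with zero skew allowed: rejected.
    System.out.println(isWithinAllowedSkew(element, earlier, Duration.ZERO));
    // The same output timestamp with one minute of allowed skew: accepted.
    System.out.println(isWithinAllowedSkew(element, earlier, Duration.ofMinutes(1)));
  }
}
```

In a finish-bundle style call there is no `elementTimestamp` argument to
supply, which matches the observation that only null checks are possible
there unless the validation is redefined (for example, against the window).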

Kenn

On Wed, Feb 2, 2022 at 8:00 AM Evan Galpin wrote:

> Hey folks,
>
> I noticed through tracing code that when calling
> ProcessContext#outputWithTimestamp, the method checkTimestamp is
> invoked[1]. However, no similar check appears to be invoked when calling
> FinishBundleContext#output, which explicitly requires passing a timestamp
> as one of the arguments[2]. Instead, all that's checked is that the pane
> and timestamp are not null.  Is this difference intentional?  Could someone
> help me improve my understanding?
>
> Thanks,
> Evan
>
> [1]
> https://github.com/apache/beam/blob/15048929495ad66963b528d5bd71eb7b4a844c96/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L422
> [2]
> https://beam.apache.org/releases/javadoc/2.35.0/org/apache/beam/sdk/transforms/DoFn.FinishBundleContext.html#output-org.apache.beam.sdk.values.TupleTag-T-org.joda.time.Instant-org.apache.beam.sdk.transforms.windowing.BoundedWindow-
>


Re: @Timer and Kafka SDF not interacting properly

2022-02-03 Thread Kenneth Knowles
I know we chatted about this off-list, but I wanted to just follow up and
see if you figured it out. Sounds like it could be an important bug in the
DirectRunner. I don't recall whether it reproduces on e.g. local
Flink/Spark Runner or the Python local portable runner.

Kenn

On Tue, Feb 1, 2022 at 1:10 PM John Casey  wrote:

> I'm investigating an issue where KafkaIO.read.withDynamicRead doesn't
> appear to be working properly when used with the SDF based reader.
> Specifically, it doesn't appear that the pipeline picks up any new topics
> or partitions.
>
> I'm running locally using the DirectRunner, and I've set breakpoints at
> the start of WatchKafkaTopicPartitionDoFn::processElement and ~::onTimer.
>
> It looks like the initial processElement works fine. It is called once,
> and populates the pipeline with the initial state of Kafka. However, the
> onTimer method is never called. I've configured the timer to be 1 minute,
> and I've waited about 20 minutes, but the method never gets called, which
> means that no new partitions are set up.
>
> My current (unvalidated) suspicion is that the way we are creating splits
> for Kafka is causing the timer to never hit that 1 minute mark, preventing
> onTimer from being called.
>
> Is someone familiar with how Java Timers work or what might be causing a
> timer to not trigger?
>
> Thanks,
> John
>


Re: Javascript SDK

2022-02-03 Thread Robert Burke
The best way to not release is to have it in its own branch off the
mainline. That was the original tactic employed by the Go SDK, until that
branch got merged in and was unable to be disentangled.

Then it was mostly a matter of not doing any of the container type code for
it.

Ultimately once it's in master, it's part of the repo and will be part of a
given release version's archive of the repo.

Technically with Go, because Go package versions are automatically tied to
the repo's tags, we had been "releasing" versions of the SDK anyway. I don't
think that's true for Node and its package management.

Specifically for Node, to go further in not releasing, we could also avoid
publishing the code there. But I don't know anything about it; it could be
as simple as Do Nothing.

Personally, if it gets added to the repo at all I'd rather we rip off the
band-aid and at least have all the tests regularly run, and various GitHub
actions. Even if we aren't doing the container release activities because
it's experimental, that's much better than bit rot, and being part of the
main repo has a simpler contribution convention.

Those are my 2 cents.
Robert B
Beam Go Busybody

On Thu, Feb 3, 2022, 3:29 PM Kenneth Knowles  wrote:

> We did the same for the Go SDK for some time. I imagine just "not doing
> the work to release it" suffices? Maybe +Robert Burke has
> some other memories of how to not release.
>
> Kenn
>
> On Mon, Jan 31, 2022 at 1:05 PM Kerry Donny-Clark 
> wrote:
>
>> This project was a great way to kickstart a new SDK. I'd like to bring
>> this into Beam and start cleanup. Are there any steps to take before making
>> a PR? Is there a way to mark this as experimental/not for release?
>> Kerry
>>
>> On Mon, Jan 17, 2022 at 1:22 AM Pablo Estrada  wrote:
>>
>>> This project was fun, and I learned a lot putting some time into it. I'd
>>> love for it to be brought into the main repository and worked over some
>>> time to be fully supported.
>>> Best
>>> -P.
>>>
>>> On Fri, Jan 14, 2022 at 4:46 PM Ahmet Altay  wrote:
>>>
>>>> Really nice! Congratulations to all who worked on this project.
>>>>
>>>> On Fri, Jan 14, 2022 at 4:41 PM Kenneth Knowles wrote:

> This was super fun, and I really hope it can be an inspiration to
> others that you can build a working Beam SDK in a week!
>
> (hint hint https://issues.apache.org/jira/browse/BEAM-4010 and
> https://issues.apache.org/jira/browse/BEAM-12658 :-)
>
> On Fri, Jan 14, 2022 at 11:38 AM Robert Bradshaw 
> wrote:
>
>> And, of course, an example:
>>
>>
>> https://github.com/robertwb/beam-javascript/blob/javascript/sdks/node-ts/src/apache_beam/examples/wordcount.ts
>>
>> On Fri, Jan 14, 2022 at 11:35 AM Robert Bradshaw 
>> wrote:
>> >
>> > Last week at Google we had a hackathon to kick off the new year, and
>> > one of the projects we came up with was seeing how far we could get
>> in
>> > putting together a typescript SDK. Starting from nothing we were
>> able
>> > to make a lot of progress and I wanted to share the results here.
>> >
>> >
>> https://github.com/robertwb/beam-javascript/blob/javascript/sdks/node-ts/README.md
>> >
>> > I think this is an exciting project and look forward to officially
>> > supporting a new language. Clearly there is still a fair amount to
>> do,
>> > and we also need to figure out the best way to get this reviewed
>> (we'd
>> > especially welcome feedback (and contributions) from those, if any,
>> in
>> > the know about javascript/typescript/node even if they're not beam
>> or
>> > distributed computing experts) and into the main repository
>> (assuming
>> > the community is as interested in this as I am).
>> >
>> > The above link is a decent overview, but copying below for posterity
>> > as that will likely evolve over time (e.g. as decisions get made and
>> > TODOs get resolved).
>> >
>> > - Robert
>> >
>> >
>> > 
>> >
>> > # Node Beam SDK
>> >
>> > This is the start of a fully functioning Javascript (actually,
>> > Typescript) SDK. There are two distinct aims with this SDK:
>> >
>> > 1. Tap into the large (and relatively underserved, by existing data
>> > processing frameworks) community of javascript developers with a
>> > native SDK targeting this language.
>> >
>> > 1. Develop a new SDK which can serve both as a proof of concept and
>> > reference that highlights the (relative) ease of porting Beam to new
>> > languages, a differentiating feature of Beam and Dataflow.
>> >
>> > To accomplish this, we lean heavily on the portability framework.
>> For
>> > example, we make heavy use of cross-language transforms, in
>> particular
>> > for IOs (as a full SDF implementation may not fit into the week). In
>> > addition, the direct runner is simply a

Re: Thoughts from a first time contributor

2022-02-03 Thread Aizhamal Nurmamat kyzy
Welcome to the community Danny, and thanks for the feedback!

With some of our planned improvements on Beam's landing page, we are hoping
to address the 1st friction point you mentioned. We have received
similar feedback a few times in the past.

Regarding the 3rd point, I am hoping we can continue the discussion on
GH vs Jira issues, and some improvements will be introduced there as well.

Thanks again for the write up!

On Thu, Feb 3, 2022 at 3:27 PM Kenneth Knowles  wrote:

> Thanks for writing all this up and for putting up PRs to improve things!
>
> Kenn
>
> On Mon, Jan 31, 2022 at 12:17 PM Danny McCormick <
> dannymccorm...@google.com> wrote:
>
>> 👋 Hey folks, my name is Danny - I recently completed my first Beam
>> PR[1] (a small extension to the Go Dataflow runner) and am planning on
>> becoming a more regular part of the community. As such, I wanted to use my
>> fresh newbie eyes and share some of what was nice and where there was
>> friction about getting started.
>>
>> Disclaimer: this is coming from the perspective of someone who is pretty
>> used to open source development, but has minimal experience with the Apache
>> way, Beam, and the languages my change came in. I'm hoping my experience is
>> helpful to those of you who have been around for a while and haven't seen
>> things as a newcomer in a long time, but it may not be reflective of the
>> experience of others.
>>
>> *Things that were really nice:*
>>
>> - The community has been really welcoming and encouraging of
>> contributions, something I saw in my first code review, my first pr, and
>> even the tone of the docs. Special thanks to @lostluck and @jrmccluskey for
>> making my first interactions welcoming and prompt. That experience can be
>> the difference between one time and repeat contributors.
>>
>> - Getting started writing my first pipeline, and then ramping up to more
>> complex concepts was surprisingly easy - in particular, the docs, examples,
>> and Katas made for a reasonably smooth process. It wasn't always clear how
>> to go from that to more complex transforms and there's of course room for
>> more clarity, but I appreciate the work that's gone into the getting
>> started experience.
>>
>> - Overall, the code base is pretty easily understood/reasoned about, and
>> the high quality of code made it pretty easy to make my first change. I'm
>> pretty impressed at how simple/well composed this system is even as it
>> approaches a tricky problem space (hopefully I'm saying the same thing
>> after I make some bigger changes :))
>>
>> *Friction Points:*
>>
>> - It was harder than expected for me to figure out what made Beam
>> different/special from other tools in the space out there for users.
>> Specifically, it wasn't immediately obvious why I would use Beam instead of
>> just running my jobs directly on Spark or Flink or one of the other
>> runners. One pretty big challenge here was that I didn't really get how
>> easy it was to switch runner types/how powerful the portability (and
>> unified streaming/batch) model was. I'm not sure exactly what would make
>> this easier for someone new to the space, but some sort of graphic or brief
>> statement of "this is where Beam adds value over most other frameworks" in
>> the Readme would be cool.
>>
>> - There were some small paper cut usability things in the repo.
>> Specifically, it looks like the labeler is broken (issue[2] and pr[3] added
>> to address this) and there isn't a CONTRIBUTING.md (issue[4] and pr[5]
>> added to address this), though the contribution guide is linked elsewhere.
>> Both of these are probably non-issues for experienced contributors, but add
>> a small amount of friction for people who are trying to get involved,
>> especially those who navigate a fair amount of OSS repos, and they're
>> pretty easy to fix.
>>
>> - I know there's some separate discussion about this in a different
>> thread[6], but the use of Jira instead of GitHub issues added a layer of
>> friction to getting started. Concretely, I would've put up my first pull
>> request and created my first issue earlier if I didn't need to go through
>> the process of creating a Jira account and getting permissions to assign
>> tickets, and it was harder to find a good first issue to contribute to. I
>> can imagine others might not have pushed through that.
>>
>> With all that said, I'm really excited to be joining the community and
>> get to add to Beam, and I hope it was helpful or interesting to get a
>> newbie's perspective.
>>
>> [1] https://github.com/apache/beam/pull/16643
>> [2] https://issues.apache.org/jira/browse/BEAM-13779
>> [3] https://github.com/apache/beam/pull/16665
>> [4] https://issues.apache.org/jira/browse/BEAM-13780
>> [5] https://github.com/apache/beam/pull/1
>> [6] https://lists.apache.org/thread/q5nbwxqvfkzlz664c4kchzkbj26c3r89
>>
>> Thanks,
>> Danny
>>
>


Re: Javascript SDK

2022-02-03 Thread Kenneth Knowles
We did the same for the Go SDK for some time. I imagine just "not doing the
work to release it" suffices? Maybe +Robert Burke has
some other memories of how to not release.

Kenn

On Mon, Jan 31, 2022 at 1:05 PM Kerry Donny-Clark 
wrote:

> This project was a great way to kickstart a new SDK. I'd like to bring
> this into Beam and start cleanup. Are there any steps to take before making
> a PR? Is there a way to mark this as experimental/not for release?
> Kerry
>
> On Mon, Jan 17, 2022 at 1:22 AM Pablo Estrada  wrote:
>
>> This project was fun, and I learned a lot putting some time into it. I'd
>> love for it to be brought into the main repository and worked over some
>> time to be fully supported.
>> Best
>> -P.
>>
>> On Fri, Jan 14, 2022 at 4:46 PM Ahmet Altay  wrote:
>>
>>> Really nice! Congratulations to all who worked on this project.
>>>
>>> On Fri, Jan 14, 2022 at 4:41 PM Kenneth Knowles  wrote:
>>>
>>>> This was super fun, and I really hope it can be an inspiration to
>>>> others that you can build a working Beam SDK in a week!
>>>>
>>>> (hint hint https://issues.apache.org/jira/browse/BEAM-4010 and
>>>> https://issues.apache.org/jira/browse/BEAM-12658 :-)
>>>>
>>>> On Fri, Jan 14, 2022 at 11:38 AM Robert Bradshaw wrote:

> And, of course, an example:
>
>
> https://github.com/robertwb/beam-javascript/blob/javascript/sdks/node-ts/src/apache_beam/examples/wordcount.ts
>
> On Fri, Jan 14, 2022 at 11:35 AM Robert Bradshaw 
> wrote:
> >
> > Last week at Google we had a hackathon to kick off the new year, and
> > one of the projects we came up with was seeing how far we could get
> in
> > putting together a typescript SDK. Starting from nothing we were able
> > to make a lot of progress and I wanted to share the results here.
> >
> >
> https://github.com/robertwb/beam-javascript/blob/javascript/sdks/node-ts/README.md
> >
> > I think this is an exciting project and look forward to officially
> > supporting a new language. Clearly there is still a fair amount to
> do,
> > and we also need to figure out the best way to get this reviewed
> (we'd
> > especially welcome feedback (and contributions) from those, if any,
> in
> > the know about javascript/typescript/node even if they're not beam or
> > distributed computing experts) and into the main repository (assuming
> > the community is as interested in this as I am).
> >
> > The above link is a decent overview, but copying below for posterity
> > as that will likely evolve over time (e.g. as decisions get made and
> > TODOs get resolved).
> >
> > - Robert
> >
> >
> > 
> >
> > # Node Beam SDK
> >
> > This is the start of a fully functioning Javascript (actually,
> > Typescript) SDK. There are two distinct aims with this SDK:
> >
> > 1. Tap into the large (and relatively underserved, by existing data
> > processing frameworks) community of javascript developers with a
> > native SDK targeting this language.
> >
> > 1. Develop a new SDK which can serve both as a proof of concept and
> > reference that highlights the (relative) ease of porting Beam to new
> > languages, a differentiating feature of Beam and Dataflow.
> >
> > To accomplish this, we lean heavily on the portability framework. For
> > example, we make heavy use of cross-language transforms, in
> particular
> > for IOs (as a full SDF implementation may not fit into the week). In
> > addition, the direct runner is simply an extension of the worker
> > suitable for running on portable runners such as the ULR, which will
> > directly transfer to running on production runners such as Dataflow
> > and Flink. The target audience should hopefully not be put off by
> > running other language code encapsulated in docker images.
> >
> > ## API
> >
> > We generally try to apply the concepts from the Beam API in a
> > Typescript idiomatic way, but it should be noted that few of the
> > initial developers have extensive (if any) Javascript/Typescript
> > development experience, so feedback is greatly appreciated.
> >
> > In addition, some notable departures are taken from the traditional
> SDKs:
> >
> > * We take a "relational foundations" approach, where [schema'd
> > data](
> https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc/edit#heading=h.puuotbien1gf
> )
> > is the primary way to interact with data, and we generally eschew the
> > key-value requiring transforms in favor of a more flexible approach
> > naming fields or expressions. Javascript's native Object is used as
> > the row type.
> >
> > * As part of being schema-first we also de-emphasize Coders as a
> > first-class concept in the SDK, relegating it to an advanced feature
>>

Re: Thoughts from a first time contributor

2022-02-03 Thread Kenneth Knowles
Thanks for writing all this up and for putting up PRs to improve things!

Kenn



Google Healthcare API Java SDK: HTTP Client Retry

2022-02-03 Thread Sampat P
Hello,

Has anyone worked on an HTTP retry client in the Healthcare API client?

https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L587

Is it possible to add a param to the initClient and expose it as an API so
the dataflow pipeline can pass a boolean if a retry is needed?

https://github.com/apache/beam/blob/6814a06ac3d7f4f4a3e84a72bb5fae6d63a3b71a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HttpHealthcareApiClient.java#L732

Execute Bundle right now only takes an input of FHIR Store name:
https://github.com/apache/beam/blob/6814a06ac3d7f4f4a3e84a72bb5fae6d63a3b71a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L904

We need an option to send a retry flag so the HTTP client could retry
failed REST API calls.
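
To make the request concrete, here is a rough sketch of what such a retry
flag could look like. It is plain Java with no Beam or Healthcare API
classes: `RetryFlagSketch`, `execute`, `retryEnabled`, and `maxAttempts` are
hypothetical names, and treating `UncheckedIOException` as the retryable
failure is an assumption made only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

/** Hypothetical sketch: not part of Beam or the Healthcare API client. */
final class RetryFlagSketch {
  private RetryFlagSketch() {}

  /**
   * Runs the request once, or up to maxAttempts times when retryEnabled is true.
   * Only UncheckedIOException is treated as retryable in this sketch.
   */
  static <T> T execute(Supplier<T> request, boolean retryEnabled, int maxAttempts) {
    int attempts = retryEnabled ? Math.max(1, maxAttempts) : 1;
    UncheckedIOException lastFailure = null;
    for (int attempt = 1; attempt <= attempts; attempt++) {
      try {
        return request.get();
      } catch (UncheckedIOException e) {
        lastFailure = e; // remember the failure and try again if attempts remain
      }
    }
    throw lastFailure; // every attempt failed
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Simulated REST call that fails twice before succeeding.
    String result = execute(() -> {
      if (++calls[0] < 3) {
        throw new UncheckedIOException(new IOException("simulated transient failure"));
      }
      return "bundle executed";
    }, true, 5);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

A real client would likely add backoff between attempts and retry only on
specific status codes; both are left out here to keep the sketch short.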

Please let me know if this was tried before or if further discussion is
needed.

Please invite me to the Slack channel (mailing...@gmail.com) so we can
discuss in Slack.

Thanks,
Sampat.


Re: Test

2022-02-03 Thread Sampat P
Hello,

Could you invite me to the beam slack channel: mailing...@gmail.com

Thanks,
Sampat.

On Thu, Feb 3, 2022 at 4:11 PM Sampat P  wrote:

>
>


Flaky test issue report (48)

2022-02-03 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-13811: Python postcommit failing 
examples tests (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13810: Flaky tests: Gradle build 
daemon disappeared unexpectedly (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13797: Flakes: Failed to load 
cache entry (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13783: 
apache_beam.transforms.combinefn_lifecycle_test.LocalCombineFnLifecycleTest.test_combine
 is flaky (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13741: 
:sdks:java:extensions:sql:hcatalog:compileJava failing in 
beam_Release_NightlySnapshot  (created 2022-01-25)
https://issues.apache.org/jira/browse/BEAM-13708: flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored (created 2022-01-20)
https://issues.apache.org/jira/browse/BEAM-13693: 
beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming timing out at 9 hours 
(created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13575: Flink 
testParDoRequiresStableInput flaky (created 2021-12-28)
https://issues.apache.org/jira/browse/BEAM-13525: Java VR (Dataflow, V2, 
Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests (created 
2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13519: Java precommit flaky 
(timing out) (created 2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13500: NPE in Flink Portable 
ValidatesRunner streaming suite (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13453: Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use 
(created 2021-12-13)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13367: 
[beam_PostCommit_Python36] [ 
apache_beam.io.gcp.experimental.spannerio_read_it_test] Failure summary 
(created 2021-12-01)
https://issues.apache.org/jira/browse/BEAM-13312: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
 is flaky in Java Spark ValidatesRunner suite  (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13311: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite. (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13234: Flake in 
StreamingWordCountIT.test_streaming_wordcount_it (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13025: pubsublite.ReadWriteIT 
flaky in beam_PostCommit_Java_DataflowV2 (created 2021-10-08)
https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12859: 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12858: 
org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler 
is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12809: 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12794: 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12793: 
beam_PostRelease_NightlySnapshot failed (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12673: 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey (created 2021-07-28)
https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: 
Failed to read inputs in the data plane (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12200: 
SamzaStoreStateInternalsTest is flaky (created 2021-04-20)
https:

P1 issues report (72)

2022-02-03 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake)).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-13811: Python postcommit failing 
examples tests (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13809: beam_PostCommit_XVR_Flink 
flaky: Connection refused (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13805: Simplify version override 
for Dev versions of the Go SDK. (created 2022-02-02)
https://issues.apache.org/jira/browse/BEAM-13798: Upgrade Kubernetes 
Clusters (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13787: Job in Flink + K8S 
Cluster exits without executing the pipeline (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13781: grpc-netty-shaded version 
conflict (created 2022-01-31)
https://issues.apache.org/jira/browse/BEAM-13769: 
beam_PreCommit_Python_Cron failing on test_create_uses_coder_for_pickling 
(created 2022-01-28)
https://issues.apache.org/jira/browse/BEAM-13763: Rotate credentials for 
'io-datastores' Kubernetes cluster (created 2022-01-28)
https://issues.apache.org/jira/browse/BEAM-13741: 
:sdks:java:extensions:sql:hcatalog:compileJava failing in 
beam_Release_NightlySnapshot (created 2022-01-25)
https://issues.apache.org/jira/browse/BEAM-13715: Kafka commit offset drop 
data on failure for runners that have non-checkpointing shuffle (created 
2022-01-21)
https://issues.apache.org/jira/browse/BEAM-13694: 
beam_PostCommit_Java_Hadoop_Versions failing with ClassDefNotFoundError 
(created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13693: 
beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming timing out at 9 hours 
(created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13686: OOM while logging a large 
pipeline even when logging level is higher (created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13668: Java Spanner IO Request 
Count metrics broke backwards compatibility (created 2022-01-15)
https://issues.apache.org/jira/browse/BEAM-13615: Bumping up FnApi 
environment version to 9 in Java, Python SDK (created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13606: bigtable io doesn't 
handle non-ok row mutations (created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13582: Beam website precommit 
mentions broken links, but passes. (created 2021-12-30)
https://issues.apache.org/jira/browse/BEAM-13579: Cannot run 
python_xlang_kafka_taxi_dataflow validation script on 2.35.0 (created 
2021-12-29)
https://issues.apache.org/jira/browse/BEAM-13487: WriteToBigQuery Dynamic 
table destinations returns wrong tableId (created 2021-12-17)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13376: Missing error for 
nonexistent column family BigTable (created 2021-12-03)
https://issues.apache.org/jira/browse/BEAM-13237: 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2 (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13203: Potential data loss when 
using SnsIO.writeAsync (created 2021-11-08)
https://issues.apache.org/jira/browse/BEAM-13164: Race between member 
variable being accessed due to leaking uninitialized state via 
OutboundObserverFactory (created 2021-11-01)
https://issues.apache.org/jira/browse/BEAM-13132: WriteToBigQuery submits a 
duplicate BQ load job if a 503 error code is returned from googleapi (created 
2021-10-27)
https://issues.apache.org/jira/browse/BEAM-13087: 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible 
(created 2021-10-20)
https://issues.apache.org/jira/browse/BEAM-13078: Python DirectRunner does 
not emit data at GC time (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13076: Python AfterAny, AfterAll 
do not follow spec (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13010: Delete orphaned files 
(created 2021-10-06)
https://issues.apache.org/jira/browse/BEAM-12995: Consumer group with 
random prefix (created 2021-10-04)
https://issues.apache.org/jira/browse/BEAM-12959: Dataflow error in 
CombinePerKey operation (created 2021-09-26)
https://issues.apache.org/jira/browse/BEAM-12867: Either Create or 
DirectRunner fails to produce all elements to the following transform (created 
2021-09-09)
https://issues.apache.org/jira/browse/BEA