Re: Beam Summit Status Report - 2/2
Hi Danielle, Would it be possible in the future to inline the content of the status report in the email? This way it will be archived with the mailing list. It will also make casual readers of the list more likely to skim it. Kenn On Wed, Feb 2, 2022 at 3:39 PM Danielle Syse wrote: > Hi everyone, > > I hope you're all having a great week thus far. Attached below is the > updated Beam Summit Status Report from today's meeting. Let me know if you > have any comments, questions, or concerns. > > We're also looking for speakers for our Beam Summit! Please fill out the > following form to help contribute: https://bit.ly/3o2D9FL > > 2/2 Status Report: https://bit.ly/3IAzR4u > > Thank you, > > Danielle Syse > >
Re: Beam Java starter project template
I'm convinced on all points. My main motivation was to keep it simple. But of course we should keep it simple for users, not us :-) I can take on the task of asking about the MIT license and requesting that the repos be created. Not sure if it needs my level of privileges but I'm happy to do it anyhow. Kenn On Wed, Feb 2, 2022 at 10:30 AM Robert Bradshaw wrote: > On Wed, Feb 2, 2022 at 10:12 AM David Cavazos wrote: > > > > MIT is much more permissive, but I also don't have any problems changing > it to Apache license. In any case, how about we create the following repos? > > For these starter projects, we don't want to encumber any users of > these templates with any particular licensing requirements (right?) > and we don't even care about attribution. We want these to be pretty > much as close to public domain as possible. That's not what the Apache > licence does. (If it's even relevant, a good argument could likely be > made for de minimis or fair use, but I think it's best to be explicit > about this.) Perhaps this'd be a good question for Apache Legal? > > > apache/beam-starter-java > > apache/beam-starter-python > > apache/beam-starter-go > > apache/beam-starter-kotlin > > apache/beam-starter-scala > > > > We'll start by populating the Java one, which is the most pressing one > and the one that is ready, but the rest should be simpler. > > > > +David Huntsperger, tl;dr: these are minimal starter projects for every > language. Once we have Java, Python and Go, it might be a good idea to > change the quickstarts to use these instead of the word count. There is > already a dedicated word count walkthrough so I think that is already > covered. > > > > If we all agree on the repo names, who can help us create them? > > > > On Thu, Jan 27, 2022 at 12:58 PM Robert Bradshaw > wrote: > >> > >> On Tue, Jan 18, 2022 at 6:17 AM Kenneth Knowles > wrote: > >> > > >> > Agree with Luke here. "Just git clone and go" is a big part of it.
> >> > > >> > But also Robert's point, "I simply don't know what one would put in a > Python repo then, other than a bare setup.py that lists a dependency on > apache_beam", is answered by David's initial email and his repo, namely: > >> > > >> > - GitHub Actions configuration > >> > - README.md > >> > - example that already runs > >> > >> OK, fair enough. > >> > >> > - LICENSE (notably you've got it as MIT but to be part of Apache > software it needs to be ASL2) > >> > >> On the topic of licence, it's a bit tricky because one doesn't want to > >> bind the users of such a template as being a derivative work of a > >> too-restrictive licence. The licence of the template itself should > >> generally be very permissive. > >> > >> > On Fri, Jan 14, 2022 at 2:34 PM Luke Cwik wrote: > >> >> > >> >> I think for consistency it makes sense for users to be told to > check out this git repo for the language of their choice and run it. Some repos > will need more or less setup than others. > >> >> > >> >> On Fri, Jan 14, 2022 at 2:26 PM Robert Bradshaw > wrote: > >> >>> > >> >>> +1 for doing this for Java, as setting up a project there is quite > >> >>> complicated. I simply don't know what one would put in a Python repo > >> >>> then, other than a bare setup.py that lists a dependency on > >> >>> apache_beam. We don't have recommendations on file layout, etc. more > >> >>> than that (though there's plenty of generic advice to be found out > >> >>> there on the topic). I have a hunch Go is similar, and JavaScript > >> >>> would be as well (npm install apache-beam and your package.json file > >> >>> gets updated).
> >> >>> > >> >>> On Fri, Jan 14, 2022 at 2:17 PM Luke Cwik wrote: > >> >>> > > >> >>> > There are several examples already within the Beam repo found in: > >> >>> > https://github.com/apache/beam/tree/master/examples > >> >>> > https://github.com/apache/beam/tree/master/sdks/go/examples > >> >>> > > https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples > >> >>> > > >> >>> > > >> >>> > On Fri, Jan 14, 2022 at 11:07 AM Sachin Agarwal < > sachi...@google.com> wrote: > >> >>> >> > >> >>> >> I'd love to do something other than Wordcount just for > novelty/freshness, but I agree with the suggestion that having an example in > each quickstart would be ideal. > >> >>> >> > >> >>> >> On Fri, Jan 14, 2022 at 11:06 AM David Huntsperger < > dhuntsper...@google.com> wrote: > >> >>> >>> > >> >>> >>> +1 to a separate repo for each language. > >> >>> >>> > >> >>> >>> Would it make sense to include the Wordcount example in each > repo? I know that makes the repos less minimal, but we could rewrite the > quickstarts around these repos instead of the current Wordcount examples. > Or maybe we don't need to use the Wordcount example in the quickstarts... > >> >>> >>> > >> >>> >>> On Wed, Jan 12, 2022 at 1:54 PM David Cavazos < > dcava...@google.com> wrote: > >> >>> > >> >>> I agree with dropping the archetypes. Less maintenance is > preferable, and the github r
Re: Timestamp Verification when Outputting from FinishBundleContext Vs. ProcessContext
One reason is that there is no current `elem` in that context, and the check uses it here: https://github.com/apache/beam/blob/15048929495ad66963b528d5bd71eb7b4a844c96/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L440 That might be the whole reason. Still, your point that the two paths validate differently is a good one. The check for timestamp skew predates Beam and has been a source of various troubles. It is also probably obsolete. Its purpose was to make sure elements did not fall into time intervals that would be dropped immediately. Now that element dropping is associated with expired windows, it may be entirely obsolete or just ready for some updating. +Lara Schmidt is, I think, the person who has looked at this in detail most recently. Kenn On Wed, Feb 2, 2022 at 8:00 AM Evan Galpin wrote: > Hey folks, > > I noticed while tracing code that when calling > ProcessContext#outputWithTimestamp, the method checkTimestamp is > invoked[1]. However, no similar check appears to be invoked when calling > FinishBundleContext#output, which explicitly requires passing a timestamp > as one of the arguments[2]. Instead, all that's checked is that the pane > and timestamp are not null. Is this difference intentional? Could someone > help me improve my understanding? > > Thanks, > Evan > > [1] > https://github.com/apache/beam/blob/15048929495ad66963b528d5bd71eb7b4a844c96/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L422 > [2] > https://beam.apache.org/releases/javadoc/2.35.0/org/apache/beam/sdk/transforms/DoFn.FinishBundleContext.html#output-org.apache.beam.sdk.values.TupleTag-T-org.joda.time.Instant-org.apache.beam.sdk.transforms.windowing.BoundedWindow- >
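The distinction Kenn draws can be modelled in a few lines. This is a simplified Python sketch, not a port of `SimpleDoFnRunner`: it assumes the check rejects an output timestamp that is earlier than the current element's timestamp minus the DoFn's allowed timestamp skew, and the names `check_output_timestamp`, `allowed_skew`, and `TimestampSkewError` are hypothetical. It illustrates why the bound cannot be computed in `@FinishBundle`: the lower bound is derived from the element being processed, and there is no such element there.

```python
from datetime import datetime, timedelta
from typing import Optional


class TimestampSkewError(ValueError):
    """Raised when an output timestamp is earlier than the allowed bound."""


def check_output_timestamp(
    output_ts: datetime,
    element_ts: Optional[datetime],
    allowed_skew: timedelta = timedelta(0),
) -> None:
    """Hypothetical model of a per-element output timestamp check.

    While processing an element, an output may move backwards in time by
    at most `allowed_skew`. Without a current element (as in a
    finish-bundle context) there is no lower bound to compare against.
    """
    if element_ts is None:
        # No current element: nothing to derive a lower bound from.
        return
    lower_bound = element_ts - allowed_skew
    if output_ts < lower_bound:
        raise TimestampSkewError(
            f"output timestamp {output_ts} is before {lower_bound} "
            f"(element timestamp {element_ts}, allowed skew {allowed_skew})"
        )


t = datetime(2022, 2, 2, 8, 0)
check_output_timestamp(t + timedelta(seconds=5), t)  # later than input: accepted
check_output_timestamp(t - timedelta(seconds=5), t, timedelta(seconds=10))  # within skew
check_output_timestamp(t - timedelta(hours=1), None)  # no element: unchecked
try:
    check_output_timestamp(t - timedelta(seconds=1), t)  # zero skew: rejected
except TimestampSkewError as err:
    print(err)
```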
Re: @Timer and Kafka SDF not interacting properly
I know we chatted about this off-list, but I wanted to follow up and see if you figured it out. It sounds like it could be an important bug in the DirectRunner. I don't recall whether it reproduces on, e.g., the local Flink/Spark runners or the Python local portable runner. Kenn On Tue, Feb 1, 2022 at 1:10 PM John Casey wrote: > I'm investigating an issue where KafkaIO.read.withDynamicRead doesn't > appear to be working properly when used with the SDF-based reader. > Specifically, it doesn't appear that the pipeline picks up any new topics > or partitions. > > I'm running locally using the DirectRunner, and I've set breakpoints at > the start of WatchKafkaTopPartitionDoFn::processElement and ~::onTimer. > > It looks like the initial processElement works fine. It is called once, > and populates the pipeline with the initial state of Kafka. However, the > onTimer method is never called. I've configured the timer to be 1 minute, > and I've waited about 20 minutes, but the method never gets called, which > means that no new partitions are set up. > > My current (unvalidated) suspicion is that the way we are creating splits > for Kafka is causing the timer to never hit that 1-minute mark, preventing > onTimer from being called. > > Is anyone familiar with how Java Timers work or what might be causing a > timer to not trigger? > > Thanks, > John >
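When a timer never fires, one thing worth checking is which clock drives it. In Beam's model an event-time timer fires only once the input watermark passes its timestamp, while a processing-time timer follows wall-clock time. The toy model below is plain Python, not DirectRunner code, and `ToyTimerManager` and its methods are hypothetical (boundary semantics are simplified). It only shows that if the watermark stays held back, an event-time timer set one minute ahead never fires however long you wait. Whether that applies to `WatchKafkaTopPartitionDoFn` is unverified; it is one hypothesis to test alongside John's split theory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class ToyTimerManager:
    """Toy model of when timers become eligible; not a real runner."""

    watermark: datetime  # event-time clock (input watermark)
    wall_clock: datetime  # processing-time clock
    event_time_timers: List[datetime] = field(default_factory=list)
    processing_time_timers: List[datetime] = field(default_factory=list)

    def fire_eligible(self) -> List[str]:
        """Fire (return and clear) every timer whose own clock has reached it."""
        fired: List[str] = []
        for ts in [t for t in self.event_time_timers if t <= self.watermark]:
            self.event_time_timers.remove(ts)
            fired.append(f"event-time timer @ {ts:%H:%M}")
        for ts in [t for t in self.processing_time_timers if t <= self.wall_clock]:
            self.processing_time_timers.remove(ts)
            fired.append(f"processing-time timer @ {ts:%H:%M}")
        return fired


start = datetime(2022, 2, 1, 13, 0)
tm = ToyTimerManager(watermark=start, wall_clock=start)
tm.event_time_timers.append(start + timedelta(minutes=1))
tm.processing_time_timers.append(start + timedelta(minutes=1))

tm.wall_clock += timedelta(minutes=20)  # 20 minutes pass; watermark is held
print(tm.fire_eligible())  # ['processing-time timer @ 13:01']

tm.watermark += timedelta(minutes=2)  # only once the watermark advances...
print(tm.fire_eligible())  # ['event-time timer @ 13:01']
```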
Re: Javascript SDK
The best way not to release is to keep it in its own branch off the mainline. That was the original tactic for the Go SDK, until that branch got merged in and could no longer be disentangled. After that, it was mostly a matter of not doing any of the container-related work for it. Ultimately, once it's in master, it's part of the repo and will be included in each release's source archive. Technically, with Go, because Go package versions are automatically tied to the repo's tags, we had been "releasing" versions of the SDK anyway. I don't think that's true for Node and its package management. Specifically for Node, to avoid releasing further, we could also avoid publishing the code to Node's package registry. But I don't know much about it; it could be as simple as doing nothing. Personally, if it gets added to the repo at all, I'd rather we rip off the band-aid and at least have all the tests and the various GitHub Actions run regularly. Even if we skip the container release activities because it's experimental, that's much better than bit rot, and being part of the main repo makes for a simpler contribution process. Those are my 2 cents. Robert B Beam Go Busybody On Thu, Feb 3, 2022, 3:29 PM Kenneth Knowles wrote: > We did the same for the Go SDK for some time. I imagine just "not doing > the work to release it" suffices? Maybe +Robert Burke has > some other memories of how to not release. > > Kenn > > On Mon, Jan 31, 2022 at 1:05 PM Kerry Donny-Clark > wrote: > >> This project was a great way to kickstart a new SDK. I'd like to bring >> this into Beam and start cleanup. Are there any steps to take before making >> a PR? Is there a way to mark this as experimental/not for release? >> Kerry >> >> On Mon, Jan 17, 2022 at 1:22 AM Pablo Estrada wrote: >> >>> This project was fun, and I learned a lot putting some time into it. I'd >>> love for it to be brought into the main repository and worked over some >>> time to be fully supported. >>> Best >>> -P.
>>> >>> On Fri, Jan 14, 2022 at 4:46 PM Ahmet Altay wrote: >>> Really nice! Congratulations to all who worked on this project. On Fri, Jan 14, 2022 at 4:41 PM Kenneth Knowles wrote: > This was super fun, and I really hope it can be an inspiration to > others that you can build a working Beam SDK in a week! > > (hint hint https://issues.apache.org/jira/browse/BEAM-4010 and > https://issues.apache.org/jira/browse/BEAM-12658 :-) > > On Fri, Jan 14, 2022 at 11:38 AM Robert Bradshaw > wrote: > >> And, of course, an example: >> >> >> https://github.com/robertwb/beam-javascript/blob/javascript/sdks/node-ts/src/apache_beam/examples/wordcount.ts >> >> On Fri, Jan 14, 2022 at 11:35 AM Robert Bradshaw >> wrote: >> > >> > Last week at Google we had a hackathon to kick off the new year, and >> > one of the projects we came up with was seeing how far we could get >> in >> > putting together a typescript SDK. Starting from nothing we were >> able >> > to make a lot of progress and I wanted to share the results here. >> > >> > >> https://github.com/robertwb/beam-javascript/blob/javascript/sdks/node-ts/README.md >> > >> > I think this is an exciting project and look forward to officially >> > supporting a new language. Clearly there is still a fair amount to >> do, >> > and we also need to figure out the best way to get this reviewed >> (we'd >> > especially welcome feedback (and contributions) from those, if any, >> in >> > the know about javascript/typescript/node even if they're not beam >> or >> > distributed computing experts) and into the main repository >> (assuming >> > the community is as interested in this as I am). >> > >> > The above link is a decent overview, but copying below for posterity >> > as that will likely evolve over time (e.g. as decisions get made and >> > TODOs get resolved). >> > >> > - Robert >> > >> > >> > >> > >> > # Node Beam SDK >> > >> > This is the start of a fully functioning Javascript (actually, >> > Typescript) SDK. 
There are two distinct aims with this SDK: >> > >> > 1. Tap into the large (and relatively underserved, by existing data >> > processing frameworks) community of JavaScript developers with a >> > native SDK targeting this language. >> > >> > 1. Develop a new SDK which can serve both as a proof of concept and >> > reference that highlights the (relative) ease of porting Beam to new >> > languages, a differentiating feature of Beam and Dataflow. >> > >> > To accomplish this, we lean heavily on the portability framework. >> For >> > example, we make heavy use of cross-language transforms, in >> particular >> > for IOs (as a full SDF implementation may not fit into the week). In >> > addition, the direct runner is simply a
Re: Thoughts from a first time contributor
Welcome to the community, Danny, and thanks for the feedback! With some of our planned improvements to Beam's landing page, we hope to address the first friction point you mentioned; we have received similar feedback a few times in the past. Regarding the third point, I hope we can continue the discussion on GitHub vs. Jira issues, and that some improvements will be introduced there as well. Thanks again for the write-up! On Thu, Feb 3, 2022 at 3:27 PM Kenneth Knowles wrote: > Thanks for writing all this up and for putting up PRs to improve things! > > Kenn > > On Mon, Jan 31, 2022 at 12:17 PM Danny McCormick < > dannymccorm...@google.com> wrote: > >> 👋 Hey folks, my name is Danny - I recently completed my first Beam >> PR[1] (a small extension to the Go Dataflow runner) and am planning on >> becoming a more regular part of the community. As such, I wanted to use my >> fresh newbie eyes and share some of what was nice and where there was >> friction in getting started. >> >> Disclaimer: this is coming from the perspective of someone who is pretty >> used to open source development, but has minimal experience with the Apache >> way, Beam, and the languages my change came in. I'm hoping my experience is >> helpful to those of you who have been around for a while and haven't seen >> things as a newcomer in a long time, but it may not be reflective of the >> experience of others. >> >> *Things that were really nice:* >> >> - The community has been really welcoming and encouraging of >> contributions, something I saw in my first code review, my first PR, and >> even the tone of the docs. Special thanks to @lostluck and @jrmccluskey for >> making my first interactions welcoming and prompt. That experience can be >> the difference between one-time and repeat contributors.
>> >> - Getting started writing my first pipeline, and then ramping up to more >> complex concepts was surprisingly easy - in particular, the docs, examples, >> and Katas made for a reasonably smooth process. It wasn't always clear how >> to go from that to more complex transforms and there's of course room for >> more clarity, but I appreciate the work that's gone into the getting >> started experience. >> >> - Overall, the code base is pretty easily understood/reasoned about, and >> the high quality of code made it pretty easy to make my first change. I'm >> pretty impressed at how simple/well-composed this system is even as it >> approaches a tricky problem space (hopefully I'm saying the same thing >> after I make some bigger changes :)) >> >> *Friction Points:* >> >> - It was harder than expected for me to figure out what made Beam >> different/special from other tools in the space out there for users. >> Specifically, it wasn't immediately obvious why I would use Beam instead of >> just running my jobs directly on Spark or Flink or one of the other >> runners. One pretty big challenge here was that I didn't really get how >> easy it was to switch runner types/how powerful the portability (and >> unified streaming/batch) model was. I'm not sure exactly what would make >> this easier for someone new to the space, but some sort of graphic or brief >> statement of "this is where Beam adds value over most other frameworks" in >> the README would be cool. >> >> - There were some small paper-cut usability issues in the repo. >> Specifically, it looks like the labeler is broken (issue[2] and PR[3] added >> to address this) and there isn't a CONTRIBUTING.md (issue[4] and PR[5] >> added to address this), though the contribution guide is linked elsewhere.
>> Both of these are probably non-issues for experienced contributors, but add >> a small amount of friction for people who are trying to get involved, >> especially those who navigate a fair amount of OSS repos, and they're >> pretty easy to fix. >> >> - I know there's some separate discussion about this in a different >> thread[6], but the use of Jira instead of GitHub issues added a layer of >> friction to getting started. Concretely, I would've put up my first pull >> request and created my first issue earlier if I hadn't needed to go through >> the process of creating a Jira account and getting permissions to assign >> tickets, and it was harder to find a good first issue to contribute to. I >> can imagine others might not have pushed through that. >> >> With all that said, I'm really excited to be joining the community and >> get to add to Beam, and I hope it was helpful or interesting to get a >> newbie's perspective. >> >> [1] https://github.com/apache/beam/pull/16643 >> [2] https://issues.apache.org/jira/browse/BEAM-13779 >> [3] https://github.com/apache/beam/pull/16665 >> [4] https://issues.apache.org/jira/browse/BEAM-13780 >> [5] https://github.com/apache/beam/pull/1 >> [6] https://lists.apache.org/thread/q5nbwxqvfkzlz664c4kchzkbj26c3r89 >> >> Thanks, >> Danny >> >
Re: Javascript SDK
We did the same for the Go SDK for some time. I imagine just "not doing the work to release it" suffices? Maybe +Robert Burke has some other memories of how to not release. Kenn On Mon, Jan 31, 2022 at 1:05 PM Kerry Donny-Clark wrote: > This project was a great way to kickstart a new SDK. I'd like to bring > this into Beam and start cleanup. Are there any steps to take before making > a PR? Is there a way to mark this as experimental/not for release? > Kerry > > On Mon, Jan 17, 2022 at 1:22 AM Pablo Estrada wrote: > >> This project was fun, and I learned a lot putting some time into it. I'd >> love for it to be brought into the main repository and worked over some >> time to be fully supported. >> Best >> -P. >> >> On Fri, Jan 14, 2022 at 4:46 PM Ahmet Altay wrote: >> >>> Really nice! Congratulations to all who worked on this project. >>> >>> On Fri, Jan 14, 2022 at 4:41 PM Kenneth Knowles wrote: >>> This was super fun, and I really hope it can be an inspiration to others that you can build a working Beam SDK in a week! (hint hint https://issues.apache.org/jira/browse/BEAM-4010 and https://issues.apache.org/jira/browse/BEAM-12658 :-) On Fri, Jan 14, 2022 at 11:38 AM Robert Bradshaw wrote: > And, of course, an example: > > > https://github.com/robertwb/beam-javascript/blob/javascript/sdks/node-ts/src/apache_beam/examples/wordcount.ts > > On Fri, Jan 14, 2022 at 11:35 AM Robert Bradshaw > wrote: > > > > Last week at Google we had a hackathon to kick off the new year, and > > one of the projects we came up with was seeing how far we could get > in > > putting together a typescript SDK. Starting from nothing we were able > > to make a lot of progress and I wanted to share the results here. > > > > > https://github.com/robertwb/beam-javascript/blob/javascript/sdks/node-ts/README.md > > > > I think this is an exciting project and look forward to officially > > supporting a new language. 
Clearly there is still a fair amount to > do, > > and we also need to figure out the best way to get this reviewed > (we'd > > especially welcome feedback (and contributions) from those, if any, > in > > the know about JavaScript/TypeScript/Node even if they're not Beam or > > distributed computing experts) and into the main repository (assuming > > the community is as interested in this as I am). > > > > The above link is a decent overview, but copying below for posterity > > as that will likely evolve over time (e.g. as decisions get made and > > TODOs get resolved). > > > > - Robert > > > > > > > > > > # Node Beam SDK > > > > This is the start of a fully functioning JavaScript (actually, > > TypeScript) SDK. There are two distinct aims with this SDK: > > > > 1. Tap into the large (and relatively underserved, by existing data > > processing frameworks) community of JavaScript developers with a > > native SDK targeting this language. > > > > 1. Develop a new SDK which can serve both as a proof of concept and > > reference that highlights the (relative) ease of porting Beam to new > > languages, a differentiating feature of Beam and Dataflow. > > > > To accomplish this, we lean heavily on the portability framework. For > > example, we make heavy use of cross-language transforms, in > particular > > for IOs (as a full SDF implementation may not fit into the week). In > > addition, the direct runner is simply an extension of the worker > > suitable for running on portable runners such as the ULR, which will > > directly transfer to running on production runners such as Dataflow > > and Flink. The target audience should hopefully not be put off by > > running other-language code encapsulated in Docker images.
> > > > ## API > > > > We generally try to apply the concepts from the Beam API in a > > TypeScript-idiomatic way, but it should be noted that few of the > > initial developers have extensive (if any) JavaScript/TypeScript > > development experience, so feedback is greatly appreciated. > > > > In addition, some notable departures are taken from the traditional > SDKs: > > > > * We take a "relational foundations" approach, where [schema'd > > data]( > https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc/edit#heading=h.puuotbien1gf > ) > > is the primary way to interact with data, and we generally eschew the > > key-value-requiring transforms in favor of a more flexible approach of > > naming fields or expressions. JavaScript's native Object is used as > > the row type. > > > > * As part of being schema-first we also de-emphasize Coders as a > > first-class concept in the SDK, relegating them to an advanced feature >>
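To make the "relational foundations" point concrete without depending on the TypeScript SDK's actual API, here is a plain-Python contrast. `group_by_key` and `group_by_field` are hypothetical helpers, and dicts stand in for the JavaScript objects the README uses as rows. With key-value transforms, every caller first reshapes rows into `(key, value)` pairs; with a field-naming transform, the caller just names the field and the rows stay intact.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]  # a schema'd row: field name -> value


def group_by_key(pairs: Iterable[Tuple[Any, Any]]) -> Dict[Any, List[Any]]:
    """Classic key-value grouping: inputs must already be (key, value) pairs."""
    grouped: Dict[Any, List[Any]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def group_by_field(rows: Iterable[Row], field_name: str) -> Dict[Any, List[Row]]:
    """Schema-style grouping: name the field to group on, keep each row whole."""
    grouped: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        grouped[row[field_name]].append(row)
    return dict(grouped)


words = [
    {"word": "beam", "lang": "ts"},
    {"word": "node", "lang": "ts"},
    {"word": "beam", "lang": "py"},
]
# Key-value style: the caller first reshapes every row into a pair.
print(group_by_key((r["word"], r["lang"]) for r in words))
# Field-naming style: the grouping is simply told which field to use.
print(group_by_field(words, "word"))
```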
Re: Thoughts from a first time contributor
Thanks for writing all this up and for putting up PRs to improve things! Kenn On Mon, Jan 31, 2022 at 12:17 PM Danny McCormick wrote: > 👋 Hey folks, my name is Danny - I recently completed my first Beam > PR[1] (a small extension to the Go Dataflow runner) and am planning on > becoming a more regular part of the community. As such, I wanted to use my > fresh newbie eyes and share some of what was nice and where there was > friction in getting started. > > Disclaimer: this is coming from the perspective of someone who is pretty > used to open source development, but has minimal experience with the Apache > way, Beam, and the languages my change came in. I'm hoping my experience is > helpful to those of you who have been around for a while and haven't seen > things as a newcomer in a long time, but it may not be reflective of the > experience of others. > > *Things that were really nice:* > > - The community has been really welcoming and encouraging of > contributions, something I saw in my first code review, my first PR, and > even the tone of the docs. Special thanks to @lostluck and @jrmccluskey for > making my first interactions welcoming and prompt. That experience can be > the difference between one-time and repeat contributors. > > - Getting started writing my first pipeline, and then ramping up to more > complex concepts was surprisingly easy - in particular, the docs, examples, > and Katas made for a reasonably smooth process. It wasn't always clear how > to go from that to more complex transforms and there's of course room for > more clarity, but I appreciate the work that's gone into the getting > started experience. > > - Overall, the code base is pretty easily understood/reasoned about, and > the high quality of code made it pretty easy to make my first change.
I'm > pretty impressed at how simple/well-composed this system is even as it > approaches a tricky problem space (hopefully I'm saying the same thing > after I make some bigger changes :)) > > *Friction Points:* > > - It was harder than expected for me to figure out what made Beam > different/special from other tools in the space out there for users. > Specifically, it wasn't immediately obvious why I would use Beam instead of > just running my jobs directly on Spark or Flink or one of the other > runners. One pretty big challenge here was that I didn't really get how > easy it was to switch runner types/how powerful the portability (and > unified streaming/batch) model was. I'm not sure exactly what would make > this easier for someone new to the space, but some sort of graphic or brief > statement of "this is where Beam adds value over most other frameworks" in > the README would be cool. > > - There were some small paper-cut usability issues in the repo. > Specifically, it looks like the labeler is broken (issue[2] and PR[3] added > to address this) and there isn't a CONTRIBUTING.md (issue[4] and PR[5] > added to address this), though the contribution guide is linked elsewhere. > Both of these are probably non-issues for experienced contributors, but add > a small amount of friction for people who are trying to get involved, > especially those who navigate a fair amount of OSS repos, and they're > pretty easy to fix. > > - I know there's some separate discussion about this in a different > thread[6], but the use of Jira instead of GitHub issues added a layer of > friction to getting started. Concretely, I would've put up my first pull > request and created my first issue earlier if I hadn't needed to go through > the process of creating a Jira account and getting permissions to assign > tickets, and it was harder to find a good first issue to contribute to. I > can imagine others might not have pushed through that.
> > With all that said, I'm really excited to be joining the community and get > to add to Beam, and I hope it was helpful or interesting to get a newbie's > perspective. > > [1] https://github.com/apache/beam/pull/16643 > [2] https://issues.apache.org/jira/browse/BEAM-13779 > [3] https://github.com/apache/beam/pull/16665 > [4] https://issues.apache.org/jira/browse/BEAM-13780 > [5] https://github.com/apache/beam/pull/1 > [6] https://lists.apache.org/thread/q5nbwxqvfkzlz664c4kchzkbj26c3r89 > > Thanks, > Danny >
Google Healthcare API Java SDK: HTTP Client Retry
Hello, Has anyone worked on an HTTP retry client for the Healthcare API client? https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L587 Would it be possible to add a parameter to initClient and expose it as an API option, so that a Dataflow pipeline can pass a boolean indicating whether retries are needed? https://github.com/apache/beam/blob/6814a06ac3d7f4f4a3e84a72bb5fae6d63a3b71a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HttpHealthcareApiClient.java#L732 Execute Bundle currently takes only the FHIR store name as input: https://github.com/apache/beam/blob/6814a06ac3d7f4f4a3e84a72bb5fae6d63a3b71a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L904 We need an option to pass a retry flag so that the HTTP client can retry failed REST API calls. Please let me know whether this has been tried before or whether further discussion is needed. Please invite me (mailing...@gmail.com) to the Slack channel so we can discuss it there. Thanks, Sampat.
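A minimal sketch of the opt-in retry being requested, written in plain Python rather than against the Java `HttpHealthcareApiClient`. Everything in it is hypothetical: `call_with_optional_retry`, the `retry_enabled` flag, the attempt limit, the doubling backoff, and the stand-in `flaky_execute_bundle` request are illustrative assumptions, not existing FhirIO options. It treats `ConnectionError` as the retryable failure; a real client would more likely decide based on HTTP status codes (for example 429 or 5xx responses).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_optional_retry(
    request: Callable[[], T],
    retry_enabled: bool,
    max_attempts: int = 5,
    initial_backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `request`; if `retry_enabled`, retry failures with exponential backoff.

    With retry_enabled=False the first failure propagates immediately,
    which corresponds to a pipeline that did not opt in to retries.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempts = max_attempts if retry_enabled else 1
    backoff = initial_backoff_s
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(backoff)
            backoff *= 2  # double the wait after each failed attempt
    raise AssertionError("unreachable")


# Usage with a hypothetical stand-in request that fails twice, then succeeds.
calls = {"n": 0}


def flaky_execute_bundle() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "bundle executed"


print(call_with_optional_retry(flaky_execute_bundle, retry_enabled=True, sleep=lambda s: None))
```

Injecting `sleep` keeps the backoff schedule observable in tests without actually waiting.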
Re: Test
Hello, Could you invite me (mailing...@gmail.com) to the Beam Slack channel? Thanks, Sampat. On Thu, Feb 3, 2022 at 4:11 PM Sampat P wrote: > >
Flaky test issue report (48)
This is your daily summary of Beam's current flaky tests (https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake).

These are P1 issues because they have a major negative impact on the community and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-13811: Python postcommit failing examples tests (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13810: Flaky tests: Gradle build daemon disappeared unexpectedly (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13797: Flakes: Failed to load cache entry (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13783: apache_beam.transforms.combinefn_lifecycle_test.LocalCombineFnLifecycleTest.test_combine is flaky (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13741: :sdks:java:extensions:sql:hcatalog:compileJava failing in beam_Release_NightlySnapshot (created 2022-01-25)
https://issues.apache.org/jira/browse/BEAM-13708: flake: FlinkRunnerTest.testEnsureStdoutStdErrIsRestored (created 2022-01-20)
https://issues.apache.org/jira/browse/BEAM-13693: beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming timing out at 9 hours (created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13575: Flink testParDoRequiresStableInput flaky (created 2021-12-28)
https://issues.apache.org/jira/browse/BEAM-13525: Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests (created 2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13519: Java precommit flaky (timing out) (created 2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13500: NPE in Flink Portable ValidatesRunner streaming suite (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13453: Flake in org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use (created 2021-12-13)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13367: [beam_PostCommit_Python36] [ apache_beam.io.gcp.experimental.spannerio_read_it_test] Failure summary (created 2021-12-01)
https://issues.apache.org/jira/browse/BEAM-13312: org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle is flaky in Java Spark ValidatesRunner suite (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13311: org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful is flaky in Java ValidatesRunner Flink suite. (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13234: Flake in StreamingWordCountIT.test_streaming_wordcount_it (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13025: pubsublite.ReadWriteIT flaky in beam_PostCommit_Java_DataflowV2 (created 2021-10-08)
https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 - CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12859: org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12858: org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12809: testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12794: PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12793: beam_PostRelease_NightlySnapshot failed (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12673: apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey (created 2021-07-28)
https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: Failed to read inputs in the data plane (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12320: PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12291: org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: false] is flaky (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12200: SamzaStoreStateInternalsTest is flaky (created 2021-04-20)
https:
P1 issues report (72)
This is your daily summary of Beam's current P1 issues, not including flaky tests (https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake)). See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-13811: Python postcommit failing examples tests (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13809: beam_PostCommit_XVR_Flink flaky: Connection refused (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13805: Simplify version override for Dev versions of the Go SDK. (created 2022-02-02)
https://issues.apache.org/jira/browse/BEAM-13798: Upgrade Kubernetes Clusters (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13787: Job in Flink + K8S Cluster exits without executing the pipeline (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13781: grpc-netty-shaded version conflict (created 2022-01-31)
https://issues.apache.org/jira/browse/BEAM-13769: beam_PreCommit_Python_Cron failing on test_create_uses_coder_for_pickling (created 2022-01-28)
https://issues.apache.org/jira/browse/BEAM-13763: Rotate credentials for 'io-datastores' Kubernetes cluster (created 2022-01-28)
https://issues.apache.org/jira/browse/BEAM-13741: :sdks:java:extensions:sql:hcatalog:compileJava failing in beam_Release_NightlySnapshot (created 2022-01-25)
https://issues.apache.org/jira/browse/BEAM-13715: Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle (created 2022-01-21)
https://issues.apache.org/jira/browse/BEAM-13694: beam_PostCommit_Java_Hadoop_Versions failing with ClassDefNotFoundError (created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13693: beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming timing out at 9 hours (created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13686: OOM while logging a large pipeline even when logging level is higher (created 2022-01-19)
https://issues.apache.org/jira/browse/BEAM-13668: Java Spanner IO Request Count metrics broke backwards compatibility (created 2022-01-15)
https://issues.apache.org/jira/browse/BEAM-13615: Bumping up FnApi environment version to 9 in Java, Python SDK (created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13606: bigtable io doesn't handle non-ok row mutations (created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13582: Beam website precommit mentions broken links, but passes. (created 2021-12-30)
https://issues.apache.org/jira/browse/BEAM-13579: Cannot run python_xlang_kafka_taxi_dataflow validation script on 2.35.0 (created 2021-12-29)
https://issues.apache.org/jira/browse/BEAM-13487: WriteToBigQuery Dynamic table destinations returns wrong tableId (created 2021-12-17)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13376: Missing error for nonexistent column family BigTable (created 2021-12-03)
https://issues.apache.org/jira/browse/BEAM-13237: org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView flaky on Dataflow Runner V2 (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13203: Potential data loss when using SnsIO.writeAsync (created 2021-11-08)
https://issues.apache.org/jira/browse/BEAM-13164: Race between member variable being accessed due to leaking uninitialized state via OutboundObserverFactory (created 2021-11-01)
https://issues.apache.org/jira/browse/BEAM-13132: WriteToBigQuery submits a duplicate BQ load job if a 503 error code is returned from googleapi (created 2021-10-27)
https://issues.apache.org/jira/browse/BEAM-13087: apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible (created 2021-10-20)
https://issues.apache.org/jira/browse/BEAM-13078: Python DirectRunner does not emit data at GC time (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13076: Python AfterAny, AfterAll do not follow spec (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13010: Delete orphaned files (created 2021-10-06)
https://issues.apache.org/jira/browse/BEAM-12995: Consumer group with random prefix (created 2021-10-04)
https://issues.apache.org/jira/browse/BEAM-12959: Dataflow error in CombinePerKey operation (created 2021-09-26)
https://issues.apache.org/jira/browse/BEAM-12867: Either Create or DirectRunner fails to produce all elements to the following transform (created 2021-09-09)
https://issues.apache.org/jira/browse/BEA
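Both daily reports link to their JIRA filters as URL-encoded JQL, which is hard to read inline. As a rough sketch using only the Python standard library, the filters can be decoded into plain JQL. The two URLs are copied from the report intros; the helper name decode_jql is just illustrative and not part of any Beam tooling.

```python
from urllib.parse import parse_qs, urlsplit

# JIRA filter URLs copied verbatim from the two daily report intros.
FLAKY_URL = ("https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM"
             "%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake")
P1_URL = ("https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM"
          "%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1"
          "%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake)")


def decode_jql(url: str) -> str:
    """Return the readable JQL string held in a JIRA search URL's jql parameter."""
    # urlsplit isolates the part after '?'; parse_qs percent-decodes the values.
    params = parse_qs(urlsplit(url).query)
    return params["jql"][0]


if __name__ == "__main__":
    # Prints: project = BEAM AND statusCategory != Done AND labels = flake
    print(decode_jql(FLAKY_URL))
    # Prints: project = BEAM AND statusCategory != Done AND priority = P1 AND (labels is EMPTY OR labels != flake)
    print(decode_jql(P1_URL))
```

Decoded side by side, the P1 filter adds a priority = P1 clause plus (labels is EMPTY OR labels != flake), which is the part meant to keep flake-labeled issues out of the P1 list.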