Re: A lesson about DoFn retries

2022-09-01 Thread Brian Hulette via dev
Thanks for sharing the learnings Ahmed! > The solution lies in keeping the retry of each step separate. A good example of this is in how steps 2 and 3 are implemented [3]. They are separated into different DoFns and step 3 can start only after step 2 completes successfully. This way, any failure
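The idea of retrying each step separately can be sketched in plain Python. This is an analogy, not Beam code: `run_with_retries`, `step2`, `step3`, and the `calls` counter are hypothetical names, and the steps do placeholder work. It shows that a transient failure in step 3 retries only step 3 and never re-runs step 2.

```python
# Plain-Python analogy (no Beam APIs): each step has its own retry loop,
# so a failure in a later step never re-executes an earlier one.

calls = {"step2": 0, "step3": 0}  # hypothetical counters, for illustration


def run_with_retries(step, arg, max_attempts=3):
    """Run one step, retrying only that step on RuntimeError."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return step(arg)
        except RuntimeError as err:
            last_error = err
    raise last_error


def step2(job_id):
    # Placeholder for a step with side effects; it succeeds on the first try.
    calls["step2"] += 1
    return f"temp-{job_id}"


def step3(handle):
    # Placeholder for a later step that fails transiently twice.
    calls["step3"] += 1
    if calls["step3"] < 3:
        raise RuntimeError("transient failure")
    return f"final-{handle}"


# Step 3 starts only after step 2 has completed successfully.
handle = run_with_retries(step2, "job1")
result = run_with_retries(step3, handle)
```

If both steps lived in one retried unit, every step 3 failure would run step 2 again, repeating its side effects.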

Re: Cannot find beam in project list on jira when I create issue

2022-09-07 Thread Brian Hulette via dev
Thank you Moritz for updating the docs! On Wed, Sep 7, 2022 at 3:06 AM Moritz Mack wrote: > Sorry for the confusion. Beam migrated to using Github issues just > recently and the confluence docs haven’t been updated yet. > > > > Please create a new issue under

Re: Incomplete Beam Schema -> Avro Schema conversion

2022-08-22 Thread Brian Hulette via dev
I don't think there's a reason for this, it's just that these logical types were defined after the Avro <-> Beam schema conversion. I think it would be worthwhile to add support for them, but we'd also need to look at the reverse (avro to beam) direction, which would map back to the catch-all

Re: [DISCUSS] Dependency management in Apache Beam Python SDK

2022-08-25 Thread Brian Hulette via dev
Thanks for writing this up Valentyn! I'm curious Jarek, does Airflow take any dependencies on popular libraries like pandas, numpy, pyarrow, scipy, etc... which users are likely to have their own dependency on? I think these dependencies are challenging in a different way than the client

Re: Beam Website Feedback

2022-10-27 Thread Brian Hulette via dev
I proposed https://github.com/apache/beam/pull/23877 to address this. On Thu, Oct 27, 2022 at 2:12 PM Sachin Agarwal wrote: > No objections here. The latter (the surviving) is the one linked in > the top navigation bar and has the x-lang details that help. > > On Thu, Oct 27, 2022 at 2:09 PM

Re: Beam starter projects dependency updates

2022-10-27 Thread Brian Hulette via dev
Could we just use the same set of reviewers as pr-bot in the main repo [1]? I don't think that we could avoid duplicating the data though. [1] https://github.com/apache/beam/blob/728e8ecc8a40d3d578ada7773b77eca2b3c68d03/.github/REVIEWERS.yml On Thu, Oct 27, 2022 at 12:20 PM David Cavazos via dev

Re: Beam Website Feedback

2022-10-27 Thread Brian Hulette via dev
Hm, it seems like we need to drop https://beam.apache.org/documentation/io/built-in/ as it's been superseded by https://beam.apache.org/documentation/io/connectors/ Would there be any objections to that? On Thu, Oct 27, 2022 at 2:04 PM Sachin Agarwal via dev wrote: > JDBCIO is available as a

Re: Cartesian product of PCollections

2022-09-19 Thread Brian Hulette via dev
In SQL we just don't support cross joins currently [1]. I'm not aware of an existing implementation of a cross join/cartesian product. > My team has an internal implementation of a CartesianProduct transform, based on using hashing to split a pcollection into a finite number of groups and
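The quoted description is cut off, so the following is only a guess at the general shape of a hash-bucketed cartesian product, written in plain Python rather than as a Beam transform. The function names and the choice of `zlib.crc32` as the hash are assumptions made for illustration.

```python
# Plain-Python sketch (not a Beam transform): split both inputs into a
# fixed number of hash buckets, then emit the product of every bucket pair.
from collections import defaultdict
from itertools import product
import zlib


def bucket_of(element, num_buckets):
    # Deterministic hash; Python's built-in hash() is salted for strings.
    return zlib.crc32(repr(element).encode("utf-8")) % num_buckets


def split_into_buckets(elements, num_buckets):
    buckets = defaultdict(list)
    for element in elements:
        buckets[bucket_of(element, num_buckets)].append(element)
    return buckets


def cartesian_product(left, right, num_buckets=4):
    left_buckets = split_into_buckets(left, num_buckets)
    right_buckets = split_into_buckets(right, num_buckets)
    # Each (left bucket, right bucket) pair is an independent unit of work,
    # which is what would let a runner process the pairs in parallel.
    for left_group in left_buckets.values():
        for right_group in right_buckets.values():
            yield from product(left_group, right_group)


pairs = sorted(cartesian_product([1, 2, 3], ["a", "b"]))
```

Every element lands in exactly one bucket, so each (left, right) pair is produced exactly once, whatever the bucket count.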

Re: Out of band pickling in Python (pickle5)

2022-09-19 Thread Brian Hulette via dev
I got to thinking about this again and ran some benchmarks. The result is documented in the GitHub issue [1]. tl;dr: we can't realize a huge benefit since we don't actually have an out-of-band path for exchanging the buffers. However, pickle 5 can yield improved in-band performance as well, and I
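For readers unfamiliar with the distinction, here is a minimal standard-library sketch of in-band versus out-of-band pickling with protocol 5. It follows the pattern from the Python `pickle` documentation. The class name `ZeroCopyBytes` and the payload size are made up for illustration, and nothing here reflects how Beam itself serializes data.

```python
# Standard-library sketch of pickle protocol 5 buffers (Python 3.8+).
import pickle


class ZeroCopyBytes(bytearray):
    """bytearray subclass that exposes its memory as a PickleBuffer."""

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            return type(self)._reconstruct, (pickle.PickleBuffer(self),), None
        # Older protocols have no buffer support, so copy the data.
        return type(self)._reconstruct, (bytearray(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as view:
            original = view.obj
            if type(original) is cls:
                return original  # same memory, no copy was made
            return cls(original)


payload = ZeroCopyBytes(b"x" * 1_000_000)

# In-band: the buffer contents are written into the pickle stream.
in_band = pickle.dumps(payload, protocol=5)

# Out-of-band: the stream holds only a marker; the buffer is handed to the
# callback and must reach the consumer through some other channel.
buffers = []
out_of_band = pickle.dumps(payload, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(out_of_band, buffers=buffers)
```

The out-of-band stream is tiny because the payload travels separately, which is why a transport that can carry those buffers is needed to realize the benefit.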

Re: [Infrastructure] Periodically run Java microbenchmarks on Jenkins

2022-09-15 Thread Brian Hulette via dev
Is there somewhere we could document this? On Thu, Sep 15, 2022 at 6:45 AM Moritz Mack wrote: > Thank you, Andrew! > > Exactly what I was looking for, that’s awesome! > > > > On 15.09.22, 06:37, "Alexey Romanenko" wrote: > > > > > > Ahh, great! I didn’t know that 'beam-perf’ label is used for

Re: What to do about issues that track flaky tests?

2022-09-15 Thread Brian Hulette via dev
I agree with Austin on this one, it makes sense to be realistic, but I'm concerned about just blanket reducing the priority on all flakes. Two classes of issues that could certainly be dropped to P2: - Issues tracking flakes that have not been sickbayed yet (e.g.

Re: Beam Website Feedback

2022-10-04 Thread Brian Hulette via dev
On Tue, Oct 4, 2022 at 8:58 AM Alexey Romanenko wrote: > Thanks for your feedback. > > At the time, using a Google website search was a simplest solution since, > before, we didn’t have a search at all. I agree that it could be > frustrating to have ad links before the actual results (not sure

Re: Beam Website Feedback

2022-10-03 Thread Brian Hulette via dev
Thanks Borris, that is helpful feedback. I filed an issue [1] to track improving this. [1] https://github.com/apache/beam/issues/23472 On Mon, Oct 3, 2022 at 2:32 PM Borris wrote: > This is my experience of trying the search capability. > >- I know I want to read about dataframes (I was

Re: Join a meeting to help coordinate implementing a Dask Runner for Beam

2022-08-03 Thread Brian Hulette via dev
I wanted to share that Ryan gave a presentation about his (and Charles') work on Pangeo Forge at Scipy 2022 (in Austin just before Beam Summit!), with a couple mentions of their transition to Beam [1]. There were also a couple of other talks about Pangeo [2,3] with some Beam/xarray-beam references

Re: Easy Multi-language via a SchemaTransform-aware Expansion Service

2022-08-05 Thread Brian Hulette via dev
Thanks Cham! I really like the proposal, I left a few comments. I also had one higher-level point I wanted to elevate here: > Pipeline SDKs can generate user-friendly stub-APIs based on transforms registered with an expansion service, eliminating the need to develop language-specific wrappers.

Re: Design Doc for Controlling Batching in RunInference

2022-08-12 Thread Brian Hulette via dev
Hi Andy, Thanks for writing this up! This seems like something that Batched DoFns could help with. Could we make a BatchConverter [1] that represents the necessary transformations here, and define RunInference as a Batched DoFn? Note that the Numpy BatchConverter already enables users to specify

Re: Representation of logical type beam:logical_type:datetime:v1

2022-08-12 Thread Brian Hulette via dev
Ah sorry, I forgot that INT64 is encoded with VarIntCoder, so we can't simulate TimestampCoder with a logical type. I think the ideal end state would be to have a well-defined beam:logical_type:millis_instant that we use for cross-language (when appropriate), and never use DATETIME at
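To illustrate why a variable-length integer encoding cannot stand in for a fixed-width one, here is a small sketch comparing the two byte layouts. It is not Beam's coder code: `encode_varint` is a generic base-128 varint, and the fixed-width side is assumed to be a plain 8-byte big-endian integer purely for comparison.

```python
# Illustrative byte layouts only; these helpers are hypothetical, not Beam APIs.
import struct


def encode_varint(value):
    """Base-128 varint: 7 payload bits per byte, least significant group first."""
    if value < 0:
        value += 1 << 64  # treat as unsigned 64-bit, so negatives take 10 bytes
    out = bytearray()
    while True:
        bits = value & 0x7F
        value >>= 7
        if value:
            out.append(bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(bits)
            return bytes(out)


def encode_fixed_int64(value):
    """Fixed-width 8-byte big-endian signed integer."""
    return struct.pack(">q", value)


millis = 1_660_262_400_000  # 2022-08-12T00:00:00Z as epoch milliseconds

varint_bytes = encode_varint(millis)
fixed_bytes = encode_fixed_int64(millis)
```

The same value yields byte strings of different lengths and contents, so a consumer expecting one layout cannot decode the other.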

[Python][Bikeshed] typehint vs. type-hint vs. "type hint"

2022-11-07 Thread Brian Hulette via dev
Hi everyone, In a recent code review we noticed that we are not consistent when describing python type hints in documentation. Depending on who wrote the patch, we switch between typehint, type-hint, and "type hint" [1]. I think we should standardize on "type hint" as this is what Guido used in

Re: Beam starter projects dependency updates

2022-11-07 Thread Brian Hulette via dev
These have all been addressed. I went through and merged all of them, except for the slf4j-jdk14 dependency in Java and Kotlin. After consulting with Luke [1] I told dependabot to ignore this dependency. [1] https://github.com/apache/beam-starter-java/pull/26#issuecomment-1302639941

Re: [ANNOUNCE] New committer: Yi Hu

2022-11-09 Thread Brian Hulette via dev
Well deserved! Congratulations Yi On Wed, Nov 9, 2022 at 11:25 AM Valentyn Tymofieiev via dev < dev@beam.apache.org> wrote: > I am with the Beam PMC on this, congratulations and very well deserved, Yi! > > On Wed, Nov 9, 2022 at 11:08 AM Byron Ellis via dev > wrote: > >> Congratulations! >> >>