Re: [Dataflow][Java] Guidance on Transform Mapping Streaming Update

2022-07-08 Thread Luke Cwik via dev
I was suggesting GCP support mainly because I don't think you want to share the 2.36 and 2.40 versions of your job file publicly, since someone familiar with the layout and format may spot a meaningful difference. Also, if it turns out that there is no meaningful difference between the two then the

Re: [RFC] Gather JMH performance metrics in Beam community-metrics

2022-07-12 Thread Luke Cwik via dev
This sounds great. Since every language has a benchmarking tool, we can start with JMH and expand from there. A key point is that we will want to dedicate a Jenkins machine exclusively to this while the microbenchmarks are running; otherwise we will have other competing Jenkins jobs using up CPU

Re: [Dataflow][Java] Guidance on Transform Mapping Streaming Update

2022-07-06 Thread Luke Cwik via dev
Does doing a pipeline update within 2.36 work, or do you want to do an update to get to the latest version? Feel free to share the job files with GCP support. It could be something internal, but the coders for the ephemeral steps that Dataflow adds are based upon existing coders within the graph. On Tue, Jul

[ANNOUNCE] New committer: Steven Niemitz

2022-07-19 Thread Luke Cwik via dev
Hi all, please join me and the rest of the Beam PMC in welcoming a new committer: Steven Niemitz (sniemitz@). Steven started contributing to Beam in 2017, fixing bugs and improving logging and usability. Steven's most recent focus has been on performance optimizations within the Java SDK.

Re: Fun with WebAssembly transforms

2022-07-14 Thread Luke Cwik via dev
>> to try out the design options. I think we can simplify the problem by insisting that they are pure functions that do not access state or side inputs. >> On Wed, Jul 13, 2022 at 7:52 PM Luke Cwik via dev wrote: >>> I think a

Re: Fun with WebAssembly transforms

2022-07-13 Thread Luke Cwik via dev
First we'll want to choose whether to target Wasm, WASI, or Wagi. WASI adds a lot of simple things, like access to a clock, a random number generator, ..., that would expand the scope of what transpiled code can do. It is debatable whether we'll want the power to run the transpiled code as a

Re: Fun with WebAssembly transforms

2022-07-13 Thread Luke Cwik via dev
I think an easier target would be to support things like DynamicDestinations for Java IO connectors that are exposed as XLang for Go/Python. This is because Go/Python have good transpiling support to WebAssembly and we have already exposed several Java IO XLang connectors, so it's about plumbing

Re: [JmsIO] => Pull Request to fix message acknowledgement issue

2022-09-01 Thread Luke Cwik via dev
I have a better understanding of the problem after reviewing the doc, and we need to decide what lifecycle scope we want the `Connection`, `Session`, and `MessageConsumer` to have. It looks like for the `Connection` we should try to have at most one instance for the entire process per

Re: Upcoming potentially breaking change to CoGroupByKey

2022-09-06 Thread Luke Cwik via dev
We should send this out to us...@beam.apache.org so that users are aware of this change once commenting in the doc has settled. On Tue, Sep 6, 2022 at 1:59 PM Robert Burke wrote: > Thank you for already planning to *NOT* have this merged until after this week's 2.42.0 cut. This Release Manager

Re: [JmsIO] => Pull Request to fix message acknowledgement issue

2022-09-08 Thread Luke Cwik via dev
> n is active in “advance” in order to receive message. > Are we sure that all checkpoints are finalized when the reader is closed? > 1. Session scoped to the reader start/close > It seems to be more or less the case currently. > Regards

Re: [VOTE] Release 2.42.0, release candidate #1

2022-10-13 Thread Luke Cwik via dev
Thanks, I missed that when I was reviewing the issue. On Tue, Oct 11, 2022 at 5:01 PM Robert Burke wrote: > That merge commit doesn't appear in the 2.42.0 release branch, so I've moved that issue to the 2.43.0 release milestone. > On Tue, Oct 11, 2022, 4:07 PM Luke Cwik via

Re: [VOTE] Release 2.42.0, release candidate #1

2022-10-11 Thread Luke Cwik via dev
I would like to point out that I found another regression due to the bigdataoss library upgrade from 2.2.6 to 2.2.8 (https://github.com/apache/beam/pull/23300) and filed https://github.com/apache/beam/issues/23588. On Mon, Oct 10, 2022 at 1:17 PM Robert Burke wrote: > Due to a process error on my

Vendored gRPC update

2022-08-04 Thread Luke Cwik via dev
I was looking to update the gRPC that we use to the latest version (1.48.1) to move off of a vulnerable version of Netty that a user pointed out in BEAM-14118. This would supersede the work done in https://github.com/apache/beam/pull/17206, as that PR has stalled. If there aren't any concerns I'll

Re: [VOTE] Vendored Dependencies Release

2022-08-05 Thread Luke Cwik via dev
+1. I verified the signatures of the artifacts, verified that the jar doesn't contain classes outside of the org/apache/beam/vendor/grpc/v1p48p1 package, and tested the artifact against our precommits using https://github.com/apache/beam/pull/22595. On Fri, Aug 5, 2022 at 1:42 PM Luke Cwik wrote: >
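The "no classes outside the vendored package" check can be scripted. Below is a minimal Python sketch, not the tool actually used for this vote: it lists every `.class` entry in a jar and reports the ones that are not under the vendored package prefix. The function name and the handling of multi-release `META-INF/versions/<N>/` entries are assumptions.

```python
import re
import sys
import zipfile

# Package prefix from the vote thread: every vendored class should live under it.
VENDOR_PREFIX = "org/apache/beam/vendor/grpc/v1p48p1/"

# Assumption: multi-release jars may keep per-JDK copies under META-INF/versions/<N>/,
# so that prefix is stripped before checking the package.
MULTI_RELEASE = re.compile(r"^META-INF/versions/\d+/")


def classes_outside_prefix(jar_path, prefix=VENDOR_PREFIX):
    """Return the .class entries in the jar that are not under `prefix`."""
    offenders = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue  # skip manifests, signatures, resources, etc.
            path = MULTI_RELEASE.sub("", name)
            if not path.startswith(prefix):
                offenders.append(name)
    return offenders


if __name__ == "__main__":
    bad = classes_outside_prefix(sys.argv[1])
    for entry in bad:
        print("outside vendored package:", entry)
    sys.exit(1 if bad else 0)
```

Run it as `python check_vendored_jar.py beam-vendor-grpc-1_48_1-0.1.jar` (both the script name and the jar file name are hypothetical). A non-zero exit status means at least one class escaped relocation.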

[VOTE] Vendored Dependencies Release

2022-08-05 Thread Luke Cwik via dev
Please review the release of the following artifacts that we vendor:
* beam-vendor-grpc-1_48_1
Hi everyone,
Please review and vote on release candidate #1 for version 0.1, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)
The

Re: Beam gRPC dependency tracing

2022-08-08 Thread Luke Cwik via dev
I think you missed Kenn's earlier reply: https://lists.apache.org/thread/v0nr6mv0rqhd76ox1bwt6qwo4q3g7w58 The vendored gRPC is built by transforming the released gRPC jar. Here is where in the Beam git history you can find the source for the transformation:

Re: Beam Website Feedback

2022-08-08 Thread Luke Cwik via dev
Thanks. On Mon, Aug 8, 2022 at 8:12 AM Peter Simon wrote: > Awesome web UI

[RESULT] [VOTE] Vendored Dependencies Release

2022-08-08 Thread Luke Cwik via dev
Thanks! > -P. > On Mon, Aug 8, 2022 at 9:24 AM Chamikara Jayalath via dev <dev@beam.apache.org> wrote: >> +1 >> Thanks, >> Cham >> On Fri, Aug 5, 2022 at 1:49 PM Luke Cwik via dev wrote: >>> +1

Re: BigTable reader for Python?

2023-01-05 Thread Luke Cwik via dev
By default, Beam Java only uploads artifacts that have changed, but this does not appear to be the case for Beam Python, where you need to explicitly opt in with the --enable_artifact_caching flag[1]. It looks like this feature was added a year ago[2]; should we turn it on by default? 1:
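The idea of "only upload artifacts that have changed" can be illustrated by hashing each artifact's contents and skipping the upload when that hash is already staged. This is a minimal Python sketch under that assumption, not how Beam Java or the --enable_artifact_caching flag are implemented. `HypotheticalArtifactStager`, its `upload` callback, and the choice of SHA-256 are all illustrative.

```python
import hashlib
from typing import Callable, Iterable


def sha256_of(data: bytes) -> str:
    """Content hash used to decide whether an artifact changed."""
    return hashlib.sha256(data).hexdigest()


class HypotheticalArtifactStager:
    """Uploads an artifact only when its content hash has not been staged yet.

    `upload` and `already_staged` stand in for a real staging location;
    neither is a Beam API.
    """

    def __init__(self, upload: Callable[[str, bytes], None],
                 already_staged: Iterable[str] = ()):
        self._upload = upload
        self._staged = set(already_staged)  # hex digests known to be staged

    def stage(self, name: str, data: bytes) -> bool:
        """Return True if the artifact was uploaded, False if it was skipped."""
        digest = sha256_of(data)
        if digest in self._staged:
            return False  # same bytes already staged: skip the upload
        self._upload(name, data)
        self._staged.add(digest)
        return True
```

Keying on content rather than file name means a rebuilt but byte-identical jar is not re-uploaded, while any change to its bytes triggers an upload.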

Re: BigTable reader for Python?

2023-01-06 Thread Luke Cwik via dev
The proto (java) -> bytes -> proto (python) approach sounds good. Have you tried moving your DoFn out of your main module into a new module as per [1]? Another suggestion is to do the import inside the function. Can you do the import once in the setup()[2] function? Have you considered using the cloud

Re: BigTable reader for Python?

2023-01-03 Thread Luke Cwik via dev
I would suggest using BigtableIO, which also returns a protobuf com.google.bigtable.v2.Row. This should allow you to replicate what SpannerIO is doing. Alternatively, you could provide a way to convert the HBase result into a Beam row by specifying a converter and a schema for it, and then you could

Re: Beam Java SDK - ReadableState.read() shouldn't it be Nullable?

2023-01-03 Thread Luke Cwik via dev
It looks like there is an existing issue[1]. I added our correspondence there, and we should continue the discussion there. 1: https://github.com/apache/beam/issues/24801 On Tue, Jan 3, 2023 at 1:22 PM Reuven Lax wrote: > Ah, that is fair. However right now that doesn't happen either. >

Re: Beam Java SDK - ReadableState.read() shouldn't it be Nullable?

2023-01-03 Thread Luke Cwik via dev
I think that in general ReadableState.read() should not be @Nullable, but we should allow overrides like ValueState to specify that T can be @Nullable, while for others like ListState we should have List<@Nullable T>. On Tue, Jan 3, 2023 at 12:37 PM Reuven Lax via dev wrote: > It should be

Re: BigTable reader for Python?

2022-12-29 Thread Luke Cwik via dev
I would have expected a META-INF/services/org.apache.beam.sdk.expansion.ExternalTransformRegistrar file in the jar containing the fully qualified class name of BigtableRegistrar. See
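A services file like the one above is what AutoService generates and what `java.util.ServiceLoader` reads. A quick way to check whether it made it into a jar is sketched below in Python. `registered_providers` is a hypothetical helper. It follows the ServiceLoader provider-configuration format: UTF-8 text, one fully qualified class name per line, `#` starts a comment, and surrounding whitespace is ignored.

```python
import zipfile

SERVICE_FILE = "META-INF/services/org.apache.beam.sdk.expansion.ExternalTransformRegistrar"


def registered_providers(jar_path, service_file=SERVICE_FILE):
    """List provider class names from a ServiceLoader configuration file in a jar.

    Returns an empty list if the jar does not contain the file at all.
    """
    with zipfile.ZipFile(jar_path) as jar:
        if service_file not in jar.namelist():
            return []
        text = jar.read(service_file).decode("utf-8")
    providers = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if name:
            providers.append(name)
    return providers
```

Calling `registered_providers("my-expansion-service.jar")` (a hypothetical jar name) should list the registrar class. An empty list often means the annotation processor never ran during compilation.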

Re: BigTable reader for Python?

2022-12-29 Thread Luke Cwik via dev
AutoService relies on the Java compiler's annotation processing. https://github.com/google/auto/tree/main/service#getting-started shows that you need to configure the Java compiler to use the annotation processor within AutoService. I saw this public gist that seemed to enable using the AutoService

Re: Gradle Task Configuration Avoidance

2022-12-08 Thread Luke Cwik via dev
I have found the Gradle build reports very useful for enumerating deprecations, and easier to look at than the command-line output. On Thu, Dec 8, 2022 at 8:26 AM Damon Douglas via dev wrote: > Thank you, Kerry, for your kind and encouraging words! > Kenn, I wondered as well whether

Re: Gradle Task Configuration Avoidance

2022-12-09 Thread Luke Cwik via dev
> checks. > Best, > Damon > On Thu, Dec 8, 2022 at 8:59 AM Daniel Collins wrote: >> We could probably add a lint that rejects the spelling `task("` pretty easily that would catch most of these. >> On Thu, Dec 8, 2022 at 11:34 A
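A lint along the lines Daniel suggests could be a small regex scan. This Python sketch flags lines in `*.gradle` files that spell eager task creation as `task("`. It deliberately matches only that spelling, not `task foo {` or single-quoted names. The file glob and message text are assumptions. Gradle's configuration-avoidance replacement for eager creation is `tasks.register(...)`.

```python
import re
import sys
from pathlib import Path

# Matches eager task creation spelled task("name"), including project.task("name").
# The word boundary keeps tasks.register("...") and e.g. mytask(" from matching.
EAGER_TASK = re.compile(r'\btask\(\s*"')


def find_eager_tasks(text):
    """Return 1-based line numbers that contain the `task("` spelling."""
    return [i for i, line in enumerate(text.splitlines(), start=1)
            if EAGER_TASK.search(line)]


def main(root="."):
    failures = 0
    for path in Path(root).rglob("*.gradle"):  # assumption: Groovy build scripts only
        for lineno in find_eager_tasks(path.read_text(encoding="utf-8")):
            print(f'{path}:{lineno}: use tasks.register(...) instead of task("...")')
            failures += 1
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```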

Re: @RequiresStableInput and Pipeline fusion

2022-12-13 Thread Luke Cwik via dev
This is definitely not working for portable pipelines, since the GreedyPipelineFuser doesn't create a fusion boundary, which, as you pointed out, results in a single stage that has a non-deterministic function followed by one that requires stable input. It seems as though we should have runners check the
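To make the missing fusion boundary concrete, here is a toy Python model of greedy fusion over a linear chain of steps. It is not the GreedyPipelineFuser algorithm, which works over a graph of PCollections and environments. `Step` and `fuse_linear_chain` are hypothetical. The only rule modeled is that a step requiring stable input starts a new stage, so its input is materialized rather than recomputed together with an upstream non-deterministic step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    requires_stable_input: bool = False


def fuse_linear_chain(steps: List[Step]) -> List[List[str]]:
    """Greedily fuse a linear chain of steps into stages.

    Every step joins the current stage, except a step that requires stable
    input, which always starts a new stage (a fusion boundary).
    """
    stages: List[List[str]] = []
    for step in steps:
        if not stages or step.requires_stable_input:
            stages.append([step.name])
        else:
            stages[-1].append(step.name)
    return stages
```

For the chain `Read -> AssignRandomId -> WriteOnce (requires stable input) -> Log`, this yields the stages `[["Read", "AssignRandomId"], ["WriteOnce", "Log"]]`. Without the boundary, a retry of one fused stage would rerun `AssignRandomId` and could hand `WriteOnce` different input.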

Re: [DISCUSSION][JAVA] Current state of Java 17 support

2022-12-01 Thread Luke Cwik via dev
We do support JDK 8, JDK 11, and JDK 17, though our story around newer features in JDK 9+, like modules, is mostly non-existent. We rarely run into JDK-specific issues; the latest were the TLS 1.0 and TLS 1.1 deprecation in newer patch versions of the JDK and the Docker CPU share issues with

Re: Thoughts on extensions/datasketches vs adding to the existing sketching library?

2023-01-18 Thread Luke Cwik via dev
I would suggest adding it to the existing package(s) (either sdks/java/extensions or sdks/java/zetasketch, or both, depending on whether you're replacing existing sketches or adding new ones), since we shouldn't expose the sketching libraries' API surface. We should make the API take all the relevant

Re: [Proposal] Beam MultimapState API

2022-10-31 Thread Luke Cwik via dev
Thanks, I took a look and left some comments. On Mon, Oct 31, 2022 at 12:47 PM Ahmet Altay wrote: > Thank you for the message Buqian. Adding @Reuven Lax @Lukasz Cwik explicitly (who are mentioned on the doc). > On Mon, Oct 31, 2022 at 12:17 PM 郑卜千 wrote: >> Gentle ping. Thanks!

Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-16 Thread Luke Cwik via dev
> On Mon, Feb 13, 2023 at 5:17 AM Bruno Volpato via dev <dev@beam.apache.org> wrote: >> +1 (non-binding) >> Tested with https://github.com/GoogleC

Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-16 Thread Luke Cwik via dev
> On 13 Feb 2023, at 17:54, Ahmet Altay via dev wrote: > +1 (binding) - I validated python quick starts on direct runner and python streaming quickstart o

Re: OpenJDK8 / OpenJDK11 container deprecation

2023-02-16 Thread Luke Cwik via dev
I upgraded the Docker version on the Jenkins workers and the tests passed (I also installed Python 3.11 so we are ready for that). On Tue, Feb 14, 2023 at 3:21 PM Kenneth Knowles wrote: > SGTM. I asked on the PR if this could impact users, but having read the docker release calendar I am not

Re: [ANNOUNCE] New PMC Member: Jan Lukavský

2023-02-16 Thread Luke Cwik via dev
Congrats, well deserved. On Thu, Feb 16, 2023 at 10:32 AM Anand Inguva via dev wrote: > Congratulations!! > On Thu, Feb 16, 2023 at 12:42 PM Chamikara Jayalath via dev <dev@beam.apache.org> wrote: >> Congrats Jan! >> On Thu, Feb 16, 2023 at 8:35 AM John Casey via dev wrote: >>>

Re: A user-deployable Beam Transform Service

2023-02-10 Thread Luke Cwik via dev
This seems like a useful thing to me and will make things easier for Beam users overall. On Fri, Feb 10, 2023 at 3:56 PM Robert Bradshaw via dev wrote: > Thanks. I added some comments to the doc. > On Mon, Feb 6, 2023 at 1:33 PM Chamikara Jayalath via dev wrote: >> Hi All, >> Beam

Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-10 Thread Luke Cwik via dev
+1. I validated the release artifact signatures and verified the Java Flink and Spark quickstarts. On Fri, Feb 10, 2023 at 9:27 AM John Casey via dev wrote: > Addendum to above email. > Java artifacts were built with Gradle 7.5.1 and OpenJDK 1.8.0_362 > On Fri, Feb 10, 2023 at 11:14 AM John Casey

Re: OpenJDK8 / OpenJDK11 container deprecation

2023-02-09 Thread Luke Cwik via dev
Our current Java 8 container is 262 MiB and layers on top of openjdk:8-bullseye, which is 226 MiB compressed, while eclipse-temurin:8 is 92 MiB compressed and eclipse-temurin:8-alpine is 65 MiB compressed. I would rather not get into issues with C library differences caused by the

Re: Portable v.s. non-portable PTransform names

2023-01-31 Thread Luke Cwik via dev
The PCollection value comes from the key on the pipeline proto[1]. That key is populated at pipeline construction time[2] and is based upon the unique name of the PTransform plus the name of the output being used (i.e., the tag, with .output being the default). It looks like the counter PTRANSFORM is
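Here is a toy Python illustration of the naming rule described above. It assumes the key is literally the PTransform's unique name and the output tag joined with `.`, which is what the `.output` default suggests. The helper is hypothetical, and real SDKs may further uniquify or rewrite these ids.

```python
from typing import Dict, Iterable


def pcollection_keys(transform_unique_name: str,
                     output_tags: Iterable[str] = ("output",)) -> Dict[str, str]:
    """Map each output tag of a PTransform to a hypothetical PCollection key.

    Assumes key = "<unique transform name>.<output tag>", so a single-output
    transform gets the ".output" suffix.
    """
    return {tag: f"{transform_unique_name}.{tag}" for tag in output_tags}
```

For example, a nested transform with the hypothetical unique name `Count/Combine` would produce the key `Count/Combine.output` for its main output.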

OpenJDK8 / OpenJDK11 container deprecation

2023-02-07 Thread Luke Cwik via dev
As per [1], the JDK 8 and JDK 11 containers that Apache Beam uses have not been built or supported since July 2022. I have filed [2] to track the resolution of this issue. According to [1], almost everyone is switching to the eclipse-temurin container[3] as their base, based upon the linked

Re: OpenJDK8 / OpenJDK11 container deprecation

2023-02-14 Thread Luke Cwik via dev
I made some progress in testing the container and did hit an issue where Ubuntu 22.04 "Jammy" depends on the version of Docker installed. It turns out that our boot.go crashes with "runtime/cgo: pthread_create failed: Operation not permitted" because Ubuntu 22.04 uses new syscalls