Re: OpenJDK8 / OpenJDK11 container deprecation

2023-02-16 Thread Luke Cwik via dev
r I am not concerned. The last update to the old > version was in 2019, and the introduction of compatible versions was 2020. > > On Tue, Feb 14, 2023 at 3:01 PM Byron Ellis via user > wrote: > >> FWIW I am Team Upgrade Docker :-) >> >> On Tue, Feb 14, 2023 at 2:53 PM Luke

Re: [ANNOUNCE] New PMC Member: Jan Lukavský

2023-02-16 Thread Luke Cwik via dev
Congrats, well deserved. On Thu, Feb 16, 2023 at 10:32 AM Anand Inguva via dev wrote: > Congratulations!! > > On Thu, Feb 16, 2023 at 12:42 PM Chamikara Jayalath via dev < > dev@beam.apache.org> wrote: > >> Congrats Jan! >> >> On Thu, Feb 16, 2023 at 8:35 AM John Casey via dev >> wrote: >> >>>

Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-16 Thread Luke Cwik via dev
All the PMC finalization tasks have been completed. On Thu, Feb 16, 2023 at 8:56 AM Luke Cwik wrote: > I'll help out. > > On Thu, Feb 16, 2023 at 7:08 AM John Casey via dev > wrote: > >> Can a PMC member help me with PMC only release finalization? >> https://beam.ap

Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-16 Thread Luke Cwik via dev
M John Casey > wrote: > >> With 7 approving votes, of which 5 are binding, 2.45.0 RC1 has been >> approved. >> >> Binding votes are: >> >> Luke Cwik >> Chamikara Jayalath >> Ahmet Altay >> Alexey Romanenko >> Robert Bradshaw >> >>

Re: OpenJDK8 / OpenJDK11 container deprecation

2023-02-14 Thread Luke Cwik via dev
ideas? On Thu, Feb 9, 2023 at 10:20 AM Luke Cwik wrote: > Our current container java 8 container is 262 MiBs and layers on top of > openjdk:8-bullseye which is 226 MiBs compressed while eclipse-temurin:8 is > 92 MiBs compressed and eclipse-temurin:8-alpine is 65 MiBs compressed. > &

Re: A user-deployable Beam Transform Service

2023-02-10 Thread Luke Cwik via dev
Seems like a useful thing to me and will make it easier for Beam users overall. On Fri, Feb 10, 2023 at 3:56 PM Robert Bradshaw via dev wrote: > Thanks. I added some comments to the doc. > > On Mon, Feb 6, 2023 at 1:33 PM Chamikara Jayalath via dev > wrote: > > > > Hi All, > > > > Beam

Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-10 Thread Luke Cwik via dev
+1 Validated release artifact signatures and verified the Java Flink and Spark quickstarts. On Fri, Feb 10, 2023 at 9:27 AM John Casey via dev wrote: > Addendum to above email. > > Java artifacts were built with Gradle 7.5.1 and OpenJDK 1.8.0_362 > > On Fri, Feb 10, 2023 at 11:14 AM John Casey

Re: OpenJDK8 / OpenJDK11 container deprecation

2023-02-09 Thread Luke Cwik via dev
; On Tue, Feb 7, 2023 at 5:18 PM Robert Bradshaw via dev < > dev@beam.apache.org> wrote: > >> Seams reasonable to me. >> >> On Tue, Feb 7, 2023 at 4:19 PM Luke Cwik via user >> wrote: >> > >> > As per [1], the JDK8 and JDK11 containers tha

OpenJDK8 / OpenJDK11 container deprecation

2023-02-07 Thread Luke Cwik via dev
As per [1], the JDK8 and JDK11 containers that Apache Beam uses have stopped being built and supported since July 2022. I have filed [2] to track the resolution of this issue. Based upon [1], almost everyone is swapping to the eclipse-temurin container[3] as their base based upon the linked

Re: Portable v.s. non-portable PTransform names

2023-01-31 Thread Luke Cwik via dev
The PCollection value comes from the key on the pipeline proto[1]. That key is populated during pipeline construction time[2] and is based upon the unique name of the PTransform + the name of the output being used (aka tag with .output being a default). It looks like the counter PTRANFORM is

Re: Thoughts on extensions/datasketches vs adding to the existing sketching library?

2023-01-18 Thread Luke Cwik via dev
I would suggest adding it to the existing package(s) (either sdks/java/extensions or sdks/java/zetasketch or both depending on if you're replacing existing sketches or adding new ones) since we shouldn't expose sketching libraries API surface. We should make the API take all the relevant

Re: BigTable reader for Python?

2023-01-06 Thread Luke Cwik via dev
apiclient:Completed GCS >>>> upload to >>>> gs://hce-mimo-inbox/beam_temp/beamapp-builder-0105191153-992959-3fhktuyb.1672945913.993243/java_bigtable_deploy-Ed1r7YOeLKLTmg2RGNktkym9sVYciCiielpk61r6CJ4.jar >>>> in 295 seconds. >>>> I have a total

Re: BigTable reader for Python?

2023-01-05 Thread Luke Cwik via dev
t about my own jar - it's not bound to change very often, so would > it be possible to upload somewhere and then fetch it from there? > > Thanks! > -Lina > > On Tue, Jan 3, 2023 at 1:23 PM Luke Cwik wrote: > >> I would suggest using BigtableIO which also returns a >

Re: Beam Java SDK - ReadableState.read() shouldn't it be Nullable?

2023-01-03 Thread Luke Cwik via dev
her. > > On Tue, Jan 3, 2023 at 12:59 PM Luke Cwik wrote: > >> I think in general ReadableState.read() should not be @Nullable but we >> should allow for the overrides like ValueState to specify that T can >> be @Nullable while others like ListState we should have List<

Re: BigTable reader for Python?

2023-01-03 Thread Luke Cwik via dev
t;> ], >> >> ) >> >> >> java_binary( >> >> name = "java_hbase", >> >> main_class = "energy.camus.beam.BigtableRegistrar", >> >> plugins = [":auto_service_processor"], >> >> srcs = ["src/main/ja

Re: Beam Java SDK - ReadableState.read() shouldn't it be Nullable?

2023-01-03 Thread Luke Cwik via dev
I think in general ReadableState.read() should not be @Nullable but we should allow for the overrides like ValueState to specify that T can be @Nullable while others like ListState we should have List<@Nullable T>. On Tue, Jan 3, 2023 at 12:37 PM Reuven Lax via dev wrote: > It should be

Re: BigTable reader for Python?

2022-12-29 Thread Luke Cwik via dev
AutoService relies on Java's compiler annotation processor. https://github.com/google/auto/tree/main/service#getting-started shows that you need to configure Java's compiler to use the annotation processors within AutoService. I saw this public gist that seemed to enable using the AutoService

Re: BigTable reader for Python?

2022-12-29 Thread Luke Cwik via dev
I would have expected a META-INF/services/org.apache.beam.sdk.expansion.ExternalTransformRegistrar file in the jar containing the fully qualified class name of BigtableRegistrar in it. See

Re: @RequiresStableInput and Pipeline fusion

2022-12-13 Thread Luke Cwik via dev
This is definitely not working for portable pipelines since the GreedyPipelineFuser doesn't create a fusion boundary which as you pointed out causes a single stage that has a non-deterministic function followed by one that requires stable input. It seems as though we should have runners check the

Re: Gradle Task Configuration Avoidance

2022-12-09 Thread Luke Cwik via dev
checks. > > Best, > > Damon > > On Thu, Dec 8, 2022 at 8:59 AM Daniel Collins > wrote: > >> We could probably add a lint that rejects the spelling `task("` pretty >> easily that would catch most of these. >> >> On Thu, Dec 8, 2022 at 11:34 A

Re: Gradle Task Configuration Avoidance

2022-12-08 Thread Luke Cwik via dev
I have found the Gradle build reports very useful to enumerate deprecations and an easier thing to look at over the command line output. On Thu, Dec 8, 2022 at 8:26 AM Damon Douglas via dev wrote: > Thank you, Kerry, for your kind and encouraging words! > > Kenn, I wondered as well whether

Re: [DISCUSSION][JAVA] Current state of Java 17 support

2022-12-01 Thread Luke Cwik via dev
We do support JDK8, JDK11 and JDK17. Our story around newer features within JDKs 9+ like modules is mostly non-existent though. We rarely run into JDK specific issues, the latest were the TLS1 and TLS1.1 deprecation in newer patch versions of the JDK and also the docker cpu share issues with

Re: [Proposal] Beam MultimapState API

2022-10-31 Thread Luke Cwik via dev
Thanks, I took a look and left some comments. On Mon, Oct 31, 2022 at 12:47 PM Ahmet Altay wrote: > Thank you for the message Buqian. Adding @Reuven Lax > @Lukasz > Cwik explicitly (who are mentioned on the doc). > > On Mon, Oct 31, 2022 at 12:17 PM 郑卜千 wrote: > >> Gentle ping. Thanks! >>

Re: [VOTE] Release 2.42.0, release candidate #1

2022-10-13 Thread Luke Cwik via dev
Thanks, I missed that when I was reviewing the issue. On Tue, Oct 11, 2022 at 5:01 PM Robert Burke wrote: > That merge commit doesn't appear in the 2.42.0 release branch, so I've > moved that issue to the 2.43.0 release milestone. > > On Tue, Oct 11, 2022, 4:07 PM Luke Cwik via

Re: [VOTE] Release 2.42.0, release candidate #1

2022-10-11 Thread Luke Cwik via dev
I would like to point out that I found another regression due to the bigdataoss library upgrade from 2.2.6 to 2.2.8 ( https://github.com/apache/beam/pull/23300), filed https://github.com/apache/beam/issues/23588. On Mon, Oct 10, 2022 at 1:17 PM Robert Burke wrote: > Due to a process error on my

Re: [JmsIO] => Pull Request to fix message acknowledgement issue

2022-09-08 Thread Luke Cwik via dev
n is active in “advance” in order to receive > message. > > Are we sure that all checkpoints are finalized when the reader is closed? > > > >1. Session scoped to the reader start/close > > It seems to be more or less the case currently. > > > > Regards &

Re: Upcoming potentially breaking change to CoGroupByKey

2022-09-06 Thread Luke Cwik via dev
We should send this out to us...@beam.apache.org so that they are aware of this change once commenting in the doc has settled. On Tue, Sep 6, 2022 at 1:59 PM Robert Burke wrote: > Thank you for already planning to *NOT* have this merged until after this > week's 2.42.0 cut. This Release Manager

Re: [JmsIO] => Pull Request to fix message acknowledgement issue

2022-09-01 Thread Luke Cwik via dev
I have a better understanding of the problem after reviewing the doc and we need to decide on what lifecycle scope we want the `Connection`, `Session`, and `MessageConsumer` to have. It looks like for the `Connection` we should try to have at most one instance for the entire process per

[RESULT] [VOTE] Vendored Dependencies Release

2022-08-08 Thread Luke Cwik via dev
I'm happy to announce that we have unanimously approved this release. There are 3 approving votes, 3 of which are binding: * Luke Cwik * Pablo Estrada * Chamikara Jayalath There are no disapproving votes. Thanks everyone! On Mon, Aug 8, 2022 at 9:47 AM Pablo Estrada wrote: > +1 >

Re: Beam Website Feedback

2022-08-08 Thread Luke Cwik via dev
Thanks. On Mon, Aug 8, 2022 at 8:12 AM Peter Simon wrote: > Awesome web UI > > Peter Simon > > *Data Scientist* > > > > e peter.si...@fanatical.com > > w fanatical.com > > Focus Multimedia Limited. > > The Studios, Lea Hall Enterprise Park, > > Wheelhouse Road, Brereton, Rugeley, > >

Re: Beam gRPC depedency tracing

2022-08-08 Thread Luke Cwik via dev
I think you missed Kenn's earlier reply: https://lists.apache.org/thread/v0nr6mv0rqhd76ox1bwt6qwo4q3g7w58 The vendored gRPC is built by transforming the released gRPC jar. Here is where in the Beam git history you can find the source for the transformation:

Re: [VOTE] Vendored Dependencies Release

2022-08-05 Thread Luke Cwik via dev
+1 I verified the signatures of the artifacts, that the jar doesn't contain classes outside of the org/apache/beam/vendor/grpc/v1p48p1 package and I tested the artifact against our precommits using https://github.com/apache/beam/pull/22595 On Fri, Aug 5, 2022 at 1:42 PM Luke Cwik wrote

[VOTE] Vendored Dependencies Release

2022-08-05 Thread Luke Cwik via dev
Please review the release of the following artifacts that we vendor: * beam-vendor-grpc-1_48_1 Hi everyone, Please review and vote on the release candidate #1 for the version 0.1, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The

Vendored gRPC update

2022-08-04 Thread Luke Cwik via dev
I was looking to update gRPC that we use to the latest (1.48.1) version to move off of a vulnerable version of Netty that a user pointed out in BEAM-14118. This would supersede the work done in https://github.com/apache/beam/pull/17206 as that PR has stalled. If there aren't any concerns I'll

[ANNOUNCE] New committer: Steven Niemitz

2022-07-19 Thread Luke Cwik via dev
Hi all, Please join me and the rest of the Beam PMC in welcoming a new committer: Steven Niemitz (sniemitz@) Steven started contributing to Beam in 2017 fixing bugs and improving logging and usability. Stevens most recent focus has been on performance optimizations within the Java SDK.

Re: Fun with WebAssembly transforms

2022-07-14 Thread Luke Cwik via dev
to >> try out the design options. I think we can simplify the problem by >> insisting that they are pure functions that do not access state or side >> inputs. >> >> On Wed, Jul 13, 2022 at 7:52 PM Luke Cwik via dev >> wrote: >> >>> I think a

Re: Fun with WebAssembly transforms

2022-07-13 Thread Luke Cwik via dev
system and the transpiled WebAssembly code that wraps the users UDF/UDAF and what if the UDF wants access to side inputs or user state ... On Wed, Jul 13, 2022 at 4:09 PM Chamikara Jayalath wrote: > > > On Wed, Jul 13, 2022 at 9:31 AM Luke Cwik wrote: > >> First we'll want to choo

Re: Fun with WebAssembly transforms

2022-07-13 Thread Luke Cwik via dev
2 at 5:51 PM Chamikara Jayalath via dev < dev@beam.apache.org> wrote: > > > On Wed, Jun 29, 2022 at 9:31 AM Luke Cwik wrote: > >> I have had interest in integrating Wasm within Beam as well as I have had >> a lot of interest in improving language portability. >>

Re: [RFC] Gather JMH performance metrics in Beam community-metrics

2022-07-12 Thread Luke Cwik via dev
This sounds great. Since every language has a benchmarking tool, we can start with JMH and expand from there. A key point is that we will want to dedicate a Jenkins machine exclusively to this when the microbenchmarks are running, otherwise we will have other competing Jenkins jobs using up CPU

Re: [Dataflow][Java] Guidance on Transform Mapping Streaming Update

2022-07-08 Thread Luke Cwik via dev
ps like > "Unzipped-2/FlattenReplace" that aren't in the job file. > > Thanks, > Evan > > On Wed, Jul 6, 2022 at 4:21 PM Luke Cwik via user > wrote: > >> Does doing a pipeline update in 2.36 work or do you want to do an update >> to get the latest version

Re: [Dataflow][Java] Guidance on Transform Mapping Streaming Update

2022-07-06 Thread Luke Cwik via dev
was previously unaware), and the >> "output_info" was identical for both the running pipeline and the pipeline >> I was hoping to update it with. I ended up opting to just drain and submit >> the updated pipeline as a new job. Thanks for the tips! >> >> Tha

Re: Force SDF to run on every task the JAR is loaded on?

2021-10-07 Thread Luke Cwik
l outstanding though, is there a way to > ensure that on all tasks the pipeline JAR is loaded on, it actually will > run to avoid stranding user messages? > > Not that I'm aware of. > -Dan > > On Thu, Oct 7, 2021 at 12:53 PM Luke Cwik wrote: > >> I would suggest that

Re: Doubts on KafkaIO/SourceIO

2021-09-03 Thread Luke Cwik
https://beam.apache.org/documentation/io/developing-io-java/#implementing-the-reader-subclass is out of date and at the top says IMPORTANT: Use Splittable DoFn to develop your new I/O. For more details, read the new I/O connector overview. On Fri, Sep 3, 2021 at 9:55 AM Alexey Romanenko wrote:

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Luke Cwik
SDF isn't required as users already try to do things like this using UnboundedSource and Pubsub. On Wed, Jun 23, 2021 at 11:39 AM Reuven Lax wrote: > This was explored in the past, though the design started getting very > complex (watermarks of unbounded dimension, where each iteration has its

Re: [Proposal] Support State Batching and Prefetching over FnApi

2021-06-14 Thread Luke Cwik
d see the Fn API streaming protocol might mean the implementation is > different than it is in ReduceFnRunner. > > Kenn > > On Mon, Jun 14, 2021 at 2:46 PM Luke Cwik wrote: > >> The third approach prevents you from batching across state keys which >> would be the most c

Re: [Proposal] Support State Batching and Prefetching over FnApi

2021-06-14 Thread Luke Cwik
The third approach prevents you from batching across state keys which would be the most common type of batching. On Thu, May 6, 2021 at 3:13 PM Rui Wang wrote: > At this moment, the third approach in the doc is preferred. To recap, the > third approach is the one that only changes FnApi by

Re: Removing deprecated oauth2client dependency for Python SDK

2021-06-10 Thread Luke Cwik
I did something very similar during the Dataflow Java 1.x to Beam Java 2.x migration. The work boiled down to: * swapping to a different library to get the application default credentials (including fixing upstream bugs at Google and improving some documentation) * swapping existing API calls to

Re: beam new feature

2021-06-08 Thread Luke Cwik
Thanks, I left a few comments in the doc. On Tue, Jun 8, 2021 at 12:26 PM Daria Malkova wrote: > Hi community! > > I've noticed that there is no possibility in Beam JDBC to use partitioning > for reading a very large table with millions of rows in parallel (for > example when migrating legacy

Re: [DISCUSS] Sensible dependency upgrades

2020-10-23 Thread Luke Cwik
An additional thing I forgot to mention was that if we only had portable runners our BOM story would be simplified since we wouldn't have the runner on the classpath and users would have a consistent experience across runners with regards to dependency convergence. On Fri, Oct 23, 2020 at 6:15 AM

Re: [DISCUSS] Sensible dependency upgrades

2020-10-22 Thread Luke Cwik
Traditionally I have been pushing for as many versions of deps to use the same version across all the Beam modules (the purpose of the list of deps in BeamModulePlugin.groovy) to simplify dependency convergence. One solution is to test and publish BOMs for the various common platform

Re: Contributor permissions for Beam Jira - tomasz.szerszen

2020-10-21 Thread Luke Cwik
Welcome to the community. I have added you as a contributor. Please take a look at our contribution guide[1] if you haven't already done so. 1: https://beam.apache.org/contribute/ On Wed, Oct 21, 2020 at 9:32 AM Tomasz Szerszeń wrote: > Hello, > > I'm looking forward to contribute to Beam

Re: Apache Beam case studies

2020-10-21 Thread Luke Cwik
+Mariann Nagy has been doing things like this for years now and may be interested. On Wed, Oct 21, 2020 at 12:50 AM Karolina Rosół wrote: > Hi folks, > > With some people from Polidea we came up with an idea to carry out > interviews with Apache Beam users to spread the news about the Beam

Re: [DISCUSS][BEAM-10670] Migrating BoundedSource/UnboundedSource to execute as a Splittable DoFn for non-portable Java runners

2020-10-19 Thread Luke Cwik
at 10:51 AM Luke Cwik wrote: > Thanks Alexey, that is correct. > > On Wed, Oct 14, 2020 at 10:33 AM Alexey Romanenko < > aromanenko@gmail.com> wrote: > >> Thanks Luke, just I guess that the proper link should be this one: >> >> https://docs.google.com/do

Re: Please add me to the mailing list

2020-10-19 Thread Luke Cwik
Send an e-mail to dev-subscr...@beam.apache.org to subscribe as per https://beam.apache.org/community/contact-us/ On Mon, Oct 19, 2020 at 9:24 AM Mike Lo wrote: > Thanks! > > Best, > Mike > > PhD, Bioengineering > San Francisco Bay Area > Mobile: 510-710-4906 <(510)%20710-4906> > LinkedIn

Re: [DISCUSS][BEAM-10670] Migrating BoundedSource/UnboundedSource to execute as a Splittable DoFn for non-portable Java runners

2020-10-14 Thread Luke Cwik
Thanks Alexey, that is correct. On Wed, Oct 14, 2020 at 10:33 AM Alexey Romanenko wrote: > Thanks Luke, just I guess that the proper link should be this one: > > https://docs.google.com/document/d/1kpn0RxqZaoacUPVSMYhhnfmlo8fGT-p50fEblaFr2HE > > On 13 Oct 2020, at 00:23, Luke Cwi

Re: Adding transactional writer to SpannerIO

2020-10-13 Thread Luke Cwik
+user for feedback from users. As long as users know that they must structure their transactions to be repeatable and/or are ok with a transaction occurring multiple times then that should be fine. Has most of the focus been around a serializable function from customers or would something using

Re: [DISCUSS][BEAM-10670] Migrating BoundedSource/UnboundedSource to execute as a Splittable DoFn for non-portable Java runners

2020-10-12 Thread Luke Cwik
I have a draft[1] off the blog ready. Please take a look. 1: http://doc/1kpn0RxqZaoacUPVSMYhhnfmlo8fGT-p50fEblaFr2HE#heading=h.tbab2n97o3eo On Mon, Oct 5, 2020 at 4:28 PM Luke Cwik wrote: > > > On Mon, Oct 5, 2020 at 3:45 PM Kenneth Knowles wrote: > >> >> >>

Re: Throttling stream outputs per trigger?

2020-10-07 Thread Luke Cwik
Vincent Marquez wrote: > Thanks for the response. Is my understanding correct that SplittableDoFns > are only applicable to Batch pipelines? I'm wondering if there's any > proposals to address backpressure needs? > *~Vincent* > > > On Tue, Oct 6, 2020 at 1:3

Re: Throttling stream outputs per trigger?

2020-10-06 Thread Luke Cwik
k Overflow I found this > answer: https://stackoverflow.com/a/57275557/25658 that makes it seem > challenging, but I wasn't sure if things had changed since then or you had > better ideas. > > *~Vincent* > > > On Thu, Oct 1, 2020 at 2:57 PM Luke Cwik wrote: > >> Look at

Re: [DISCUSS][BEAM-10670] Migrating BoundedSource/UnboundedSource to execute as a Splittable DoFn for non-portable Java runners

2020-10-05 Thread Luke Cwik
eam/pull/13006 On Fri, Aug 28, 2020 at 1:45 AM Maximilian Michels wrote: > Thanks Luke! I've had a pass. > > -Max > > On 28.08.20 01:22, Luke Cwik wrote: > > As an update. > > > > Direct and Twister2 are done. > > Samza: is ready for review[1]. > >

Re: Self-checkpoint Support on Portable Flink

2020-10-05 Thread Luke Cwik
Thanks Boyuan, I left a few comments. On Mon, Oct 5, 2020 at 11:12 AM Boyuan Zhang wrote: > Hi team, > > I'm looking at adding self-checkpoint support to portable Flink runner( > BEAM-10940 ) for both > batch and streaming. I summarized the

Re: Support streaming side-inputs in the Spark runner

2020-10-02 Thread Luke Cwik
Support for watermark holds is missing for both Spark streaming implementations (DStream and structured streaming) so watermark based triggers don't produce the correct output. Excluding the direct runner, Flink is the OSS runner with the most people working on it adding features and fixing bugs

Re: Throttling stream outputs per trigger?

2020-10-01 Thread Luke Cwik
> > *~Vincent* > > > On Thu, Oct 1, 2020 at 2:29 PM Luke Cwik wrote: > >> Why do you want to only emit X? (e.g. running out of memory in the runner) >> >> On Thu, Oct 1, 2020 at 2:08 PM Vincent Marquez >> wrote: >> >>> Hello all. If I want

Re: Throttling stream outputs per trigger?

2020-10-01 Thread Luke Cwik
Why do you want to only emit X? (e.g. running out of memory in the runner) On Thu, Oct 1, 2020 at 2:08 PM Vincent Marquez wrote: > Hello all. If I want to 'throttle' the number of messages I pull off say, > Kafka or some other queue, in order to make sure I only emit X amount per > trigger, is

Re: Support streaming side-inputs in the Spark runner

2020-10-01 Thread Luke Cwik
I would suggest trying FlinkRunner as it is a much more complete streaming implementation. SparkRunner has several key things that are missing that won't allow your pipeline to function correctly. If you're really invested in getting SparkRunner working though feel free to contribute the necessary

Re: [ANNOUNCE] Beam Java 8 image rename starting from 2.26.0 (to apache/beam_java8_sdk)

2020-10-01 Thread Luke Cwik
Can we copy the beam_java_sdk image to beam_java8_sdk for a few prior releases so people who are on an older release can migrate now and not have to remember to do it with 2.26? On Tue, Sep 29, 2020 at 5:37 PM Emily Ye wrote: > Starting with the release of 2.26.0, the Java 8 SDK container image

Re: Contributor permission for Beam Jira tickets

2020-10-01 Thread Luke Cwik
Welcome to the community. I have added you as a contributor. Please take a look at the contribute guide[1]. 1: https://beam.apache.org/contribute/ On Thu, Oct 1, 2020 at 9:49 AM George Pearman wrote: > Hi > > My name is George. I have been developing Beam applications at LinkedIn > for the

Re: I/O connectors for streaming GRPC source

2020-09-30 Thread Luke Cwik
There is no generic gRPC connector and it is unlikely that there ever will be one. A lot of the time integration with external systems is for ingesting large amounts of data which works best with certain features which gRPC doesn't natively support but an application protocol built on top of gRPC

Re: How to write a Python wrapper for MQTT io

2020-09-28 Thread Luke Cwik
It is also very important to document the URN and describe its configuration payload so that a new Beam SDK who wants to use the XLang transform knows what the spec is and that if the XLang implementation were to change it can still honor the original spec. +Chamikara Jayalath , is there a good

Re: Kafka Streams Runner [BEAM-2466]

2020-09-25 Thread Luke Cwik
That's exciting. I would suggest that you take a look at implementing a portable runner so that you get cross language pipelines and the ability to execute Python and Go pipelines. Looking at https://s.apache.org/beam-fn-api and the Flink or Samza implementations would be good starting points.

Re: Intro and ticket BEAM-10938

2020-09-22 Thread Luke Cwik
Thanks for reaching out. "Triage needed" is the default state when a bug is opened and does not mean that it is yet to be decided. Typically if there is something of note, either the contributor asks on the dev@ mailing list about it and works with the community or opens a PR and a reviewer will

Re: Output from Window not getting materialized

2020-09-21 Thread Luke Cwik
er to make sure they are sorted before passing to a downstream stateful DoFn. 1: https://lists.apache.org/thread.html/9cdac2a363e18be58fa1f14c838c61e8406ae3407e4e2d05e423234c%40%3Cdev.beam.apache.org%3E > > Regards, > Praveen > > On Fri, Sep 18, 2020 at 10:06 AM Luke Cwik wrote: > >>

Re: What is the process to remove a Jenkins job?

2020-09-21 Thread Luke Cwik
When the seed job runs next time, any job that isn't explicitly part of the seed job is disabled. The existing job history will stick around and eventually someone should delete them manually from Jenkins. On Mon, Sep 21, 2020 at 10:46 AM Valentyn Tymofieiev wrote: > We are removing several

Re: How to gracefully stop a beam application

2020-09-21 Thread Luke Cwik
+user On Mon, Sep 21, 2020 at 9:16 AM Luke Cwik wrote: > You need the "sources" to stop and advance the watermark to infinity and > have that propagate through the entire pipeline. There are propoosals for > pipeline drain[1] and also for snapshot and update[2] for Apache Be

[DISCUSS] Clearing timers (https://github.com/apache/beam/pull/12836)

2020-09-18 Thread Luke Cwik
PR 12836[1] is adding support for clearing timers and there is a discussion about what the semantics for a cleared timer should be. So far we have: 1) Clearing an unset timer is a no-op 2) If the last action on the timer was to clear it, then a future bundle should not see it fire Ambiguity

Re: Output from Window not getting materialized

2020-09-18 Thread Luke Cwik
en could you please clarify on how we > should set the watermark for this manual watermark estimator? > > receiver.outputWithTimestamp(ossRecord, Instant.now()); > > Thanks, > Praveen > > On Mon, Sep 14, 2020 at 9:10 AM Luke Cwik wrote: > >> Is the watermark advancing[1, 2]

Re: Check for enough access to classes/methods of public API

2020-09-18 Thread Luke Cwik
I only know of tooling that can be used to make sure that the existing API doesn't change. Also there isn't an explicit "friend" concept in Java but people do sometimes orchestrate public classes with methods that restrict who can use them[1] but I don't think this is a case of that. 1:

Re: I would like to assign a BEAM ticke to myself

2020-09-14 Thread Luke Cwik
Welcome to the community. I have added you as a contributor and assigned BEAM-10875 to you. On Mon, Sep 14, 2020 at 4:16 PM terry xian wrote: > Hi, > > I have created a jira ticket: [BEAM-10875] Support NUMERIC type in > spanner schema parser - ASF JIRA >

Re: Jira contributor permissions

2020-09-11 Thread Luke Cwik
Welcome Kiley, I have done as you have requested. On Fri, Sep 11, 2020 at 1:18 PM Kiley Sok wrote: > Hello, > > I'm Kiley, a SWE at Google working on Beam. Can I be added as a > contributor to Jira? My username is kileys. > > Thanks, > Kiley >

Re: Clear Timer in Java SDK

2020-09-03 Thread Luke Cwik
Java SDK hasn't exposed the ability to remove timers. On Wed, Sep 2, 2020 at 11:00 AM Boyuan Zhang wrote: > Hi team, > > I'm looking for something similar to timer.clear() from Python SDK[1] in > Java SDK but it seems like we haven't exposed clearing timer API from Java > Timer. Does Java SDK

Re: Contributor permission for Beam Jira tickets

2020-08-28 Thread Luke Cwik
Welcome to the community. It looks like someone has already added you. Check out the contribution guide[1]. 1: https://beam.apache.org/contribute/ On Fri, Aug 28, 2020 at 12:03 PM Omkar Deshpande wrote: > Hi, my name is Omkar Deshpande. I am interested in contributing java kafka > io module

Re: [DISCUSS][BEAM-10670] Migrating BoundedSource/UnboundedSource to execute as a Splittable DoFn for non-portable Java runners

2020-08-27 Thread Luke Cwik
le and get back to you. > > Best Regards, > Pulasthi > > On Tue, Aug 18, 2020 at 2:30 PM Luke Cwik wrote: > >> I have made some good progress here and have gotten to the following >> state for non-portable runners: >> >> DirectRunner[1]: Merged. Suppor

Re: How does groupIntoBatches behave when there are too few elements for a key?

2020-08-26 Thread Luke Cwik
GroupIntoBatches should always emit any buffered elements on window expiration. On Wed, Aug 26, 2020 at 8:55 AM Alex Amato wrote: > How does groupIntoBatches behave when there are too few elements for a key > (less than the provided batch size)? > > Based on how its described >

Re: Splittable-Dofn not distributing the work to multiple workers

2020-08-21 Thread Luke Cwik
ltiple workers. > > On Fri, Aug 21, 2020 at 11:00 AM Luke Cwik wrote: > >> Are you using Dataflow runner v2[1] since the default for Beam Java still >> uses Dataflow runner v1? >> Dataflow runner v2 is the only one that supports autoscaling and dynamic >> splitting o

Re: @StateId uniqueness across DoFn(s)

2020-08-21 Thread Luke Cwik
The DoFn is associated with a PTransform and in the pipeline proto there is a unique id associated with each PTransform. You can use that to generate a composite key (ptransformid, stateid) which will be unique within the pipeline. On Fri, Aug 21, 2020 at 11:26 AM Ke Wu wrote: > Thank you

Re: Splittable-Dofn not distributing the work to multiple workers

2020-08-21 Thread Luke Cwik
Are you using Dataflow runner v2[1] since the default for Beam Java still uses Dataflow runner v1? Dataflow runner v2 is the only one that supports autoscaling and dynamic splitting of splittable dofns in bounded pipelines. 1:

Re: Suggestion to let KafkaIO support the deserializer API with headers

2020-08-21 Thread Luke Cwik
Sounds good. Note that you'll also want to update ReadFromKafkaDoFn[1] and provide tests that cover both to make sure we don't regress and stop providing headers. 1:

Re: Is there an equivalent for --numberOfWorkerHarnessThreads in Python SDK?

2020-08-20 Thread Luke Cwik
+user On Thu, Aug 20, 2020 at 9:47 AM Luke Cwik wrote: > Are you using Dataflow runner v2[1]? > > If so, then you can use: > --number_of_worker_harness_threads=X > > Do you know where/why the OOM is occurring? > > 1: > https://cloud.google.com/dataflow/docs/guides/de

Re: Is there an equivalent for --numberOfWorkerHarnessThreads in Python SDK?

2020-08-20 Thread Luke Cwik
Are you using Dataflow runner v2[1]? If so, then you can use: --number_of_worker_harness_threads=X Do you know where/why the OOM is occurring? 1: https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2 2:

Re: [BEAM-10292] change proposal to DefaultFilenamePolicy.ParamsCoder

2020-08-19 Thread Luke Cwik
, e.g. S3, already have a check for that. > So I added a similar check to the local filesystem too. The implementation > is in the same pull request https://github.com/apache/beam/pull/12050. > > Can you take a look at it, please? > > Thanks, > David > > út 11. 8. 2020

Re: Percentile metrics in Beam

2020-08-18 Thread Luke Cwik
t;>>https://s.apache.org/beam-histogram-metrics: >>>- In addition to the moment sketch variables >>> >>> <https://blog.acolyer.org/2018/10/31/moment-based-quantile-sketches-for-efficient-high-cardinality-aggregation-queries/> >>>. >

Re: [DISCUSS][BEAM-10670] Migrating BoundedSource/UnboundedSource to execute as a Splittable DoFn for non-portable Java runners

2020-08-18 Thread Luke Cwik
and hence ready for review? 1: https://github.com/apache/beam/pull/12519 2: https://github.com/apache/beam/pull/12594 3: https://github.com/apache/beam/pull/12603 4: https://github.com/apache/beam/pull/12616 5: https://github.com/apache/beam/pull/12617 On Tue, Aug 11, 2020 at 10:55 AM Luke Cwik

Re: Percentile metrics in Beam

2020-08-18 Thread Luke Cwik
. > > I believe that would be feasible, as we would still retain the Histogram > data. I don't think we can restore the Histograms with just the Sketch, if > that was the suggestion. Please let me know if I misunderstood. > > If that's correct, I can write up the benefits a

Re: BEAM-9045 Implement an Ignite runner using Apache Ignite compute grid

2020-08-17 Thread Luke Cwik
At this point in time I would recommend that you build a runner that executes pipelines using only the portability layer like Flink/Samza/Spark [1,2,3]. 1: https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPortablePipelineTranslator.java 2:

Re: Percentile metrics in Beam

2020-08-17 Thread Luke Cwik
That is an interesting suggestion to change to use a sketch. I believe having one metric URN that represents all this information grouped together would make sense instead of attempting to aggregate several metrics together. The underlying implementation of using sum/count/max/min would stay the

Re: Output timestamp for Python event timers

2020-08-11 Thread Luke Cwik
+1 on what Boyuan said. It is important that the defaults for processing time domain differ from the defaults for the event time domain. On Tue, Aug 11, 2020 at 12:36 PM Yichi Zhang wrote: > +1 to expose set_output_timestamp and enrich python set timer api. > > On Tue, Aug 11, 2020 at 12:01 PM

Re: [DISCUSS][BEAM-10670] Migrating BoundedSource/UnboundedSource to execute as a Splittable DoFn for non-portable Java runners

2020-08-11 Thread Luke Cwik
should help to avoid this? > > On 10 Aug 2020, at 22:59, Luke Cwik wrote: > > In the past couple of months wrappers[1, 2] have been added to the Beam > Java SDK which can execute BoundedSource and UnboundedSource as Splittable > DoFns. These have been opt-out for portable pipelin

Re: [BEAM-10292] change proposal to DefaultFilenamePolicy.ParamsCoder

2020-08-11 Thread Luke Cwik
er job of file/dir matching > > This is a bug we probably have to fix anyways for the local filesystem > and/or HDFS and this will also give us a solution that does not break > update compatibility. > > Thanks, > Cham > > On Wed, Aug 5, 2020 at 3:41 PM Luke Cwik

Re: [DISCUSS] Better alignment of Apache Flink and Apache Beam releases

2020-08-10 Thread Luke Cwik
Is there a way we could use a fixed point in time Flink nightly that passes all the tests/validation and bump up the nightly version manually to get "closer" to the release candidate instead of doing another branch? This would mean that any changes that impact the Flink runner that are related to

[DISCUSS][BEAM-10670] Migrating BoundedSource/UnboundedSource to execute as a Splittable DoFn for non-portable Java runners

2020-08-10 Thread Luke Cwik
In the past couple of months wrappers[1, 2] have been added to the Beam Java SDK which can execute BoundedSource and UnboundedSource as Splittable DoFns. These have been opt-out for portable pipelines (e.g. Dataflow runner v2, XLang pipelines on Flink/Spark) and opt-in using an experiment for all

  1   2   3   4   5   >