Capturing Statement Executions within JdbcSink

2021-03-17 Thread Rion Williams
Hey all, Is there any mechanism in place (or out of the box) to support reading the results of statements executed by the JdbcSink or would I need to implement my own to support it? The problem that I'm trying to solve relates to observability (i.e. metrics) and incrementing specific counters bas

Using Prometheus Client Metrics in Beam with FlinkRunner

2021-02-22 Thread Rion Williams
Is anyone aware of a way to get Prometheus metrics to output via Beam (i.e. using them over the Beam abstractions)? I seem to be able to define them as expected, however I don’t see them being emitted with the rest of my Prometheus metrics: private class RouteEvent(): DoFn<...>() { private

Fwd: Defining Custom Labels / Label Support in Beam Metrics

2021-02-17 Thread Rion Williams
+user for additional reach -- Forwarded message - From: Rion Williams Date: Mon, Feb 15, 2021 at 7:58 PM Subject: Defining Custom Labels / Label Support in Beam Metrics To: dev Hey all, I've been working extensively on the observability stories surrounding some o

Re: Unit Testing Kafka in Apache Beam

2021-02-09 Thread Rion Williams
sages after it has > started? We use this approach for some tests against Cloud PubSub. > Note if using the DirectRunner you need to set the blockOnRun pipeline option > to False to do this. > > Brian > >> On Mon, Feb 8, 2021 at 2:10 PM Rion Williams wrote: >> Hey f

Unit Testing Kafka in Apache Beam

2021-02-08 Thread Rion Williams
Hey folks, I’ve been working on fleshing out a proof-of-concept pipeline that deals with some out of order data (I.e. mismatching processing times / event-times) and does quite a bit of windowing depending on the data. Most of the work I‘ve done in a lot of streaming systems relies heavily on w

Re: Separating Data from Kafka by Keying Strategy in a Kafka Splittable DoFn

2021-02-01 Thread Rion Williams
> watermark instead of having a watermark over these 3 partitions. Do I > understand this correctly? > > If so, I think you need separated pipelines. If you only want to know > which records come from which partitions, ReadFromKafkaDoFn emits a KV pair > where the KafkaSourc

Re: Separating Data from Kafka by Keying Strategy in a Kafka Splittable DoFn

2021-02-01 Thread Rion Williams
hat generates KafkaSourceDescriptor) > .apply(ParDo.of(ReadFromKafkaDoFn)) > .apply(other parts) > >> On Mon, Feb 1, 2021 at 8:06 AM Rion Williams wrote: >> Hey all, >> >> I'm currently in a situation where I have a single Kafka topic with data >> acro

Separating Data from Kafka by Keying Strategy in a Kafka Splittable DoFn

2021-02-01 Thread Rion Williams
Hey all, I'm currently in a situation where I have a single Kafka topic with data across multiple partitions and covers data from multiple sources. I'm trying to see if there's a way that I'd be able to accomplish reading from these different sources as different pipelines and if a Splittable DoFn

Handling Out-of-Order Windowing Event Times from Kafka to GCS

2021-01-29 Thread Rion Williams
Hey folks, I’ve been mulling over how to solve a given problem in Beam and thought I’d reach out to a larger audience for some advice. At present things seem to be working sparsely and I was curious if someone could provide a sounding-board to see if this workflow makes sense. The primary high-l

Fwd: Accessing Custom Beam Metrics in Dataproc

2021-01-12 Thread Rion Williams
+user Begin forwarded message: > From: Rion Williams > Date: January 12, 2021 at 4:09:34 PM CST > To: d...@beam.apache.org > Subject: Accessing Custom Beam Metrics in Dataproc > Reply-To: d...@beam.apache.org > > Hi all, > > I'm currently in the process of ad

Re: Help measuring upcoming performance increase in flink runner on production systems

2020-12-14 Thread Rion Williams
Hi Teodor, Although I’m sure you’ve come across it, this might have some valuable resources or methodologies to consider as you explore this a bit more: https://arxiv.org/pdf/1907.08302.pdf I’m looking forward to reading about your finding, especially using a more recent iteration of Beam! Ri

Transform Logging Issues with Spark/Dataproc in GCP

2020-10-27 Thread Rion Williams
+user for visibility Begin forwarded message: > From: Rion Williams > Date: October 27, 2020 at 12:28:37 PM CDT > To: d...@beam.apache.org > Subject: Transform Logging Issues with Spark/Dataproc in GCP > > Hi all, > > Recently, I deployed a very simple Apache B

Re: Apache Beam a Complete Guide - Review?

2020-06-29 Thread Rion Williams
Hi Wesley, In essence, yes, that’s what they did. The three authors of Streaming Systems are folks that work on Google’s Dataflow Project, which for all intents and purposes is essentially an implementation of the Beam Model. Two of them are members of the Beam PMC (essentially a steering comm

Re: Apache Beam a Complete Guide - Review?

2020-06-28 Thread Rion Williams
Hi Wesley, I considered that one as well but was in the same boat in terms of not pulling the trigger (lack of reviews, price point, etc.). I eventually landed on Streaming Systems, which I highly, highly recommend if you want to learn more about the Beam model: - http://streamingsystems.net/

New Addition to the Katas Family: Kotlin

2020-06-02 Thread Rion Williams
process, as well as Pablo Estrada for helping this thing get merged in. Thanks everyone and feel free to reach out if you run into any issues with the course! Rion Williams

Re: Interested in Contributing to Beam

2020-05-18 Thread Rion Williams
.apache.org/contribute/ > 2: https://s.apache.org/beam-starter-tasks > > >> On Sun, May 17, 2020 at 8:15 AM Rion Williams wrote: >> Hi all, >> >> As I’ve been digging into Beam more, I’d love to have a chance to contribute >> to the project (eithe

Interested in Contributing to Beam

2020-05-17 Thread Rion Williams
Hi all, As I’ve been digging into Beam more, I’d love to have a chance to contribute to the project (either through the existing ones or by assigning new ones to myself). I already have an existing ASF account (rionmonster), but it would be great to have access to the Beam project itself. Tha

Re: Try Beam Katas Today

2020-05-14 Thread Rion Williams
+1 on the contributions front. My team and I have been working with Beam primarily with Kotlin and I recently added the appropriate dependencies to Gradle and performed a bit of conversions and have it working as expected against the existing Java course. I don’t know how many others are active

Re: Enriching a stream by looking up from a huge table (2 TiB)+

2020-05-13 Thread Rion Williams
I've been exploring the pattern that Luke's thread refers to (as I'm the one that asked the question in that thread) however I've been pulled away briefly from getting around to implementing it. Just to chime in, another pattern that I've seen recommended quite frequently when dealing with enric

Re: Pattern for Enrichment Against Unbounded Source (Kafka)

2020-05-04 Thread Rion Williams
> I have a lot of work stuff going on so I'll try to provide a response but it > might take days. Also, if you find an answer to one of your questions or have > additional questions while you investigate, feel free to update this thread. > >> On Mon, May 4, 2020 at 2:5

Re: Pattern for Enrichment Against Unbounded Source (Kafka)

2020-05-04 Thread Rion Williams
g the stateful DoFn would mean that you would only be > retrieving/writing as much data that is ever associated with the key that > you use and would have good parallelism if you have a lot of keys. > > 1: https://beam.apache.org/documentation/patterns/side-inputs/ > 2: > https://lists.ap

Re: Recommended Approach for Parallel / Multi-Stage Enrichment

2020-05-04 Thread Rion Williams
sts.apache.org/thread.html/r2272040a06457cfdb867832a61f2933d1a3ba832057cffda89ee248a%40%3Cuser.beam.apache.org%3E > ? > > > On Tue, Apr 28, 2020 at 9:26 AM Rion Williams wrote: > > > Hi all, > > > > I'm trying to implement a process and I'm not quite sure what the best > > approac

Pattern for Enrichment Against Unbounded Source (Kafka)

2020-05-03 Thread Rion Williams
Hi all, I'm in the process of migrating an existing pipeline over from the Kafka Streams framework to Apache Beam and I'm reaching out in hopes of finding a pattern that could address the use-case I'm currently dealing with. In the most trivial sense, the scenario just involves an incoming mess

Recommended Approach for Parallel / Multi-Stage Enrichment

2020-04-28 Thread Rion Williams
Hi all, I'm trying to implement a process and I'm not quite sure what the best approach to efficiently implement it might be while taking advantage of Beam's parallelism and recommended patterns. Basically the problem itself can be summarized as follows: I have a series of incoming events whic

Recommended Reading for Apache Beam

2020-04-20 Thread Rion Williams
Hi all, I posed this question over on the Apache Slack Community however didn't get much of a response so I thought I'd reach out here. I've been looking for some additional resources, specifically books, surrounding Apache Beam, the Beam Model, etc. and was wondering if anyone had any recommen

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Rion Williams
use only own wrapper. > Also, user won’t need to convert types between Read and Write (like in this > topic case). > >> On 17 Apr 2020, at 19:28, Rion Williams wrote: >> >> Hi Alexey, >> >> So this is currently the approach that I'm taking. Bas

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Rion Williams
will help, but KafkaIO allows to keep all meta information > while reading (using KafkaRecord) and writing (using ProducerRecord). > So, you can keep your tracing id in the record headers as you did with Kafka > Streams. > > > On 17 Apr 2020, at 18:58, Rion Williams wrote: >

Re: Distributed Tracing in Apache Beam

2020-04-17 Thread Rion Williams
e how your elements go to the pipeline or do you want to > see how every ParDo interacts with external systems? > > On Fri, Apr 17, 2020, 17:38 Rion Williams wrote: > > > Hi all, > > > > I'm reaching out today to inquire if Apache Beam has any support or >

Distributed Tracing in Apache Beam

2020-04-17 Thread Rion Williams
Hi all, I'm reaching out today to inquire if Apache Beam has any support or mechanisms to support some type of distributed tracing similar to something like Jaeger (https://www.jaegertracing.io/). Jaeger itself has a Java SDK, however due to the nature of Beam working with transforms that yield