Re: Growing checkpoint size with Python SDF for reading from Redis streams

2023-07-21 Thread Nimalan Mahendran
An interesting update - I found PeriodicImpulse, which is a Python-based unbounded SDF. I created the following minimal pipeline: with beam.Pipeline(options=runner_options) as pipeline: traceable_measurements = pipeline | PeriodicImpulse(fire_interval=5) traceable_measurements

Re: Growing checkpoint size with Python SDF for reading from Redis streams

2023-07-21 Thread Nimalan Mahendran
Hello again, I tried to follow a process of elimination and simplify my code as much as possible, see https://gist.github.com/nybbles/1feafda7321f7c6a9e16c1455ebf0f3e#file-simplified_beam-py. This splittable DoFn is pared down to only doing the polling. It no longer uses the Redis client or uses

[question]Best practices for branching pipeline.

2023-07-21 Thread Ruben Vargas
Hello, I'm starting using Beam and I would like to know if there is any recommended pattern for doing the following: I have a message coming from Kafka and then I would like to apply two different transformations and merge them in a single result at the end. I attached an image that describes

Re: Getting Started With Implementing a Runner

2023-07-21 Thread Joey Tran
Could you let me know when you update it? I would be interested in rereading after the rewrite. Thanks! Joey On Fri, Jul 14, 2023 at 4:38 PM Robert Bradshaw wrote: > I'm taking an action item to update that page, as it is *way* out of date. > > On Thu, Jul 13, 2023 at 6:54 PM Joey Tran >

Re: Can we use RedisIO to write records from an unbounded collection

2023-07-21 Thread Alexey Romanenko
Hi Sachin, > On 21 Jul 2023, at 08:45, Sachin Mittal wrote: > > I was reading up on this IO here > https://beam.apache.org/documentation/io/connectors/ and it states that it > only supports batch and not streaming. I believe it states only about Reading support. For Writing, it mostly

Can we use RedisIO to write records from an unbounded collection

2023-07-21 Thread Sachin Mittal
Hi, I was planning to use the RedisIO write/writeStreams function in a streaming pipeline. https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/redis/RedisIO.html The pipeline would read an unbounded collection from Kinesis and update redis. It will update data for which key

EFO KinesisIO watermarking doubt

2023-07-21 Thread Sachin Mittal
Hi, We are implementing EFO Kinesis IO reader provided by apache beam. I see that in code that for implementation of getCurrentTimestamp we always return getApproximateArrivalTimestamp and not the event time which we may have set for that record using withCustomWatermarkPolicy. Please refer: