Growing checkpoint size with Python SDF for reading from Redis streams

2023-07-18 Thread Nimalan Mahendran
Hello, I am running a pipeline built in the Python SDK that reads from a Redis stream via an SDF, in the following environment: - Python 3.11 - Apache Beam 2.48.0 - Flink 1.16 - Checkpoint interval: 60s - state.backend (Flink): hashmap

How to submit Beam Python job onto Kubernetes with Flink runner?

2023-07-18 Thread John Tipper
Hi all, I'm wanting to run a continuous stream processing job using Beam on a Flink runner within Kubernetes. I've been following this tutorial here (https://python.plainenglish.io/apache-beam-flink-cluster-kubernetes-python-a1965f37b7cb) but I'm not sure what the author is referring to when

Re: [Question] Processing chunks of data in batch based pipelines

2023-07-18 Thread Alexey Romanenko
Reading from RDBMS and processing the data downstream of your pipeline is not the same in terms of bundling. The main “issue" with a former is that it reads mostly in a single thread per SQL-query and JDBC client is not exception. So, Beam can’t split data, that are not yet read, into