Re: Fails to deploy a python pipeline to a flink cluster

2024-05-11 Thread Jaehyeon Kim
user, wrote: > Do you still have the same issue? I tried to follow your setup.sh to reproduce this but somehow I am stuck at the word_len step. I saw you also tried to use `print(kafka_kv)` to debug it. I am not sure about your current status. On Fri, May 10, 2024 at 9:

Fails to deploy a python pipeline to a flink cluster

2024-05-10 Thread Jaehyeon Kim
) on minikube and a docker image named beam-python-example:1.17 created using the following Dockerfile - the full details can be found at https://github.com/jaehyeon-kim/beam-demos/blob/feature/beam-deploy/beam-deploy/beam/Dockerfile The Java SDK is used for the SDK harness of the Kafka IO's expansion
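
A minimal sketch of how such a pipeline might be submitted to the Flink cluster with the custom SDK harness image (the Flink address and the smoke-test transform are assumptions, not taken from the thread's full setup):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Assumed option values: the jobmanager service address inside minikube and the
# custom SDK harness image built from the Dockerfile referenced above.
options = PipelineOptions([
    "--runner=FlinkRunner",
    "--flink_master=flink-jobmanager:8081",
    "--environment_type=DOCKER",
    "--environment_config=beam-python-example:1.17",
    "--streaming",
])

with beam.Pipeline(options=options) as p:
    (p | beam.Create(["smoke test"]) | beam.Map(print))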

Re: Pipeline gets stuck when chaining two SDFs (Python SDK)

2024-05-05 Thread Jaehyeon Kim
r SDFs, On Sun, May 5, 2024 at 5:10 AM Jaehyeon Kim wrote: > Hi XQ, Thanks for checking it out. SDF chaining seems to work, as I created my pipeline by converting one that is built with the Java SDK. The source of the Java p

Re: Pipeline gets stuck when chaining two SDFs (Python SDK)

2024-05-05 Thread Jaehyeon Kim
makes me wonder whether the chained SDFs work. @Chamikara Jayalath Any thoughts? On Fri, May 3, 2024 at 2:45 AM Jaehyeon Kim wrote: > Hello, I am building a pipeline using two SDFs that are chained. The first function (DirectoryWatch

Pipeline gets stuck when chaining two SDFs (Python SDK)

2024-05-03 Thread Jaehyeon Kim
but the issue is that the first SDF stops as soon as it returns the first list of files. The source of the pipeline can be found at - First SDF: https://github.com/jaehyeon-kim/beam-demos/blob/feature/beam-pipeline/beam-pipelines/chapter7/directory_watch.py - Second SDF: https://github.com/jaehyeon-kim

Re: Any recommendation for key for GroupIntoBatches

2024-04-12 Thread Jaehyeon Kim
Here is an example from a book that I'm reading now and it may be applicable. JAVA - (id.hashCode() & Integer.MAX_VALUE) % 100 PYTHON - ord(id[0]) % 100 On Sat, 13 Apr 2024 at 06:12, George Dekermenjian wrote: > How about just keeping track of a buffer and flush the buffer after 100 messages
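
A minimal sketch of the keying idea above, so GroupIntoBatches gets a small, bounded key space (100 buckets); the record shape is made up for illustration:

import apache_beam as beam

def bucket_key(record):
    # derive a bucket in [0, 100) from the first character of the id
    return ord(record["id"][0]) % 100, record

with beam.Pipeline() as p:
    (
        p
        | beam.Create([{"id": "a1"}, {"id": "b2"}, {"id": "c3"}])
        | beam.Map(bucket_key)                   # (bucket, record) pairs
        | beam.GroupIntoBatches(batch_size=100)  # batches are formed per bucket key
        | beam.Map(print)
    )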

Bundle a Beam pipeline along with all dependencies, artifacts, etc for FlinkRunner?

2024-04-07 Thread Jaehyeon Kim
Hello, I saw an example of bundling a Beam pipeline along with all dependencies, artifacts, etc. in the Spark runner document (https://beam.apache.org/documentation/runners/spark/), and the bundled jar can be used to run a job without a job server. Does it apply to the Flink Runner as well? Also,
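
Whether the Flink runner supports the same bundling is exactly the open question here, but here is a sketch of what it might look like if the portable --output_executable_path option shown for the Spark runner also applies to the FlinkRunner (paths and the Flink address are assumptions):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# With --output_executable_path the portable runner writes a self-contained jar
# at the given path instead of running the pipeline; the jar would then be
# submitted with `flink run /tmp/my-beam-pipeline.jar`.
options = PipelineOptions([
    "--runner=FlinkRunner",
    "--flink_master=localhost:8081",
    "--output_executable_path=/tmp/my-beam-pipeline.jar",
])

with beam.Pipeline(options=options) as p:
    (p | beam.Create(["bundle me"]) | beam.Map(print))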

Questions on Python ReadFromKafka watermark generation and trigger

2024-03-27 Thread Jaehyeon Kim
Hello, I have questions about watermark generation and triggering for ReadFromKafka (Python SDK). 1. It only supports the following simple timestamp policies: - ProcessingTime - CreateTime - LogAppendTime Is there any way to assign the event time from a timestamp in the Kafka message? The Java SDK
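
A sketch of the built-in policies listed above, plus one common workaround for taking event time from the message payload: re-stamp elements after the read (the broker, topic and "event_ts" field are assumptions). Note the source's watermark still follows the chosen policy, so late-data handling needs care:

import json

import apache_beam as beam
from apache_beam import window
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

def with_event_time(kv):
    # assumes the message value is JSON carrying an epoch-seconds "event_ts" field
    payload = json.loads(kv[1].decode("utf-8"))
    return window.TimestampedValue(payload, payload["event_ts"])

with beam.Pipeline(options=PipelineOptions(["--streaming"])) as p:
    (
        p
        | ReadFromKafka(
            consumer_config={"bootstrap.servers": "localhost:29092"},
            topics=["events"],
            # one of the built-in policies: create_time_policy,
            # processing_time_policy or log_append_time
            timestamp_policy=ReadFromKafka.create_time_policy)
        | beam.Map(with_event_time)  # downstream windows use the payload timestamp
    )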

Re: What's the current status of pattern matching with Beam SQL?

2024-03-24 Thread Jaehyeon Kim
nctions.java for the supported functions. On Sat, Mar 23, 2024 at 5:50 AM Jaehyeon Kim wrote: > Hello, I found a blog article about pattern matching with Beam SQL - https://beam.apache.org/blog/pattern-match-beam-sql/. All of the PRs and

What's the current status of pattern matching with Beam SQL?

2024-03-23 Thread Jaehyeon Kim
Hello, I found a blog article about pattern matching with Beam SQL - https://beam.apache.org/blog/pattern-match-beam-sql/. All of the PRs and commits included in the post have been merged. On the other hand, the Beam Calcite SQL overview page indicates MATCH_RECOGNIZE is not supported (

Re: java.lang.ClassCastException: class java.lang.String cannot be cast to class...

2024-03-17 Thread Jaehyeon Kim
4. The key, I think, is to explicitly specify the output types for TestStream like this: TestStream(coder=coders.StrUtf8Coder()).with_output_types(str). These at least work for me. On Thu, Mar 14, 2024 at 4:37 PM Jaehyeon Kim wrote: > Hello,
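
A condensed sketch of that fix in context (the element values and the word-count tail are made up; the TestStream construction follows the quoted suggestion):

import apache_beam as beam
from apache_beam.coders import coders
from apache_beam.testing.test_stream import TestStream

# Explicit string coder and output type on TestStream, per the fix above, so the
# runner is not left to infer a pickled element type.
test_stream = (
    TestStream(coder=coders.StrUtf8Coder())
    .with_output_types(str)
    .add_elements(["a b c", "b c", "c"])
    .advance_watermark_to_infinity()
)

with beam.Pipeline() as p:
    (
        p
        | test_stream
        | beam.FlatMap(str.split)
        | beam.combiners.Count.PerElement()
        | beam.Map(print)
    )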

Re: how to enable debugging mode for python worker harness

2024-03-17 Thread Jaehyeon Kim
Hello, The pipeline runs on the host, while host.docker.internal would only be resolved inside containers that run with the host network mode. I guess host.docker.internal wouldn't be resolvable to the pipeline, so it fails to run. If everything before ReadFromKafka works successfully, a docker

java.lang.ClassCastException: class java.lang.String cannot be cast to class...

2024-03-14 Thread Jaehyeon Kim
Hello, I am trying a simple word count pipeline in a streaming environment using TestStream (Python SDK). While it works with the DirectRunner, it fails on the FlinkRunner with the following error. It looks like a type casting issue. Traceback (most recent call last): File

Re: Expansion service for SqlTransform fails with a local flink cluster using Python SDK

2024-03-12 Thread Jaehyeon Kim
raise self._runtime_exception RuntimeError: Pipeline sql-transform_4f61dc80-9aeb-4e85-9bfb-3df1b6e3db48 failed in state FAILED: java.lang.IllegalStateException: No container running for id c03fe103bd0c2498b66945608709e6488dabbae9ba359c1e7600c1a9a9c63800 On Tue, 12 Mar 2024 at 15:07, Jaehyeon Kim wrote:

Expansion service for SqlTransform fails with a local flink cluster using Python SDK

2024-03-11 Thread Jaehyeon Kim
Hello, I have a simple pipeline that transforms data with *SqlTransform*. I use the *FlinkRunner* and, when I don't specify the *flink_master* option and use an embedded Flink cluster, it works fine. However, if I use a local Flink cluster and specify the *flink_master* option to
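
A minimal sketch of the two configurations being compared: leave out --flink_master for the embedded cluster, or point it at the local session cluster (the address, the LOOPBACK environment and the toy schema are assumptions):

import typing

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.sql import SqlTransform

class Word(typing.NamedTuple):
    word: str

beam.coders.registry.register_coder(Word, beam.coders.RowCoder)

options = PipelineOptions([
    "--runner=FlinkRunner",
    "--flink_master=localhost:8081",  # omit this line to use the embedded cluster
    "--environment_type=LOOPBACK",
])

with beam.Pipeline(options=options) as p:
    (
        p
        | beam.Create([Word(word="hello"), Word(word="world")]).with_output_types(Word)
        | SqlTransform("SELECT word, COUNT(*) AS cnt FROM PCOLLECTION GROUP BY word")
        | beam.Map(print)
    )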

Re: Fails to run two multi-language pipelines locally?

2024-03-10 Thread Jaehyeon Kim
That being said, it should work just fine with distinct jars on an arbitrary setup as well (and the fact that it works locally hints to something being up with that setup). Not sure what your flink master configuration is like, but maybe it's a docker-in-docker issue?

Re: [Question] ReadFromKafka can't get messages.

2024-03-08 Thread Jaehyeon Kim
I've been teaching myself Beam and here is an example pipeline that uses Kafka IO (read and write). I hope it helps. *Prerequisites* 1. Kafka runs on Docker and its external listener is exposed on port 29092 (i.e. its bootstrap server address can be specified as localhost:29092) 2. The following
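
A condensed sketch of the read-then-write shape described here; the broker address matches prerequisite 1, while the topic names are assumptions:

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka, WriteToKafka
from apache_beam.options.pipeline_options import PipelineOptions

with beam.Pipeline(options=PipelineOptions(["--streaming"])) as p:
    (
        p
        | ReadFromKafka(
            consumer_config={
                "bootstrap.servers": "localhost:29092",
                "auto.offset.reset": "earliest",
            },
            topics=["input-topic"])
        # ReadFromKafka yields (key, value) byte pairs by default, which is the
        # same shape WriteToKafka's default serializers expect
        | WriteToKafka(
            producer_config={"bootstrap.servers": "localhost:29092"},
            topic="output-topic")
    )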

Re: Interactive runner that uses flink runner can read kafka messages?

2024-03-07 Thread Jaehyeon Kim
Ning. On Wed, Mar 6, 2024 at 9:50 PM Jaehyeon Kim wrote: > Hello, I'm playing with the interactive runner on a notebook and the flink runner is used as the underlying runner. I wonder if it can read messages from Kafka. I check

Interactive runner that uses flink runner can read kafka messages?

2024-03-06 Thread Jaehyeon Kim
Hello, I'm playing with the interactive runner in a notebook, with the Flink runner used as the underlying runner. I wonder if it can read messages from Kafka. I checked the example notebook
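
A sketch of the notebook setup being described: the interactive runner wraps a Flink runner, and a recording duration bounds what ib.show() collects from the unbounded Kafka source (broker and topic are assumptions):

import apache_beam as beam
import apache_beam.runners.interactive.interactive_beam as ib
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
from apache_beam.runners.portability.flink_runner import FlinkRunner

# stop recording the unbounded source after a minute so ib.show() can return
ib.options.recording_duration = "60s"

p = beam.Pipeline(
    InteractiveRunner(underlying_runner=FlinkRunner()),
    options=PipelineOptions(["--streaming"]))

messages = p | ReadFromKafka(
    consumer_config={"bootstrap.servers": "localhost:29092",
                     "auto.offset.reset": "earliest"},
    topics=["input-topic"])

ib.show(messages)  # renders the recorded elements in the notebook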

Re: Fails to run two multi-language pipelines locally?

2024-03-06 Thread Jaehyeon Kim
It may be a configuration issue. It works if I don't specify *flink_master*, which uses an embedded Flink cluster. On Thu, 7 Mar 2024 at 12:47, Jaehyeon Kim wrote: > It works fine if I only use Kafka read/write, as I only see a single container - two transforms (read and write) but a

Re: Fails to run two multi-language pipelines locally?

2024-03-06 Thread Jaehyeon Kim
d see why it failed to bring up the containers (or, if they started, look in the container logs to see why they died). On Wed, Mar 6, 2024 at 5:28 PM Jaehyeon Kim wrote: > I am not using the Python local runner but the Flink runner. A Flink cluster is started

Re: Fails to run two multi-language pipelines locally?

2024-03-06 Thread Jaehyeon Kim
I am not using the Python local runner but the Flink runner. A Flink cluster is started locally. On Thu, 7 Mar 2024 at 12:13, Robert Bradshaw via user wrote: > Streaming portable pipelines are not yet supported on the Python local runner. On Wed, Mar 6, 2024 at 5:03 PM Jaehyeo

Fails to run two multi-language pipelines locally?

2024-03-06 Thread Jaehyeon Kim
Hello, I use the Python SDK and my pipeline reads messages from Kafka and transforms them via SQL. I see two containers are created, but it seems they don't communicate with each other, so the Flink task manager keeps killing the containers. The Flink cluster runs locally. Is there a way to

Re: Roadmap of Calcite support on Beam SQL?

2024-03-04 Thread Jaehyeon Kim
or defining window on table is cool). Best, Wiśniowski Piotr On 4.03.2024 05:27, Jaehyeon Kim wrote: > Hello, I just tried a simple tumbling window but failed with the following error: RuntimeError: org.apache.beam.sdk.extensions.sql.i

How to change SQL dialect on beam_sql magic?

2024-03-03 Thread Jaehyeon Kim
Hello, The beam_sql magic doesn't seem to have an option to specify an SQL dialect, while the underlying SqlTransform has the dialect argument. Is there a way to specify an SQL dialect in a notebook? Cheers, Jaehyeon
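
One workaround, sketched below, is to call the underlying SqlTransform directly in a notebook cell, since it exposes the dialect argument ("calcite" by default, "zetasql" as the alternative); the toy schema and query are made up for illustration:

import typing

import apache_beam as beam
from apache_beam.transforms.sql import SqlTransform

class Item(typing.NamedTuple):
    id: str
    price: float

beam.coders.registry.register_coder(Item, beam.coders.RowCoder)

with beam.Pipeline() as p:
    (
        p
        | beam.Create([Item(id="a", price=1.5), Item(id="b", price=2.0)]).with_output_types(Item)
        | SqlTransform("SELECT id, price FROM PCOLLECTION", dialect="zetasql")
        | beam.Map(print)
    )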

Roadmap of Calcite support on Beam SQL?

2024-03-03 Thread Jaehyeon Kim
Hello, I just tried a simple tumbling window but failed with the following error: RuntimeError: org.apache.beam.sdk.extensions.sql.impl.ParseException: Unable to parse query WITH cte AS ( SELECT TO_TIMESTAMP(event_datetime) AS ts FROM PCOLLECTION ) SELECT
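
For comparison, the windowing form the Beam Calcite SQL documentation shows is grouping by TUMBLE over a column that already has the SQL TIMESTAMP type on the input schema, rather than parsing a string inside the query; whether TO_TIMESTAMP itself is in Beam's supported-function subset is worth checking. A sketch with an assumed event_ts TIMESTAMP column:

from apache_beam.transforms.sql import SqlTransform

# apply this to a PCollection whose schema carries an event_ts TIMESTAMP field
query = """
SELECT
  TUMBLE_START(event_ts, INTERVAL '1' MINUTE) AS window_start,
  COUNT(*) AS cnt
FROM PCOLLECTION
GROUP BY TUMBLE(event_ts, INTERVAL '1' MINUTE)
"""

tumbling_counts = SqlTransform(query)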

Re: Request to join slack channel

2024-02-26 Thread Jaehyeon Kim
Hello, Sorry for sending a late request. Can you please invite me to the Slack channel as well? Cheers, Jaehyeon On Tue, 27 Feb 2024 at 03:44, Valentyn Tymofieiev via user <user@beam.apache.org> wrote: > Hi, done for all requests since my last message. On Mon, Feb 26, 2024 at 12:05 AM

Re: Beam portable runner setup for Flink + Python on Kubernetes

2024-02-22 Thread Jaehyeon Kim
wrote: > I made this a few years ago to help people like yourself: https://github.com/sambvfx/beam-flink-k8s Hopefully it's insightful and I'm happy to accept any MRs to update any outdated information or to flesh it out more. On Thu, Feb 22, 2024 at 3:4

Beam portable runner setup for Flink + Python on Kubernetes

2024-02-22 Thread Jaehyeon Kim
Hello, I'm playing with the Beam portable runner to read/write data from Kafka. I see a Spark runner example for Kubernetes (https://beam.apache.org/documentation/runners/spark/#kubernetes), but the Flink runner section doesn't include such an example. Is there a resource I can learn from?