Re: [Question] Python Streaming Pipeline Support

2024-03-08 Thread Wiśniowski Piotr
Hi, there are two more things that I would suggest trying: 1. PipelineOptions(beam_args, streaming=True) The `streaming` flag changes the mode in which the runners operate. Not sure why, but I found this was required to get behavior similar to what you want. It might be required even if you
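
A minimal sketch of where a `beam_args` list like the one in this message commonly comes from, assuming the script separates its own flags from the runner's flags with `argparse`. The `--input_topic` flag and the `split_args` helper are hypothetical names used only for illustration; the `PipelineOptions` call is shown as a comment so the sketch runs without Beam installed.

```python
import argparse


def split_args(argv):
    """Separate this script's own flags from the flags meant for Beam."""
    parser = argparse.ArgumentParser()
    # Hypothetical application flag, used only for illustration.
    parser.add_argument("--input_topic", default="demo-topic")
    # parse_known_args returns (namespace, list_of_unrecognized_args).
    known_args, beam_args = parser.parse_known_args(argv)
    return known_args, beam_args


known_args, beam_args = split_args(
    ["--input_topic", "numbers", "--runner=DirectRunner"]
)
print(known_args.input_topic)  # numbers
print(beam_args)  # ['--runner=DirectRunner']

# With Apache Beam installed, the leftover list would then be passed on
# as suggested in the message:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(beam_args, streaming=True)
```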

RE: [Question] Python Streaming Pipeline Support

2024-03-08 Thread Puertos tavares, Jose J (Canada) via user
Hello group: I believe it might be interesting to show what I have found so far with your feedback, as I have corroborated that the Direct Runner and Flink Runner DO work on streaming, but it seems more of a constraint on the definition of the PCollection rather than the operators, as show

Re: [Question] Python Streaming Pipeline Support

2024-03-08 Thread Puertos tavares, Jose J (Canada) via user
Hello Robert: Thanks for your answer; however, as I posted at the beginning, I tested it with Flink as well. That is more interesting, as Reshuffle is a native (balance) operation but still doesn't seem to progress with streaming. Were you able to run it successfully with the expected

Re: Fails to run two multi-language pipelines locally?

2024-03-08 Thread Robert Bradshaw via user
The way that cross-language pipelines work is that each transform has an attached "environment" in which its workers should be instantiated. By default these are identified as docker images + a possible set of dependencies. Transforms with the same environment can be colocated. There is a tension

Re: [Question] Python Streaming Pipeline Support

2024-03-08 Thread Robert Bradshaw via user
The Python Local Runner has limited support for streaming pipelines. For the time being I would recommend using Dataflow or Flink (the latter can be run locally) to try out streaming pipelines. On Fri, Mar 8, 2024 at 2:11 PM Puertos tavares, Jose J (Canada) via user wrote: > > Hello Hu: > > > >

Re: [Question] ReadFromKafka can't get messages.

2024-03-08 Thread Jaehyeon Kim
I've been teaching myself Beam and here is an example pipeline that uses Kafka IO (read and write). I hope it helps. *Prerequisites* 1. Kafka runs on Docker and its external listener is exposed on port 29092 (i.e. its bootstrap server address can be specified as localhost:29092) 2. The following

Re: [python] Side Inputs to CombineGlobally

2024-03-08 Thread Valentyn Tymofieiev via user
It appears to be broken and there is a known issue: https://github.com/apache/beam/issues/19851. On Fri, Mar 8, 2024 at 8:40 AM Joey Tran wrote: > In the Python SDK, should we be able to supply side inputs to > CombineGlobally? > > I created an example here that fails at the pipeline

RE: [Question] Python Streaming Pipeline Support

2024-03-08 Thread Puertos tavares, Jose J (Canada) via user
Hello Hu: Not really. This one, as you have coded it, finishes as per stop_timestamp=time.time() + 16, and after it finishes emitting, everything else gets output and the pipeline, in batch mode, terminates. You can rule out STDOUT issues and confirm this behavior by putting a ParDo with

Re: [Question] ReadFromKafka can't get messages.

2024-03-08 Thread Chamikara Jayalath via user
Which runner are you using? There's a known issue with SDFs not triggering for portable runners: https://github.com/apache/beam/issues/20979 This should not occur for Dataflow. For Flink, you could use the option "--experiments=use_deprecated_read" to make it work. Thanks, Cham On Fri, Mar 8,
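
A small sketch of applying the workaround from this message only when the pipeline targets Flink. The helper name `with_flink_read_workaround` is hypothetical, and the sketch assumes the runner is selected with the literal argument `--runner=FlinkRunner`; the experiment flag itself is the one quoted in the message.

```python
def with_flink_read_workaround(args):
    """Return a copy of args, adding the experiment flag for Flink runs."""
    workaround = "--experiments=use_deprecated_read"
    # Assumption: the runner is chosen with this exact argument form.
    uses_flink = "--runner=FlinkRunner" in args
    if uses_flink and workaround not in args:
        return list(args) + [workaround]
    return list(args)


flink_args = with_flink_read_workaround(["--runner=FlinkRunner", "--streaming"])
dataflow_args = with_flink_read_workaround(["--runner=DataflowRunner"])
print(flink_args)
print(dataflow_args)
```

The resulting list could then be handed to `PipelineOptions` in the usual way; Dataflow arguments are returned unchanged, since the message says the issue should not occur there.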

Re: How to change SQL dialect on beam_sql magic?

2024-03-08 Thread XQ Hu via user
I do not think the dialect argument is exposed here: https://github.com/apache/beam/blob/a391198b5a632238dc4a9298e635bb5eb0f433df/sdks/python/apache_beam/runners/interactive/sql/beam_sql_magics.py#L293 Two options: 1) create a feature request and PR to add that 2) Switch to SqlTransform On Mon,

Re: [Question] Python Streaming Pipeline Support

2024-03-08 Thread XQ Hu via user
Is this what you are looking for? import random import time import apache_beam as beam from apache_beam.transforms import trigger, window from apache_beam.transforms.periodicsequence import PeriodicImpulse from apache_beam.utils.timestamp import Timestamp with beam.Pipeline() as p: input =

[python] Side Inputs to CombineGlobally

2024-03-08 Thread Joey Tran
In the python SDK, should we be able to supply side inputs to CombineGlobally? I created an example here that fails at the pipeline translation stage https://play.beam.apache.org/?sdk=python=vjM_k2TvNrf It fails with ``` File

[Question] ReadFromKafka can't get messages.

2024-03-08 Thread LDesire
Hello Apache Beam community. I'm asking because while creating a beam pipeline in Python, ReadFromKafka is not working. My code looks like this ``` @beam.ptransform_fn def LogElements(input): def log_element(elem): print(elem) return elem return ( input |

[Question] Python Streaming Pipeline Support

2024-03-08 Thread Puertos tavares, Jose J (Canada) via user
Hello Beam Users! I was looking into a simple example in Python to have an unbounded (--streaming flag) pipeline that generates random numbers, applies a Fixed Window (let's say 5 seconds), then applies a group-by operation (reshuffle) and prints the result just to check. I notice that
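
To illustrate the windowing step described in this question, here is a plain-Python sketch (not Beam code) of assigning timestamped values to 5-second fixed windows and grouping them per window. The function names and the sample events are hypothetical. Unlike a real streaming runner, which emits each window as time advances, this sketch processes a finite list all at once.

```python
from collections import defaultdict

WINDOW_SIZE = 5  # seconds, as in the question


def window_start(timestamp, size=WINDOW_SIZE):
    """Start of the fixed window that contains the given timestamp."""
    return int(timestamp // size) * size


def group_by_window(events, size=WINDOW_SIZE):
    """Group (timestamp, value) pairs by the start of their fixed window."""
    windows = defaultdict(list)
    for timestamp, value in events:
        windows[window_start(timestamp, size)].append(value)
    return dict(windows)


# Hypothetical sample data: (event time in seconds, generated number).
events = [(0.5, 7), (4.9, 3), (5.0, 8), (11.2, 1)]
grouped = group_by_window(events)
print(grouped)  # {0: [7, 3], 5: [8], 10: [1]}
```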