Re: "Slowly updating global window side inputs" example buggy?

2022-03-28 Thread Siyu Lin
Hi Reza, Just read a blog post[1] by you two years ago and you mentioned > Because this pattern uses a global-window SideInput, matching to elements > being processed will be nondeterministic. Do you mean two workers working on the same windowed main input and use different global windowed

Re: Support null values in kafkaIO

2022-03-28 Thread Alexey Romanenko
Thank you for working on this, John! This case with null key/values seems quite demanded. — Alexey > On 28 Mar 2022, at 21:58, John Casey wrote: > > Unfortunately, there isn't a workaround at the moment. I'm nearing completion > of the fix. > >

Re: [Question] Spark: standard setup to use beam-spark to parallelize python code

2022-03-28 Thread Alexey Romanenko
> On 28 Mar 2022, at 20:58, Mihai Alexe wrote: > > the jackson runtime dependencies should be updated manually (at least to > 2.9.2) in case of using Spark 2.x > > yes - that is exactly what we are looking to achieve, any hints about how to > do that? We’re not Java experts. Do you happen

RE: Re: [Question] Spark: standard setup to use beam-spark to parallelize python code

2022-03-28 Thread Mihai Alexe
* the jackson runtime dependencies should be updated manually (at least to 2.9.2) in case of using Spark 2.x yes - that is exactly what we are looking to achieve, any hints about how to do that? We’re not Java experts. Do you happen to have a CI recipe or binary lis for this particular

Re: [Question] Spark: standard setup to use beam-spark to parallelize python code

2022-03-28 Thread Alexey Romanenko
Well, it’s caused by recent jackson's version update in Beam [1] - so, the jackson runtime dependencies should be updated manually (at least to 2.9.2) in case of using Spark 2.x. Either, use Spark 3..x if possible since it already provides jackson jars of version 2.10.0. [1]

Support null values in kafkaIO

2022-03-28 Thread Abdelhakim Bendjabeur
Hello, I am trying to build a pipeline using Beam's Python SDK to run on Dataflow and I encountered an error when encoding Null value message coming from kafka (tombstone message) ``` Caused by: org.apache.beam.sdk.coders.CoderException: cannot encode a null byte[] ``` It seems unsupported

Re: Working with SDF

2022-03-28 Thread Julien Chaty-Capelle
Thank you for your feeback ! I tried to make a simpler version of my problem here : https://pastebin.com/sa1P2EEc (I added some print statements as I don't really know how debugger friendly beam is) I'm running on Ubuntu 20.04, Python 3.8.10 with beam 2.37.0 like this (direct runner) : python -m

[Question] Spark: standard setup to use beam-spark to parallelize python code

2022-03-28 Thread Florian Pinault
Greetings, We are setting up an Apache Beam cluster using Spark as a backend to run python code. This is currently a toy example with 4 virtual machines running Centos (a client, a spark main, and two spark-workers). We are running into version issues (detail below) and would need help on which