Re: python3-avro with CombineGlobally(CombineFn)

2019-05-03 Thread Valentyn Tymofieiev
Correction: we are making fastavro a default option on Python 3 only for now. You can follow https://github.com/apache/beam/pull/8130 for updates on that. From: Valentyn Tymofieiev  Date: Fri, May 3, 2019, 10:41 PM  To: Hi, > > Unfortunately, Avro library currently doesn't work well with Python …

Re: python3-avro with CombineGlobally(CombineFn)

2019-05-03 Thread Valentyn Tymofieiev
Hi, Unfortunately, Avro library currently doesn't work well with Python 3. Could you try using fastavro in your pipeline and report back if that helped to resolve your issue? We are also making fastavro a default option, likely starting from 2.13.0. You could use fastavro as follows (sent from P…

python3-avro with CombineGlobally(CombineFn)

2019-05-03 Thread Chengxuan Wang
Hi, I am trying to create a PTransform to combine avro schemas. But I ran into `json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)`. I think it is related to https://github.com/apache/avro/blob/master/lang/py3/avro/schema.py#L1058 . Because avro didn't implement __ne__, it will use …
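To make the question concrete, here is a minimal sketch of the CombineFn lifecycle applied to schema combination. It is deliberately written in plain Python with no Beam import so the create/add/merge/extract contract is visible and runnable anywhere; in a real pipeline the class would subclass `apache_beam.CombineFn` and be applied via `beam.CombineGlobally(MergeSchemasFn())`. The merge rule (union of fields by name) is a simplification for illustration, not Avro's actual schema-resolution rules, and all names here are hypothetical:

```python
import json


class MergeSchemasFn:
    """Mirrors the beam.CombineFn lifecycle: create / add / merge / extract.

    Accumulator: dict mapping field name -> field spec. Unioning fields by
    name is a deliberate simplification of real Avro schema resolution.
    """

    def create_accumulator(self):
        return {}

    def add_input(self, acc, schema_json):
        # Each input element is an Avro record schema as a JSON string.
        schema = json.loads(schema_json)
        for field in schema.get("fields", []):
            acc.setdefault(field["name"], field)
        return acc

    def merge_accumulators(self, accumulators):
        merged = {}
        for acc in accumulators:
            for name, field in acc.items():
                merged.setdefault(name, field)
        return merged

    def extract_output(self, acc):
        return json.dumps({
            "type": "record",
            "name": "merged",
            "fields": sorted(acc.values(), key=lambda f: f["name"]),
        })


# Drive the lifecycle by hand, the way a runner would.
fn = MergeSchemasFn()
a = fn.add_input(fn.create_accumulator(),
                 '{"type":"record","name":"A","fields":[{"name":"x","type":"int"}]}')
b = fn.add_input(fn.create_accumulator(),
                 '{"type":"record","name":"B","fields":[{"name":"y","type":"string"}]}')
merged = json.loads(fn.extract_output(fn.merge_accumulators([a, b])))
print([f["name"] for f in merged["fields"]])  # expected: ['x', 'y']
```

Note that the reported JSONDecodeError fires on `json.loads` of an empty or non-JSON string, which is consistent with a schema object being coerced to a string somewhere in the comparison path the avro link above points at.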

Re: Triggering early & TestStream

2019-05-03 Thread Kenneth Knowles
On Fri, May 3, 2019 at 2:10 PM Mike Kaplinskiy wrote: > Hey Folks, > > I'm playing around with using TestStream and having a bit of a hard time > trying to use it for early firings within a global window. My test stream > looks something like this: > > TestStream.create(...) > .advanceWa…

Triggering early & TestStream

2019-05-03 Thread Mike Kaplinskiy
Hey Folks, I'm playing around with using TestStream and having a bit of a hard time trying to use it for early firings within a global window. My test stream looks something like this: TestStream.create(...) .advanceWatermarkTo('2019-01-01') .addElements(kv1) .advanceProcessingTi…

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-03 Thread Jan Lukavský
Hi, On 5/3/19 12:20 PM, Maximilian Michels wrote: Hi Jan, A typical example would be a machine learning task, where you might have a lot of data cleansing and simple transformations, followed by some ML algorithm (e.g. SVD). One might want to use Spark MLlib for the ML task, but Beam for all the transformations around. …

Re: kafka client interoperability

2019-05-03 Thread Alexey Romanenko
Oops, I see that Richard already created a Jira about that, so I closed mine as a duplicate. > On 3 May 2019, at 15:58, Alexey Romanenko wrote: > > Thank you for reporting this. > > Seems like it’s a bug there (since ProducerRecord from kafka-clients:0.10.2.1 > doesn’t support headers), so I …

Re: kafka client interoperability

2019-05-03 Thread Alexey Romanenko
Thank you for reporting this. Seems like it’s a bug there (since ProducerRecord from kafka-clients:0.10.2.1 doesn’t support headers), so I created a Jira for that: https://issues.apache.org/jira/browse/BEAM-7217 Unfortunately, I can’t reproduce …

Re: kafka client interoperability

2019-05-03 Thread Moorhead,Richard
We attempted a downgrade to beam-sdks-java-io-kafka 2.9 while using 2.10 for the rest and ran into issues. I still see checks against ConsumerSpEL throughout ProducerRecordCoder, and I am beginning to think this is a bug. From: Juan Carlos Garcia  Sent: Thursday, M…

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-03 Thread Maximilian Michels
Hi Jan, A typical example would be a machine learning task, where you might have a lot of data cleansing and simple transformations, followed by some ML algorithm (e.g. SVD). One might want to use Spark MLlib for the ML task, but Beam for all the transformations around. Then, porting to different r…