Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread rahul patwari
Hi Reza, Rui, can we use the [slowly changing lookup cache] approach if the source is [HDFS or Hive] (where the data is changing) and the PCollection cannot fit into memory in BeamSQL? This PCollection will be JOINed with a windowed PCollection created by reading data from Kafka in BeamSQL. Thanks and

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread Reza Rokni
+1

On Tue, 16 Jul 2019 at 20:36, Rui Wang wrote:
> Another approach is to let BeamSQL support it natively, as the title of this thread says: "as a Table in BeamSQL".
>
> We might be able to define a table with properties that say this table returns a PCollectionView. By doing so we will

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread Rui Wang
Another approach is to let BeamSQL support it natively, as the title of this thread says: "as a Table in BeamSQL". We might be able to define a table with properties that say this table returns a PCollectionView. By doing so we will have a trigger-based PCollectionView available in SQL rel nodes,

Scio v0.8.0-alpha2

2019-07-16 Thread Filipe Regadas
Hi all, v0.8.0-alpha2 is out. This release brings a lot of bug fixes and improvements over alpha1. Thanks to all of you who helped make this possible! ❤️ Cheers, Regadas

v0.8.0-alpha2 Release Notes
Features
Update beam to

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-16 Thread Yichi Zhang
Thanks for organizing this, Pablo; it'll be very helpful!

On Tue, Jul 16, 2019 at 10:57 AM Pablo Estrada wrote:
> Hello all,
> I'll be having a session where I live-fix a Beam bug for 1 hour next week.
> Everyone is invited.
>
> It will be on July 25, between 3:30pm and 4:30pm PST. Hopefully I

Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-16 Thread Pablo Estrada
Hello all, I'll be having a session where I live-fix a Beam bug for 1 hour next week. Everyone is invited. It will be on July 25, between 3:30pm and 4:30pm PST. Hopefully I will finish a full change in that time frame, but we'll see. I have not yet decided if I will do this via Hangouts, or via

Re: [Python] Read Hadoop Sequence File?

2019-07-16 Thread Shannon Duncan
I am still having the problem that the local file system (DirectRunner) will not accept a local GLOB string as a file source. I have tried both relative and fully qualified paths. I can confirm that the same inputFile source GLOB returns data with a simple cat command, so I know the GLOB is
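One detail worth ruling out: when `cat` is given an unquoted glob, the shell expands it before `cat` runs, so a working `cat` shows that the files exist, not that a program receiving the literal pattern string will expand it the same way. A minimal standard-library check of how the pattern expands from inside a Python process (relative paths resolve against that process's working directory) might look like this; `expand_pattern` and the `data/*.seq` pattern are hypothetical, not part of Beam.

```python
import glob
import os
from typing import List


def expand_pattern(pattern: str) -> List[str]:
    """Hypothetical helper: list the files a glob matches from this process.

    Relative patterns are resolved against os.getcwd(), which may differ from
    the directory in which the shell command was run.
    """
    absolute = os.path.abspath(os.path.expanduser(pattern))
    return sorted(glob.glob(absolute))


if __name__ == "__main__":
    print("working directory:", os.getcwd())
    # Hypothetical pattern; substitute the inputFile GLOB used in the pipeline.
    print(expand_pattern("data/*.seq"))
```

An empty list from the same process that launches the pipeline points at the path or working directory rather than at the runner.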

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread Reza Rokni
Hi Rahul, FYI, that pattern is also available in the Beam docs (with an updated code example): https://beam.apache.org/documentation/patterns/side-input-patterns/. Please note that in the DoFn that feeds the View.asSingleton(), you will need to call BigQuery manually using the BigQuery client.
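To make the idea concrete outside Beam, here is a minimal, dependency-free Python sketch of a slowly changing lookup cache: a snapshot is reloaded once it is older than a refresh interval, and each streaming element is joined against whichever snapshot is current. It is not the Beam API or the code on the linked page; `RefreshingLookup`, `loader`, `refresh_seconds`, and `clock` are hypothetical names, and `loader` stands in for the manual BigQuery client call mentioned above.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RefreshingLookup:
    """Hypothetical helper: a lookup snapshot that is reloaded when stale.

    `loader` stands in for the manual client call (e.g. to BigQuery) that the
    DoFn feeding View.asSingleton() would make in the Beam pattern.
    """

    def __init__(self, loader: Callable[[], Dict[str, str]],
                 refresh_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._refresh_seconds = refresh_seconds
        self._clock = clock  # injectable so the refresh logic can be tested
        self._snapshot: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def snapshot(self) -> Dict[str, str]:
        now = self._clock()
        # Load on first use, then again once the snapshot is older than the interval.
        if self._loaded_at is None or now - self._loaded_at >= self._refresh_seconds:
            self._snapshot = dict(self._loader())
            self._loaded_at = now
        return self._snapshot

    def enrich(self, key: str, value: str) -> Tuple[str, str, Optional[str]]:
        # Join one streaming element against the current snapshot (None if no match).
        return key, value, self.snapshot().get(key)


if __name__ == "__main__":
    versions = iter([{"a": "v1"}, {"a": "v2"}])  # two successive "table" versions
    fake_now = [0.0]
    cache = RefreshingLookup(lambda: next(versions), refresh_seconds=5,
                             clock=lambda: fake_now[0])
    print(cache.enrich("a", "x"))  # ('a', 'x', 'v1')
    fake_now[0] = 3.0
    print(cache.enrich("a", "y"))  # ('a', 'y', 'v1'), still within the interval
    fake_now[0] = 6.0
    print(cache.enrich("a", "z"))  # ('a', 'z', 'v2'), reloaded
```

This sketch keeps the whole snapshot in memory, so on its own it does not address Rahul's case of a lookup table that cannot fit into memory.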

Re: Industrializing batch ML algorithm using Apache Beam/Dataflow (on Google Cloud Platform)

2019-07-16 Thread Massy Bourennani
Sorry, Germain! And thanks again :)

On Tue, 16 Jul 2019 at 15:00, Massy Bourennani wrote:
> Hi David!
> This helps a lot,
> Many thanks :)
> Massy
>
> On Tue, 16 Jul 2019 at 11:12, Germain Tanguy <germain.tan...@dailymotion.com> wrote:
>
>> Hello Massy,
>>
>> I just answered on

Re: Industrializing batch ML algorithm using Apache Beam/Dataflow (on Google Cloud Platform)

2019-07-16 Thread Massy Bourennani
Hi David! This helps a lot. Many thanks :)
Massy

On Tue, 16 Jul 2019 at 11:12, Germain Tanguy <germain.tan...@dailymotion.com> wrote:
> Hello Massy,
>
> I just answered on Reddit; I'm copying the answer here in case someone else is interested too.
>
> Dataflow supports Python 3.5.

Re: Industrializing batch ML algorithm using Apache Beam/Dataflow (on Google Cloud Platform)

2019-07-16 Thread Germain Tanguy
Hello Massy, I just answered on Reddit; I'm copying the answer here in case someone else is interested too. Dataflow supports Python 3.5. In my company we do use apache-beam/dataflow in production, with a setup.py to initialize dependencies,

Industrializing batch ML algorithm using Apache Beam/Dataflow (on Google Cloud Platform)

2019-07-16 Thread Massy Bourennani
Hi all, here is the link to the Reddit post [1]. Many thanks for your help.
Massy

[1] https://www.reddit.com/r/dataengineering/comments/cdp5i3/industrializing_batch_ml_algorithm_using_apache/