Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread rahul patwari
Hi Reza, Rui, Can we use the [slowly changing lookup cache] approach if the source is [HDFS (or) HIVE] (data is changing), where the PCollection cannot fit into memory in BeamSQL? This PCollection will be JOINED with a windowed PCollection created from reading data in Kafka in BeamSQL. Thanks and

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Rakesh Kumar
Congrats Rob!!! On Tue, Jul 16, 2019 at 10:24 AM Ahmet Altay wrote: > Hi, > > Please join me and the rest of the Beam PMC in welcoming a new committer: > Robert > Burke. > > Robert has been contributing to Beam and actively involved in the > community for over a year. He has been actively

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Hannah Jiang
Congratulations, Rebo! > On Jul 16, 2019, at 5:14 PM, Connell O'Callaghan wrote: > > Excellent - congratulations Rebo > >> On Tue, Jul 16, 2019 at 4:38 PM Tanay Tummalapalli >> wrote: >> Congratulations! >> >>> On Wed, Jul 17, 2019 at 3:27 AM Aizhamal Nurmamat kyzy >>> wrote: >>>

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Connell O'Callaghan
Excellent - congratulations Rebo On Tue, Jul 16, 2019 at 4:38 PM Tanay Tummalapalli wrote: > Congratulations! > > On Wed, Jul 17, 2019 at 3:27 AM Aizhamal Nurmamat kyzy < > aizha...@google.com> wrote: > >> Congratulations, Rebo >> >> On Tue, Jul 16, 2019 at 1:34 PM Chamikara Jayalath

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Tanay Tummalapalli
Congratulations! On Wed, Jul 17, 2019 at 3:27 AM Aizhamal Nurmamat kyzy wrote: > Congratulations, Rebo > > On Tue, Jul 16, 2019 at 1:34 PM Chamikara Jayalath > wrote: > >> Congrats!! >> >> On Tue, Jul 16, 2019 at 1:31 PM Robin Qiu wrote: >> >>> Congrats, Robert!! >>> >>> On Tue, Jul 16,

Re: [DISCUSS] Thoughts on stateful DoFns in merging windows

2019-07-16 Thread Kenneth Knowles
In actual practice, Combine + GBK are actually implemented using the same underlying code as user-facing state & timers. So you are very right :-). I think the use cases are very different, and the requirements to be more "safe by default" for users. The way we could do timer merging if we

Re: Using the BigQuery Storage API

2019-07-16 Thread Rui Wang
Hi, I have a fresh repo cloned and switched to release-2.13.0. I tried to add "import org.apache.beam.sdk.Pipeline" to BigQueryTornadoes.java and succeeded (I am using

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Aizhamal Nurmamat kyzy
Congratulations, Rebo On Tue, Jul 16, 2019 at 1:34 PM Chamikara Jayalath wrote: > Congrats!! > > On Tue, Jul 16, 2019 at 1:31 PM Robin Qiu wrote: > >> Congrats, Robert!! >> >> On Tue, Jul 16, 2019 at 1:22 PM Alan Myrvold wrote: >> >>> Congrats, Robert! >>> >>> On Tue, Jul 16, 2019 at

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Heejong Lee
Congratulations! On Tue, Jul 16, 2019 at 1:34 PM Chamikara Jayalath wrote: > Congrats!! > > On Tue, Jul 16, 2019 at 1:31 PM Robin Qiu wrote: > >> Congrats, Robert!! >> >> On Tue, Jul 16, 2019 at 1:22 PM Alan Myrvold wrote: >> >>> Congrats, Robert! >>> >>> On Tue, Jul 16, 2019 at 11:46 AM

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread Reza Rokni
+1 On Tue, 16 Jul 2019 at 20:36, Rui Wang wrote: > Another approach is to let BeamSQL support it natively, as the title of > this thread says: "as a Table in BeamSQL". > > We might be able to define a table with properties that say this table > returns a PCollectionView. By doing so we will

Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-16 Thread Eugene Kirpichov
I'd like to reiterate the request to not build anything on top of FileBasedSource/Reader. If the design requires having some interface for representing a function from a filename to a stream of records, better introduce a new interface for that. If it requires interoperability with other IOs that

[RESULT] [VOTE] Vendored Dependencies Release

2019-07-16 Thread Lukasz Cwik
I'm happy to announce that we have unanimously approved this release. There are 4 approving votes, 3 of which are binding: * Ismaël Mejía * Lukasz Cwik * Pablo Estrada There are no disapproving votes. Thanks everyone! On Tue, Jul 16, 2019 at 4:30 AM Ismaël Mejía wrote: > +1 > > Run build and

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Chamikara Jayalath
Congrats!! On Tue, Jul 16, 2019 at 1:31 PM Robin Qiu wrote: > Congrats, Robert!! > > On Tue, Jul 16, 2019 at 1:22 PM Alan Myrvold wrote: > >> Congrats, Robert! >> >> On Tue, Jul 16, 2019 at 11:46 AM Ismaël Mejía wrote: >> >>> Congrats Robert! >>> >>> >>> On Tue, Jul 16, 2019 at 8:19 PM Yichi

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Robin Qiu
Congrats, Robert!! On Tue, Jul 16, 2019 at 1:22 PM Alan Myrvold wrote: > Congrats, Robert! > > On Tue, Jul 16, 2019 at 11:46 AM Ismaël Mejía wrote: > >> Congrats Robert! >> >> >> On Tue, Jul 16, 2019 at 8:19 PM Yichi Zhang wrote: >> > >> > Congratulations! >> > >> > On Tue, Jul 16, 2019 at

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Alan Myrvold
Congrats, Robert! On Tue, Jul 16, 2019 at 11:46 AM Ismaël Mejía wrote: > Congrats Robert! > > > On Tue, Jul 16, 2019 at 8:19 PM Yichi Zhang wrote: > > > > Congratulations! > > > > On Tue, Jul 16, 2019 at 10:51 AM Holden Karau > wrote: > >> > >> Congratulations! :) > >> > >> On Tue, Jul 16,

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread Rui Wang
Another approach is to let BeamSQL support it natively, as the title of this thread says: "as a Table in BeamSQL". We might be able to define a table with properties that say this table returns a PCollectionView. By doing so we will have a trigger-based PCollectionView available in SQL rel nodes,

Re: Docker Run Options in SDK Container

2019-07-16 Thread Ankur Goenka
Thanks for summarizing the discussion. A few comments inline below: On Mon, Jul 15, 2019 at 5:28 PM Sam Bourne wrote: > Hello Beam devs, > > I’ve opened a PR (https://github.com/apache/beam/pull/8982) to support > passing options/flags to the docker run command executed as part of the >

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Ismaël Mejía
Congrats Robert! On Tue, Jul 16, 2019 at 8:19 PM Yichi Zhang wrote: > > Congratulations! > > On Tue, Jul 16, 2019 at 10:51 AM Holden Karau wrote: >> >> Congratulations! :) >> >> On Tue, Jul 16, 2019 at 10:50 AM Mikhail Gryzykhin wrote: >>> >>> Congratulations! >>> >>> On Tue, Jul 16, 2019 at

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Yichi Zhang
Congratulations! On Tue, Jul 16, 2019 at 10:51 AM Holden Karau wrote: > Congratulations! :) > > On Tue, Jul 16, 2019 at 10:50 AM Mikhail Gryzykhin > wrote: > >> Congratulations! >> >> On Tue, Jul 16, 2019 at 10:36 AM Ankur Goenka wrote: >> >>> Congratulations Robert! >>> >>> Go GO! >>> >>> On

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-16 Thread Yichi Zhang
Thanks for organizing this Pablo, it'll be very helpful! On Tue, Jul 16, 2019 at 10:57 AM Pablo Estrada wrote: > Hello all, > I'll be having a session where I live-fix a Beam bug for 1 hour next week. > Everyone is invited. > > It will be on July 25, between 3:30pm and 4:30pm PST. Hopefully I

Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-16 Thread Pablo Estrada
Hello all, I'll be having a session where I live-fix a Beam bug for 1 hour next week. Everyone is invited. It will be on July 25, between 3:30pm and 4:30pm PST. Hopefully I will finish a full change in that time frame, but we'll see. I have not yet decided if I will do this via hangouts, or via

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Holden Karau
Congratulations! :) On Tue, Jul 16, 2019 at 10:50 AM Mikhail Gryzykhin wrote: > Congratulations! > > On Tue, Jul 16, 2019 at 10:36 AM Ankur Goenka wrote: > >> Congratulations Robert! >> >> Go GO! >> >> On Tue, Jul 16, 2019 at 10:34 AM Rui Wang wrote: >> >>> Congrats! >>> >>> >>> -Rui >>> >>>

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Mikhail Gryzykhin
Congratulations! On Tue, Jul 16, 2019 at 10:36 AM Ankur Goenka wrote: > Congratulations Robert! > > Go GO! > > On Tue, Jul 16, 2019 at 10:34 AM Rui Wang wrote: > >> Congrats! >> >> >> -Rui >> >> On Tue, Jul 16, 2019 at 10:32 AM Udi Meiri wrote: >> >>> Congrats Robert B.! >>> >>> On Tue, Jul

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Anton Kedin
Congrats! On Tue, Jul 16, 2019 at 10:36 AM Ankur Goenka wrote: > Congratulations Robert! > > Go GO! > > On Tue, Jul 16, 2019 at 10:34 AM Rui Wang wrote: > >> Congrats! >> >> >> -Rui >> >> On Tue, Jul 16, 2019 at 10:32 AM Udi Meiri wrote: >> >>> Congrats Robert B.! >>> >>> On Tue, Jul 16, 2019

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Ankur Goenka
Congratulations Robert! Go GO! On Tue, Jul 16, 2019 at 10:34 AM Rui Wang wrote: > Congrats! > > > -Rui > > On Tue, Jul 16, 2019 at 10:32 AM Udi Meiri wrote: > >> Congrats Robert B.! >> >> On Tue, Jul 16, 2019 at 10:23 AM Ahmet Altay wrote: >> >>> Hi, >>> >>> Please join me and the rest of

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Ruoyun Huang
Congratulations ! On Tue, Jul 16, 2019 at 10:24 AM Ahmet Altay wrote: > Hi, > > Please join me and the rest of the Beam PMC in welcoming a new committer: > Robert > Burke. > > Robert has been contributing to Beam and actively involved in the > community for over a year. He has been actively

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Yifan Zou
Congratulations!! On Tue, Jul 16, 2019 at 10:34 AM Rui Wang wrote: > Congrats! > > > -Rui > > On Tue, Jul 16, 2019 at 10:32 AM Udi Meiri wrote: > >> Congrats Robert B.! >> >> On Tue, Jul 16, 2019 at 10:23 AM Ahmet Altay wrote: >> >>> Hi, >>> >>> Please join me and the rest of the Beam PMC in

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Rui Wang
Congrats! -Rui On Tue, Jul 16, 2019 at 10:32 AM Udi Meiri wrote: > Congrats Robert B.! > > On Tue, Jul 16, 2019 at 10:23 AM Ahmet Altay wrote: > >> Hi, >> >> Please join me and the rest of the Beam PMC in welcoming a new committer: >> Robert >> Burke. >> >> Robert has been contributing to

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Udi Meiri
Congrats Robert B.! On Tue, Jul 16, 2019 at 10:23 AM Ahmet Altay wrote: > Hi, > > Please join me and the rest of the Beam PMC in welcoming a new committer: > Robert > Burke. > > Robert has been contributing to Beam and actively involved in the > community for over a year. He has been actively

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Reza Rokni
Congratulations ! On Tue, 16 Jul 2019, 18:24 Ahmet Altay, wrote: > Hi, > > Please join me and the rest of the Beam PMC in welcoming a new committer: > Robert > Burke. > > Robert has been contributing to Beam and actively involved in the > community for over a year. He has been actively working

[ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Ahmet Altay
Hi, Please join me and the rest of the Beam PMC in welcoming a new committer: Robert Burke. Robert has been contributing to Beam and actively involved in the community for over a year. He has been actively working on Go SDK, helping users, and making it easier for others to contribute [1]. In

Re: Write-through-cache in State logic

2019-07-16 Thread Thomas Weise
Thanks for the pointer. For streaming, it will be important to support caching across bundles. It appears that even the Java SDK doesn't support that yet?

Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-16 Thread Chamikara Jayalath
Thanks, this clarifies a lot. For the writer, I think it's great if you can utilize existing FileIO.Sink implementations even if you have to reimplement some of the logic (for example compression, temp file handling) that is already implemented in Beam FileIO/WriteFiles transforms in your SMB sink

Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-16 Thread Neville Li
A little clarification of the IO requirement and my understanding of the current state of IO. tl;dr: not sure if there're reusable bits for the reader. It's possible to reuse some for the writer but with heavy refactoring. *Reader* - For each bucket (containing the same key partition,

Re: [Python] Read Hadoop Sequence File?

2019-07-16 Thread Shannon Duncan
I am still having the problem that local file system (DirectRunner) will not allow a local GLOB string to be passed as a file source. I have tried both relative path and fully qualified paths. I can confirm the same inputFile source GLOB returns data on a simple cat command. So I know the GLOB is

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread Reza Rokni
Hi Rahul, FYI, that pattern is also available in the Beam docs (with an updated code example): https://beam.apache.org/documentation/patterns/side-input-patterns/. Please note that in the DoFn that feeds the View.asSingleton() you will need to call BigQuery manually using the BigQuery client.

Re: Write-through-cache in State logic

2019-07-16 Thread Lukasz Cwik
User state is built on top of read, append and clear, rather than on a read and write paradigm, to allow for blind appends. The optimization you speak of can be done completely inside the SDK without any additional protocol being required as long as you clear the state first and then append all your
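
As a rough illustration of the point in the snippet above, here is a minimal Python sketch of an SDK-side write-through cache layered on a state store that offers only read, append and clear. The class names `FakeStateBackend` and `CachedBagState`, and the `reads` counter, are hypothetical and are not part of the Beam API; this only shows the idea that an overwrite can be expressed as clear followed by append, after which the SDK knows the full contents without reading them back.

```python
from typing import Dict, List, Optional


class FakeStateBackend:
    """Hypothetical runner-side store exposing only read, append and clear."""

    def __init__(self) -> None:
        self._values: Dict[str, List[str]] = {}
        self.reads = 0  # number of read round trips, for illustration only

    def read(self, key: str) -> List[str]:
        self.reads += 1
        return list(self._values.get(key, []))

    def append(self, key: str, items: List[str]) -> None:
        self._values.setdefault(key, []).extend(items)

    def clear(self, key: str) -> None:
        self._values.pop(key, None)


class CachedBagState:
    """Hypothetical SDK-side write-through cache for a single state key."""

    def __init__(self, backend: FakeStateBackend, key: str) -> None:
        self._backend = backend
        self._key = key
        # None means the SDK does not yet know the full contents.
        self._cache: Optional[List[str]] = None

    def read(self) -> List[str]:
        if self._cache is None:
            self._cache = self._backend.read(self._key)
        return list(self._cache)

    def append(self, item: str) -> None:
        # Blind append: nothing has to be read before writing.
        self._backend.append(self._key, [item])
        if self._cache is not None:
            self._cache.append(item)

    def write(self, items: List[str]) -> None:
        # Overwrite expressed as clear + append; the contents are now fully
        # known locally, so later reads can be served from the cache.
        self._backend.clear(self._key)
        self._backend.append(self._key, list(items))
        self._cache = list(items)

    def clear(self) -> None:
        self._backend.clear(self._key)
        self._cache = []
```

In this sketch, a `write` followed by `append` and `read` never triggers a backend read, because clearing first makes the cached contents authoritative.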

Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread rahul patwari
Hi, we are following [*Pattern: Slowly-changing lookup cache*] from https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1 We have a use case to read slowly changing bounded data as a PCollection along with the main PCollection from Kafka (windowed) and

Re: Write-through-cache in State logic

2019-07-16 Thread Robert Bradshaw
Python workers also have a per-bundle SDK-side cache. A protocol has been proposed, but hasn't yet been implemented in any SDKs or runners. On Tue, Jul 16, 2019 at 6:02 AM Reuven Lax wrote: > > It's runner dependent. Some runners (e.g. the Dataflow runner) do have such a > cache, though I think

Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-16 Thread Robert Bradshaw
On Mon, Jul 15, 2019 at 7:03 PM Eugene Kirpichov wrote: > > Quick note: I didn't look through the document, but please do not build on > either FileBasedSink or FileBasedReader. They are both remnants of the old, > non-composable IO world; and in fact much of the composable IO work emerged >

[DISCUSS] Reconciling SetState in Java and Python

2019-07-16 Thread Rakesh Kumar
Hi, I noticed that SetState is implemented in the Java SDK but not in the Python SDK. I have filed a Jira ticket and I am thinking of implementing it in the Python SDK. Let me know if anyone has any concerns. Also, feel free to pass the link of the