Re: Unbounded input join Unbounded input then write to Bounded Sink

2020-02-24 Thread rahul patwari
Hi Kenn, Rui, The pipeline that we are trying is exactly what Kenn has mentioned above, i.e. Read From Kafka => Apply Fixed Windows of 1 Min => SqlTransform => Write to Hive using HcatalogIO. We are interested in understanding the behaviour when the source is Unbounded and the Sink is bounded, as this p
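
A minimal Java sketch of the pipeline shape described above. The topic, the toRows()/toHCatRecords() conversion transforms, the Hive settings, and the query are placeholder assumptions, not code from the thread:

```java
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.io.hcatalog.HCatalogIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

// Read From Kafka => Fixed Windows of 1 Min => SqlTransform => HCatalogIO.
// toRows() and toHCatRecords() are hypothetical conversion transforms.
PCollection<Row> windowed =
    pipeline
        .apply(KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092")        // placeholder
            .withTopic("events")                        // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())
        .apply(toRows())                                // records -> schema'd Rows
        .apply(Window.<Row>into(FixedWindows.of(Duration.standardMinutes(1))));

windowed
    .apply(SqlTransform.query("SELECT ..."))            // illustrative query
    .apply(toHCatRecords())                             // Rows -> HCatRecords
    .apply(HCatalogIO.write()
        .withConfigProperties(hiveConfig)               // e.g. hive.metastore.uris
        .withDatabase("default")                        // placeholder
        .withTable("my_table"));                        // placeholder
```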

Re: Help needed on Dataflow worker exception of WriteToBigQuery

2020-02-24 Thread Reza Rokni
Could you try setting --tempLocation in your pipeline options to a GCS bucket that your pipeline has access to? Cheers Reza On Tue, Feb 25, 2020 at 9:23 AM Wenbing Bai wrote: > Hi there, > > I am using WriteToBigQuery in the apache-beam Python SDK 2.16. I get this > error when I run my pipeline on
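
A minimal Python sketch of that suggestion (the Python SDK spells the option temp_location; project, region, and bucket below are placeholders):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                # placeholder
    region="us-central1",                # placeholder
    temp_location="gs://my-bucket/tmp",  # a GCS path the job can write to
)

with beam.Pipeline(options=options) as p:
    ...  # rest of the pipeline
```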

Re: Unbounded input join Unbounded input then write to Bounded Sink

2020-02-24 Thread Rui Wang
I see. So I guess I didn't fully understand the requirement: Do you want to have a 1-min window join on two unbounded sources and write to the sink when the window closes? Or is there an extra requirement such that you also want to write to the sink every minute per window? For the former, you can do it

Re: Unbounded input join Unbounded input then write to Bounded Sink

2020-02-24 Thread Rui Wang
Sorry, please remove " .apply(Window.into(FixedWindows.of(1 minute)))" from the query above. -Rui On Mon, Feb 24, 2020 at 5:26 PM Rui Wang wrote: > I see. So I guess I didn't fully understand the requirement: > > Do you want to have a 1-min window join on two unbounded sources and write > to si

Help needed on Dataflow worker exception of WriteToBigQuery

2020-02-24 Thread Wenbing Bai
Hi there, I am using WriteToBigQuery in the apache-beam Python SDK 2.16. I get this error when I run my pipeline on the Dataflow Runner. RuntimeError: IOError: [Errno 2] Not found: gs://tmp-e3271c8deb2f655-0-of-1.avro [while running 'WriteToBigQuery/BigQueryBatchFileLoads/ParDo(WriteRecordsToFile
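
For reference, a hedged sketch of a file-loads WriteToBigQuery call; custom_gcs_temp_location (or the pipeline-level temp_location) determines where intermediate files like the .avro one in the error are staged. The table spec, schema, and bucket are placeholders:

```python
import apache_beam as beam

def write_rows(rows):
    # rows: a PCollection of dicts matching the schema below (placeholders).
    return rows | beam.io.WriteToBigQuery(
        table="my-project:my_dataset.my_table",
        schema="id:INTEGER,name:STRING",
        method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
        custom_gcs_temp_location="gs://my-bucket/bq_loads",
    )
```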

Re: Unbounded input join Unbounded input then write to Bounded Sink

2020-02-24 Thread Kenneth Knowles
Actually, I think it depends on the pipeline. You cannot do it all in SQL, but if you mix Java and SQL I think you can do this. If you write this: pipeline.apply(KafkaIO.read() ... ) .apply(Window.into(FixedWindows.of(1 minute))) .apply(SqlTransform("SELECT ... FROM stream1 JOIN
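
Filling out Kenn's outline as a hedged sketch: each Kafka-derived PCollection<Row> is windowed in Java, then handed to SqlTransform under a named tag so plain SQL can join them (stream names, fields, and the join key are illustrative):

```java
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TupleTag;
import org.joda.time.Duration;

// stream1Rows / stream2Rows are assumed schema'd PCollection<Row>s read
// from Kafka elsewhere in the pipeline.
PCollection<Row> joined =
    PCollectionTuple
        .of(new TupleTag<Row>("stream1"),
            stream1Rows.apply(Window.<Row>into(
                FixedWindows.of(Duration.standardMinutes(1)))))
        .and(new TupleTag<Row>("stream2"),
            stream2Rows.apply(Window.<Row>into(
                FixedWindows.of(Duration.standardMinutes(1)))))
        .apply(SqlTransform.query(
            "SELECT s1.id, s1.v AS v1, s2.v AS v2 "
                + "FROM stream1 s1 JOIN stream2 s2 ON s1.id = s2.id"));
```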

Re: Unable to reliably have multiple cores working on a dataset with DirectRunner

2020-02-24 Thread Hannah Jiang
This issue will be fixed in v2.20. PR: https://github.com/apache/beam/pull/10847 On Fri, Jan 31, 2020 at 9:52 AM Hannah Jiang wrote: > Yeap, here is the Jira ticket. BEAM-9228 > > I just confirmed that the reshuffle operation is not being call
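
For anyone hitting this before 2.20, a hedged sketch of the options involved (direct_num_workers and, in newer SDKs, direct_running_mode; the Reshuffle mirrors the operation discussed in the linked ticket, and the pipeline body is illustrative):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    direct_num_workers=4,                    # parallel bundles on DirectRunner
    direct_running_mode="multi_processing",  # available from ~2.19/2.20
)

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(range(1000))
     | beam.Reshuffle()                      # redistribute work across workers
     | beam.Map(lambda x: x * x))
```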

Re: Beam SQL Nested Rows are flattened by Calcite

2020-02-24 Thread Rui Wang
That is great. Feel free to send the patch and I can review it. -Rui On Mon, Feb 24, 2020, 3:54 PM Talat Uyarer wrote: > Hi Rui, > > I solved the issue. After version 1.21 they are no longer flattened in > the LogicalPlan. > > Thanks for your help. I am going to create a patch for it. > > Talat >

Re: Beam SQL Nested Rows are flattened by Calcite

2020-02-24 Thread Talat Uyarer
Hi Rui, I solved the issue. After version 1.21 they are no longer flattened in the LogicalPlan. Thanks for your help. I am going to create a patch for it. Talat On Sat, Feb 15, 2020 at 6:26 PM Rui Wang wrote: > Because Calcite flattened Rows, BeamSQL didn't need to deal with nested > Row struc
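
For context, a minimal Java sketch of the nested-Row situation under discussion; once the rows are no longer flattened, dot notation can address the nested field (the schema, names, and query are illustrative assumptions):

```java
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

Schema inner = Schema.builder().addStringField("city").build();
Schema outer = Schema.builder()
    .addStringField("name")
    .addRowField("address", inner)            // the nested Row field
    .build();

PCollection<Row> people = pipeline.apply(
    Create.of(
        Row.withSchema(outer)
            .addValues("alice",
                Row.withSchema(inner).addValues("paris").build())
            .build())
        .withRowSchema(outer));

PCollection<Row> cities = people.apply(
    SqlTransform.query("SELECT p.address.city FROM PCOLLECTION p"));
```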

Re: Beam 2.19.0 / Flink 1.9.1 - Session cluster error when submitting job "Multiple environments cannot be created in detached mode"

2020-02-24 Thread Maximilian Michels
Thank you for reporting / filing / collecting the issues. There is a fix pending: https://github.com/apache/beam/pull/10950 As for the upgrade issues, the 1.8 and 1.9 upgrades are trivial. I will check out the Flink 1.10 PR tomorrow. Cheers, Max On 24.02.20 09:26, Ismaël Mejía wrote: We are cu

Re: Unbounded input join Unbounded input then write to Bounded Sink

2020-02-24 Thread Rui Wang
SQL does not support such joins with your requirement: write to the sink every minute, after each window closes. You might be able to use the state and timer API to achieve your goal. -Rui On Mon, Feb 24, 2020 at 9:50 AM shanta chakpram wrote: > Hi, > > I am trying to join inputs from Unbounded Sources and then
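
A hedged sketch of the state-and-timer approach Rui mentions: buffer values per key and flush them on an event-time timer. The element types, the one-minute policy, and the names are placeholder assumptions:

```java
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

// State and timers require a keyed input, hence the KV element type.
class BufferAndFlushFn extends DoFn<KV<String, String>, String> {

  @StateId("buffer")
  private final StateSpec<BagState<String>> bufferSpec = StateSpecs.bag();

  @TimerId("flush")
  private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void process(
      ProcessContext ctx,
      @StateId("buffer") BagState<String> buffer,
      @TimerId("flush") Timer flush) {
    buffer.add(ctx.element().getValue());
    // Fire one minute after this element's timestamp (illustrative policy).
    flush.set(ctx.timestamp().plus(Duration.standardMinutes(1)));
  }

  @OnTimer("flush")
  public void onFlush(
      OnTimerContext ctx, @StateId("buffer") BagState<String> buffer) {
    for (String v : buffer.read()) {
      ctx.output(v);     // emit the buffered values for this key
    }
    buffer.clear();
  }
}
```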

Unbounded input join Unbounded input then write to Bounded Sink

2020-02-24 Thread shanta chakpram
Hi, I am trying to join inputs from Unbounded Sources and then write to a Bounded Sink. The pipeline I'm trying is: Kafka Sources -> SqlTransform -> HCatalogIO Sink, with a FixedWindow of 1 minute duration applied. I'm expecting the inputs from unbounded sources joined within the current window to

Re: Bay Area Beam Meetup 19 Feb (Last Wednesday).

2020-02-24 Thread Ahmet Altay
Thank you for sharing the talks! On Fri, Feb 21, 2020 at 9:18 PM Austin Bennett wrote: > Hi All, > > We had a meetup @Sentry.io on Wednesday -- with a solid 40+ engaged > attendees. > > Thanks to those who joined in person, and for those who were unable, > talks can be found online --> > Syd's ta

Re: Beam 2.19.0 / Flink 1.9.1 - Session cluster error when submitting job "Multiple environments cannot be created in detached mode"

2020-02-24 Thread Ismaël Mejía
We are cutting the release branch for 2.20.0 next Wednesday, so I am not sure these tickets will make it, but hopefully they will. For reference: BEAM-9295 Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10; BEAM-9299 Upgrade Flink Runner to 1.8.3 and 1.9.2. In any case, if you have cycles to