Re: [RESULT] [VOTE] Release 2.46.0, release candidate #1

2023-03-13 Thread Ahmet Altay via dev
Thank you very much for doing the release!

On Fri, Mar 10, 2023 at 7:30 PM Danny McCormick via dev wrote:
> The release has been finalized, thanks everyone!
>
> Thanks,
> Danny
>
> On Fri, Mar 10, 2023 at 2:38 PM Danny McCormick wrote:
>> We determined that the same issue exists in the

Re: direct runner OOM issue

2023-03-13 Thread wilsonny...@gmail.com
Thank you Robert for the clarification.

Regards - Wilson(Xiaoshuang) Wang
Sr. Software Engineer

On Mon, Mar 13, 2023 at 1:34 PM Robert Bradshaw wrote:
> On Mon, Mar 13, 2023 at 12:32 PM wilsonny...@gmail.com wrote:
> >

Re: direct runner OOM issue

2023-03-13 Thread Robert Bradshaw via dev
On Mon, Mar 13, 2023 at 12:32 PM wilsonny...@gmail.com wrote:
>
> Thank you, Robert.
>
> Yes, I agree that using Flink can work in this case. However, in our
> environment we use the Ray Beam runner, which takes advantage of the
> Ray framework.
>
> Internally, the Ray Beam runner uses FnApiRunner to

Re: direct runner OOM issue

2023-03-13 Thread wilsonny...@gmail.com
Thank you, Robert.

Yes, I agree that using Flink can work in this case. However, in our environment we use the Ray Beam runner, which takes advantage of the Ray framework.

Internally, the Ray Beam runner uses FnApiRunner to run stages and bundles. After walking through some of the code, I am wondering how

Re: Infrastructure-as-Code to provision a private GKE autopilot kubernetes cluster and strimzi kafka

2023-03-13 Thread Damon Douglas via dev
Hello Everyone,

https://github.com/apache/beam/pull/25686 is approved and merged. Below I describe its two new Beam project assets, both for those who know Kubernetes and Terraform and for those who don't.

*Short Version (For those who know Kubernetes and Terraform)*:

This PR provides:
1. An end-to-end

Re: direct runner OOM issue

2023-03-13 Thread Robert Bradshaw via dev
The FnApiRunner is primarily for tiny jobs (development and testing) and holds all the data in memory. You'll likely have to run with a "real" runner to operate over datasets of this size. If you want to run locally, you can pass --runner=FlinkRunner and (assuming you have Java installed) it will
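
For reference, a minimal sketch of the runner switch suggested above, assuming the Python SDK (`apache_beam`) is installed. The helper names `build_pipeline_args` and `run_line_count`, and any input pattern passed to them, are hypothetical and only for illustration; they are not taken from the thread. Beam is imported lazily so the flag-building part can be read and run on its own.

```python
from typing import List, Optional


def build_pipeline_args(runner: str = "FlinkRunner",
                        extra: Optional[List[str]] = None) -> List[str]:
    """Return command-line style flags that select a Beam runner.

    Hypothetical helper: it only assembles strings such as
    "--runner=FlinkRunner", the flag mentioned in the message above.
    """
    args = [f"--runner={runner}"]
    args.extend(extra or [])
    return args


def run_line_count(input_pattern: str, pipeline_args: List[str]) -> None:
    """Count lines in the files matching input_pattern (illustrative only)."""
    # Imported here so this file loads even where Beam is not installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(pipeline_args)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText(input_pattern)
            | "Count" >> beam.combiners.Count.Globally()
            | "Print" >> beam.Map(print)
        )
```

Calling `run_line_count(pattern, build_pipeline_args())` would hand `--runner=FlinkRunner` to the pipeline; whether that is enough for a dataset of this size depends on the environment and is not confirmed here.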

Re: direct runner OOM issue

2023-03-13 Thread wilsonny...@gmail.com
Let me provide more details. We are running TFX, and we specified the Beam FnApiRunner as the underlying runner type. Our dataset is a large number of HDFS files, each around 200 MB, about 200 GB in total. When running our TFX code, we hit an OOM issue. I assume this is due to Beam FnApiRunner

Re: Refactor BigQuery SchemaTransforms naming

2023-03-13 Thread Damon Douglas via dev
Thank you, Ahmed. This is ready now.

On Fri, Mar 3, 2023 at 9:18 AM Ahmed Abualsaud wrote:
> Thank you Damon, I left a few comments.
>
> On Fri, Mar 3, 2023 at 11:14 AM Damon Douglas via dev wrote:
>> Hello Everyone,
>>
>> This PR brings BigQuery Schema transforms in line with the others

Re: direct runner OOM issue

2023-03-13 Thread wilsonny...@gmail.com
Python Beam direct runner.

Regards - Wilson(Xiaoshuang) Wang
Sr. Software Engineer

On Mon, Mar 13, 2023 at 11:29 AM Robert Burke wrote:
> Which direct runner? They are language specific.
>
> On Mon, Mar 13, 2023, 11:27 AM

Re: direct runner OOM issue

2023-03-13 Thread Robert Burke
Which direct runner? They are language specific.

On Mon, Mar 13, 2023, 11:27 AM wilsonny...@gmail.com wrote:
> Hi guys,
>
> We are trying to run our pipeline using the direct runner, and the input
> dataset is a large number of HDFS files (a few hundred GB of data).
>
> We experienced an OOM crash.

direct runner OOM issue

2023-03-13 Thread wilsonny...@gmail.com
Hi guys,

We are trying to run our pipeline using the direct runner, and the input dataset is a large number of HDFS files (a few hundred GB of data).

We experienced an OOM crash. Then, in the direct runner documentation, I realized that the direct runner loads the whole dataset into memory. Is there any

Beam High Priority Issue Report (35)

2023-03-13 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:
https://github.com/apache/beam/issues/25675 [Bug]: Reenable