Aniruddh - Using BigQueryIO.Read with the EXPORT method involves a potentially long wait on BQ to complete the export.
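To picture that wait, here is a toy model in plain Java. It is not Beam's or Dataflow's actual code; the class, the method names, the 200 ms delay, and the 1000-records-per-file figure are all made up for illustration. The only point it makes is that nothing downstream can start until the export has finished as a whole, after which the exported files can be read in parallel.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Toy model only: every name and number here is hypothetical.
class ExportThenRead {

    // Stand-in for the BigQuery export job. The file list becomes
    // available only when the whole export is done, never partially.
    static CompletableFuture<List<String>> startExport(int fileCount) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(200); // stands in for the long export wait
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return IntStream.range(0, fileCount)
                    .mapToObj(i -> "export-" + i + ".avro")
                    .collect(Collectors.toList());
        });
    }

    // Stand-in for reading one exported file; pretends each holds 1000 records.
    static long readFile(String fileName) {
        return 1000L;
    }

    static long run(int fileCount) {
        // Stage 1: block until the export job reports completion.
        List<String> files = startExport(fileCount).join();
        // Stage 2: only now can the files be read in parallel.
        return files.parallelStream().mapToLong(ExportThenRead::readFile).sum();
    }

    public static void main(String[] args) {
        System.out.println("records read: " + run(4));
    }
}
```

In this sketch the call to `join()` plays the role of the idle period at the start of the job: no amount of extra parallelism in stage 2 can shorten stage 1.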
I have experience running Dataflow batch jobs that use this read method to ingest ~10 TB of data in a single job. The behavior I generally see is that the job progresses through two obvious stages. First, it sits at the initial number of workers for ~30 minutes; then it quickly ramps up to maxNumWorkers and stays there while it processes data.

That initial 30-minute stage is simply waiting on the BigQuery export job to complete. Your Beam job has no control over that, as it's entirely BigQuery's responsibility to unload the data into Avro files. I don't think Beam knows about partial data being available; it essentially blocks further stages of processing until it determines that the BigQuery export job is complete. Only then does it start reading the Avro files from GCS in parallel and doing work.

Reading from BigQuery seems like an awkward fit for a streaming job. Is this for a static or slowly changing side input for some other streaming data source?

On Thu, Apr 23, 2020 at 9:38 AM Aniruddh Sharma <[email protected]> wrote:

> Hello
>
> I want to read a BQ table which has billions of rows. I am using Streaming
> mode and using the EXPORT method.
>
> The read is running very slowly (seemingly in batches) and my job is super
> slow. The intent of this query is to find what different settings can be
> applied to maximize the read throughput from BQ.
>
> a) I notice in BigQueryOptions there are some options to control the
> concurrency of writes to BQ, but I don't find any such options for reads.
> Can there be some settings, either in DF or BQ, to read more data and in
> parallel from BQ?
>
> b) I start with numWorkers=10 and maxWorkers=1000, and it constantly runs
> on 10 workers. Dataflow does not apply autoscaling; somehow it does not
> determine that it has billions of rows pending to be read and that it can
> spin up to 1000 workers to read them. It doesn't do that.
>
> Any guidance will help.
>
> Thanks
> Aniruddh
