Re: Optimize Read from BQ in Streaming Mode

Chamikara Jayalath Thu, 23 Apr 2020 11:50:21 -0700

The current BQ source is a bounded source: you are reading a snapshot of
a BQ table at a given point in time. It's possible to use the BQ source
(and any other bounded source) from a streaming pipeline. Doing so invokes
an automatic bounded-to-unbounded converter, which produces a bare-bones
unbounded source that might not scale well, as you noticed.
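
As a rough illustration of the point above, here is a minimal sketch of
reading the same table from a batch (non-streaming) pipeline, so that the
bounded source is used directly rather than through the converter. The
class name and the table reference "my-project:my_dataset.my_table" are
placeholders, not taken from the original job, and the sketch assumes the
Beam Java SDK with the GCP IO module on the classpath and a tempLocation
passed on the command line (the EXPORT method stages files there).

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical class name, used only for this sketch.
public class BatchBigQueryRead {
  public static void main(String[] args) {
    // Runner, project, tempLocation, worker counts, etc. come from args.
    PipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().create();

    // Run as a batch job so the bounded BQ source is not wrapped by the
    // bounded-to-unbounded converter.
    options.as(StreamingOptions.class).setStreaming(false);

    Pipeline pipeline = Pipeline.create(options);

    // Placeholder table reference; EXPORT is the method named in the
    // original question.
    PCollection<TableRow> rows =
        pipeline.apply(
            "ReadTable",
            BigQueryIO.readTableRows()
                .from("my-project:my_dataset.my_table")
                .withMethod(TypedRead.Method.EXPORT));

    // Downstream transforms on `rows` would go here.

    pipeline.run().waitUntilFinish();
  }
}
```

Whether a batch run fits depends on the rest of the pipeline; if other
sources in the job are unbounded, this sketch does not apply as written.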


- Cham

On Thu, Apr 23, 2020 at 6:39 AM Aniruddh Sharma <[email protected]>
wrote:

> Adding the subject line.
>
> On 2020/04/23 13:38:16, Aniruddh Sharma <[email protected]> wrote:
> > Hello
> >
> > I want to read a BQ table which has billions of rows. I am using
> Streaming mode and the EXPORT method.
> >
> > The read is running very slowly (seemingly in batches) and my job is
> super slow. The intent of this query is to find which settings can be
> applied to maximize the read throughput from BQ.
> >
> > a) I notice that BigQueryOptions has some options to control the
> concurrency of writes to BQ, but I don't find any such options for reads.
> Are there settings in either DF or BQ to read more data, and in parallel,
> from BQ?
> >
> > b) I start with numWorkers=10 and maxWorkers=1000, but the job
> constantly runs on 10 workers. Dataflow does not apply autoscaling: it
> does not determine that, with billions of rows pending to be read, it
> could spin up to 1000 workers and read them in parallel.
> >
> > Any guidance will help.
> >
> > Thanks
> > Aniruddh
> >
> >
> >
>
