The BQ Read step involves running a query job in BigQuery, followed by an export job that exports the resulting table to GCS. Is it possible that these jobs took a long time for some reason?
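If you have the job IDs (the Dataflow job should log them, as noted below), one way to inspect the timings programmatically is the google-cloud-bigquery Python client. A rough sketch; the project ID and job ID below are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client(project='your-gcp-project')  # placeholder project
    job = client.get_job('your-bq-job-id')  # job ID taken from the Dataflow logs
    print(job.job_type, job.state)
    # started/ended are None until the job actually runs and finishes
    if job.started and job.ended:
        print('duration:', job.ended - job.started)

Running this for both the query job and the extract (export) job should show which of the two dominated the read time.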
The Dataflow job should log the BigQuery job IDs of these jobs, and you can check their status with the following command:

    bq show -j --project_id=<GCP project ID> <BQ job ID>

Feel free to mention your job ID in the Dataflow SDK's Stack Overflow channel if you want the Dataflow team to take a look.
https://stackoverflow.com/questions/tagged/dataflow

- Cham

On Tue, Jan 16, 2018 at 1:15 AM Unais Thachuparambil <
[email protected]> wrote:

>
> I'm reading a date-sharded table from BigQuery (180 days, ~44.26 GB) using
> beam.io.BigQuerySource() by running a simple query:
>
> """
> SELECT
>   field1,
>   field2,
>   field3,
>   field4,
>   field5,
>   field6
> FROM
>   TABLE_DATE_RANGE([dataset:table_name_], TIMESTAMP('{start_date}'),
>   TIMESTAMP('{end_date}'))
> WHERE
>   field1 IS NOT NULL
> """
>
> After that, I'm partitioning the source data based on the field2 date and
> converting it into date-partitioned PCollections.
>
> While monitoring the Dataflow console, I noticed that the BQ Read
> operation takes more than 1 hr 40 min of the 2 hr 54 min total execution
> time.
>
> Why is the BigQuery IO read taking so long? Is there any method in
> Dataflow (I'm using the Python API) to speed up this process?
>
> How can I reduce the read IO execution time?
>
> A screenshot of the graph is attached (the time shown on the graph is
> wrong; it took 2 hr 54 min to finish).
>
> [image: Screen Shot 2018-01-16 at 1.06.28 PM.png]
>
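For reference, a rough sketch of the read-and-partition setup described in the quoted message, assuming the Beam Python SDK's legacy BigQuerySource. The field names and table reference come from the quoted query; the concrete dates, the 180-day window, and the partition logic are assumptions for illustration:

    import apache_beam as beam
    from datetime import date

    # Query from the quoted message; the dates filled in here are only examples.
    query = """
    SELECT field1, field2, field3, field4, field5, field6
    FROM TABLE_DATE_RANGE([dataset:table_name_],
                          TIMESTAMP('{start_date}'), TIMESTAMP('{end_date}'))
    WHERE field1 IS NOT NULL
    """.format(start_date='2017-07-20', end_date='2018-01-16')

    START = date(2017, 7, 20)  # assumed start of the 180-day window
    NUM_DAYS = 180

    def day_index(row, num_partitions):
        # Assumes row['field2'] carries a 'YYYY-MM-DD' date; clamp to the window.
        d = date(*map(int, row['field2'][:10].split('-')))
        return max(0, min((d - START).days, num_partitions - 1))

    with beam.Pipeline() as p:
        rows = p | 'ReadFromBQ' >> beam.io.Read(beam.io.BigQuerySource(query=query))
        # One PCollection per day, matching the "date-partitioned PCollections" step.
        daily = rows | 'PartitionByDay' >> beam.Partition(day_index, NUM_DAYS)

Note that the read itself (query job plus export to GCS) happens before any of the downstream partitioning runs, which is why it shows up as a single long step in the Dataflow console.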
