Why does the beam.io.BigQuerySource() transform take so long to read a BigQuery table into a PCollection?

Unais Thachuparambil Tue, 16 Jan 2018 01:15:45 -0800

I'm reading a date-sharded table from BigQuery (180 days, ~44.26 GB) using
beam.io.BigQuerySource() by running a simple query:


 """
SELECT
  field1,
  field2,
  field3,
  field4,
  field5,
  field6
FROM
  TABLE_DATE_RANGE([dataset:table_name_], TIMESTAMP('{start_date}'), TIMESTAMP('{end_date}'))
WHERE
  field1 IS NOT NULL
"""

After that, I partition the source data by the date in field2 and convert
it into date-partitioned PCollections.
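
For readers following along, here is a minimal sketch of the read and
partition steps described above. It is not the original job: the names
QUERY_TEMPLATE, build_query, make_date_partition_fn and build_partitions
are hypothetical, the column list is shortened, and it assumes that field2
arrives either as a date/datetime object or as a string starting with
'YYYY-MM-DD'. It also assumes a Beam release that still ships
beam.io.BigQuerySource; the Beam import sits inside the wiring function so
the helpers can be tried without it.

```python
from datetime import date, datetime

# Legacy SQL template modeled on the query in the post (columns shortened).
QUERY_TEMPLATE = """
SELECT
  field1,
  field2
FROM
  TABLE_DATE_RANGE([dataset:table_name_], TIMESTAMP('{start_date}'), TIMESTAMP('{end_date}'))
WHERE
  field1 IS NOT NULL
"""


def build_query(start_date, end_date):
    """Fill in the date placeholders; both arguments are datetime.date objects."""
    return QUERY_TEMPLATE.format(
        start_date=start_date.isoformat(), end_date=end_date.isoformat())


def make_date_partition_fn(start_date):
    """Return a function mapping a row to its day offset from start_date."""
    def partition_fn(row, num_partitions):
        value = row['field2']
        if isinstance(value, str):
            # Assumption: the string begins with 'YYYY-MM-DD'.
            value = datetime.strptime(value[:10], '%Y-%m-%d').date()
        elif isinstance(value, datetime):
            value = value.date()
        index = (value - start_date).days
        # Design choice for this sketch: clamp out-of-range dates into the
        # first or last partition, since Beam requires 0 <= index < n.
        return min(max(index, 0), num_partitions - 1)
    return partition_fn


def build_partitions(pipeline, start_date, end_date):
    """Wire the read and the per-day partitioning onto an existing pipeline."""
    import apache_beam as beam  # imported lazily; needs apache-beam[gcp]

    num_days = (end_date - start_date).days + 1
    rows = pipeline | 'BQRead' >> beam.io.Read(
        beam.io.BigQuerySource(query=build_query(start_date, end_date)))
    # beam.Partition returns a tuple of num_days PCollections, one per day.
    return rows | 'PartitionByDate' >> beam.Partition(
        make_date_partition_fn(start_date), num_days)
```

With a start date of 2017-07-01, a row whose field2 is '2017-07-03' would
land in partition index 2.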

But while monitoring the Dataflow console, I noticed that the BQRead
operation takes more than 1 hr 40 min of the 2 hr 54 min total execution time.

Why does the BigQuery read take so long? Is there any built-in method in
Dataflow (I'm using the Python API) to speed up this process?

How can I reduce the read execution time?

A screenshot of the graph is attached (the time shown on the graph is
wrong; the job took 2 hr 54 min to finish).
