I'm reading a date-sharded table from BigQuery (180 days, ~44.26 GB) using
beam.io.BigQuerySource() with a simple query:
"""
SELECT
field1,
field2,
field3,
field4,
field5,
field6
FROM
TABLE_DATE_RANGE([dataset:table_name_], TIMESTAMP('{start_date}'),
TIMESTAMP('{end_date}'))
WHERE
field1 IS NOT NULL
"""
After that, I partition the source data by the date in field2 and convert
it into date-partitioned PCollections.
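The partitioning step can be sketched as a partition function for `beam.Partition`, which calls the function as `fn(element, num_partitions, *extra_args)` and expects a partition index back. This assumes (hypothetically) that each row is a dict whose `field2` value starts with an ISO date such as `2019-01-15`; `partition_by_day` is an illustrative name.

```python
from datetime import date, datetime


def partition_by_day(row: dict, num_partitions: int, start: date) -> int:
    """Return the partition index for a row: one partition per day.

    Hypothetical assumptions: row["field2"] begins with "YYYY-MM-DD" (a DATE
    or a TIMESTAMP string), and num_partitions is the number of days in the
    range that starts at `start`.
    """
    # Keep only the date part so TIMESTAMP strings work as well.
    day = datetime.strptime(row["field2"][:10], "%Y-%m-%d").date()
    index = (day - start).days
    # beam.Partition requires 0 <= index < num_partitions.
    if not 0 <= index < num_partitions:
        raise ValueError(f"field2 date {day} is outside the partitioned range")
    return index
```

In a pipeline this might be used as `daily = rows | beam.Partition(partition_by_day, 180, start_date)`, which yields 180 PCollections, one per day.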
However, while monitoring the Dataflow console, I noticed that the BQRead
step takes more than 1 hr 40 min of the 2 hr 54 min total execution time.
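A quick calculation with the reported figures puts the problem in perspective. It assumes (hypothetically) that the full 44.26 GB is what the BQRead step moves, which likely overstates it because the query selects only six columns; since the read took more than 100 minutes, the throughput is an upper bound.

```python
# Figures from the question (decimal GB assumed).
table_gb = 44.26
read_minutes = 100            # "more than 1 hr 40 min", so a lower bound
total_minutes = 2 * 60 + 54   # 2 hr 54 min

# Fraction of the whole job spent in the BQRead step.
read_share = read_minutes / total_minutes
# Effective read throughput in MB/s (upper bound, see assumptions above).
throughput_mb_s = table_gb * 1000 / (read_minutes * 60)

print(f"read share of total runtime: {read_share:.0%}")
print(f"effective read throughput: {throughput_mb_s:.1f} MB/s")
```

This prints a read share of about 57% and an effective throughput of at most about 7.4 MB/s.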
Why is the BigQuery read taking so long? Is there any built-in way in
Dataflow (I'm using the Python API) to speed up this step?
How can I reduce the read time?
A screenshot of the job graph is attached. (The time shown on the graph is
wrong; the job actually took 2 hr 54 min to finish.)