The BQ Read step involves running a query job in BigQuery, followed by an export job that exports the resulting table to GCS. Is it possible that these jobs took a long time for some reason?
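If you have the job IDs (the Dataflow job should log them, as noted below), one way to inspect the timings programmatically is the google-cloud-bigquery Python client. A rough sketch; the project ID and job ID below are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client(project='your-gcp-project')  # placeholder project
    job = client.get_job('your-bq-job-id')  # job ID taken from the Dataflow logs
    print(job.job_type, job.state)
    # started/ended are None until the job actually runs and finishes
    if job.started and job.ended:
        print('duration:', job.ended - job.started)

Running this for both the query job and the extract (export) job should show which of the two dominated the read time.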
The Dataflow job should log the BigQuery job IDs of these jobs, and you can check their status with the following command:

    bq show -j --project_id=<GCP project ID> <BQ job ID>

Feel free to mention your job ID in the Dataflow SDK's Stack Overflow channel if you want the Dataflow team to take a look.
https://stackoverflow.com/questions/tagged/dataflow

- Cham

On Tue, Jan 16, 2018 at 1:15 AM Unais Thachuparambil <
[email protected]> wrote:

>
> I'm reading a date-sharded table from BigQuery (180 days, ~44.26 GB) using
> beam.io.BigQuerySource() by running a simple query:
>
> """
> SELECT
>   field1,
>   field2,
>   field3,
>   field4,
>   field5,
>   field6
> FROM
>   TABLE_DATE_RANGE([dataset:table_name_], TIMESTAMP('{start_date}'),
>   TIMESTAMP('{end_date}'))
> WHERE
>   field1 IS NOT NULL
> """
>
> After that, I'm partitioning the source data based on the field2 date and
> converting it into date-partitioned PCollections.
>
> While monitoring the Dataflow console, I noticed that the BQ Read
> operation takes more than 1 hr 40 min of the 2 hr 54 min total execution
> time.
>
> Why is the BigQuery IO read taking so long? Is there any method in
> Dataflow (I'm using the Python API) to speed up this process?
>
> How can I reduce the read IO execution time?
>
> A screenshot of the graph is attached (the time shown on the graph is
> wrong; it took 2 hr 54 min to finish).
>
> [image: Screen Shot 2018-01-16 at 1.06.28 PM.png]
>
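For reference, a rough sketch of the read-and-partition setup described in the quoted message, assuming the Beam Python SDK's legacy BigQuerySource. The field names and table reference come from the quoted query; the concrete dates, the 180-day window, and the partition logic are assumptions for illustration:

    import apache_beam as beam
    from datetime import date

    # Query from the quoted message; the dates filled in here are only examples.
    query = """
    SELECT field1, field2, field3, field4, field5, field6
    FROM TABLE_DATE_RANGE([dataset:table_name_],
                          TIMESTAMP('{start_date}'), TIMESTAMP('{end_date}'))
    WHERE field1 IS NOT NULL
    """.format(start_date='2017-07-20', end_date='2018-01-16')

    START = date(2017, 7, 20)  # assumed start of the 180-day window
    NUM_DAYS = 180

    def day_index(row, num_partitions):
        # Assumes row['field2'] carries a 'YYYY-MM-DD' date; clamp to the window.
        d = date(*map(int, row['field2'][:10].split('-')))
        return max(0, min((d - START).days, num_partitions - 1))

    with beam.Pipeline() as p:
        rows = p | 'ReadFromBQ' >> beam.io.Read(beam.io.BigQuerySource(query=query))
        # One PCollection per day, matching the "date-partitioned PCollections" step.
        daily = rows | 'PartitionByDay' >> beam.Partition(day_index, NUM_DAYS)

Note that the read itself (query job plus export to GCS) happens before any of the downstream partitioning runs, which is why it shows up as a single long step in the Dataflow console.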
