Thanks for the code sample. When I switched to bigquery_file_loads.BigQueryBatchFileLoads instead of bigquery.WriteToBigQuery, it works now. I'm not sure why WriteToBigQuery doesn't work, since it uses BigQueryBatchFileLoads under the hood.
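For reference, the options discussed in this thread can be collected into a plain dict for `additional_bq_parameters`. This is a minimal sketch that assumes the BigQuery REST API's camelCase field names (`timePartitioning`, `clustering`, `maxBadRecords`). The helper `build_additional_bq_parameters` and the column names are hypothetical. Whether a given Beam version forwards `maxBadRecords` to the load job is exactly what the thread is unsure about.

```python
def build_additional_bq_parameters(partition_field=None, cluster_fields=None,
                                   max_bad_records=None):
    """Build a dict suitable for the additional_bq_parameters argument.

    Hypothetical helper: keys follow the BigQuery REST API's camelCase names.
    Only the options that are actually requested are included.
    """
    params = {}
    if partition_field is not None:
        # Day-based time partitioning on a specific column.
        params["timePartitioning"] = {"type": "DAY", "field": partition_field}
    if cluster_fields:
        # Clustering columns, in the order given.
        params["clustering"] = {"fields": list(cluster_fields)}
    if max_bad_records is not None:
        # Load-job option: tolerate up to this many rows BigQuery rejects.
        params["maxBadRecords"] = int(max_bad_records)
    return params


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    params = build_additional_bq_parameters("event_date", ["country"], 10)
    print(params)
    # Hypothetical pipeline usage (needs apache_beam; not executed here):
    #   rows | beam.io.WriteToBigQuery(
    #       "project:dataset.table",
    #       schema=table_schema,
    #       method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
    #       additional_bq_parameters=params)
```

Keeping the dict construction separate from the pipeline makes it easy to log or compare the exact parameters sent on the DirectRunner and on Dataflow.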
Thanks for the help.
Zdenko
_______________________
http://www.the-swamp.info

On Wed, Sep 4, 2019 at 6:55 PM Chamikara Jayalath wrote:
> +Pablo Estrada, who added this.
>
> I don't think we have tested this specific option, but I believe the
> additional BQ parameters option was added in a generic way to accept all
> additional parameters.
>
> Looking at the code, it seems like additional parameters do get passed
> through to load jobs:
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L427
>
> One thing you can try is running a BQ load job directly with the same set
> of data and options to see if the data gets loaded.
>
> Thanks,
> Cham
>
> On Tue, Sep 3, 2019 at 2:24 PM Zdenko Hrcek wrote:
>
>> Greetings,
>>
>> I am using Beam 2.15 and Python 2.7.
>> I am running a batch job that loads data from CSV and uploads it to
>> BigQuery. I like that, instead of streaming to BigQuery, I can use "file
>> loads" to load the table all at once.
>>
>> In my case, there are a few "bad" records in the input (it's geo data, and
>> during a manual upload BigQuery doesn't accept those as valid geography
>> records). This is easily solved by setting the maximum number of bad
>> records. If I understand correctly, WriteToBigQuery supports
>> "additional_bq_parameters", but for some reason, when running the pipeline
>> on the Dataflow runner, those settings appear to be ignored.
>>
>> I played with an example from the documentation
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
>> with gist file
>> https://gist.github.com/zdenulo/99877307981b4d372df5a662d581a5df
>> where the table should be created partitioned on a field and clustered,
>> but when running on Dataflow that doesn't happen.
>> When I run on the DirectRunner it works as expected. Interestingly, when I
>> add the maxBadRecords parameter to additional_bq_parameters, the
>> DirectRunner complains that it doesn't recognize that option.
>>
>> This is my first time using this setup/combination, so I'm wondering
>> whether I overlooked something. I would appreciate any help.
>>
>> Best regards,
>> Zdenko
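Cham suggested running a BigQuery load job directly with the same data and options. One way to do that outside the pipeline is the `bq` command-line tool. This is a minimal sketch that assumes `bq` from the Google Cloud SDK is installed and authenticated. The helper `build_bq_load_command`, the table name, the bucket path and the schema file are hypothetical placeholders. The flags used (`--source_format`, `--max_bad_records`, `--skip_leading_rows`, `--time_partitioning_type`, `--time_partitioning_field`, `--clustering_fields`) are standard `bq load` flags.

```python
def build_bq_load_command(table, source_uri, schema, max_bad_records=0,
                          partition_field=None, cluster_fields=None,
                          skip_leading_rows=0):
    """Return the argv list for a `bq load` of a CSV file (hypothetical helper)."""
    cmd = ["bq", "load", "--source_format=CSV",
           "--max_bad_records=%d" % max_bad_records]
    if skip_leading_rows:
        # Skip a CSV header row, for example.
        cmd.append("--skip_leading_rows=%d" % skip_leading_rows)
    if partition_field:
        # Mirror the timePartitioning setting used in the pipeline.
        cmd += ["--time_partitioning_type=DAY",
                "--time_partitioning_field=%s" % partition_field]
    if cluster_fields:
        # Mirror the clustering setting used in the pipeline.
        cmd.append("--clustering_fields=%s" % ",".join(cluster_fields))
    # Positional arguments: destination table, source URI, schema.
    cmd += [table, source_uri, schema]
    return cmd


if __name__ == "__main__":
    # Placeholder names; replace with the real table, file and schema.
    cmd = build_bq_load_command(
        "mydataset.geo_table", "gs://my-bucket/input.csv", "./schema.json",
        max_bad_records=10, partition_field="event_date",
        cluster_fields=["country"], skip_leading_rows=1)
    print(" ".join(cmd))
    # To actually submit the job (requires an authenticated `bq`):
    #   import subprocess; subprocess.check_call(cmd)
```

If this direct load accepts the bad geography rows and produces the expected partitioning and clustering, the options are likely being lost somewhere in the pipeline rather than rejected by BigQuery itself.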
