+Pablo Estrada, who added this. I don't think we have tested this specific option, but I believe the additional BQ parameters option was added in a generic way, so it should accept all additional parameters.
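As a rough illustration of what "generic" means here, the following is a minimal sketch of a file-loads write that hands extra table/load settings to `WriteToBigQuery` through `additional_bq_parameters`. It assumes the Beam Python SDK's `WriteToBigQuery.Method.FILE_LOADS`. The keys follow the BigQuery REST API's camelCase names (`timePartitioning`, `clustering`, `maxBadRecords`). The helper names and the field names `event_date` and `country` are hypothetical. This is not a confirmed fix for the Dataflow behaviour in the quoted message.

```python
# Sketch only: helper names and field names are hypothetical.
# Keys use BigQuery REST API (camelCase) names.


def make_additional_bq_parameters(partition_field, cluster_fields,
                                  max_bad_records=None):
    """Build the dict passed as additional_bq_parameters (hypothetical helper)."""
    params = {
        # Day-based partitioning on a named column.
        'timePartitioning': {'type': 'DAY', 'field': partition_field},
        # Clustering columns, as a plain list.
        'clustering': {'fields': list(cluster_fields)},
    }
    if max_bad_records is not None:
        # maxBadRecords is a load-job option in the REST API; whether a given
        # Beam version and runner forwards it is what this thread is checking.
        params['maxBadRecords'] = max_bad_records
    return params


def write_rows(rows, table_spec, schema, additional_params):
    """Attach a file-loads BigQuery write to a PCollection of dicts.

    apache_beam is imported lazily so the helper above can be used
    (and tested) without Beam installed.
    """
    import apache_beam as beam

    return rows | 'WriteToBQ' >> beam.io.WriteToBigQuery(
        table_spec,
        schema=schema,
        method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        additional_bq_parameters=additional_params)
```

For example, `make_additional_bq_parameters('event_date', ['country'], max_bad_records=10)` yields one dict covering partitioning, clustering and the bad-record limit. That dict can be passed unchanged to `write_rows`.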
Looking at the code, it seems that additional parameters do get passed through to load jobs: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L427

One thing you can try is running a BQ load job directly with the same data and options to see whether the data gets loaded.

Thanks,
Cham

On Tue, Sep 3, 2019 at 2:24 PM Zdenko Hrcek wrote:

> Greetings,
>
> I am using Beam 2.15 and Python 2.7.
> I am running a batch job that loads data from CSV and uploads it to BigQuery.
> I like that, instead of streaming to BigQuery, I can use "file loads" to
> load the table all at once.
>
> In my case, there are a few "bad" records in the input (it's geo data, and
> during a manual upload BigQuery doesn't accept those as valid geography
> records). This is easily solved by setting the maximum number of bad records.
> If I understand correctly, WriteToBigQuery supports
> "additional_bq_parameters", but for some reason, when running a pipeline on
> the Dataflow runner, it looks like those settings are ignored.
>
> I played with an example from the documentation
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> with the gist file https://gist.github.com/zdenulo/99877307981b4d372df5a662d581a5df
> where the table should be created partitioned on a field and clustered,
> but when running on Dataflow that doesn't happen.
> When I run on the DirectRunner, it works as expected. Interestingly, when I add
> the maxBadRecords parameter to additional_bq_parameters, the DirectRunner
> complains that it doesn't recognize that option.
>
> This is my first time using this setup/combination, so I'm just wondering
> whether I overlooked something. I would appreciate any help.
>
> Best regards,
> Zdenko
>
> _______________________
> http://www.the-swamp.info
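The check Cham suggests, loading the same CSV with the same options outside Beam, could look roughly like this minimal sketch. It assumes the `google-cloud-bigquery` client library, valid GCP credentials, and a CSV with one header row. It takes the options in the same camelCase form used with `additional_bq_parameters`. The helper names are hypothetical, and only the three options discussed in this thread are mapped.

```python
# Sketch only: helper names are hypothetical; assumes google-cloud-bigquery
# is installed and credentials are available when run_direct_load is called.


def load_config_kwargs(params):
    """Split a camelCase options dict into LoadJobConfig keyword arguments.

    Maps only maxBadRecords, clustering and timePartitioning; other keys
    are ignored.
    """
    kwargs = {}
    if 'maxBadRecords' in params:
        kwargs['max_bad_records'] = params['maxBadRecords']
    if 'clustering' in params:
        kwargs['clustering_fields'] = list(params['clustering']['fields'])
    # Partitioning needs a TimePartitioning object, so return it separately.
    return kwargs, params.get('timePartitioning')


def run_direct_load(source_uri, table_id, schema, params):
    """Run one BigQuery load job outside Beam and report the table layout."""
    from google.cloud import bigquery  # imported lazily; only needed here

    kwargs, partitioning = load_config_kwargs(params)
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumption: the CSV has a header row
        schema=schema,
        **kwargs)
    if partitioning is not None:
        job_config.time_partitioning = bigquery.TimePartitioning(
            type_=partitioning.get('type', 'DAY'),
            field=partitioning.get('field'))

    client = bigquery.Client()
    job = client.load_table_from_uri(source_uri, table_id, job_config=job_config)
    job.result()  # waits for completion; raises if the load job fails
    table = client.get_table(table_id)
    # Check whether partitioning and clustering actually took effect.
    return table.time_partitioning, table.clustering_fields
```

If this direct load succeeds with the bad-record limit while the pipeline does not, that points at how the options reach the load job rather than at the data itself.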
