Re: Python WriteToBigQuery with FILE_LOAD & additional_bq_parameters not working

Chamikara Jayalath Wed, 04 Sep 2019 09:56:10 -0700

+Pablo Estrada <[email protected]> who added this.

I don't think we have tested this specific option, but I believe the
additional BQ parameters option was added in a generic way to accept all
additional parameters.


Looking at the code, it seems that additional parameters do get passed
through to load jobs:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L427
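
For reference, here is a minimal sketch of how such options might be supplied
to WriteToBigQuery with the file-loads method. The two helper functions and
their arguments are hypothetical names used only for illustration. The dict
keys (timePartitioning, clustering, maxBadRecords) are the BigQuery load-job
REST field names, which this sketch assumes are forwarded unchanged to the
load job.

```python
def build_additional_bq_parameters(partition_field=None,
                                   cluster_fields=None,
                                   max_bad_records=None):
    """Build extra load-job options keyed by BigQuery REST field names.

    Hypothetical helper: only the options that are actually requested
    end up in the returned dict.
    """
    params = {}
    if partition_field:
        params['timePartitioning'] = {'type': 'DAY', 'field': partition_field}
    if cluster_fields:
        params['clustering'] = {'fields': list(cluster_fields)}
    if max_bad_records is not None:
        params['maxBadRecords'] = max_bad_records
    return params


def build_write_transform(table_spec, schema, params):
    """Return a WriteToBigQuery transform that uses load jobs.

    Beam is imported here so the helper above can be used (and tested)
    without the SDK installed.
    """
    import apache_beam as beam

    return beam.io.WriteToBigQuery(
        table_spec,
        schema=schema,
        method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
        additional_bq_parameters=params,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
```

If the destination table already exists without partitioning or clustering,
the create-time options may not be applied, so it is worth testing against a
fresh table.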

One thing you can try is running a BQ load job directly with the same set
of data and options to see if the data gets loaded.
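
A rough sketch of such a direct load, assuming the google-cloud-bigquery
client library, a CSV file already staged in GCS with a header row, and
schema autodetection. The function name, its arguments, and those CSV
assumptions are illustrative and not taken from the original pipeline.

```python
def run_direct_load(gcs_uri, table_id, max_bad_records=0,
                    partition_field=None, cluster_fields=None):
    """Run one BigQuery load job outside of Beam and wait for it.

    Hypothetical helper: table_id is a 'project.dataset.table' string and
    gcs_uri points at the CSV file to load.
    """
    # Imported lazily so the module can be read without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.CSV
    job_config.skip_leading_rows = 1   # assumption: the file has a header row
    job_config.autodetect = True       # assumption: no explicit schema
    job_config.max_bad_records = max_bad_records
    if partition_field:
        job_config.time_partitioning = bigquery.TimePartitioning(
            field=partition_field)
    if cluster_fields:
        job_config.clustering_fields = list(cluster_fields)

    load_job = client.load_table_from_uri(
        gcs_uri, table_id, job_config=job_config)
    load_job.result()  # blocks until the job finishes; raises on failure
    return load_job
```

If this succeeds with the same options, the options themselves are valid and
the problem is more likely in how the pipeline forwards them.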

Thanks,
Cham

On Tue, Sep 3, 2019 at 2:24 PM Zdenko Hrcek <[email protected]> wrote:

> Greetings,
>
> I am using Beam 2.15 and Python 2.7.
> I am running a batch job that reads data from CSV and uploads it to
> BigQuery. I like that, instead of streaming to BigQuery, I can use "file
> load" to load the table all at once.
>
> In my case, there are a few "bad" records in the input (it's geo data, and
> during manual upload BigQuery doesn't accept those as valid geography
> records). This is easily solved by setting the maximum number of bad records.
> If I understand correctly, WriteToBigQuery supports
> "additional_bq_parameters", but for some reason when running a pipeline on
> Dataflow runner it looks like those settings are ignored.
>
> I played with an example from the documentation
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
>  with
> gist file https://gist.github.com/zdenulo/99877307981b4d372df5a662d581a5df
> where the table should be created on the partitioned field and clustered,
> but when running on Dataflow it doesn't happen.
> When I run on DirectRunner it works as expected. Interestingly, when I add
> the maxBadRecords parameter to additional_bq_parameters, DirectRunner
> complains that it doesn't recognize that option.
>
> This is the first time using this setup/combination so I'm just wondering
> if I overlooked something. I would appreciate any help.
>
> Best regards,
> Zdenko
>
>
> _______________________
>  http://www.the-swamp.info
>
>
