Thanks Radek!

I didn't realize that writing is done with a copy job; now I understand
why we need to configure partitioning as well. That all makes sense.

We haven't tried the Storage Write API; it wasn't available for Python
yet when we started doing this. I will take a look at it, see whether it
might work for us, and get back to you. Would using the Storage Write API
allow us to write to a partitioned table without specifying the
partitioning? I should also mention that streaming the data would be out
of the question, since we need to run mutating DML statements as well.

Our call looks like this:
"Write to BigQuery" >> WriteToBigQuery(

    project=self.project_id,

    table=self.get_table_name,

    dataset=self.dataset,
    schema=self.get_schema,

    schema_side_inputs=(beam.pvalue.AsDict(types_for_schema),),

    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=(

        BigQueryDisposition.WRITE_TRUNCATE

        if self.overwrite

        else BigQueryDisposition.WRITE_APPEND

    ),
    additional_bq_parameters={
        "schemaUpdateOptions": ["ALLOW_FIELD_ADDITION"],
    }
)
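
If the Storage Write API does turn out to be viable, my untested
understanding is that the change would mostly be adding the method
argument to this same call, along the lines of the sketch below. I'm
assuming here that our existing dispositions and schemaUpdateOptions
carry over unchanged, which I haven't verified:

# Hypothetical sketch, not tested: same configuration as above, but
# asking the connector to use the Storage Write API instead of load/copy
# jobs. Whether WRITE_TRUNCATE and schemaUpdateOptions behave the same
# way with this method is an open question for us.
"Write to BigQuery" >> WriteToBigQuery(
    project=self.project_id,
    table=self.get_table_name,
    dataset=self.dataset,
    schema=self.get_schema,
    schema_side_inputs=(beam.pvalue.AsDict(types_for_schema),),
    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=(
        BigQueryDisposition.WRITE_TRUNCATE
        if self.overwrite
        else BigQueryDisposition.WRITE_APPEND
    ),
    additional_bq_parameters={
        "schemaUpdateOptions": ["ALLOW_FIELD_ADDITION"],
    },
    method=WriteToBigQuery.Method.STORAGE_WRITE_API,
)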

On Wed, Mar 19, 2025 at 3:12 AM Radek Stankiewicz <radosl...@google.com>
wrote:

> hi Lina,
> there are multiple reasons why a copy job is used with temporary tables:
> - you may be using dynamic destinations
> - you may be loading lots of data, probably with truncate
> This way we ensure atomicity, as we can trigger a copy from multiple temp
> tables into one final table.
> Can you confirm or paste a snippet of how you configured
> apache_beam.io.gcp.bigquery.WriteToBigQuery?
>
> BigQuery doesn't allow copying non-partitioned tables into partitioned
> tables; it's a BQ limitation.
>
> Have you tried other loading methods, e.g. the Storage Write API?
>
> Radek
>
> On Wed, Mar 19, 2025 at 5:42 AM Lina Mårtensson via user <
> user@beam.apache.org> wrote:
>
>> Hi,
>>
>> We have, by now, a large set of different Beam jobs, all written in
>> Python, that write to a set of BigQuery tables in a single dataset, all
>> of which behave more or less the same way. These tables aren't
>> partitioned at all, but going forward, we need them to be.
>>
>> I partitioned a single table to start with, and was very surprised to
>> find that a Beam job that wrote to it couldn't do so:
>>
>> Failed to copy Non partitioned table to Column partitioned table: not
>> supported.
>>
>>
>> We have a bunch of pre-created tables, and I would've thought I could
>> just keep writing to those without changing settings even after setting up
>> partitioning on them. It doesn't seem to matter whether the
>> create_disposition is CREATE_IF_NEEDED or CREATE_NEVER.
>> It does work when I set the additional_bq_parameters to add
>> time_partitioning, but it would be a huge undertaking not only to update
>> all of our currently running jobs across many projects, but also to make
>> sure to synchronize these changes with updating the underlying BigQuery
>> tables. And it doesn't seem like it should be necessary to specify the
>> partitioning if we're not creating any new tables?
>>
>> Is there any way to just write the data we have to pre-created,
>> partitioned tables without having to set time_partitioning in
>> additional_bq_parameters, or is there some other recommended way to
>> solve this problem?
>>
>> Thanks,
>> -Lina
>>
>
