Thanks Radek! I didn't realize that writing is done with a copy job - now I understand why we need to configure partitioning as well. That all makes sense.
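For concreteness, the workaround that does work for us is declaring the partitioning on the write itself via additional_bq_parameters, roughly like the sketch below. The "DAY" type and the "created_at" field are just placeholders for whatever each table actually uses, and the remaining arguments are the same as in our full call further down in this message:

    "Write to BigQuery" >> WriteToBigQuery(
        # ... same project/table/dataset/schema arguments as in our call below ...
        additional_bq_parameters={
            "schemaUpdateOptions": ["ALLOW_FIELD_ADDITION"],
            # Placeholder partitioning spec; as I understand it, this is
            # passed to the load job configuration so the temp tables are
            # created partitioned and the final copy is allowed.
            "timePartitioning": {"type": "DAY", "field": "created_at"},
        },
    )

Keeping that spec in sync with the actual table definitions across every job is exactly the undertaking we'd like to avoid.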
We haven't tried the Storage Write API - that wasn't available for Python yet when we started doing this. I will take a look at it and see if that might work for us and get back to you (a rough sketch of what I'd try is at the bottom of this message, below the quoted thread). Would using the Storage Write API allow us to write to a partitioned table without specifying the partitioning explicitly? I should also mention that streaming the data is out of the question, since we need to run mutating DML statements against these tables as well.

Our call looks like this:

    "Write to BigQuery" >> WriteToBigQuery(
        project=self.project_id,
        table=self.get_table_name,
        dataset=self.dataset,
        schema=self.get_schema,
        schema_side_inputs=(beam.pvalue.AsDict(types_for_schema),),
        create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=(
            BigQueryDisposition.WRITE_TRUNCATE
            if self.overwrite
            else BigQueryDisposition.WRITE_APPEND
        ),
        additional_bq_parameters={
            "schemaUpdateOptions": ["ALLOW_FIELD_ADDITION"],
        },
    )

On Wed, Mar 19, 2025 at 3:12 AM Radek Stankiewicz <radosl...@google.com> wrote:

> Hi Lina,
> There are multiple reasons why a copy job is used with a temporary table:
> - you may be using dynamic destinations
> - you are loading lots of data, probably with truncate
> This way we ensure atomicity, as we can trigger copies from multiple temp
> tables into one final table.
> Can you confirm or paste a snippet of how you configured
> apache_beam.io.gcp.bigquery.WriteToBigQuery?
>
> BigQuery doesn't allow copying non-partitioned tables into partitioned
> tables - it's a BQ limitation.
>
> Have you tried other loading methods, e.g. the Storage Write API?
>
> Radek
>
> On Wed, Mar 19, 2025 at 5:42 AM Lina Mårtensson via user <
> user@beam.apache.org> wrote:
>
>> Hi,
>>
>> We have, by now, a large set of different Beam jobs, all written in Python,
>> that write to a set of BigQuery tables in a single dataset that all behave
>> more or less the same way. These tables aren't partitioned at all, but
>> going forward, we need them to be.
>>
>> I partitioned a single table to start with, and was very surprised to
>> find that a Beam job that wrote to it couldn't do so:
>>
>> Failed to copy Non partitioned table to Column partitioned table: not
>> supported.
>>
>> We have a bunch of pre-created tables, and I would've thought I could
>> just keep writing to those without changing settings, even after setting
>> up partitioning on them. It doesn't seem to matter whether the
>> create_disposition is CREATE_IF_NEEDED or CREATE_NEVER.
>> It does work when I set additional_bq_parameters to add
>> time_partitioning, but it would be a huge undertaking not only to update
>> all of our currently running jobs across many projects, but also to
>> synchronize those changes with updating the underlying BigQuery tables.
>> And it doesn't seem like it should be necessary to specify it if we're not
>> creating any new tables?
>>
>> Is there any way to just write the data we have to pre-created,
>> partitioned tables without having to set time_partitioning in
>> additional_bq_parameters, or is there some other recommended way to solve
>> this problem?
>>
>> Thanks,
>> -Lina
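P.S. For when I look into the Storage Write API: my rough understanding is that, in the Python SDK, it would mean keeping the same transform and setting its method, along the lines of the untested sketch below. I don't know yet whether our callable table/schema arguments, schema_side_inputs, additional_bq_parameters or WRITE_TRUNCATE carry over to that path, so treat everything here beyond the method argument as assumptions:

    "Write to BigQuery" >> WriteToBigQuery(
        project=self.project_id,
        table=self.get_table_name,
        dataset=self.dataset,
        schema=self.get_schema,
        create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
        # Assuming append here; I haven't checked whether WRITE_TRUNCATE
        # is supported on this path.
        write_disposition=BigQueryDisposition.WRITE_APPEND,
        # Route rows through the BigQuery Storage Write API instead of
        # file loads + copy jobs - the copy job is the step that fails
        # against our partitioned destination tables.
        method=WriteToBigQuery.Method.STORAGE_WRITE_API,
    )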