Hello

Thanks for the confirmation that we can use the BigQuery Storage Write API.
I was able to get it working, but noticed a few things:

1. You need to specify withTriggeringFrequency; you get an error at pipeline construction if it is not set. The Storage Write API itself also supports streaming. Is there a reason for not supporting this within Beam? This API will replace the legacy streaming API; does this mean low-latency inserts to BigQuery will no longer be supported from Beam?

2. It doesn't work with Dataflow Prime. I get an "Error processing pipeline." failure when launching with Dataflow Prime. Note that Prime is currently still in Preview.

3. You have to specify withNumStorageWriteApiStreams; when it is not set, you get a runtime error. The method documentation mentions withAutoSharding() as an alternative, but using it gives an error saying it is not yet supported with the STORAGE_WRITE_API method.

4. The documentation hasn't been updated everywhere to include the new write methods, for example for withAutoSharding().

For reference, I've appended below the quoted thread a minimal sketch of the configuration I ended up with.

I also noticed another mail on the user mailing list last week about some odd messages with the Storage Write API. It seems the new BigQuery insert method is not yet ready for production use.

Regards
Bruno

> On 25 Jan 2022, at 22:46, Reuven Lax <[email protected]> wrote:
>
> Yes, you can use the storage write API from Java. Native Python support has
> not yet been implemented.
>
> On Mon, Jan 24, 2022 at 1:47 AM Bruno Quinart <[email protected]> wrote:
>
> Hello
>
> The BigQuery Storage Write API is GA since October 2021 (docs at [1]).
> BEAM-11648 was created to adapt the Beam BigQuery sink.
>
> That Jira issue is still marked as open. However, it seems that the
> functionality has already been added.
> The Javadoc has had the STORAGE_WRITE_API method since release 2.29.0 (see [2]).
> The latest release (2.35.0) removed the note that it is an experimental API
> (on the BigQuery side) and also added the STORAGE_API_AT_LEAST_ONCE method [3].
>
> However, the Beam documentation at [4] does not mention the Storage Write API
> option at all.
>
> Can we consider this development done and start using these features?
>
> What would be the best approach for a Python pipeline?
> I found BEAM-10917 for the Storage Read API with the Python SDK. That Jira is
> also open, but again the functionality seems to have been added.
> As of release 2.34.0 I see that it is possible to pass method=DIRECT_READ to
> use the Storage Read API [5] (always in Avro it seems; it is not clear how
> you could use Arrow).
> But I didn't find anything for the Storage Write API.
>
> Would it be better (and even possible) to use the Java BigQuery sink via the
> multi-language features?
>
> Apologies for these annoying questions. As a Dataflow user, I am a bit lost
> trying to understand what the reference is (the Google docs are limited and
> refer to Beam, but the docs seem to lag behind the code).
>
> Thanks a lot!
> Bruno
>
> [1] https://cloud.google.com/bigquery/docs/write-api
> [2] https://beam.apache.org/releases/javadoc/2.29.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html
> [3] https://beam.apache.org/releases/javadoc/2.35.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html
> [4] https://beam.apache.org/documentation/io/built-in/google-bigquery/#writing-to-bigquery
> [5] https://beam.apache.org/releases/pydoc/2.34.0/apache_beam.io.gcp.bigquery.html
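
For completeness, here is a minimal sketch of the write configuration that worked for me (points 1 and 3 above). Table name, schema and stream count are placeholders, and "rows" is assumed to be an unbounded PCollection<TableRow>; tested against Beam 2.35.0 only:

    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
    import org.joda.time.Duration;

    rows.apply("WriteToBQ",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")   // placeholder table
            .withSchema(tableSchema)                // placeholder TableSchema
            .withMethod(Method.STORAGE_WRITE_API)
            // Point 1: required, otherwise the pipeline is rejected
            // at construction time.
            .withTriggeringFrequency(Duration.standardSeconds(5))
            // Point 3: required for now, since withAutoSharding() is
            // rejected together with STORAGE_WRITE_API.
            .withNumStorageWriteApiStreams(4));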

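On the low-latency question in point 1: if I read the 2.35.0 Javadoc [3] correctly, STORAGE_API_AT_LEAST_ONCE is meant as the lower-latency variant. A sketch of what I understand that would look like (untested on my side, same placeholder names as above):

    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.my_table")   // placeholder table
        .withSchema(tableSchema)                // placeholder TableSchema
        // At-least-once semantics: lower latency, duplicates possible;
        // withTriggeringFrequency does not seem to be required here.
        .withMethod(Method.STORAGE_API_AT_LEAST_ONCE);

Happy to be corrected if that is not the intended usage.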