Hi Beam Users,

Our pipeline reads Avro files from GCS into Dataflow and writes them into
BigQuery tables. I am using the WriteToBigQuery transform to write my
PCollection contents to BigQuery.
Each Avro record contains about 150-200 fields. So far we have tested the
pipeline by spelling out the field information for all of these fields in a
TableSchema object inside the pipeline code (roughly as in the sketch
below), which means that every time the schema changes or evolves we have
to modify the pipeline code.
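For context, this is more or less what we do today (the table, bucket and
field names here are made up for illustration):

```python
import apache_beam as beam
from apache_beam.io.gcp.internal.clients import bigquery

# Every field is declared by hand in the pipeline code (150-200 of these).
table_schema = bigquery.TableSchema()

id_field = bigquery.TableFieldSchema()
id_field.name = 'id'
id_field.type = 'INTEGER'
id_field.mode = 'REQUIRED'
table_schema.fields.append(id_field)

name_field = bigquery.TableFieldSchema()
name_field.name = 'name'
name_field.type = 'STRING'
name_field.mode = 'NULLABLE'
table_schema.fields.append(name_field)
# ... and so on for every remaining field.

with beam.Pipeline() as p:
    (p
     | 'ReadAvro' >> beam.io.ReadFromAvro('gs://my-bucket/input/*.avro')
     | 'WriteToBQ' >> beam.io.WriteToBigQuery(
           'my-project:my_dataset.my_table',
           schema=table_schema,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))
```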
I was wondering whether there is any way to keep the BigQuery table schema
outside the pipeline code (for example in a schema file) and load it into
the pipeline from there, as that would be much easier to maintain.
Something along the lines of the sketch below is what I have in mind,
though I don't know whether it is the recommended approach.
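Just to make the question concrete, this is the kind of thing I am
imagining; the GCS path, file layout and helper function are made up, and I
have not verified this end to end:

```python
import json

import apache_beam as beam
from apache_beam.io.filesystems import FileSystems
from apache_beam.io.gcp.bigquery_tools import parse_table_schema_from_json


def load_schema_from_gcs(path):
    """Hypothetical helper: build a TableSchema from a JSON file in GCS.

    Assumes the file contains a JSON list of field definitions, e.g.
    [{"name": "id", "type": "INTEGER", "mode": "REQUIRED"}, ...].
    """
    with FileSystems.open(path) as f:
        fields = json.loads(f.read().decode('utf-8'))
    # parse_table_schema_from_json expects a string like '{"fields": [...]}'.
    return parse_table_schema_from_json(json.dumps({'fields': fields}))


table_schema = load_schema_from_gcs('gs://my-bucket/schemas/my_table.json')

with beam.Pipeline() as p:
    (p
     | 'ReadAvro' >> beam.io.ReadFromAvro('gs://my-bucket/input/*.avro')
     | 'WriteToBQ' >> beam.io.WriteToBigQuery(
           'my-project:my_dataset.my_table',
           schema=table_schema))
```

That way a schema change would only require updating the JSON file rather
than redeploying new pipeline code, if something like this is workable.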

Note: We are using the Python SDK to write our pipelines and running them
on Dataflow.

Thanks & Regards
Rajnil Guha
