Gene Peters created BEAM-4835:
---------------------------------

             Summary: Add more flexible options for data loading to BigQueryIO.Write
                 Key: BEAM-4835
                 URL: https://issues.apache.org/jira/browse/BEAM-4835
             Project: Beam
          Issue Type: Improvement
          Components: io-java-gcp
            Reporter: Gene Peters
            Assignee: Chamikara Jayalath

The BigQuery API exposes a few options to end users that allow for more flexible data loading. For both [streaming|https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/TableDataInsertAllRequest.html#setIgnoreUnknownValues-java.lang.Boolean-] and [batch|https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setIgnoreUnknownValues-java.lang.Boolean-] inserts, the "ignoreUnknownValues" flag can be set, which tells BigQuery to accept rows containing values that do not match the schema. In addition, streaming inserts offer the ["skipInvalidRows"|https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/TableDataInsertAllRequest.html#setSkipInvalidRows-java.lang.Boolean-] option, which accepts the valid rows of an inserted batch even if some of the rows are invalid.

I've made the necessary code changes to make these options available within BigQueryIO.Write and will attach the pull request to this ticket for review. Both flags are off by default.

Let me know if you have any questions or feedback about this!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
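
For readers unfamiliar with the two flags, here is a minimal, self-contained sketch of the behaviour described above. It is a toy model only: the class `InsertFlagsSketch` and the method `insertAll` are hypothetical names invented for illustration and are neither the BigQuery client API nor the BigQueryIO.Write changes in the pull request. It also assumes that an unknown field is the only way a row can be invalid, and that unknown values are dropped when "ignoreUnknownValues" is set.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of ignoreUnknownValues / skipInvalidRows; not a real BigQuery or Beam API. */
public class InsertFlagsSketch {

    /**
     * Returns the rows that would be stored for one insert batch,
     * or throws if the whole batch is rejected.
     */
    static List<Map<String, Object>> insertAll(
            Set<String> schemaFields,
            List<Map<String, Object>> rows,
            boolean ignoreUnknownValues,
            boolean skipInvalidRows) {
        List<Map<String, Object>> accepted = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            boolean hasUnknownField = !schemaFields.containsAll(row.keySet());
            if (hasUnknownField && !ignoreUnknownValues) {
                // The row is invalid in this model.
                if (skipInvalidRows) {
                    continue; // drop only this row, keep the rest of the batch
                }
                throw new IllegalArgumentException(
                        "Batch rejected, row has fields outside the schema: " + row.keySet());
            }
            // Keep only the fields the schema knows about.
            Map<String, Object> stored = new LinkedHashMap<>(row);
            stored.keySet().retainAll(schemaFields);
            accepted.add(stored);
        }
        return accepted;
    }

    public static void main(String[] args) {
        Set<String> schema = Set.of("id", "name");
        List<Map<String, Object>> rows = List.of(
                Map.of("id", 1, "name", "a"),
                Map.of("id", 2, "extra", "x")); // "extra" is not in the schema

        // ignoreUnknownValues: both rows are kept, "extra" is dropped.
        System.out.println(insertAll(schema, rows, true, false).size());
        // skipInvalidRows: only the first row is kept.
        System.out.println(insertAll(schema, rows, false, true).size());
    }
}
```

With both flags left at their default of false, the same batch is rejected as a whole in this model, which mirrors the "off by default" behaviour mentioned above.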