Hi, if you are familiar with the BigQuery insert retry policies in the Apache Beam API (BigQueryIO), please help me understand the following behavior. I am using the Dataflow runner.

- How does a Dataflow job behave if I specify retryTransientErrors?
- shouldRetry receives the error from BigQuery, so I can decide whether to retry. Where can I find the list of errors to expect from BigQuery?

*BigQuery insert retry policies*

https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.html

- alwaysRetry - Always retry all failures.
- neverRetry - Never retry any failures.
- retryTransientErrors - Retry all failures except for known persistent errors.
- shouldRetry - Return true if this failure should be retried.

*Background*

- When my Cloud Dataflow job inserted a very old timestamp (more than 1 year in the past) into BigQuery, I got the following error.
- The retries did not stop, so I added retryTransientErrors to the BigQueryIO.Write step, and then the retries stopped.

jsonPayload: {
> exception: "java.lang.RuntimeException: java.io.IOException: Insert failed:
> [{"errors":[{"debugInfo":"","location":"","message":"Value 690000000 for field
> timestamp_scanned of the destination table
> fr-prd-datalake:rfid_raw.store_epc_transactions_cr_uqjp is outside the allowed bounds.
> You can only stream to date range within 365 days in the past and 183 days in
> the future relative to the current date.","reason":"invalid"}],

After the first error, Dataflow retried the insert, and BigQuery rejected it every time with the same error.

I also posted the same question here:
https://stackoverflow.com/questions/57403980/biqquery-insert-retry-policy-in-apache-beam

Yohei Onishi
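
P.S. To make the question concrete, here is roughly how I picture a retryTransientErrors-style decision, written as plain Java with no Beam dependency. The class name RetryDecisionSketch, the method signature, and above all the set of "persistent" reasons are my own assumptions for illustration. I have not confirmed them against what BigQueryIO actually does, and that set is exactly what I am asking about. The only value taken from my logs is the "invalid" reason.

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical sketch, not Beam code: models "retry all failures except
// for known persistent errors" using the "reason" field of each error.
final class RetryDecisionSketch {

    // ASSUMPTION: reasons treated as persistent (never worth retrying).
    // "invalid" is the reason seen in the log above; the others are guesses.
    static final Set<String> PERSISTENT_REASONS =
            Set.of("invalid", "invalidQuery", "notImplemented");

    // Returns true only if none of the reported reasons is persistent.
    static boolean shouldRetry(String... reasons) {
        return Arrays.stream(reasons).noneMatch(PERSISTENT_REASONS::contains);
    }

    public static void main(String[] args) {
        // The out-of-range timestamp error carries reason "invalid".
        System.out.println(shouldRetry("invalid"));      // false
        // A reason outside the assumed persistent set is retried.
        System.out.println(shouldRetry("backendError")); // true
    }
}
```

If the real policy works along these lines, the out-of-range timestamp error would be dropped after the first attempt, which matches what I observed once I switched to retryTransientErrors.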
