Hi,

If you are familiar with the BigQuery insert retry policies in the Apache Beam
API (BigQueryIO), please help me understand the following behavior. I am using
the Dataflow runner.

   - How does a Dataflow job behave if I specify retryTransientErrors?
   - shouldRetry receives the error from BigQuery, and I can decide whether
   to retry. Where can I find the list of errors I should expect from BigQuery?

*BigQuery insert retry policies*
https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.html


   - alwaysRetry - Always retry all failures.
   - neverRetry - Never retry any failures.
   - retryTransientErrors - Retry all failures except for known persistent
   errors.
   - shouldRetry - Return true if this failure should be retried.
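
To make the difference between these policies concrete, here is a minimal
stand-alone sketch of how I understand them. It does not use the Beam classes:
InsertRetryPolicySketch and its fields are hypothetical names, and the set of
"persistent" reasons is my assumption about what retryTransientErrors checks,
so it should be verified against the InsertRetryPolicy source of the Beam
version in use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Stand-alone model of the retry policies; NOT the Beam implementation.
// In Beam the policy is set on the write step, e.g.
// BigQueryIO.Write#withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()).
class InsertRetryPolicySketch {
    // Assumption: error reasons treated as persistent (never worth retrying).
    static final Set<String> PERSISTENT_REASONS =
        new HashSet<>(Arrays.asList("invalid", "invalidQuery", "notImplemented"));

    // Each policy maps the error reasons of one failed row to "retry or not".
    static final Predicate<List<String>> ALWAYS_RETRY = reasons -> true;
    static final Predicate<List<String>> NEVER_RETRY = reasons -> false;
    static final Predicate<List<String>> RETRY_TRANSIENT_ERRORS =
        reasons -> reasons.stream().noneMatch(PERSISTENT_REASONS::contains);

    public static void main(String[] args) {
        // A failed row whose only error reason is "invalid".
        List<String> failedRow = Arrays.asList("invalid");
        System.out.println("alwaysRetry: " + ALWAYS_RETRY.test(failedRow));
        System.out.println("neverRetry: " + NEVER_RETRY.test(failedRow));
        System.out.println("retryTransientErrors: " + RETRY_TRANSIENT_ERRORS.test(failedRow));
    }
}
```

In this model a custom shouldRetry would be one more predicate over the same
list of reasons. In Beam itself, shouldRetry receives a Context whose
getInsertErrors() exposes the BigQuery errors, each with a reason string.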

*Background*

   - When my Cloud Dataflow job inserted a very old timestamp (more than 1
   year before the current date) into BigQuery, I got the following error.
   - The retries did not stop, so I added retryTransientErrors to the
   BigQueryIO.Write step, and then the retries stopped.

> jsonPayload: {
>   exception:  "java.lang.RuntimeException: java.io.IOException: Insert failed:
>   [{"errors":[{"debugInfo":"","location":"","message":"Value 690000000 for field
>   timestamp_scanned of the destination table
>   fr-prd-datalake:rfid_raw.store_epc_transactions_cr_uqjp is outside the
>   allowed bounds.
>   You can only stream to date range within 365 days in the past and 183 days in
>   the future relative to the current date.","reason":"invalid"}],

After the first error, Dataflow kept retrying the insert, and BigQuery
rejected it every time with the same error.
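
The bounds quoted in the error message can be expressed as a small check that
could run before the write step. This is only a sketch under assumptions:
StreamingBoundsSketch and isWithinStreamingBounds are hypothetical names, the
value 690000000 is read as epoch seconds, and the bounds are treated as
inclusive and measured from the current instant rather than the current date.

```java
import java.time.Duration;
import java.time.Instant;

// Stand-alone check modeled on the error message; not part of Beam or BigQuery.
class StreamingBoundsSketch {
    // Limits taken from the error message: 365 days back, 183 days ahead.
    static final Duration MAX_PAST = Duration.ofDays(365);
    static final Duration MAX_FUTURE = Duration.ofDays(183);

    // Returns true if value lies inside [now - 365 days, now + 183 days].
    // Assumption: inclusive bounds relative to the instant "now".
    static boolean isWithinStreamingBounds(Instant value, Instant now) {
        return !value.isBefore(now.minus(MAX_PAST))
            && !value.isAfter(now.plus(MAX_FUTURE));
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Assumption: the logged value 690000000 is in epoch seconds.
        Instant fromLog = Instant.ofEpochSecond(690000000L);
        System.out.println("value from log in bounds: " + isWithinStreamingBounds(fromLog, now));
        System.out.println("current time in bounds: " + isWithinStreamingBounds(now, now));
    }
}
```

Rows that fail such a check could be routed elsewhere instead of being sent to
the streaming insert, since retrying them would presumably hit the same
"invalid" error every time.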


I also posted the same question here
https://stackoverflow.com/questions/57403980/biqquery-insert-retry-policy-in-apache-beam

Yohei Onishi
