Re: How to skip processing on failure at BigQueryIO sink?

2017-04-12 Thread Josh
Thanks for the replies. @Lukasz, that sounds like a good option; it's just that it may be hard to catch and filter out every case that would result in a 4xx error. I just want to avoid the whole pipeline failing when a few elements in the stream are bad. @Dan, that sounds promising, I will keep an eye on that issue.
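
A minimal sketch of the dead-letter pattern being discussed, using the current Beam Java SDK's multi-output ParDo (the isValid check and both sinks are placeholders to fill in):

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.PCollectionTuple;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.beam.sdk.values.TupleTagList;

    // Tag rows that pass validation separately from rows that would 4xx.
    final TupleTag<TableRow> validRows = new TupleTag<TableRow>() {};
    final TupleTag<TableRow> invalidRows = new TupleTag<TableRow>() {};

    PCollectionTuple tagged = input.apply("ValidateRows",
        ParDo.of(new DoFn<TableRow, TableRow>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            TableRow row = c.element();
            if (isValid(row)) {              // placeholder schema check
              c.output(row);                 // main output -> BigQuery
            } else {
              c.output(invalidRows, row);    // extra output -> dead letter
            }
          }
        }).withOutputTags(validRows, TupleTagList.of(invalidRows)));

    tagged.get(validRows).apply(bigQueryWrite);      // normal write path
    tagged.get(invalidRows).apply(deadLetterWrite);  // e.g. log or write to GCS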

Re: How to skip processing on failure at BigQueryIO sink?

2017-04-11 Thread Dan Halperin
I believe this is BEAM-190, which is actually being worked on today. However, it will probably not be ready in time for the first stable release. https://issues.apache.org/jira/browse/BEAM-190
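
For reference, the BEAM-190 work later landed in the Beam Java SDK roughly as below; none of this was released at the time of this thread, and LogAndDropFn is a placeholder:

    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
    import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;

    // Later-release API: a retry policy plus access to the failed inserts.
    WriteResult result = rows.apply(
        BigQueryIO.writeTableRows()
            .to("project:dataset.table")
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

    // Rows BigQuery rejected (e.g. schema mismatch) flow here instead of
    // failing the pipeline; route them to a dead-letter sink of your choice.
    result.getFailedInserts()
        .apply("HandleFailedInserts", ParDo.of(new LogAndDropFn()));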

Re: How to skip processing on failure at BigQueryIO sink?

2017-04-11 Thread Lukasz Cwik
Have you thought of fetching the schema upfront from BigQuery and pre-filtering out any records in a preceding DoFn, instead of relying on BigQuery telling you that the schema doesn't match? Otherwise, you are correct in believing that you will need to update BigQueryIO to have the retry/error semantics you want.
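
A rough sketch of this suggestion, assuming the google-api-services-bigquery client for the one-off schema fetch (project, dataset, and table names are placeholders; this checks field names only, not types or required-ness):

    import com.google.api.services.bigquery.Bigquery;
    import com.google.api.services.bigquery.model.TableFieldSchema;
    import com.google.api.services.bigquery.model.TableRow;
    import com.google.api.services.bigquery.model.TableSchema;
    import java.util.HashSet;
    import java.util.Set;

    // At graph-construction time: fetch the schema once.
    Bigquery bq = /* authorized Bigquery client */;
    TableSchema schema = bq.tables()
        .get("my-project", "my_dataset", "my_table")
        .execute()
        .getSchema();

    final Set<String> knownFields = new HashSet<>();
    for (TableFieldSchema field : schema.getFields()) {
      knownFields.add(field.getName());
    }

    // Preceding DoFn: drop rows that reference fields BigQuery doesn't know.
    PCollection<TableRow> filtered = rows.apply("PreFilterBySchema",
        ParDo.of(new DoFn<TableRow, TableRow>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            if (knownFields.containsAll(c.element().keySet())) {
              c.output(c.element());
            }
          }
        }));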

Re: How to skip processing on failure at BigQueryIO sink?

2017-04-11 Thread Josh
What I really want to do is configure BigQueryIO to log an error and skip the write if it receives a 4xx response from BigQuery (e.g. the element does not match the table schema), and for other errors (e.g. 5xx) to retry n times with exponential backoff. Is there any way to do this at the moment?
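
Not supported out of the box at this point in the thread, but the desired behaviour is straightforward to sketch outside BigQueryIO; insertRow and LOG are placeholders:

    import com.google.api.client.googleapis.json.GoogleJsonResponseException;
    import com.google.api.services.bigquery.model.TableRow;

    // Skip on 4xx; retry with exponential backoff on everything else.
    void writeWithRetry(TableRow row, int maxAttempts) throws InterruptedException {
      long backoffMillis = 1000;
      for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          insertRow(row);                      // placeholder streaming insert
          return;
        } catch (GoogleJsonResponseException e) {
          int code = e.getStatusCode();
          if (code >= 400 && code < 500) {
            LOG.error("Skipping bad row (HTTP " + code + "): " + row);
            return;                            // 4xx: log and drop, don't retry
          }
          if (attempt == maxAttempts) {
            throw new RuntimeException("Giving up after " + maxAttempts + " attempts", e);
          }
          Thread.sleep(backoffMillis);         // 5xx etc.: back off and retry
          backoffMillis *= 2;
        }
      }
    }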

How to skip processing on failure at BigQueryIO sink?

2017-04-10 Thread Josh
Hi, I'm using BigQueryIO to write the output of an unbounded streaming job to BigQuery. When an element in the stream cannot be written to BigQuery, BigQueryIO seems to have some default retry logic which retries the write a few times. However, if the write fails repeatedly, it seems to fail the whole pipeline. Is there a way to log the error and skip processing for that element instead?
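
For context, a minimal version of the setup being described, written against the current Beam Java API (the table spec and the conversion DoFn are placeholders):

    PCollection<TableRow> rows = events.apply("ToTableRow",
        ParDo.of(new EventToTableRowFn()));    // placeholder conversion

    rows.apply(BigQueryIO.writeTableRows()
        .to("project:dataset.table")
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));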