Re: How to catch exceptions while using DatastoreV1 API

2017-10-16 Thread Derek Hao Hu
Thanks Aleksandr and Lukasz! These are really helpful. I'll take a look at BigQueryIO and see if it's easy to implement something similar in our repo. :) But at a minimum we can always copy and paste DatastoreV1. :p Cheers, Derek
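
For reference, the BigQueryIO feature alluded to here is its failed-inserts output. A minimal sketch, assuming a Beam release where WriteResult.getFailedInserts and InsertRetryPolicy are available (the table spec is a placeholder):

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
    import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
    import org.apache.beam.sdk.values.PCollection;

    static PCollection<TableRow> writeWithFailedInserts(PCollection<TableRow> rows) {
      // Retry only transient errors; permanent failures land in the failed-inserts output.
      WriteResult result = rows.apply("WriteToBQ",
          BigQueryIO.writeTableRows()
              .to("my-project:my_dataset.my_table")  // placeholder table spec
              .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));
      // Rows that could not be inserted come back as a PCollection instead of
      // failing the bundle, so they can be routed anywhere downstream.
      return result.getFailedInserts();
    }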

Re: How to catch exceptions while using DatastoreV1 API

2017-10-16 Thread Lukasz Cwik
Your idea makes total sense and has been brought up before, mirroring the concept of a dead letter queue [1]. Your best bet would be to copy and modify the Datastore code in Apache Beam and add support for such a policy, which outputs failed inserts into something like a dead letter queue. You can us…
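
A minimal sketch of that dead-letter pattern, assuming the copied-and-modified write logic lives behind a hypothetical commitToDatastore helper (the real DatastoreWriterFn batches mutations; this commits one entity at a time for brevity):

    import com.google.datastore.v1.Entity;
    import com.google.datastore.v1.client.DatastoreException;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionTuple;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.beam.sdk.values.TupleTagList;

    // One tag per output: successfully written entities and dead-lettered ones.
    final TupleTag<Entity> successTag = new TupleTag<Entity>() {};
    final TupleTag<Entity> deadLetterTag = new TupleTag<Entity>() {};

    PCollectionTuple results = entities.apply("WriteWithDeadLetter",
        ParDo.of(new DoFn<Entity, Entity>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            try {
              commitToDatastore(c.element());  // hypothetical wrapper around the copied write code
              c.output(c.element());
            } catch (DatastoreException e) {
              // Send the failed entity to the dead-letter output instead of failing the bundle.
              c.output(deadLetterTag, c.element());
            }
          }
        }).withOutputTags(successTag, TupleTagList.of(deadLetterTag)));

    // Failed entities can then be written to GCS, BigQuery, Pub/Sub, etc.
    PCollection<Entity> failedEntities = results.get(deadLetterTag);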

Re: How to catch exceptions while using DatastoreV1 API

2017-10-16 Thread Aleksandr
Hello, take a look at flushBatch in https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java. You can create your own implementation based on that code (do not throw the exception there). On 16 Oct 2017 at 8:48 PM, … wrote…
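
To illustrate, a stripped-down sketch of a non-throwing flushBatch. The real method in DatastoreV1.java also retries with backoff before giving up, which is omitted here; failedMutations is an assumed field for collecting the failures:

    import com.google.datastore.v1.CommitRequest;
    import com.google.datastore.v1.Mutation;
    import com.google.datastore.v1.client.Datastore;
    import com.google.datastore.v1.client.DatastoreException;
    import java.util.ArrayList;
    import java.util.List;

    // Assumed field on the writer DoFn, holding mutations whose commit failed.
    private final List<Mutation> failedMutations = new ArrayList<>();

    // Mirrors the shape of DatastoreV1's flushBatch, minus retries: a failed
    // commit is recorded instead of being rethrown, so the bundle keeps going.
    private void flushBatch(Datastore datastore, List<Mutation> mutations) {
      CommitRequest.Builder commitRequest = CommitRequest.newBuilder();
      commitRequest.addAllMutations(mutations);
      commitRequest.setMode(CommitRequest.Mode.NON_TRANSACTIONAL);
      try {
        datastore.commit(commitRequest.build());
      } catch (DatastoreException e) {
        failedMutations.addAll(mutations);
      }
      mutations.clear();
    }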

Re: How to catch exceptions while using DatastoreV1 API

2017-10-16 Thread Derek Hao Hu
I see. Thanks Lukasz. In that case, do you think there is an easy/clean way to implement the behavior I explained: "fail after a certain number of retries and then write the failed data to an external datasource"? I'm not sure using the [google-cloud-java](https://googlecloudplatform.github.io/…
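
One way to approximate "fail after N retries, then write elsewhere" without leaving Beam is to reuse the same backoff utilities DatastoreV1 itself uses. A sketch, where writeToFallback is a hypothetical hook for the external datasource:

    import com.google.api.client.util.BackOff;
    import com.google.api.client.util.BackOffUtils;
    import com.google.api.client.util.Sleeper;
    import com.google.datastore.v1.CommitRequest;
    import com.google.datastore.v1.client.Datastore;
    import com.google.datastore.v1.client.DatastoreException;
    import org.apache.beam.sdk.util.FluentBackoff;
    import org.joda.time.Duration;

    static final int MAX_RETRIES = 5;

    static void commitWithBoundedRetries(Datastore datastore, CommitRequest request)
        throws Exception {
      Sleeper sleeper = Sleeper.DEFAULT;
      BackOff backoff = FluentBackoff.DEFAULT
          .withMaxRetries(MAX_RETRIES)
          .withInitialBackoff(Duration.standardSeconds(5))
          .backoff();
      while (true) {
        try {
          datastore.commit(request);
          return;
        } catch (DatastoreException e) {
          // BackOffUtils.next returns false once the retry budget is exhausted.
          if (!BackOffUtils.next(sleeper, backoff)) {
            writeToFallback(request);
            return;
          }
        }
      }
    }

    // Hypothetical fallback: persist the failed batch to an external datasource.
    static void writeToFallback(CommitRequest request) { /* e.g. write to GCS/BigQuery */ }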

Re: How to catch exceptions while using DatastoreV1 API

2017-10-16 Thread Lukasz Cwik
That source is not available to you, as it is part of the Dataflow service.

Re: How to catch exceptions while using DatastoreV1 API

2017-10-16 Thread Derek Hao Hu
Thanks Lukasz! "For an unbounded (streaming) pipeline, Dataflow will retry the bundle forever until the pipeline is cancelled by the user." Can you help point out where this behavior is implemented? I'd like to take a look and see if it is possible to modify it a bit (e.g. write this to an external datasource)…

Re: How to catch exceptions while using DatastoreV1 API

2017-10-16 Thread Lukasz Cwik
It depends on the runner, but the exception that is thrown is per bundle processed, and it is up to the runner to choose what to do with bundles that fail. For a bounded (batch) pipeline, Dataflow will fail the pipeline after a fixed number of retries of each bundle. For an unbounded (streaming) pipeline, Dataflow will retry the bundle forever until the pipeline is cancelled by the user.

How to catch exceptions while using DatastoreV1 API

2017-10-15 Thread Derek Hao Hu
Hi, I'm using the DatastoreV1 API to write data into Datastore. I've briefly gone through the implementation, and it seems the Write transform will throw a DatastoreException (https://github.com/apache/beam/blob/1bd17d1b95a6b27331626fa9bdbaa723969b710d/sdks/java/io/google-cloud-platform/src/main/java…
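
For context, the write in question is the one-liner below (the project id is a placeholder). Because the commit happens inside DatastoreIO's internal DoFn, the DatastoreException fails the bundle and there is no user-code scope in which to catch it:

    import com.google.datastore.v1.Entity;
    import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
    import org.apache.beam.sdk.values.PCollection;

    static void writeEntities(PCollection<Entity> entities) {
      // Batches and commits entities internally; a failed commit surfaces as a
      // DatastoreException that fails the bundle rather than reaching user code.
      entities.apply("WriteToDatastore",
          DatastoreIO.v1().write().withProjectId("my-project-id"));
    }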