Thanks Aleksandr and Lukasz! These are really helpful. I'll take a look at
the BigQueryIO and see if it's easy to implement something similar in our
repo. :)
But at a minimum we can always copy and paste DatastoreV1. :p
Cheers,
Derek
On Mon, Oct 16, 2017 at 11:21 AM, Lukasz Cwik wrote:
Your idea makes total sense and has been brought up before, mirroring the
concept of a dead letter queue [1].
Your best bet would be to copy and modify the Datastore code in Apache Beam
and add support for such a policy, which outputs failed inserts into
something like a dead letter queue.
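The policy being suggested could look roughly like the sketch below: retry each write a bounded number of times and, instead of throwing when retries are exhausted, hand the element to a dead-letter sink. This is a plain-Java illustration of the idea only; `DeadLetterWriter` and its shape are hypothetical, not part of the Beam API.

```java
import java.util.function.Consumer;

// Hypothetical sketch: retry a write a fixed number of times and, instead of
// throwing on persistent failure, route the element to a dead-letter sink.
class DeadLetterWriter<T> {
    private final int maxRetries;
    private final Consumer<T> sink;        // stands in for the real write (e.g. a Datastore RPC)
    private final Consumer<T> deadLetter;  // where elements that keep failing go

    DeadLetterWriter(int maxRetries, Consumer<T> sink, Consumer<T> deadLetter) {
        this.maxRetries = maxRetries;
        this.sink = sink;
        this.deadLetter = deadLetter;
    }

    void write(T element) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                sink.accept(element);
                return; // success, nothing more to do
            } catch (RuntimeException e) {
                if (attempt == maxRetries) {
                    // Do not rethrow: routing to the dead-letter output keeps
                    // the bundle from being retried forever.
                    deadLetter.accept(element);
                }
            }
        }
    }
}
```

In a real Beam pipeline the dead-letter sink would typically be a second output of the DoFn (an additional output collection) written to an external store, rather than a plain `Consumer`.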
Hello,
Take a look at flushBatch:
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
You can create your own implementation based on that code (do not throw an
exception there).
Written on 16 Oct 2017 at 8:48 PM:
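The advice above, catching the failure inside the batch flush rather than throwing, could be sketched as follows. `LenientBatchFlusher` is a hypothetical stand-in, not the actual DatastoreV1 code: it retries a commit a bounded number of times and returns the failed batch to the caller instead of raising.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical flushBatch-style method: instead of throwing when the commit
// keeps failing, return the failed entities so the caller can handle them.
class LenientBatchFlusher<T> {
    private final int maxAttempts;
    private final Consumer<List<T>> commit; // stands in for the Datastore commit RPC

    LenientBatchFlusher(int maxAttempts, Consumer<List<T>> commit) {
        this.maxAttempts = maxAttempts;
        this.commit = commit;
    }

    /** Returns the entities that could not be written; empty on success. */
    List<T> flushBatch(List<T> batch) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                commit.accept(batch);
                return List.of(); // all entities written
            } catch (RuntimeException e) {
                // Swallow and retry; fall through after the final attempt.
            }
        }
        // Caller can route these to a dead-letter store instead of failing the bundle.
        return new ArrayList<>(batch);
    }
}
```

Returning the failures (rather than throwing) is what prevents the runner from treating the whole bundle as failed and retrying it indefinitely.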
I see. Thanks Lukasz.
In that case, do you think there is an easy / clean way to implement the
behavior I explained: "fail after a certain number of retries and then
write the failed data to an external datasource"? I'm not sure using the
[google-cloud-java](
https://googlecloudplatform.github.io/
That source is not available to you as it is part of the Dataflow service.
On Mon, Oct 16, 2017 at 10:25 AM, Derek Hao Hu
wrote:
> Thanks Lukasz!
>
> "For an unbounded (streaming) pipeline, Dataflow will retry the bundle
> forever until the pipeline is cancelled by the user."
>
> Can you help point out where this behavior is implemented?
Thanks Lukasz!
"For an unbounded (streaming) pipeline, Dataflow will retry the bundle
forever until the pipeline is cancelled by the user."
Can you help point out where this behavior is implemented? I'd like to take
a look and see if it is possible to modify it a bit (e.g. write this to an
external datasource).
It depends on the runner: the exception that is thrown applies to the whole
bundle being processed, and it is up to the runner to decide what to do with
bundles that fail.
For a bounded (batch) pipeline, Dataflow will fail the pipeline after a
fixed number of retries of each bundle.
For an unbounded (streaming) pipeline, Dataflow will retry the bundle
forever until the pipeline is cancelled by the user.
Hi,
I'm using the DatastoreV1 API to write data into Datastore. I've briefly gone
through the implementation and it seems the Write transform will throw a
DatastoreException (
https://github.com/apache/beam/blob/1bd17d1b95a6b27331626fa9bdbaa723969b710d/sdks/java/io/google-cloud-platform/src/main/java