Hi,

We are trying to use Dataflow in production, and one of our main concerns right now is the "infinite retry" behavior, which might stall the whole pipeline.
For all the DoFns we've implemented ourselves, we've added error handling that catches and logs exceptions, so that a failing bundle is dropped instead of being retried. But we are a bit concerned about the Beam native transforms, which we cannot easily wrap, e.g. the PubSubIO and DatastoreV1 transforms.

A few days ago I asked a specific question in this group about how to catch exceptions in DatastoreV1 transforms. The recommended approaches were to either 1) duplicate the code in the current DatastoreV1 implementation and swallow the exception instead of throwing it, or 2) follow the implementation of BigQueryIO and add support for a custom retry policy. Both are feasible options, but I'm a bit concerned: doesn't that mean that eventually every Beam native transform will need to implement something like 2) if we want to use it in production?

So in short, what is the currently recommended approach or workaround for saying "just let this bundle fail and process the rest of the elements" instead of stalling the pipeline?

Thanks!

--
Derek Hao Hu
Software Engineer | Snapchat
Snap Inc.
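
For illustration, the catch-and-log handling described for custom DoFns can be sketched in plain Python. This is a minimal sketch only and does not use the Beam API: `process_bundle_safely` and `parse_positive_int` are hypothetical names, and the "bundle" is just a list. In a real pipeline the failed elements would more likely go to a separate dead-letter output than to an in-memory list.

```python
import logging
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")

logger = logging.getLogger("dead_letter_sketch")


def process_bundle_safely(
    elements: Iterable[T],
    fn: Callable[[T], R],
) -> Tuple[List[R], List[Tuple[T, str]]]:
    """Apply fn to each element, collecting failures instead of raising.

    Hypothetical helper: it stands in for the body of a user-written DoFn.
    Because no exception escapes, the runner has nothing to retry.
    """
    succeeded: List[R] = []
    dead_letters: List[Tuple[T, str]] = []
    for element in elements:
        try:
            succeeded.append(fn(element))
        except Exception as exc:  # deliberately broad: one bad element must not fail the rest
            logger.warning("Dropping element %r: %s", element, exc)
            dead_letters.append((element, repr(exc)))
    return succeeded, dead_letters


def parse_positive_int(raw: str) -> int:
    """Example per-element logic that can fail in two different ways."""
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"expected a positive integer, got {value}")
    return value


# Two of the four elements fail; the other two are still processed.
ok, failed = process_bundle_safely(["1", "oops", "3", "-4"], parse_positive_int)
```

The limitation raised in the question still applies: this only works for code we own, since the try/except has to sit inside the per-element function. A built-in transform that throws internally offers no such hook unless it exposes a retry policy or a failure output of its own.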