Thanks a lot Jeff

On Mon, Jun 1, 2020 at 11:17 AM Jeff Klukas <jklu...@mozilla.com> wrote:
> Correct. If you ignore the error on the single element, the corresponding
> PubSub message will be ack'd just like everything else in the bundle.
> PubsubIO provides no handle for preventing acks per-message.
>
> In practice, if you have some messages that cause errors that are not
> retryable, you may want to split those into a separate collection that
> publishes to a separate "error" topic that you can inspect and handle
> separately. See [0] for an intro to some basic API affordances that Beam
> provides for exception handling.
>
> [0]
> https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/transforms/MapElements.html#exceptionsVia-org.apache.beam.sdk.transforms.InferableFunction-
>
> On Mon, Jun 1, 2020 at 1:56 PM KV 59 <kvajjal...@gmail.com> wrote:
>
>> Hi Jeff,
>>
>> Thanks for the response. Yes, I have a Java pipeline, and yes, it is a
>> simple transformation. DoFns work on bundles, so if a single element in
>> the bundle fails and we ignore the error on that element, is the bundle
>> still considered successfully processed? Am I correct that it would then
>> just ACK everything in the bundle?
>>
>> Kishore
>>
>> On Mon, Jun 1, 2020 at 10:27 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>>
>>> Is this a Python or Java pipeline?
>>>
>>> I'm familiar with PubsubIO in Java, though I expect the behavior in
>>> Python is similar. It will ack messages at the first checkpoint step in
>>> the pipeline, so the behavior in your case depends on whether there is a
>>> GroupByKey operation happening before the exception is thrown.
>>>
>>> If there are no GroupByKey operations (so your pipeline is basically
>>> just transforming single messages and republishing to a new topic), then
>>> I would expect you are safe and messages will not be ACK'd unless they
>>> have been published to the output topic.
>>>
>>> And yes, if an exception is thrown, the whole bundle would fail, so
>>> those messages would be picked up by another worker and retried.
>>>
>>> There is a chance of data loss if you have a pipeline that needs to
>>> checkpoint data to disk and you shut down the pipeline without draining
>>> the data. In that case, messages may have been ack'd to Pub/Sub and held
>>> durably only in the checkpoint state in Dataflow, so shutting down the
>>> pipeline uncleanly would lose data.
>>>
>>> On Mon, Jun 1, 2020 at 1:09 PM KV 59 <kvajjal...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a Dataflow pipeline with a PubSub UnboundedSource; the pipeline
>>>> transforms the data and writes to another PubSub topic. I have a
>>>> question regarding exceptions in DoFns. If I choose to ignore an
>>>> exception while processing an element, does it ACK the bundle?
>>>>
>>>> Also, if I were to just throw the exception, my understanding is that
>>>> the Dataflow Runner will fail the whole bundle and keep retrying until
>>>> the whole bundle succeeds. Am I correct?
>>>>
>>>> Thanks for your response
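
For reference, a minimal sketch (Beam Java SDK) of the error-splitting
approach Jeff describes above, using MapElements#exceptionsVia from [0].
The subscription and topic names and the transform() helper below are
placeholders for illustration, not taken from this thread:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.WithFailures.Result;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class DeadLetterTopicExample {

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    PCollection<String> messages =
        pipeline.apply("ReadFromPubSub",
            PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/input-sub"));

    // exceptionsVia catches exceptions thrown by the mapping function and
    // diverts the offending elements into a separate failures collection,
    // instead of the exception failing (and retrying) the whole bundle.
    Result<PCollection<String>, String> result =
        messages.apply("Transform",
            MapElements.into(TypeDescriptors.strings())
                .via((String msg) -> transform(msg)) // may throw on bad input
                .exceptionsInto(TypeDescriptors.strings())
                .exceptionsVia(ee ->
                    ee.element() + " : " + ee.exception().getMessage()));

    // Successful elements go to the main output topic ...
    result.output().apply("WriteOutput",
        PubsubIO.writeStrings().to("projects/my-project/topics/output"));

    // ... while failed elements go to a separate "error" topic.
    result.failures().apply("WriteErrors",
        PubsubIO.writeStrings().to("projects/my-project/topics/errors"));

    pipeline.run();
  }

  // Placeholder for the actual per-element transformation.
  private static String transform(String msg) {
    return msg.trim();
  }
}

Because failing elements are captured and published to the error topic
rather than re-thrown, the bundle completes normally (so its Pub/Sub
messages are ack'd), while the bad elements remain available for separate
inspection and handling.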