Thanks a lot, Jeff

On Mon, Jun 1, 2020 at 11:17 AM Jeff Klukas <jklu...@mozilla.com> wrote:

> Correct. If you ignore the error on the single element, the corresponding
> PubSub message will be ack'd just like everything else in the bundle.
> PubsubIO provides no handle for preventing acks per-message.
>
> In practice, if you have some messages that cause non-retryable errors,
> you may want to split those off into a separate collection and publish it
> to a dedicated "error" topic that you can inspect and handle separately.
> See [0] for an intro to some basic API affordances that Beam provides for
> exception handling.
>
> [0]
> https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/transforms/MapElements.html#exceptionsVia-org.apache.beam.sdk.transforms.InferableFunction-
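>
> For illustration, here's a rough sketch of that pattern using the
> exceptionsVia hook from [0], assuming string payloads. The topic names
> and the transform() helper are placeholders, not anything from your
> pipeline:
>
>   import java.util.Map;
>   import org.apache.beam.sdk.Pipeline;
>   import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
>   import org.apache.beam.sdk.transforms.MapElements;
>   import org.apache.beam.sdk.transforms.WithFailures;
>   import org.apache.beam.sdk.values.KV;
>   import org.apache.beam.sdk.values.PCollection;
>   import org.apache.beam.sdk.values.TypeDescriptors;
>
>   Pipeline p = Pipeline.create(options);  // options built elsewhere
>
>   // Transform each message; exceptions are captured per element instead
>   // of failing the bundle.
>   WithFailures.Result<PCollection<String>, KV<String, Map<String, String>>> result =
>       p.apply("ReadInput",
>               PubsubIO.readStrings().fromTopic("projects/my-project/topics/input"))
>        .apply("Transform",
>               MapElements.into(TypeDescriptors.strings())
>                   .via((String msg) -> transform(msg))  // placeholder transform
>                   .exceptionsVia(new WithFailures.ExceptionAsMapHandler<String>() {}));
>
>   // Successful elements go to the main output topic.
>   result.output()
>         .apply("WriteOutput",
>                PubsubIO.writeStrings().to("projects/my-project/topics/output"));
>
>   // Failed elements, paired with exception metadata, go to a separate
>   // "error" topic for later inspection.
>   result.failures()
>         .apply("FormatFailures",
>                MapElements.into(TypeDescriptors.strings())
>                    .via((KV<String, Map<String, String>> f) ->
>                            f.getKey() + " | " + f.getValue()))
>         .apply("WriteErrors",
>                PubsubIO.writeStrings().to("projects/my-project/topics/errors"));
>
>   p.run();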
>
>
> On Mon, Jun 1, 2020 at 1:56 PM KV 59 <kvajjal...@gmail.com> wrote:
>
>> Hi Jeff,
>>
>> Thanks for the response. Yes, I have a Java pipeline, and yes, it is a
>> simple transformation. Since DoFns work on bundles, if a single element
>> in the bundle fails and we ignore the error on that element, is the
>> bundle still considered successfully processed? Would it then just ACK
>> everything in the bundle?
>>
>> Kishore
>>
>> On Mon, Jun 1, 2020 at 10:27 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>>
>>> Is this a Python or Java pipeline?
>>>
>>> I'm familiar with PubsubIO in Java, though I expect the behavior in
>>> Python is similar. It will ack messages at the first checkpoint step in the
>>> pipeline, so the behavior in your case depends on whether there is a
>>> GroupByKey operation happening before the exception is thrown.
>>>
>>> If there are no GroupByKey operations (so your pipeline is basically
>>> just transforming single messages and republishing to a new topic), then I
>>> would expect you are safe: messages will not be ACK'd until they have
>>> been published to the output topic.
>>>
>>> And yes, if an exception is thrown the whole bundle would fail, so those
>>> messages would be picked up by another worker and retried.
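>>>
>>> To make the two choices concrete, here's a rough sketch (the transform()
>>> helper is a placeholder). Catching the exception lets the bundle succeed,
>>> so the Pub/Sub message is ack'd even though the element was dropped;
>>> rethrowing fails the whole bundle and the runner retries it:
>>>
>>>   import org.apache.beam.sdk.transforms.DoFn;
>>>
>>>   static class TransformFn extends DoFn<String, String> {
>>>     @ProcessElement
>>>     public void processElement(@Element String msg, OutputReceiver<String> out) {
>>>       try {
>>>         out.output(transform(msg));  // placeholder transform
>>>       } catch (RuntimeException e) {
>>>         // Swallowing the error drops this element, but the bundle still
>>>         // succeeds and the message is ack'd. Rethrow instead if you want
>>>         // the whole bundle to fail and be retried:
>>>         // throw e;
>>>       }
>>>     }
>>>   }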
>>>
>>> There is a chance of data loss if you have a pipeline that needs to
>>> checkpoint data to disk and you shut down the pipeline without draining the
>>> data. In that case, messages may have been ack'd to pubsub and held durably
>>> only in the checkpoint state in Dataflow, so shutting down the pipeline
>>> uncleanly would lose data.
>>>
>>> On Mon, Jun 1, 2020 at 1:09 PM KV 59 <kvajjal...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a Dataflow pipeline with a PubSub UnboundedSource. The pipeline
>>>> transforms the data and writes to another PubSub topic. I have a question
>>>> regarding exceptions in DoFns: if I choose to ignore an exception while
>>>> processing an element, does it ACK the bundle?
>>>>
>>>> Also, if I were to just throw the exception, my understanding is that the
>>>> Dataflow runner will fail the whole bundle and keep retrying until the
>>>> whole bundle is successful. Am I correct?
>>>>
>>>> Thanks for your response
>>>>
>>>
