Hi Thiago,

Note that Dataflow has a custom implementation of Pub/Sub interaction, so the code you see in PubsubIO in the Beam codebase does not necessarily reflect how Dataflow handles Pub/Sub.
Dataflow acks messages as soon as they are first checkpointed, so the first step in your pipeline that introduces a GroupByKey operation is the point at which messages get acked. This means the checkpointed state in a streaming Dataflow job reading from Pub/Sub is significant: once a message is acked, that checkpointed state is the only remaining copy of it. A job that is cancelled without draining can therefore end up with messages acked in the subscription that were never fully processed by your pipeline.

On Mon, Oct 5, 2020 at 1:28 PM Thiago Chiarato <[email protected]> wrote:
> Hi,
> I'm trying to discover how and when Dataflow acknowledges an in-flight
> message from PubSub.
> Could you please help me with where I should start investigating this
> behavior of DataflowRunner and PubSub?
>
> Best regards,
> Thiago
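P.S. The ack-at-first-checkpoint semantics above can be made concrete with a toy simulation. This is plain Python with no Beam dependency, and every name in it (FakeSubscription, run_pipeline, the stage structure) is invented for illustration; it is not Dataflow's actual implementation, just a sketch of why cancelling without draining can lose acked messages:

```python
# Toy model of ack-at-first-checkpoint semantics (illustrative only,
# not Dataflow's real code).

class FakeSubscription:
    """Tracks which message IDs have been acked on the subscription."""
    def __init__(self, messages):
        self.inflight = dict(messages)   # message id -> payload
        self.acked = set()

    def ack(self, msg_id):
        self.acked.add(msg_id)
        self.inflight.pop(msg_id, None)

def run_pipeline(sub, cancel_after_checkpoint):
    """Reads messages, checkpoints them at the first GroupByKey-like
    stage (acking them at that point), then runs a downstream step.
    Cancelling without draining discards the checkpointed state."""
    fully_processed = []

    # Stage 1: first checkpoint -- messages are acked here, *before*
    # any downstream processing has happened.
    checkpointed = []
    for msg_id, payload in list(sub.inflight.items()):
        checkpointed.append(payload)
        sub.ack(msg_id)

    # Simulate "cancel without drain": checkpointed state is thrown away.
    if cancel_after_checkpoint:
        return fully_processed   # nothing ever reached the sink

    # Stage 2: downstream processing (only runs if the job keeps going).
    for payload in checkpointed:
        fully_processed.append(payload.upper())
    return fully_processed

sub = FakeSubscription({"m1": "a", "m2": "b"})
out = run_pipeline(sub, cancel_after_checkpoint=True)
print(sorted(sub.acked))   # ['m1', 'm2'] -- acked on the subscription
print(out)                 # []           -- but never fully processed
```

The point the simulation makes is the same as above: after the first checkpoint the subscription has already acked the messages, so if the job is cancelled (rather than drained) before downstream steps finish, those messages are gone from Pub/Sub but never produced output.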
