Hey Encho,
The Flink Runner acknowledges messages through PubSubIO's
`CheckpointMark#finalizeCheckpoint()` method.
The Flink Runner wraps the PubSubIO source via the
UnboundedSourceWrapper. When Flink takes a checkpoint of the running
Beam streaming job, the wrapper will retrieve the CheckpointMarks from
the PubSubIO source.
When the Checkpoint is completed, there is a callback which informs the
wrapper (`notifyCheckpointComplete()`) and calls `finalizeCheckpoint()`
on all the generated CheckpointMarks.
Hope that helps debugging your problem. I don't have an explanation why
this doesn't work for the last records in your PubSub queue. It
shouldn't make a difference for how the Flink Runner does checkpointing.
Best,
Max
On 10.09.18 18:17, Encho Mishinev wrote:
Hello,
I am using Flink runner with Apache Beam 2.6.0. I was wondering if there
is information on when exactly the runner acknowledges a pubsub message
when reading from PubsubIO?
My problem is that whenever there are a few messages left in a
subscription my streaming job never really seems to acknowledge them
all. For example is a subscription has 100,000,000 messages in total,
the job will go through about 99,990,000 and then keep reading the last
few thousand and seemingly never acknowledge them.
Some clarity on when the acknowledgement happens in the pipeline might
help me debug this problem.
Thanks!