The deployment of the job is done by Terraform. I verified it, and it seems
that Terraform handles this incorrectly under the hood: it stops the current
job and starts a new one. Thanks for the information!
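
For reference, here is a rough sketch (untested; the project, region, bucket,
subscription, and job name are placeholders) of what relaunching the same
streaming pipeline with the Dataflow "update" option could look like when
submitting directly with the Beam Python SDK, instead of letting the
Terraform apply stop the job and create a new one:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# All resource names below are placeholders.
options = PipelineOptions(
    runner='DataflowRunner',
    project='my-project',
    region='us-central1',
    temp_location='gs://my-bucket/tmp',
    streaming=True,
    job_name='my-streaming-job',  # must match the name of the running job
    update=True,                  # ask Dataflow to update the job in place
)

with beam.Pipeline(options=options) as p:
    (p
     | 'Read' >> beam.io.ReadFromPubSub(
           subscription='projects/my-project/subscriptions/my-sub')
     | 'Window' >> beam.WindowInto(beam.window.FixedWindows(10 * 60))
     | 'Process' >> beam.Map(lambda msg: msg))  # existing transforms go here

As far as I understand, Dataflow only accepts such an update when the new
graph is compatible with the running one; otherwise draining first, as
suggested below, seems to be the safer path.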

On Mon, 15 Apr 2024 at 6:42 PM Robert Bradshaw via user <
user@beam.apache.org> wrote:

> Are you draining[1] your pipeline or simply canceling it and starting a
> new one? Draining should close open windows and attempt to flush all
> in-flight data before shutting down. For PubSub you may also need to read
> from subscriptions rather than topics to ensure messages are processed by
> either one or the other.
>
> [1]
> https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline#drain
>
> On Mon, Apr 15, 2024 at 9:33 AM Juan Romero <jsrf...@gmail.com> wrote:
>
>> Hi guys. Good morning.
>>
>> I have done some tests in Apache Beam on Dataflow to see whether I can
>> do a hot update or hot swap while the pipeline is processing a batch of
>> messages that fall within a 10-minute time window. What I saw is that
>> when I do a hot update on the pipeline while there are still messages in
>> the time window (before they are sent to the target), the current job is
>> shut down and Dataflow creates a new one. The problem is that I seem to
>> be losing the messages that were being processed by the old job; they
>> are not picked up by the new one, which means we are losing data.
>>
>> Can you help me or recommend a strategy?
>>
>> Thanks!!
>>
>
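
Also, regarding the suggestion above about reading from subscriptions rather
than topics, my understanding is roughly as follows (the resource names in
this sketch are placeholders):

import apache_beam as beam

# Reading from a topic: the runner creates a job-scoped subscription, so a
# replacement job attaches to a brand-new subscription and never sees the
# messages that were still pending for the old job.
read_from_topic = beam.io.ReadFromPubSub(
    topic='projects/my-project/topics/my-topic')

# Reading from a pre-created subscription: the backlog and unacknowledged
# messages stay on that subscription, so the replacement job can pick up
# where the old one left off.
read_from_subscription = beam.io.ReadFromPubSub(
    subscription='projects/my-project/subscriptions/my-sub')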
