It's not a recommended approach, but if you're able to handle this "side effect" downstream (e.g. ignore the "wrong" / incomplete results), then you should be OK.
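For illustration, here is a minimal Beam sketch of the write path as I understand your setup: fixed 10-minute windows flushed to Parquet via FileIO. The output prefix, transform names, and the Avro schema are placeholders, not your actual code. As far as I remember, the default naming encodes the window (and pane) into each file name, so the extra pane fired by a drain just shows up as an additional, distinctly named file for the same 10-minute period - which matches the "two files" behavior you describe, and is why a consumer that lists everything under the period's prefix picks both partial files up transparently.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.beam.sdk.io.FileIO;
    import org.apache.beam.sdk.io.parquet.ParquetIO;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    // 'records' would be the PCollection decoded from Kinesis upstream;
    // 'schema' is a placeholder Avro schema for the Parquet files.
    static void writeParquet(PCollection<GenericRecord> records, Schema schema) {
      records
          // Group elements into fixed 10-minute event-time windows.
          .apply("Window10m",
              Window.<GenericRecord>into(FixedWindows.of(Duration.standardMinutes(10))))
          // Write one Parquet file per window pane; the default naming
          // embeds the window into the file name, so a second pane for the
          // same window lands in a second, separate file.
          .apply("WriteParquet",
              FileIO.<GenericRecord>write()
                  .via(ParquetIO.sink(schema))
                  .to("s3://my-bucket/events/") // placeholder output prefix
                  .withNaming(FileIO.Write.defaultNaming("part", ".parquet"))
                  .withNumShards(1));
    }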
On Mon, Nov 8, 2021 at 4:36 PM Pedro Facal <ped...@empathy.co> wrote:

> Hi David,
>
> Thanks for the quick response. Say our app writes to disk every 10
> minutes: the difference is that in one case a single window is emitted,
> and thus a single file is written, whereas if we drain and restart the
> pipeline we end up with two files (because two windows are emitted for
> the 10-minute period in which the drain happened).
>
> In our particular case this is OK - everything consuming the files
> downstream will just use the two "wrong" (i.e. partial) files
> transparently. So if this is the extent of the drain side effects, I
> think it's safe for us to do this for pipeline upgrades. Or am I missing
> something?
>
> Thanks,
>
> Pedro
>
> On Mon, 8 Nov 2021 at 15:58, David Morávek <d...@apache.org> wrote:
>
>> Hi Pedro,
>>
>> Draining basically means that all of the sources finish and advance
>> their watermark to the end of the global window, which fires all of the
>> triggers as a result. In other words, it triggers the ON_TIME results
>> from all of the unfinished windows, even though they might not have
>> seen a complete input.
>>
>> This behavior doesn't really play well with pipeline upgrades, because
>> you've already emitted "wrong" results from the pipeline.
>>
>> Best,
>> D.
>>
>> On Mon, Nov 8, 2021 at 3:19 PM Pedro Facal <ped...@empathy.co> wrote:
>>
>>> Hello,
>>>
>>> We have an Apache Beam streaming application running on Flink native
>>> Kubernetes. It consolidates AWS Kinesis records into Parquet files
>>> every few minutes.
>>>
>>> To manage the lifecycle of this app, we use the REST API to stop the
>>> job with a savepoint and then restart the cluster/job from said
>>> savepoint. This normally works as expected, but we run into problems
>>> when the data schema changes. In that case, as expected, stopping the
>>> job using "drain": true instead results in a proper upgrade without
>>> issues.
>>>
>>> To avoid overcomplicating our release workflows, we are evaluating the
>>> possibility of doing a "drain" restart every time we do a new release.
>>> However, we have come across the following:
>>>
>>> > Use the --drain flag if you want to terminate the job permanently.
>>> If you want to resume the job at a later point in time, then do not
>>> drain the pipeline because it could lead to incorrect results when the
>>> job is resumed [1].
>>>
>>> It's not clear what kind of "incorrect results" we could face here -
>>> can anybody elaborate? Our own tests show that we do not lose events
>>> from the Kinesis queue after the restart.
>>>
>>> Thanks,
>>>
>>> Pedro
>>>
>>> [1]
>>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/#terminating-a-job
>>>
>>> --
>>> Pedro Facal San Luis <ped...@empathy.co>
>>> Data Team Lead
>
> --
> Pedro Facal San Luis <ped...@empathy.co>
> Data Team Lead
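P.S. In case it's useful for your release automation, this is roughly what the stop-with-drain call looks like against the Flink REST API. A minimal Java 11 sketch, not your actual tooling: the /jobs/:jobid/stop endpoint and the "drain" / "targetDirectory" fields come from the documented REST API, while the class name, JobManager address, and savepoint bucket are placeholders.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class StopWithDrain {
      public static void main(String[] args) throws Exception {
        String restAddress = "http://localhost:8081"; // assumed JobManager REST address
        String jobId = args[0];                       // id of the job to stop

        // drain=true makes Flink emit a MAX_WATERMARK before taking the
        // savepoint, firing all unfinished windows; the target directory
        // below is a placeholder.
        String body =
            "{\"drain\": true, \"targetDirectory\": \"s3://my-bucket/savepoints\"}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(restAddress + "/jobs/" + jobId + "/stop"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());

        // The response carries a "request-id"; polling
        // /jobs/<jobId>/savepoints/<request-id> tells you when the savepoint
        // completed and where it was written, so the restart step can pick
        // the path up from there.
        System.out.println(response.body());
      }
    }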