Hi Pedro,

Draining basically means that all of the sources will finish and advance
their watermarks to the end of the global window, which in turn fires all
of the remaining triggers. In other words, it emits the _ON_TIME_ results
of all unfinished windows, even though they might not have seen their
complete input.

This behavior doesn't really play well with pipeline upgrades, because by
the time you restart, the pipeline has already emitted those "wrong"
(incomplete) results.
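
For illustration, here is a minimal Beam (Java SDK) sketch of the kind of
event-time aggregation this affects. The GenerateSequence source, the
five-minute windows and the AfterWatermark trigger are assumptions made up
for the example, not your actual pipeline; the point is only what happens
to the ON_TIME pane once draining pushes the watermark to the end of the
global window.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

public class DrainSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p
        // Stand-in unbounded source; in the real pipeline this would be the
        // Kinesis read.
        .apply(GenerateSequence.from(0).withRate(10, Duration.standardSeconds(1)))
        // Five-minute fixed windows that fire an ON_TIME pane once the
        // watermark passes the end of the window. On drain, the sources
        // advance their watermarks to the end of the global window, so this
        // pane fires for every still-open window, complete or not.
        .apply(Window.<Long>into(FixedWindows.of(Duration.standardMinutes(5)))
            .triggering(AfterWatermark.pastEndOfWindow())
            .withAllowedLateness(Duration.ZERO)
            .discardingFiredPanes())
        // Per-window aggregation; a drained run may emit partial counts here.
        .apply(Count.<Long>perElement());

    p.run().waitUntilFinish();
  }
}

A stop-with-savepoint without drain does not advance the watermark like
that: the open windows simply carry over through the savepoint and fire
later, once the real watermark catches up after the restart.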

Best,
D.


On Mon, Nov 8, 2021 at 3:19 PM Pedro Facal <ped...@empathy.co> wrote:

> Hello,
>
>    We have an Apache Beam streaming application running on Flink native
> Kubernetes. It consolidates AWS Kinesis records into Parquet files every
> few minutes.
>
>    To manage the lifecycle of this app, we use the REST API to stop the job
> with a savepoint and then restart the cluster/job from said savepoint. This
> normally works as expected, but we run into problems when the data schema
> changes. In that case, stopping the job with "drain:true" instead results
> in a proper upgrade without issues.
>
>    To avoid over-complicating our release workflows, we are evaluating
> doing a "drain" restart every time we do a new release. However, we have
> come across the following in the Flink documentation:
>
> > Use the --drain flag if you want to terminate the job permanently. If
> you want to resume the job at a later point in time, then do not drain the
> pipeline because it could lead to incorrect results when the job is resumed
> [1].
>
> It's not clear what kind of "incorrect results" we could face here - can
> anybody elaborate? Our own tests show that we do not lose events from the
> Kinesis stream after the restart.
>
> Thanks,
>
>     Pedro
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/#terminating-a-job
>
> --
> *Pedro Facal San Luis* <ped...@empathy.co>
> Data Team Lead
> Privacy Policy <https://www.empathy.co/privacy-policy/>
>
