Hi all,
We have a number of Beam pipelines processing unbounded streams sourced
from Kafka on the Flink runner, and we're very happy with both the
platform and its performance!
The problem is with shutting down the pipelines. For version upgrades,
system maintenance, load management, etc., it would be nice to be able
to gracefully shut these down under software control, but we haven't
been able to find a way to do so. We're in good shape on checkpointing
and then cleanly recovering, but all of our shutdowns are destructive
to Flink or the Flink TaskManager.
Methods tried:
1) Calling cancel() on the FlinkRunnerResult returned from pipeline.run()
This would be our preferred method, but p.run() doesn't return until
termination, and even if it did, the runner code simply throws:

    throw new UnsupportedOperationException(
        "FlinkRunnerResult does not support cancel.");

so this doesn't appear to be a near-term option.
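
For concreteness, here's roughly the pattern we were hoping for (the
class and main() wiring are just a sketch; only pipeline.run() and
PipelineResult.cancel() are actual Beam API):

    import java.io.IOException;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.PipelineResult;

    public class ShutdownSketch {
        public static void main(String[] args) throws IOException {
            Pipeline pipeline = Pipeline.create();
            // ... attach the KafkaIO source and our transforms here ...

            // On the Flink runner this call blocks until the job
            // terminates, so we never get a live handle to cancel:
            PipelineResult result = pipeline.run();

            // And even with a handle, FlinkRunnerResult throws
            // UnsupportedOperationException here:
            result.cancel();
        }
    }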
2) Injecting a "termination" message into the pipeline via Kafka
The message does get through, but calling exit() from a stage in the
pipeline also terminates the Flink TaskManager.
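
A minimal sketch of what we tried (the "SHUTDOWN" marker value is our
own convention, not anything from Beam or Kafka):

    import org.apache.beam.sdk.transforms.DoFn;

    // Pass-through stage that watches for our poison-pill message.
    public class TerminateFn extends DoFn<String, String> {
        @ProcessElement
        public void processElement(ProcessContext c) {
            if ("SHUTDOWN".equals(c.element())) {
                // This runs inside the TaskManager's JVM, so exiting
                // here kills the TaskManager, not just our pipeline.
                System.exit(0);
            }
            c.output(c.element());
        }
    }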
3) Injecting a "sleep" message, then manually restarting the cluster
This is our current method: we pause the data at the source, flood all
branches of the pipeline with a "we're going down" message so the stages
can do a bit of housekeeping, then hard-stop the entire environment and
re-launch with the new version.
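
The per-stage hook looks something like this ("GOING_DOWN" and
flushLocalState() are stand-ins for our own housekeeping code):

    import org.apache.beam.sdk.transforms.DoFn;

    // Each branch of the pipeline carries one of these so it can
    // clean up before we hard-stop the environment.
    public class HousekeepingFn extends DoFn<String, String> {
        @ProcessElement
        public void processElement(ProcessContext c) {
            if ("GOING_DOWN".equals(c.element())) {
                flushLocalState();  // flush buffers, close clients, etc.
                return;             // swallow the marker itself
            }
            c.output(c.element());
        }

        private void flushLocalState() {
            // stage-specific cleanup goes here
        }
    }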
Is there a "Best Practice" method for gracefully terminating an
unbounded pipeline from within the pipeline or from the mainline that
launches it?
Thanks!
Wayne
--
Wayne Collins
dades.ca Inc.
mailto:[email protected]
cell:416-898-5137