Hi Andrew! It would be great to know whether what Aljoscha described works for you. Ideally, this costs no more than a failure/recovery cycle, which is typically also what you get with rolling upgrades.
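For reference, the savepoint-shutdown-restore cycle Aljoscha describes (quoted below) looks roughly like this with the Flink CLI. This is just a sketch; the job ID, savepoint path, and jar name are placeholders:

    # 1) Trigger a savepoint for the running job; this prints the savepoint path
    bin/flink savepoint <jobID>

    # 2) Cancel the job
    bin/flink cancel <jobID>

    # 3) Update machines / Flink / the job jar as needed

    # 4) Resume the updated job from the savepoint path printed in step 1
    bin/flink run -s hdfs:///flink/savepoints/savepoint-abc123 my-updated-job.jar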
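And to illustrate the idempotent-writes requirement for the two-jobs approach below: a sink that does keyed upserts is one way to let both jobs (and replays after a recovery) write concurrently and still converge to the same state. Here is a minimal Java sketch; KeyValueClient is a hypothetical stand-in for whatever store client you use, not a Flink API:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    // Sketch: writes are keyed upserts, so two jobs emitting the same
    // (key, value) records can run side by side without corrupting state.
    public class IdempotentUpsertSink extends RichSinkFunction<Tuple2<String, String>> {

        private transient KeyValueClient client; // hypothetical KV-store client

        @Override
        public void open(Configuration parameters) throws Exception {
            client = KeyValueClient.connect("store-host:1234"); // placeholder address
        }

        @Override
        public void invoke(Tuple2<String, String> record) throws Exception {
            // put() overwrites the value for the key, so repeating the same
            // write (from the second job, or after a restart) is harmless.
            client.put(record.f0, record.f1);
        }

        @Override
        public void close() throws Exception {
            if (client != null) {
                client.close();
            }
        }
    }

The same property helps beyond the upgrade scenario: sinks can see replayed records after a failure, and idempotent writes keep those duplicates harmless as well.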
Best,
Stephan

On Tue, Dec 20, 2016 at 6:27 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

> Hi,
> zero-downtime updates are currently not supported. What is supported in
> Flink right now is a savepoint-shutdown-restore cycle. With this, you
> first draw a savepoint (which is essentially a checkpoint with some
> metadata), then you cancel your job, then you do whatever you need to do
> (update machines, update Flink, update the job), and then you restore
> from the savepoint.
>
> A possible solution for a zero-downtime update would be to draw a
> savepoint, then start a second Flink job from that savepoint, and then
> shut down the first job. With this, your data sinks would need to be able
> to handle being written to by two jobs at the same time, i.e. writes
> should probably be idempotent.
>
> This is the link to the savepoint docs:
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/savepoints.html
>
> Does that help?
>
> Cheers,
> Aljoscha
>
> On Fri, 16 Dec 2016 at 18:16 Andrew Hoblitzell <ahoblitz...@salesforce.com> wrote:
>
>> Hi. Does Apache Flink currently have support for zero downtime or the
>> ability to do rolling upgrades?
>>
>> If so, what are concerns to watch for and what best practices might
>> exist? Are there version management and data inconsistency issues to
>> watch for?