Hi all, I’m working on using Flink for a variety of streaming jobs that will be processing very high-volume streams. I want to be able to update a job’s software with minimal impact on the processing of the data. What I don’t understand is the best way to update the software running a job. From what I gather, the way it works today is that I would have to shut down the first job, ensure that it properly snapshots its state, and then start up a new job from that snapshot. My concern is that this may take a relatively long time and cause problems with the SLAs I have with my users.
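If I understand correctly, the flow would look roughly like the following with the Flink CLI (the job ID, savepoint path, and jar name here are made up for illustration):

```shell
# Trigger a savepoint and stop the running job in one step
flink stop --savepointPath hdfs:///flink/savepoints <job-id>

# Deploy the updated jar, resuming from the savepoint that was just taken
flink run -s hdfs:///flink/savepoints/savepoint-xxxxxx updated-job.jar
```

The gap between the stop and the new job catching up is exactly the downtime window I’m worried about.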
Does Flink provide more nuanced ways of upgrading a job’s software? Are there folks out there who are working on this sort of problem, either within Flink or around it? Thank you for any help, thoughts, etc. you may have. -Bruce