Hi Barisa,

Since there has been no direct answer to your specific question yet, let me ask a more general one: have you considered using the Community Edition of Ververica Platform [1]? It comes with complete lifecycle management for Flink jobs on K8s. It also exposes a full REST API for integration into CI/CD workflows, so if you do not need the UI, you can simply ignore it. Community Edition is permanently free for commercial use at any scale.
I see that you are already using Helm, so installation could be very straightforward [2]. The documentation has a somewhat more comprehensive "Getting started" guide [3].

[1] https://www.ververica.com/blog/announcing-ververica-platform-community-edition
[2] https://www.ververica.com/getting-started
[3] https://docs.ververica.com/getting_started/index.html

Best regards,

--
Alexander Fedulov | Solutions Architect
Ververica GmbH

On Wed, Apr 29, 2020 at 5:32 PM Barisa Obradovic <bbaj...@gmail.com> wrote:

> Hi, we are attempting to migrate our Flink cluster to K8s, and are looking
> into options for automating job upgrades. Has anyone here done it with an
> init container? Or is there a simpler way?
>
> 0: Let's assume we have a job manager with a few task managers running in
> a stateful set, managed with Helm.
>
> 1: A new Helm chart is published, and Helm attempts the upgrade. Since it's
> a stateful set, the new version of the job manager and task managers is
> started even while the old one is still running.
>
> 2: In the job manager pod there is an init container whose purpose is to
> find the currently running job manager with the previous version of the job
> (either via ZooKeeper or via a Kubernetes service that points to the
> currently running job manager). Once it finds it, it runs cancel with
> savepoint using the Flink CLI and passes the savepoint URL via a volume to
> the main container.
>
> 3: The job manager container starts, finds the savepoint, and restores the
> new version of the job with the state from the savepoint.
>
> 4: The new pods pass their health checks, so the old pods are destroyed by
> Kubernetes.
>
> What happens if there is no previous job manager running? The init
> container sees that and just exits without doing any other work.
>
> Caveat:
> Most of the solutions I have seen use operators, which feel quite a bit
> more complex. Yet since I haven't found any solution using an init
> container, I'm guessing I'm missing something; I just can't figure out what.
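
For reference, step 2 of the quoted workflow (including the "no previous job manager" case) could be sketched roughly as below. This is only a sketch under assumptions that are not in the original message: it talks to Flink's REST API (GET /jobs, POST /jobs/:jobid/savepoints with "cancel-job", and the matching status endpoint) instead of the Flink CLI, and the environment variables OLD_JM_URL, SAVEPOINT_DIR and HANDOVER_FILE, as well as all function names, are hypothetical.

```python
"""Sketch of an init container that cancels the old job with a savepoint."""
import json
import os
import time
import urllib.error
import urllib.request


def http_json(url, payload=None):
    """GET the URL (or POST when a payload is given) and decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def find_running_job(jobs_response):
    """Return the id of the first RUNNING job in a GET /jobs reply, else None."""
    for job in jobs_response.get("jobs", []):
        if job.get("status") == "RUNNING":
            return job["id"]
    return None


def savepoint_location(status_response):
    """Return the savepoint path once the operation has completed, else None."""
    if status_response.get("status", {}).get("id") != "COMPLETED":
        return None
    operation = status_response.get("operation", {})
    if "failure-cause" in operation:
        raise RuntimeError(f"savepoint failed: {operation['failure-cause']}")
    return operation["location"]


def cancel_with_savepoint(base_url, target_dir, fetch=http_json,
                          poll_seconds=2.0, max_polls=150):
    """Cancel the running job with a savepoint and return the savepoint path.

    Returns None when no job is running or the old job manager is unreachable.
    """
    try:
        jobs = fetch(f"{base_url}/jobs")
    except urllib.error.URLError:
        # Assumption: "unreachable" means there is no previous job manager.
        # A real setup should tell this apart from a transient network error.
        return None
    job_id = find_running_job(jobs)
    if job_id is None:
        return None
    trigger = fetch(
        f"{base_url}/jobs/{job_id}/savepoints",
        {"target-directory": target_dir, "cancel-job": True},
    )
    status_url = f"{base_url}/jobs/{job_id}/savepoints/{trigger['request-id']}"
    for _ in range(max_polls):
        location = savepoint_location(fetch(status_url))
        if location is not None:
            return location
        time.sleep(poll_seconds)
    raise TimeoutError("savepoint did not complete in time")


def main():
    base_url = os.environ.get("OLD_JM_URL")      # hypothetical variable
    target_dir = os.environ.get("SAVEPOINT_DIR")  # hypothetical variable
    handover = os.environ.get("HANDOVER_FILE", "/handover/savepoint-path")
    if not base_url or not target_dir:
        print("OLD_JM_URL or SAVEPOINT_DIR not set, nothing to do")
        return
    location = cancel_with_savepoint(base_url, target_dir)
    if location:
        # The main container would read this file from the shared volume.
        with open(handover, "w") as handle:
            handle.write(location)


if __name__ == "__main__":
    main()
```

The main container would then check the handover file and, if it exists, start the new job from that savepoint (step 3). Note that this sketch does nothing about the window between cancelling the old job and the new one becoming healthy, nor about a failed upgrade after the old job has already been cancelled.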