hi all--we've run into a gap (knowledge? design? tbd?) for our use cases
when deploying Flink jobs to start from savepoints using the job-cluster
mode in Kubernetes.

we're running ~15 different jobs, all in job-cluster mode, using a mix of
Flink 1.8.1 and 1.9.0, under GKE (Google Kubernetes Engine). these are all
long-running streaming jobs, all essentially acting as microservices. we're
using Helm charts to configure all of our deployments.

we have a number of use cases where we want to restart jobs from a
savepoint to replay recent events, e.g. when we've enhanced the job logic
or fixed a bug. but after the deployment we want to have the job resume
its "long-running" behavior, where any unplanned restarts resume from the
latest checkpoint.

the issue we run into is that any obvious/standard/idiomatic Kubernetes
deployment includes the savepoint argument in the configuration. if the Job
Manager container(s) have an unplanned restart, when they come back up they
will start from the savepoint instead of resuming from the latest
checkpoint. everything is working as configured, but that's not exactly
what we want. we want the savepoint argument to be transient somehow (only
used during the initial deployment), but Kubernetes doesn't really support
the concept of transient configuration.

i can see a couple of potential solutions that either involve custom code
in the jobs or custom logic in the container (e.g. a custom entrypoint
script that records that the configured savepoint has already been used in
a file on a persistent volume or GCS, and potentially when/why/by which
deployment). but these seem like unexpected and hacky solutions. before we
head down that road i wanted to ask:

   - is this already a solved problem that i've missed?
   - is this issue already on the community's radar?
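
for reference, here is a rough sketch of the custom-entrypoint idea mentioned above, written in Python. everything in it is an assumption for illustration only: the helper name `build_job_args`, the marker file location, and the `--fromSavepoint` flag name (passed in as a parameter so it can be swapped). it also assumes the marker file lives on storage that survives container restarts, and that the job is otherwise configured to recover from its latest checkpoint when no savepoint is given.

```python
from pathlib import Path
from typing import List, Optional


def build_job_args(
    base_args: List[str],
    savepoint: Optional[str],
    marker: Path,
    flag: str = "--fromSavepoint",  # assumed flag name, adjust to your entrypoint
) -> List[str]:
    """Return the job arguments, adding the savepoint flag only on first use.

    `marker` is a hypothetical file on persistent storage that lists every
    savepoint path that has already been used to start this job.
    """
    if not savepoint:
        # no savepoint configured: start normally
        return list(base_args)

    used = marker.read_text().splitlines() if marker.exists() else []
    if savepoint in used:
        # this savepoint was consumed by an earlier start, so leave it out
        # and let the job recover from its latest checkpoint instead
        return list(base_args)

    # record the savepoint before launching so an unplanned restart skips it
    marker.parent.mkdir(parents=True, exist_ok=True)
    with marker.open("a") as f:
        f.write(savepoint + "\n")
    return list(base_args) + [flag, savepoint]
```

the wrapper would call this once at container start and then exec the real entrypoint with the returned arguments. note the trade-off: because the savepoint is recorded before the job launches, a crash during the very first startup would also skip the savepoint on the next attempt.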

thanks in advance!

-- 
*Sean Hester* | Senior Staff Software Engineer | m. 404-828-0865
3525 Piedmont Rd. NE, Building 6, Suite 500, Atlanta, GA 30305
<http://www.bettercloud.com>