Hi!
I would consider using the Flink Kubernetes Operator, which already avoids
scenarios like this.
One workaround we use there: even when we expect the job to restart from
HA metadata, we still set execution.savepoint.path to a DUMMY path, so
that if this issue happens the job fails to restore instead of silently
starting without state.
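For illustration, a minimal flink-conf.yaml sketch of that guard (the
bucket/path below is a made-up placeholder; any path that does not
resolve to a real savepoint works):

    # If HA metadata is lost, restoring from this non-existent path fails
    # loudly instead of the job silently starting from empty state.
    execution.savepoint.path: s3://my-bucket/DUMMY-DO-NOT-RESTORE

When the HA metadata is intact, the checkpoint recovered from it takes
precedence over the configured savepoint path, so the dummy value only
comes into play when the job would otherwise start from scratch.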
We're considering avoiding point 4 above (the HA ConfigMap containing the
checkpoint reference is cleaned up) by patching our deployment of Flink so
that flink-kubernetes never cleans up job ConfigMaps, i.e. by commenting out the
contents of KubernetesLeaderElectionHaServices#internalCleanupJobData. Are
there any downsides to this approach?
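For concreteness, the patch we have in mind is essentially a no-op body
for that method (a sketch only, assuming a Flink version where
internalCleanupJobData(JobID) is the hook that deletes the job-specific
ConfigMaps on job termination):

    // Sketch of the patched method in KubernetesLeaderElectionHaServices
    // (our patched build, not upstream Flink):
    @Override
    protected void internalCleanupJobData(JobID jobId) throws Exception {
        // Intentionally left empty: keep the job's HA ConfigMaps (and the
        // checkpoint reference they hold) even after the job reaches a
        // terminal state. They then have to be garbage-collected manually.
    }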