"One customization we did was to have the job-submitter pod search for the
latest checkpoint or savepoint in S3 and then submit this information with
the Flink job to the Flink cluster"
I am aware that the Google operator does not support redeploying from the
last checkpoint; it always uses a savepoint
For context, we have forked the GoogleCloudPlatform operator (
https://github.com/GoogleCloudPlatform/flink-on-k8s-operator), and we have
customized it a bit to fit our use cases here. One customization we did was
to have the job-submitter pod search for the latest checkpoint or savepoint
in S3
Hey!
Please help us understand why you need to delete and recreate the
FlinkDeployment objects in your ecosystem. Maybe we can help suggest some
alternative to make your life easier :)
Of course, every prod ecosystem is unique in its own way, and larger
platforms generally have a layer on top of
Hi Gyula,
Got it. Our use case might be unique to our own ecosystem here at
Robinhood, so I will have to look into creating a service that can search
for the latest savepoint / checkpoint in S3 and provide that to the
FlinkDeployment resource.
Will the Flink Community be okay with us adding this
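
As a rough illustration of the lookup described above, here is a minimal Python sketch. It is not the implementation from the thread. It assumes the caller has already listed the objects under the checkpoint and savepoint prefixes (for example with an S3 listing call) and passes them in as (key, last-modified) pairs. It also assumes that a completed checkpoint or savepoint directory contains a `_metadata` file, and that "latest" means the most recently written one. The function name, bucket, and keys are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple

# Assumption: a completed checkpoint/savepoint directory holds a "_metadata" file.
METADATA_SUFFIX = "/_metadata"


def latest_snapshot_path(
    bucket: str, objects: Iterable[Tuple[str, datetime]]
) -> Optional[str]:
    """Return the s3:// path of the most recently written snapshot directory.

    `objects` is an iterable of (key, last_modified) pairs, e.g. taken from an
    S3 object listing. Returns None when no completed snapshot is found.
    """
    # Keep only metadata files, so in-progress or partial snapshots are skipped.
    completed = [
        (modified, key)
        for key, modified in objects
        if key.endswith(METADATA_SUFFIX)
    ]
    if not completed:
        return None
    # Newest by last-modified time; ties fall back to key ordering.
    _, newest_key = max(completed)
    # Strip the trailing "/_metadata" to get the snapshot directory itself.
    return f"s3://{bucket}/{newest_key[:-len(METADATA_SUFFIX)]}"


if __name__ == "__main__":
    listing = [
        ("flink/checkpoints/job-a/chk-41/_metadata",
         datetime(2023, 3, 1, tzinfo=timezone.utc)),
        ("flink/savepoints/savepoint-abc/_metadata",
         datetime(2023, 3, 2, tzinfo=timezone.utc)),
    ]
    print(latest_snapshot_path("example-bucket", listing))
```

The returned path is the kind of value that could then be supplied to the FlinkDeployment resource as the initial savepoint path for the job; whether that fits a given setup depends on the operator version and upgrade mode in use.
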
Hi!
I don’t understand why you need to delete the deployment to restart. You
can suspend, use the restartNonce, or simply upgrade.
These should cover most upgrade/restart scenarios. Like with other
resources in Kubernetes once you delete them the status is gone, so the
FlinkDeployment won’t keep
Hi Gyula,
Thank you for responding so quickly. I went through the page you sent me a
bit more, and I see the following (
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.4/docs/custom-resource/job-management/#running-suspending-and-deleting-applications
):
Deleting a
Hey Tony,
Please see:
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/job-management/#stateful-and-stateless-application-upgrades
The operator is made especially to handle stateful application upgrades
robustly. In general, any spec change that you make
Hi Flink Community,
My name is Tony Chen, and I am a software engineer at Robinhood. I have
some questions on restarting a Flink Application from a savepoint or
checkpoint.
We currently store our checkpoints and savepoints in S3, and we would like
to use the Apache Flink Kubernetes Operator to