Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-21 Thread Gyula Fóra
"One customization we did was to have the job-submitter pod search for the latest checkpoint or savepoint in S3 and then submit this information with the Flink job to the Flink cluster" I am aware that the Google operator does not support redeploying from last checkpoint it always uses savepoint

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-21 Thread Tony Chen
For context, we have forked the GoogleCloudPlatform operator (https://github.com/GoogleCloudPlatform/flink-on-k8s-operator), and we have customized it a bit to fit our use cases here. One customization we did was to have the job-submitter pod search for the latest checkpoint or savepoint in S3
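
The lookup described in this message could look roughly like the sketch below. It is not the code from the forked operator, only an illustration under stated assumptions: the object listing is already available as (key, last-modified) pairs (for example from an S3 list call), completed checkpoints live in `chk-<n>/` directories, savepoints in `savepoint-.../` directories, and a directory counts as complete only if it contains a `_metadata` file. The function name `latest_snapshot` and the bucket and key names are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple

# Assumption: a completed checkpoint or savepoint directory contains a
# `_metadata` object; directories without one are ignored as incomplete.
_SNAPSHOT_RE = re.compile(
    r"^(?P<dir>(?:.*/)?(?:chk-\d+|savepoint-[^/]+))/_metadata$"
)


def latest_snapshot(
    objects: Iterable[Tuple[str, datetime]], bucket: str
) -> Optional[str]:
    """Return the s3:// path of the most recently written snapshot, or None.

    `objects` is an iterable of (object key, last-modified time) pairs.
    """
    best: Optional[Tuple[datetime, str]] = None
    for key, last_modified in objects:
        match = _SNAPSHOT_RE.match(key)
        if match is None:
            continue  # not a snapshot metadata file
        if best is None or last_modified > best[0]:
            best = (last_modified, match.group("dir"))
    return None if best is None else f"s3://{bucket}/{best[1]}"


if __name__ == "__main__":
    utc = timezone.utc
    listing = [
        ("savepoints/savepoint-aaa-bbb/_metadata", datetime(2023, 7, 19, 9, 0, tzinfo=utc)),
        ("checkpoints/job1/chk-41/_metadata", datetime(2023, 7, 19, 10, 0, tzinfo=utc)),
        ("checkpoints/job1/chk-42/_metadata", datetime(2023, 7, 19, 10, 5, tzinfo=utc)),
        # Newer, but has no _metadata file, so it is skipped.
        ("checkpoints/job1/chk-43/part-0", datetime(2023, 7, 19, 10, 10, tzinfo=utc)),
    ]
    print(latest_snapshot(listing, "my-bucket"))
```

Comparing by last-modified time rather than by checkpoint number lets checkpoints and savepoints be ranked together; whether that is the right tie-break for a given setup is a judgment call.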

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Gyula Fóra
Hey! Please help us understand why you need to delete and recreate the FlinkDeployment objects in your ecosystem. Maybe we can help suggest some alternative to make your life easier :) Of course every prod ecosystem is unique in its own way and larger platforms generally have a layer on top of

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Tony Chen
Hi Gyula, Got it. Our use case might be unique to our own ecosystem here at Robinhood, so I will have to look into creating a service that can search for the latest savepoint / checkpoint in S3 and provide that to the FlinkDeployment resource. Will the Flink Community be okay with us adding this
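
One way to "provide that to the FlinkDeployment resource" is to set the job's initial savepoint path when the resource is created. The sketch below only builds the `spec.job` fragment as a plain dictionary. It assumes the operator's `initialSavepointPath`, `upgradeMode`, and `state` fields are the intended mechanism, which the thread does not confirm; the helper name `job_spec_with_restore` and the example paths are hypothetical.

```python
from typing import Any, Dict, Optional


def job_spec_with_restore(
    jar_uri: str, restore_path: Optional[str]
) -> Dict[str, Any]:
    """Build a `spec.job` fragment for a FlinkDeployment manifest.

    If `restore_path` is given, it is set as `initialSavepointPath`,
    so that a newly created deployment starts from that snapshot.
    """
    job: Dict[str, Any] = {
        "jarURI": jar_uri,
        "upgradeMode": "savepoint",
        "state": "running",
    }
    if restore_path is not None:
        job["initialSavepointPath"] = restore_path
    return job


if __name__ == "__main__":
    # Hypothetical jar location and snapshot path, for illustration only.
    print(job_spec_with_restore(
        "local:///opt/flink/usrlib/app.jar",
        "s3://my-bucket/checkpoints/job1/chk-42",
    ))
```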

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Gyula Fóra
Hi! I don’t understand why you need to delete the deployment to restart. You can suspend, use the restartNonce, or simply upgrade. These should cover most upgrade/restart scenarios. As with other resources in Kubernetes, once you delete them the status is gone, so the FlinkDeployment won’t keep
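
The three options named in this message (suspend, restartNonce, upgrade) each correspond to a change in the FlinkDeployment spec. As a rough sketch, the patch bodies could be built as below and applied as a merge patch with whatever Kubernetes client is in use. The helper names are hypothetical, and incrementing the nonce is only a convention assumed here; the point is that the value changes.

```python
from typing import Any, Dict

# Upgrade modes as named in the operator documentation.
_UPGRADE_MODES = {"stateless", "savepoint", "last-state"}


def suspend_patch() -> Dict[str, Any]:
    """Patch that suspends the job without deleting the resource."""
    return {"spec": {"job": {"state": "suspended"}}}


def restart_patch(current_nonce: int) -> Dict[str, Any]:
    """Patch that requests a restart by changing `restartNonce`."""
    return {"spec": {"restartNonce": current_nonce + 1}}


def upgrade_patch(new_image: str, upgrade_mode: str = "last-state") -> Dict[str, Any]:
    """Patch that changes the image; the spec change triggers an upgrade."""
    if upgrade_mode not in _UPGRADE_MODES:
        raise ValueError(f"unknown upgrade mode: {upgrade_mode!r}")
    return {"spec": {"image": new_image, "job": {"upgradeMode": upgrade_mode}}}


if __name__ == "__main__":
    print(suspend_patch())
    print(restart_patch(3))
    print(upgrade_patch("my-registry/my-flink-app:2.0"))  # hypothetical image
```

None of these delete the resource, so the status the operator keeps on the FlinkDeployment is preserved across the restart.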

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Tony Chen
Hi Gyula, Thank you for responding so quickly. I went through the page you sent me a bit more, and I see the following (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.4/docs/custom-resource/job-management/#running-suspending-and-deleting-applications): Deleting a

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Gyula Fóra
Hey Tony, Please see: https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/job-management/#stateful-and-stateless-application-upgrades The operator is made especially to handle stateful application upgrades robustly. In general any spec change that you make

Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Tony Chen
Hi Flink Community, My name is Tony Chen, and I am a software engineer at Robinhood. I have some questions on restarting a Flink Application from a savepoint or checkpoint. We currently store our checkpoints and savepoints in S3, and we would like to use the Apache Flink Kubernetes Operator to