Great, glad it was an easy fix :) Thanks for following up!

On Fri, Jul 23, 2021 at 3:54 AM Thms Hmm <> wrote:

> Finally I found the mistake. I had passed the "--host" parameter as one
> argument. I think the savepoint argument was not interpreted correctly or
> was ignored. It might be that "-s" was used as the value for "--host"
> and "s3p://…" as a new parameter, and because these are not valid
> arguments, they were ignored.
> Not working:
> 23.07.2021 09:19:54.546 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Program Arguments:
> ...
> 23.07.2021 09:19:54.549 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -     --host
> 23.07.2021 09:19:54.549 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -     -s
> 23.07.2021 09:19:54.549 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -     s3p://bucket/job1/savepoints/savepoint-000000-1234
> -----
> Working:
> 23.07.2021 09:19:54.546 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Program Arguments:
> ...
> 23.07.2021 09:19:54.549 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -     --host
> 23.07.2021 09:19:54.549 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
> 23.07.2021 09:19:54.549 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -     -s
> 23.07.2021 09:19:54.549 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -     s3p://bucket/job1/savepoints/savepoint-000000-1234
> ...
> 23.07.2021 09:37:12.932 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Starting job 00000000000000000000000000000000 from savepoint s3p://bucket/job1/savepoints/savepoint-000000-1234 ()
> Thanks again for your help.
> Kr Thomas
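Thomas's hypothesis can be illustrated with a small, hypothetical flag parser. This is NOT Flink's entrypoint parser: `NaiveArgParser` and its rule that `--host` and `-s` always consume the next token are assumptions made only to reproduce the symptom. Fed the token sequence printed in the "Not working" log, `-s` is swallowed as the host value and the savepoint path is left as a stray positional argument; giving the host value its own token restores the expected pairing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser used only to illustrate the failure mode discussed above.
// It is not how Flink parses its entrypoint arguments.
final class NaiveArgParser {
    // Assumption: both flags expect a value in the following token.
    static final List<String> VALUE_FLAGS = List.of("--host", "-s");

    final Map<String, String> options = new HashMap<>();
    final List<String> positional = new ArrayList<>();

    static NaiveArgParser parse(List<String> args) {
        NaiveArgParser p = new NaiveArgParser();
        for (int i = 0; i < args.size(); i++) {
            String token = args.get(i);
            if (VALUE_FLAGS.contains(token) && i + 1 < args.size()) {
                // Blindly take the next token as the value, even if it is another flag.
                p.options.put(token, args.get(++i));
            } else {
                // Anything else is kept as a positional argument and otherwise ignored.
                p.positional.add(token);
            }
        }
        return p;
    }
}
```

With `List.of("--host", "-s", "s3p://bucket/job1/savepoints/savepoint-000000-1234")`, `options` ends up as `{--host=-s}` and no `-s` entry exists; with a separate host token (the placeholder `"example-host"`), `-s` maps to the savepoint path. If the arguments come from a Kubernetes container `args` list, each list entry becomes exactly one argument, so a flag and its value need to be separate entries.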
> On Fri, Jul 23, 2021 at 04:34, Yang Wang <> wrote:
>> Please note that when the job is canceled, the HA data (including the
>> checkpoint pointers) stored in the ConfigMap/ZNode will be deleted.
>> But it is strange that the "-s/--fromSavepoint" option does not take effect
>> when redeploying the Flink application. The JobManager logs would help a lot
>> in finding the root cause.
>> Best,
>> Yang
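The restore behavior described in this thread can be summarized as a precedence rule. The sketch below is a hypothetical, simplified model inferred from the thread (a checkpoint pointer still present in the HA ConfigMap/ZNode is used; otherwise the `-s` savepoint is used), not Flink's implementation; `RestoreOrder` and `chooseRestorePath` are illustrative names.

```java
import java.util.Optional;

// Hypothetical, simplified model of the restore order discussed in this thread.
// Not Flink's actual code: it only captures "HA pointer first, then -s".
final class RestoreOrder {
    static Optional<String> chooseRestorePath(Optional<String> haCheckpointPointer,
                                              Optional<String> savepointArg) {
        // Optional.or (Java 9+) falls back to the -s savepoint only when
        // no checkpoint pointer is left in the HA store.
        return haCheckpointPointer.or(() -> savepointArg);
    }
}
```

Under this model, a savepoint with cancel clears the HA pointer, so on redeploy a correctly parsed `-s` argument is the only remaining source of state.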
>> On Thu, Jul 22, 2021 at 11:09 PM, Austin Cawley-Edwards <> wrote:
>>> Hey Thomas,
>>> Hmm, I see no reason why you should not be able to update the checkpoint
>>> interval at runtime, and I don't believe that information is stored in a
>>> savepoint. Can you share the JobManager logs of the job where this is
>>> ignored?
>>> Thanks,
>>> Austin
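Thomas later explains that the checkpoint interval is read from an extra config file inside `main()`. A minimal sketch of that pattern, assuming (as Austin says) that the interval is not stored in the savepoint: the key name `checkpoint.interval.ms`, the 60-second default, and the `JobConfig` class are hypothetical, not Flink settings. The Flink call is shown only as a comment so the block runs without Flink on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: the key name and default below are assumptions.
final class JobConfig {
    static final String INTERVAL_KEY = "checkpoint.interval.ms";
    static final long DEFAULT_INTERVAL_MS = 60_000L;

    // Parses the job's own properties file contents and returns the interval.
    static long checkpointIntervalMs(String fileContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty(INTERVAL_KEY);
        return raw == null ? DEFAULT_INTERVAL_MS : Long.parseLong(raw.trim());
    }

    // In the real job, main() would then do something like:
    //   StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    //   env.enableCheckpointing(checkpointIntervalMs(contents));
    // Since this runs on every (re)start, an edited file applies after a redeploy.
}
```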
>>> On Wed, Jul 21, 2021 at 11:47 AM Thms Hmm <> wrote:
>>>> Hey Austin,
>>>> Thanks for your help.
>>>> I tried to change the checkpoint interval as an example. Its value
>>>> comes from an additional config file and is read and set within main() of
>>>> the job.
>>>> The job is running in Application mode, with basically the same
>>>> configuration as on the official Flink website, except that instead of
>>>> running the JobManager as a Job it is created as a Deployment.
>>>> For the redeployment, the REST API is called to create a savepoint and
>>>> cancel the job. After completion, the Deployment is updated and the pods
>>>> are recreated. The -s <latest_savepoint> parameter is always added when
>>>> starting the JobManager (the CLI is not involved).
>>>> We have automated these steps, but I also tried them manually and got
>>>> the same results.
>>>> I also tried to trigger a savepoint, scale the pods down, update the
>>>> start parameter with the recent savepoint, and rename
>>>> "kubernetes.cluster-id" as well as "high-availability.storageDir".
>>>> When I trigger a savepoint with cancel, I also see that the HA config
>>>> maps are cleaned up.
>>>> Kr Thomas
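The savepoint-with-cancel step described above can be sketched against Flink's REST API: `POST /jobs/:jobid/savepoints` with `"cancel-job": true` returns a `request-id`, which is then polled via `GET /jobs/:jobid/savepoints/:triggerid`. This is a minimal sketch using Java 11's `java.net.http`; the base URL, job id and target directory are caller-supplied placeholders, `SavepointRest` is a hypothetical name, and the regex-based response parsing stands in for a real JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: builds the requests; sending them needs an HttpClient and a live JobManager.
final class SavepointRest {
    // Body for POST /jobs/:jobid/savepoints; cancel-job=true stops the job after the savepoint.
    // Assumes targetDirectory contains no quotes or backslashes (no JSON escaping is done).
    static String triggerBody(String targetDirectory) {
        return "{\"target-directory\":\"" + targetDirectory + "\",\"cancel-job\":true}";
    }

    static HttpRequest triggerRequest(String baseUrl, String jobId, String targetDirectory) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(triggerBody(targetDirectory)))
                .build();
    }

    // Status poll; triggerId is the "request-id" returned by the POST above.
    static HttpRequest statusRequest(String baseUrl, String jobId, String triggerId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/jobs/" + jobId + "/savepoints/" + triggerId))
                .GET()
                .build();
    }

    private static final Pattern LOCATION = Pattern.compile("\"location\"\\s*:\\s*\"([^\"]+)\"");

    // Naive extraction of the savepoint path from a COMPLETED status response.
    static Optional<String> savepointLocation(String statusJson) {
        Matcher m = LOCATION.matcher(statusJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

The extracted `location` is the value that would follow `-s` when the JobManager is restarted, passed as its own argument.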
>>>> On Wed, Jul 21, 2021 at 16:52, Austin Cawley-Edwards <> wrote:
>>>>> Hi Thomas,
>>>>> I've got a few questions that will hopefully help us find an
>>>>> answer:
>>>>> What job properties are you trying to change? Something like
>>>>> parallelism?
>>>>> What mode is your job running in, i.e., Session, Per-Job, or
>>>>> Application?
>>>>> Can you also describe how you're redeploying the job? Are you using
>>>>> the Native Kubernetes integration or Standalone (i.e. writing k8s
>>>>> manifest files yourself)? It sounds like you are using the Flink CLI as
>>>>> well, is that correct?
>>>>> Thanks,
>>>>> Austin
>>>>> On Wed, Jul 21, 2021 at 4:05 AM Thms Hmm <> wrote:
>>>>>> Hey,
>>>>>> we have some application clusters running on Kubernetes and are
>>>>>> exploring the HA mode, which is working as expected. When we try to
>>>>>> upgrade a job, e.g. trigger a savepoint, cancel the job and redeploy,
>>>>>> Flink does not restart from the savepoint we provide using the -s
>>>>>> parameter, so all state is lost.
>>>>>> If we just trigger the savepoint without canceling the job and then
>>>>>> redeploy, HA mode picks up from the latest savepoint.
>>>>>> But this way we cannot upgrade job properties, since they seem to be
>>>>>> picked up from the savepoint.
>>>>>> Is there any advice on how to do upgrades with HA enabled?
>>>>>> Flink version is 1.12.2.
>>>>>> Thanks for your help.
>>>>>> Kr Thomas
