Re: JobManager restarts on job failure

2022-09-26 Thread Matthias Pohl via user
> Thanks for the answer. I think this is not an operator issue: the Kubernetes deployment just restarts the failed pod, and the restarted jobmanager, without HA metadata, starts the job itself from an empty state. I'm looking for a way t

Re: JobManager restarts on job failure

2022-09-26 Thread Gyula Fóra
estarted jobmanager without HA metadata starts the job itself from an empty state. I'm looking for a way to prevent it from exiting in case of a job error (we use an application mode cluster).

Re: JobManager restarts on job failure

2022-09-26 Thread Matthias Pohl via user
-- *From:* Gyula Fóra *Sent:* 20 September 2022, 19:49:37 *To:* Evgeniy Lyutikov *Cc:* user@flink.apache.org *Subject:* Re: JobManager restarts on job failure The best thing for you to do would be t

Re: JobManager restarts on job failure

2022-09-20 Thread Gyula Fóra
e application mode cluster). -- *From:* Gyula Fóra *Sent:* 20 September 2022, 19:49:37 *To:* Evgeniy Lyutikov *Cc:* user@flink.apache.org *Subject:* Re: JobManager restarts on job failure The best thing for you to do

Re: JobManager restarts on job failure

2022-09-20 Thread Evgeniy Lyutikov
use application mode cluster). From: Gyula Fóra Sent: 20 September 2022, 19:49:37 To: Evgeniy Lyutikov Cc: user@flink.apache.org Subject: Re: JobManager restarts on job failure The best thing for you to do would be to upgrade to Flink 1.15 and the latest ope

Re: JobManager restarts on job failure

2022-09-20 Thread Gyula Fóra
The best thing for you to do would be to upgrade to Flink 1.15 and the latest operator version. In Flink 1.15 we have the option to interact with the Flink jobmanager even after the job FAILED and the operator leverages this for a much more robust behaviour. In any case the operator should not eve

JobManager restarts on job failure

2022-09-20 Thread Evgeniy Lyutikov
Hi, we are using Flink 1.14.4 with the Flink Kubernetes operator. Sometimes, when updating a job, it fails on startup, and Flink removes all HA metadata and exits the jobmanager. 2022-09-14 14:54:44,534 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Restoring job 00