Hi Ethan!

Thanks for raising the issue; this is indeed a bug. The previous code path
fell back to the "execution graph store" for completed jobs.
I've raised a JIRA here: https://issues.apache.org/jira/browse/FLINK-33872
I've also managed to root-cause and fix it in the associated PR:
https://github.com/apache/flink/pull/23949.
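To make the fallback above concrete, here is a minimal, hypothetical Java sketch. It is not Flink's actual handler code. It contrasts a lookup that only checks running jobs (which reports a cancelled job as "not found", like the log Surendra posted) with one that falls back to a store of terminated jobs. The class `CheckpointHistoryLookup`, its methods, and the use of plain maps and checkpoint-ID lists are all illustrative assumptions; only the job ID comes from the thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types or methods are Flink APIs.
class CheckpointHistoryLookup {

    // Checkpoint IDs per job ID for jobs that are still running (hypothetical).
    private final Map<String, List<Long>> runningJobs = new HashMap<>();

    // Checkpoint IDs per job ID for terminated (e.g. cancelled) jobs,
    // standing in for the "execution graph store" (hypothetical).
    private final Map<String, List<Long>> executionGraphStore = new HashMap<>();

    void registerRunning(String jobId, List<Long> checkpointIds) {
        runningJobs.put(jobId, checkpointIds);
    }

    // Moves a job's history from the running map into the store, as if the job terminated.
    void markTerminated(String jobId) {
        List<Long> history = runningJobs.remove(jobId);
        if (history != null) {
            executionGraphStore.put(jobId, history);
        }
    }

    // Checks running jobs only, so a cancelled job is reported as "not found".
    List<Long> lookupWithoutFallback(String jobId) {
        List<Long> history = runningJobs.get(jobId);
        if (history == null) {
            throw new IllegalStateException("Job " + jobId + " not found");
        }
        return history;
    }

    // Checks running jobs first, then falls back to the store for terminated jobs.
    List<Long> lookupWithFallback(String jobId) {
        List<Long> history = runningJobs.get(jobId);
        if (history == null) {
            history = executionGraphStore.get(jobId);
        }
        if (history == null) {
            throw new IllegalStateException("Job " + jobId + " not found");
        }
        return history;
    }

    public static void main(String[] args) {
        CheckpointHistoryLookup lookup = new CheckpointHistoryLookup();
        String jobId = "7504e7a6106093a3a9c7ef35f52ce6cf";
        lookup.registerRunning(jobId, List.of(1L, 2L, 3L));
        lookup.markTerminated(jobId); // simulate cancelling the job

        System.out.println(lookup.lookupWithFallback(jobId)); // prints [1, 2, 3]
        try {
            lookup.lookupWithoutFallback(jobId);
        } catch (IllegalStateException e) {
            // prints "Job 7504e7a6106093a3a9c7ef35f52ce6cf not found"
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only the ordering: the terminated-job store is consulted when the running-job lookup misses, which is what lets the history of a cancelled job still be served.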

Regards,
Hong

On Thu, Dec 14, 2023 at 10:07 AM Ethan T Yang <ivanygy...@gmail.com> wrote:

> Hi Hong Liang Teoh,
> I think you are the owner of the ticket below. Can you take a look and see
> if there is a bug in the code that breaks retrieving the checkpoint history
> of a cancelled job?
>
> Thanks,
> Ivan
>
> On Dec 10, 2023, at 8:46 AM, Surendra Singh Lilhore <
> surendralilh...@gmail.com> wrote:
>
> Hi Ethan,
>
> Looks like this got changed after
> https://issues.apache.org/jira/browse/FLINK-32469.
>
> Now the checkpoint history call throws the exception below for a canceled job.
>
> 2023-12-10 21:50:12,990 ERROR
> org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler
> [] - Exception occurred in REST handler:
> Job 7504e7a6106093a3a9c7ef35f52ce6cf not found
>
>
> Thanks
> Surendra
>
>
> On Sat, Dec 9, 2023 at 12:26 PM Ethan T Yang <ivanygy...@gmail.com> wrote:
>
>> Hello Surendra,
>> Thank you for replying to my question. I already have this code:
>>
>>
>> env.getCheckpointConfig().setExternalizedCheckpointCleanup(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
>>
>> I also tried using the REST API to retrieve a cancelled job, and no
>> checkpoint was found via the REST API either. We use this config:
>>
>> # s3 checkpointing
>> state.backend: filesystem
>> state.checkpoints.dir: {{ .Values.jobManager.checkpointUrl }}
>> state.savepoints.dir: {{ .Values.jobManager.savepointUrl }}
>>
>> The actual checkpoint is still in S3 after cancellation. Can someone
>> point me to the code where the checkpoint history is maintained?
>>
>> Thanks,
>> Ethan
>>
>> On Dec 8, 2023, at 8:23 AM, Surendra Singh Lilhore <
>> surendralilh...@gmail.com> wrote:
>>
>>
>> Hi Ethan,
>>
>> Can you try:
>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpoints/#retained-checkpoints
>>
>> Thanks
>> Surendra
>>
>>
>> On Thu, Dec 7, 2023 at 4:47 PM Ethan T Yang <ivanygy...@gmail.com> wrote:
>>
>>> Hi Flink Users,
>>>
>>> After migrating from Flink 1.13.1 to 1.18.0, I am no longer seeing the
>>> checkpoint history after cancelling a job. I am wondering which setting to
>>> enable so that I can see the checkpoint history of a cancelled job in Flink
>>> 1.18.0. Below is a screenshot of what I can see in Flink 1.13.1. I hope to
>>> get back the same view in the new version.
>>>
>>> Thanks,
>>> Ethan
>>>
>>>
>>
>
