Re: Flink 1.18.0 Checkpoints on cancelled jobs

2023-12-18 Thread Hong Liang
Hi Ethan!

Thanks for raising the issue. This is indeed a bug: in the previous code
path, it falls back to the "execution graph store" for completed jobs.
I've raised a JIRA here: https://issues.apache.org/jira/browse/FLINK-33872
I've also managed to root-cause and fix it in the associated PR:
https://github.com/apache/flink/pull/23949

Regards,
Hong

On Thu, Dec 14, 2023 at 10:07 AM Ethan T Yang wrote:

> Hi Hong Liang Teoh,
> I think you are the owner of the ticket below. Can you take a look and see
> if there is a bug in the code that breaks retrieving the checkpoint history
> of a cancelled job?
>
> Thanks,
> Ivan
>
> On Dec 10, 2023, at 8:46 AM, Surendra Singh Lilhore <
> surendralilh...@gmail.com> wrote:
>
> Hi Ethan,
>
> Looks like this got changed after
> https://issues.apache.org/jira/browse/FLINK-32469.
>
> Now the checkpoint history call throws the exception below for a canceled job.
>
> 2023-12-10 21:50:12,990 ERROR
> org.apache.flink.runtime.rest.handler.job.checkpoints.
> CheckpointingStatisticsHandler [] - Exception occurred in REST handler:
> Job 7504e7a6106093a3a9c7ef35f52ce6cf not found
>
>
> Thanks
> Surendra
>
>
> On Sat, Dec 9, 2023 at 12:26 PM Ethan T Yang wrote:
>
>> Hello Surendra,
>> Thank you for replying to my question. I already have this code:
>>
>>
>> env.getCheckpointConfig().setExternalizedCheckpointCleanup(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
>>
>> I also tried using the REST API to retrieve a cancelled job, and no
>> checkpoint was found from the REST API either. We use this conf:
>>
>> # s3 checkpointing
>> state.backend: filesystem
>> state.checkpoints.dir: {{ .Values.jobManager.checkpointUrl }}
>> state.savepoints.dir: {{ .Values.jobManager.savepointUrl }}
>>
>> The actual checkpoint is there in s3 after cancellation. Can someone
>> point me to the code where the checkpoint history is maintained?
>>
>> Thanks,
>> Ethan
>>
>> On Dec 8, 2023, at 8:23 AM, Surendra Singh Lilhore <
>> surendralilh...@gmail.com> wrote:
>>
>>
>> Hi Ethan,
>>
>> Can you try :
>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpoints/#retained-checkpoints
>>
>> Thanks
>> Surendra
>>
>>
>> On Thu, Dec 7, 2023 at 4:47 PM Ethan T Yang wrote:
>>
>>> Hi Flink Users,
>>>
>>> After migrating from Flink 1.13.1 to 1.18.0, I am no longer seeing the
>>> checkpoint history after cancelling a job. I am wondering which setting to
>>> enable so that I can see the checkpoint history on a cancelled job in Flink
>>> 1.18.0. Below is the screenshot of what I can see in Flink 1.13.1. I hope to
>>> get back the same view in the new version.
>>>
>>> Thanks,
>>> Ethan
>>>
>>>
>>
>


Re: Flink 1.18.0 Checkpoints on cancelled jobs

2023-12-14 Thread Ethan T Yang
Hi Hong Liang Teoh, 
I think you are the owner of the ticket below. Can you take a look and see if
there is a bug in the code that breaks retrieving the checkpoint history of a
cancelled job?

Thanks,
Ivan

> On Dec 10, 2023, at 8:46 AM, Surendra Singh Lilhore wrote:
> 
> Hi Ethan,
> 
> Looks like this got changed after 
> https://issues.apache.org/jira/browse/FLINK-32469.
> 
> Now the checkpoint history call throws the exception below for a canceled job.
> 
> 2023-12-10 21:50:12,990 ERROR 
> org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler
>  [] - Exception occurred in REST handler: Job 
> 7504e7a6106093a3a9c7ef35f52ce6cf not found
> 
> 
> Thanks
> Surendra
> 
> 
> On Sat, Dec 9, 2023 at 12:26 PM Ethan T Yang wrote:
>> Hello Surendra,
>> Thank you for replying to my question. I already have this code:
>> 
>> env.getCheckpointConfig().setExternalizedCheckpointCleanup(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
>> 
>> I also tried using the REST API to retrieve a cancelled job, and no checkpoint
>> was found from the REST API either. We use this conf:
>> 
>> # s3 checkpointing
>> state.backend: filesystem
>> state.checkpoints.dir: {{ .Values.jobManager.checkpointUrl }}
>> state.savepoints.dir: {{ .Values.jobManager.savepointUrl }}
>>
>> The actual checkpoint is there in s3 after cancellation. Can someone point
>> me to the code where the checkpoint history is maintained?
>> 
>> Thanks,
>> Ethan
>> 
>>> On Dec 8, 2023, at 8:23 AM, Surendra Singh Lilhore
>>> <surendralilh...@gmail.com> wrote:
>>> 
>>> 
>>> Hi Ethan,
>>> 
>>> Can you try : 
>>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpoints/#retained-checkpoints
>>> 
>>> Thanks 
>>> Surendra
>>> 
>>> 
>>> On Thu, Dec 7, 2023 at 4:47 PM Ethan T Yang wrote:
>>>> Hi Flink Users,
>>>>
>>>> After migrating from Flink 1.13.1 to 1.18.0, I am no longer seeing the
>>>> checkpoint history after cancelling a job. I am wondering which setting to
>>>> enable so that I can see the checkpoint history on a cancelled job in
>>>> Flink 1.18.0. Below is the screenshot of what I can see in Flink 1.13.1.
>>>> I hope to get back the same view in the new version.
>>>>
>>>> Thanks,
>>>> Ethan



Re: Flink 1.18.0 Checkpoints on cancelled jobs

2023-12-10 Thread Surendra Singh Lilhore
Hi Ethan,

Looks like this got changed after
https://issues.apache.org/jira/browse/FLINK-32469.

Now the checkpoint history call throws the exception below for a canceled job.

2023-12-10 21:50:12,990 ERROR
org.apache.flink.runtime.rest.handler.job.checkpoints.
CheckpointingStatisticsHandler [] - Exception occurred in REST handler: Job
7504e7a6106093a3a9c7ef35f52ce6cf not found
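
For anyone who wants to reproduce this: the handler named in the log serves
the REST path /jobs/<jobid>/checkpoints. Below is a minimal sketch of a probe,
not a definitive tool. It assumes Java 11+ and a JobManager REST endpoint at
http://localhost:8081 (an assumption, not something stated in this thread;
pass your own base URL as the first argument). The class and method names are
made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CheckpointHistoryProbe {

    // Builds the checkpoint statistics URI for one job.
    // restBase is the JobManager REST address (hypothetical default below).
    static URI checkpointsUri(String restBase, String jobId) {
        String base = restBase.endsWith("/")
                ? restBase.substring(0, restBase.length() - 1)
                : restBase;
        return URI.create(base + "/jobs/" + jobId + "/checkpoints");
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed defaults: local REST endpoint and the job id from the log above.
        String restBase = args.length > 0 ? args[0] : "http://localhost:8081";
        String jobId = args.length > 1 ? args[1] : "7504e7a6106093a3a9c7ef35f52ce6cf";

        URI uri = checkpointsUri(restBase, jobId);
        System.out.println("GET " + uri);
        try {
            HttpResponse<String> response = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder(uri).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            // Print status and body so a "not found" response is visible.
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            // No reachable JobManager at restBase, or the connection dropped.
            System.out.println("Request failed: " + e.getMessage());
        }
    }
}
```

Running it once against a running job and once against the same job after
cancellation should show whether the history is still served.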


Thanks
Surendra


On Sat, Dec 9, 2023 at 12:26 PM Ethan T Yang wrote:

> Hello Surendra,
> Thank you for replying to my question. I already have this code:
>
>
> env.getCheckpointConfig().setExternalizedCheckpointCleanup(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
>
> I also tried using the REST API to retrieve a cancelled job, and no
> checkpoint was found from the REST API either. We use this conf:
>
> # s3 checkpointing
> state.backend: filesystem
> state.checkpoints.dir: {{ .Values.jobManager.checkpointUrl }}
> state.savepoints.dir: {{ .Values.jobManager.savepointUrl }}
>
> The actual checkpoint is there in s3 after cancellation. Can someone point
> me to the code where the checkpoint history is maintained?
>
> Thanks,
> Ethan
>
> On Dec 8, 2023, at 8:23 AM, Surendra Singh Lilhore <
> surendralilh...@gmail.com> wrote:
>
>
> Hi Ethan,
>
> Can you try :
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpoints/#retained-checkpoints
>
> Thanks
> Surendra
>
>
> On Thu, Dec 7, 2023 at 4:47 PM Ethan T Yang wrote:
>
>> Hi Flink Users,
>>
>> After migrating from Flink 1.13.1 to 1.18.0, I am no longer seeing the
>> checkpoint history after cancelling a job. I am wondering which setting to
>> enable so that I can see the checkpoint history on a cancelled job in Flink
>> 1.18.0. Below is the screenshot of what I can see in Flink 1.13.1. I hope to
>> get back the same view in the new version.
>>
>> Thanks,
>> Ethan
>>
>>
>