[jira] [Commented] (FLINK-31685) Checkpoint job folder not deleted after job is cancelled

2023-09-07 Thread Hangxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762597#comment-17762597
 ] 

Hangxiang Yu commented on FLINK-31685:
--

[~Zakelly] 
This makes sense to me.
_FsCompletedCheckpointStorageLocation_ lacks a global view of the checkpoint directory.
+1 for deleting the directory only when we know all checkpoint files are 
deleted.

> Checkpoint job folder not deleted after job is cancelled
> 
>
> Key: FLINK-31685
> URL: https://issues.apache.org/jira/browse/FLINK-31685
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.16.1
>Reporter: Sergio Sainz
>Priority: Major
>
> When a Flink job is being checkpointed and is then cancelled, the 
> checkpoint is indeed deleted (as per 
> {{execution.checkpointing.externalized-checkpoint-retention: 
> DELETE_ON_CANCELLATION}}), but the job-id folder still remains:
>  
> [sergio@flink-cluster-54f7fc7c6-k6km8 JobCheckpoints]$ ls
> 01eff17aa2910484b5aeb644bc531172  3a59309ef018541fc0c20856d0d89855  
> 78ff2344dd7ef89f9fbcc9789fc0cd79  a6fd7cec89c0af78c3353d4a46a7d273  
> dbc957868c08ebeb100d708bbd057593
> 04ff0abb9e860fc85f0e39d722367c3c  3e09166341615b1b4786efd6745a05d6  
> 79efc000aa29522f0a9598661f485f67  a8c42bfe158abd78ebcb4adb135de61f  
> dc8e04b02c9d8a1bc04b21d2c8f21f74
> 05f48019475de40230900230c63cfe89  3f9fb467c9af91ef41d527fe92f9b590  
> 7a6ad7407d7120eda635d71cd843916a  a8db748c1d329407405387ac82040be4  
> dfb2df1c25056e920d41c94b659dcdab
> 09d30bc0ff786994a6a3bb06abd3  455525b76a1c6826b6eaebd5649c5b6b  
> 7b1458424496baaf3d020e9fece525a4  aa2ef9587b2e9c123744e8940a66a287
> All folders in the above list, like {{01eff17aa2910484b5aeb644bc531172}}, 
> are empty.
>  
> *Expected behaviour:*
> The job-id folder should also be deleted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31685) Checkpoint job folder not deleted after job is cancelled

2023-09-02 Thread Zakelly Lan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761487#comment-17761487
 ] 

Zakelly Lan commented on FLINK-31685:
-

[~masteryhx]  I think there are two issues within this problem:
 # Users may not need the job-id directory at all (dropping it would simplify 
the checkpoint directory layout, especially in CLAIM mode). I will create 
another ticket to address this.
 # Deleting the job-id directory once all the checkpoint files are deleted. 
Unlike [~Wencong Liu], I think it is the {{CompletedCheckpointStore}}'s 
responsibility to delete the job-id directory, since it has the global view 
of whether the directory is still needed by any other checkpoint.

WDYT?
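
To make the second point concrete, here is a minimal sketch of a store that owns the job-id directory. It is plain Java over java.nio.file, not Flink code: the class {{JobDirOwningStoreSketch}}, its methods, and the {{chk-<id>}} naming are all assumptions made for illustration, and the real {{CompletedCheckpointStore}} interface looks different.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for a checkpoint store; NOT Flink's CompletedCheckpointStore. */
class JobDirOwningStoreSketch {
    private final Path jobDir;
    // The "global view": ids of every checkpoint that still lives under jobDir.
    private final Set<Long> retained = new TreeSet<>();

    public JobDirOwningStoreSketch(Path jobDir) {
        this.jobDir = jobDir;
    }

    /** Registers a checkpoint and creates its directory (assumed layout: jobDir/chk-<id>). */
    public Path add(long id) throws IOException {
        Path dir = Files.createDirectories(jobDir.resolve("chk-" + id));
        retained.add(id);
        return dir;
    }

    /**
     * Discards one checkpoint. The job-id directory is removed only when no
     * checkpoint is retained any more. Returns true if the job-id directory was removed.
     */
    public boolean discard(long id) throws IOException {
        if (retained.remove(id)) {
            deleteRecursively(jobDir.resolve("chk-" + id));
        }
        if (!retained.isEmpty()) {
            return false; // another checkpoint still needs the directory
        }
        deleteRecursively(jobDir);
        return true;
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order so that children are deleted before their parents.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }
}
```

The point of the sketch is only the ownership: the decision to remove the job-id directory sits with the component that knows which checkpoints are still retained, not with an individual checkpoint location.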



[jira] [Commented] (FLINK-31685) Checkpoint job folder not deleted after job is cancelled

2023-08-22 Thread Hangxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757767#comment-17757767
 ] 

Hangxiang Yu commented on FLINK-31685:
--

I just linked many related tickets.

The issue is valid, and many users want it resolved.

I think we could just introduce an option that controls whether the job-id 
directory is generated, and keep both layouts compatible.

As for the job-id layout, I think it is still useful if users want to keep some 
historical checkpoints in NO_CLAIM mode.

[~tangyun]  WDYT?



[jira] [Commented] (FLINK-31685) Checkpoint job folder not deleted after job is cancelled

2023-04-01 Thread Wencong Liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17707508#comment-17707508
 ] 

Wencong Liu commented on FLINK-31685:
-

Hello [~sergiosp] , thanks for proposing this ticket! I think the key code path 
is 
{code:java}
FsCompletedCheckpointStorageLocation#disposeStorageLocation {code}
We could delete the parent folder in this method.
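
A rough sketch of that idea, using plain java.nio.file rather than Flink's own file system classes: the class and method names below are hypothetical and this is not the actual body of {{disposeStorageLocation}}. After disposing one checkpoint directory, it attempts a non-recursive delete of the parent, which is refused while anything else (for example another checkpoint directory) is still inside.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical illustration only; not Flink's FsCompletedCheckpointStorageLocation. */
class ParentFolderCleanupSketch {

    /** Recursively deletes one checkpoint directory (assumed layout: <job-id>/chk-<n>). */
    public static void deleteRecursively(Path dir) throws IOException {
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order so that children are deleted before their parents.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    /**
     * Disposes a checkpoint directory, then tries to remove its parent (job-id) folder.
     * Returns true only if the parent folder was actually removed.
     */
    public static boolean disposeAndCleanParent(Path checkpointDir) throws IOException {
        Path parent = checkpointDir.getParent();
        deleteRecursively(checkpointDir);
        if (parent == null) {
            return false;
        }
        try {
            // Non-recursive delete: on the default local file systems this throws
            // DirectoryNotEmptyException if any sibling entry is still present.
            Files.delete(parent);
            return true;
        } catch (DirectoryNotEmptyException | NoSuchFileException e) {
            return false;
        }
    }
}
```

Relying on the non-recursive delete keeps the method from removing data it does not know about, but a single storage location cannot tell whether a checkpoint that is still being written will need the folder a moment later.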
