[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481098#comment-17481098 ]

Yun Gao commented on FLINK-17808:
---------------------------------

Fixed on master via 4050cf320fc6c3ad77ad406689dd8a99af75d594.

> Rename checkpoint meta file to "_metadata" until it has completed writing
> -------------------------------------------------------------------------
>
>                 Key: FLINK-17808
>                 URL: https://issues.apache.org/jira/browse/FLINK-17808
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.10.0
>            Reporter: Yun Tang
>            Assignee: Junfan Zhang
>            Priority: Minor
>              Labels: auto-deprioritized-major, pull-request-available
>             Fix For: 1.15.0
>
>
> In practice, some developers or customers use some strategy to find the
> most recent _metadata file as the checkpoint to recover from (e.g. as many
> proposals in FLINK-9043 suggest). However, the existence of a "_metadata"
> file does not mean that the checkpoint has completed, because writing the
> "_metadata" file can be interrupted by a forced quit (e.g. yarn application -kill).
> We could have the checkpoint meta stream write its data to a file named
> "_metadata.inprogress" and rename it to "_metadata" once writing has completed.
> By doing so, we could ensure that "_metadata" is not broken.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
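To illustrate the recovery strategy the description refers to, here is a minimal sketch of a reader that picks the newest checkpoint directory holding a finished "_metadata" file. It is not Flink code: the class name LatestCompletedCheckpointFinder is hypothetical, and the chk-<n> directory layout under the checkpoint root is an assumption of this sketch. A directory that only contains "_metadata.inprogress" is skipped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of Flink. */
final class LatestCompletedCheckpointFinder {

    /** Returns the chk-<n> directory with the highest n that contains a finished "_metadata" file. */
    static Optional<Path> find(Path checkpointRoot) throws IOException {
        try (Stream<Path> children = Files.list(checkpointRoot)) {
            return children
                    .filter(Files::isDirectory)
                    // Assumed layout: one directory per checkpoint, named chk-<n>.
                    .filter(dir -> dir.getFileName().toString().matches("chk-\\d+"))
                    // Only a file named exactly "_metadata" counts as completed.
                    .filter(dir -> Files.isRegularFile(dir.resolve("_metadata")))
                    .max(Comparator.comparingLong(LatestCompletedCheckpointFinder::checkpointId));
        }
    }

    /** Parses the numeric suffix of a chk-<n> directory name. */
    private static long checkpointId(Path dir) {
        return Long.parseLong(dir.getFileName().toString().substring("chk-".length()));
    }
}
```

Such a scan is only trustworthy if "_metadata" never exists in a half-written state, which is what this ticket asks for.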
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464459#comment-17464459 ]

Yun Tang commented on FLINK-17808:
----------------------------------

Since [~zuston] has already provided a PR for review, I have assigned this ticket to him.

Please refactor the PR:
1. Consider file systems that do not support a recoverable writer.
2. Resolve all failing tests in the current PR.
3. Add a separate test to verify the logic.
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464399#comment-17464399 ]

Junfan Zhang commented on FLINK-17808:
--------------------------------------

Gentle ping [~yunta]. Do you have any ideas?
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463624#comment-17463624 ]

Junfan Zhang commented on FLINK-17808:
--------------------------------------

I overlooked this; thanks [~yunta] for pointing it out.

I think that if the file system doesn't support it, we could fall back to the original implementation in {{FsCheckpointMetadataOutputStream}}. Maybe we should highlight this point in the docs.
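The fallback described in this comment can be sketched roughly as follows. This is a simplified stand-in rather than the code from the PR: MetadataStreamFactory and CommitSupport are hypothetical names, and the sketch assumes that a file system without commit support signals this with an UnsupportedOperationException.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of "use the committing stream if available, else write directly". */
final class MetadataStreamFactory {

    /** Hypothetical stand-in for a file system that may or may not support commit-on-close. */
    interface CommitSupport {
        /** Assumed to throw UnsupportedOperationException if the feature is unavailable. */
        OutputStream openCommitting(Path target) throws IOException;
    }

    static OutputStream open(CommitSupport fs, Path target) throws IOException {
        try {
            return fs.openCommitting(target);
        } catch (UnsupportedOperationException e) {
            // Fallback: write straight to the final path, as before.
            // The file can then be visible while it is still incomplete.
            return Files.newOutputStream(target);
        }
    }
}
```

On the fallback path the original problem remains, which is why the comment suggests calling it out in the documentation.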
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463604#comment-17463604 ]

Yun Tang commented on FLINK-17808:
----------------------------------

[~zuston] It seems that FileSystem#createRecoverableWriter is optional, and various file system implementations may not support this method. Did you consider file systems that do not support this feature?
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463584#comment-17463584 ]

Junfan Zhang commented on FLINK-17808:
--------------------------------------

[~yunta] As Stephan mentioned above, I also think the first option is the better one.

So we could use the close-and-commit of {{RecoverableFsDataOutputStream}} to ensure that the file is written atomically, instead of using the {{FSDataOutputStream}} in {{FsCheckpointMetadataOutputStream}}. Right?

Please let me know what you think.
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463546#comment-17463546 ]

Yun Tang commented on FLINK-17808:
----------------------------------

Before we move on to reviewing the PR, [~zuston], could you share your ideas first and make sure that this ticket gets assigned to you?
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462957#comment-17462957 ]

Junfan Zhang commented on FLINK-17808:
--------------------------------------

Could I take over this ticket? [~yunta] [~guoyangze] [~zhoujira86]

Draft PR link: https://github.com/apache/flink/pull/18157

If that is OK, I will optimize it and add more tests.
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315992#comment-17315992 ]

xiaogang zhou commented on FLINK-17808:
---------------------------------------

[~yunta] Let me know what you think :)

>            Priority: Major
>             Fix For: 1.14.0
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312029#comment-17312029 ]

xiaogang zhou commented on FLINK-17808:
---------------------------------------

[~yunta] As Stephan mentioned above, of the two solutions the better one is to use a RecoverableWriter.

We can follow the streamFileWriter's approach and use the existing RecoverableFsDataOutputStream. We don't really need the recover function, only the commit function, and closeForCommit is enough for our case.

For each implementation:
1. file: ok
2. hdfs: ok
3. S3: ok
4. oss: I am not sure about this file system; we can discuss whether we should implement a recoverable stream for it.

>            Priority: Major
>             Fix For: 1.14.0
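The "commit only, no recovery" usage described above can be pictured with a small in-memory model. Everything here is hypothetical (CommitOnCloseStore, PendingStream, Committer); the sketch only borrows the closeForCommit-then-commit shape named in the comment and does not reproduce Flink's actual classes or signatures.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory model: data becomes visible only when commit() is called. */
final class CommitOnCloseStore {

    private final Map<String, byte[]> visible = new ConcurrentHashMap<>();

    /** Handle returned after writing has finished; publishing happens in commit(). */
    interface Committer {
        void commit();
    }

    /** Buffers writes; nothing written here is visible to readers of the store. */
    final class PendingStream extends ByteArrayOutputStream {
        private final String name;

        PendingStream(String name) {
            this.name = name;
        }

        /** Freezes the written bytes and returns a handle that publishes them. */
        Committer closeForCommit() {
            byte[] data = toByteArray();
            return () -> visible.put(name, data);
        }
    }

    PendingStream open(String name) {
        return new PendingStream(name);
    }

    boolean exists(String name) {
        return visible.containsKey(name);
    }

    byte[] read(String name) {
        return visible.get(name);
    }
}
```

If the process dies before commit() runs, exists("_metadata") stays false, so a reader never observes a partial file.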
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311497#comment-17311497 ]

Yun Tang commented on FLINK-17808:
----------------------------------

[~zhoujira86] Since the feature freeze of Flink 1.13 is close, we can first have some discussion here to choose the best solution from the currently available candidates.

>            Priority: Major
>             Fix For: 1.14.0
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311245#comment-17311245 ]

xiaogang zhou commented on FLINK-17808:
---------------------------------------

Should we create a pointer that points to the latest finished _metadata? I can take this ticket [~yunta]

>            Assignee: Yun Tang
>            Priority: Major
>             Fix For: 1.13.0
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112652#comment-17112652 ]

Stephan Ewen commented on FLINK-17808:
--------------------------------------

I think this pattern (atomically visible file) may be quite common. We can add a method to {{FileSystem}} or add a flag to {{create(Path)}} to create a stream for a file that is only visible once the writing is complete.

That should be easy to implement:
  - file:// will do that via rename
  - hdfs:// will do that via rename
  - s3:// does not need to do anything, it always behaves like that.
  - oss:// ? (not sure, but I would assume it also does not need anything; object stores tend to publish at the end)

This avoids cases where components manually implement a rename-based solution (which breaks S3 and probably other object stores' visibility consistency).

>            Assignee: Yun Tang
>            Priority: Major
>             Fix For: 1.12.0
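For the file:// case in the list above, "visible only once complete" via rename could look like the following sketch on a local file system. It uses only java.nio and is not Flink code; the class name AtomicVisibleFileWriter is hypothetical, and the ".inprogress" suffix is borrowed from the issue description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of write-then-rename on a local file system. */
final class AtomicVisibleFileWriter {

    static void write(Path target, byte[] content) throws IOException {
        // A sibling in the same directory keeps the rename within one file system.
        Path inProgress = target.resolveSibling(target.getFileName() + ".inprogress");
        Files.write(inProgress, content);
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the platform
        // cannot perform the move as a single atomic operation.
        Files.move(inProgress, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Hiding this behind the file system abstraction, as proposed, would let object stores skip the rename entirely instead of having each component hard-code it.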
[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112647#comment-17112647 ]

Stephan Ewen commented on FLINK-17808:
--------------------------------------

We need to avoid renaming in checkpoints, because it causes visibility/consistency issues on some file systems.

We can instead do the following:
  - Use the RecoverableWriter (we don't need the recoverability, but we can use its committing feature)
  - Write a "latest checkpoint" file in the checkpoints root which points to the latest completed checkpoint

Option two would also be a simple way to implement a generic "resume latest" feature for the CLI. It would not reliably work on all file systems (for example, not reliably on S3), but that would not be as bad as having inconsistent visibility of the checkpoint metadata file, which is used by ZK and externalized-checkpoint-based recovery.

>            Assignee: Yun Tang
>            Priority: Major
>             Fix For: 1.12.0
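Option two, the "latest checkpoint" file, might be sketched as below. This is an illustration only: the class LatestCheckpointPointer and the file name "latest" are hypothetical, and the pointer is written by plain overwrite (no rename), which matches the caveat that it would not be reliable on every file system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical sketch of a pointer file in the checkpoints root. */
final class LatestCheckpointPointer {

    /** Assumed name of the pointer file; not defined anywhere in the thread. */
    private static final String POINTER_NAME = "latest";

    /** Records the given completed checkpoint directory as the latest one. */
    static void publish(Path checkpointsRoot, Path completedCheckpointDir) throws IOException {
        String relative = checkpointsRoot.relativize(completedCheckpointDir).toString();
        // Plain overwrite, no rename; atomicity depends on the file system.
        Files.write(checkpointsRoot.resolve(POINTER_NAME), relative.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns the checkpoint directory the pointer refers to, if a pointer exists. */
    static Optional<Path> resolve(Path checkpointsRoot) throws IOException {
        Path pointer = checkpointsRoot.resolve(POINTER_NAME);
        if (!Files.isRegularFile(pointer)) {
            return Optional.empty();
        }
        String relative = new String(Files.readAllBytes(pointer), StandardCharsets.UTF_8).trim();
        return Optional.of(checkpointsRoot.resolve(relative));
    }
}
```

A "resume latest" CLI feature would call resolve on the checkpoints root and restore from the returned directory, provided publish is only called after the checkpoint has fully completed.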