[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819073#comment-17819073 ]

lincoln lee commented on FLINK-6755:

[~Zakelly] Thanks for your updates!

> Allow triggering Checkpoints through command line client
>
> Key: FLINK-6755
> URL: https://issues.apache.org/jira/browse/FLINK-6755
> Project: Flink
> Issue Type: New Feature
> Components: Command Line Client, Runtime / Checkpointing
> Affects Versions: 1.3.0
> Reporter: Gyula Fora
> Assignee: Zakelly Lan
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-unassigned, pull-request-available
> Fix For: 1.19.0
>
> The command line client currently only allows triggering (and canceling with) Savepoints.
> While this is good if we want to fork or modify the pipelines in a non-checkpoint compatible way, now with incremental checkpoints this becomes wasteful for simple job restarts/pipeline updates.
> I suggest we add a new command:
> ./bin/flink checkpoint [checkpointDirectory]
> and a new flag -c for the cancel command to indicate we want to trigger a checkpoint:
> ./bin/flink cancel -c [targetDirectory]
> Otherwise this can work similar to the current savepoint taking logic, we could probably even piggyback on the current messages by adding boolean flag indicating whether it should be a savepoint or a checkpoint.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819044#comment-17819044 ]

Zakelly Lan commented on FLINK-6755:

[~lincoln.86xy] Thanks for the reminder! I've added a release note.
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819041#comment-17819041 ]

lincoln lee commented on FLINK-6755:

[~Zakelly] Should we add release notes for this ticket?
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796167#comment-17796167 ]

David Artiga commented on FLINK-6755:

This is awesome :D
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783452#comment-17783452 ]

Zakelly Lan commented on FLINK-6755:

I would like to implement this function in the CLI, since we do have this need in our production/testing scenarios.

> Allow triggering Checkpoints through command line client
>
> Key: FLINK-6755
> URL: https://issues.apache.org/jira/browse/FLINK-6755
> Project: Flink
> Issue Type: New Feature
> Components: Command Line Client, Runtime / Checkpointing
> Affects Versions: 1.3.0
> Reporter: Gyula Fora
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-unassigned
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458438#comment-17458438 ]

Piotr Nowojski commented on FLINK-6755:

The motivation behind this feature request will be covered by FLINK-25276. As mentioned above by Aljoscha, there might still be value in exposing a REST API hook for manual checkpoint triggering, so I'm keeping this ticket open. However, it doesn't look like such a feature is well motivated.

Implementing this should be quite straightforward, since Flink internally already supports it (FLINK-24280). It's just not exposed in any way to the user.

> Allow triggering Checkpoints through command line client
>
> Key: FLINK-6755
> URL: https://issues.apache.org/jira/browse/FLINK-6755
> Project: Flink
> Issue Type: New Feature
> Components: Command Line Client, Runtime / Checkpointing
> Affects Versions: 1.3.0
> Reporter: Gyula Fora
> Priority: Minor
> Labels: auto-deprioritized-major, auto-unassigned
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334259#comment-17334259 ]

Flink Jira Bot commented on FLINK-6755:

This issue was marked "stale-assigned" and has not received an update in 7 days. It is now automatically unassigned. If you are still working on it, you can assign it to yourself again. Please also give an update about the status of the work.

> Allow triggering Checkpoints through command line client
>
> Key: FLINK-6755
> URL: https://issues.apache.org/jira/browse/FLINK-6755
> Project: Flink
> Issue Type: New Feature
> Components: Command Line Client, Runtime / Checkpointing
> Affects Versions: 1.3.0
> Reporter: Gyula Fora
> Assignee: vinoyang
> Priority: Major
> Labels: stale-assigned
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17323648#comment-17323648 ]

Flink Jira Bot commented on FLINK-6755:

This issue is assigned but has not received an update in 7 days so it has been labeled "stale-assigned". If you are still working on the issue, please give an update and remove the label. If you are no longer working on the issue, please unassign so someone else may work on it. In 7 days the issue will be automatically unassigned.
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857564#comment-16857564 ]

Aljoscha Krettek commented on FLINK-6755:

Hi,

I had a brief discussion with Stephan that helped me sort my thoughts on the broader topics of checkpoints, savepoints, binary formats, user-triggered checkpoints, and periodic savepoints. I’ll try to summarise my stance on this and also comment with the same message on the other relevant Jira Issues and threads.

For reference, the relevant FLIP and Jira issues are these:
 - https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints: Unified Savepoint Format
 - FLINK-12619: Add support for stop-with-checkpoint
 - FLINK-6755: User-triggered checkpoints
 - FLINK-4620: Automatically creating savepoints
 - FLINK-4511: Schedule periodic savepoints

There are roughly two different dimensions in the topic of savepoints/checkpoints (I’ll use snapshot as the generic term for both):
 1) who controls the snapshot
 2) what’s the (binary) format of the snapshot

For 1), we currently have checkpoints and savepoints. Checkpoints are created by the system for fault tolerance. They are managed by the system and the system is free to discard them when it sees fit. Savepoints are in the control of the user. A user can choose to create a savepoint, delete it, and restore from it at will. The system will not clean up savepoints. We should try to keep this separation and not muddle the two concepts.

For 2), we currently have various different formats between the different state backends and also within the same backend. For example, RocksDB can do full or incremental snapshots, local snapshots, and probably more. FLIP-41 aims at introducing a unified “savepoint” format that is interchangeable between the different state backends.

In light of the above points, we should say that FLIP-41 aims to introduce a canonical format that is interchangeable between different backends. This doesn’t mean that we should tie this format strictly to savepoints, though. For performance reasons, users might choose to do savepoints that use one of the optimised formats that the backends offer, for example incremental snapshots. Or they might choose to use the canonical format for regular checkpoints so that they can always switch between backends using periodically created externalised checkpoints.

The motivation behind FLINK-12619 is to have a more lightweight alternative for stop-with-savepoint, for example using the incremental snapshot format that RocksDB has. With the above in mind, however, this becomes “Add support for choosing the snapshot format for stop-with-savepoint”. It should not be stop-with-checkpoint, because checkpoints are something that the system manages and not something that the user should trigger.

The same is true for FLINK-6755; the motivation is the same, I think. The change should be called “Add support for choosing the snapshot format for savepoints”, however.

For the last two Jira issues mentioned above it should be quite clear what I think. I do, however, see a need for potentially different overlapping checkpoint periods or intervals. Users might want to have their regular checkpoints use an optimised format but they also want to have a “canonical format” checkpoint every now and then so that the lineage of incremental checkpoints does not become too unwieldy.

Please let me know what you think!
Aljoscha
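The two dimensions in the comment above (who controls a snapshot, and which binary format it uses) can be sketched as a small model. This is only an illustration of the argument, not Flink code: `Owner`, `Format`, and `Snapshot` are hypothetical names, and the two format values are a simplification of the many backend-specific formats mentioned.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    """Dimension 1: who controls the snapshot (hypothetical names)."""
    SYSTEM = "system"  # a checkpoint: created and discarded by the system
    USER = "user"      # a savepoint: created, deleted, restored by the user


class Format(Enum):
    """Dimension 2: the binary format (simplified to two values)."""
    CANONICAL = "canonical"            # unified format in the spirit of FLIP-41
    BACKEND_NATIVE = "backend-native"  # e.g. an incremental RocksDB snapshot


@dataclass(frozen=True)
class Snapshot:
    owner: Owner
    fmt: Format

    @property
    def kind(self) -> str:
        # The name follows ownership only, never the format.
        return "savepoint" if self.owner is Owner.USER else "checkpoint"

    @property
    def system_may_discard(self) -> bool:
        return self.owner is Owner.SYSTEM

    @property
    def portable_across_backends(self) -> bool:
        return self.fmt is Format.CANONICAL
```

Under this model, the lightweight stop operation discussed in FLINK-12619 would be `Snapshot(Owner.USER, Format.BACKEND_NATIVE)`: still a savepoint by ownership, but written in an optimised format.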
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857225#comment-16857225 ]

vinoyang commented on FLINK-6755:

[~klion26] The considerations facing this issue also exist in your FLINK-12619, if you read the comments again. In fact, stop-with-checkpoint is just one special case of triggering checkpoints manually. This is why I pinged you and [~aljoscha]. Of course, I agree with your worry about "vulnerabilities", but this also exists for savepoints.

I agree with [~gyfora]'s point of view. I think we need to seriously consider the needs that users really care about. If this feature is truly valuable to users, we don't have to wait for a long release cycle. Of course, I am not saying that the problems we face at the moment are not important, but they are already there, and we can also work together to see how to solve them.

> Allow triggering Checkpoints through command line client
>
> Key: FLINK-6755
> URL: https://issues.apache.org/jira/browse/FLINK-6755
> Project: Flink
> Issue Type: New Feature
> Components: Command Line Client, Runtime / Checkpointing
> Affects Versions: 1.3.0
> Reporter: Gyula Fora
> Assignee: vinoyang
> Priority: Major
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856887#comment-16856887 ]

Gyula Fora commented on FLINK-6755:

Imagine you are running a production streaming job with a few TBs of KV states. Your average checkpoint time with incremental checkpoints is 1-5 minutes, your average savepoint time could be anything from 10 minutes to 30 or more and might cost you a lot of money depending on where you are running the job.

You try to keep a strong SLA and have to redeploy the job (maybe you hit a bug). At the moment you have 2 options:
 - Trigger a savepoint, wait for it to complete, stop the job, then restore -> might take an hour total (actual production numbers)
 - Look at the Flink UI and wait for the next checkpoint; hopefully you are lucky enough that it's taken at a frequent interval. Stop the job, search for the latest checkpoint and recover the job

If you have done any of these things under pressure in production, I guarantee that you broke some sweat :D

I think we can risk leaking a bit more of an already user-controlled mechanism. The user controls the interval, the number of concurrent checkpoints, etc., all part of the public API. Triggering one manually is not going to change this by much, in my opinion.
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856860#comment-16856860 ]

Congxian Qiu(klion26) commented on FLINK-6755:

The two issues are indeed related, but from my point of view the proposed solutions are totally different. FLINK-12619 stems from the idea of FLINK-11458 (stop job with savepoint): the management of checkpoints is still controlled by the framework, and the user can only *ask* the framework to perform *one single* checkpoint *at the end of the job*, instead of at any time without limit.

However, just as [~till.rohrmann] and [~srichter] mentioned, the proposal here leaks the management of checkpoints to users and may introduce side effects or even vulnerabilities (anyone with just the job id and access to the cluster could trigger an unlimited number of checkpoints at any interval).
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855707#comment-16855707 ]

Aljoscha Krettek commented on FLINK-6755:

I agree with [~yanghua] that these two issues are related (or maybe even the same) and should be considered together.

Coming from the "stop-with-checkpoint" discussion, I can clearly see the value, because a stop-with-checkpoint with incremental checkpoints could be a lot more efficient (in space and probably runtime) than stop-with-savepoint. I'm a bit torn because it exposes a somewhat more internal concept (as Till mentioned already).
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855369#comment-16855369 ]

vinoyang commented on FLINK-6755:

Hi [~till.rohrmann] [~aljoscha] [~gyfora]

It seems there is another issue that tries to trigger checkpoints manually (FLINK-12619). I think we need to agree on this idea before we start on these issues, so that we can reduce unnecessary work.

cc [~klion26]
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848739#comment-16848739 ]

vinoyang commented on FLINK-6755:

Hi [~till.rohrmann] If you don't mind, I want to invite [~gyfora] to be my mentor for implementing this issue. We will try to keep the changes minimal. As for the side effects that have always existed, we hope to solve them at some point in the future. Maybe we can hold off on providing the stop-with-checkpoint function for now. What do you think?
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847366#comment-16847366 ]

vinoyang commented on FLINK-6755:

[~gyfora] Since we have reached an agreement on the implementation plan, do we now only need to hear the opinions of [~till.rohrmann] and [~srichter]?
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846540#comment-16846540 ]

vinoyang commented on FLINK-6755:

[~gyfora] If we need to keep the change as small as possible, I am not against it. We would just introduce another if/else execution branch.
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846534#comment-16846534 ] Gyula Fora commented on FLINK-6755: --- I agree with [~yanghua] that we should try to be consistent across the different actions when it comes to the schedule. If the manual savepoint doesn't affect the checkpointing schedule, the manual checkpoint probably shouldn't either. We can refactor the CheckpointCoordinator, but I see this as quite a small change: probably some extra property on the CheckpointProperties saying that it is user triggered, so that the regular checks such as max concurrent checkpoints, minimum time between checkpoints, etc. are not enforced.
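To make the suggestion in the comment above concrete, here is a minimal sketch of a user-triggered flag that bypasses the regular checks. It is an illustration only, not Flink code: {{Props}} is a hypothetical stand-in for CheckpointProperties reduced to the one proposed flag, and {{Gate}}, {{mayTrigger}}, {{inFlight}} and {{millisSinceLastCompleted}} are names invented for this example.

```java
// Illustrative sketch only; none of these types exist in Flink under these names.
final class ManualTriggerSketch {

    /** Hypothetical stand-in for CheckpointProperties, reduced to the proposed flag. */
    static final class Props {
        final boolean userTriggered;

        Props(boolean userTriggered) {
            this.userTriggered = userTriggered;
        }
    }

    /** Hypothetical gate that applies the regular checks before a checkpoint is triggered. */
    static final class Gate {
        private final int maxConcurrent;
        private final long minPauseMillis;

        Gate(int maxConcurrent, long minPauseMillis) {
            this.maxConcurrent = maxConcurrent;
            this.minPauseMillis = minPauseMillis;
        }

        boolean mayTrigger(Props props, int inFlight, long millisSinceLastCompleted) {
            // Assumption from the comment: user-triggered requests skip the regular checks.
            if (props.userTriggered) {
                return true;
            }
            // Periodic requests must respect both limits.
            return inFlight < maxConcurrent && millisSinceLastCompleted >= minPauseMillis;
        }
    }

    public static void main(String[] args) {
        Gate gate = new Gate(1, 5000L);
        // One checkpoint already in flight, only 100 ms since the last one completed.
        System.out.println(gate.mayTrigger(new Props(false), 1, 100L)); // false
        System.out.println(gate.mayTrigger(new Props(true), 1, 100L));  // true
    }
}
```

Whether a user-triggered checkpoint should really ignore every limit is left open in the thread; the sketch only shows where such a flag would take effect.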
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846506#comment-16846506 ] vinoyang commented on FLINK-6755: - [~srichter] [~gyfora] What do you think about my proposal that a manually triggered checkpoint does not use the timer for scheduling?
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843725#comment-16843725 ] vinoyang commented on FLINK-6755: - [~srichter] If we just think of it as a new trigger strategy, then we may not need to reset the timer, because it is not periodic scheduling. IMO, we should have another execution branch to support manual checkpoints. As for general side effects, that is still a problem, as in the stop-with-savepoint scenario. Actually, I think the core internal {{triggerCheckpoint}} needs to be refactored. As more and more features are added, it becomes more and more complicated; with ever more {{if/else}} branches, it will be hard to maintain. If we want to implement this feature, I suggest introducing an interface such as {{CheckpointTriggerStrategy}} to separate the complex implementations. WDYT?
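As a rough illustration of the {{CheckpointTriggerStrategy}} idea in the comment above, the sketch below replaces {{if/else}} branches with one strategy object per trigger kind. Only the interface name comes from the comment; its methods, the two implementations, and the {{trigger}} helper are assumptions made for this example and are not part of Flink.

```java
// Illustrative sketch only; the interface name is taken from the comment,
// everything else is hypothetical.
final class TriggerStrategySketch {

    interface CheckpointTriggerStrategy {
        /** Whether a checkpoint may start given the current coordinator state. */
        boolean shouldTrigger(int inFlight, long millisSinceLast);

        /** Whether a successful trigger restarts the periodic timer. */
        boolean resetsPeriodicTimer();
    }

    /** Periodic triggering: enforces the usual limits and owns the timer. */
    static final class PeriodicStrategy implements CheckpointTriggerStrategy {
        private final int maxConcurrent;
        private final long minPauseMillis;

        PeriodicStrategy(int maxConcurrent, long minPauseMillis) {
            this.maxConcurrent = maxConcurrent;
            this.minPauseMillis = minPauseMillis;
        }

        @Override
        public boolean shouldTrigger(int inFlight, long millisSinceLast) {
            return inFlight < maxConcurrent && millisSinceLast >= minPauseMillis;
        }

        @Override
        public boolean resetsPeriodicTimer() {
            return true;
        }
    }

    /** Manual triggering: assumed here to skip the limits and leave the timer alone. */
    static final class ManualStrategy implements CheckpointTriggerStrategy {
        @Override
        public boolean shouldTrigger(int inFlight, long millisSinceLast) {
            return true;
        }

        @Override
        public boolean resetsPeriodicTimer() {
            return false;
        }
    }

    /** The single trigger path; behaviour differences live in the strategy objects. */
    static String trigger(CheckpointTriggerStrategy strategy, int inFlight, long millisSinceLast) {
        if (!strategy.shouldTrigger(inFlight, millisSinceLast)) {
            return "declined";
        }
        return strategy.resetsPeriodicTimer() ? "triggered, timer reset" : "triggered, timer untouched";
    }

    public static void main(String[] args) {
        System.out.println(trigger(new PeriodicStrategy(1, 5000L), 1, 100L)); // declined
        System.out.println(trigger(new ManualStrategy(), 1, 100L));           // triggered, timer untouched
    }
}
```

The point of the design is that adding a further trigger kind means adding a class rather than another branch inside the trigger path.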
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842210#comment-16842210 ] Stefan Richter commented on FLINK-6755: --- IMO, this somehow leads to the more general question about the future directions of checkpoints and savepoints and their distinguishing features. At this point, on a very general level, I agree that there is a valid use case for this request. With a closer look, I think that it also leaves some open questions, e.g. how do the manually triggered checkpoints interact with the automatic ones (reset timer?). It also sounds like the target use case would like to have a function similar to stop-with-savepoint, a stop-with-checkpoint basically. We would also have to solve the question of general side effects: if we introduce triggered checkpoints, do they commit side effects (like checkpoints) or not (like savepoints, except if used in the context of stop)? Thinking a bit about unification of concepts, so far the de-facto differences that evolved are i) ownership for delete and ii) what happens to side effects. We can have the discussion, and it probably should start by answering those questions for this idea.
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842201#comment-16842201 ] Gyula Fora commented on FLINK-6755: --- It would be great to reach a consensus on this feature, and on whether we see any serious reason for not having it. [~stefanrichte...@gmail.com], [~StephanEwen] or [~uce] do you have any thoughts on this?
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840198#comment-16840198 ] Gyula Fora commented on FLINK-6755: --- I agree about the leaked abstraction, but I don't see this as a problem. [~yanghua] I think selecting either automatic/manual is a bit too restrictive. The periodic checkpoints are nice, but sometimes manually triggering one right before a restart is just a great feature.
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840103#comment-16840103 ] vinoyang commented on FLINK-6755: - Actually, {{CheckpointConfig#enableExternalizedCheckpoints}} seems to have already leaked the concept of checkpoints more or less, and partial control has been transferred to the users. I think introducing this feature is just an extension of the trigger mechanism. With this, we could even introduce a global config option, for example: {{checkpoint.triggerStrategy: automatic/manual}}.
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838494#comment-16838494 ] Gyula Fora commented on FLINK-6755: --- I don't really see the danger of exposing manual checkpoints at this point [~till.rohrmann], but there is a big downside to not having it, as [~yanghua] explained.
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838405#comment-16838405 ] vinoyang commented on FLINK-6755: - [~till.rohrmann] We just think the periodic triggering mechanism is not flexible enough, but a savepoint's recovery performance is not as good as a checkpoint's. What do you think about introducing an advanced config option and letting the user decide and weigh the risk? Or a new concept named, for example, "Lightweight Savepoint", which has savepoint semantics but a snapshot data format like that of checkpoints?
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838386#comment-16838386 ] Till Rohrmann commented on FLINK-6755: -- The idea of checkpoints is that they are under the control of Flink. In contrast, savepoints are user-realm objects which need to be explicitly managed by the user. I'm not sure whether we should leak the concept of checkpoints by introducing this feature, even though I can see the benefit of it. At some point there was also the idea to support incremental savepoints. I'm not sure whether this is feasible, though.
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838322#comment-16838322 ] vinoyang commented on FLINK-6755: - I think triggering checkpoints manually is a very valuable feature for users, and I'd like to implement it. But the issue was created a long time ago and has not received feedback. Is there any obstacle? [~gyfora] [~richtesn] [~till.rohrmann]