[GitHub] flink pull request #2609: [FLINK-4717] Add CancelJobWithSavepoint

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2609


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2609: [FLINK-4717] Add CancelJobWithSavepoint

2016-10-12 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/2609#discussion_r82980824
  
--- Diff: docs/setup/cli.md ---
@@ -113,6 +113,10 @@ The command line can be used to
 
 ./bin/flink cancel 
 
+-   Cancel a job with a savepoint:
+
+./bin/flink cancel -s [targetDirectory] 
--- End diff --

We could also change the savepoint trigger command to `savepoint -t 
[targetDir] `.

Or we could allow `cancel  [targetDirectory]` in addition to the 
`-s` option variant, but that might be slightly confusing, too. We need the 
`-s` option without an argument for cancel with savepoint to the default target.

What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2609: [FLINK-4717] Add CancelJobWithSavepoint

2016-10-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2609#discussion_r82805867
  
--- Diff: docs/setup/cli.md ---
@@ -113,6 +113,10 @@ The command line can be used to
 
 ./bin/flink cancel 
 
+-   Cancel a job with a savepoint:
+
+./bin/flink cancel -s [targetDirectory] 
--- End diff --

It is a little bit confusing that the `savepoint` command has a different 
argument order. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2609: [FLINK-4717] Add CancelJobWithSavepoint

2016-10-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2609#discussion_r82814608
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
 ---
@@ -581,6 +581,62 @@ class JobManager(
   )
   }
 
+case CancelJobWithSavepoint(jobId, savepointDirectory) =>
+  try {
+val targetDirectory = if (savepointDirectory != null) {
+  savepointDirectory
+} else {
+  defaultSavepointDir
+}
+
+log.info(s"Trying to cancel job $jobId with savepoint to 
$targetDirectory")
+
+currentJobs.get(jobId) match {
+  case Some((executionGraph, _)) =>
+// We don't want any checkpoint between the savepoint and 
cancellation
+val coord = executionGraph.getCheckpointCoordinator
+coord.stopCheckpointScheduler()
--- End diff --

I think it's not enough to simply call `stopCheckpointScheduler`. If I'm 
not mistaken, then the following could happen: You call 
`stopCheckpointScheduler` which will try to `cancel` the last 
`currentPeriodicTrigger`. Now assume that the last `TimerTask` to trigger the 
next checkpoint has just been triggered but not executed (just before 
cancelling it). Now the `stopCheckpointScheduler` finishes without the 
`TimerTask` having completed. Now the `TimerTask` can still trigger a 
checkpoint even though we've stopped the checkpoint scheduler.

The way to fix this (admittedly academic corner case), is to filter out 
outdated `TimerTask` calls in the `CheckpointCoordinator` by having a kind of 
fencing tokens for the trigger checkpoint calls.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2609: [FLINK-4717] Add CancelJobWithSavepoint

2016-10-07 Thread uce
GitHub user uce opened a pull request:

https://github.com/apache/flink/pull/2609

[FLINK-4717] Add CancelJobWithSavepoint 

This builds on top of #2608.

- Adds CancelJobWithSavepoint message, which triggers a savepoint before 
cancelling the respective job. The periodic checkpoint scheduler is stopped 
before doing this so that no checkpoints happen between the savepoint and 
cancellation. Note that users could manually trigger another savepoint while 
cancel-with-savepoint happens, but I think this is tolerable.

- Adds `-s [targetDirectory]` option to CLI cancel command:
* Regular cancelling:`bin/flink cancel `
* Cancel with savepoint to default dir: `bin/flink cancel -s `
* Cancel with savepoint to target dir: `bin/flink cancel -s [targetDir] 
`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uce/flink 4717-cancel_with_savepoint

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2609.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2609






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---