[GitHub] flink pull request: [FLINK-3800] [jobmanager] Terminate ExecutionG...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1923 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-3800] [jobmanager] Terminate ExecutionG...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1923#issuecomment-214812452 Unrelated test case failures. Will be merging this PR to master and the 1.0-release branch.
[GitHub] flink pull request: [FLINK-3800] [jobmanager] Terminate ExecutionG...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1923#issuecomment-214327997 I increased the timeouts another time. Maybe the timeout was still set too low. Let's see.
[GitHub] flink pull request: [FLINK-3800] [jobmanager] Terminate ExecutionG...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1923#issuecomment-214211890 The build fails with timeouts while waiting for the TMs to connect again. This is weird as the changes look unrelated to this. Do you have an idea what this might be?
[GitHub] flink pull request: [FLINK-3800] [jobmanager] Terminate ExecutionG...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1923#issuecomment-213346413 Maybe I have to increase the timeout for Travis here. Will do it.
[GitHub] flink pull request: [FLINK-3800] [jobmanager] Terminate ExecutionG...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1923#issuecomment-213342621

Code looks good. It seems that the test is still a bit unstable:

```
Tests in error:
  LeaderChangeJobRecoveryTest.before:73 » Timeout Futures timed out after [1...
```

in 3 of 5 runs:
https://s3.amazonaws.com/archive.travis-ci.org/jobs/124781545/log.txt
https://s3.amazonaws.com/archive.travis-ci.org/jobs/124781546/log.txt
https://s3.amazonaws.com/archive.travis-ci.org/jobs/124781547/log.txt
[GitHub] flink pull request: [FLINK-3800] [jobmanager] Terminate ExecutionG...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1923#issuecomment-213288408 Good fix, Till! Changes look good, +1 to merge soon for 1.0.3.
[GitHub] flink pull request: [FLINK-3800] [jobmanager] Terminate ExecutionG...
GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/1923

[FLINK-3800] [jobmanager] Terminate ExecutionGraphs properly

This PR terminates the ExecutionGraphs properly, without restarts, when the JobManager calls cancelAndClearEverything. This is achieved by allowing the method to be called only with a SuppressRestartsException. The SuppressRestartsException disables the restart strategy of the respective ExecutionGraph. This is important because the root cause could be a different exception. To avoid race conditions, the restart strategy has to be checked twice for whether it allows the job to restart: once before and once after the job has transitioned to the state RESTARTING. This prevents ExecutionGraphs from becoming orphans. Furthermore, this PR fixes the problem that the default restart strategy is shared by multiple jobs. The problem is solved by introducing a RestartStrategyFactory, which creates a separate RestartStrategy instance for every job.
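To make the double check described above concrete, here is a minimal sketch. It is not the code from this PR: `RestartCheckSketch`, `Graph`, `SimpleRestartStrategy`, `tryEnterRestarting`, and `finishRestart` are hypothetical names chosen for illustration, and the `SuppressRestartsException` here is a local stand-in that only shares its name with the class mentioned above. The sketch assumes the state is held in an atomic reference and that disabling the strategy can happen concurrently with a restart attempt.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class RestartCheckSketch {

    enum JobStatus { RUNNING, FAILING, RESTARTING, FAILED }

    /** Hypothetical marker exception: wrapping a cause in it means "do not restart". */
    static class SuppressRestartsException extends RuntimeException {
        SuppressRestartsException(Throwable cause) { super(cause); }
    }

    /** Hypothetical restart strategy with a fixed attempt budget and an off switch. */
    static class SimpleRestartStrategy {
        private final AtomicInteger remaining;
        private volatile boolean disabled;

        SimpleRestartStrategy(int maxAttempts) { remaining = new AtomicInteger(maxAttempts); }

        boolean canRestart() { return !disabled && remaining.get() > 0; }
        void disable() { disabled = true; }
        void recordRestart() { remaining.decrementAndGet(); }
    }

    /** Hypothetical stand-in for an execution graph; only the state machine is modeled. */
    static class Graph {
        final AtomicReference<JobStatus> state = new AtomicReference<>(JobStatus.RUNNING);
        final SimpleRestartStrategy strategy;

        Graph(SimpleRestartStrategy strategy) { this.strategy = strategy; }

        void fail(Throwable cause) {
            // A suppressing exception switches the strategy off, whatever the root cause is.
            if (cause instanceof SuppressRestartsException) {
                strategy.disable();
            }
            state.set(JobStatus.FAILING);
            if (tryEnterRestarting()) {
                finishRestart();
            }
        }

        /** First check: only move to RESTARTING if the strategy currently allows it. */
        boolean tryEnterRestarting() {
            if (strategy.canRestart()
                    && state.compareAndSet(JobStatus.FAILING, JobStatus.RESTARTING)) {
                return true;
            }
            state.compareAndSet(JobStatus.FAILING, JobStatus.FAILED);
            return false;
        }

        /** Second check: the strategy may have been disabled after the transition. */
        void finishRestart() {
            if (strategy.canRestart()) {
                strategy.recordRestart();
                state.compareAndSet(JobStatus.RESTARTING, JobStatus.RUNNING);
            } else {
                // Without this branch the graph would stay in RESTARTING forever.
                state.compareAndSet(JobStatus.RESTARTING, JobStatus.FAILED);
            }
        }
    }

    public static void main(String[] args) {
        Graph ordinary = new Graph(new SimpleRestartStrategy(3));
        ordinary.fail(new RuntimeException("task failure"));
        System.out.println("ordinary failure -> " + ordinary.state.get());

        Graph cleared = new Graph(new SimpleRestartStrategy(3));
        cleared.fail(new SuppressRestartsException(new RuntimeException("leadership lost")));
        System.out.println("suppressed failure -> " + cleared.state.get());
    }
}
```

Splitting the two checks into separate methods is only a device to make the interleaving visible: if the strategy is disabled between `tryEnterRestarting` and `finishRestart`, the second check moves the graph to a terminal state instead of leaving it stuck in RESTARTING.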
- [X] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message
- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixJobRestart

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1923.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1923

commit ea05ae102428f6be8db4091b849b680112099c36
Author: Till Rohrmann
Date: 2016-04-21T15:07:51Z

    [FLINK-3800] [jobmanager] Terminate ExecutionGraphs properly

    This PR terminates the ExecutionGraphs properly, without restarts, when the JobManager calls cancelAndClearEverything. This is achieved by allowing the method to be called only with a SuppressRestartsException. The SuppressRestartsException disables the restart strategy of the respective ExecutionGraph. This is important because the root cause could be a different exception. To avoid race conditions, the restart strategy has to be checked twice for whether it allows the job to restart: once before and once after the job has transitioned to the state RESTARTING. This prevents ExecutionGraphs from becoming orphans. Furthermore, this PR fixes the problem that the default restart strategy is shared by multiple jobs. The problem is solved by introducing a RestartStrategyFactory, which creates a separate RestartStrategy instance for every job.
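For readers following the second half of the description, the shared-strategy problem and the factory fix can be sketched roughly as below. This is an illustration only, under the assumption that a restart strategy carries per-job mutable state such as an attempt counter. The interfaces and classes (`RestartStrategy`, `RestartStrategyFactory`, `FixedAttemptsStrategy`, `FixedAttemptsFactory`, `Job`) and their method names are hypothetical and are not claimed to match the signatures introduced by the PR.

```java
public class RestartStrategyFactorySketch {

    /** Hypothetical strategy interface; the state it holds is per job. */
    interface RestartStrategy {
        boolean canRestart();
        void recordRestart();
    }

    /** Hypothetical strategy that allows a fixed number of restarts. */
    static class FixedAttemptsStrategy implements RestartStrategy {
        private final int maxAttempts;
        private int used;

        FixedAttemptsStrategy(int maxAttempts) { this.maxAttempts = maxAttempts; }

        @Override public boolean canRestart() { return used < maxAttempts; }
        @Override public void recordRestart() { used++; }
    }

    /** Hypothetical factory: holds only configuration and hands out fresh instances. */
    interface RestartStrategyFactory {
        RestartStrategy create();
    }

    static class FixedAttemptsFactory implements RestartStrategyFactory {
        private final int maxAttempts;

        FixedAttemptsFactory(int maxAttempts) { this.maxAttempts = maxAttempts; }

        @Override public RestartStrategy create() { return new FixedAttemptsStrategy(maxAttempts); }
    }

    /** Hypothetical job that consults its strategy whenever it fails. */
    static class Job {
        final String name;
        final RestartStrategy strategy;

        Job(String name, RestartStrategy strategy) {
            this.name = name;
            this.strategy = strategy;
        }

        /** Returns true if the job was allowed to restart after a failure. */
        boolean failAndTryRestart() {
            if (!strategy.canRestart()) {
                return false;
            }
            strategy.recordRestart();
            return true;
        }
    }

    public static void main(String[] args) {
        // Shared instance: job A's restart consumes the budget that job B relies on.
        RestartStrategy shared = new FixedAttemptsStrategy(1);
        Job a = new Job("A", shared);
        Job b = new Job("B", shared);
        System.out.println("shared:  A restarts=" + a.failAndTryRestart()
                + ", B restarts=" + b.failAndTryRestart());

        // Factory: each job gets its own instance, so the budgets are independent.
        RestartStrategyFactory factory = new FixedAttemptsFactory(1);
        Job c = new Job("C", factory.create());
        Job d = new Job("D", factory.create());
        System.out.println("factory: C restarts=" + c.failAndTryRestart()
                + ", D restarts=" + d.failAndTryRestart());
    }
}
```

The same reasoning applies to disabling a strategy: with one shared instance, suppressing restarts for one job would also suppress them for every other job using the default.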