[GitHub] flink pull request #6092: [FLINK-9352] In Standalone checkpoint recover mode...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/6092 ---
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6092#discussion_r200065877

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java ---
    @@ -1173,9 +1179,10 @@ public void startCheckpointScheduler() {
     			stopCheckpointScheduler();

     			periodicScheduling = true;
    +			long initialDelay = schedulerInitialDelayGenerator.nextLong(
    +				minPauseBetweenCheckpointsNanos / 1_000_000, baseInterval);
    --- End diff --

    Could we replace `schedulerInitialDelayGenerator` with `long initialDelay = ThreadLocalRandom.current().nextLong(minPauseBetweenCheckpointsNanos / 1_000_000, baseInterval);`? That way we would not have to use `RandomUtils`.

---
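To make the suggestion above concrete, here is a minimal, self-contained sketch of drawing the initial delay with `ThreadLocalRandom`. It is not the code from the pull request: the class name `InitialDelaySketch`, the method `randomInitialDelay`, and the fallback for an empty range are assumptions made for illustration. The one library fact it relies on is that `ThreadLocalRandom.nextLong(origin, bound)` throws `IllegalArgumentException` when `origin >= bound`, which is why the range is checked before the call.

```java
import java.util.concurrent.ThreadLocalRandom;

public class InitialDelaySketch {

    /**
     * Returns a delay in [minPauseMillis, baseIntervalMillis).
     * Hypothetical helper: when the range is empty, it falls back to
     * minPauseMillis instead of calling nextLong, which would throw.
     */
    public static long randomInitialDelay(long minPauseMillis, long baseIntervalMillis) {
        if (minPauseMillis >= baseIntervalMillis) {
            // nextLong(origin, bound) requires origin < bound (assumed fallback policy).
            return minPauseMillis;
        }
        return ThreadLocalRandom.current().nextLong(minPauseMillis, baseIntervalMillis);
    }

    public static void main(String[] args) {
        // Example values only; they are not taken from the pull request.
        long minPauseBetweenCheckpointsNanos = 500_000_000L; // 500 ms
        long baseInterval = 10_000L;                         // 10 s, in milliseconds

        long initialDelay = randomInitialDelay(
            minPauseBetweenCheckpointsNanos / 1_000_000, baseInterval);
        System.out.println("initial delay (ms): " + initialDelay);
    }
}
```

Using `ThreadLocalRandom.current()` avoids keeping a separate generator field, which is the point of the review comment; whether the equal-bounds case should fall back or widen the range is a design choice this sketch does not settle.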
Github user yanghua commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6092#discussion_r191341746

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java ---
    @@ -1173,9 +1179,10 @@ public void startCheckpointScheduler() {
     			stopCheckpointScheduler();

     			periodicScheduling = true;
    +			long initialDelay = schedulerInitialDelayGenerator.nextLong(
    +				minPauseBetweenCheckpointsNanos / 1_000_000, baseInterval);
    --- End diff --

    @yuqi1129 please see the constructor's [code segment](https://github.com/yanghua/flink/blob/1eb432833bf2dd23187194500b6e1c6523f30605/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L218)

---
Github user yuqi1129 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6092#discussion_r191340849

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java ---
    @@ -1173,9 +1179,10 @@ public void startCheckpointScheduler() {
     			stopCheckpointScheduler();

     			periodicScheduling = true;
    +			long initialDelay = schedulerInitialDelayGenerator.nextLong(
    +				minPauseBetweenCheckpointsNanos / 1_000_000, baseInterval);
    --- End diff --

    Is there any place that checks that the value `minPauseBetweenCheckpointsNanos / 1_000_000` is less than or equal to `baseInterval`?

---
GitHub user yanghua opened a pull request:

    https://github.com/apache/flink/pull/6092

    [FLINK-9352] In Standalone checkpoint recover mode many jobs with same checkpoint interval cause IO pressure

    ## What is the purpose of the change

    *This pull request fixes a problem: in Standalone checkpoint recover mode, many jobs with the same checkpoint interval cause IO pressure*

    ## Brief change log

      - *Change the scheduler's initial delay from baseInterval to a random number between the minimum pause and the base interval*

    ## Verifying this change

    This change is a trivial rework / code cleanup without any test coverage.

    ## Does this pull request potentially affect one of the following parts:

      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
      - The S3 file system connector: (yes / **no** / don't know)

    ## Documentation

      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (not applicable / docs / JavaDocs / **not documented**)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanghua/flink FLINK-9352

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6092.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6092

----
commit 1eb432833bf2dd23187194500b6e1c6523f30605
Author: yanghua
Date:   2018-05-29T07:59:48Z

    [FLINK-9352] In Standalone checkpoint recover mode many jobs with same checkpoint interval cause IO pressure

---
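The change log entry in the description above (a randomized initial delay in place of a fixed `baseInterval`) can be illustrated with a plain `ScheduledExecutorService`. This is a rough sketch under stated assumptions, not Flink's `CheckpointCoordinator`: the class `JitteredSchedulerSketch`, the method `startPeriodic`, and the interval values are hypothetical, and the fallback to the base interval when the range is empty is an assumed policy.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class JitteredSchedulerSketch {

    /**
     * Schedules a periodic trigger whose first run is delayed by a random
     * amount in [minPauseMillis, baseIntervalMillis), so that jobs started
     * at the same moment do not all fire together.
     */
    public static ScheduledFuture<?> startPeriodic(
            ScheduledExecutorService timer,
            Runnable trigger,
            long minPauseMillis,
            long baseIntervalMillis) {
        // Assumed fallback: use the base interval when the range is empty,
        // because nextLong(origin, bound) requires origin < bound.
        long initialDelay = minPauseMillis < baseIntervalMillis
            ? ThreadLocalRandom.current().nextLong(minPauseMillis, baseIntervalMillis)
            : baseIntervalMillis;
        return timer.scheduleAtFixedRate(
            trigger, initialDelay, baseIntervalMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Three hypothetical jobs recovered at the same time with the same interval.
        for (int job = 0; job < 3; job++) {
            ScheduledFuture<?> future = startPeriodic(timer, () -> { }, 1_000L, 60_000L);
            System.out.println("job " + job + ": first trigger in about "
                + future.getDelay(TimeUnit.MILLISECONDS) + " ms");
        }
        timer.shutdownNow();
    }
}
```

With a fixed initial delay, all three jobs would report the same first trigger time; with the random draw, their first triggers are spread across the interval and stay offset on later periods.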