Repository: flink
Updated Branches:
  refs/heads/master 92efcd34a -> 6c0a83e4f

[fix] [docs] Fix typo in savepoints documentation

Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/6c0a83e4
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/6c0a83e4
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/6c0a83e4

Branch: refs/heads/master
Commit: 6c0a83e4fa1ef3d68df31bf01618972c4a445b21
Parents: 92efcd3
Author: Till Rohrmann <trohrm...@apache.org>
Authored: Mon Feb 1 18:16:07 2016 +0100
Committer: Till Rohrmann <trohrm...@apache.org>
Committed: Mon Feb 1 18:16:07 2016 +0100

----------------------------------------------------------------------
 docs/apis/streaming/savepoints.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/flink/blob/6c0a83e4/docs/apis/streaming/savepoints.md
----------------------------------------------------------------------
diff --git a/docs/apis/streaming/savepoints.md b/docs/apis/streaming/savepoints.md
index 910f503..f04845c 100644
--- a/docs/apis/streaming/savepoints.md
+++ b/docs/apis/streaming/savepoints.md
@@ -79,7 +79,7 @@ For savepoints **only stateful tasks matter**. In the above example, the source

 Each task is identified by its **generated task IDs** and **subtask index**. In the above example the state of the source (**s<sub>1</sub>**, **s<sub>2</sub>**) and map tasks (**m<sub>1</sub>**, **m<sub>2</sub>**) is identified by their respective task ID (*0xC322EC* for the source tasks and *0x27B3EF* for the map tasks) and subtask index. There is no state for the sinks (**t<sub>1</sub>**, **t<sub>2</sub>**). Their IDs therefore do not matter.

-<span class="label label-danger">Important</span> The IDs are generated **deterministically** from your program structure. This means that as long as your program does not change, the IDs do not change. **The only allowed changes are within the user function, e.g. you can change the implemented `MapFunction` without changing the typology**. In this case, it is straight forward to restore the state from a savepoint by mapping it back to the same task IDs and subtask indexes. This allows you to work with savepoints out of the box, but gets problematic as soon as you make changes to the topology, because they result in changed IDs and the savepoint state cannot be mapped to your program any more.
+<span class="label label-danger">Important</span> The IDs are generated **deterministically** from your program structure. This means that as long as your program does not change, the IDs do not change. **The only allowed changes are within the user function, e.g. you can change the implemented `MapFunction` without changing the topology**. In this case, it is straight forward to restore the state from a savepoint by mapping it back to the same task IDs and subtask indexes. This allows you to work with savepoints out of the box, but gets problematic as soon as you make changes to the topology, because they result in changed IDs and the savepoint state cannot be mapped to your program any more.

 <span class="label label-info">Recommended</span> In order to be able to change your program and **have fixed IDs**, the *DataStream* API provides a method to manually specify the task IDs. Each operator provides a **`uid(String)`** method to override the generated ID. The ID is a String, which will be deterministically hashed to a 16-byte hash value. It is **important** that the specified IDs are **unique per transformation and job**. If this is not the case, job submission will fail.
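
Note (not part of the commit): the documentation text in the diff says a user-specified ID is deterministically hashed to a 16-byte value and must be unique per job. The sketch below illustrates only those two properties in plain Java. The class `UidSketch`, its methods, and the IDs `source-id` and `map-id` are hypothetical, and MD5 is a stand-in chosen because it ships with the JDK and produces 16 bytes; the hash function Flink actually applies may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: none of these names are part of Flink.
class UidSketch {

    // Hash a user-specified ID to 16 bytes. MD5 is an assumption made for
    // this sketch (available in the JDK, 16-byte digest), not Flink's hash.
    static byte[] hashUid(String uid) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(uid.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every JDK", e);
        }
    }

    // Mimic the "unique per transformation and job" rule by rejecting
    // a list of IDs that contains the same ID twice.
    static void checkUnique(List<String> uids) {
        Set<String> seen = new HashSet<>();
        for (String uid : uids) {
            if (!seen.add(uid)) {
                throw new IllegalArgumentException("Duplicate uid: " + uid);
            }
        }
    }

    public static void main(String[] args) {
        byte[] first = hashUid("source-id");
        byte[] second = hashUid("source-id");
        // The same ID always yields the same 16 bytes, regardless of how
        // the rest of the program is structured.
        System.out.println("length = " + first.length);
        System.out.println("stable = " + Arrays.equals(first, second));
        checkUnique(Arrays.asList("source-id", "map-id"));
    }
}
```

Because the hash depends only on the string, an ID assigned this way stays the same when operators are added or reordered, which is what allows savepoint state to be mapped back after a topology change.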