Stefan Richter created FLINK-5053:
-------------------------------------

             Summary: Incremental / lightweight snapshots for checkpoints
                 Key: FLINK-5053
                 URL: https://issues.apache.org/jira/browse/FLINK-5053
             Project: Flink
          Issue Type: Improvement
          Components: State Backends, Checkpointing
            Reporter: Stefan Richter


There is currently basically no difference between savepoints and checkpoints 
in Flink and both are created through exactly the same process.

However, savepoints and checkpoints have a slightly different meaning which we 
should take into account to keep Flink efficient:

- Savepoints are (typically infrequently) triggered by the user to create a 
state from which the application can be restarted, e.g. because Flink, some 
code, or the parallelism needs to be changed.

- Checkpoints are (typically frequently) triggered by the System to allow for 
fast recovery in case of failure, but keeping the job/system unchanged.

This means that savepoints and checkpoints can have different properties in 
that:

- Savepoint should represent a state of the application, where characteristics 
of the job (e.g. parallelism) can be adjusted for the next restart. One example 
for things that savepoints need to be aware of are key-groups. Savepoints can 
potentially be a little more expensive than checkpoints, because they are 
usually created a lot less frequently through the user.

- Checkpoints are frequently triggered by the system to allow for fast failure 
recovery. However, failure recovery leaves all characteristics of the job 
unchanged. This checkpoints do not have to be aware of those, e.g. think again 
of key groups. Checkpoints should run faster than creating savepoints, in 
particular it would be nice to have incremental checkpoints.

For a first approach, I would suggest the following steps/changes:

- In checkpoint coordination: differentiate between triggering checkpoints 
and savepoints. Introduce properties for checkpoints that describe their set of 
abilities, e.g. "is-key-group-aware", "is-incremental".

- In state handle infrastructure: introduce state handles that reflect 
incremental checkpoints and drop full key-group awareness, i.e. covering 
folders instead of files and not having keygroup_id -> file/offset mapping, but 
keygroup_range -> folder?

- Backend side: We should start with RocksDB by reintroducing something similar 
to semi-async snapshots, but using 
BackupableDBOptions::setShareTableFiles(true) and transferring only new 
incremental outputs to HDFS. Notice that using RocksDB's internal backup 
mechanism is giving up on the information about individual key-groups. But as 
explained above, this should be totally acceptable for checkpoints, while 
savepoints should use the key-group-aware fully async mode. Of course we also 
need to implement the ability to restore from both types of snapshots.






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to