Re: difference between checkpoints & savepoints

Stefan Richter Thu, 10 Aug 2017 01:21:00 -0700

Hi,

I would explain the main conceptual difference as follows:


- Checkpoints are periodically triggered by the system for fault tolerance. 
They are used to automatically recover from failures. Because of their 
automatic and periodical nature, they should be lightweight to produce and will 
restore the same job without any changes to the jobgraph, parallelism, etc. 
Checkpoints are usually dropped after the job was terminated by the user.

- Savepoints are triggered by the user to store the state of the job for a 
manual resume and backup. Savepoints are usually not periodical but typically 
taken before some user actions to the job or the system. For example, this 
could be an update of your Flink version, changing your job graph, changing 
parallelism, forking a second job like for a red/blue deployment, and so on.  
Of course, savepoints must survive job termination. Conceptually, savepoints 
can be a bit more expensive to produce, because they should have a format that 
makes all those „changes to the job“ features possible.

Besides this conceptual difference, the current implementations are basically 
using the same code and produce the same „format". However, there is currently 
one exception from this, but I would expect more differences in the future. 
This exception are incremental checkpoints with the RocksDB state backend. They 
are using some RocksDB internal format instead of Flink’s „savepoint format“. 
This makes them the first instance of a more lightweight checkpointing 
mechanism, compared to savepoints, at the cost of dropping support for certain 
features such as changing the parallelism.

Furthermore, there also exists „externalized checkpoints“, which are somewhere 
in between checkpoints and savepoints. They are triggered by Flink, but can 
survive job termination and can then be used by the user to restart the job, 
similar to savepoints. They use the checkpointing code path, so there are for 
example externalized incremental checkpoints. However, exactly like a normal 
checkpoints, they might also lack certain features like rescalability.

Best,
Stefan

> Am 10.08.2017 um 05:32 schrieb Raja.Aravapalli <raja.aravapa...@target.com>:
> 
> Hi,
>  
> Can someone please help me understand the difference between Flink's 
> Checkpoints & Savepoints.
>  
> While I read the documentation, couldn't understand the difference! :s
>  
>  
> Thanks a lot. 
>  
>  
>  
> Regards,
> Raja.

Re: difference between checkpoints & savepoints

Reply via email to