Re: difference between checkpoints & savepoints

Stefan Richter Thu, 10 Aug 2017 05:16:29 -0700

Hi,

but I think this is exactly the case for externalized checkpoints. Periodic 
savepoints are problematic because, their lifecycle is meant to be under the 
control of the user and Flink can not make any assumptions when they can be 
dropped. So in the conservative scenario, savepoints would quickly pile up. 
With externalized checkpoints, you can control the number of retained 
checkpoints. if you set this number to one, that should be exactly what you 
want.


As for rescalability, this limitation is more of a future than a current 
problem. Right now, you should be able to rescale from all externalized 
checkpoints. But this might not hold in the future, because you can optimize 
checkpoints in some cases if this is feature dropped.

Right now, externalized checkpoints should offer all that you want.

Best,
Stefan

> Am 10.08.2017 um 11:46 schrieb Henri Heiskanen <henri.heiska...@gmail.com>:
> 
> Hi,
> 
> It would be super helpful if Flink would provide out of the box functionality 
> for writing automatic savepoints and then starting from the latest savepoint. 
> If external checkpoints would support rescaling then 1st requirement is met, 
> but one would still need to e.g. find the latest checkpoint from some folder 
> and pass that as argument. We are currently writing our own functionality for 
> this. Why not just tell Flink that this job uses persistent states and 
> default functionality is then to start from the latest snapshot.
> 
> Br,
> Henri H
> 
> On Thu, Aug 10, 2017 at 11:20 AM, Stefan Richter <s.rich...@data-artisans.com 
> <mailto:s.rich...@data-artisans.com>> wrote:
> Hi,
> 
> I would explain the main conceptual difference as follows:
> 
> - Checkpoints are periodically triggered by the system for fault tolerance. 
> They are used to automatically recover from failures. Because of their 
> automatic and periodical nature, they should be lightweight to produce and 
> will restore the same job without any changes to the jobgraph, parallelism, 
> etc. Checkpoints are usually dropped after the job was terminated by the user.
> 
> - Savepoints are triggered by the user to store the state of the job for a 
> manual resume and backup. Savepoints are usually not periodical but typically 
> taken before some user actions to the job or the system. For example, this 
> could be an update of your Flink version, changing your job graph, changing 
> parallelism, forking a second job like for a red/blue deployment, and so on.  
> Of course, savepoints must survive job termination. Conceptually, savepoints 
> can be a bit more expensive to produce, because they should have a format 
> that makes all those „changes to the job“ features possible.
> 
> Besides this conceptual difference, the current implementations are basically 
> using the same code and produce the same „format". However, there is 
> currently one exception from this, but I would expect more differences in the 
> future. This exception are incremental checkpoints with the RocksDB state 
> backend. They are using some RocksDB internal format instead of Flink’s 
> „savepoint format“. This makes them the first instance of a more lightweight 
> checkpointing mechanism, compared to savepoints, at the cost of dropping 
> support for certain features such as changing the parallelism.
> 
> Furthermore, there also exists „externalized checkpoints“, which are 
> somewhere in between checkpoints and savepoints. They are triggered by Flink, 
> but can survive job termination and can then be used by the user to restart 
> the job, similar to savepoints. They use the checkpointing code path, so 
> there are for example externalized incremental checkpoints. However, exactly 
> like a normal checkpoints, they might also lack certain features like 
> rescalability.
> 
> Best,
> Stefan
> 
>> Am 10.08.2017 um 05:32 schrieb Raja.Aravapalli <raja.aravapa...@target.com 
>> <mailto:raja.aravapa...@target.com>>:
>> 
>> Hi,
>>  
>> Can someone please help me understand the difference between Flink's 
>> Checkpoints & Savepoints.
>>  
>> While I read the documentation, couldn't understand the difference! :s
>>  
>>  
>> Thanks a lot. 
>>  
>>  
>>  
>> Regards,
>> Raja.
> 
>

Re: difference between checkpoints & savepoints

Reply via email to