Re: [EXTERNAL] difference between checkpoints & savepoints

2017-11-30 Thread Hao Sun
Hi team, I have one follow up question on this.

There is a discussion on resuming jobs from *a saved external checkpoint*,
I feel there are two aspects of that topic.
*1. I do not have any changes to the job; I just want to resume it after a
failure.*
I can see this happen automatically with ZooKeeper HA enabled; I do not have
to do anything manually.
==
2017-12-01 05:02:26,603 DEBUG
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore -
Recovering job graph f824eabe58d180d79416d9637ac6aa32 from
fraud_prevention_service/flink/jobgraphs/f824eabe58d180d79416d9637ac6aa32.
==
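(For reference, this is with a ZooKeeper HA setup roughly like the following in
flink-conf.yaml; the keys are the Flink 1.3 high-availability options, and the
quorum address and directories below are only illustrative, not our real values.)

high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
high-availability.zookeeper.path.root: /fraud_prevention_service
high-availability.storageDir: hdfs:///flink/ha/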

*2. I want to submit a new job and resume from the previous state for whatever
reason, e.g. the JobGraph changed, I need to change the parallelism, etc.*
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html#faq
I am wondering, for Flink 1.3.2, 1.4 and 1.5, is an externalized checkpoint
identical to a savepoint? Does everything in the FAQ section also apply to
externalized checkpoints? *How about allowNonRestoredState, is there something
like that for externalized checkpoints?*

I am running Flink 1.3.2 on K8s, so I am wondering what the best practice is
for deploying new code releases. And FLIP-6 is awesome; I can't wait to use it.
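Concretely, for case 2 what I am hoping works is something along these lines
from the CLI (paths, job id and jar name below are just placeholders):

# take a savepoint and stop the currently running job
bin/flink cancel -s hdfs:///flink/savepoints <job-id>
# submit the new build, restoring from that state, dropping state that no
# longer matches the new JobGraph and changing parallelism
bin/flink run -s hdfs:///flink/savepoints/savepoint-xxxx -n -p 8 my-new-build.jar

If -n / --allowNonRestoredState also applies when restoring from an externalized
checkpoint rather than a savepoint, that would answer my question above.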

Thanks as always.


On Wed, Aug 16, 2017 at 5:23 PM Raja.Aravapalli wrote:

>
> Thanks very much for the detailed explanation Stefan.
>
> Regards,
>
> Raja.
>
> *From: *Stefan Richter 
> *Date: *Monday, August 14, 2017 at 7:47 AM
> *To: *Raja Aravapalli 
> *Cc: *"user@flink.apache.org" 
> *Subject: *Re: [EXTERNAL] difference between checkpoints & savepoints
>
> Just noticed that I forgot to also include a reference to the documentation
> about externalized checkpoints:
> https://ci.apache.org/projects/flink/flink-docs-master/ops/state/checkpoints.html
>
> On 14.08.2017, at 14:17, Stefan Richter wrote:
>
> Hi,
>
> Also, along the same lines, can someone detail the difference between State
> Backend & External checkpoint?
>
> Those are two very different things. If we talk about state backends in
> Flink, we mean the entity that is responsible for storing and managing the
> state inside an operator. This could for example be something like the
> FsStateBackend that is based on hash maps and keeps state on the heap, or
> the RocksDBStateBackend which is using RocksDB as a store internally and
> operates on native memory and disk.
>
> An externalized checkpoint, like a normal checkpoint, is the collection of
> all state in a job persisted to stable storage for recovery. A little more
> concretely, this typically means writing out the contents of the state
> backends to a safe place so that we can restore them from there.
>
> Also, through which programmatic API methods can we configure those?
>
> This explains how to set the backend programmatically:
>
> https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html
>
> To activate externalized checkpoints, you activate normal checkpoints,
> plus the following line:
>
> env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
>
> where env is your StreamExecutionEnvironment.
>
> If you need an example, please take a look at the
> org.apache.flink.test.checkpointing.ExternalizedCheckpointITCase. This
> class configures everything you asked about programmatically.
>
> Best,
>
> Stefan
>


Re: [EXTERNAL] difference between checkpoints & savepoints

2017-08-16 Thread Raja.Aravapalli

Thanks very much for the detailed explanation Stefan.


Regards,
Raja.

From: Stefan Richter 
Date: Monday, August 14, 2017 at 7:47 AM
To: Raja Aravapalli 
Cc: "user@flink.apache.org" 
Subject: Re: [EXTERNAL] difference between checkpoints & savepoints

Just noticed that I forgot to also include a reference to the documentation
about externalized checkpoints:
https://ci.apache.org/projects/flink/flink-docs-master/ops/state/checkpoints.html

On 14.08.2017, at 14:17, Stefan Richter <s.rich...@data-artisans.com> wrote:


Hi,



Also, along the same lines, can someone detail the difference between State Backend
& External checkpoint?


Those are two very different things. If we talk about state backends in Flink, 
we mean the entity that is responsible for storing and managing the state 
inside an operator. This could for example be something like the FsStateBackend 
that is based on hash maps and keeps state on the heap, or the 
RocksDBStateBackend which is using RocksDB as a store internally and operates 
on native memory and disk.

An externalized checkpoint, like a normal checkpoint, is the collection of all 
state in a job persisted to stable storage for recovery. A little more
concretely, this typically means writing out the contents of the state backends
to a safe place so that we can restore them from there.


Also, through which programmatic API methods can we configure those?

This explains how to set the backend programmatically:

https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html

To activate externalized checkpoints, you activate normal checkpoints, plus the 
following line:


env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

where env is your StreamExecutionEnvironment.

If you need an example, please take a look at the 
org.apache.flink.test.checkpointing.ExternalizedCheckpointITCase. This class 
configures everything you asked about programmatically.

Best,
Stefan




Re: [EXTERNAL] difference between checkpoints & savepoints

2017-08-14 Thread Stefan Richter
Just noticed that I forgot to also include a reference to the documentation
about externalized checkpoints:
https://ci.apache.org/projects/flink/flink-docs-master/ops/state/checkpoints.html
 


> On 14.08.2017, at 14:17, Stefan Richter wrote:
> 
> 
> Hi,
> 
>> 
>> Also, along the same lines, can someone detail the difference between State
>> Backend & External checkpoint?
>>  
> 
> Those are two very different things. If we talk about state backends in 
> Flink, we mean the entity that is responsible for storing and managing the 
> state inside an operator. This could for example be something like the 
> FsStateBackend that is based on hash maps and keeps state on the heap, or the 
> RocksDBStateBackend which is using RocksDB as a store internally and operates 
> on native memory and disk.
> 
> An externalized checkpoint, like a normal checkpoint, is the collection of 
> all state in a job persisted to stable storage for recovery. A little more
> concretely, this typically means writing out the contents of the state backends
> to a safe place so that we can restore them from there.
> 
>> Also, through which programmatic API methods can we configure those?
> 
> This explains how to set the backend programmatically:
> 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html
>  
> 
> 
> To activate externalized checkpoints, you activate normal checkpoints, plus 
> the following line:
> 
> env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
> 
> where env is your StreamExecutionEnvironment.
> 
> If you need an example, please take a look at the 
> org.apache.flink.test.checkpointing.ExternalizedCheckpointITCase. This class 
> configures everything you asked about programmatically.
> 
> Best,
> Stefan
> 



Re: [EXTERNAL] difference between checkpoints & savepoints

2017-08-14 Thread Stefan Richter

Hi,

> 
> Also, along the same lines, can someone detail the difference between State
> Backend & External checkpoint?
>  

Those are two very different things. If we talk about state backends in Flink, 
we mean the entity that is responsible for storing and managing the state 
inside an operator. This could for example be something like the FsStateBackend 
that is based on hash maps and keeps state on the heap, or the 
RocksDBStateBackend which is using RocksDB as a store internally and operates 
on native memory and disk.

An externalized checkpoint, like a normal checkpoint, is the collection of all 
state in a job persisted to stable storage for recovery. A little more
concretely, this typically means writing out the contents of the state backends
to a safe place so that we can restore them from there.

> Also, through which programmatic API methods can we configure those?

This explains how to set the backend programmatically:

https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html
 


To activate externalized checkpoints, you activate normal checkpoints, plus the 
following line:

env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

where env is your StreamExecutionEnvironment.
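Putting the pieces together, a minimal sketch could look like the following
(the backend choice, checkpoint interval and directory are just placeholders;
note that the target directory for the retained checkpoint metadata is
additionally configured via state.checkpoints.dir in flink-conf.yaml, if I
remember correctly):

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExternalizedCheckpointSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 1) Choose a state backend, e.g. the FsStateBackend, which keeps working
        //    state on the heap and writes checkpoint data to the given directory.
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));

        // 2) Enable normal, periodic checkpointing (here: every 60 seconds).
        env.enableCheckpointing(60000);

        // 3) Retain the checkpoint when the job is cancelled, so it can later be
        //    used to restart the job (this is what makes it "externalized").
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // ... add sources, transformations and sinks here, then:
        env.execute("externalized-checkpoint-sketch");
    }
}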

If you need an example, please take a look at the 
org.apache.flink.test.checkpointing.ExternalizedCheckpointITCase. This class 
configures everything you asked about programmatically.

Best,
Stefan



Re: [EXTERNAL] Re: difference between checkpoints & savepoints

2017-08-11 Thread Raja.Aravapalli
Thanks for the discussion. That answered many questions I have.

Also, along the same lines, can someone detail the difference between State Backend
& External checkpoint?

Also, through which programmatic API methods can we configure those?



Regards,
Raja.

From: Stefan Richter 
Date: Thursday, August 10, 2017 at 11:38 AM
To: Henri Heiskanen 
Cc: "user@flink.apache.org" 
Subject: [EXTERNAL] Re: difference between checkpoints & savepoints

Most of the things you are asking for are already there: you can configure 
checkpoint interval + externalized cp, the backend, and the location for 
savepoints and externalized checkpoints. You can restart from savepoints and 
externalized checkpoints from the CLI. One point I am not entirely sure about 
is automatic CP or SP when a job is shut down. IIRC, this is either already 
available, or in the making.

Resolving the last external checkpoint is as easy as listing the configured 
directory, especially if you only retain the last one. Otherwise the timestamp 
gives the required information. It is true that there could also be a CLI 
option that automatically does the work of picking the latest.

And there is a command line parameter switch to supply savepoints and 
externalized checkpoints for restarts. I think that makes more sense than a 
general configuration of automatic restart behaviour because the user might 
also intend to start a new, clean run for the job.


On 10.08.2017, at 15:45, Henri Heiskanen <henri.heiska...@gmail.com> wrote:

Hi,

But I still need to resolve the latest checkpoint and pass that as an argument. 
My question still is: why can all this not be handled by Flink core? Why 
not just have parameters for enabling savepoints, the location of savepoints and the 
state backend, and the system would then automatically take checkpoints / savepoints 
on exit and also start from the first available checkpoint?

Br,
Henkka

On Thu, Aug 10, 2017 at 3:15 PM, Stefan Richter <s.rich...@data-artisans.com> wrote:
Hi,

but I think this is exactly the case for externalized checkpoints. Periodic 
savepoints are problematic because their lifecycle is meant to be under the 
control of the user, and Flink cannot make any assumptions about when they can be 
dropped. So in the conservative scenario, savepoints would quickly pile up. 
With externalized checkpoints, you can control the number of retained 
checkpoints. If you set this number to one, that should be exactly what you 
want.

As for rescalability, this limitation is more of a future than a current 
problem. Right now, you should be able to rescale from all externalized 
checkpoints. But this might not hold in the future, because you can optimize 
checkpoints in some cases if this feature is dropped.

Right now, externalized checkpoints should offer all that you want.

Best,
Stefan

On 10.08.2017, at 11:46, Henri Heiskanen <henri.heiska...@gmail.com> wrote:

Hi,

It would be super helpful if Flink provided out-of-the-box functionality 
for writing automatic savepoints and then starting from the latest savepoint. 
If external checkpoints supported rescaling, then the first requirement would be 
met, but one would still need to e.g. find the latest checkpoint in some folder 
and pass it as an argument. We are currently writing our own functionality for 
this. Why not just tell Flink that this job uses persistent state, with the default 
behaviour then being to start from the latest snapshot?

Br,
Henri H

On Thu, Aug 10, 2017 at 11:20 AM, Stefan Richter <s.rich...@data-artisans.com> wrote:
Hi,

I would explain the main conceptual difference as follows:

- Checkpoints are periodically triggered by the system for fault tolerance. 
They are used to automatically recover from failures. Because of their 
automatic and periodical nature, they should be lightweight to produce and will 
restore the same job without any changes to the jobgraph, parallelism, etc. 
Checkpoints are usually dropped after the job was terminated by the user.

- Savepoints are triggered by the user to store the state of the job for a 
manual resume and backup. Savepoints are usually not periodical but typically 
taken before some user actions to the job or the system. For example, this 
could be an update of your Flink version, changing your job graph, changing 
parallelism, forking a second job like for a red/blue deployment, and so on.  
Of course, savepoints must survive job termination. Conceptually, savepoints 
can be a bit more expensive to produce, because they should have a format that 
makes all those „changes to the job“ features possible.

Besides this conceptual difference, the current implementations are basically 
using the same code and produce the same „format“. However, there is currently 
one exception to this, but I would expect more differences in the future. 
This exception is incremental checkpoints with the RocksDB state backend. They 
are using some RocksDB inter

Re: difference between checkpoints & savepoints

2017-08-10 Thread Stefan Richter
Most of the things you are asking for are already there: you can configure 
checkpoint interval + externalized cp, the backend, and the location for 
savepoints and externalized checkpoints. You can restart from savepoints and 
externalized checkpoints from the CLI. One point I am not entirely sure about 
is automatic CP or SP when a job is shut down. IIRC, this is either already 
available, or in the making.

Resolving the last external checkpoint is as easy as listing the configured 
directory, especially if you only retain the last one. Otherwise the timestamp 
gives the required information. It is true that there could also be a CLI 
option that automatically does the work of picking the latest.
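
As a rough sketch of what I mean by listing the directory (assuming the
externalized checkpoint metadata files end up directly in the directory
configured via state.checkpoints.dir; the paths and jar name are placeholders):

CHECKPOINT_DIR=/data/flink/externalized-checkpoints
# pick the most recently written checkpoint metadata file
LATEST=$(ls -t "$CHECKPOINT_DIR" | head -n 1)
# and restart the job from it
bin/flink run -s "$CHECKPOINT_DIR/$LATEST" my-job.jar

How well a plain ls -t works of course depends on how the directory is laid out
in your setup, so treat this only as a starting point.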

And there is a command line parameter switch to supply savepoints and 
externalized checkpoints for restarts. I think that makes more sense than a 
general configuration of automatic restart behaviour because the user might 
also intend to start a new, clean run for the job.

 
> On 10.08.2017, at 15:45, Henri Heiskanen wrote:
> 
> Hi,
> 
> But I still need to resolve the latest checkpoint and pass that as an 
> argument. My question still is: why can all this not be handled by Flink 
> core? Why not just have parameters for enabling savepoints, the location of savepoints 
> and the state backend, and the system would then automatically take checkpoints / 
> savepoints on exit and also start from the first available checkpoint?
> 
> Br,
> Henkka
> 
> On Thu, Aug 10, 2017 at 3:15 PM, Stefan Richter wrote:
> Hi,
> 
> but I think this is exactly the case for externalized checkpoints. Periodic 
> savepoints are problematic because their lifecycle is meant to be under the 
> control of the user, and Flink cannot make any assumptions about when they can be 
> dropped. So in the conservative scenario, savepoints would quickly pile up. 
> With externalized checkpoints, you can control the number of retained 
> checkpoints. If you set this number to one, that should be exactly what you 
> want.
> 
> As for rescalability, this limitation is more of a future than a current 
> problem. Right now, you should be able to rescale from all externalized 
> checkpoints. But this might not hold in the future, because you can optimize 
> checkpoints in some cases if this feature is dropped.
> 
> Right now, externalized checkpoints should offer all that you want.
> 
> Best,
> Stefan
> 
>> On 10.08.2017, at 11:46, Henri Heiskanen wrote:
>> 
>> Hi,
>> 
>> It would be super helpful if Flink provided out-of-the-box 
>> functionality for writing automatic savepoints and then starting from the 
>> latest savepoint. If external checkpoints supported rescaling, then the first 
>> requirement would be met, but one would still need to e.g. find the latest 
>> checkpoint in some folder and pass it as an argument. We are currently 
>> writing our own functionality for this. Why not just tell Flink that this 
>> job uses persistent state, with the default behaviour then being to start from 
>> the latest snapshot?
>> 
>> Br,
>> Henri H
>> 
>> On Thu, Aug 10, 2017 at 11:20 AM, Stefan Richter 
>> <s.rich...@data-artisans.com> wrote:
>> Hi,
>> 
>> I would explain the main conceptual difference as follows:
>> 
>> - Checkpoints are periodically triggered by the system for fault tolerance. 
>> They are used to automatically recover from failures. Because of their 
>> automatic and periodical nature, they should be lightweight to produce and 
>> will restore the same job without any changes to the jobgraph, parallelism, 
>> etc. Checkpoints are usually dropped after the job was terminated by the 
>> user.
>> 
>> - Savepoints are triggered by the user to store the state of the job for a 
>> manual resume and backup. Savepoints are usually not periodical but 
>> typically taken before some user actions to the job or the system. For 
>> example, this could be an update of your Flink version, changing your job 
>> graph, changing parallelism, forking a second job like for a red/blue 
>> deployment, and so on.  Of course, savepoints must survive job termination. 
>> Conceptually, savepoints can be a bit more expensive to produce, because 
>> they should have a format that makes all those „changes to the job“ features 
>> possible.
>> 
>> Besides this conceptual difference, the current implementations are 
>> basically using the same code and produce the same „format“. However, there 
>> is currently one exception to this, but I would expect more differences in 
>> the future. This exception is incremental checkpoints with the RocksDB 
>> state backend. They are using some RocksDB internal format instead of 
>> Flink’s „savepoint format“. This makes them the first instance of a more 
>> lightweight checkpointing mechanism, compared to savepoints, at the cost of 
>> dropping support for certain features such as changing the parallelism.
>> 
>> Furthermore, there also exist „externalized checkpoints“,

Re: difference between checkpoints & savepoints

2017-08-10 Thread Henri Heiskanen
Hi,

But I still need to resolve the latest checkpoint and pass that as an
argument. My question still is: why can all this not be handled by
Flink core? Why not just have parameters for enabling savepoints, the location of
savepoints and the state backend, and the system would then automatically take
checkpoints / savepoints on exit and also start from the first available
checkpoint?

Br,
Henkka

On Thu, Aug 10, 2017 at 3:15 PM, Stefan Richter  wrote:

> Hi,
>
> but I think this is exactly the case for externalized checkpoints.
> Periodic savepoints are problematic because their lifecycle is meant to be
> under the control of the user, and Flink cannot make any assumptions about when
> they can be dropped. So in the conservative scenario, savepoints would
> quickly pile up. With externalized checkpoints, you can control the number
> of retained checkpoints. If you set this number to one, that should be
> exactly what you want.
>
> As for rescalability, this limitation is more of a future than a current
> problem. Right now, you should be able to rescale from all externalized
> checkpoints. But this might not hold in the future, because you can
> optimize checkpoints in some cases if this feature is dropped.
>
> Right now, externalized checkpoints should offer all that you want.
>
> Best,
> Stefan
>
> On 10.08.2017, at 11:46, Henri Heiskanen wrote:
>
> Hi,
>
> It would be super helpful if Flink provided out-of-the-box
> functionality for writing automatic savepoints and then starting from the
> latest savepoint. If external checkpoints supported rescaling, then the first
> requirement would be met, but one would still need to e.g. find the latest
> checkpoint in some folder and pass it as an argument. We are currently
> writing our own functionality for this. Why not just tell Flink that this
> job uses persistent state, with the default behaviour then being to start from
> the latest snapshot?
>
> Br,
> Henri H
>
> On Thu, Aug 10, 2017 at 11:20 AM, Stefan Richter <
> s.rich...@data-artisans.com> wrote:
>
>> Hi,
>>
>> I would explain the main conceptual difference as follows:
>>
>> - Checkpoints are periodically triggered by the system for fault
>> tolerance. They are used to automatically recover from failures. Because of
>> their automatic and periodical nature, they should be lightweight to
>> produce and will restore the same job without any changes to the jobgraph,
>> parallelism, etc. Checkpoints are usually dropped after the job was
>> terminated by the user.
>>
>> - Savepoints are triggered by the user to store the state of the job for
>> a manual resume and backup. Savepoints are usually not periodical but
>> typically taken before some user actions to the job or the system. For
>> example, this could be an update of your Flink version, changing your job
>> graph, changing parallelism, forking a second job like for a red/blue
>> deployment, and so on.  Of course, savepoints must survive job termination.
>> Conceptually, savepoints can be a bit more expensive to produce, because
>> they should have a format that makes all those „changes to the job“
>> features possible.
>>
>> Besides this conceptual difference, the current implementations are
>> basically using the same code and produce the same „format“. However, there
>> is currently one exception to this, but I would expect more differences
>> in the future. This exception is incremental checkpoints with the RocksDB
>> state backend. They are using some RocksDB internal format instead of
>> Flink’s „savepoint format“. This makes them the first instance of a more
>> lightweight checkpointing mechanism, compared to savepoints, at the cost of
>> dropping support for certain features such as changing the parallelism.
>>
>> Furthermore, there also exist „externalized checkpoints“, which are
>> somewhere in between checkpoints and savepoints. They are triggered by
>> Flink, but can survive job termination and can then be used by the user to
>> restart the job, similar to savepoints. They use the checkpointing code
>> path, so there are for example externalized incremental checkpoints.
>> However, exactly like normal checkpoints, they might also lack certain
>> features like rescalability.
>>
>> Best,
>> Stefan
>>
>> On 10.08.2017, at 05:32, Raja.Aravapalli
>> <raja.aravapa...@target.com> wrote:
>>
>> Hi,
>>
>> Can someone please help me understand the difference between Flink's
>> Checkpoints & Savepoints.
>>
>> While I read the documentation, I couldn't understand the difference! :s
>>
>>
>> Thanks a lot.
>>
>>
>>
>> Regards,
>> Raja.
>>
>>
>>
>
>


Re: difference between checkpoints & savepoints

2017-08-10 Thread Stefan Richter
Hi,

but I think this is exactly the case for externalized checkpoints. Periodic 
savepoints are problematic because their lifecycle is meant to be under the 
control of the user, and Flink cannot make any assumptions about when they can be 
dropped. So in the conservative scenario, savepoints would quickly pile up. 
With externalized checkpoints, you can control the number of retained 
checkpoints. If you set this number to one, that should be exactly what you 
want.
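
For reference, the retained externalized checkpoints and their target directory
are configured in flink-conf.yaml; as far as I remember, the relevant keys look
like this (the path is just a placeholder):

state.checkpoints.dir: hdfs:///flink/externalized-checkpoints
state.checkpoints.num-retained: 1

With num-retained set to 1, only the most recent externalized checkpoint is kept.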

As for rescalability, this limitation is more of a future than a current 
problem. Right now, you should be able to rescale from all externalized 
checkpoints. But this might not hold in the future, because you can optimize 
checkpoints in some cases if this feature is dropped.

Right now, externalized checkpoints should offer all that you want.

Best,
Stefan

> On 10.08.2017, at 11:46, Henri Heiskanen wrote:
> 
> Hi,
> 
> It would be super helpful if Flink provided out-of-the-box functionality 
> for writing automatic savepoints and then starting from the latest savepoint. 
> If external checkpoints supported rescaling, then the first requirement would be 
> met, but one would still need to e.g. find the latest checkpoint in some folder 
> and pass it as an argument. We are currently writing our own functionality for 
> this. Why not just tell Flink that this job uses persistent state, with the 
> default behaviour then being to start from the latest snapshot?
> 
> Br,
> Henri H
> 
> On Thu, Aug 10, 2017 at 11:20 AM, Stefan Richter wrote:
> Hi,
> 
> I would explain the main conceptual difference as follows:
> 
> - Checkpoints are periodically triggered by the system for fault tolerance. 
> They are used to automatically recover from failures. Because of their 
> automatic and periodical nature, they should be lightweight to produce and 
> will restore the same job without any changes to the jobgraph, parallelism, 
> etc. Checkpoints are usually dropped after the job was terminated by the user.
> 
> - Savepoints are triggered by the user to store the state of the job for a 
> manual resume and backup. Savepoints are usually not periodical but typically 
> taken before some user actions to the job or the system. For example, this 
> could be an update of your Flink version, changing your job graph, changing 
> parallelism, forking a second job like for a red/blue deployment, and so on.  
> Of course, savepoints must survive job termination. Conceptually, savepoints 
> can be a bit more expensive to produce, because they should have a format 
> that makes all those „changes to the job“ features possible.
> 
> Besides this conceptual difference, the current implementations are basically 
> using the same code and produce the same „format“. However, there is 
> currently one exception to this, but I would expect more differences in the 
> future. This exception is incremental checkpoints with the RocksDB state 
> backend. They are using some RocksDB internal format instead of Flink’s 
> „savepoint format“. This makes them the first instance of a more lightweight 
> checkpointing mechanism, compared to savepoints, at the cost of dropping 
> support for certain features such as changing the parallelism.
> 
> Furthermore, there also exist „externalized checkpoints“, which are 
> somewhere in between checkpoints and savepoints. They are triggered by Flink, 
> but can survive job termination and can then be used by the user to restart 
> the job, similar to savepoints. They use the checkpointing code path, so 
> there are for example externalized incremental checkpoints. However, exactly 
> like normal checkpoints, they might also lack certain features like 
> rescalability.
> 
> Best,
> Stefan
> 
>> On 10.08.2017, at 05:32, Raja.Aravapalli wrote:
>> 
>> Hi,
>>  
>> Can someone please help me understand the difference between Flink's 
>> Checkpoints & Savepoints.
>>  
>> While I read the documentation, I couldn't understand the difference! :s
>>  
>>  
>> Thanks a lot. 
>>  
>>  
>>  
>> Regards,
>> Raja.
> 
> 



Re: difference between checkpoints & savepoints

2017-08-10 Thread Henri Heiskanen
Hi,

It would be super helpful if Flink provided out-of-the-box
functionality for writing automatic savepoints and then starting from the
latest savepoint. If external checkpoints supported rescaling, then the first
requirement would be met, but one would still need to e.g. find the latest
checkpoint in some folder and pass it as an argument. We are currently
writing our own functionality for this. Why not just tell Flink that this
job uses persistent state, with the default behaviour then being to start from
the latest snapshot?

Br,
Henri H

On Thu, Aug 10, 2017 at 11:20 AM, Stefan Richter <
s.rich...@data-artisans.com> wrote:

> Hi,
>
> I would explain the main conceptual difference as follows:
>
> - Checkpoints are periodically triggered by the system for fault
> tolerance. They are used to automatically recover from failures. Because of
> their automatic and periodical nature, they should be lightweight to
> produce and will restore the same job without any changes to the jobgraph,
> parallelism, etc. Checkpoints are usually dropped after the job was
> terminated by the user.
>
> - Savepoints are triggered by the user to store the state of the job for a
> manual resume and backup. Savepoints are usually not periodical but
> typically taken before some user actions to the job or the system. For
> example, this could be an update of your Flink version, changing your job
> graph, changing parallelism, forking a second job like for a red/blue
> deployment, and so on.  Of course, savepoints must survive job termination.
> Conceptually, savepoints can be a bit more expensive to produce, because
> they should have a format that makes all those „changes to the job“
> features possible.
>
> Besides this conceptual difference, the current implementations are
> basically using the same code and produce the same „format“. However, there
> is currently one exception to this, but I would expect more differences
> in the future. This exception is incremental checkpoints with the RocksDB
> state backend. They are using some RocksDB internal format instead of
> Flink’s „savepoint format“. This makes them the first instance of a more
> lightweight checkpointing mechanism, compared to savepoints, at the cost of
> dropping support for certain features such as changing the parallelism.
>
> Furthermore, there also exist „externalized checkpoints“, which are
> somewhere in between checkpoints and savepoints. They are triggered by
> Flink, but can survive job termination and can then be used by the user to
> restart the job, similar to savepoints. They use the checkpointing code
> path, so there are for example externalized incremental checkpoints.
> However, exactly like normal checkpoints, they might also lack certain
> features like rescalability.
>
> Best,
> Stefan
>
> On 10.08.2017, at 05:32, Raja.Aravapalli wrote:
>
> Hi,
>
> Can someone please help me understand the difference between Flink's
> Checkpoints & Savepoints.
>
> While I read the documentation, I couldn't understand the difference! :s
>
>
> Thanks a lot.
>
>
>
> Regards,
> Raja.
>
>
>


Re: difference between checkpoints & savepoints

2017-08-10 Thread Stefan Richter
Hi,

I would explain the main conceptual difference as follows:

- Checkpoints are periodically triggered by the system for fault tolerance. 
They are used to automatically recover from failures. Because of their 
automatic and periodical nature, they should be lightweight to produce and will 
restore the same job without any changes to the jobgraph, parallelism, etc. 
Checkpoints are usually dropped after the job was terminated by the user.

- Savepoints are triggered by the user to store the state of the job for a 
manual resume and backup. Savepoints are usually not periodical but typically 
taken before some user actions to the job or the system. For example, this 
could be an update of your Flink version, changing your job graph, changing 
parallelism, forking a second job like for a red/blue deployment, and so on.  
Of course, savepoints must survive job termination. Conceptually, savepoints 
can be a bit more expensive to produce, because they should have a format that 
makes all those „changes to the job“ features possible.

Besides this conceptual difference, the current implementations are basically 
using the same code and produce the same „format“. However, there is currently 
one exception to this, but I would expect more differences in the future. 
This exception is incremental checkpoints with the RocksDB state backend. They 
are using some RocksDB internal format instead of Flink’s „savepoint format“. 
This makes them the first instance of a more lightweight checkpointing 
mechanism, compared to savepoints, at the cost of dropping support for certain 
features such as changing the parallelism.

Furthermore, there also exist „externalized checkpoints“, which are somewhere 
in between checkpoints and savepoints. They are triggered by Flink, but can 
survive job termination and can then be used by the user to restart the job, 
similar to savepoints. They use the checkpointing code path, so there are for 
example externalized incremental checkpoints. However, exactly like normal 
checkpoints, they might also lack certain features like rescalability.
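
To make the operational difference a bit more concrete, the usual savepoint
workflow from the CLI looks roughly like this (job id, directories and jar
name are placeholders):

# trigger a savepoint for a running job
bin/flink savepoint <job-id> hdfs:///flink/savepoints
# or take a savepoint and cancel the job in one step
bin/flink cancel -s hdfs:///flink/savepoints <job-id>
# later, resume a (possibly modified) job from it
bin/flink run -s hdfs:///flink/savepoints/savepoint-xxxx my-job.jar

Checkpoints, in contrast, are triggered automatically once checkpointing is
enabled and are normally only used by Flink itself for recovery.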

Best,
Stefan

> On 10.08.2017, at 05:32, Raja.Aravapalli wrote:
> 
> Hi,
>  
> Can someone please help me understand the difference between Flink's 
> Checkpoints & Savepoints.
>  
> While I read the documentation, I couldn't understand the difference! :s
>  
>  
> Thanks a lot. 
>  
>  
>  
> Regards,
> Raja.



difference between checkpoints & savepoints

2017-08-09 Thread Raja.Aravapalli
Hi,

Can someone please help me understand the difference between Flink's 
Checkpoints & Savepoints.

While I read the documentation, I couldn't understand the difference! :s


Thanks a lot.



Regards,
Raja.