Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-21 Thread David Morávek
Tue, Jan 18, 2022 at 3:51 PM Piotr Nowojski 
>> > wrote:
>> >
>> > > Hi All,
>> > >
>> > > Yu, sorry I was somehow confused about the configuration. I've changed
>> > > the FLIP as you suggested.
>> > >
>> > > Yu/Yun Tang:
>> > >
>> > > Regarding the RocksDB incompatibility, I think we can claim that Flink
>> > > version upgrades are/will be supported. If ever we need to break the
>> > > backward compatibility via bumping RocksDB version in such a way that
>> > > RocksDB won't be able to provide that compatibility, we will need to
>> make
>> > > this a prominent notice in the release notes.
>> > >
>> > > > I have used the State Processor API with aligned, full checkpoints.
>> > There
>> > > it has worked just fine.
>> > >
>> > > Thanks for this information.
>> > >
>> > > > 0) What exactly does the "State Processor API" row mean? Is it: Can
>> it
>> > be
>> > > > read by the State Processor API? Can it be written by the State
>> > Processor
>> > > > API? Both? Something else?
>> > >
>> > > Good question. I'm not sure how State Processor API is working. Can
>> > someone
>> > > help answer what we guarantee/support right now and what we can
>> > > reasonably support?
>> > >
>> > > > 1) and 2)
>> > >
>> > > I guess you are simply in favour of the 2nd proposal? So
>> > >
>> > > * rescaling
>> > > * Job upgrade w/o changing graph shape and record types
>> > > * Flink bug/patch (1.14.x → 1.14.y) version upgrade
>> > >
>> > > + Flink minor (1.x → 1.y) version upgrade
>> > >
>> > > which I think is important for the native savepoint to be truly
>> > savepoints.
>> > >
>> > > > 3) Should "Job upgrade w/o changing graph shape and record types" be
>> > > split? I guess "record types" is only relevant for unaligned
>> checkpoints.
>> > >
>> > > Shape of the job graph is also an issue with unaligned checkpoints.
>> > > Changing record types/serialisation causes obvious problems with the
>> > > in-flight records, but if you change job graph via for example
>> changing
>> > > type of the network connection (like random -> broadcast, keyed -> non
>> > > keyed), or remove some operators, we also have problems with the
>> > in-flight
>> > > records in the affected connections.
>> > >
>> > > > 4)
>> > >
>> > > I think configuration change should be always supported. If you think
>> > > that's important, I can add this to the documentation/FLIP proposal
>> as a
>> > > separate row.
>> > >
>> > > > 5) Do the guarantees that a Savepoint/Checkpoint provide change when
>> > > > generalized incremental checkpoints [1] are enabled? My
>> understanding
>> > is:
>> > > > No, the same guarantees apply.
>> > >
>> > > This will be more tricky and it will highly depend on the FLIP-158
>> > > implementation.
>> > >
>> > > Yun:
>> > >
>> > > >   1.  From my understanding, native savepoint appears much closer to
>> > > current alignment checkpoint. What's their difference?
>> > >
>> > > Technically there would be no difference, but we might decide to limit
>> > what
>> > > we officially support, to allow us easier changes in the future. Just
>> as
>> > > for the most part so far between savepoint and checkpoints there was
>> very
>> > > little difference. For us, as developers, the fewer things we
>> officially
>> > > support and claim are stable, the better.
>> > >
>> > > > 2.  If self-contained and relocatable are the most important
>> > difference,
>> > > why not include them in the proposal table?
>> > >
>> > > Good point. I will add this.
>> > >
>> > > >  What does "Job full upgrade" mean?
>> > >
>> > > I have clarified it to:
>> > > > Arbitrary job upgrade (changed graph shape/record types)
>> > >
>> > > It's an arbitrary job change. Anything that doesn't fall into the
>> second
>> > > c

Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-19 Thread David Morávek
> > There
> > > it has worked just fine.
> > >
> > > Thanks for this information.
> > >
> > > > 0) What exactly does the "State Processor API" row mean? Is it: Can
> it
> > be
> > > > read by the State Processor API? Can it be written by the State
> > Processor
> > > > API? Both? Something else?
> > >
> > > Good question. I'm not sure how State Processor API is working. Can
> > someone
> > > help answer what we guarantee/support right now and what we can
> > > reasonably support?
> > >
> > > > 1) and 2)
> > >
> > > I guess you are simply in favour of the 2nd proposal? So
> > >
> > > * rescaling
> > > * Job upgrade w/o changing graph shape and record types
> > > * Flink bug/patch (1.14.x → 1.14.y) version upgrade
> > >
> > > + Flink minor (1.x → 1.y) version upgrade
> > >
> > > which I think is important for the native savepoint to be truly
> > savepoints.
> > >
> > > > 3) Should "Job upgrade w/o changing graph shape and record types" be
> > > split? I guess "record types" is only relevant for unaligned
> checkpoints.
> > >
> > > Shape of the job graph is also an issue with unaligned checkpoints.
> > > Changing record types/serialisation causes obvious problems with the
> > > in-flight records, but if you change job graph via for example changing
> > > type of the network connection (like random -> broadcast, keyed -> non
> > > keyed), or remove some operators, we also have problems with the
> > in-flight
> > > records in the affected connections.
> > >
> > > > 4)
> > >
> > > I think configuration change should be always supported. If you think
> > > that's important, I can add this to the documentation/FLIP proposal as
> a
> > > separate row.
> > >
> > > > 5) Do the guarantees that a Savepoint/Checkpoint provide change when
> > > > generalized incremental checkpoints [1] are enabled? My understanding
> > is:
> > > > No, the same guarantees apply.
> > >
> > > This will be more tricky and it will highly depend on the FLIP-158
> > > implementation.
> > >
> > > Yun:
> > >
> > > >   1.  From my understanding, native savepoint appears much closer to
> > > current alignment checkpoint. What's their difference?
> > >
> > > Technically there would be no difference, but we might decide to limit
> > what
> > > we officially support, to allow us easier changes in the future. Just
> as
> > > for the most part so far between savepoint and checkpoints there was
> very
> > > little difference. For us, as developers, the fewer things we
> officially
> > > support and claim are stable, the better.
> > >
> > > > 2.  If self-contained and relocatable are the most important
> > difference,
> > > why not include them in the proposal table?
> > >
> > > Good point. I will add this.
> > >
> > > > >  What does "Job full upgrade" mean?
> > >
> > > I have clarified it to:
> > > > Arbitrary job upgrade (changed graph shape/record types)
> > >
> > > It's an arbitrary job change. Anything that doesn't fall into the
> second
> > > category "Job upgrade w/o changing graph shape and record types"
> > >
> > > Best,
> > > Piotrek
> > >
> > > > On Mon, 17 Jan 2022 at 11:25, Yun Tang wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > > Thanks to Piotr for driving this topic.
> > > >
> > > > I have several questions on this FLIP.
> > > >
> > > >   1.  From my understanding, native savepoint appears much closer to
> > > > current alignment checkpoint. What's their difference?
> > > >   2.  If self-contained and relocatable are the most important
> > > difference,
> > > > why not include them in the proposal table?
> > > >
> > > > >   3.  What does "Job full upgrade" mean?
> > > >
> > > > For the question of RocksDB upgrading, this depends on the backwards
> > > > compatibility [1], and it proves to be very well as the documentation
> > > said.
> > > >
> > > >
> > > > [1]
> > > >
> > >
> >
> https://github.com/facebook/rocksdb/wiki/Roc

Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-19 Thread Piotr Nowojski
 record types/serialisation causes obvious problems with the
> > in-flight records, but if you change job graph via for example changing
> > type of the network connection (like random -> broadcast, keyed -> non
> > keyed), or remove some operators, we also have problems with the
> in-flight
> > records in the affected connections.
> >
> > > 4)
> >
> > I think configuration change should be always supported. If you think
> > that's important, I can add this to the documentation/FLIP proposal as a
> > separate row.
> >
> > > 5) Do the guarantees that a Savepoint/Checkpoint provide change when
> > > generalized incremental checkpoints [1] are enabled? My understanding
> is:
> > > No, the same guarantees apply.
> >
> > This will be more tricky and it will highly depend on the FLIP-158
> > implementation.
> >
> > Yun:
> >
> > >   1.  From my understanding, native savepoint appears much closer to
> > current alignment checkpoint. What's their difference?
> >
> > Technically there would be no difference, but we might decide to limit
> what
> > we officially support, to allow us easier changes in the future. Just as
> > for the most part so far between savepoint and checkpoints there was very
> > little difference. For us, as developers, the fewer things we officially
> > support and claim are stable, the better.
> >
> > > 2.  If self-contained and relocatable are the most important
> difference,
> > why not include them in the proposal table?
> >
> > Good point. I will add this.
> >
> > >  What does "Job full upgrade" mean?
> >
> > I have clarified it to:
> > > Arbitrary job upgrade (changed graph shape/record types)
> >
> > It's an arbitrary job change. Anything that doesn't fall into the second
> > category "Job upgrade w/o changing graph shape and record types"
> >
> > Best,
> > Piotrek
> >
> > On Mon, 17 Jan 2022 at 11:25, Yun Tang wrote:
> >
> > > Hi everyone,
> > >
> > > Thanks to Piotr for driving this topic.
> > >
> > > I have several questions on this FLIP.
> > >
> > >   1.  From my understanding, native savepoint appears much closer to
> > > current alignment checkpoint. What's their difference?
> > >   2.  If self-contained and relocatable are the most important
> > difference,
> > > why not include them in the proposal table?
> > >
> > >   3.  What does "Job full upgrade" mean?
> > >
> > > For the question of RocksDB upgrading, this depends on the backwards
> > > compatibility [1], and it proves to be very well as the documentation
> > said.
> > >
> > >
> > > [1]
> > >
> >
> https://github.com/facebook/rocksdb/wiki/RocksDB-Compatibility-Between-Different-Releases
> > >
> > > Best,
> > > Yun Tang
> > >
> > >
> > >
> > > 
> > > From: Konstantin Knauf 
> > > Sent: Friday, January 14, 2022 20:39
> > > To: dev ; Seth Wiesman ;
> Nico
> > > Kruber ; dander...@apache.org <
> dander...@apache.org>
> > > Subject: Re: [DISCUSS] FLIP-203: Incremental savepoints
> > >
> > > Hi everyone,
> > >
> > > Thank you, Piotr. Please find my thoughts on the topic below:
> > >
> > > 0) What exactly does the "State Processor API" row mean? Is it: Can it
> be
> > > read by the State Processor API? Can it be written by the State
> Processor
> > > API? Both? Something else?
> > >
> > > 1) If we take the assumption from FLIP-193 "that ownership should be
> the
> > > only difference between Checkpoints and Savepoints.", we would need to
> > work
> > > in the direction of "Proposal 2". The distinction would then be the
> > > following:
> > >
> > > * Canonical Savepoint = Guarantees A
> > > * Canonical Checkpoint = Guarantees A (in theory; does not exist)
> > > * Aligned, Native Checkpoint = Guarantees B
> > > * Aligned, Native Savepoint = Guarantees B
> > > * Unaligned, Native Checkpoint = Guarantees C
> > > * Unaligned, Native Savepoint = Guarantees C (if this would exist in
> the
> > > future)
> > >
> > > I think it is important to make this matrix not too complicated like:
> > there
> > > are 8 different sets of guarantees depending on all kinds of more or
> less
> > &

Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-18 Thread David Morávek
ypes)
>
> It's an arbitrary job change. Anything that doesn't fall into the second
> category "Job upgrade w/o changing graph shape and record types"
>
> Best,
> Piotrek
>
> On Mon, 17 Jan 2022 at 11:25, Yun Tang wrote:
>
> > Hi everyone,
> >
> > Thanks to Piotr for driving this topic.
> >
> > I have several questions on this FLIP.
> >
> >   1.  From my understanding, native savepoint appears much closer to
> > current alignment checkpoint. What's their difference?
> >   2.  If self-contained and relocatable are the most important
> difference,
> > why not include them in the proposal table?
> >
> >   3.  What does "Job full upgrade" mean?
> >
> > For the question of RocksDB upgrading, this depends on the backwards
> > compatibility [1], and it proves to be very well as the documentation
> said.
> >
> >
> > [1]
> >
> https://github.com/facebook/rocksdb/wiki/RocksDB-Compatibility-Between-Different-Releases
> >
> > Best,
> > Yun Tang
> >
> >
> >
> > 
> > From: Konstantin Knauf 
> > Sent: Friday, January 14, 2022 20:39
> > To: dev ; Seth Wiesman ; Nico
> > Kruber ; dander...@apache.org 
> > Subject: Re: [DISCUSS] FLIP-203: Incremental savepoints
> >
> > Hi everyone,
> >
> > Thank you, Piotr. Please find my thoughts on the topic below:
> >
> > 0) What exactly does the "State Processor API" row mean? Is it: Can it be
> > read by the State Processor API? Can it be written by the State Processor
> > API? Both? Something else?
> >
> > 1) If we take the assumption from FLIP-193 "that ownership should be the
> > only difference between Checkpoints and Savepoints.", we would need to
> work
> > in the direction of "Proposal 2". The distinction would then be the
> > following:
> >
> > * Canonical Savepoint = Guarantees A
> > * Canonical Checkpoint = Guarantees A (in theory; does not exist)
> > * Aligned, Native Checkpoint = Guarantees B
> > * Aligned, Native Savepoint = Guarantees B
> > * Unaligned, Native Checkpoint = Guarantees C
> > * Unaligned, Native Savepoint = Guarantees C (if this would exist in the
> > future)
> >
> > I think it is important to make this matrix not too complicated like:
> there
> > are 8 different sets of guarantees depending on all kinds of more or less
> > well-known configuration options.
> >
> > 2) With respect to the concrete guarantees, I believe, it's important
> that
> > we can cover all important use cases in "green", so that users can rely
> on
> > official, tested behavior in regular operations. In my experience this
> > includes manual recovery of a Job from a retained checkpoint. I would
> argue
> > that most users operating a long-running, stateful Apache Flink
> application
> > have been in the situation, where a graceful "stop" was not possible
> > anymore, because the Job was unable to take a Savepoint. This could be,
> > because the Job is frequently restarting (e.g. poison pill) or because it
> > fails on taking the Savepoint itself for some reason (e.g. unable to
> commit
> > a transaction to an external system). The solution strategy in this
> > scenario is to cancel the job, make some changes to the Job or
> > configuration that fix the problem and restore from the last successful
> > (retained) checkpoint. I think the following changes would need to be
> > officially supported for Native Checkpoints/Savepoint (Guarantees B,
> > ideally also Guarantees C), in order to fix a Job in most of these cases.
> >
> > * rescaling
> > * Job upgrade w/o changing graph shape and record types
> > * Flink bug/patch (1.14.x → 1.14.y) version upgrade
> >
> > I would be very interested to hear from users as well as people like
> Seth,
> > Nico or David (cc), who work with many users, what  in their experience
> > would be needed here.
> >
> > 3) Should "Job upgrade w/o changing graph shape and record types" be
> split?
> > I guess "record types" is only relevant for unaligned checkpoints.
> >
> > 4) Does it make sense to consider Flink configuration changes besides the
> > statebackend type as another row? Maybe split by "pipeline.*" options,
> > "execution.*" options, and whichever other categories would make sense.
> > Just to give a few examples: it should be *officially* supported to take
> a
> > native retained checkpoint and restart the Job with a

Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-18 Thread Piotr Nowojski
Hi All,

Yu, sorry I was somehow confused about the configuration. I've changed
the FLIP as you suggested.

Yu/Yun Tang:

Regarding the RocksDB incompatibility, I think we can claim that Flink
version upgrades are/will be supported. If we ever need to break
backward compatibility by bumping the RocksDB version in such a way that
RocksDB won't be able to provide that compatibility, we will need to make
this a prominent notice in the release notes.

> I have used the State Processor API with aligned, full checkpoints. There
it has worked just fine.

Thanks for this information.

> 0) What exactly does the "State Processor API" row mean? Is it: Can it be
> read by the State Processor API? Can it be written by the State Processor
> API? Both? Something else?

Good question. I'm not sure how the State Processor API works. Can someone
help answer what we guarantee/support right now and what we can
reasonably support?

> 1) and 2)

I guess you are simply in favour of the 2nd proposal? So

* rescaling
* Job upgrade w/o changing graph shape and record types
* Flink bug/patch (1.14.x → 1.14.y) version upgrade

+ Flink minor (1.x → 1.y) version upgrade

which I think is important for native savepoints to truly be savepoints.
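
To make the distinction between the upgrade rows concrete, here is a minimal sketch that classifies a version change as a patch (1.14.x → 1.14.y) or minor (1.x → 1.y) upgrade. The function names and the `PROPOSED_SUPPORTED` set are hypothetical and only mirror the rows listed above; nothing here is a Flink API.

```python
def classify_upgrade(old: str, new: str) -> str:
    """Classify a version change as 'none', 'patch', 'minor', 'major' or 'downgrade'.

    Assumes plain MAJOR.MINOR.PATCH version strings such as "1.14.3".
    """
    old_parts = tuple(int(p) for p in old.split("."))
    new_parts = tuple(int(p) for p in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected versions of the form MAJOR.MINOR.PATCH")
    if new_parts == old_parts:
        return "none"
    if new_parts < old_parts:
        return "downgrade"
    if new_parts[0] != old_parts[0]:
        return "major"
    if new_parts[1] != old_parts[1]:
        return "minor"
    return "patch"


# Hypothetical: upgrade kinds proposed in this thread for native savepoints.
PROPOSED_SUPPORTED = frozenset({"none", "patch", "minor"})


def upgrade_is_covered(old: str, new: str) -> bool:
    """True if the version change falls into one of the proposed rows."""
    return classify_upgrade(old, new) in PROPOSED_SUPPORTED
```

For example, `classify_upgrade("1.14.0", "1.14.3")` yields `"patch"`, while a change from 1.14.3 to 1.15.0 is classified as `"minor"`.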

> 3) Should "Job upgrade w/o changing graph shape and record types" be
split? I guess "record types" is only relevant for unaligned checkpoints.

The shape of the job graph is also an issue with unaligned checkpoints.
Changing record types/serialisation causes obvious problems with the
in-flight records, but if you change the job graph, for example by changing
the type of a network connection (like random -> broadcast, keyed -> non
keyed) or by removing some operators, we also have problems with the in-flight
records in the affected connections.
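
As a rough illustration of which connections would be affected, the following sketch compares two job graphs at the level of their network connections. The `Edge` model and the `affected_connections` helper are hypothetical and much simpler than Flink's real job graph; they only show the three cases mentioned above (removed connection, changed connection type, changed record type).

```python
from typing import Dict, List, NamedTuple, Tuple


class Edge(NamedTuple):
    partitioning: str  # e.g. "random", "broadcast", "keyed"
    record_type: str   # name of the record type sent over the connection


EdgeKey = Tuple[str, str]  # (upstream operator, downstream operator)


def affected_connections(
    old: Dict[EdgeKey, Edge], new: Dict[EdgeKey, Edge]
) -> List[Tuple[EdgeKey, str]]:
    """List connections whose in-flight records could not be restored as-is."""
    problems = []
    for key, old_edge in sorted(old.items()):
        new_edge = new.get(key)
        if new_edge is None:
            # An operator or connection was removed from the graph.
            problems.append((key, "connection removed"))
        elif new_edge.partitioning != old_edge.partitioning:
            # e.g. random -> broadcast, keyed -> non keyed
            problems.append((key, "partitioning changed"))
        elif new_edge.record_type != old_edge.record_type:
            problems.append((key, "record type changed"))
    return problems
```

An empty result would correspond to a job upgrade without changes to the graph shape or record types.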

> 4)

I think configuration changes should always be supported. If you think
that's important, I can add this to the documentation/FLIP proposal as a
separate row.

> 5) Do the guarantees that a Savepoint/Checkpoint provide change when
> generalized incremental checkpoints [1] are enabled? My understanding is:
> No, the same guarantees apply.

This will be trickier, and it will depend heavily on the FLIP-158
implementation.

Yun:

>   1.  From my understanding, native savepoint appears much closer to
current alignment checkpoint. What's their difference?

Technically there would be no difference, but we might decide to limit what
we officially support, to make future changes easier for us. Similarly, there
has so far been very little difference, for the most part, between savepoints
and checkpoints. For us, as developers, the fewer things we officially
support and claim to be stable, the better.

> 2.  If self-contained and relocatable are the most important difference,
why not include them in the proposal table?

Good point. I will add this.

>  What does "Job full upgrade" mean?

I have clarified it to:
> Arbitrary job upgrade (changed graph shape/record types)

It's an arbitrary job change. Anything that doesn't fall into the second
category "Job upgrade w/o changing graph shape and record types"

Best,
Piotrek

On Mon, 17 Jan 2022 at 11:25, Yun Tang wrote:

> Hi everyone,
>
> Thanks to Piotr for driving this topic.
>
> I have several questions on this FLIP.
>
>   1.  From my understanding, native savepoint appears much closer to
> current alignment checkpoint. What's their difference?
>   2.  If self-contained and relocatable are the most important difference,
> why not include them in the proposal table?
>
>   3.  What does "Job full upgrade" mean?
>
> For the question of RocksDB upgrading, this depends on the backwards
> compatibility [1], and it proves to be very well as the documentation said.
>
>
> [1]
> https://github.com/facebook/rocksdb/wiki/RocksDB-Compatibility-Between-Different-Releases
>
> Best,
> Yun Tang
>
>
>
> ____________
> From: Konstantin Knauf 
> Sent: Friday, January 14, 2022 20:39
> To: dev ; Seth Wiesman ; Nico
> Kruber ; dander...@apache.org 
> Subject: Re: [DISCUSS] FLIP-203: Incremental savepoints
>
> Hi everyone,
>
> Thank you, Piotr. Please find my thoughts on the topic below:
>
> 0) What exactly does the "State Processor API" row mean? Is it: Can it be
> read by the State Processor API? Can it be written by the State Processor
> API? Both? Something else?
>
> 1) If we take the assumption from FLIP-193 "that ownership should be the
> only difference between Checkpoints and Savepoints.", we would need to work
> in the direction of "Proposal 2". The distinction would then be the
> following:
>
> * Canonical Savepoint = Guarantees A
> * Canonical Checkpoint = Guarantees A (in theory; does not exist)
> * Aligned, Native Checkpoint = Guarantees 

Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-17 Thread Yun Tang
Hi everyone,

Thanks to Piotr for driving this topic.

I have several questions on this FLIP.

  1.  From my understanding, a native savepoint appears very close to the
current aligned checkpoint. What is the difference between them?
  2.  If being self-contained and relocatable are the most important
differences, why not include them in the proposal table?
  3.  What does "Job full upgrade" mean?

As for RocksDB upgrades, this depends on RocksDB's backward
compatibility [1], which has proven to work very well, as the documentation says.


[1] 
https://github.com/facebook/rocksdb/wiki/RocksDB-Compatibility-Between-Different-Releases

Best,
Yun Tang




From: Konstantin Knauf 
Sent: Friday, January 14, 2022 20:39
To: dev ; Seth Wiesman ; Nico Kruber 
; dander...@apache.org 
Subject: Re: [DISCUSS] FLIP-203: Incremental savepoints

Hi everyone,

Thank you, Piotr. Please find my thoughts on the topic below:

0) What exactly does the "State Processor API" row mean? Is it: Can it be
read by the State Processor API? Can it be written by the State Processor
API? Both? Something else?

1) If we take the assumption from FLIP-193 "that ownership should be the
only difference between Checkpoints and Savepoints.", we would need to work
in the direction of "Proposal 2". The distinction would then be the
following:

* Canonical Savepoint = Guarantees A
* Canonical Checkpoint = Guarantees A (in theory; does not exist)
* Aligned, Native Checkpoint = Guarantees B
* Aligned, Native Savepoint = Guarantees B
* Unaligned, Native Checkpoint = Guarantees C
* Unaligned, Native Savepoint = Guarantees C (if this would exist in the
future)

I think it is important to make this matrix not too complicated like: there
are 8 different sets of guarantees depending on all kinds of more or less
well-known configuration options.

2) With respect to the concrete guarantees, I believe, it's important that
we can cover all important use cases in "green", so that users can rely on
official, tested behavior in regular operations. In my experience this
includes manual recovery of a Job from a retained checkpoint. I would argue
that most users operating a long-running, stateful Apache Flink application
have been in the situation, where a graceful "stop" was not possible
anymore, because the Job was unable to take a Savepoint. This could be,
because the Job is frequently restarting (e.g. poison pill) or because it
fails on taking the Savepoint itself for some reason (e.g. unable to commit
a transaction to an external system). The solution strategy in this
scenario is to cancel the job, make some changes to the Job or
configuration that fix the problem and restore from the last successful
(retained) checkpoint. I think the following changes would need to be
officially supported for Native Checkpoints/Savepoint (Guarantees B,
ideally also Guarantees C), in order to fix a Job in most of these cases.

* rescaling
* Job upgrade w/o changing graph shape and record types
* Flink bug/patch (1.14.x → 1.14.y) version upgrade

I would be very interested to hear from users as well as people like Seth,
Nico or David (cc), who work with many users, what  in their experience
would be needed here.

3) Should "Job upgrade w/o changing graph shape and record types" be split?
I guess "record types" is only relevant for unaligned checkpoints.

4) Does it make sense to consider Flink configuration changes besides the
statebackend type as another row? Maybe split by "pipeline.*" options,
"execution.*" options, and whichever other categories would make sense.
Just to give a few examples: it should be *officially* supported to take a
native retained checkpoint and restart the Job with a different
pipeline.auto-watermark-interval and different high-availability
configurations

5) Do the guarantees that a Savepoint/Checkpoint provide change when
generalized incremental checkpoints [1] are enabled? My understanding is:
No, the same guarantees apply.

Cheers and thank you,

Konstantin

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints?src=contextnavpagetreemode

On Fri, Jan 14, 2022 at 11:24 AM David Anderson 
wrote:

> > I have a very similar question to State Processor API. Is it the same
> scenario in this case?
> > Should it also be working with checkpoints but might be just untested?
>
> I have used the State Processor API with aligned, full checkpoints. There
> it has worked just fine.
>
> David
>
> On Thu, Jan 13, 2022 at 12:40 PM Piotr Nowojski 
> wrote:
>
> > Hi,
> >
> > Thanks for the comments and questions. Starting from the top:
> >
> > Seth: good point about schema evolution. Actually, I have a very similar
> > question to State Processor API. Is it the same scenario in this case?
> > Should it also be working with check

Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-14 Thread Konstantin Knauf
Hi everyone,

Thank you, Piotr. Please find my thoughts on the topic below:

0) What exactly does the "State Processor API" row mean? Is it: Can it be
read by the State Processor API? Can it be written by the State Processor
API? Both? Something else?

1) If we take the assumption from FLIP-193 "that ownership should be the
only difference between Checkpoints and Savepoints.", we would need to work
in the direction of "Proposal 2". The distinction would then be the
following:

* Canonical Savepoint = Guarantees A
* Canonical Checkpoint = Guarantees A (in theory; does not exist)
* Aligned, Native Checkpoint = Guarantees B
* Aligned, Native Savepoint = Guarantees B
* Unaligned, Native Checkpoint = Guarantees C
* Unaligned, Native Savepoint = Guarantees C (if this would exist in the
future)

I think it is important that this matrix does not become too complicated,
e.g. eight different sets of guarantees depending on all kinds of more or less
well-known configuration options.
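
The matrix above can be sketched as a small lookup in which ownership (checkpoint vs. savepoint) plays no role. This is only an illustration of the proposed rule; the `Format` and `Ownership` enums and the `guarantee_set` function are hypothetical names, and "A", "B" and "C" are the placeholders used in the list.

```python
from enum import Enum


class Format(Enum):
    CANONICAL = "canonical"
    NATIVE = "native"


class Ownership(Enum):
    CHECKPOINT = "checkpoint"
    SAVEPOINT = "savepoint"


def guarantee_set(fmt: Format, aligned: bool, ownership: Ownership) -> str:
    """Return the guarantee set ("A", "B" or "C") for a snapshot.

    `ownership` is deliberately unused: under the FLIP-193 assumption it is
    the only difference between checkpoints and savepoints, so it must not
    influence the guarantees.
    """
    if fmt is Format.CANONICAL:
        # The list above has no alignment variants for the canonical format,
        # so `aligned` is ignored here.
        return "A"
    return "B" if aligned else "C"
```

Keeping the function free of any further parameters is the point: every extra configuration option that influenced the result would add rows to the matrix.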

2) With respect to the concrete guarantees, I believe, it's important that
we can cover all important use cases in "green", so that users can rely on
official, tested behavior in regular operations. In my experience this
includes manual recovery of a Job from a retained checkpoint. I would argue
that most users operating a long-running, stateful Apache Flink application
have been in a situation where a graceful "stop" was no longer possible
because the Job was unable to take a Savepoint. This could be
because the Job is frequently restarting (e.g. poison pill) or because it
fails on taking the Savepoint itself for some reason (e.g. unable to commit
a transaction to an external system). The solution strategy in this
scenario is to cancel the job, make some changes to the Job or
configuration that fix the problem and restore from the last successful
(retained) checkpoint. I think the following changes would need to be
officially supported for Native Checkpoints/Savepoint (Guarantees B,
ideally also Guarantees C), in order to fix a Job in most of these cases.

* rescaling
* Job upgrade w/o changing graph shape and record types
* Flink bug/patch (1.14.x → 1.14.y) version upgrade

I would be very interested to hear from users, as well as from people like Seth,
Nico or David (cc) who work with many users, what in their experience
would be needed here.
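
For the manual recovery scenario in 2), the step "restore from the last successful (retained) checkpoint" could be sketched as follows. This assumes the retained checkpoints of a job live in a local directory with one `chk-<n>` subdirectory per checkpoint and that a completed checkpoint contains a `_metadata` file; the helper name `latest_retained_checkpoint` is hypothetical, and a real setup would typically use a distributed file system or object store instead of a local path.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme of checkpoint directories: chk-<checkpoint id>
_CHK_DIR = re.compile(r"^chk-(\d+)$")


def latest_retained_checkpoint(job_checkpoint_dir: Path) -> Optional[Path]:
    """Return the completed checkpoint directory with the highest id, if any."""
    best_id, best_path = -1, None
    for child in job_checkpoint_dir.iterdir():
        match = _CHK_DIR.match(child.name)
        # Skip unrelated directories and checkpoints without metadata,
        # which are assumed to be incomplete.
        if match is None or not (child / "_metadata").is_file():
            continue
        checkpoint_id = int(match.group(1))
        if checkpoint_id > best_id:
            best_id, best_path = checkpoint_id, child
    return best_path
```

The returned path is what one would then pass as the restore path when resubmitting the fixed Job, for example via the `-s` option of `flink run`.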

3) Should "Job upgrade w/o changing graph shape and record types" be split?
I guess "record types" is only relevant for unaligned checkpoints.

4) Does it make sense to consider Flink configuration changes besides the
statebackend type as another row? Maybe split by "pipeline.*" options,
"execution.*" options, and whichever other categories would make sense.
Just to give a few examples: it should be *officially* supported to take a
native retained checkpoint and restart the Job with a different
pipeline.auto-watermark-interval and different high-availability
configurations

5) Do the guarantees that a Savepoint/Checkpoint provide change when
generalized incremental checkpoints [1] are enabled? My understanding is:
No, the same guarantees apply.

Cheers and thank you,

Konstantin

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints?src=contextnavpagetreemode

On Fri, Jan 14, 2022 at 11:24 AM David Anderson 
wrote:

> > I have a very similar question to State Processor API. Is it the same
> scenario in this case?
> > Should it also be working with checkpoints but might be just untested?
>
> I have used the State Processor API with aligned, full checkpoints. There
> it has worked just fine.
>
> David
>
> On Thu, Jan 13, 2022 at 12:40 PM Piotr Nowojski 
> wrote:
>
> > Hi,
> >
> > Thanks for the comments and questions. Starting from the top:
> >
> > Seth: good point about schema evolution. Actually, I have a very similar
> > question to State Processor API. Is it the same scenario in this case?
> > Should it also be working with checkpoints but might be just untested?
> >
> > And next question, should we commit to supporting those two things (State
> > Processor API and schema evolution) for native savepoints? What about
> > aligned checkpoints? (please check [1] for that).
> >
> > Yu Li: 1, 2 and 4 done.
> >
> > > 3. How about changing the description of "the default configuration of
> > the
> > > checkpoints will be used to determine whether the savepoint should be
> > > incremental or not" to something like "the `state.backend.incremental`
> > > setting now denotes the type of native format snapshot and will take
> > effect
> > > for both checkpoint and savepoint (with native type)", to prevent
> concept
> > > confusion between checkpoint and savepoint?
> >
> > Is `state.backend.incremental` the only configuration parameter that can
> be
> > used in this context? I would guess not? What about for example
> > "state.storage.fs.memory-threshold" or all of the Advanced RocksDB State
> > Backends Options [2]?
> >
> > David:
> >
> > > does this mean that we 

Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-14 Thread David Anderson
> I have a very similar question to State Processor API. Is it the same
scenario in this case?
> Should it also be working with checkpoints but might be just untested?

I have used the State Processor API with aligned, full checkpoints. There
it has worked just fine.

David

On Thu, Jan 13, 2022 at 12:40 PM Piotr Nowojski 
wrote:

> Hi,
>
> Thanks for the comments and questions. Starting from the top:
>
> Seth: good point about schema evolution. Actually, I have a very similar
> question to State Processor API. Is it the same scenario in this case?
> Should it also be working with checkpoints but might be just untested?
>
> And next question, should we commit to supporting those two things (State
> Processor API and schema evolution) for native savepoints? What about
> aligned checkpoints? (please check [1] for that).
>
> Yu Li: 1, 2 and 4 done.
>
> > 3. How about changing the description of "the default configuration of
> the
> > checkpoints will be used to determine whether the savepoint should be
> > incremental or not" to something like "the `state.backend.incremental`
> > setting now denotes the type of native format snapshot and will take
> effect
> > for both checkpoint and savepoint (with native type)", to prevent concept
> > confusion between checkpoint and savepoint?
>
> Is `state.backend.incremental` the only configuration parameter that can be
> used in this context? I would guess not? What about for example
> "state.storage.fs.memory-threshold" or all of the Advanced RocksDB State
> Backends Options [2]?
>
> David:
>
> > does this mean that we need to keep the checkpoints compatible across
> minor
> > versions? Or can we say, that the minor version upgrades are only
> > guaranteed with canonical savepoints?
>
> Good question. Frankly I was always assuming that this is implicitly given.
> Otherwise users would not be able to recover jobs that are failing because
> of bugs in Flink. But I'm pretty sure that was never explicitly stated.
>
> As Konstantin suggested, I've written down the pre-existing guarantees of
> checkpoints and savepoints followed by two proposals on how they should be
> changed [1]. Could you take a look?
>
> I'm especially unsure about the following things:
> a) What about RocksDB upgrades? If we bump RocksDB version between Flink
> versions, do we support recovering from a native format snapshot
> (incremental checkpoint)?
> b) State Processor API - both pre-existing and what do we want to provide
> in the future
> c) Schema Evolution - both pre-existing and what do we want to provide in
> the future
>
> Best,
> Piotrek
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Checkpointvssavepointguarantees
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#advanced-rocksdb-state-backends-options
>
> On Tue, 11 Jan 2022 at 09:45, Konstantin Knauf wrote:
>
> > Hi Piotr,
> >
> > would it be possible to provide a table that shows the
> > compatibility guarantees provided by the different snapshots going
> forward?
> > Like type of change (Topology. State Schema, Parallelism, ..) in one
> > dimension, and type of snapshot as the other dimension. Based on that, it
> > would be easier to discuss those guarantees, I believe.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Mon, Jan 3, 2022 at 9:11 AM David Morávek  wrote:
> >
> > > Hi Piotr,
> > >
> > > does this mean that we need to keep the checkpoints compatible across
> > minor
> > > versions? Or can we say, that the minor version upgrades are only
> > > guaranteed with canonical savepoints?
> > >
> > > My concern is especially if we'd want to change layout of the
> checkpoint.
> > >
> > > D.
> > >
> > >
> > >
> > > On Wed, Dec 29, 2021 at 5:19 AM Yu Li  wrote:
> > >
> > > > Thanks for the proposal Piotr! Overall I'm +1 for the idea, and below
> > are
> > > > my two cents:
> > > >
> > > > 1. How about adding a "Term Definition" section and clarify what
> > "native
> > > > format" (the "native" data persistence format of the current state
> > > backend)
> > > > and "canonical format" (the "uniform" format that supports switching
> > > state
> > > > backends) means?
> > > >
> > > > 2. IIUC, currently the FLIP proposes to only support incremental
> > > savepoint
> > > > with native format, and there's no plan to add such support for
> > canonical
> > > > format, right? If so, how about writing this down explicitly in the
> > FLIP
> > > > doc, maybe in a "Limitations" section, plus the fact that
> > > > `HashMapStateBackend` cannot support incremental savepoint before
> > > FLIP-151
> > > > is done? (side note: @Roman just a kindly reminder, that please take
> > > > FLIP-203 into account when implementing FLIP-151)
> > > >
> > > > 3. How about changing the description of "the default configuration
> of
> > > the
> > > > checkpoints will be used to determine whether the savepoint should be
> > > > incremental or not" to 

Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-13 Thread Yu Li
Thanks for the update, Piotr!

> Is `state.backend.incremental` the only configuration parameter that can be
> used in this context?
According to FLIP-193 [1], all the existing checkpoint configurations
actually apply to *snapshots* in general; ownership (lifecycle) is the only
difference between checkpoints and savepoints. I suggest we keep the
description aligned with FLIP-193.

> a) What about RocksDB upgrades? If we bump RocksDB version between Flink
> versions, do we support recovering from a native format snapshot
> (incremental checkpoint)?
Below are my two cents:
* The purpose of an incremental native-format savepoint (like a
*snapshot* in a traditional database [2]) is to quickly produce a persisted,
self-contained version of the current state of the job for point-in-time
recovery. It cannot replace a canonical savepoint (like a *backup* in a
traditional database) for upgrading, switching state backends, etc.
* We prefer such functionality to be supplied by a *savepoint* instead of a
(retained) *checkpoint* because the life-cycle of the data should be
user-controlled rather than system-controlled [1].
* If we'd like to cover all the functionality that the canonical savepoint
has now, a design for an incremental *canonical-format* savepoint would be
required, which is more complicated and could be considered future work.

Best Regards,
Yu

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
[2] https://www.hitechnectar.com/blogs/snapshot-vs-backup/


On Thu, 13 Jan 2022 at 19:40, Piotr Nowojski  wrote:

> Hi,
>
> Thanks for the comments and questions. Starting from the top:
>
> Seth: good point about schema evolution. Actually, I have a very similar
> question to State Processor API. Is it the same scenario in this case?
> Should it also be working with checkpoints but might be just untested?
>
> And next question, should we commit to supporting those two things (State
> Processor API and schema evolution) for native savepoints? What about
> aligned checkpoints? (please check [1] for that).
>
> Yu Li: 1, 2 and 4 done.
>
> > 3. How about changing the description of "the default configuration of
> the
> > checkpoints will be used to determine whether the savepoint should be
> > incremental or not" to something like "the `state.backend.incremental`
> > setting now denotes the type of native format snapshot and will take
> effect
> > for both checkpoint and savepoint (with native type)", to prevent concept
> > confusion between checkpoint and savepoint?
>
> Is `state.backend.incremental` the only configuration parameter that can be
> used in this context? I would guess not? What about for example
> "state.storage.fs.memory-threshold" or all of the Advanced RocksDB State
> Backends Options [2]?
>
> David:
>
> > does this mean that we need to keep the checkpoints compatible across
> minor
> > versions? Or can we say, that the minor version upgrades are only
> > guaranteed with canonical savepoints?
>
> Good question. Frankly I was always assuming that this is implicitly given.
> Otherwise users would not be able to recover jobs that are failing because
> of bugs in Flink. But I'm pretty sure that was never explicitly stated.
>
> As Konstantin suggested, I've written down the pre-existing guarantees of
> checkpoints and savepoints followed by two proposals on how they should be
> changed [1]. Could you take a look?
>
> I'm especially unsure about the following things:
> a) What about RocksDB upgrades? If we bump RocksDB version between Flink
> versions, do we support recovering from a native format snapshot
> (incremental checkpoint)?
> b) State Processor API - both pre-existing and what do we want to provide
> in the future
> c) Schema Evolution - both pre-existing and what do we want to provide in
> the future
>
> Best,
> Piotrek
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Checkpointvssavepointguarantees
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#advanced-rocksdb-state-backends-options
>
> On Tue, 11 Jan 2022 at 09:45, Konstantin Knauf wrote:
>
> > Hi Piotr,
> >
> > would it be possible to provide a table that shows the
> > compatibility guarantees provided by the different snapshots going
> forward?
> > Like type of change (Topology. State Schema, Parallelism, ..) in one
> > dimension, and type of snapshot as the other dimension. Based on that, it
> > would be easier to discuss those guarantees, I believe.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Mon, Jan 3, 2022 at 9:11 AM David Morávek  wrote:
> >
> > > Hi Piotr,
> > >
> > > does this mean that we need to keep the checkpoints compatible across
> > minor
> > > versions? Or can we say, that the minor version upgrades are only
> > > guaranteed with canonical savepoints?
> > >
> > > My concern is especially if we'd want to change layout of the
> checkpoint.
> > >
> > > D.
> > >
> > >
> > >
> > > On Wed, Dec 

Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-13 Thread Piotr Nowojski
Hi,

Thanks for the comments and questions. Starting from the top:

Seth: good point about schema evolution. Actually, I have a very similar
question about the State Processor API. Is it the same scenario in this case?
Should it also work with checkpoints, but is perhaps just untested?

And the next question: should we commit to supporting those two things (State
Processor API and schema evolution) for native savepoints? What about
aligned checkpoints? (Please check [1] for that.)

Yu Li: 1, 2 and 4 done.

> 3. How about changing the description of "the default configuration of the
> checkpoints will be used to determine whether the savepoint should be
> incremental or not" to something like "the `state.backend.incremental`
> setting now denotes the type of native format snapshot and will take effect
> for both checkpoint and savepoint (with native type)", to prevent concept
> confusion between checkpoint and savepoint?

Is `state.backend.incremental` the only configuration parameter that can be
used in this context? I would guess not. What about, for example,
"state.storage.fs.memory-threshold" or all of the Advanced RocksDB State
Backends Options [2]?

David:

> does this mean that we need to keep the checkpoints compatible across minor
> versions? Or can we say, that the minor version upgrades are only
> guaranteed with canonical savepoints?

Good question. Frankly, I have always assumed that this is implicitly given.
Otherwise users would not be able to recover jobs that are failing because
of bugs in Flink. But I'm pretty sure that it was never explicitly stated.

As Konstantin suggested, I've written down the pre-existing guarantees of
checkpoints and savepoints followed by two proposals on how they should be
changed [1]. Could you take a look?

I'm especially unsure about the following things:
a) What about RocksDB upgrades? If we bump the RocksDB version between Flink
versions, do we support recovering from a native format snapshot
(incremental checkpoint)?
b) State Processor API: both the pre-existing guarantees and what we want to
provide in the future
c) Schema evolution: both the pre-existing guarantees and what we want to
provide in the future

Best,
Piotrek

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Checkpointvssavepointguarantees
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#advanced-rocksdb-state-backends-options

On Tue, 11 Jan 2022 at 09:45, Konstantin Knauf wrote:

> Hi Piotr,
>
> would it be possible to provide a table that shows the
> compatibility guarantees provided by the different snapshots going forward?
> Like type of change (Topology. State Schema, Parallelism, ..) in one
> dimension, and type of snapshot as the other dimension. Based on that, it
> would be easier to discuss those guarantees, I believe.
>
> Cheers,
>
> Konstantin
>
> On Mon, Jan 3, 2022 at 9:11 AM David Morávek  wrote:
>
> > Hi Piotr,
> >
> > does this mean that we need to keep the checkpoints compatible across
> minor
> > versions? Or can we say, that the minor version upgrades are only
> > guaranteed with canonical savepoints?
> >
> > My concern is especially if we'd want to change layout of the checkpoint.
> >
> > D.
> >
> >
> >
> > On Wed, Dec 29, 2021 at 5:19 AM Yu Li  wrote:
> >
> > > Thanks for the proposal Piotr! Overall I'm +1 for the idea, and below
> are
> > > my two cents:
> > >
> > > 1. How about adding a "Term Definition" section and clarify what
> "native
> > > format" (the "native" data persistence format of the current state
> > backend)
> > > and "canonical format" (the "uniform" format that supports switching
> > state
> > > backends) means?
> > >
> > > 2. IIUC, currently the FLIP proposes to only support incremental
> > savepoint
> > > with native format, and there's no plan to add such support for
> canonical
> > > format, right? If so, how about writing this down explicitly in the
> FLIP
> > > doc, maybe in a "Limitations" section, plus the fact that
> > > `HashMapStateBackend` cannot support incremental savepoint before
> > FLIP-151
> > > is done? (side note: @Roman just a kindly reminder, that please take
> > > FLIP-203 into account when implementing FLIP-151)
> > >
> > > 3. How about changing the description of "the default configuration of
> > the
> > > checkpoints will be used to determine whether the savepoint should be
> > > incremental or not" to something like "the `state.backend.incremental`
> > > setting now denotes the type of native format snapshot and will take
> > effect
> > > for both checkpoint and savepoint (with native type)", to prevent
> concept
> > > confusion between checkpoint and savepoint?
> > >
> > > 4. How about putting the notes of behavior change (the default type of
> > > savepoint will be changed to `native` in the future, and by then the
> > taken
> > > savepoint cannot be used to switch state backends by default) to a more
> > > obvious place, for example moving from the "CLI" section 

Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-11 Thread Konstantin Knauf
Hi Piotr,

would it be possible to provide a table that shows the
compatibility guarantees provided by the different snapshots going forward?
For example, the type of change (topology, state schema, parallelism, ...) in
one dimension and the type of snapshot in the other. Based on that, it
would be easier to discuss those guarantees, I believe.
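
One way to picture the requested table is as a small lookup structure. The
following is only a sketch: the change types come from the list above, while
the four snapshot types and the `Guarantee` values are assumptions made for
illustration. Every cell is deliberately left as "undecided", since the actual
guarantees are exactly what this thread is still discussing.

```python
from enum import Enum


class Guarantee(Enum):
    # Hypothetical cell values; the real table may use different wording.
    SUPPORTED = "supported"
    UNSUPPORTED = "unsupported"
    UNDECIDED = "undecided"


# Rows: types of change named in the message above.
CHANGE_TYPES = ("topology", "state schema", "parallelism")

# Columns: snapshot types mentioned elsewhere in this thread (assumed set).
SNAPSHOT_TYPES = (
    "canonical savepoint",
    "native savepoint",
    "aligned checkpoint",
    "unaligned checkpoint",
)


def empty_matrix():
    """Return a matrix in which no guarantee has been decided yet."""
    return {
        (change, snapshot): Guarantee.UNDECIDED
        for change in CHANGE_TYPES
        for snapshot in SNAPSHOT_TYPES
    }


def render(matrix):
    """Render the matrix as a plain-text table, one row per change type."""
    header = ["change / snapshot", *SNAPSHOT_TYPES]
    rows = [header]
    for change in CHANGE_TYPES:
        rows.append([change] + [matrix[(change, s)].value for s in SNAPSHOT_TYPES])
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths))
        for row in rows
    )


if __name__ == "__main__":
    print(render(empty_matrix()))
```

Filling in a cell is then a single assignment, for example
`matrix[("parallelism", "native savepoint")] = Guarantee.SUPPORTED`, once the
community has agreed on that guarantee.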

Cheers,

Konstantin

On Mon, Jan 3, 2022 at 9:11 AM David Morávek  wrote:

> Hi Piotr,
>
> does this mean that we need to keep the checkpoints compatible across minor
> versions? Or can we say, that the minor version upgrades are only
> guaranteed with canonical savepoints?
>
> My concern is especially if we'd want to change layout of the checkpoint.
>
> D.
>
>
>
> On Wed, Dec 29, 2021 at 5:19 AM Yu Li  wrote:
>
> > Thanks for the proposal Piotr! Overall I'm +1 for the idea, and below are
> > my two cents:
> >
> > 1. How about adding a "Term Definition" section and clarify what "native
> > format" (the "native" data persistence format of the current state
> backend)
> > and "canonical format" (the "uniform" format that supports switching
> state
> > backends) means?
> >
> > 2. IIUC, currently the FLIP proposes to only support incremental
> savepoint
> > with native format, and there's no plan to add such support for canonical
> > format, right? If so, how about writing this down explicitly in the FLIP
> > doc, maybe in a "Limitations" section, plus the fact that
> > `HashMapStateBackend` cannot support incremental savepoint before
> FLIP-151
> > is done? (side note: @Roman just a kindly reminder, that please take
> > FLIP-203 into account when implementing FLIP-151)
> >
> > 3. How about changing the description of "the default configuration of
> the
> > checkpoints will be used to determine whether the savepoint should be
> > incremental or not" to something like "the `state.backend.incremental`
> > setting now denotes the type of native format snapshot and will take
> effect
> > for both checkpoint and savepoint (with native type)", to prevent concept
> > confusion between checkpoint and savepoint?
> >
> > 4. How about putting the notes of behavior change (the default type of
> > savepoint will be changed to `native` in the future, and by then the
> taken
> > savepoint cannot be used to switch state backends by default) to a more
> > obvious place, for example moving from the "CLI" section to the
> > "Compatibility" section? (although it will only happen in 1.16 release
> > based on the proposed plan)
> >
> > And all above suggestions apply for our user-facing document after the
> FLIP
> > is (partially or completely, accordingly) done, if taken (smile).
> >
> > Best Regards,
> > Yu
> >
> >
> > On Tue, 21 Dec 2021 at 22:23, Seth Wiesman  wrote:
> >
> > > >> AFAIK state schema evolution should work both for native and
> canonical
> > > >> savepoints.
> > >
> > > Schema evolution does technically work for both formats, it happens
> after
> > > the code paths have been unified, but the community has up until this
> > point
> > > considered that an unsupported feature. From my perspective making this
> > > supported could be as simple as adding test coverage but that's an
> active
> > > decision we'd need to make.
> > >
> > > On Tue, Dec 21, 2021 at 7:43 AM Piotr Nowojski 
> > > wrote:
> > >
> > > > Hi Konstantin,
> > > >
> > > > > In this context: will the native format support state schema
> > evolution?
> > > > If
> > > > > not, I am not sure, we can let the format default to native.
> > > >
> > > > AFAIK state schema evolution should work both for native and
> canonical
> > > > savepoints.
> > > >
> > > > Regarding what is/will be supported we will document as part of this
> > > > FLIP-203. But it's not as simple as just the difference between
> native
> > > and
> > > > canonical formats.
> > > >
> > > > Best, Piotrek
> > > >
> > > > > On Mon, 20 Dec 2021 at 14:28, Konstantin Knauf wrote:
> > > >
> > > > > Hi Piotr,
> > > > >
> > > > > Thanks a lot for starting the discussion. Big +1.
> > > > >
> > > > > In my understanding, this FLIP introduces the snapshot format as a
> > > > *really*
> > > > > user facing concept. IMO it is important that we document
> > > > >
> > > > > a) that it is not longer the checkpoint/savepoint characteristics
> > that
> > > > > determines the kind of changes that a snapshots allows (user code,
> > > state
> > > > > schema evolution, topology changes), but now this becomes a
> property
> > of
> > > > the
> > > > > format regardless of whether this is a snapshots or a checkpoint
> > > > > b) the exact changes that each format allows (code, state schema,
> > > > topology,
> > > > > state backend, max parallelism)
> > > > >
> > > > > In this context: will the native format support state schema
> > evolution?
> > > > If
> > > > > not, I am not sure, we can let the format default to native.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Konstantin
> > > > >
> > > > >
> > > > > On Mon, Dec 20, 2021 at 2:09 PM Piotr Nowojski <
> 

Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-03 Thread David Morávek
Hi Piotr,

does this mean that we need to keep the checkpoints compatible across minor
versions? Or can we say that minor version upgrades are only
guaranteed with canonical savepoints?

My concern arises especially if we want to change the layout of the checkpoint.
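
To make the terminology concrete, here is a minimal sketch of how an upgrade
could be classified, assuming Flink's `major.minor.patch` version numbering
and the naming used elsewhere in this thread (1.14.x to 1.14.y is a patch
upgrade, 1.x to 1.y is a minor upgrade). The function names are hypothetical
and not part of Flink.

```python
def parse_version(version):
    """Split a 'major.minor.patch' string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_upgrade(old, new):
    """Classify the step from `old` to `new` as none, patch, minor or major."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v < old_v:
        raise ValueError(f"{new} is older than {old}; downgrades are not covered")
    if new_v == old_v:
        return "none"
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


if __name__ == "__main__":
    print(classify_upgrade("1.14.0", "1.14.3"))
    print(classify_upgrade("1.14.3", "1.15.0"))
```

The question above is then whether a native-format snapshot must remain
restorable when `classify_upgrade` returns "minor", or only when it returns
"patch".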

D.



On Wed, Dec 29, 2021 at 5:19 AM Yu Li  wrote:

> Thanks for the proposal Piotr! Overall I'm +1 for the idea, and below are
> my two cents:
>
> 1. How about adding a "Term Definition" section and clarify what "native
> format" (the "native" data persistence format of the current state backend)
> and "canonical format" (the "uniform" format that supports switching state
> backends) means?
>
> 2. IIUC, currently the FLIP proposes to only support incremental savepoint
> with native format, and there's no plan to add such support for canonical
> format, right? If so, how about writing this down explicitly in the FLIP
> doc, maybe in a "Limitations" section, plus the fact that
> `HashMapStateBackend` cannot support incremental savepoint before FLIP-151
> is done? (side note: @Roman just a kindly reminder, that please take
> FLIP-203 into account when implementing FLIP-151)
>
> 3. How about changing the description of "the default configuration of the
> checkpoints will be used to determine whether the savepoint should be
> incremental or not" to something like "the `state.backend.incremental`
> setting now denotes the type of native format snapshot and will take effect
> for both checkpoint and savepoint (with native type)", to prevent concept
> confusion between checkpoint and savepoint?
>
> 4. How about putting the notes of behavior change (the default type of
> savepoint will be changed to `native` in the future, and by then the taken
> savepoint cannot be used to switch state backends by default) to a more
> obvious place, for example moving from the "CLI" section to the
> "Compatibility" section? (although it will only happen in 1.16 release
> based on the proposed plan)
>
> And all above suggestions apply for our user-facing document after the FLIP
> is (partially or completely, accordingly) done, if taken (smile).
>
> Best Regards,
> Yu
>
>
> On Tue, 21 Dec 2021 at 22:23, Seth Wiesman  wrote:
>
> > >> AFAIK state schema evolution should work both for native and canonical
> > >> savepoints.
> >
> > Schema evolution does technically work for both formats, it happens after
> > the code paths have been unified, but the community has up until this
> point
> > considered that an unsupported feature. From my perspective making this
> > supported could be as simple as adding test coverage but that's an active
> > decision we'd need to make.
> >
> > On Tue, Dec 21, 2021 at 7:43 AM Piotr Nowojski 
> > wrote:
> >
> > > Hi Konstantin,
> > >
> > > > In this context: will the native format support state schema
> evolution?
> > > If
> > > > not, I am not sure, we can let the format default to native.
> > >
> > > AFAIK state schema evolution should work both for native and canonical
> > > savepoints.
> > >
> > > Regarding what is/will be supported we will document as part of this
> > > FLIP-203. But it's not as simple as just the difference between native
> > and
> > > canonical formats.
> > >
> > > Best, Piotrek
> > >
> > > > On Mon, 20 Dec 2021 at 14:28, Konstantin Knauf wrote:
> > >
> > > > Hi Piotr,
> > > >
> > > > Thanks a lot for starting the discussion. Big +1.
> > > >
> > > > In my understanding, this FLIP introduces the snapshot format as a
> > > *really*
> > > > user facing concept. IMO it is important that we document
> > > >
> > > > a) that it is not longer the checkpoint/savepoint characteristics
> that
> > > > determines the kind of changes that a snapshots allows (user code,
> > state
> > > > schema evolution, topology changes), but now this becomes a property
> of
> > > the
> > > > format regardless of whether this is a snapshots or a checkpoint
> > > > b) the exact changes that each format allows (code, state schema,
> > > topology,
> > > > state backend, max parallelism)
> > > >
> > > > In this context: will the native format support state schema
> evolution?
> > > If
> > > > not, I am not sure, we can let the format default to native.
> > > >
> > > > Thanks,
> > > >
> > > > Konstantin
> > > >
> > > >
> > > > On Mon, Dec 20, 2021 at 2:09 PM Piotr Nowojski  >
> > > > wrote:
> > > >
> > > > > Hi devs,
> > > > >
> > > > > I would like to start a discussion about a previously announced
> > follow
> > > up
> > > > > of the FLIP-193 [1], namely allowing savepoints to be in native
> > format
> > > > and
> > > > > incremental. The changes do not seem invasive. The full proposal is
> > > > > written down as FLIP-203: Incremental savepoints [2]. Please take a
> > > look,
> > > > > and let me know what you think.
> > > > >
> > > > > Best,
> > > > > Piotrek
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
> > > > > [2]
> > > > >
> > > > >
> > > >

Re: [DISCUSS] FLIP-203: Incremental savepoints

2021-12-28 Thread Yu Li
Thanks for the proposal Piotr! Overall I'm +1 for the idea, and below are
my two cents:

1. How about adding a "Term Definition" section that clarifies what "native
format" (the "native" data persistence format of the current state backend)
and "canonical format" (the "uniform" format that supports switching state
backends) mean?

2. IIUC, currently the FLIP proposes to only support incremental savepoint
with native format, and there's no plan to add such support for canonical
format, right? If so, how about writing this down explicitly in the FLIP
doc, maybe in a "Limitations" section, plus the fact that
`HashMapStateBackend` cannot support incremental savepoint before FLIP-151
is done? (Side note: @Roman, just a kind reminder to please take
FLIP-203 into account when implementing FLIP-151.)

3. How about changing the description of "the default configuration of the
checkpoints will be used to determine whether the savepoint should be
incremental or not" to something like "the `state.backend.incremental`
setting now denotes the type of native format snapshot and will take effect
for both checkpoint and savepoint (with native type)", to prevent concept
confusion between checkpoint and savepoint?

4. How about moving the notes on the behavior change (the default type of
savepoint will be changed to `native` in the future, and from then on a
savepoint taken by default cannot be used to switch state backends) to a more
obvious place, for example from the "CLI" section to the
"Compatibility" section? (Although it will only happen in the 1.16 release,
based on the proposed plan.)
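
The semantics suggested in points 2 and 3 can be sketched as a small decision
function. This is only an illustration of the proposed wording, not Flink's
implementation: `SnapshotRequest`, its fields, and `is_incremental` are
hypothetical names, and the `backend_supports_incremental` flag stands in for
the fact that `HashMapStateBackend` cannot take incremental snapshots before
FLIP-151.

```python
from dataclasses import dataclass

KINDS = ("checkpoint", "savepoint")
FORMATS = ("native", "canonical")


@dataclass(frozen=True)
class SnapshotRequest:
    kind: str                           # "checkpoint" or "savepoint"
    fmt: str                            # "native" or "canonical"
    backend_supports_incremental: bool  # assumed capability flag of the backend


def is_incremental(request, state_backend_incremental):
    """Decide whether a snapshot is incremental under the proposed wording.

    `state_backend_incremental` mirrors the `state.backend.incremental`
    setting. It is meant to apply to any native-format snapshot, regardless
    of whether that snapshot is a checkpoint or a savepoint.
    """
    if request.kind not in KINDS:
        raise ValueError(f"unknown snapshot kind: {request.kind}")
    if request.fmt not in FORMATS:
        raise ValueError(f"unknown snapshot format: {request.fmt}")
    if request.fmt != "native":
        # Point 2: no incremental support is planned for the canonical format.
        return False
    if not request.backend_supports_incremental:
        return False
    return state_backend_incremental


if __name__ == "__main__":
    native_savepoint = SnapshotRequest("savepoint", "native", True)
    canonical_savepoint = SnapshotRequest("savepoint", "canonical", True)
    print(is_incremental(native_savepoint, True))
    print(is_incremental(canonical_savepoint, True))
```

Note that `kind` does not influence the result at all, which is the point of
the suggested rewording: the setting describes the native snapshot format,
not checkpoints specifically.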

All of the above suggestions also apply to our user-facing documentation after
the FLIP is (partially or completely, accordingly) done, if taken (smile).

Best Regards,
Yu


On Tue, 21 Dec 2021 at 22:23, Seth Wiesman  wrote:

> >> AFAIK state schema evolution should work both for native and canonical
> >> savepoints.
>
> Schema evolution does technically work for both formats, it happens after
> the code paths have been unified, but the community has up until this point
> considered that an unsupported feature. From my perspective making this
> supported could be as simple as adding test coverage but that's an active
> decision we'd need to make.
>
> On Tue, Dec 21, 2021 at 7:43 AM Piotr Nowojski 
> wrote:
>
> > Hi Konstantin,
> >
> > > In this context: will the native format support state schema evolution?
> > If
> > > not, I am not sure, we can let the format default to native.
> >
> > AFAIK state schema evolution should work both for native and canonical
> > savepoints.
> >
> > Regarding what is/will be supported we will document as part of this
> > FLIP-203. But it's not as simple as just the difference between native
> and
> > canonical formats.
> >
> > Best, Piotrek
> >
> > > On Mon, 20 Dec 2021 at 14:28, Konstantin Knauf wrote:
> >
> > > Hi Piotr,
> > >
> > > Thanks a lot for starting the discussion. Big +1.
> > >
> > > In my understanding, this FLIP introduces the snapshot format as a
> > *really*
> > > user facing concept. IMO it is important that we document
> > >
> > > a) that it is not longer the checkpoint/savepoint characteristics that
> > > determines the kind of changes that a snapshots allows (user code,
> state
> > > schema evolution, topology changes), but now this becomes a property of
> > the
> > > format regardless of whether this is a snapshots or a checkpoint
> > > b) the exact changes that each format allows (code, state schema,
> > topology,
> > > state backend, max parallelism)
> > >
> > > In this context: will the native format support state schema evolution?
> > If
> > > not, I am not sure, we can let the format default to native.
> > >
> > > Thanks,
> > >
> > > Konstantin
> > >
> > >
> > > On Mon, Dec 20, 2021 at 2:09 PM Piotr Nowojski 
> > > wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I would like to start a discussion about a previously announced
> follow
> > up
> > > > of the FLIP-193 [1], namely allowing savepoints to be in native
> format
> > > and
> > > > incremental. The changes do not seem invasive. The full proposal is
> > > > written down as FLIP-203: Incremental savepoints [2]. Please take a
> > look,
> > > > and let me know what you think.
> > > >
> > > > Best,
> > > > Piotrek
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
> > > > [2]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic
> > > >
> > >
> > >
> > > --
> > >
> > > Konstantin Knauf
> > >
> > > https://twitter.com/snntrable
> > >
> > > https://github.com/knaufk
> > >
> >
>


Re: [DISCUSS] FLIP-203: Incremental savepoints

2021-12-21 Thread Seth Wiesman
>> AFAIK state schema evolution should work both for native and canonical
>> savepoints.

Schema evolution does technically work for both formats (it happens after
the code paths have been unified), but the community has up until this point
considered it an unsupported feature. From my perspective, making it
supported could be as simple as adding test coverage, but that's an active
decision we'd need to make.

On Tue, Dec 21, 2021 at 7:43 AM Piotr Nowojski  wrote:

> Hi Konstantin,
>
> > In this context: will the native format support state schema evolution?
> If
> > not, I am not sure, we can let the format default to native.
>
> AFAIK state schema evolution should work both for native and canonical
> savepoints.
>
> Regarding what is/will be supported we will document as part of this
> FLIP-203. But it's not as simple as just the difference between native and
> canonical formats.
>
> Best, Piotrek
>
> On Mon, 20 Dec 2021 at 14:28, Konstantin Knauf wrote:
>
> > Hi Piotr,
> >
> > Thanks a lot for starting the discussion. Big +1.
> >
> > In my understanding, this FLIP introduces the snapshot format as a
> *really*
> > user facing concept. IMO it is important that we document
> >
> > a) that it is not longer the checkpoint/savepoint characteristics that
> > determines the kind of changes that a snapshots allows (user code, state
> > schema evolution, topology changes), but now this becomes a property of
> the
> > format regardless of whether this is a snapshots or a checkpoint
> > b) the exact changes that each format allows (code, state schema,
> topology,
> > state backend, max parallelism)
> >
> > In this context: will the native format support state schema evolution?
> If
> > not, I am not sure, we can let the format default to native.
> >
> > Thanks,
> >
> > Konstantin
> >
> >
> > On Mon, Dec 20, 2021 at 2:09 PM Piotr Nowojski 
> > wrote:
> >
> > > Hi devs,
> > >
> > > I would like to start a discussion about a previously announced follow
> up
> > > of the FLIP-193 [1], namely allowing savepoints to be in native format
> > and
> > > incremental. The changes do not seem invasive. The full proposal is
> > > written down as FLIP-203: Incremental savepoints [2]. Please take a
> look,
> > > and let me know what you think.
> > >
> > > Best,
> > > Piotrek
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
> > > [2]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic
> > >
> >
> >
> > --
> >
> > Konstantin Knauf
> >
> > https://twitter.com/snntrable
> >
> > https://github.com/knaufk
> >
>


Re: [DISCUSS] FLIP-203: Incremental savepoints

2021-12-21 Thread Piotr Nowojski
Hi Konstantin,

> In this context: will the native format support state schema evolution? If
> not, I am not sure, we can let the format default to native.

AFAIK, state schema evolution should work for both native and canonical
savepoints.

We will document what is and will be supported as part of FLIP-203. But
it's not as simple as just the difference between the native and canonical
formats.

Best, Piotrek


Re: [DISCUSS] FLIP-203: Incremental savepoints

2021-12-20 Thread Konstantin Knauf
Hi Piotr,

Thanks a lot for starting the discussion. Big +1.

In my understanding, this FLIP introduces the snapshot format as a *really*
user-facing concept. IMO it is important that we document

a) that it is no longer the checkpoint/savepoint characteristics that
determine the kind of changes that a snapshot allows (user code, state
schema evolution, topology changes), but that this now becomes a property of
the format, regardless of whether this is a savepoint or a checkpoint
b) the exact changes that each format allows (code, state schema, topology,
state backend, max parallelism)

In this context: will the native format support state schema evolution? If
not, I am not sure we can let the format default to native.

Thanks,

Konstantin



-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[DISCUSS] FLIP-203: Incremental savepoints

2021-12-20 Thread Piotr Nowojski
Hi devs,

I would like to start a discussion about a previously announced follow-up
to FLIP-193 [1], namely allowing savepoints to be in native format and
incremental. The changes do not seem invasive. The full proposal is
written down as FLIP-203: Incremental savepoints [2]. Please take a look,
and let me know what you think.

Best,
Piotrek

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic