[jira] [Commented] (FLINK-4445) Ignore unmatched state when restoring from savepoint
[ https://issues.apache.org/jira/browse/FLINK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15628029#comment-15628029 ]

ASF GitHub Bot commented on FLINK-4445:
---

Github user uce closed the pull request at:

    https://github.com/apache/flink/pull/2713

> Ignore unmatched state when restoring from savepoint
>
> Key: FLINK-4445
> URL: https://issues.apache.org/jira/browse/FLINK-4445
> Project: Flink
> Issue Type: Improvement
> Components: State Backends, Checkpointing
> Affects Versions: 1.1.1
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
> Fix For: 1.2.0, 1.1.4
>
> Currently, when submitting a job with a savepoint, we require that all state
> is matched to the new job. Many users have noted that this is overly strict.
> I would like to loosen this and allow savepoints to be restored without
> matching all state.
>
> The following options come to mind:
>
> (1) Keep the current behaviour, but add a flag to allow ignoring state when
> restoring, e.g. {{bin/flink -s --ignoreUnmatchedState}}. This
> would be non-API breaking.
>
> (2) Ignore unmatched state and continue. Additionally add a flag to be strict
> about checking the state, e.g. {{bin/flink -s --strict}}. This
> would be API-breaking as the default behaviour would change. Users might be
> confused by this because there is no straightforward way to notice that
> nothing has been restored.
>
> I'm not sure what's the best thing here. [~gyfora], [~aljoscha] What do you
> think?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (FLINK-4445) Ignore unmatched state when restoring from savepoint
[ https://issues.apache.org/jira/browse/FLINK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627948#comment-15627948 ]

ASF GitHub Bot commented on FLINK-4445:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/2712
[jira] [Commented] (FLINK-4445) Ignore unmatched state when restoring from savepoint
[ https://issues.apache.org/jira/browse/FLINK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622672#comment-15622672 ]

ASF GitHub Bot commented on FLINK-4445:
---

Github user uce commented on the issue:

    https://github.com/apache/flink/pull/2712

Very much agree, Stephan! I don't know what sounds better to native speakers and more intuitive to users though... unresumed state or non restored state? ;-) @greghogan @jgrier do you have any input on this?

The internal behaviour is the following: The checkpoint/savepoint stores state for each operator of the original job graph (from which the checkpoint/savepoint was triggered), keyed by the operator ID. When a user resumes from this checkpoint/savepoint, the checkpoint coordinator tries to map each state (keyed by operator ID) to the operators of the new job. This PR *allows* ;-) some of this state to not be restored.

Any ideas on what to call this?
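The mapping step described in uce's comment above can be sketched roughly as follows. This is a minimal illustration only, not Flink's checkpoint coordinator code: the class `RestoreMappingSketch`, the method `mapState`, and the use of plain strings for operator IDs and state are all assumptions made for the example. The parameter name `ignoreUnmappedState` follows the flag name used in the PR at this point in the discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the restore mapping; not Flink's coordinator code. */
final class RestoreMappingSketch {

    /**
     * Maps savepoint state, keyed by operator ID, onto the operators of the new job.
     * State that cannot be mapped fails the restore unless ignoreUnmappedState is set.
     */
    static Map<String, String> mapState(
            Map<String, String> savepointState,
            Set<String> newJobOperatorIds,
            boolean ignoreUnmappedState) {

        Map<String, String> restored = new LinkedHashMap<>();
        List<String> unmapped = new ArrayList<>();

        for (Map.Entry<String, String> entry : savepointState.entrySet()) {
            if (newJobOperatorIds.contains(entry.getKey())) {
                restored.put(entry.getKey(), entry.getValue());
            } else {
                unmapped.add(entry.getKey());
            }
        }

        // Default (strict) behaviour: any state that cannot be mapped is an error.
        if (!unmapped.isEmpty() && !ignoreUnmappedState) {
            throw new IllegalStateException(
                    "Savepoint state cannot be mapped to the new job: " + unmapped);
        }
        return restored;
    }

    public static void main(String[] args) {
        Map<String, String> savepoint = new LinkedHashMap<>();
        savepoint.put("source", "offsets");
        savepoint.put("removed-operator", "counts");

        // The new job graph no longer contains "removed-operator".
        Set<String> newJob = new HashSet<>(Arrays.asList("source", "sink"));

        // With the flag, only the state that still has an operator is restored.
        System.out.println(mapState(savepoint, newJob, true).keySet());

        // Without the flag, the restore is rejected.
        try {
            mapState(savepoint, newJob, false);
        } catch (IllegalStateException e) {
            System.out.println("strict restore failed: " + e.getMessage());
        }
    }
}
```

Operators of the new job that have no entry in the savepoint (here `sink`) simply start without state in this sketch; only state that has nowhere to go is treated as a problem.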
[jira] [Commented] (FLINK-4445) Ignore unmatched state when restoring from savepoint
[ https://issues.apache.org/jira/browse/FLINK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622651#comment-15622651 ]

ASF GitHub Bot commented on FLINK-4445:
---

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/2712

Looks good to me, except the name `ignoreUnmappedState` ;-)

As usual, the three most difficult things in computer science are (1) finding good names, and (2) off-by-one errors.

What do you think about calling it something like `allowUnresumedState` or `allowNonRestoredState`? The *"allow"* to me implies that this is a valid scenario.
[jira] [Commented] (FLINK-4445) Ignore unmatched state when restoring from savepoint
[ https://issues.apache.org/jira/browse/FLINK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622362#comment-15622362 ]

ASF GitHub Bot commented on FLINK-4445:
---

Github user uce commented on the issue:

    https://github.com/apache/flink/pull/2712

Thanks for the review. Going to merge this.
[jira] [Commented] (FLINK-4445) Ignore unmatched state when restoring from savepoint
[ https://issues.apache.org/jira/browse/FLINK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622364#comment-15622364 ]

ASF GitHub Bot commented on FLINK-4445:
---

Github user uce commented on the issue:

    https://github.com/apache/flink/pull/2713

Going to merge this as #2712 was reviewed and this is essentially the same.
[jira] [Commented] (FLINK-4445) Ignore unmatched state when restoring from savepoint
[ https://issues.apache.org/jira/browse/FLINK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615140#comment-15615140 ]

ASF GitHub Bot commented on FLINK-4445:
---

GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/2713

    [FLINK-4445] Add option to ignore unmapped checkpoint state

    Backport of #2712 for `release-1.1`.

    Technically, this adds new behaviour to a bugfix release, but the default behaviour is not changed and multiple users have already run into this. In such a case, there is no straightforward way to work around this issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink 4445-unmatched_state-backport_1.1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2713

commit d45e13d2f1e9143458b8c45e2c5201196bf70375
Author: Ufuk Celebi
Date: 2016-10-26T16:00:21Z

    [FLINK-4445] [client] Add ignoreUnmappedState flag to CLI

    Allow to specify whether a checkpoint restore should ignore checkpoint
    state that it cannot map to the program.

    This is exposed via the CLI in the run command:

    bin/flink run -s -i ...

    Furthermore, the savepoint restore settings are moved out of the
    snapshotting settings.

commit 91f677d8906d7ad92a6919b7756011280a20a5f7
Author: Ufuk Celebi
Date: 2016-10-27T07:49:01Z

    [FLINK-4445] [checkpointing] Add option to ignore unmatched savepoint state

    Allows to ignore savepoint state that cannot be mapped to a job vertex
    when restoring.
[jira] [Commented] (FLINK-4445) Ignore unmatched state when restoring from savepoint
[ https://issues.apache.org/jira/browse/FLINK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615136#comment-15615136 ]

ASF GitHub Bot commented on FLINK-4445:
---

GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/2712

    [FLINK-4445] Add option to ignore unmapped checkpoint state

    When restoring from a checkpoint/savepoint, state for each operator has to be restored. For savepoints, this means that the user cannot remove an operator from her topology and still use the savepoint. With this change, we allow ignoring state that cannot be mapped back to the job being restored. The default behaviour does not change.

    ## Changes

    - I've removed the `allOrNothingState` flag as it was only affecting non-partitioned operator state and was never set to `true` anyway (except in tests). The flag controlled whether each non-partitioned operator state was restored.
    - Moved the savepoint path from the `JobSnapshottingSettings` to the `JobGraph`
    - Added the `--ignoreUnmappedState` (short `-i`) flag to the run command: `bin/flink run -s -i ...`

    I've tested this manually by triggering a savepoint for a job, adjusting the job (removing an operator), and then trying to resume from the savepoint. By default, restoring fails, but with the flag everything works.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink 4445-unmatched_state

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2712.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2712

commit dc278c51b2bf1f580a6b4cb1670fb70ac871515f
Author: Ufuk Celebi
Date: 2016-10-26T16:00:21Z

    [FLINK-4445] [client] Add ignoreUnmappedState flag to CLI

    Allow to specify whether a checkpoint restore should ignore checkpoint
    state that it cannot map to the program.

    This is exposed via the CLI in the run command:

    bin/flink run -s -i ...

    Furthermore, the savepoint restore settings are moved out of the
    snapshotting settings.

commit 57621d30dfc4360c786d557a1a00fb57e2ade372
Author: Ufuk Celebi
Date: 2016-10-26T16:05:26Z

    [FLINK-4445] [checkpointing] Add option to ignore unmapped checkpoint state

    Allows to ignore checkpoint state that cannot be mapped to a job vertex
    when restoring.
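One way to picture the "restore settings moved out of the snapshotting settings" change from the PR above is a small value object that carries the savepoint path together with the new flag. The sketch below is hypothetical: the class name `RestoreSettingsSketch`, its factory methods, and the example path are invented for illustration and are not taken from the PR. It only shows the intended defaults, where omitting the flag keeps the strict behaviour.

```java
import java.util.Objects;

/** Hypothetical value object; names are illustrative, not the PR's actual classes. */
final class RestoreSettingsSketch {

    private final String savepointPath;        // null means "start without a savepoint"
    private final boolean ignoreUnmappedState; // false keeps the strict default

    private RestoreSettingsSketch(String savepointPath, boolean ignoreUnmappedState) {
        this.savepointPath = savepointPath;
        this.ignoreUnmappedState = ignoreUnmappedState;
    }

    /** No savepoint to restore from. */
    static RestoreSettingsSketch none() {
        return new RestoreSettingsSketch(null, false);
    }

    /** Restore from the given path with the strict default. */
    static RestoreSettingsSketch forPath(String path) {
        return forPath(path, false);
    }

    /** Restore from the given path, optionally ignoring state that cannot be mapped. */
    static RestoreSettingsSketch forPath(String path, boolean ignoreUnmappedState) {
        return new RestoreSettingsSketch(
                Objects.requireNonNull(path, "path"), ignoreUnmappedState);
    }

    boolean restoresFromSavepoint() {
        return savepointPath != null;
    }

    String savepointPath() {
        return savepointPath;
    }

    boolean ignoreUnmappedState() {
        return ignoreUnmappedState;
    }

    public static void main(String[] args) {
        // "/tmp/hypothetical-savepoint" is a made-up path for the example.
        RestoreSettingsSketch strict = forPath("/tmp/hypothetical-savepoint");
        RestoreSettingsSketch lenient = forPath("/tmp/hypothetical-savepoint", true);

        System.out.println("strict ignores unmapped state: " + strict.ignoreUnmappedState());
        System.out.println("lenient ignores unmapped state: " + lenient.ignoreUnmappedState());
        System.out.println("none restores: " + none().restoresFromSavepoint());
    }
}
```

Keeping the path and the flag in one object that travels with the job, rather than with the periodic snapshotting configuration, reflects that both only matter at submission time.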
[jira] [Commented] (FLINK-4445) Ignore unmatched state when restoring from savepoint
[ https://issues.apache.org/jira/browse/FLINK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586067#comment-15586067 ]

Stephan Ewen commented on FLINK-4445:
---

+1 for option (1)
[jira] [Commented] (FLINK-4445) Ignore unmatched state when restoring from savepoint
[ https://issues.apache.org/jira/browse/FLINK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430524#comment-15430524 ]

Aljoscha Krettek commented on FLINK-4445:
---

+1
[jira] [Commented] (FLINK-4445) Ignore unmatched state when restoring from savepoint
[ https://issues.apache.org/jira/browse/FLINK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430400#comment-15430400 ]

Ufuk Celebi commented on FLINK-4445:
---

Thanks Gyula! I agree with this. Furthermore, users would only need to use the flag once in a while, because after restoring with ignored state, newer savepoints can be restored as usual.

+1 for option 1.
[jira] [Commented] (FLINK-4445) Ignore unmatched state when restoring from savepoint
[ https://issues.apache.org/jira/browse/FLINK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430336#comment-15430336 ]

Gyula Fora commented on FLINK-4445:
---

Hi Ufuk,

My personal experience is that it's very easy to make mistakes when dealing with more complex stateful jobs, such as forgetting uids on Kafka sources/sinks and other built-in stateful operators.

Ignoring the unmatched state by default would be super dangerous and would have caused me serious issues in the past.

I think adding a force-ignore flag (option 1) would be a good way to go and is also very useful :)

Cheers,
Gyula
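Gyula's concern about forgotten uids can be illustrated with a deliberately simplified model. In the sketch below, an operator without a user-assigned uid gets an ID derived from its position in the topology. That is an assumption made for the example: Flink's generated IDs are computed differently, but they are likewise sensitive to changes in the program's structure. The class `OperatorIdSketch` and all operator names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of operator IDs; Flink's real ID generation is more involved. */
final class OperatorIdSketch {

    /** An operator with a name and an optional user-assigned uid (null if forgotten). */
    static final class Operator {
        final String name;
        final String uid;

        Operator(String name, String uid) {
            this.name = name;
            this.uid = uid;
        }
    }

    /**
     * Returns one ID per operator: the explicit uid if present, otherwise an ID
     * derived from the operator's position, which shifts when the topology changes.
     */
    static List<String> operatorIds(List<Operator> topology) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < topology.size(); i++) {
            Operator op = topology.get(i);
            ids.add(op.uid != null ? op.uid : "auto-" + i + "-" + op.name);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Version 1 of the job: the sink's uid was forgotten.
        List<Operator> v1 = Arrays.asList(
                new Operator("kafka-source", "source-uid"),
                new Operator("map", null),
                new Operator("kafka-sink", null));

        // Version 2 adds a filter, which shifts the position of the sink.
        List<Operator> v2 = Arrays.asList(
                new Operator("kafka-source", "source-uid"),
                new Operator("filter", null),
                new Operator("map", null),
                new Operator("kafka-sink", null));

        List<String> savepointIds = operatorIds(v1);
        List<String> newJobIds = operatorIds(v2);

        // State whose ID no longer exists in the new job cannot be mapped.
        List<String> unmatched = new ArrayList<>(savepointIds);
        unmatched.removeAll(newJobIds);
        System.out.println("unmatched state: " + unmatched);
    }
}
```

In this model the sink's state ends up unmatched even though the sink was never removed. If unmatched state were ignored by default (option 2), that state would be dropped without any error; with the strict default (option 1), the restore fails and the missing uid is noticed.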