[jira] [Commented] (FLINK-20648) Unable to restore job from savepoint when using Kubernetes based HA services

2020-12-24 Thread Yang Wang (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254741#comment-17254741 ] Yang Wang commented on FLINK-20648: --- [~dmvk] Could you help to verify that this issue is fixed in your

2020-12-23 Thread Yang Wang (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254392#comment-17254392 ] Yang Wang commented on FLINK-20648: --- I have attached a PR to fix this issue via starting leader

2020-12-22 Thread Till Rohrmann (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253508#comment-17253508 ] Till Rohrmann commented on FLINK-20648: --- My concern is that introducing special case logic into

2020-12-22 Thread Yang Wang (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253417#comment-17253417 ] Yang Wang commented on FLINK-20648: --- I think we could have similar logic in

2020-12-22 Thread Till Rohrmann (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253397#comment-17253397 ] Till Rohrmann commented on FLINK-20648: --- How would we handle resetting the checkpoint counter to

2020-12-22 Thread Yang Wang (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253385#comment-17253385 ] Yang Wang commented on FLINK-20648: --- Thanks [~xintongsong] for the suggestion. Maybe we do not need to

2020-12-21 Thread Xintong Song (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253302#comment-17253302 ] Xintong Song commented on FLINK-20648: -- Just to add my two cents. IIUC, the reason we convert a

2020-12-21 Thread Yang Wang (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253266#comment-17253266 ] Yang Wang commented on FLINK-20648: --- Yes, the {{KubernetesLeaderElector#run}} is started in a separate

2020-12-21 Thread Till Rohrmann (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252880#comment-17252880 ] Till Rohrmann commented on FLINK-20648: --- Is it because the ConfigMap gets created lazily when the

2020-12-21 Thread Till Rohrmann (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252879#comment-17252879 ] Till Rohrmann commented on FLINK-20648: --- If it turns out to be the same as what I did for

2020-12-21 Thread Yang Wang (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252749#comment-17252749 ] Yang Wang commented on FLINK-20648: --- Thanks for your information. But it seems that we could not have

2020-12-21 Thread Till Rohrmann (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252736#comment-17252736 ] Till Rohrmann commented on FLINK-20648: --- As a quick update, I don't intend to fix FLINK-11719 for

2020-12-18 Thread Yang Wang (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251850#comment-17251850 ] Yang Wang commented on FLINK-20648: --- Yes. We are just using the {{leaderContenderDescription}}(aka

2020-12-18 Thread Till Rohrmann (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251821#comment-17251821 ] Till Rohrmann commented on FLINK-20648: --- The address of the contender does not need to be known

2020-12-18 Thread Yang Wang (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251637#comment-17251637 ] Yang Wang commented on FLINK-20648: --- Hmm. Maybe we could build the rpc endpoint id for JobMaster

2020-12-18 Thread Yang Wang (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251634#comment-17251634 ] Yang Wang commented on FLINK-20648: --- [~trohrmann] Thanks for your comments. I am afraid we could not

2020-12-18 Thread Till Rohrmann (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251620#comment-17251620 ] Till Rohrmann commented on FLINK-20648: --- I am currently working on FLINK-11719 which would resolve

2020-12-17 Thread Yang Wang (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251523#comment-17251523 ] Yang Wang commented on FLINK-20648: --- [~dmvk] Thanks for creating this issue and debugging the root

2020-12-17 Thread Xintong Song (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251444#comment-17251444 ] Xintong Song commented on FLINK-20648: -- [~fly_in_gis], could you help look into this? > Unable to

2020-12-17 Thread David Morávek (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250945#comment-17250945 ] David Morávek commented on FLINK-20648: --- Possible solution is outlined here: