[jira] [Commented] (FLINK-13497) Checkpoints can complete after CheckpointFailureManager fails job

2020-04-08 Thread Piotr Nowojski (Jira)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078030#comment-17078030 ] Piotr Nowojski commented on FLINK-13497: The bug was finally fixed by a combination of FLINK-16945

2019-08-06 Thread Biao Liu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900790#comment-16900790 ] Biao Liu commented on FLINK-13497: To [~pnowojski], yes, my pleasure :) WRT the performance concern,

2019-08-06 Thread Piotr Nowojski (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900776#comment-16900776 ] Piotr Nowojski commented on FLINK-13497: [~SleePy] we are also not aware of any performance

2019-08-06 Thread Biao Liu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900775#comment-16900775 ] Biao Liu commented on FLINK-13497: To [~till.rohrmann], thanks for the explanation of the history. I just

2019-08-06 Thread Till Rohrmann (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900764#comment-16900764 ] Till Rohrmann commented on FLINK-13497: The reason why we don't run the {{CheckpointCoordinator}}

2019-08-06 Thread Biao Liu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900758#comment-16900758 ] Biao Liu commented on FLINK-13497: To [~pnowojski], I can't agree more. I have another concern: the

2019-08-06 Thread Piotr Nowojski (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900732#comment-16900732 ] Piotr Nowojski commented on FLINK-13497: As far as we (me, [~carp84] and [~StephanEwen])

2019-08-05 Thread Till Rohrmann (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899896#comment-16899896 ] Till Rohrmann commented on FLINK-13497: [~pnowojski] I think this issue would benefit from your

2019-08-02 Thread vinoyang (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898822#comment-16898822 ] vinoyang commented on FLINK-13497: [~SleePy] I discussed this with [~yunta]. Yes, your analysis is correct.

2019-08-02 Thread Biao Liu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898775#comment-16898775 ] Biao Liu commented on FLINK-13497: The thread model of {{CheckpointCoordinator}} seems to be a bit
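
Illustrative sketch only, not Flink code: one way to rule out the race named in the issue title is to confine all coordinator state to a single thread, so that failing the job and completing a checkpoint can never interleave. The class `MainThreadCoordinator` and its methods are hypothetical names chosen for this example, and the single-threaded `ExecutorService` merely stands in for whatever "main thread" executor a real coordinator might use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model (not Flink's CheckpointCoordinator): every state change runs
// as a task on one single-threaded executor, so no locks are needed.
final class MainThreadCoordinator {
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    // Touched only from mainThread.
    private boolean jobFailed = false;
    private final List<Long> completed = new ArrayList<>();

    // Submits an action to the single thread and blocks until it has run.
    private <T> T runOnMainThread(Callable<T> action) {
        try {
            return mainThread.submit(action).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Completes a checkpoint unless the job has already been failed.
    boolean completeCheckpoint(long checkpointId) {
        return runOnMainThread(() -> {
            if (jobFailed) {
                return false;
            }
            completed.add(checkpointId);
            return true;
        });
    }

    // Marks the job as failed; later completions observe this flag.
    void failJob() {
        runOnMainThread(() -> {
            jobFailed = true;
            return null;
        });
    }

    int completedCount() {
        return runOnMainThread(() -> completed.size());
    }

    void shutdown() {
        mainThread.shutdown();
    }
}
```

Because every call is a task on the same single-threaded executor, a completion submitted after `failJob()` returns is guaranteed to see `jobFailed == true`.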

2019-08-01 Thread vinoyang (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897896#comment-16897896 ] vinoyang commented on FLINK-13497: [~SleePy] Your comment is reasonable; I am discussing it with [~yunta]

2019-08-01 Thread Biao Liu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897891#comment-16897891 ] Biao Liu commented on FLINK-13497: There is another problem. The {{failGlobal}} might fail the job in

2019-07-31 Thread vinoyang (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896988#comment-16896988 ] vinoyang commented on FLINK-13497: On deeper thought, even if we had not introduced FLINK-12364,

2019-07-30 Thread vinoyang (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896773#comment-16896773 ] vinoyang commented on FLINK-13497: [~yunta] I have no objection to stopping the checkpoint scheduler.

2019-07-30 Thread Yun Tang (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896767#comment-16896767 ] Yun Tang commented on FLINK-13497: [~yanghua], there is still a gap between failing all pending
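
Illustrative sketch only, assuming the "gap" refers to a window between aborting pending checkpoints and marking the job as failed: if both steps happen in one critical section, and checkpoint completion checks the same flag under the same lock, no checkpoint can complete in between. `GuardedCoordinator` and all of its methods are hypothetical; this is not Flink's API or the fix that was eventually merged.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of a checkpoint coordinator (not Flink's class).
final class GuardedCoordinator {
    private final Object lock = new Object();
    private final Set<Long> pending = new HashSet<>();
    private boolean jobFailed = false;
    private long completedCount = 0;

    // Registers a new pending checkpoint unless the job has already failed.
    boolean trigger(long checkpointId) {
        synchronized (lock) {
            if (jobFailed) {
                return false;
            }
            return pending.add(checkpointId);
        }
    }

    // Completes a pending checkpoint; refuses once the job has been failed.
    boolean complete(long checkpointId) {
        synchronized (lock) {
            if (jobFailed || !pending.remove(checkpointId)) {
                return false;
            }
            completedCount++;
            return true;
        }
    }

    // Marks the job failed and aborts every pending checkpoint in one critical
    // section, so no complete() call can run between the two steps.
    void failJob() {
        synchronized (lock) {
            jobFailed = true;
            pending.clear();
        }
    }

    long completedCount() {
        synchronized (lock) {
            return completedCount;
        }
    }

    int pendingCount() {
        synchronized (lock) {
            return pending.size();
        }
    }
}
```

The key point is that `failJob()` sets the flag and clears the pending set atomically with respect to `complete()`; performing them as two separately locked steps would reopen the window.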

2019-07-30 Thread vinoyang (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896752#comment-16896752 ] vinoyang commented on FLINK-13497: Currently, the {{CheckpointFailureManager}} chooses a simple

2019-07-30 Thread Yun Tang (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896390#comment-16896390 ] Yun Tang commented on FLINK-13497: One quick fix to this problem is to let