[jira] [Commented] (FLINK-12514) Refactor the failure checkpoint counting mechanism with ordered checkpoint id
[ https://issues.apache.org/jira/browse/FLINK-12514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17323277#comment-17323277 ]

Flink Jira Bot commented on FLINK-12514:

This issue is assigned but has not received an update in 7 days, so it has been labeled "stale-assigned". If you are still working on the issue, please give an update and remove the label. If you are no longer working on the issue, please unassign so someone else may work on it. In 7 days the issue will be automatically unassigned.

> Refactor the failure checkpoint counting mechanism with ordered checkpoint id
>
> Key: FLINK-12514
> URL: https://issues.apache.org/jira/browse/FLINK-12514
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.9.0
> Reporter: vinoyang
> Assignee: vinoyang
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the checkpoint failure manager uses a simple counting mechanism
> which does not track the checkpoint id sequence.
> However, a more graceful counting mechanism would be based on the ordered
> checkpoint id sequence.
> It should be refactored after FLINK-12364 has been merged.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-12514) Refactor the failure checkpoint counting mechanism with ordered checkpoint id
[ https://issues.apache.org/jira/browse/FLINK-12514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909089#comment-16909089 ]

vinoyang commented on FLINK-12514:

[~pnowojski] I will give a more detailed description. I called this a refactor because I had already implemented a simple counting mechanism based on {{AtomicInteger}}. The idea comes from the PR for FLINK-12364, where Stefan proposed it. In any case, I totally agree with your comment and will rework the title and description.
[jira] [Commented] (FLINK-12514) Refactor the failure checkpoint counting mechanism with ordered checkpoint id
[ https://issues.apache.org/jira/browse/FLINK-12514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909072#comment-16909072 ]

Piotr Nowojski commented on FLINK-12514:

I've briefly checked your PR and it adds even more locking. I think whatever we do about this feature, it has to be done after the FLINK-13698 refactoring and after fixing bugs like FLINK-13497 that were caused by FLINK-12364.

Can you also add a better description to this ticket, explaining what this change is about (what it is trying to fix or improve)? Also, I think the title is misleading, as this is not a refactor but a new feature.
[jira] [Commented] (FLINK-12514) Refactor the failure checkpoint counting mechanism with ordered checkpoint id
[ https://issues.apache.org/jira/browse/FLINK-12514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867498#comment-16867498 ]

vinoyang commented on FLINK-12514:

Hi [~pnowojski], over the last few days [~srichter] acted as my mentor and reviewed the PR for FLINK-12364. It has now been merged. Thanks for [~srichter]'s efforts!

In that PR, we discussed and agreed that we need a more reasonable mechanism for counting concurrently failed checkpoint ids, one that takes the checkpoint id sequence into account.

I would sincerely like to invite you to be my mentor on this issue. WDYT?

cc [~till.rohrmann]
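
For readers following the thread, the difference between a plain counter and one that considers the checkpoint id sequence can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the PR and not Flink's actual checkpoint failure manager: the class name OrderedFailureCounter, its methods, and the two ordering rules (a failure reported for an id at or below the last successful checkpoint is ignored, and a success only clears failures with smaller ids) are all hypothetical. It uses a single lock for simplicity, which is exactly the kind of extra locking questioned in the review.

```java
import java.util.TreeSet;

/**
 * Hypothetical sketch of failure counting that respects checkpoint id order.
 * Not Flink code; names and rules are assumptions made for illustration.
 */
public class OrderedFailureCounter {
    private final int tolerableFailures;
    // Ids of failed checkpoints that are newer than the last success.
    private final TreeSet<Long> failedIds = new TreeSet<>();
    private long lastSuccessfulId = -1L;

    public OrderedFailureCounter(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    /** Records a failure; returns true once the tolerable number is exceeded. */
    public synchronized boolean onFailure(long checkpointId) {
        // Assumed rule: a late failure for an already superseded id is ignored.
        if (checkpointId > lastSuccessfulId) {
            failedIds.add(checkpointId);
        }
        return failedIds.size() > tolerableFailures;
    }

    /** Records a success; only failures with smaller ids are forgotten. */
    public synchronized void onSuccess(long checkpointId) {
        if (checkpointId > lastSuccessfulId) {
            lastSuccessfulId = checkpointId;
            // headSet is a live view, so clearing it removes from failedIds.
            failedIds.headSet(checkpointId, false).clear();
        }
    }

    public synchronized int failureCount() {
        return failedIds.size();
    }
}
```

With a plain {{AtomicInteger}} that is reset on every success, a success for checkpoint 2 would also wipe out a failure already recorded for checkpoint 3, and a late failure report for checkpoint 1 would be counted again. In the sketch above, neither happens.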