[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846874#comment-17846874 ] Ryan Skraba commented on FLINK-33186: - * 1.18 AdaptiveScheduler / Test (module: tests) https://github.com/apache/flink/actions/runs/9088951392/job/24979573762#step:10:7852 > CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished > fails on AZP > - > > Key: FLINK-33186 > URL: https://issues.apache.org/jira/browse/FLINK-33186 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.19.0, 1.18.1 >Reporter: Sergey Nuyanzin >Assignee: Jiang Xin >Priority: Critical > Labels: test-stability > > This build > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509=logs=baf26b34-3c6a-54e8-f93f-cf269b32f802=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9=8762 > fails as > {noformat} > Sep 28 01:23:43 Caused by: > org.apache.flink.runtime.checkpoint.CheckpointException: Task local > checkpoint failure. 
> Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817) > Sep 28 01:23:43 at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > Sep 28 01:23:43 at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > Sep 28 01:23:43 at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > Sep 28 01:23:43 at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > Sep 28 01:23:43 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > Sep 28 01:23:43 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > Sep 28 01:23:43 at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846284#comment-17846284 ] Weijie Guo commented on FLINK-33186: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59529=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=8036
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838235#comment-17838235 ] Ryan Skraba commented on FLINK-33186: - 1.20 test_cron_hadoop313 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58958=logs=baf26b34-3c6a-54e8-f93f-cf269b32f802=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9=8314 1.20 Java 8: Test (module: tests) https://github.com/apache/flink/actions/runs/8719280474/job/23918749100#step:10:8028
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829501#comment-17829501 ] Matthias Pohl commented on FLINK-33186: --- https://github.com/apache/flink/actions/runs/8369823390/job/22916375709#step:10:7894
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829496#comment-17829496 ] Matthias Pohl commented on FLINK-33186: --- https://github.com/apache/flink/actions/runs/8320416262/job/22765302151#step:10:7608
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829488#comment-17829488 ] Matthias Pohl commented on FLINK-33186: --- https://github.com/apache/flink/actions/runs/8304570591/job/22730524813#step:10:7494
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828331#comment-17828331 ] Ryan Skraba commented on FLINK-33186: - * [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58399=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=8063]
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820600#comment-17820600 ] Matthias Pohl commented on FLINK-33186: --- https://github.com/apache/flink/actions/runs/8027473900/job/21931656512#step:10:7665
{code}
Error: 02:29:11 02:29:11.608 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 7.619 s <<< FAILURE! -- in org.apache.flink.test.checkpointing.CheckpointAfterAllTasksFinishedITCase
Error: 02:29:11 02:29:11.609 [ERROR] org.apache.flink.test.checkpointing.CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished -- Time elapsed: 1.601 s <<< ERROR!
Feb 24 02:29:11 java.util.concurrent.ExecutionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task local checkpoint failure.
Feb 24 02:29:11 	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
Feb 24 02:29:11 	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
Feb 24 02:29:11 	at org.apache.flink.test.checkpointing.CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished(CheckpointAfterAllTasksFinishedITCase.java:124)
Feb 24 02:29:11 	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
Feb 24 02:29:11 	at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
Feb 24 02:29:11 	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
Feb 24 02:29:11 	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
Feb 24 02:29:11 	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
Feb 24 02:29:11 	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
Feb 24 02:29:11 	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Feb 24 02:29:11 Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task local checkpoint failure.
Feb 24 02:29:11 	at org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:554)
Feb 24 02:29:11 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2260)
Feb 24 02:29:11 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2247)
Feb 24 02:29:11 	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$triggerCheckpointRequest$9(CheckpointCoordinator.java:817)
Feb 24 02:29:11 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
Feb 24 02:29:11 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Feb 24 02:29:11 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
Feb 24 02:29:11 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
Feb 24 02:29:11 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
Feb 24 02:29:11 	at java.base/java.lang.Thread.run(Thread.java:833)
{code}
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816534#comment-17816534 ] Matthias Pohl commented on FLINK-33186: --- https://github.com/apache/flink/actions/runs/7866453155/job/21460933108#step:10:7710
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810806#comment-17810806 ] Matthias Pohl commented on FLINK-33186: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56823=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba=8426
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810247#comment-17810247 ] Matthias Pohl commented on FLINK-33186: --- https://github.com/XComp/flink/actions/runs/7632434711/job/20792993223#step:10:8585
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782059#comment-17782059 ] Matthias Pohl commented on FLINK-33186: --- This issue popped up twice during the move to GitHub Actions (FLINK-27075):
* https://github.com/XComp/flink/actions/runs/6724579458/job/18277659778#step:12:8453
* https://github.com/XComp/flink/actions/runs/6723279744/job/18273667605#step:12:8453
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773565#comment-17773565 ] Jiang Xin commented on FLINK-33186: --- [~Sergey Nuyanzin] I don't think the failure is related to FLINK-28386 or FLINK-32996. The test code triggered a savepoint after some subtasks had completed, but the logs show that the subtask `passA -> Sink: sinkA (2/4)` switched to FINISHED right after the savepoint was initiated. The TaskExecutor then treated the checkpoint request as one for an unknown task, so the checkpoint failed. So I think this is an existing concurrency issue.
```
01:23:40,823 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - passA -> Sink: sinkA (1/4) (3146192e86ef62554451af0e39df80b5_51397532e2d9c7a21097a30d590b3114_0_0) switched from RUNNING to FINISHED.
01:23:40,823 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - passA -> Sink: sinkA (3/4) (3146192e86ef62554451af0e39df80b5_51397532e2d9c7a21097a30d590b3114_2_0) switched from RUNNING to FINISHED.
01:23:40,826 [flink-pekko.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Triggering cancel-with-savepoint for job c82d241f9952e043dfed65318f0d962a.
01:23:40,828 [passA -> Sink: sinkA (2/4)#0] INFO org.apache.flink.runtime.taskmanager.Task [] - passA -> Sink: sinkA (2/4)#0 (3146192e86ef62554451af0e39df80b5_51397532e2d9c7a21097a30d590b3114_1_0) switched from RUNNING to FINISHED.
01:23:40,828 [passA -> Sink: sinkA (2/4)#0] INFO org.apache.flink.runtime.taskmanager.Task [] - Freeing task resources for passA -> Sink: sinkA (2/4)#0 (3146192e86ef62554451af0e39df80b5_51397532e2d9c7a21097a30d590b3114_1_0).
01:23:40,828 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Un-registering task and sending final execution state FINISHED to JobManager for task passA -> Sink: sinkA (2/4)#0 3146192e86ef62554451af0e39df80b5_51397532e2d9c7a21097a30d590b3114_1_0.
01:23:40,829 [flink-pekko.actor.default-dispatcher-4] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - passA -> Sink: sinkA (2/4) (3146192e86ef62554451af0e39df80b5_51397532e2d9c7a21097a30d590b3114_1_0) switched from RUNNING to FINISHED.
01:23:40,829 [ Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 2 (type=SavepointType{name='Savepoint', postCheckpointAction=NONE, formatType=CANONICAL}) @ 1695864220828 for job c82d241f9952e043dfed65318f0d962a.
01:23:40,917 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering Checkpoint 2 for job c82d241f9952e043dfed65318f0d962a failed due to org.apache.flink.runtime.checkpoint.CheckpointException: TaskManager received a checkpoint request for unknown task 3146192e86ef62554451af0e39df80b5_51397532e2d9c7a21097a30d590b3114_1_0. Failure reason: Task local checkpoint failure.
01:23:40,918 [ Checkpoint Timer] WARN org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to trigger or complete checkpoint 2 for job c82d241f9952e043dfed65318f0d962a. (0 consecutive failed attempts so far)
org.apache.flink.runtime.checkpoint.CheckpointException: TaskManager received a checkpoint request for unknown task 3146192e86ef62554451af0e39df80b5_51397532e2d9c7a21097a30d590b3114_1_0. Failure reason: Task local checkpoint failure.
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.triggerCheckpoint(TaskExecutor.java:1046) ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
    at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_292]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
    at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:309) ~[?:?]
    at org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83) ~[flink-rpc-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
    at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:307) ~[?:?]
    at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:222) ~[?:?]
    at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:168) ~[?:?]
    at org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) ~[?:?]
    at org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) ~[?:?]
    at
```
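The interleaving described in this comment can be replayed in a small, deterministic sketch. This is only an illustration under stated assumptions, not Flink code: `TaskRegistrySketch`, `register`, `unregister`, `triggerCheckpoint`, and `replayInterleaving` are hypothetical names, the execution id is made up, and the real TaskExecutor throws a `CheckpointException` where this sketch returns a string.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a task executor's table of running tasks (not a Flink class). */
class TaskRegistrySketch {
    private final Map<String, String> runningTasks = new ConcurrentHashMap<>();

    void register(String executionId, String taskName) {
        runningTasks.put(executionId, taskName);
    }

    /** Models a task reaching FINISHED: its entry is dropped from the table. */
    void unregister(String executionId) {
        runningTasks.remove(executionId);
    }

    /** Models a checkpoint trigger arriving for the given execution id. */
    String triggerCheckpoint(String executionId, long checkpointId) {
        String taskName = runningTasks.get(executionId);
        if (taskName == null) {
            // Assumption: the real code path fails the checkpoint here instead.
            return "FAILED: checkpoint " + checkpointId
                    + " requested for unknown task " + executionId;
        }
        return "TRIGGERED: checkpoint " + checkpointId + " on " + taskName;
    }

    /** Replays the three steps in the order suggested by the log timestamps. */
    static String replayInterleaving() {
        TaskRegistrySketch executor = new TaskRegistrySketch();
        String executionId = "sinkA-subtask-2"; // made-up id for illustration
        executor.register(executionId, "passA -> Sink: sinkA (2/4)");

        // Step 1: the coordinator side picks its target while the task is still running.
        String target = executionId;

        // Step 2: the task finishes and is un-registered before the trigger arrives.
        executor.unregister(executionId);

        // Step 3: the trigger arrives for a task the executor no longer knows.
        return executor.triggerCheckpoint(target, 2L);
    }

    public static void main(String[] args) {
        System.out.println(replayInterleaving());
    }
}
```

Swapping steps 2 and 3 makes the trigger succeed, which is consistent with the failure being timing-dependent rather than deterministic.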
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17772246#comment-17772246 ] Jiang Xin commented on FLINK-33186: --- I'll take a look at it.
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17772212#comment-17772212 ] Dong Lin commented on FLINK-33186: -- cc [~xtsong]
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17772078#comment-17772078 ] Matthias Pohl commented on FLINK-33186: --- Isn't that a blocker as long as the cause hasn't been determined? Just asking because the {{CheckpointAfterAllTasksFinishedITCase}}-related issues FLINK-32996 and FLINK-32907 have a fix version 1.18.0.
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771942#comment-17771942 ] Sergey Nuyanzin commented on FLINK-33186: - [~Jiang Xin], [~lindong] since it is very similar to FLINK-32996 could you please have a look?