[ https://issues.apache.org/jira/browse/FLINK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908534#comment-16908534 ]
Nico Kruber edited comment on FLINK-13020 at 8/15/19 11:00 PM:
---------------------------------------------------------------

Actually, I just encountered this error in a branch of mine which is based on [latest master|https://github.com/apache/flink/commit/428ce1b938813fba287a51bf86e6c52ef54453cb]. So either there has been a regression, or the fix does not work in all cases, or it is not a duplicate after all:
{code}
17:30:18.083 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.113 s <<< FAILURE! - in org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest
17:30:18.083 [ERROR] testMigrationAndRestore[Migrate Savepoint: 1.8](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest) Time elapsed: 0.268 s <<< ERROR!
java.util.concurrent.ExecutionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
{code}
https://api.travis-ci.com/v3/job/225588484/log.txt
{code}
17:30:17,408 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - Configuring application-defined state backend with job/cluster config
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (2/4) (ffb5e756d6acddab9cab76e2a0a32904) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map (4/4) (79fcf333d4d11eae297b65e52e397658) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map (2/4) (aedaa4a61e74a3b766fafbef46e6aea6) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (4/4) (a1f07e2714e73b2533291a322961ea67) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (3/4) (6073be38d7be0ee571558f1dc865837a) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map (1/4) (e4bc84d8137769b513d1a5107027500d) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map (3/4) (6834950d9742da9c6a784ecc5ee892df) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint.
17:30:17,413 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint.
17:30:17,414 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint.
17:30:17,416 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint.
17:30:17,417 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint.
17:30:17,423 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source (1/4) (8b302fefb0c10b7fd0b66f4fdb253632) switched from DEPLOYING to RUNNING.
17:30:17,423 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - Using application-defined state backend: MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: UNDEFINED, maxStateSize: 5242880)
17:30:17,423 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - Configuring application-defined state backend with job/cluster config
17:30:17,424 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/4) (8b302fefb0c10b7fd0b66f4fdb253632) switched from DEPLOYING to RUNNING.
17:30:17,425 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 54 @ 1565890217425 for job 075cea7da1d0690f96c879ae07b058c0.
17:30:17,442 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 54 by task 6834950d9742da9c6a784ecc5ee892df of job 075cea7da1d0690f96c879ae07b058c0 at b57a17ba-32e1-42ad-991d-abf402ea07fa @ localhost (dataPort=-1).
17:30:17,442 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Discarding checkpoint 54 of job 075cea7da1d0690f96c879ae07b058c0.
org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
	at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyAbortOnCancellationBarrier(BarrierBuffer.java:428)
	at org.apache.flink.streaming.runtime.io.BarrierBuffer.processCancellationBarrier(BarrierBuffer.java:327)
	at org.apache.flink.streaming.runtime.io.BarrierBuffer.pollNext(BarrierBuffer.java:208)
	at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:102)
	at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:47)
	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:134)
	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.performDefaultAction(OneInputStreamTask.java:102)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:268)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:376)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:690)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:520)
	at java.lang.Thread.run(Thread.java:748)
17:30:17,444 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 54 from task 8b302fefb0c10b7fd0b66f4fdb253632 of job 075cea7da1d0690f96c879ae07b058c0 at b57a17ba-32e1-42ad-991d-abf402ea07fa @ localhost (dataPort=-1).
17:30:17,445 ERROR org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest - 
{code}

> UT Failure: ChainLengthDecreaseTest
> -----------------------------------
>
> Key: FLINK-13020
> URL: https://issues.apache.org/jira/browse/FLINK-13020
> Project: Flink
> Issue Type: Improvement
> Reporter: Bowen Li
> Priority: Major
>
> {code:java}
> 05:47:24.893 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 19.836 s <<< FAILURE! - in org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest
> 05:47:24.895 [ERROR] testMigrationAndRestore[Migrate Savepoint: 1.3](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest) Time elapsed: 1.501 s <<< ERROR!
> java.util.concurrent.ExecutionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
> Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
> ...
> 05:48:27.736 [ERROR] Errors: 
> 05:48:27.736 [ERROR] ChainLengthDecreaseTest>AbstractOperatorRestoreTestBase.testMigrationAndRestore:102->AbstractOperatorRestoreTestBase.migrateJob:138 » Execution
> 05:48:27.736 [INFO] 
> {code}
> https://travis-ci.org/apache/flink/jobs/551053821

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
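For context on the error message itself, here is a deliberately simplified sketch of the behaviour the stack trace points at ({{BarrierBuffer.processCancellationBarrier}} calling {{notifyAbortOnCancellationBarrier}}). It is *not* Flink's actual {{BarrierBuffer}}; the class {{BarrierAlignmentSketch}} and all of its methods are hypothetical names made up for illustration. The assumed model: a task aligning checkpoint barriers from several inputs aborts the pending checkpoint as soon as any single input delivers a cancellation marker instead of a barrier, and declines it with "Task received cancellation from one of its inputs". In the log above, that decline is reported for checkpoint 54 by task 6834950d9742da9c6a784ecc5ee892df, i.e. Map (3/4), and checkpoint 54 was triggered right after Source: Custom Source (1/4) switched to RUNNING; whether that timing is what produced the cancellation is an assumption, not something the log proves.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of barrier alignment with cancellation markers.
// NOT Flink's BarrierBuffer; it only mirrors the behaviour suggested by the stack trace.
final class BarrierAlignmentSketch {

    // Outcome the task would report for the current checkpoint (hypothetical).
    enum Outcome { PENDING, COMPLETED, DECLINED }

    private final int numInputChannels;
    private final Set<Integer> channelsWithBarrier = new HashSet<>();
    private long currentCheckpointId = -1;
    private Outcome outcome = Outcome.PENDING;
    private String declineReason;

    BarrierAlignmentSketch(int numInputChannels) {
        this.numInputChannels = numInputChannels;
    }

    // A regular checkpoint barrier arrived on one input channel.
    void onBarrier(int channel, long checkpointId) {
        if (checkpointId > currentCheckpointId) {
            // First barrier of a newer checkpoint: start a fresh alignment.
            startAlignment(checkpointId);
        } else if (checkpointId < currentCheckpointId || outcome != Outcome.PENDING) {
            // Stale barrier, or the checkpoint was already completed/declined.
            return;
        }
        channelsWithBarrier.add(channel);
        if (channelsWithBarrier.size() == numInputChannels) {
            // Barriers from all inputs received: the snapshot could be taken now.
            outcome = Outcome.COMPLETED;
        }
    }

    // An upstream task cancelled the checkpoint instead of forwarding a barrier.
    // The channel does not matter: one cancelled input aborts the whole checkpoint.
    void onCancellationMarker(int channel, long checkpointId) {
        if (checkpointId < currentCheckpointId) {
            return; // marker for an older checkpoint, ignore it
        }
        if (checkpointId > currentCheckpointId) {
            startAlignment(checkpointId);
        }
        if (outcome == Outcome.PENDING) {
            outcome = Outcome.DECLINED;
            declineReason = "Task received cancellation from one of its inputs";
        }
    }

    private void startAlignment(long checkpointId) {
        currentCheckpointId = checkpointId;
        channelsWithBarrier.clear();
        outcome = Outcome.PENDING;
        declineReason = null;
    }

    Outcome outcome() { return outcome; }
    String declineReason() { return declineReason; }
    long currentCheckpointId() { return currentCheckpointId; }

    public static void main(String[] args) {
        // A task with 4 inputs: one input aligns, another cancels checkpoint 54.
        BarrierAlignmentSketch map = new BarrierAlignmentSketch(4);
        map.onBarrier(0, 54);
        map.onCancellationMarker(1, 54);
        System.out.println(map.outcome() + ": " + map.declineReason());
    }
}
{code}

Running {{main}} prints {{DECLINED: Task received cancellation from one of its inputs}}: the task does not wait for barriers from the remaining inputs once one of them has cancelled, which is why a single misbehaving input is enough to fail the savepoint that the test is waiting for.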