[ 
https://issues.apache.org/jira/browse/FLINK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908534#comment-16908534
 ] 

Nico Kruber edited comment on FLINK-13020 at 8/15/19 11:00 PM:
---------------------------------------------------------------

Actually, I just encountered this error in a branch of mine which is based on 
[latest 
master|https://github.com/apache/flink/commit/428ce1b938813fba287a51bf86e6c52ef54453cb].
 So either there has been a regression, or the fix does not work in all cases, 
or it is not a duplicate after all:
{code}
17:30:18.083 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 14.113 s <<< FAILURE! - in 
org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest
17:30:18.083 [ERROR] testMigrationAndRestore[Migrate Savepoint: 
1.8](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest)
  Time elapsed: 0.268 s  <<< ERROR!
java.util.concurrent.ExecutionException: 
java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
cancellation from one of its inputs
Caused by: java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
cancellation from one of its inputs
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
received cancellation from one of its inputs
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
received cancellation from one of its inputs
{code}

https://api.travis-ci.com/v3/job/225588484/log.txt

{code}
17:30:17,408 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask          
 - Configuring application-defined state backend with job/cluster config
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       
 - Source: Custom Source (2/4) (ffb5e756d6acddab9cab76e2a0a32904) switched from 
DEPLOYING to RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       
 - Map (4/4) (79fcf333d4d11eae297b65e52e397658) switched from DEPLOYING to 
RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       
 - Map (2/4) (aedaa4a61e74a3b766fafbef46e6aea6) switched from DEPLOYING to 
RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       
 - Source: Custom Source (4/4) (a1f07e2714e73b2533291a322961ea67) switched from 
DEPLOYING to RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       
 - Source: Custom Source (3/4) (6073be38d7be0ee571558f1dc865837a) switched from 
DEPLOYING to RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       
 - Map (1/4) (e4bc84d8137769b513d1a5107027500d) switched from DEPLOYING to 
RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       
 - Map (3/4) (6834950d9742da9c6a784ecc5ee892df) switched from DEPLOYING to 
RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    
 - Checkpoint triggering task Source: Custom Source (1/4) of job 
075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. 
Aborting checkpoint.
17:30:17,413 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    
 - Checkpoint triggering task Source: Custom Source (1/4) of job 
075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. 
Aborting checkpoint.
17:30:17,414 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    
 - Checkpoint triggering task Source: Custom Source (1/4) of job 
075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. 
Aborting checkpoint.
17:30:17,416 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    
 - Checkpoint triggering task Source: Custom Source (1/4) of job 
075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. 
Aborting checkpoint.
17:30:17,417 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    
 - Checkpoint triggering task Source: Custom Source (1/4) of job 
075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. 
Aborting checkpoint.
17:30:17,423 INFO  org.apache.flink.runtime.taskmanager.Task                    
 - Source: Custom Source (1/4) (8b302fefb0c10b7fd0b66f4fdb253632) switched from 
DEPLOYING to RUNNING.
17:30:17,423 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask          
 - Using application-defined state backend: MemoryStateBackend (data in heap 
memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', 
asynchronous: UNDEFINED, maxStateSize: 5242880)
17:30:17,423 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask          
 - Configuring application-defined state backend with job/cluster config
17:30:17,424 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       
 - Source: Custom Source (1/4) (8b302fefb0c10b7fd0b66f4fdb253632) switched from 
DEPLOYING to RUNNING.
17:30:17,425 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    
 - Triggering checkpoint 54 @ 1565890217425 for job 
075cea7da1d0690f96c879ae07b058c0.
17:30:17,442 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    
 - Decline checkpoint 54 by task 6834950d9742da9c6a784ecc5ee892df of job 
075cea7da1d0690f96c879ae07b058c0 at b57a17ba-32e1-42ad-991d-abf402ea07fa @ 
localhost (dataPort=-1).
17:30:17,442 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    
 - Discarding checkpoint 54 of job 075cea7da1d0690f96c879ae07b058c0.
org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
cancellation from one of its inputs
        at 
org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyAbortOnCancellationBarrier(BarrierBuffer.java:428)
        at 
org.apache.flink.streaming.runtime.io.BarrierBuffer.processCancellationBarrier(BarrierBuffer.java:327)
        at 
org.apache.flink.streaming.runtime.io.BarrierBuffer.pollNext(BarrierBuffer.java:208)
        at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:102)
        at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:47)
        at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:134)
        at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.performDefaultAction(OneInputStreamTask.java:102)
        at 
org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:268)
        at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:376)
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:690)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:520)
        at java.lang.Thread.run(Thread.java:748)
17:30:17,444 WARN  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    
 - Received late message for now expired checkpoint attempt 54 from task 
8b302fefb0c10b7fd0b66f4fdb253632 of job 075cea7da1d0690f96c879ae07b058c0 at 
b57a17ba-32e1-42ad-991d-abf402ea07fa @ localhost (dataPort=-1).
17:30:17,445 ERROR 
org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest  - 
{code}



> UT Failure: ChainLengthDecreaseTest
> -----------------------------------
>
>                 Key: FLINK-13020
>                 URL: https://issues.apache.org/jira/browse/FLINK-13020
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Bowen Li
>            Priority: Major
>
> {code:java}
> 05:47:24.893 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time 
> elapsed: 19.836 s <<< FAILURE! - in 
> org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest
> 05:47:24.895 [ERROR] testMigrationAndRestore[Migrate Savepoint: 
> 1.3](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest)
>   Time elapsed: 1.501 s  <<< ERROR!
> java.util.concurrent.ExecutionException: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
> cancellation from one of its inputs
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
> cancellation from one of its inputs
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
> received cancellation from one of its inputs
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
> received cancellation from one of its inputs
> ...
> 05:48:27.736 [ERROR] Errors: 
> 05:48:27.736 [ERROR]   
> ChainLengthDecreaseTest>AbstractOperatorRestoreTestBase.testMigrationAndRestore:102->AbstractOperatorRestoreTestBase.migrateJob:138
>  » Execution
> 05:48:27.736 [INFO] 
> {code}
> https://travis-ci.org/apache/flink/jobs/551053821



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
