[jira] [Commented] (FLINK-9900) Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles
[ https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128038#comment-17128038 ]

Till Rohrmann commented on FLINK-9900:
--------------------------------------

The latest test failure has been caused by FLINK-16866 because the job submission took longer than 10 seconds. I would be ok with closing this ticket for the moment. If the problem should re-occur, then I would suggest to increase the rpc timeouts for the {{MiniCluster}} as a hotfix (similar to what we did in FLINK-16018).

> Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination, Tests
> Affects Versions: 1.5.1, 1.6.0, 1.9.0
> Reporter: zhangminglei
> Assignee: Chesnay Schepler
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.10.2, 1.9.4
>
> Attachments: mvn-2.log
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec <<< FAILURE! - in org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) Time elapsed: 120.036 sec <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 120000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>   at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>   at org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error:
>   ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244 » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
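
The failure mode discussed in this thread (a job submission that takes longer than a fixed wait, and the suggestion to raise the timeout as a hotfix) can be illustrated with a small, Flink-free Java sketch. Everything in it is an assumption made for illustration: `SubmissionTimeoutSketch`, `submitJob` and `awaitSubmission` are hypothetical names, the delays are scaled down to milliseconds, and it does not use or reproduce the actual {{MiniCluster}} rpc configuration.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SubmissionTimeoutSketch {

    /** Hypothetical stand-in for a job submission that is acknowledged after 'delay'. */
    public static CompletableFuture<String> submitJob(Duration delay) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delay.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "job-accepted";
        });
    }

    /**
     * Waits at most 'timeout' for the acknowledgement instead of blocking
     * indefinitely in get(). Returns true if the submission finished in time.
     */
    public static boolean awaitSubmission(CompletableFuture<String> submission, Duration timeout) {
        try {
            submission.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the wait was too short for this submission
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("submission failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Scaled-down stand-ins: a 300 ms "slow submission" against a short and a long timeout.
        Duration slowSubmission = Duration.ofMillis(300);
        boolean shortWait = awaitSubmission(submitJob(slowSubmission), Duration.ofMillis(50));
        boolean longWait = awaitSubmission(submitJob(slowSubmission), Duration.ofSeconds(5));
        System.out.println("short timeout succeeded: " + shortWait);
        System.out.println("long timeout succeeded: " + longWait);
    }
}
```

In this sketch the same slow submission fails the short wait and passes the long one, which is the effect a larger timeout would be expected to have on a slow but otherwise healthy submission. It says nothing about why the submission was slow in the first place.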
[jira] [Commented] (FLINK-9900) Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles
[ https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123719#comment-17123719 ]

Chesnay Schepler commented on FLINK-9900:
-----------------------------------------

I would ignore the recent failure for the time being since it occurred in the legacy scheduler. I could not reproduce the failure locally after ~1000 runs.

> Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination, Tests
> Affects Versions: 1.5.1, 1.6.0, 1.9.0
> Reporter: zhangminglei
> Assignee: Chesnay Schepler
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.9.1, 1.10.0
>
> Attachments: mvn-2.log
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec <<< FAILURE! - in org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) Time elapsed: 120.036 sec <<< ERROR!
[jira] [Commented] (FLINK-9900) Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles
[ https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101335#comment-17101335 ]

Biao Liu commented on FLINK-9900:
---------------------------------

Thanks [~rmetzger] for reporting. This time, the test failed to submit the job to the cluster. The cluster didn't start the job within 10 seconds, so a timeout occurred. It's hard to say at which step it got stuck. The last log line of {{JobMaster}} is "Configuring application-defined state backend with job/cluster config". I have attached the relevant log (mvn-2.log). [~trohrmann] do you have any idea?

> Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination, Tests
> Affects Versions: 1.5.1, 1.6.0, 1.9.0
> Reporter: zhangminglei
> Assignee: Biao Liu
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.9.1, 1.10.0
>
> Attachments: mvn-2.log
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec <<< FAILURE! - in org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) Time elapsed: 120.036 sec <<< ERROR!
[jira] [Commented] (FLINK-9900) Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles
[ https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100963#comment-17100963 ]

Till Rohrmann commented on FLINK-9900:
--------------------------------------

The current fixes seem to not fully fix the problem [~SleePy]. Do you have time to take another look at the problem Robert reported?

> Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination, Tests
> Affects Versions: 1.5.1, 1.6.0, 1.9.0
> Reporter: zhangminglei
> Assignee: Biao Liu
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.9.1, 1.10.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec <<< FAILURE! - in org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) Time elapsed: 120.036 sec <<< ERROR!
[jira] [Commented] (FLINK-9900) Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles
[ https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099721#comment-17099721 ]

Robert Metzger commented on FLINK-9900:
---------------------------------------

Another instance: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=593=logs=16ccbdb7-2a3e-53da-36eb-fb718edc424a=cf61ce33-6fba-5fbe-2c0c-e41c4013e891

> Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination, Tests
> Affects Versions: 1.5.1, 1.6.0, 1.9.0
> Reporter: zhangminglei
> Assignee: Biao Liu
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.9.1, 1.10.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec <<< FAILURE! - in org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) Time elapsed: 120.036 sec <<< ERROR!