[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-08-16 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909013#comment-16909013
 ] 

Till Rohrmann commented on FLINK-9900:
--

Great to hear [~SleePy]. I'll take a look at your PR.

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Biao Liu
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.10.0, 1.9.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-08-16 Thread Biao Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908843#comment-16908843
 ] 

Biao Liu commented on FLINK-9900:
-

Hi [~till.rohrmann], there is another race condition of this case. However the 
good news is that it's just about the case, not a bug.
I have started a PR to fix it.

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Biao Liu
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.10.0, 1.9.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-08-15 Thread Biao Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907873#comment-16907873
 ] 

Biao Liu commented on FLINK-9900:
-

Hi [~till.rohrmann], thanks for feedback.

It seems to be a different scenario of this case. The stack is different with 
the prior one.

I'll check it later. Let's keep the "reopened" status for now.

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Biao Liu
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.10.0, 1.9.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-08-15 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907864#comment-16907864
 ] 

Till Rohrmann commented on FLINK-9900:
--

You are right [~SleePy]. However I found another instance from the latest 
master cron job where it failed: 
https://api.travis-ci.org/v3/job/571776566/log.txt

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Biao Liu
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.10.0, 1.9.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-08-14 Thread Biao Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907343#comment-16907343
 ] 

Biao Liu commented on FLINK-9900:
-

Hi [~till.rohrmann], I have just checked the failed building. This PR is a bit 
old. It does not include the fixing on this ticket. Could you rebase master 
branch and try it again?

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Biao Liu
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.10.0, 1.9.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-08-14 Thread Biao Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907287#comment-16907287
 ] 

Biao Liu commented on FLINK-9900:
-

[~till.rohrmann] I will check it later.

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Biao Liu
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.10.0, 1.9.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-08-01 Thread Biao Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897844#comment-16897844
 ] 

Biao Liu commented on FLINK-9900:
-

Another instance, https://travis-ci.com/flink-ci/flink/jobs/221451189.

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Biao Liu
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-07-30 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896376#comment-16896376
 ] 

Till Rohrmann commented on FLINK-9900:
--

Another instance: https://api.travis-ci.com/v3/job/220853113/log.txt

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Biao Liu
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-07-30 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896182#comment-16896182
 ] 

Till Rohrmann commented on FLINK-9900:
--

I think this test instability actually points towards an inconsistency which we 
introduced with FLINK-12364. It seems that it is now possible that we complete 
checkpoints after the {{CheckpointFailureManager}} has decided that the job 
should fail. I think this is a somewhat unexpected behavior.

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Biao Liu
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-07-30 Thread Biao Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895873#comment-16895873
 ] 

Biao Liu commented on FLINK-9900:
-

Hi Gordon, I think it's already under "In progress". I am not in office 
currently. I updated the state through my phone. I'll make sure later when I'm 
back to office.

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Biao Liu
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-07-30 Thread Tzu-Li (Gordon) Tai (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895827#comment-16895827
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-9900:


[~azagrebin] could you move this ticket to "In Progress"? That way we will be 
able to see that it is actively worked on in the 1.9 burndown chart.

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Biao Liu
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-07-29 Thread Biao Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895793#comment-16895793
 ] 

Biao Liu commented on FLINK-9900:
-

Hi [~till.rohrmann], I have built a PR to fix this. The unstable scenario is a 
bit complicated. I described the details in PR, 
https://github.com/apache/flink/pull/9269.


> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-07-29 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895155#comment-16895155
 ] 

Till Rohrmann commented on FLINK-9900:
--

Made a blocker issue to understand what the cause is.

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.9.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-07-29 Thread Stephan Ewen (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895150#comment-16895150
 ] 

Stephan Ewen commented on FLINK-9900:
-

Another instance: https://api.travis-ci.org/v3/job/564847877/log.txt

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.9.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-07-28 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16894711#comment-16894711
 ] 

Congxian Qiu(klion26) commented on FLINK-9900:
--

Another instance https://travis-ci.com/flink-ci/flink/jobs/220237256

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.9.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-07-26 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893912#comment-16893912
 ] 

Till Rohrmann commented on FLINK-9900:
--

Another instance: https://api.travis-ci.org/v3/job/563889124/log.txt

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.9.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-07-22 Thread Andrey Zagrebin (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889976#comment-16889976
 ] 

Andrey Zagrebin commented on FLINK-9900:


Looks like one more: [https://api.travis-ci.com/v3/job/217584395/log.txt]

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.9.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-07-08 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880415#comment-16880415
 ] 

Till Rohrmann commented on FLINK-9900:
--

Another instance: https://api.travis-ci.org/v3/job/554991848/log.txt

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.7.3
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-07-08 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880107#comment-16880107
 ] 

Till Rohrmann commented on FLINK-9900:
--

Another instance: https://api.travis-ci.org/v3/job/555252049/log.txt

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.7.3
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-06-21 Thread Yun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869224#comment-16869224
 ] 

Yun Tang commented on FLINK-9900:
-

Another instance which exceed 300 seconds 
[https://api.travis-ci.org/v3/job/548326670/log.txt]

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0, 1.9.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.7.3
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-06-17 Thread Dawid Wysakowicz (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865394#comment-16865394
 ] 

Dawid Wysakowicz commented on FLINK-9900:
-

I saw this test failing also with 1.9 master: 
https://api.travis-ci.org/v3/job/546443720/log.txt

The cause might be different though, as I got a different error.

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.7.3
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-02-21 Thread Stefan Richter (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774231#comment-16774231
 ] 

Stefan Richter commented on FLINK-9900:
---

Saw another instance here: https://api.travis-ci.org/v3/job/495933065/log.txt

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.7.3, 1.8.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-02-13 Thread Chesnay Schepler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766966#comment-16766966
 ] 

Chesnay Schepler commented on FLINK-9900:
-

[~sunjincheng121] No, because these only added more debugging details and did 
not fix the underlying issue.

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.4, 1.7.3, 1.8.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2019-02-13 Thread sunjincheng (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766913#comment-16766913
 ] 

sunjincheng commented on FLINK-9900:


Can we close the JIRA due to the PRs has been merged. [~Zentol] [~mingleizhang] 
?

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.4, 1.7.3, 1.8.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-08-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586104#comment-16586104
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

zentol closed pull request #6468: [FLINK-9900][tests] Include more information 
on timeout in Zookeeper HA ITCase
URL: https://github.com/apache/flink/pull/6468
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
 
b/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
index b83f89e4ca5..642af40aaf2 100644
--- 
a/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
+++ 
b/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
@@ -50,6 +50,7 @@
 import org.apache.flink.streaming.api.functions.source.SourceFunction;
 import org.apache.flink.test.util.MiniClusterResource;
 import org.apache.flink.test.util.MiniClusterResourceConfiguration;
+import org.apache.flink.util.ExceptionUtils;
 import org.apache.flink.util.Preconditions;
 import org.apache.flink.util.TestLogger;
 
@@ -62,6 +63,11 @@
 
 import java.io.File;
 import java.io.IOException;
+import java.io.PrintWriter;
+import java.io.StringWriter;
+import java.lang.management.ManagementFactory;
+import java.lang.management.ThreadInfo;
+import java.lang.management.ThreadMXBean;
 import java.nio.file.FileVisitResult;
 import java.nio.file.Files;
 import java.nio.file.Path;
@@ -163,7 +169,7 @@ public static void tearDown() throws Exception {
 *   restored successfully
 * 
 */
-   @Test(timeout = 120_000L)
+   @Test
public void testRestoreBehaviourWithFaultyStateHandles() throws 
Exception {

CheckpointBlockingFunction.allowedInitializeCallsWithoutRestore.set(1);
CheckpointBlockingFunction.successfulRestores.set(0);
@@ -256,13 +262,54 @@ public FileVisitResult visitFile(Path file, 
BasicFileAttributes attrs) throws IO
() -> clusterClient.getJobStatus(jobID),
Time.milliseconds(50),
deadline,
-   (jobStatus) -> jobStatus == JobStatus.FINISHED,
+   JobStatus::isGloballyTerminalState,
TestingUtils.defaultScheduledExecutor());
-   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   try {
+   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   } catch (Throwable e) {
+   // include additional debugging information
+   StringWriter error = new StringWriter();
+   try (PrintWriter out = new PrintWriter(error)) {
+   out.println("The job did not finish in time.");
+   
out.println("allowedInitializeCallsWithoutRestore= " + 
CheckpointBlockingFunction.allowedInitializeCallsWithoutRestore.get());
+   out.println("illegalRestores= " + 
CheckpointBlockingFunction.illegalRestores.get());
+   out.println("successfulRestores= " + 
CheckpointBlockingFunction.successfulRestores.get());
+   out.println("afterMessWithZooKeeper= " + 
CheckpointBlockingFunction.afterMessWithZooKeeper.get());
+   out.println("failedAlready= " + 
CheckpointBlockingFunction.failedAlready.get());
+   out.println("currentJobStatus= " + 
clusterClient.getJobStatus(jobID).get());
+   out.println("numRestarts= " + 
RestartReporter.numRestarts.getValue());
+   out.println("threadDump= " + 
generateThreadDump());
+   }
+   throw new AssertionError(error.toString(), 
ExceptionUtils.stripCompletionException(e));
+   }
 
assertThat("We saw illegal restores.", 
CheckpointBlockingFunction.illegalRestores.get(), is(0));
}
 
+   private static String generateThreadDump() {
+   final StringBuilder dump = new StringBuilder();
+   final ThreadMXBean threadMXBean = 
ManagementFactory.getThreadMXBean();
+   final ThreadInfo[] threadInfos = 
threadMXBean.getThreadInfo(threadMXBean.getAllThreadIds(), 100);
+   for (ThreadInfo threadInfo : threadInfos) {
+   dump.append('"');
+   dump.append(threadInfo.getThreadName());
+   dump.append('"');
+ 

[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579495#comment-16579495
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

StefanRRichter commented on a change in pull request #6468: [FLINK-9900][tests] 
Include more information on timeout in Zookeeper HA ITCase
URL: https://github.com/apache/flink/pull/6468#discussion_r209881971
 
 

 ##
 File path: 
flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
 ##
 @@ -256,13 +262,54 @@ public FileVisitResult visitFile(Path file, 
BasicFileAttributes attrs) throws IO
() -> clusterClient.getJobStatus(jobID),
Time.milliseconds(50),
deadline,
-   (jobStatus) -> jobStatus == JobStatus.FINISHED,
+   JobStatus::isGloballyTerminalState,
TestingUtils.defaultScheduledExecutor());
-   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   try {
+   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   } catch (Throwable e) {
+   // include additional debugging information
+   StringWriter error = new StringWriter();
+   try (PrintWriter out = new PrintWriter(error)) {
+   out.println("The job did not finish in time.");
+   
out.println("allowedInitializeCallsWithoutRestore= " + 
CheckpointBlockingFunction.allowedInitializeCallsWithoutRestore.get());
+   out.println("illegalRestores= " + 
CheckpointBlockingFunction.illegalRestores.get());
+   out.println("successfulRestores= " + 
CheckpointBlockingFunction.successfulRestores.get());
+   out.println("afterMessWithZooKeeper= " + 
CheckpointBlockingFunction.afterMessWithZooKeeper.get());
+   out.println("failedAlready= " + 
CheckpointBlockingFunction.failedAlready.get());
+   out.println("currentJobStatus= " + 
clusterClient.getJobStatus(jobID).get());
+   out.println("numRestarts= " + 
RestartReporter.numRestarts.getValue());
+   out.println("threadDump= " + 
generateThreadDump());
+   }
+   throw new AssertionError(error.toString(), 
ExceptionUtils.stripCompletionException(e));
+   }
 
assertThat("We saw illegal restores.", 
CheckpointBlockingFunction.illegalRestores.get(), is(0));
}
 
+   private static String generateThreadDump() {
+   final StringBuilder dump = new StringBuilder();
+   final ThreadMXBean threadMXBean = 
ManagementFactory.getThreadMXBean();
+   final ThreadInfo[] threadInfos = 
threadMXBean.getThreadInfo(threadMXBean.getAllThreadIds(), 100);
+   for (ThreadInfo threadInfo : threadInfos) {
+   dump.append('"');
+   dump.append(threadInfo.getThreadName());
+   dump.append('"');
+   final Thread.State state = threadInfo.getThreadState();
+   dump.append(System.lineSeparator());
+   dump.append("   java.lang.Thread.State: ");
+   dump.append(state);
+   final StackTraceElement[] stackTraceElements = 
threadInfo.getStackTrace();
+   for (final StackTraceElement stackTraceElement : 
stackTraceElements) {
 
 Review comment:
   Fine with me, especially when we consider that the code is probably not 
permanently there.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.3, 1.6.1, 1.7.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 

[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579487#comment-16579487
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

zentol commented on a change in pull request #6468: [FLINK-9900][tests] Include 
more information on timeout in Zookeeper HA ITCase
URL: https://github.com/apache/flink/pull/6468#discussion_r209879523
 
 

 ##
 File path: 
flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
 ##
 @@ -256,13 +262,54 @@ public FileVisitResult visitFile(Path file, 
BasicFileAttributes attrs) throws IO
() -> clusterClient.getJobStatus(jobID),
Time.milliseconds(50),
deadline,
-   (jobStatus) -> jobStatus == JobStatus.FINISHED,
+   JobStatus::isGloballyTerminalState,
TestingUtils.defaultScheduledExecutor());
-   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   try {
+   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   } catch (Throwable e) {
+   // include additional debugging information
+   StringWriter error = new StringWriter();
+   try (PrintWriter out = new PrintWriter(error)) {
+   out.println("The job did not finish in time.");
+   
out.println("allowedInitializeCallsWithoutRestore= " + 
CheckpointBlockingFunction.allowedInitializeCallsWithoutRestore.get());
+   out.println("illegalRestores= " + 
CheckpointBlockingFunction.illegalRestores.get());
+   out.println("successfulRestores= " + 
CheckpointBlockingFunction.successfulRestores.get());
+   out.println("afterMessWithZooKeeper= " + 
CheckpointBlockingFunction.afterMessWithZooKeeper.get());
+   out.println("failedAlready= " + 
CheckpointBlockingFunction.failedAlready.get());
+   out.println("currentJobStatus= " + 
clusterClient.getJobStatus(jobID).get());
+   out.println("numRestarts= " + 
RestartReporter.numRestarts.getValue());
+   out.println("threadDump= " + 
generateThreadDump());
+   }
+   throw new AssertionError(error.toString(), 
ExceptionUtils.stripCompletionException(e));
+   }
 
assertThat("We saw illegal restores.", 
CheckpointBlockingFunction.illegalRestores.get(), is(0));
}
 
+   private static String generateThreadDump() {
+   final StringBuilder dump = new StringBuilder();
+   final ThreadMXBean threadMXBean = 
ManagementFactory.getThreadMXBean();
+   final ThreadInfo[] threadInfos = 
threadMXBean.getThreadInfo(threadMXBean.getAllThreadIds(), 100);
+   for (ThreadInfo threadInfo : threadInfos) {
+   dump.append('"');
+   dump.append(threadInfo.getThreadName());
+   dump.append('"');
+   final Thread.State state = threadInfo.getThreadState();
+   dump.append(System.lineSeparator());
+   dump.append("   java.lang.Thread.State: ");
+   dump.append(state);
+   final StackTraceElement[] stackTraceElements = 
threadInfo.getStackTrace();
+   for (final StackTraceElement stackTraceElement : 
stackTraceElements) {
 
 Review comment:
   The output is nicer if we go with the original option:
   without Throwable (i.e. current PR revision):
   ```
   "Monitor Ctrl-Break"
  java.lang.Thread.State: RUNNABLE
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
   at java.net.SocketInputStream.read(SocketInputStream.java:170)
   at java.net.SocketInputStream.read(SocketInputStream.java:141)
   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
   at java.io.InputStreamReader.read(InputStreamReader.java:184)
   at java.io.BufferedReader.fill(BufferedReader.java:161)
   at java.io.BufferedReader.readLine(BufferedReader.java:324)
   at java.io.BufferedReader.readLine(BufferedReader.java:389)
   at 
com.intellij.rt.execution.application.AppMainV2$1.run(AppMainV2.java:64)
   ```
   with throwable:
   ```
   "Monitor Ctrl-Break"
  java.lang.Thread.State: RUNNABLE
   java.lang.Throwable
at java.net.SocketInputStream.socketRead0(Native 

[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579434#comment-16579434
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

zentol commented on a change in pull request #6468: [FLINK-9900][tests] Include 
more information on timeout in Zookeeper HA ITCase
URL: https://github.com/apache/flink/pull/6468#discussion_r209867134
 
 

 ##
 File path: 
flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
 ##
 @@ -256,13 +262,54 @@ public FileVisitResult visitFile(Path file, 
BasicFileAttributes attrs) throws IO
() -> clusterClient.getJobStatus(jobID),
Time.milliseconds(50),
deadline,
-   (jobStatus) -> jobStatus == JobStatus.FINISHED,
+   JobStatus::isGloballyTerminalState,
TestingUtils.defaultScheduledExecutor());
-   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   try {
+   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   } catch (Throwable e) {
+   // include additional debugging information
+   StringWriter error = new StringWriter();
+   try (PrintWriter out = new PrintWriter(error)) {
+   out.println("The job did not finish in time.");
+   
out.println("allowedInitializeCallsWithoutRestore= " + 
CheckpointBlockingFunction.allowedInitializeCallsWithoutRestore.get());
+   out.println("illegalRestores= " + 
CheckpointBlockingFunction.illegalRestores.get());
+   out.println("successfulRestores= " + 
CheckpointBlockingFunction.successfulRestores.get());
+   out.println("afterMessWithZooKeeper= " + 
CheckpointBlockingFunction.afterMessWithZooKeeper.get());
+   out.println("failedAlready= " + 
CheckpointBlockingFunction.failedAlready.get());
+   out.println("currentJobStatus= " + 
clusterClient.getJobStatus(jobID).get());
+   out.println("numRestarts= " + 
RestartReporter.numRestarts.getValue());
+   out.println("threadDump= " + 
generateThreadDump());
+   }
+   throw new AssertionError(error.toString(), 
ExceptionUtils.stripCompletionException(e));
+   }
 
assertThat("We saw illegal restores.", 
CheckpointBlockingFunction.illegalRestores.get(), is(0));
}
 
+   private static String generateThreadDump() {
+   final StringBuilder dump = new StringBuilder();
+   final ThreadMXBean threadMXBean = 
ManagementFactory.getThreadMXBean();
+   final ThreadInfo[] threadInfos = 
threadMXBean.getThreadInfo(threadMXBean.getAllThreadIds(), 100);
+   for (ThreadInfo threadInfo : threadInfos) {
+   dump.append('"');
+   dump.append(threadInfo.getThreadName());
+   dump.append('"');
+   final Thread.State state = threadInfo.getThreadState();
+   dump.append(System.lineSeparator());
+   dump.append("   java.lang.Thread.State: ");
+   dump.append(state);
+   final StackTraceElement[] stackTraceElements = 
threadInfo.getStackTrace();
+   for (final StackTraceElement stackTraceElement : 
stackTraceElements) {
 
 Review comment:
   ah ok, that makes sense.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.3, 1.6.1, 1.7.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> 

[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579432#comment-16579432
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

StefanRRichter commented on a change in pull request #6468: [FLINK-9900][tests] 
Include more information on timeout in Zookeeper HA ITCase
URL: https://github.com/apache/flink/pull/6468#discussion_r209866827
 
 

 ##
 File path: 
flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
 ##
 @@ -256,13 +262,54 @@ public FileVisitResult visitFile(Path file, 
BasicFileAttributes attrs) throws IO
() -> clusterClient.getJobStatus(jobID),
Time.milliseconds(50),
deadline,
-   (jobStatus) -> jobStatus == JobStatus.FINISHED,
+   JobStatus::isGloballyTerminalState,
TestingUtils.defaultScheduledExecutor());
-   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   try {
+   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   } catch (Throwable e) {
+   // include additional debugging information
+   StringWriter error = new StringWriter();
+   try (PrintWriter out = new PrintWriter(error)) {
+   out.println("The job did not finish in time.");
+   
out.println("allowedInitializeCallsWithoutRestore= " + 
CheckpointBlockingFunction.allowedInitializeCallsWithoutRestore.get());
+   out.println("illegalRestores= " + 
CheckpointBlockingFunction.illegalRestores.get());
+   out.println("successfulRestores= " + 
CheckpointBlockingFunction.successfulRestores.get());
+   out.println("afterMessWithZooKeeper= " + 
CheckpointBlockingFunction.afterMessWithZooKeeper.get());
+   out.println("failedAlready= " + 
CheckpointBlockingFunction.failedAlready.get());
+   out.println("currentJobStatus= " + 
clusterClient.getJobStatus(jobID).get());
+   out.println("numRestarts= " + 
RestartReporter.numRestarts.getValue());
+   out.println("threadDump= " + 
generateThreadDump());
+   }
+   throw new AssertionError(error.toString(), 
ExceptionUtils.stripCompletionException(e));
+   }
 
assertThat("We saw illegal restores.", 
CheckpointBlockingFunction.illegalRestores.get(), is(0));
}
 
+   private static String generateThreadDump() {
+   final StringBuilder dump = new StringBuilder();
+   final ThreadMXBean threadMXBean = 
ManagementFactory.getThreadMXBean();
+   final ThreadInfo[] threadInfos = 
threadMXBean.getThreadInfo(threadMXBean.getAllThreadIds(), 100);
+   for (ThreadInfo threadInfo : threadInfos) {
+   dump.append('"');
+   dump.append(threadInfo.getThreadName());
+   dump.append('"');
+   final Thread.State state = threadInfo.getThreadState();
+   dump.append(System.lineSeparator());
+   dump.append("   java.lang.Thread.State: ");
+   dump.append(state);
+   final StackTraceElement[] stackTraceElements = 
threadInfo.getStackTrace();
+   for (final StackTraceElement stackTraceElement : 
stackTraceElements) {
 
 Review comment:
   Sure, but you could use it in the loop that iterates all thread info objects 
to create the trace string per thread, which you can set via the 
`Throwable.setStackTrace` before printing?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.3, 1.6.1, 1.7.0
>
>
> 

[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579425#comment-16579425
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

zentol commented on a change in pull request #6468: [FLINK-9900][tests] Include 
more information on timeout in Zookeeper HA ITCase
URL: https://github.com/apache/flink/pull/6468#discussion_r209865380
 
 

 ##
 File path: 
flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
 ##
 @@ -256,13 +262,54 @@ public FileVisitResult visitFile(Path file, 
BasicFileAttributes attrs) throws IO
() -> clusterClient.getJobStatus(jobID),
Time.milliseconds(50),
deadline,
-   (jobStatus) -> jobStatus == JobStatus.FINISHED,
+   JobStatus::isGloballyTerminalState,
TestingUtils.defaultScheduledExecutor());
-   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   try {
+   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   } catch (Throwable e) {
+   // include additional debugging information
+   StringWriter error = new StringWriter();
+   try (PrintWriter out = new PrintWriter(error)) {
+   out.println("The job did not finish in time.");
+   
out.println("allowedInitializeCallsWithoutRestore= " + 
CheckpointBlockingFunction.allowedInitializeCallsWithoutRestore.get());
+   out.println("illegalRestores= " + 
CheckpointBlockingFunction.illegalRestores.get());
+   out.println("successfulRestores= " + 
CheckpointBlockingFunction.successfulRestores.get());
+   out.println("afterMessWithZooKeeper= " + 
CheckpointBlockingFunction.afterMessWithZooKeeper.get());
+   out.println("failedAlready= " + 
CheckpointBlockingFunction.failedAlready.get());
+   out.println("currentJobStatus= " + 
clusterClient.getJobStatus(jobID).get());
+   out.println("numRestarts= " + 
RestartReporter.numRestarts.getValue());
+   out.println("threadDump= " + 
generateThreadDump());
+   }
+   throw new AssertionError(error.toString(), 
ExceptionUtils.stripCompletionException(e));
+   }
 
assertThat("We saw illegal restores.", 
CheckpointBlockingFunction.illegalRestores.get(), is(0));
}
 
+   private static String generateThreadDump() {
+   final StringBuilder dump = new StringBuilder();
+   final ThreadMXBean threadMXBean = 
ManagementFactory.getThreadMXBean();
+   final ThreadInfo[] threadInfos = 
threadMXBean.getThreadInfo(threadMXBean.getAllThreadIds(), 100);
+   for (ThreadInfo threadInfo : threadInfos) {
+   dump.append('"');
+   dump.append(threadInfo.getThreadName());
+   dump.append('"');
+   final Thread.State state = threadInfo.getThreadState();
+   dump.append(System.lineSeparator());
+   dump.append("   java.lang.Thread.State: ");
+   dump.append(state);
+   final StackTraceElement[] stackTraceElements = 
threadInfo.getStackTrace();
+   for (final StackTraceElement stackTraceElement : 
stackTraceElements) {
 
 Review comment:
   but doesn't that just print the exception stack trace, whereas this method 
prints a full thread dump?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.3, 1.6.1, 1.7.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 

[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-08-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576188#comment-16576188
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

StefanRRichter commented on a change in pull request #6468: [FLINK-9900][tests] 
Include more information on timeout in Zookeeper HA ITCase
URL: https://github.com/apache/flink/pull/6468#discussion_r209236626
 
 

 ##
 File path: 
flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
 ##
 @@ -256,13 +262,54 @@ public FileVisitResult visitFile(Path file, 
BasicFileAttributes attrs) throws IO
() -> clusterClient.getJobStatus(jobID),
Time.milliseconds(50),
deadline,
-   (jobStatus) -> jobStatus == JobStatus.FINISHED,
+   JobStatus::isGloballyTerminalState,
TestingUtils.defaultScheduledExecutor());
-   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   try {
+   assertEquals(JobStatus.FINISHED, jobStatusFuture.get());
+   } catch (Throwable e) {
+   // include additional debugging information
+   StringWriter error = new StringWriter();
+   try (PrintWriter out = new PrintWriter(error)) {
+   out.println("The job did not finish in time.");
+   
out.println("allowedInitializeCallsWithoutRestore= " + 
CheckpointBlockingFunction.allowedInitializeCallsWithoutRestore.get());
+   out.println("illegalRestores= " + 
CheckpointBlockingFunction.illegalRestores.get());
+   out.println("successfulRestores= " + 
CheckpointBlockingFunction.successfulRestores.get());
+   out.println("afterMessWithZooKeeper= " + 
CheckpointBlockingFunction.afterMessWithZooKeeper.get());
+   out.println("failedAlready= " + 
CheckpointBlockingFunction.failedAlready.get());
+   out.println("currentJobStatus= " + 
clusterClient.getJobStatus(jobID).get());
+   out.println("numRestarts= " + 
RestartReporter.numRestarts.getValue());
+   out.println("threadDump= " + 
generateThreadDump());
+   }
+   throw new AssertionError(error.toString(), 
ExceptionUtils.stripCompletionException(e));
+   }
 
assertThat("We saw illegal restores.", 
CheckpointBlockingFunction.illegalRestores.get(), is(0));
}
 
+   private static String generateThreadDump() {
+   final StringBuilder dump = new StringBuilder();
+   final ThreadMXBean threadMXBean = 
ManagementFactory.getThreadMXBean();
+   final ThreadInfo[] threadInfos = 
threadMXBean.getThreadInfo(threadMXBean.getAllThreadIds(), 100);
+   for (ThreadInfo threadInfo : threadInfos) {
+   dump.append('"');
+   dump.append(threadInfo.getThreadName());
+   dump.append('"');
+   final Thread.State state = threadInfo.getThreadState();
+   dump.append(System.lineSeparator());
+   dump.append("   java.lang.Thread.State: ");
+   dump.append(state);
+   final StackTraceElement[] stackTraceElements = 
threadInfo.getStackTrace();
+   for (final StackTraceElement stackTraceElement : 
stackTraceElements) {
 
 Review comment:
   Could probably use `Throwable.printStackTrace(PrintWriter)` instead.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.3, 1.6.1, 1.7.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 

[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-08-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575789#comment-16575789
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

yanghua commented on issue #6468: [FLINK-9900][tests] Include more information 
on timeout in Zookeeper HA ITCase
URL: https://github.com/apache/flink/pull/6468#issuecomment-411985618
 
 
   hi @zentol , checkstyle error : 
   
   ```
   
src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java:[312]
 (regexp) RegexpSingleline: Trailing whitespace
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.3, 1.6.1, 1.7.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565539#comment-16565539
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

yanghua commented on issue #6468: [FLINK-9900][tests] Include more information 
on timeout in Zookeeper HA ITCase
URL: https://github.com/apache/flink/pull/6468#issuecomment-409626058
 
 
   +1


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.3, 1.6.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565083#comment-16565083
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

zentol opened a new pull request #6468: [FLINK-9900][tests] Include more 
information on timeout in Zookeeper HA ITCase
URL: https://github.com/apache/flink/pull/6468
 
 
   ## What is the purpose of the change
   
   With this PR we include more debugging information in the error message if 
the `ZookeeperHighAvailabilityITCase` times out.
   
   ## Brief change log
   
   * remove `Timeout` annotation to a) deduplicate timeout logics and b) allow 
addition of new logic
   * add utility method creating thread dump
   * on failure to retrieve job information or if the job isn't in `FINISHED` 
state, print more debug info


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.3, 1.6.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561643#comment-16561643
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

zentol closed pull request #6395: [FLINK-9900][tests] Harden 
ZooKeeperHighAvailabilityITCase
URL: https://github.com/apache/flink/pull/6395
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
 
b/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
index e02ed010242..b83f89e4ca5 100644
--- 
a/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
+++ 
b/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
@@ -31,7 +31,12 @@
 import org.apache.flink.configuration.HighAvailabilityOptions;
 import org.apache.flink.configuration.TaskManagerOptions;
 import org.apache.flink.core.testutils.OneShotLatch;
+import org.apache.flink.metrics.Metric;
+import org.apache.flink.metrics.MetricConfig;
+import org.apache.flink.metrics.MetricGroup;
+import org.apache.flink.metrics.reporter.MetricReporter;
 import org.apache.flink.runtime.concurrent.FutureUtils;
+import 
org.apache.flink.runtime.executiongraph.metrics.NumberOfFullRestartsGauge;
 import org.apache.flink.runtime.jobgraph.JobGraph;
 import org.apache.flink.runtime.jobgraph.JobStatus;
 import org.apache.flink.runtime.state.FunctionInitializationContext;
@@ -56,6 +61,12 @@
 import org.junit.rules.TemporaryFolder;
 
 import java.io.File;
+import java.io.IOException;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
 import java.time.Duration;
 import java.util.UUID;
 import java.util.concurrent.CompletableFuture;
@@ -63,6 +74,7 @@
 import java.util.concurrent.atomic.AtomicInteger;
 
 import static org.hamcrest.core.Is.is;
+import static org.hamcrest.number.OrderingComparison.greaterThan;
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertNotNull;
 import static org.junit.Assert.assertThat;
@@ -107,6 +119,10 @@ public static void setup() throws Exception {
config.setString(HighAvailabilityOptions.HA_ZOOKEEPER_QUORUM, 
zkServer.getConnectString());
config.setString(HighAvailabilityOptions.HA_MODE, "zookeeper");
 
+   config.setString(
+   ConfigConstants.METRICS_REPORTER_PREFIX + "restarts." + 
ConfigConstants.METRICS_REPORTER_CLASS_SUFFIX,
+   RestartReporter.class.getName());
+
// we have to manage this manually because we have to create 
the ZooKeeper server
// ahead of this
miniClusterResource = new MiniClusterResource(
@@ -184,58 +200,59 @@ public void testRestoreBehaviourWithFaultyStateHandles() 
throws Exception {
// wait until we did some checkpoints
waitForCheckpointLatch.await();
 
+   log.debug("Messing with HA directory");
// mess with the HA directory so that the job cannot restore
File movedCheckpointLocation = TEMPORARY_FOLDER.newFolder();
-   int numCheckpoints = 0;
-   File[] files = haStorageDir.listFiles();
-   assertNotNull(files);
-   for (File file : files) {
-   if (file.getName().startsWith("completedCheckpoint")) {
-   assertTrue(file.renameTo(new 
File(movedCheckpointLocation, file.getName(;
-   numCheckpoints++;
+   AtomicInteger numCheckpoints = new AtomicInteger();
+   Files.walkFileTree(haStorageDir.toPath(), new 
SimpleFileVisitor() {
+   @Override
+   public FileVisitResult visitFile(Path file, 
BasicFileAttributes attrs) {
+   if 
(file.getFileName().toString().startsWith("completedCheckpoint")) {
+   log.debug("Moving original checkpoint 
file {}.", file);
+   try {
+   Files.move(file, 
movedCheckpointLocation.toPath().resolve(file.getFileName()));
+   
numCheckpoints.incrementAndGet();
+   } catch (IOException ioe) {
+   // previous checkpoint files 
may be deleted asynchronously
+  

[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-07-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559495#comment-16559495
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

StefanRRichter commented on a change in pull request #6395: [FLINK-9900][tests] 
Harden ZooKeeperHighAvailabilityITCase
URL: https://github.com/apache/flink/pull/6395#discussion_r205721646
 
 

 ##
 File path: 
flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
 ##
 @@ -107,6 +119,8 @@ public static void setup() throws Exception {
config.setString(HighAvailabilityOptions.HA_ZOOKEEPER_QUORUM, 
zkServer.getConnectString());
config.setString(HighAvailabilityOptions.HA_MODE, "zookeeper");
 
+   config.setString(ConfigConstants.METRICS_REPORTER_PREFIX + 
"restarts." + ConfigConstants.METRICS_REPORTER_CLASS_SUFFIX, 
RestartReporter.class.getName());
 
 Review comment:
   Break down line in two.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552939#comment-16552939
 ] 

ASF GitHub Bot commented on FLINK-9900:
---

GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/6395

[FLINK-9900][tests] Harden ZooKeeperHighAvailabilityITCase

## What is the purpose of the change

This PR makes a few modifications to the `ZooKeeperHighAvailabilityITCase` 
to reduce the chances for intermittent test failures and timeouts.

Changes:
## 1)
The test was moving files out of the HA storage directory with a simple 
loop using `File#renameTo`. The test enforced that the moving is successful, 
however since old checkpoints may be deleted asynchronously this may not always 
be the case.
We now use a `FileVisitor` and ignore `IOExceptions` that occur while 
moving.
If no checkpoint file could be moved the test will still fail.

## 2)
After the checkpoint files were moved out of the HA storage directory the 
job is thrown into a restart loop. To verify the restart behavior the test was 
polling the job state and checked for the `RESTARTING` and `FAILING` states.
Due to the small size the job is in these states only for a short time, 
effectively adding a race condition. Thus this loop mayrun for longer than 
anticipated; the largest outlier i got locally was 50 seconds which isn't 
_that_ for off from the 2 minute timeout. I suspect this to be the failure 
cause raised in the JIRA, but I can't guarantee it.
Instead we now access the `fullRestarts` metric using a custom reporter to 
check how many restarts have occurred. The actual _state transitions_ should be 
irrelevant to the test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 9900

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6395.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6395


commit b8827dc3723558c52ad567bf88f24ae34129ea08
Author: zentol 
Date:   2018-07-23T14:21:32Z

[FLINK-9900][tests] Harden ZooKeeperHighAvailabilityITCase




> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9900) Failed to testRestoreBehaviourWithFaultyStateHandles (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)

2018-07-22 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552137#comment-16552137
 ] 

Till Rohrmann commented on FLINK-9900:
--

Another instance: https://api.travis-ci.org/v3/job/406882682/log.txt

> Failed to testRestoreBehaviourWithFaultyStateHandles 
> (org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase) 
> ---
>
> Key: FLINK-9900
> URL: https://issues.apache.org/jira/browse/FLINK-9900
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.5.1, 1.6.0
>Reporter: zhangminglei
>Priority: Critical
> Fix For: 1.6.0
>
>
> https://api.travis-ci.org/v3/job/405843617/log.txt
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 124.598 sec 
> <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase
>  
> testRestoreBehaviourWithFaultyStateHandles(org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase)
>  Time elapsed: 120.036 sec <<< ERROR!
>  org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.test.checkpointing.ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles(ZooKeeperHighAvailabilityITCase.java:244)
> Results :
> Tests in error: 
>  
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles:244
>  » TestTimedOut
> Tests run: 1453, Failures: 0, Errors: 1, Skipped: 29



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)