[jira] [Comment Edited] (IGNITE-15300) Test testSnapshotRestoreCancelAndStatus flaky in ZooKeeper SPI environment

2021-09-08 Thread Pavel Pereslegin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411795#comment-17411795
 ] 

Pavel Pereslegin edited comment on IGNITE-15300 at 9/8/21, 8:54 AM:


The test hangs when the restore process is initiated from node 1, whose 
communication is later blocked (and cannot be unblocked).
The intermittent failure is caused by a state synchronization issue: we cancel 
the process on two nodes but wait for completion only on the initiator (this 
has been fixed in IGNITE-14794).

It looks like the patch proposed in IGNITE-14794 fixes this completely. I checked 
it on TeamCity (the problem is hard to reproduce locally); the [suite was started 80+ 
times|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper=buildTypeHistoryList_IgniteTests24Java8=pull%2F9186%2Fhead]:
* Execution timeouts (not related to this issue) - 2 times.
* testBaselineCollectCrd - 6 failures.
* testBaselineCollect - 1 failure.
* testSnapshotRestoreCancelAndStatus - *0* failures.
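
The state synchronization issue described above (cancel on two nodes, wait only 
on the initiator) can be illustrated with a minimal sketch. This is not the 
IGNITE-14794 patch and uses no Ignite API: the Node class, its stopped future 
and the awaitAll helper are hypothetical names, and plain CompletableFuture 
stands in for the restore process on each node.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class CancelWaitSketch {
    /** Hypothetical stand-in for a node taking part in the restore; not an Ignite class. */
    static final class Node {
        final String name;

        /** Completes once the process has fully stopped on this node. */
        final CompletableFuture<Void> stopped = new CompletableFuture<>();

        Node(String name) {
            this.name = name;
        }

        /** Requests cancellation; the node finishes stopping asynchronously after delayMs. */
        void cancel(long delayMs) {
            CompletableFuture.runAsync(
                () -> stopped.complete(null),
                CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS));
        }
    }

    /** Waits until every node has stopped, not just the initiator. */
    static void awaitAll(List<Node> nodes, long timeoutMs) throws Exception {
        CompletableFuture<?>[] futs = nodes.stream()
            .map(n -> n.stopped)
            .toArray(CompletableFuture[]::new);

        CompletableFuture.allOf(futs).get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        Node initiator = new Node("node0");
        Node other = new Node("node1");

        initiator.cancel(10);
        other.cancel(200);

        // Waiting only on the initiator: the other node may still be stopping,
        // so an assertion on cluster-wide state made here can fail intermittently.
        initiator.stopped.get(5, TimeUnit.SECONDS);
        System.out.println("other stopped after initiator wait: " + other.stopped.isDone());

        // Waiting on all nodes removes the race.
        awaitAll(List.of(initiator, other), 5000);
        System.out.println("other stopped after awaitAll: " + other.stopped.isDone());
    }
}
```

The first check depends on timing and will usually, but not always, report 
that the second node is still running, which is the kind of race that shows up 
as a flaky test.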


was (Author: xtern):
The test hangs when the restore process is initiated from node 1, whose 
communication is later blocked (and cannot be unblocked).
The intermittent failure is caused by a state synchronization issue: we cancel 
the process on two nodes but wait for completion only on the initiator (this 
has been fixed in IGNITE-14794).

It looks like the patch proposed in IGNITE-14794 fixes this completely.

I checked it on TeamCity (the problem is hard to reproduce locally); the [suite 
was started 80+ 
times|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper=buildTypeHistoryList_IgniteTests24Java8=pull%2F9186%2Fhead].

Execution timeouts (not related to this issue) - 2 times.
testBaselineCollectCrd - 6 failures.
testBaselineCollect - 1 failure.
testSnapshotRestoreCancelAndStatus - *0* failures.

> Test testSnapshotRestoreCancelAndStatus flaky in ZooKeeper SPI environment
> -
>
> Key: IGNITE-15300
> URL: https://issues.apache.org/jira/browse/IGNITE-15300
> Project: Ignite
> Issue Type: Test
> Reporter: Maxim Muzafarov
> Assignee: Pavel Pereslegin
> Priority: Major
> Labels: iep-43
> Time Spent: 10m
> Remaining Estimate: 0h
>
> https://ci.ignite.apache.org/viewLog.html?buildId=6123288=buildResultsDiv=IgniteTests24Java8_ControlUtilityZookeeper#testNameId-4389213602152674112
> {code}
> [2021-08-09 22:59:49,757][ERROR][main][root] Test failed [test=GridCommandHandlerTest#testSnapshotRestoreCancelAndStatus, duration=16514]
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at org.apache.ignite.testframework.GridTestUtils.assertContains(GridTestUtils.java:391)
>   at org.apache.ignite.util.GridCommandHandlerTest.testSnapshotRestoreCancelAndStatus(GridCommandHandlerTest.java:3312)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.apache.ignite.testframework.junits.GridAbstractTest$7.run(GridAbstractTest.java:2432)
> {code}
> Sometimes the ZooKeeper suite hangs ([execution 
> timeout|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper=buildTypeHistoryList_IgniteTests24Java8=%3Cdefault%3E=failed])
> on this test with the following stack trace.
> {noformat}
> "rest-#15365%gridCommandHandlerTest0%" #16591 prio=5 os_prio=0 tid=0x7f7e7842b800 nid=0x1a79 waiting on condition [0x7f7e30416000]
>    java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>   at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>   at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>   at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:152)
>   

[jira] [Comment Edited] (IGNITE-15300) Test testSnapshotRestoreCancelAndStatus flaky in ZooKeeper SPI environment

2021-09-08 Thread Pavel Pereslegin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411795#comment-17411795
 ] 

Pavel Pereslegin edited comment on IGNITE-15300 at 9/8/21, 8:50 AM:


The test hangs when the restore process is initiated from node 1, whose 
communication is later blocked (and cannot be unblocked).
The intermittent failure is caused by a state synchronization issue: we cancel 
the process on two nodes but wait for completion only on the initiator (this 
has been fixed in IGNITE-14794).

It looks like the patch proposed in IGNITE-14794 fixes this completely.

I checked it on TeamCity (the problem is hard to reproduce locally); the [suite 
was started 80+ 
times|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper=buildTypeHistoryList_IgniteTests24Java8=pull%2F9186%2Fhead].

Execution timeouts (not related to this issue) - 2 times.
testBaselineCollectCrd - 6 failures.
testBaselineCollect - 1 failure.
testSnapshotRestoreCancelAndStatus - *0* failures.


was (Author: xtern):
The test hangs when the restore process is initiated from node 1, whose 
communication is later blocked (and cannot be unblocked).
The intermittent failure is caused by a state synchronization issue: we cancel 
the process on two nodes but wait for completion only on the initiator (this 
has been fixed in IGNITE-14794).

It looks like the patch proposed in IGNITE-14794 fixes this completely.

I checked it on TeamCity (the problem is hard to reproduce locally); the suite 
was started 80+ times.

Execution timeouts (not related to this issue) - 2 times.
testBaselineCollectCrd - 6 failures.
testBaselineCollect - 1 failure.
testSnapshotRestoreCancelAndStatus - *0* failures.

> Test testSnapshotRestoreCancelAndStatus flaky in ZooKeeper SPI environment
> -
>
> Key: IGNITE-15300
> URL: https://issues.apache.org/jira/browse/IGNITE-15300
> Project: Ignite
> Issue Type: Test
> Reporter: Maxim Muzafarov
> Assignee: Pavel Pereslegin
> Priority: Major
> Labels: iep-43
> Time Spent: 10m
> Remaining Estimate: 0h
>
> https://ci.ignite.apache.org/viewLog.html?buildId=6123288=buildResultsDiv=IgniteTests24Java8_ControlUtilityZookeeper#testNameId-4389213602152674112
> {code}
> [2021-08-09 22:59:49,757][ERROR][main][root] Test failed [test=GridCommandHandlerTest#testSnapshotRestoreCancelAndStatus, duration=16514]
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at org.apache.ignite.testframework.GridTestUtils.assertContains(GridTestUtils.java:391)
>   at org.apache.ignite.util.GridCommandHandlerTest.testSnapshotRestoreCancelAndStatus(GridCommandHandlerTest.java:3312)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.apache.ignite.testframework.junits.GridAbstractTest$7.run(GridAbstractTest.java:2432)
> {code}
> Sometimes the ZooKeeper suite hangs ([execution 
> timeout|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper=buildTypeHistoryList_IgniteTests24Java8=%3Cdefault%3E=failed])
> on this test with the following stack trace.
> {noformat}
> "rest-#15365%gridCommandHandlerTest0%" #16591 prio=5 os_prio=0 tid=0x7f7e7842b800 nid=0x1a79 waiting on condition [0x7f7e30416000]
>    java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>   at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>   at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>   at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:152)
>   at org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreCancelTask$1.execute(SnapshotRestoreCancelTask.java:43)
>   at 
>