[jira] [Comment Edited] (IGNITE-15300) Test testSnapshotRestoreCancelAndStatus flaky in Zookeepr SPI environment
[ https://issues.apache.org/jira/browse/IGNITE-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411795#comment-17411795 ]

Pavel Pereslegin edited comment on IGNITE-15300 at 9/8/21, 8:54 AM:

The test hangs when the restore process is initiated from node 1, whose communication is later blocked (and cannot be unblocked).

The test fails intermittently due to a state sync issue: we cancel the process on two nodes, but wait for completion only on the initiator (this has been fixed in IGNITE-14794).

It looks like the patch proposed in IGNITE-14794 fixes this completely.
Checked it on TeamCity (the problem is hard to reproduce locally), [suite started 80+ times|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper=buildTypeHistoryList_IgniteTests24Java8=pull%2F9186%2Fhead]:
* Execution timeouts (not related to this issue) - 2 times.
* testBaselineCollectCrd - 6 failures.
* testBaselineCollect - 1 failure.
* testSnapshotRestoreCancelAndStatus - *0* failures.

was (Author: xtern):

The test hangs when the restore process is initiated from node 1, whose communication is later blocked (and cannot be unblocked).

The test fails intermittently due to a state sync issue: we cancel the process on two nodes, but wait for completion only on the initiator (this has been fixed in IGNITE-14794).

It looks like the patch proposed in IGNITE-14794 fixes this completely.
Checked it on TeamCity (the problem is hard to reproduce locally), [suite started 80+ times|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper=buildTypeHistoryList_IgniteTests24Java8=pull%2F9186%2Fhead].
Execution timeouts (not related to this issue) - 2 times.
testBaselineCollectCrd - 6 failures.
testBaselineCollect - 1 failure.
testSnapshotRestoreCancelAndStatus - *0* failures.

> Test testSnapshotRestoreCancelAndStatus flaky in Zookeepr SPI environment
> -------------------------------------------------------------------------
>
>                 Key: IGNITE-15300
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15300
>             Project: Ignite
>          Issue Type: Test
>            Reporter: Maxim Muzafarov
>            Assignee: Pavel Pereslegin
>            Priority: Major
>              Labels: iep-43
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://ci.ignite.apache.org/viewLog.html?buildId=6123288=buildResultsDiv=IgniteTests24Java8_ControlUtilityZookeeper#testNameId-4389213602152674112
> {code}
> [2021-08-09 22:59:49,757][ERROR][main][root] Test failed [test=GridCommandHandlerTest#testSnapshotRestoreCancelAndStatus, duration=16514]
> java.lang.AssertionError
>     at org.junit.Assert.fail(Assert.java:86)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.junit.Assert.assertTrue(Assert.java:52)
>     at org.apache.ignite.testframework.GridTestUtils.assertContains(GridTestUtils.java:391)
>     at org.apache.ignite.util.GridCommandHandlerTest.testSnapshotRestoreCancelAndStatus(GridCommandHandlerTest.java:3312)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.apache.ignite.testframework.junits.GridAbstractTest$7.run(GridAbstractTest.java:2432)
> {code}
> Sometimes zk suite hangs ([execution timeout|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper=buildTypeHistoryList_IgniteTests24Java8=%3Cdefault%3E=failed]) on this test with the following stacktrace.
> {noformat}
> "rest-#15365%gridCommandHandlerTest0%" #16591 prio=5 os_prio=0 tid=0x7f7e7842b800 nid=0x1a79 waiting on condition [0x7f7e30416000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>     at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:152)
>

[jira] [Comment Edited] (IGNITE-15300) Test testSnapshotRestoreCancelAndStatus flaky in Zookeepr SPI environment
[ https://issues.apache.org/jira/browse/IGNITE-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411795#comment-17411795 ]

Pavel Pereslegin edited comment on IGNITE-15300 at 9/8/21, 8:50 AM:

The test hangs when the restore process is initiated from node 1, whose communication is later blocked (and cannot be unblocked).

The test fails intermittently due to a state sync issue: we cancel the process on two nodes, but wait for completion only on the initiator (this has been fixed in IGNITE-14794).

It looks like the patch proposed in IGNITE-14794 fixes this completely.
Checked it on TeamCity (the problem is hard to reproduce locally), [suite started 80+ times|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper=buildTypeHistoryList_IgniteTests24Java8=pull%2F9186%2Fhead].
Execution timeouts (not related to this issue) - 2 times.
testBaselineCollectCrd - 6 failures.
testBaselineCollect - 1 failure.
testSnapshotRestoreCancelAndStatus - *0* failures.

was (Author: xtern):

The test hangs when the restore process is initiated from node 1, whose communication is later blocked (and cannot be unblocked).

The test fails intermittently due to a state sync issue: we cancel the process on two nodes, but wait for completion only on the initiator (this has been fixed in IGNITE-14794).

It looks like the patch proposed in IGNITE-14794 fixes this completely.
Checked it on TeamCity (the problem is hard to reproduce locally), suite started 80+ times.
Execution timeouts (not related to this issue) - 2 times.
testBaselineCollectCrd - 6 failures.
testBaselineCollect - 1 failure.
testSnapshotRestoreCancelAndStatus - *0* failures.

> Test testSnapshotRestoreCancelAndStatus flaky in Zookeepr SPI environment
> -------------------------------------------------------------------------
>
>                 Key: IGNITE-15300
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15300
>             Project: Ignite
>          Issue Type: Test
>            Reporter: Maxim Muzafarov
>            Assignee: Pavel Pereslegin
>            Priority: Major
>              Labels: iep-43
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://ci.ignite.apache.org/viewLog.html?buildId=6123288=buildResultsDiv=IgniteTests24Java8_ControlUtilityZookeeper#testNameId-4389213602152674112
> {code}
> [2021-08-09 22:59:49,757][ERROR][main][root] Test failed [test=GridCommandHandlerTest#testSnapshotRestoreCancelAndStatus, duration=16514]
> java.lang.AssertionError
>     at org.junit.Assert.fail(Assert.java:86)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.junit.Assert.assertTrue(Assert.java:52)
>     at org.apache.ignite.testframework.GridTestUtils.assertContains(GridTestUtils.java:391)
>     at org.apache.ignite.util.GridCommandHandlerTest.testSnapshotRestoreCancelAndStatus(GridCommandHandlerTest.java:3312)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.apache.ignite.testframework.junits.GridAbstractTest$7.run(GridAbstractTest.java:2432)
> {code}
> Sometimes zk suite hangs ([execution timeout|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ControlUtilityZookeeper=buildTypeHistoryList_IgniteTests24Java8=%3Cdefault%3E=failed]) on this test with the following stacktrace.
> {noformat}
> "rest-#15365%gridCommandHandlerTest0%" #16591 prio=5 os_prio=0 tid=0x7f7e7842b800 nid=0x1a79 waiting on condition [0x7f7e30416000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>     at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:152)
>     at org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreCancelTask$1.execute(SnapshotRestoreCancelTask.java:43)
>     at
>