[ 
https://issues.apache.org/jira/browse/IGNITE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372547#comment-16372547
 ] 

Vyacheslav Daradur edited comment on IGNITE-5910 at 2/22/18 9:50 AM:
---------------------------------------------------------------------

I have investigated the issue and found that stopping a node in a separate JVM
may leave a thread stuck or leave a system process alive after the test has
finished.


 The main reason is the {{StopGridTask}} that we send from the node in the local
JVM to the node in the separate JVM via remote computing. We send the job
synchronously to be sure that the node is stopped, but the job synchronously
calls {{G.stop(igniteInstanceName, cancel)}} with {{cancel = false}}, which
means the node must wait for its compute jobs to finish before it goes down,
including the job that is performing the stop. This leads to a kind of
deadlock. Using {{cancel = true}} would solve the issue but may break the logic
of some tests; for this reason, I've reworked the method's synchronization logic.
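
To illustrate the self-wait described above, here is a minimal sketch in plain
Java. It does not use Ignite: a single-thread {{ExecutorService}} stands in for
the node's job pool, a bounded {{awaitTermination}} stands in for a graceful
stop with {{cancel = false}}, and all class and method names are hypothetical.
The second variant is only one possible way to break the cycle and is not
necessarily what the PR does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model (not Ignite code) of a node whose graceful stop waits for running jobs. */
public class StopFromJobSketch {
    /** Hypothetical stand-in for the node's compute job pool. */
    private final ExecutorService jobs = Executors.newSingleThreadExecutor();

    /** Graceful stop: waits for running jobs. The timeout only keeps the demo from hanging. */
    boolean stopGracefully(long timeoutMs) throws InterruptedException {
        jobs.shutdown();
        return jobs.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** The job stops the node synchronously, so the stop waits for the job itself. */
    static boolean stopInsideJob() throws Exception {
        StopFromJobSketch node = new StopFromJobSketch();
        Callable<Boolean> job = () -> node.stopGracefully(200);
        // The job cannot finish until the stop returns, and the stop cannot
        // complete until the job finishes, so the wait runs into the timeout.
        return node.jobs.submit(job).get();
    }

    /** The job only triggers the stop on another thread and returns immediately. */
    static boolean stopOutsideJob() throws Exception {
        StopFromJobSketch node = new StopFromJobSketch();
        CountDownLatch stopped = new CountDownLatch(1);
        Callable<Boolean> job = () -> {
            Thread stopper = new Thread(() -> {
                try {
                    if (node.stopGracefully(2000))
                        stopped.countDown();
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            stopper.start();
            return true; // The job is done; the stop proceeds without it.
        };
        node.jobs.submit(job).get();
        // The caller waits for the stop itself, not for the job that requested it.
        return stopped.await(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stop inside job completed:  " + stopInsideJob());
        System.out.println("stop outside job completed: " + stopOutsideJob());
    }
}
```

In the first variant the graceful stop can only time out, because the job it
waits for is the one calling it. In the second, the job returns first, so the
stop has nothing left to wait for.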


 We have not noticed this before because our tests use only {{stopAllGrids()}},
which stops the local JVM without waiting for nodes in other JVMs.


 I believe this fix should reduce the number of flaky tests on TeamCity,
especially those that fail because a cluster from the previous test has not
been stopped properly.

[Ci.tests|https://ci.ignite.apache.org/viewLog.html?buildId=1105939] look a bit 
better than in master.

[~dpavlov], could you please review [the prepared 
PR|https://github.com/apache/ignite/pull/2382]?



> Method stopGrid(name) doesn't work in multiJvm mode
> ---------------------------------------------------
>
>                 Key: IGNITE-5910
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5910
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: Vyacheslav Daradur
>            Assignee: Vyacheslav Daradur
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain, tests
>             Fix For: 2.5
>
>
> {code:title=Exception at call}
> java.lang.ClassCastException: 
> org.apache.ignite.testframework.junits.multijvm.IgniteProcessProxy cannot be 
> cast to org.apache.ignite.internal.IgniteKernal
> {code}
> {code:title=Reproducer snippet}
>     /** {@inheritDoc} */
>     @Override protected boolean isMultiJvm() {
>         return true;
>     }
>     /**
>      * @throws Exception If failed.
>      */
>     public void testGrid() throws Exception {
>         try {
>             startGrid(0);
>             startGrid(1);
>         }
>         finally {
>             stopGrid(1);
>             stopGrid(0);
>         }
>     }
> {code}
> *UPD:* It is necessary to fix a possible hang of a system thread in the
> separate JVM on Ignite node shutdown.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
