[jira] [Commented] (IGNITE-8751) Possible race on node segmentation.

2018-06-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512697#comment-16512697
 ] 

ASF GitHub Bot commented on IGNITE-8751:


GitHub user agura closed the pull request at:

https://github.com/apache/ignite/pull/4171


> Possible race on node segmentation.
> ---
>
> Key: IGNITE-8751
> URL: https://issues.apache.org/jira/browse/IGNITE-8751
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.5
> Reporter: Andrew Mashenkov
> Assignee: Andrey Gura
> Priority: Major
> Fix For: 2.6
>
>
> Segmentation policy may be ignored, probably due to a race.
> See [1] for details.
> [1] [http://apache-ignite-users.70518.x6.nabble.com/Node-pause-for-no-obvious-reason-td21923.html]
> Logs from the segmented node:
> [08:42:42,290][INFO][tcp-disco-sock-reader-#15][TcpDiscoverySpi] Finished 
> serving remote node connection [rmtAddr=/10.29.42.45:38712, rmtPort=38712 
> [08:42:42,290][WARNING][disco-event-worker-#161][GridDiscoveryManager] Local 
> node SEGMENTED: TcpDiscoveryNode [id=8333aa56-8bf4-4558-a387-809b1d2e2e5b, 
> addrs=[10.29.42.44, 127.0.0.1], sockAddrs=[sap-datanode1/10.29.42.44:49500, 
> /127.0.0.1:49500], discPort=49500, order=1, intOrder=1, 
> lastExchangeTime=1528447362286, loc=true, ver=2.5.0#20180523-sha1:86e110c7, 
> isClient=false] 
> [08:42:42,294][SEVERE][tcp-disco-srvr-#2][] Critical system error detected. 
> Will be handled accordingly to configured handler [hnd=class 
> o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext 
> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread 
> tcp-disco-srvr-#2 is terminated unexpectedly.]] 
> java.lang.IllegalStateException: Thread tcp-disco-srvr-#2 is terminated unexpectedly.
>         at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
>         at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> [08:42:42,294][SEVERE][tcp-disco-srvr-#2][] JVM will be halted immediately 
> due to the failure: [failureCtx=FailureContext 
> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread 
> tcp-disco-srvr-#2 is terminated unexpectedly.]] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-8751) Possible race on node segmentation.

2018-06-13 Thread Alexey Goncharuk (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511245#comment-16511245
 ] 

Alexey Goncharuk commented on IGNITE-8751:
--

Changes look good to me.



[jira] [Commented] (IGNITE-8751) Possible race on node segmentation.

2018-06-13 Thread Andrey Gura (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510928#comment-16510928
 ] 

Andrey Gura commented on IGNITE-8751:
-

TC looks good: 
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8_IgniteTests24Java8=pull%2F4171%2Fhead
Please review.



[jira] [Commented] (IGNITE-8751) Possible race on node segmentation.

2018-06-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507004#comment-16507004
 ] 

ASF GitHub Bot commented on IGNITE-8751:


GitHub user agura opened a pull request:

https://github.com/apache/ignite/pull/4171

IGNITE-8751 Failure handler accordingly to segmentation policy should be 
invoked on node segmentation instead of configured failure handler

… be invoked on node segmentation instead of configured failure handler

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agura/incubator-ignite ignite-8751

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4171.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4171


commit 56f02086cdabe05af5001fb228406935ca000994
Author: Andrey Gura 
Date:   2018-06-09T13:37:49Z

IGNITE-8751 Failure handler accordingly to segmentation policy should be 
invoked on node segmentation instead of configured failure handler






[jira] [Commented] (IGNITE-8751) Possible race on node segmentation.

2018-06-08 Thread Andrey Gura (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506083#comment-16506083
 ] 

Andrey Gura commented on IGNITE-8751:
-

It isn't a race. {{tcp-disco-srvr}} is interrupted before the segmentation 
policy handles segmentation. See 
{{org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.DiscoveryWorker#onSegmentation}},
where we first disconnect the SPI and then handle segmentation.

It seems this could be fixed by adding a check on the SPI state in the 
exception handler of {{tcp-disco-srvr}}.
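
The ordering described in this comment can be illustrated with a small, self-contained toy model. It is only a sketch of the idea, not the actual patch: every class, field, and method name below is hypothetical, and none of them is an Ignite API. The model assumes the server thread is stopped by interruption and that the SPI exposes some "disconnect in progress" state that the exception handler can consult.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model only: every name here is hypothetical, none is an Ignite API.
class SegmentationSketch {
    // Stand-in for the "SPI is being disconnected" state.
    private final AtomicBoolean spiDisconnecting = new AtomicBoolean();

    // Set when the modelled failure handler halts the node.
    private final AtomicBoolean halted = new AtomicBoolean();

    // Ordered record of what happened.
    private final List<String> events = new ArrayList<>();

    // Whether the server thread's exception handler consults the SPI state.
    private final boolean checkSpiState;

    SegmentationSketch(boolean checkSpiState) {
        this.checkSpiState = checkSpiState;
    }

    private synchronized void record(String evt) {
        events.add(evt);
    }

    // Models the server thread: blocks until interrupted, then reacts.
    private void serverBody() {
        try {
            Thread.sleep(Long.MAX_VALUE); // Stands in for a blocking accept loop.
        }
        catch (InterruptedException e) {
            // The suggested check: an interrupt during SPI disconnect is expected.
            if (checkSpiState && spiDisconnecting.get()) {
                record("server thread interrupted during SPI disconnect, failure handler skipped");
                return;
            }

            halted.set(true);
            record("configured failure handler invoked, node halted");
        }
    }

    // Models the described order: disconnect the SPI first, handle segmentation second.
    List<String> onSegmentation() {
        Thread srv = new Thread(this::serverBody, "tcp-disco-srvr-sketch");
        srv.start();

        spiDisconnecting.set(true);
        srv.interrupt();

        try {
            srv.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // A halted node never reaches its segmentation policy.
        if (!halted.get())
            record("segmentation policy applied");

        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(new SegmentationSketch(false).onSegmentation());
        System.out.println(new SegmentationSketch(true).onSegmentation());
    }
}
```

Without the check, the model records only the failure handler firing, which matches the log in the issue description where the JVM is halted before the segmentation policy runs. With the check, the interrupt is treated as an expected part of the disconnect and the segmentation policy is reached.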
