[jira] [Commented] (IGNITE-8751) Possible race on node segmentation.
[ https://issues.apache.org/jira/browse/IGNITE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512697#comment-16512697 ]

ASF GitHub Bot commented on IGNITE-8751:

Github user agura closed the pull request at:

    https://github.com/apache/ignite/pull/4171

> Possible race on node segmentation.
> ---
>
> Key: IGNITE-8751
> URL: https://issues.apache.org/jira/browse/IGNITE-8751
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.5
> Reporter: Andrew Mashenkov
> Assignee: Andrey Gura
> Priority: Major
> Fix For: 2.6
>
>
> Segmentation policy may be ignored, probably, due to a race.
> See [1] for details.
> [1]
> [http://apache-ignite-users.70518.x6.nabble.com/Node-pause-for-no-obvious-reason-td21923.html]
> Logs from segmented node.
> [08:42:42,290][INFO][tcp-disco-sock-reader-#15][TcpDiscoverySpi] Finished
> serving remote node connection [rmtAddr=/10.29.42.45:38712, rmtPort=38712
> [08:42:42,290][WARNING][disco-event-worker-#161][GridDiscoveryManager] Local
> node SEGMENTED: TcpDiscoveryNode [id=8333aa56-8bf4-4558-a387-809b1d2e2e5b,
> addrs=[10.29.42.44, 127.0.0.1], sockAddrs=[sap-datanode1/10.29.42.44:49500,
> /127.0.0.1:49500], discPort=49500, order=1, intOrder=1,
> lastExchangeTime=1528447362286, loc=true, ver=2.5.0#20180523-sha1:86e110c7,
> isClient=false]
> [08:42:42,294][SEVERE][tcp-disco-srvr-#2][] Critical system error detected.
> Will be handled accordingly to configured handler [hnd=class
> o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext
> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread
> tcp-disco-srvr-#2 is terminated unexpectedly.]]
> java.lang.IllegalStateException: Thread tcp-disco-srvr-#2 is terminated
> unexpectedly.
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> [08:42:42,294][SEVERE][tcp-disco-srvr-#2][] JVM will be halted immediately
> due to the failure: [failureCtx=FailureContext
> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread
> tcp-disco-srvr-#2 is terminated unexpectedly.]]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (IGNITE-8751) Possible race on node segmentation.
[ https://issues.apache.org/jira/browse/IGNITE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511245#comment-16511245 ]

Alexey Goncharuk commented on IGNITE-8751:

Changes look good to me.
[jira] [Commented] (IGNITE-8751) Possible race on node segmentation.
[ https://issues.apache.org/jira/browse/IGNITE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510928#comment-16510928 ]

Andrey Gura commented on IGNITE-8751:

TC looks good: https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8_IgniteTests24Java8=pull%2F4171%2Fhead

Please review.
[jira] [Commented] (IGNITE-8751) Possible race on node segmentation.
[ https://issues.apache.org/jira/browse/IGNITE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507004#comment-16507004 ]

ASF GitHub Bot commented on IGNITE-8751:

GitHub user agura opened a pull request:

    https://github.com/apache/ignite/pull/4171

    IGNITE-8751 Failure handler accordingly to segmentation policy should be invoked on node segmentation instead of configured failure handler

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/agura/incubator-ignite ignite-8751

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/4171.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4171

commit 56f02086cdabe05af5001fb228406935ca000994
Author: Andrey Gura
Date: 2018-06-09T13:37:49Z

    IGNITE-8751 Failure handler accordingly to segmentation policy should be invoked on node segmentation instead of configured failure handler
[jira] [Commented] (IGNITE-8751) Possible race on node segmentation.
[ https://issues.apache.org/jira/browse/IGNITE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506083#comment-16506083 ]

Andrey Gura commented on IGNITE-8751:

It isn't a race. {{tcp-disco-srvr}} is interrupted before the segmentation policy handles the segmentation. See {{org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.DiscoveryWorker#onSegmentation}}, where we first disconnect the SPI and only then handle segmentation.

It seems this could be fixed by adding a check on the SPI state in the exception handler of {{tcp-disco-srvr}}.
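
For readers of the archive, the ordering described in the comment above (disconnect the SPI first, handle segmentation second) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `FakeNode`, `FakeDiscoverySpi`, and the `checkSpiState` flag are hypothetical names invented for illustration and are not Ignite classes or the code from the pull request. It models the check suggested in the comment, which may differ from the change that was eventually made.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the ordering described above; none of these types are Ignite classes. */
public class SegmentationOrderingSketch {
    /** Hypothetical node: records which handler acted and whether the JVM "halted". */
    static final class FakeNode {
        final List<String> events = new ArrayList<>();
        boolean halted;

        /** Models a halt-style failure handler: the first critical failure stops the node. */
        void onCriticalFailure(String reason) {
            if (halted)
                return;
            events.add("FAILURE_HANDLER: " + reason);
            halted = true;
        }

        /** Models applying the segmentation policy; it cannot run once the node is halted. */
        void applySegmentationPolicy() {
            if (halted)
                return;
            events.add("SEGMENTATION_POLICY");
        }
    }

    /** Hypothetical stand-in for the discovery SPI and its server thread. */
    static final class FakeDiscoverySpi {
        private final FakeNode node;
        private final boolean checkSpiState;
        private boolean disconnecting;

        FakeDiscoverySpi(FakeNode node, boolean checkSpiState) {
            this.node = node;
            this.checkSpiState = checkSpiState;
        }

        /** Disconnecting stops the server thread, which runs that thread's exception handler. */
        void disconnect() {
            disconnecting = true;
            onServerThreadTerminated();
        }

        private void onServerThreadTerminated() {
            // Suggested check (assumption): termination during a disconnect is expected,
            // so it is not reported as a critical failure.
            if (checkSpiState && disconnecting)
                return;
            node.onCriticalFailure("server thread terminated unexpectedly");
        }
    }

    /** Same order as in the comment: disconnect the SPI first, then handle segmentation. */
    public static List<String> onSegmentation(boolean checkSpiState) {
        FakeNode node = new FakeNode();
        new FakeDiscoverySpi(node, checkSpiState).disconnect();
        node.applySegmentationPolicy();
        return node.events;
    }

    public static void main(String[] args) {
        // Without the check, the failure handler wins and the policy never runs.
        System.out.println(onSegmentation(false));
        // With the check, only the segmentation policy acts.
        System.out.println(onSegmentation(true));
    }
}
```

In this model no threads are involved, which matches the point of the comment: the outcome follows from the fixed order of the two steps rather than from timing.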