Hi,
Yesterday my Ignite 2.0 cluster went down with a segmentation warning in the
logs.
I have set the segmentation policy to RESTART_JVM, although I had believed
that it does not work out of the box with Ignite 2.0, even when the node is
started through ignite.sh.
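
For reference, a policy like this is typically set in the Spring XML file
passed to ignite.sh. The fragment below is only a minimal sketch and not my
exact file: the bean id is a placeholder, and the discovery settings (the
ZooKeeper IP finder, timeouts) and cache definitions are left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- Bean id is a placeholder chosen for this sketch. -->
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <!-- Instance name as it shows up in the log thread names (%TANGO%). -->
        <property name="igniteInstanceName" value="TANGO"/>

        <!-- What the local node does once it detects that it is segmented.
             The SegmentationPolicy enum values are RESTART_JVM, STOP and NOOP. -->
        <property name="segmentationPolicy" value="RESTART_JVM"/>
    </bean>
</beans>
```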

The logs, however, show that the policy was read and a restart was
attempted, but the JVM was not restarted successfully.
My understanding was that RESTART_JVM restarts only the JVM on which
the node was detected as segmented, yet it brought the complete grid down.
Below are the relevant logs from the Ignite server node.

Please help me understand the Ignite behavior in the case described above.
Please let me know if any further details are required from my end.


----------------------------------------------
*LOGS*
----------------------------------------------

[06:04:21,211][WARNING][tcp-disco-msg-worker-#2%TANGO%][TcpDiscoverySpi]
Timed out waiting for message delivery receipt (most probably, the reason is
in long GC pauses on remote node; consider tuning GC and increasing
'ackTimeout' configuration property). Will retry to send message with
increased timeout. Current timeout: 5000.
[06:04:51,245][WARNING][tcp-disco-msg-worker-#2%TANGO%][TcpDiscoverySpi]
Local node has detected failed nodes and started cluster-wide procedure. To
speed up failure detection please see 'Failure Detection' section under
javadoc for 'TcpDiscoverySpi'
[06:04:51,260][WARNING][tcp-disco-msg-worker-#2%TANGO%][TcpDiscoverySpi]
Node is out of topology (probably, due to short-time network problems).
[06:04:51,260][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Local node SEGMENTED: TcpDiscoveryNode
[id=3b75965f-da92-4795-ba8e-c73ab6a0984f, addrs=[10.40.56.23, 127.0.0.1],
sockAddrs=[/127.0.0.1:47500, pserv200075.pk.tango.com/10.40.56.23:47500],
discPort=47500, order=236, intOrder=125, lastExchangeTime=1510856948826,
loc=true, ver=2.0.0#20170430-sha1:d4eef3c6, isClient=false]
[06:04:51,286][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Restarting JVM according to configured segmentation policy.
[06:04:51,287][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=67281d71-8dc2-4539-8cd1-26319ea64896,
addrs=[10.55.188.21, 127.0.0.1],
sockAddrs=[pserv200080.pk.tango.com/10.55.188.21:47500, /127.0.0.1:47500],
discPort=47500, order=2, intOrder=2, lastExchangeTime=1510856953037,
loc=false, ver=2.0.0#20170430-sha1:d4eef3c6, isClient=false]
[06:04:51,291][INFO][Thread-3][TcpDiscoveryZookeeperIpFinder] Destroying
ZooKeeper IP Finder.
[06:04:51,292][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=256, servers=5, clients=9, CPUs=56, heap=50.0GB]
[06:04:51,301][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=387e2a30-85b9-4ea2-bd4a-232cd9deda98,
addrs=[10.55.188.20, 127.0.0.1],
sockAddrs=[pserv200079.pk.tango.com/10.55.188.20:47500, /127.0.0.1:47500],
discPort=47500, order=3, intOrder=3, lastExchangeTime=1510856953037,
loc=false, ver=2.0.0#20170430-sha1:d4eef3c6, isClient=false]
[06:04:51,305][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=257, servers=4, clients=9, CPUs=52, heap=46.0GB]
[06:04:51,313][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=6074eac2-e548-4fad-8496-f12967b38350,
addrs=[10.55.188.19, 127.0.0.1], sockAddrs=[/127.0.0.1:47500,
pserv200078.pk.tango.com/10.55.188.19:47500], discPort=47500, order=5,
intOrder=5, lastExchangeTime=1510856953037, loc=false,
ver=2.0.0#20170430-sha1:d4eef3c6, isClient=false]
[06:04:51,320][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=258, servers=3, clients=9, CPUs=48, heap=43.0GB]
[06:04:51,326][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=f800ed1c-093c-4ad9-aa8e-401d481337f8,
addrs=[10.40.56.24, 127.0.0.1],
sockAddrs=[pserv200076.pk.tango.com/10.40.56.24:47500, /127.0.0.1:47500],
discPort=47500, order=6, intOrder=6, lastExchangeTime=1510856953037,
loc=false, ver=2.0.0#20170430-sha1:d4eef3c6, isClient=false]
[06:04:51,331][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=259, servers=2, clients=9, CPUs=44, heap=39.0GB]
[06:04:51,336][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=bb27cdad-54b2-4900-9b6b-7004ce3fc5f2,
addrs=[10.40.56.4, 127.0.0.1],
sockAddrs=[pserv200021.pk.tango.com/10.40.56.4:0, /127.0.0.1:0], discPort=0,
order=219, intOrder=113, lastExchangeTime=1510856952787, loc=false,
ver=2.0.0#20170430-sha1:d4eef3c6, isClient=true]
[06:04:51,341][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=260, servers=2, clients=8, CPUs=40, heap=35.0GB]
[06:04:51,345][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=6d3ee3b9-9749-43fb-a589-5abe6dffe7b8,
addrs=[10.40.56.5, 127.0.0.1],
sockAddrs=[pserv200022.pk.tango.com/10.40.56.5:0, /127.0.0.1:0], discPort=0,
order=220, intOrder=114, lastExchangeTime=1510856952787, loc=false,
ver=2.0.0#20170430-sha1:d4eef3c6, isClient=true]
[06:04:51,350][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=261, servers=2, clients=7, CPUs=36, heap=32.0GB]
[06:04:51,357][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=4f256925-94df-455e-b6a0-893b9a906364,
addrs=[10.40.56.15, 127.0.0.1], sockAddrs=[/127.0.0.1:0,
pserv200056.pk.tango.com/10.40.56.15:0], discPort=0, order=221,
intOrder=115, lastExchangeTime=1510856952797, loc=false,
ver=2.0.0#20170430-sha1:d4eef3c6, isClient=true]
[06:04:51,363][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=262, servers=2, clients=6, CPUs=32, heap=28.0GB]
[06:04:51,367][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=88d941a7-e6e9-4559-8586-3c69070c3354,
addrs=[10.40.56.18, 127.0.0.1], sockAddrs=[/127.0.0.1:0,
pserv200064.pk.tango.com/10.40.56.18:0], discPort=0, order=222,
intOrder=116, lastExchangeTime=1510856952797, loc=false,
ver=2.0.0#20170430-sha1:d4eef3c6, isClient=true]
[06:04:51,372][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=263, servers=2, clients=5, CPUs=28, heap=25.0GB]
[06:04:51,377][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=fbf42d0b-2f35-4256-a86f-8a3a772e50e1,
addrs=[10.55.188.4, 127.0.0.1], sockAddrs=[/127.0.0.1:0,
pserv200057.pk.tango.com/10.55.188.4:0], discPort=0, order=223,
intOrder=117, lastExchangeTime=1510856952807, loc=false,
ver=2.0.0#20170430-sha1:d4eef3c6, isClient=true]
[06:04:51,382][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=264, servers=2, clients=4, CPUs=24, heap=21.0GB]
[06:04:51,384][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=a1ea2882-a982-4f1c-b127-82cc47050969,
addrs=[10.55.188.5, 127.0.0.1], sockAddrs=[/127.0.0.1:0,
pserv200058.pk.tango.com/10.55.188.5:0], discPort=0, order=224,
intOrder=118, lastExchangeTime=1510856952817, loc=false,
ver=2.0.0#20170430-sha1:d4eef3c6, isClient=true]
[06:04:51,390][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=265, servers=2, clients=3, CPUs=20, heap=17.0GB]
[06:04:51,392][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=2707ea82-5c22-4ca2-bd6b-3f198ac90fe4,
addrs=[10.55.188.6, 127.0.0.1],
sockAddrs=[pserv200059.pk.tango.com/10.55.188.6:0, /127.0.0.1:0],
discPort=0, order=225, intOrder=119, lastExchangeTime=1510856952817,
loc=false, ver=2.0.0#20170430-sha1:d4eef3c6, isClient=true]
[06:04:51,399][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=266, servers=2, clients=2, CPUs=16, heap=14.0GB]
[06:04:51,407][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=dbcdfcef-f041-42d3-84f0-9ff99a366b92,
addrs=[10.55.188.11, 127.0.0.1],
sockAddrs=[pserv200065.pk.tango.com/10.55.188.11:0, /127.0.0.1:0],
discPort=0, order=226, intOrder=120, lastExchangeTime=1510856952828,
loc=false, ver=2.0.0#20170430-sha1:d4eef3c6, isClient=true]
[06:04:51,414][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=267, servers=2, clients=1, CPUs=12, heap=10.0GB]
[06:04:51,417][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=503acf5b-9d8a-42f2-9a50-8faa349ba1dd,
addrs=[10.40.56.25, 127.0.0.1],
sockAddrs=[pserv200077.pk.tango.com/10.40.56.25:47500, /127.0.0.1:47500],
discPort=47500, order=240, intOrder=127, lastExchangeTime=1510873991541,
loc=false, ver=2.0.0#20170430-sha1:d4eef3c6, isClient=false]
[06:04:51,422][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=268, servers=1, clients=1, CPUs=8, heap=6.4GB]
[06:04:51,472][WARNING][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=48590331-63e3-4dfa-a386-82ad65127952,
addrs=[10.227.140.189, 127.0.0.1], sockAddrs=[/127.0.0.1:0,
/10.227.140.189:0], discPort=0, order=255, intOrder=135,
lastExchangeTime=1510911380283, loc=false, ver=2.0.0#20170430-sha1:d4eef3c6,
isClient=true]
[06:04:51,479][INFO][disco-event-worker-#32%TANGO%][GridDiscoveryManager]
Topology snapshot [ver=269, servers=1, clients=0, CPUs=4, heap=3.9GB]
[06:04:53,421][INFO][Thread-3][GridTcpRestProtocol] Command protocol
successfully stopped: TCP binary
[06:04:53,441][INFO][Thread-3][GridCacheProcessor] Stopped cache:
ignite-sys-cache
[06:04:53,441][INFO][Thread-3][GridCacheProcessor] Stopped cache:
ignite-atomics-sys-cache
[06:04:53,441][INFO][Thread-3][GridCacheProcessor] Stopped cache: CacheOne
[06:04:53,442][INFO][Thread-3][GridCacheProcessor] Stopped cache: CacheTwo
[06:04:53,443][INFO][Thread-3][GridCacheProcessor] Stopped cache: CacheThree
[06:04:53,443][INFO][Thread-3][GridCacheProcessor] Stopped cache: CacheFour
[06:04:53,444][INFO][Thread-3][GridCacheProcessor] Stopped cache: CacheFive
[06:04:53,445][INFO][Thread-3][GridCacheProcessor] Stopped cache: CacheSix
[06:04:53,445][INFO][Thread-3][GridCacheProcessor] Stopped cache: CacheSeven
[06:04:53,446][INFO][Thread-3][GridCacheProcessor] Stopped cache: CacheEight
[06:04:53,464][INFO][Thread-3][IgniteKernal%TANGO]

>>> +---------------------------------------------------------------------------------+
>>> Ignite ver. 2.0.0#20170430-sha1:d4eef3c68ff116ee34bc13648cd82c640b3ea072
>>> stopped OK
>>> +---------------------------------------------------------------------------------+
>>> Ignite instance name: TANGO
>>> Grid uptime: 59:35:37:703




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
