Hi. >From these logs:
2017-05-19 15:23:07,005 WARN [tcp-disco-msg-worker-#2%null%] java.JavaLogger (JavaLogger.java:278) - Node is out of topology (probably, due to short-time network problems). 2017-05-19 15:23:07,049 WARN [disco-event-worker-#20%null%] java.JavaLogger (JavaLogger.java:278) - Local node SEGMENTED: TcpDiscoveryNode [id=bc2a043b-b90a-45d9-9761-962b2e8b0404, addrs=[10.127.246.164, 127.0.0.1], sockAddrs=[LNAPLN109PRD.ldn.emea.cib/10.127.246.164:47500, /127.0.0.1:47500], discPort=47500, order=244, intOrder=124, lastExchangeTime=1495203787004, loc=true, ver=1.8.0#20161205-sha1:9ca40dbe, isClient=false] It seems your node segmented from cluster. Default segmentation policy is stopping the node. Most probably it is because of network promlems or gc pauses. On Mon, May 22, 2017 at 1:29 PM, jpmoore40 <[email protected]> wrote: > Hi, > > I had a problem a few weeks ago with a shared cache stopping - I thought I > had fixed it by adding backups to the cache configuration but it happened > again at the end of last week and we had to restart all nodes on the grid > to > restore order. I'd appreciate any pointers on why this is happening and > what > I can do to prevent it. Log file extracts are below. > > Thanks > Jon > > > At 14:34 the cache was rebalanced successfully: > > 2017-05-19 14:34:19,845 INFO [exchange-worker-#27%null%] java.JavaLogger > (JavaLogger.java:273) - Cache rebalancing scheduled: > [cache=datastructures_0, waitList=[ignite-sys-cache, > ignite-marshaller-sys-cache, ignite-atomics-sys-cache]] > 2017-05-19 14:34:19,860 INFO [exchange-worker-#27%null%] java.JavaLogger > (JavaLogger.java:273) - Rebalancing required [top=AffinityTopologyVersion > [topVer=251, minorTopVer=0], evt=NODE_FAILED, > node=c64b9b52-3013-4f55-91ec-422045a83412] > 2017-05-19 14:34:19,863 INFO [sys-#406%null%] java.JavaLogger > (JavaLogger.java:273) - <datastructures_0> Starting rebalancing > [cache=datastructures_0, mode=SYNC, > fromNode=c49dd7a7-ad75-4bdd-96e8-7cbd19c34970, partitionsCount=13, > topology=AffinityTopologyVersion [topVer=251, minorTopVer=0], > updateSeq=26784] > 2017-05-19 14:34:19,877 INFO [sys-#406%null%] java.JavaLogger > (JavaLogger.java:273) - <datastructures_0> Starting rebalancing > [cache=datastructures_0, mode=SYNC, > fromNode=7b87f6d3-8b9b-461a-b4d4-eef1132b5b1d, partitionsCount=27, > topology=AffinityTopologyVersion [topVer=251, minorTopVer=0], > updateSeq=26784] > 2017-05-19 14:34:19,878 INFO [sys-#406%null%] java.JavaLogger > (JavaLogger.java:273) - <datastructures_0> Starting rebalancing > [cache=datastructures_0, mode=SYNC, > fromNode=222368cd-1ea5-4247-90be-00343c7f183f, partitionsCount=20, > topology=AffinityTopologyVersion [topVer=251, minorTopVer=0], > updateSeq=26784] > 2017-05-19 14:34:19,879 INFO [sys-#406%null%] java.JavaLogger > (JavaLogger.java:273) - <datastructures_0> Starting rebalancing > [cache=datastructures_0, mode=SYNC, > fromNode=429e48e2-5500-426d-9849-1b1acdd58e0c, partitionsCount=18, > topology=AffinityTopologyVersion [topVer=251, minorTopVer=0], > updateSeq=26784] > 2017-05-19 14:34:20,219 INFO [sys-#407%null%] java.JavaLogger > (JavaLogger.java:273) - <datastructures_0> Completed rebalancing > [cache=datastructures_0, fromNode=222368cd-1ea5-4247-90be-00343c7f183f, > topology=AffinityTopologyVersion [topVer=251, minorTopVer=0], time=352 ms] > 2017-05-19 14:34:25,818 INFO [sys-#408%null%] java.JavaLogger > (JavaLogger.java:273) - <datastructures_0> Completed rebalancing > [cache=datastructures_0, fromNode=c49dd7a7-ad75-4bdd-96e8-7cbd19c34970, > topology=AffinityTopologyVersion [topVer=251, minorTopVer=0], time=5952 > ms] > 2017-05-19 14:34:29,454 INFO [sys-#420%null%] java.JavaLogger > (JavaLogger.java:273) - <datastructures_0> Completed rebalancing > [cache=datastructures_0, fromNode=429e48e2-5500-426d-9849-1b1acdd58e0c, > topology=AffinityTopologyVersion [topVer=251, minorTopVer=0], time=9595 > ms] > 2017-05-19 14:34:33,054 INFO [sys-#417%null%] java.JavaLogger > (JavaLogger.java:273) - <datastructures_0> Completed (final) rebalancing > [cache=datastructures_0, fromNode=7b87f6d3-8b9b-461a-b4d4-eef1132b5b1d, > topology=AffinityTopologyVersion [topVer=251, minorTopVer=0], time=13195 > ms] > > But then less than an hour later the node failed and the following is an > condensed extract of the log: > > 2017-05-19 15:22:29,662 WARN [disco-event-worker-#20%null%] > java.JavaLogger > (JavaLogger.java:278) - Node FAILED: TcpDiscoveryNode > [id=c49dd7a7-ad75-4bdd-96e8-7cbd19c34970, addrs=[10.127.246.164, > 127.0.0.1], > sockAddrs=[LNAPLN109PRD.ldn.emea.cib/10.127.246.164:47504, > /127.0.0.1:47504], discPort=47504, order=248, intOrder=128, > lastExchangeTime=1495066956338, loc=false, ver=1.8.0#20161205-sha1: > 9ca40dbe, > isClient=false] > 2017-05-19 15:22:29,992 INFO [disco-event-worker-#20%null%] > java.JavaLogger > (JavaLogger.java:273) - Topology snapshot [ver=252, servers=4, clients=0, > CPUs=24, heap=2.6GB] > 2017-05-19 15:22:36,403 INFO [exchange-worker-#27%null%] java.JavaLogger > (JavaLogger.java:273) - Cache rebalancing scheduled: > [cache=datastructures_0, waitList=[ignite-sys-cache, > ignite-marshaller-sys-cache, ignite-atomics-sys-cache]] > 2017-05-19 15:22:36,451 INFO [exchange-worker-#27%null%] java.JavaLogger > (JavaLogger.java:273) - Rebalancing required [top=AffinityTopologyVersion > [topVer=252, minorTopVer=0], evt=NODE_FAILED, > node=c49dd7a7-ad75-4bdd-96e8-7cbd19c34970] > 2017-05-19 15:22:36,473 INFO [sys-#411%null%] java.JavaLogger > (JavaLogger.java:273) - <datastructures_0> Starting rebalancing > [cache=datastructures_0, mode=SYNC, > fromNode=222368cd-1ea5-4247-90be-00343c7f183f, partitionsCount=34, > topology=AffinityTopologyVersion [topVer=252, minorTopVer=0], > updateSeq=27363] > 2017-05-19 15:22:36,514 INFO [sys-#411%null%] java.JavaLogger > (JavaLogger.java:273) - <datastructures_0> Starting rebalancing > [cache=datastructures_0, mode=SYNC, > fromNode=7b87f6d3-8b9b-461a-b4d4-eef1132b5b1d, partitionsCount=34, > topology=AffinityTopologyVersion [topVer=252, minorTopVer=0], > updateSeq=27363] > 2017-05-19 15:22:36,515 INFO [sys-#411%null%] java.JavaLogger > (JavaLogger.java:273) - <datastructures_0> Starting rebalancing > [cache=datastructures_0, mode=SYNC, > fromNode=429e48e2-5500-426d-9849-1b1acdd58e0c, partitionsCount=47, > topology=AffinityTopologyVersion [topVer=252, minorTopVer=0], > updateSeq=27363] > 2017-05-19 15:23:06,576 INFO [tcp-disco-msg-worker-#2%null%] > java.JavaLogger (JavaLogger.java:273) - Local node seems to be disconnected > from topology (failure detection timeout is reached) > [failureDetectionTimeout=10000, connCheckFreq=3333] > 2017-05-19 15:23:07,005 WARN [tcp-disco-msg-worker-#2%null%] > java.JavaLogger (JavaLogger.java:278) - Node is out of topology (probably, > due to short-time network problems). > 2017-05-19 15:23:07,049 WARN [disco-event-worker-#20%null%] > java.JavaLogger > (JavaLogger.java:278) - Local node SEGMENTED: TcpDiscoveryNode > [id=bc2a043b-b90a-45d9-9761-962b2e8b0404, addrs=[10.127.246.164, > 127.0.0.1], > sockAddrs=[LNAPLN109PRD.ldn.emea.cib/10.127.246.164:47500, > /127.0.0.1:47500], discPort=47500, order=244, intOrder=124, > lastExchangeTime=1495203787004, loc=true, ver=1.8.0#20161205-sha1: > 9ca40dbe, > isClient=false] > 2017-05-19 15:23:07,095 INFO [grid-timeout-worker-#7%null%] > java.JavaLogger > (JavaLogger.java:273) - > Metrics for local node (to disable set 'metricsLogFrequency' to 0) > ^-- Node [id=bc2a043b, name=null, uptime=38:00:32:160] > ^-- H/N/C [hosts=2, nodes=4, CPUs=24] > ^-- CPU [cur=31.33%, avg=1.42%, GC=50.87%] > ^-- Heap [used=158MB, free=65.18%, comm=343MB] > ^-- Non heap [used=69MB, free=-1%, comm=70MB] > ^-- Public thread pool [active=0, idle=2, qSize=0] > ^-- System thread pool [active=3, idle=29, qSize=0] > ^-- Outbound messages queue [size=0] > 2017-05-19 15:23:07,279 WARN [disco-event-worker-#20%null%] > java.JavaLogger > (JavaLogger.java:278) - Stopping local node according to configured > segmentation policy. > 2017-05-19 15:23:07,290 WARN [disco-event-worker-#20%null%] > java.JavaLogger > (JavaLogger.java:278) - Node FAILED: TcpDiscoveryNode > [id=222368cd-1ea5-4247-90be-00343c7f183f, addrs=[10.127.246.74, > 127.0.0.1], > sockAddrs=[/127.0.0.1:47500, LNAPLN343PRD.ldn.emea.cib/10.127.246.74:47500 > ], > discPort=47500, order=211, intOrder=110, lastExchangeTime=1495066951942, > loc=false, ver=1.8.0#20161205-sha1:9ca40dbe, isClient=false] > 2017-05-19 15:23:07,319 INFO [disco-event-worker-#20%null%] > java.JavaLogger > (JavaLogger.java:273) - Topology snapshot [ver=253, servers=3, clients=0, > CPUs=16, heap=1.3GB] > 2017-05-19 15:23:07,360 WARN [disco-event-worker-#20%null%] > java.JavaLogger > (JavaLogger.java:278) - Node FAILED: TcpDiscoveryNode > [id=429e48e2-5500-426d-9849-1b1acdd58e0c, addrs=[10.127.246.164, > 127.0.0.1], > sockAddrs=[LNAPLN109PRD.ldn.emea.cib/10.127.246.164:47502, > /127.0.0.1:47502], discPort=47502, order=246, intOrder=126, > lastExchangeTime=1495066955707, loc=false, ver=1.8.0#20161205-sha1: > 9ca40dbe, > isClient=false] > 2017-05-19 15:23:07,365 INFO [disco-event-worker-#20%null%] > java.JavaLogger > (JavaLogger.java:273) - Topology snapshot [ver=254, servers=2, clients=0, > CPUs=16, heap=0.89GB] > 2017-05-19 15:23:07,394 WARN [disco-event-worker-#20%null%] > java.JavaLogger > (JavaLogger.java:278) - Node FAILED: TcpDiscoveryNode > [id=7b87f6d3-8b9b-461a-b4d4-eef1132b5b1d, addrs=[10.127.246.164, > 127.0.0.1], > sockAddrs=[/127.0.0.1:47503, > LNAPLN109PRD.ldn.emea.cib/10.127.246.164:47503], discPort=47503, > order=247, > intOrder=127, lastExchangeTime=1495066956338, loc=false, > ver=1.8.0#20161205-sha1:9ca40dbe, isClient=false] > 2017-05-19 15:23:07,403 INFO [disco-event-worker-#20%null%] > java.JavaLogger > (JavaLogger.java:273) - Topology snapshot [ver=255, servers=1, clients=0, > CPUs=16, heap=0.44GB] > 2017-05-19 15:23:07,540 INFO [Thread-1622] java.JavaLogger > (JavaLogger.java:273) - Command protocol successfully stopped: TCP binary > class org.apache.ignite.IgniteException: Failed to wait for topology > update, > cache (or node) is stopping. > at > org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils. > java:927) > at > org.apache.ignite.internal.processors.datastructures. > GridCacheQueueAdapter.iterator(GridCacheQueueAdapter.java:230) > at java.util.AbstractCollection.contains(Unknown Source) > at > org.apache.ignite.internal.processors.datastructures. > GridCacheQueueProxy.contains(GridCacheQueueProxy.java:170) > at > com.indosuez.server.crossfxcurve.CrossFxCurvePubService.addToQueue( > CrossFxCurvePubService.java:267) > at > com.indosuez.server.crossfxcurve.CrossFxCurvePubService.checkForUpdate( > CrossFxCurvePubService.java:230) > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown > Source) > at java.util.concurrent.FutureTask.runAndReset(Unknown Source) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ > ScheduledFutureTask.access$301(Unknown > Source) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run( > Unknown > Source) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.lang.Thread.run(Unknown Source) > Caused by: class org.apache.ignite.IgniteCheckedException: Failed to wait > for topology update, cache (or node) is stopping. > at > org.apache.ignite.internal.processors.cache.GridCacheAffinityManager. > cancelFutures(GridCacheAffinityManager.java:92) > at > org.apache.ignite.internal.processors.cache.GridCacheProcessor. > onKernalStop(GridCacheProcessor.java:906) > at org.apache.ignite.internal.IgniteKernal.stop0( > IgniteKernal.java:1931) > at org.apache.ignite.internal.IgniteKernal.stop( > IgniteKernal.java:1877) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0( > IgnitionEx.java:2363) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop( > IgnitionEx.java:2326) > at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:322) > at org.apache.ignite.Ignition.stop(Ignition.java:224) > at > org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$8.run( > GridDiscoveryManager.java:1949) > ... 1 more > 2017-05-19 15:23:07,709 INFO [Thread-1622] java.JavaLogger > (JavaLogger.java:273) - <datastructures_0> Cancelled rebalancing from all > nodes [cache=datastructures_0, topology=AffinityTopologyVersion > [topVer=252, > minorTopVer=0]] > class org.apache.ignite.IgniteException: Operation has been cancelled > (node > is stopping). > at > org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils. > java:927) > at > org.apache.ignite.internal.processors.datastructures. > GridCacheQueueAdapter.iterator(GridCacheQueueAdapter.java:230) > at java.util.AbstractCollection.contains(Unknown Source) > at > org.apache.ignite.internal.processors.datastructures. > GridCacheQueueProxy.contains(GridCacheQueueProxy.java:170) > at > com.indosuez.server.crossfxcurve.CrossFxCurvePubService.addToQueue( > CrossFxCurvePubService.java:267) > at > com.indosuez.server.crossfxcurve.CrossFxCurvePubService.checkForUpdate( > CrossFxCurvePubService.java:230) > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown > Source) > at java.util.concurrent.FutureTask.runAndReset(Unknown Source) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ > ScheduledFutureTask.access$301(Unknown > Source) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run( > Unknown > Source) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.lang.Thread.run(Unknown Source) > Caused by: class org.apache.ignite.internal.NodeStoppingException: > Operation > has been cancelled (node is stopping). > at > org.apache.ignite.internal.processors.cache.GridCacheMvccManager. > stopError(GridCacheMvccManager.java:414) > at > org.apache.ignite.internal.processors.cache.GridCacheMvccManager. > onFutureAdded(GridCacheMvccManager.java:601) > at > org.apache.ignite.internal.processors.cache.GridCacheMvccManager. > addFuture(GridCacheMvccManager.java:486) > at > org.apache.ignite.internal.processors.cache.distributed.dht. > GridPartitionedSingleGetFuture.map(GridPartitionedSingleGetFuture > .java:270) > at > org.apache.ignite.internal.processors.cache.distributed.dht. > GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture > .java:196) > at > org.apache.ignite.internal.processors.cache.distributed. > dht.atomic.GridDhtAtomicCache.getAsync0(GridDhtAtomicCache.java:1439) > at > org.apache.ignite.internal.processors.cache.distributed. > dht.atomic.GridDhtAtomicCache.get(GridDhtAtomicCache.java:482) > at > org.apache.ignite.internal.processors.cache.GridCacheAdapter.get( > GridCacheAdapter.java:4783) > at > org.apache.ignite.internal.processors.cache.GridCacheAdapter.get( > GridCacheAdapter.java:1395) > at > org.apache.ignite.internal.processors.datastructures. > GridCacheQueueAdapter.iterator(GridCacheQueueAdapter.java:223) > ... 11 more > java.lang.IllegalStateException: Cache has been stopped: datastructures_0 > at > org.apache.ignite.internal.processors.cache.GridCacheGateway.checkState( > GridCacheGateway.java:85) > at > org.apache.ignite.internal.processors.cache.GridCacheGateway.enter( > GridCacheGateway.java:68) > at > org.apache.ignite.internal.processors.datastructures. > GridCacheQueueProxy.contains(GridCacheQueueProxy.java:160) > at > com.indosuez.server.crossfxcurve.CrossFxCurvePubService.addToQueue( > CrossFxCurvePubService.java:267) > at > com.indosuez.server.crossfxcurve.CrossFxCurvePubService.checkForUpdate( > CrossFxCurvePubService.java:230) > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown > Source) > at java.util.concurrent.FutureTask.runAndReset(Unknown Source) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ > ScheduledFutureTask.access$301(Unknown > Source) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run( > Unknown > Source) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.lang.Thread.run(Unknown Source) > 2017-05-19 15:23:07,773 INFO [Thread-1622] java.JavaLogger > (JavaLogger.java:273) - Stopped cache: ignite-marshaller-sys-cache > 2017-05-19 15:23:07,774 INFO [Thread-1622] java.JavaLogger > (JavaLogger.java:273) - Stopped cache: ignite-sys-cache > 2017-05-19 15:23:07,775 INFO [Thread-1622] java.JavaLogger > (JavaLogger.java:273) - Stopped cache: ignite-atomics-sys-cache > 2017-05-19 15:23:07,776 INFO [Thread-1622] java.JavaLogger > (JavaLogger.java:273) - Stopped cache: datastructures_0 > 2017-05-19 15:23:07,785 INFO [Thread-1622] java.JavaLogger > (JavaLogger.java:273) - Removed undeployed class: GridDeployment > [ts=1495066953881, depMode=SHARED, > clsLdr=sun.misc.Launcher$AppClassLoader@3b7951, > clsLdrId=b371fe81c51-bc2a043b-b90a-45d9-9761-962b2e8b0404, userVer=0, > loc=true, > sampleClsName=org.apache.ignite.internal.processors.cache.distributed.dht. > preloader.GridDhtPartitionFullMap, > pendingUndeploy=false, undeployed=true, usage=0] > 2017-05-19 15:23:07,835 INFO [Thread-1622] java.JavaLogger > (JavaLogger.java:273) - > >>> +----------------------------------------------------------- > ----------------------+ > >>> Ignite ver. 1.8.0#20161205-sha1:9ca40dbeb7d559fcb299bdb6f5c90c > df8ce7e533 > >>> stopped OK > >>> +----------------------------------------------------------- > ----------------------+ > >>> Grid uptime: 38:00:32:899 > java.lang.IllegalStateException: Cache has been stopped: datastructures_0 > at > org.apache.ignite.internal.processors.cache.GridCacheGateway.checkState( > GridCacheGateway.java:85) > at > org.apache.ignite.internal.processors.cache.GridCacheGateway.enter( > GridCacheGateway.java:68) > at > org.apache.ignite.internal.processors.datastructures. > GridCacheQueueProxy.contains(GridCacheQueueProxy.java:160) > at > com.indosuez.server.crossfxcurve.CrossFxCurvePubService.addToQueue( > CrossFxCurvePubService.java:267) > at > com.indosuez.server.crossfxcurve.CrossFxCurvePubService.checkForUpdate( > CrossFxCurvePubService.java:230) > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown > Source) > at java.util.concurrent.FutureTask.runAndReset(Unknown Source) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ > ScheduledFutureTask.access$301(Unknown > Source) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run( > Unknown > Source) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.lang.Thread.run(Unknown Source) > > > > > > > -- > View this message in context: http://apache-ignite-users. > 70518.x6.nabble.com/Cache-has-been-stopped-tp13056.html > Sent from the Apache Ignite Users mailing list archive at Nabble.com. > -- Alper Tekinalp Software Developer Evam Streaming Analytics Atatürk Mah. Turgut Özal Bulv. Gardenya 5 Plaza K:6 Ataşehir 34758 İSTANBUL Tel: +90 216 455 01 53 Fax: +90 216 455 01 54 www.evam.com.tr <http://www.evam.com>
