Hello!

These two client nodes were removed from the cluster because they did not submit metrics in time:

[15:07:16,330][WARNING][tcp-disco-msg-worker-[11cf0c06 10.212.120.71:57500 crd]-#2%hh_DynamicGrid_v2%][TcpDiscoverySpi] Failing client node due to not receiving metrics updates from client node within 'IgniteConfiguration.clientFailureDetectionTimeout' (consider increasing configuration property) [timeout=120000, node=TcpDiscoveryNode [id=9dbcfb86-a60e-4382-904f-57bffbe18c5c, consistentId=73B5811B-9644-48FD-A533-B4609FDAD591, addrs=ArrayList [10.212.120.190], sockAddrs=HashSet [VWNV02AX07080.HH.com/10.212.120.190:0], discPort=0, order=488, intOrder=248, lastExchangeTime=1612397142960, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=true]]
[15:07:16,331][WARNING][disco-event-worker-#62%hh_DynamicGrid_v2%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=9dbcfb86-a60e-4382-904f-57bffbe18c5c, consistentId=73B5811B-9644-48FD-A533-B4609FDAD591, addrs=ArrayList [10.212.120.190], sockAddrs=HashSet [VWNV02AX07080.HH.com/10.212.120.190:0], discPort=0, order=488, intOrder=248, lastExchangeTime=1612397142960, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=true]
[15:07:16,332][INFO][disco-event-worker-#62%hh_DynamicGrid_v2%][GridDiscoveryManager] Topology snapshot [ver=499, locNode=83fd7c70, servers=3, clients=14, state=ACTIVE, CPUs=204, offheap=54.0GB, heap=120.0GB]
[15:07:16,332][INFO][disco-event-worker-#62%hh_DynamicGrid_v2%][GridDiscoveryManager] ^-- Baseline [id=0, size=3, online=3, offline=0]
[15:07:16,332][WARNING][tcp-disco-msg-worker-[11cf0c06 10.212.120.71:57500 crd]-#2%hh_DynamicGrid_v2%][TcpDiscoverySpi] Failing client node due to not receiving metrics updates from client node within 'IgniteConfiguration.clientFailureDetectionTimeout' (consider increasing configuration property) [timeout=120000, node=TcpDiscoveryNode [id=8d51aa56-b67e-4d4c-a9ba-5c68699e6a47, consistentId=8FFFBE22-239A-4442-91FA-947EFE1207C0, addrs=ArrayList [10.212.120.187], sockAddrs=HashSet [VWNV02AX07077.HH.com/10.212.120.187:0], discPort=0, order=493, intOrder=253, lastExchangeTime=1612397256739, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=true]]
[15:07:16,333][INFO][exchange-worker-#63%hh_DynamicGrid_v2%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=499, minorTopVer=0], crd=true, evt=NODE_FAILED, evtNode=9dbcfb86-a60e-4382-904f-57bffbe18c5c, customEvt=null, allowMerge=true, exchangeFreeSwitch=false]
[15:07:16,334][INFO][exchange-worker-#63%hh_DynamicGrid_v2%][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=499, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=499, minorTopVer=0], err=null, rebalanced=true, wasRebalanced=true]
[15:07:16,336][WARNING][disco-event-worker-#62%hh_DynamicGrid_v2%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=8d51aa56-b67e-4d4c-a9ba-5c68699e6a47, consistentId=8FFFBE22-239A-4442-91FA-947EFE1207C0, addrs=ArrayList [10.212.120.187], sockAddrs=HashSet [VWNV02AX07077.HH.com/10.212.120.187:0], discPort=0, order=493, intOrder=253, lastExchangeTime=1612397256739, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=true]
[15:07:16,337][INFO][disco-event-worker-#62%hh_DynamicGrid_v2%][GridDiscoveryManager] Topology snapshot [ver=500, locNode=83fd7c70, servers=3, clients=13, state=ACTIVE, CPUs=192, offheap=54.0GB, heap=120.0GB]
[15:07:16,337][INFO][disco-event-worker-#62%hh_DynamicGrid_v2%][GridDiscoveryManager] ^-- Baseline [id=0, size=3, online=3, offline=0]
[15:07:16,341][INFO][exchange-worker-#63%hh_DynamicGrid_v2%][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=83fd7c70-839d-46ca-969f-bbb9661d6ab2, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=499, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode [id=9dbcfb86-a60e-4382-904f-57bffbe18c5c, consistentId=73B5811B-9644-48FD-A533-B4609FDAD591, addrs=ArrayList [10.212.120.190], sockAddrs=HashSet [VWNV02AX07080.HH.com/10.212.120.190:0], discPort=0, order=488, intOrder=248, lastExchangeTime=1612397142960, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=true], done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=499, minorTopVer=0]]
[15:07:16,341][INFO][exchange-worker-#63%hh_DynamicGrid_v2%][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=499, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=499, minorTopVer=0], stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters initialization" (0 ms), stage="Determine exchange type" (1 ms), stage="Exchange done" (6 ms), stage="Total time" (7 ms)]
[15:07:16,341][INFO][exchange-worker-#63%hh_DynamicGrid_v2%][GridDhtPartitionsExchangeFuture] Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=499, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=499, minorTopVer=0]]
[15:07:16,341][INFO][exchange-worker-#63%hh_DynamicGrid_v2%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=499, minorTopVer=0], crd=true]
[15:07:16,363][INFO][exchange-worker-#63%hh_DynamicGrid_v2%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=499, minorTopVer=0], force=false, evt=NODE_FAILED, node=9dbcfb86-a60e-4382-904f-57bffbe18c5c]
[15:07:16,363][INFO][exchange-worker-#63%hh_DynamicGrid_v2%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=500, minorTopVer=0], crd=true, evt=NODE_FAILED, evtNode=8d51aa56-b67e-4d4c-a9ba-5c68699e6a47, customEvt=null, allowMerge=true, exchangeFreeSwitch=false]
[15:07:16,364][INFO][exchange-worker-#63%hh_DynamicGrid_v2%][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=500, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=500, minorTopVer=0], err=null, rebalanced=true, wasRebalanced=true]
[15:07:16,372][INFO][exchange-worker-#63%hh_DynamicGrid_v2%][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=83fd7c70-839d-46ca-969f-bbb9661d6ab2, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=500, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode [id=8d51aa56-b67e-4d4c-a9ba-5c68699e6a47, consistentId=8FFFBE22-239A-4442-91FA-947EFE1207C0, addrs=ArrayList [10.212.120.187], sockAddrs=HashSet [VWNV02AX07077.HH.com/10.212.120.187:0], discPort=0, order=493, intOrder=253, lastExchangeTime=1612397256739, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=true], done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=500, minorTopVer=0]]
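
To make the rule behind the "Failing client node" warnings above concrete, here is a rough Java sketch of timeout-based client failure detection. It is only an illustrative model and not Ignite's actual discovery code: the class ClientLivenessTracker and its methods are hypothetical names, and the only values taken from the log are the 120000 ms timeout and the shortened node IDs used in the usage note.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of a coordinator that fails clients which have not
 * sent a metrics update within a configured timeout. Not Ignite code.
 */
class ClientLivenessTracker {
    /** Plays the role of clientFailureDetectionTimeout, in milliseconds. */
    private final long timeoutMs;

    /** Time of the last metrics update received from each client node. */
    private final Map<String, Long> lastMetricsAt = new HashMap<>();

    ClientLivenessTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Records that a metrics update arrived from the given client. */
    void onMetricsUpdate(String nodeId, long nowMs) {
        lastMetricsAt.put(nodeId, nowMs);
    }

    /**
     * Returns the sorted IDs of clients whose last update is older than
     * the timeout and stops tracking them, as if they had left the topology.
     */
    List<String> failTimedOut(long nowMs) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastMetricsAt.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                failed.add(e.getKey());
            }
        }
        lastMetricsAt.keySet().removeAll(failed);
        Collections.sort(failed);
        return failed;
    }
}
```

For example, with a tracker built as new ClientLivenessTracker(120_000L), a client "9dbcfb86" that last reported at time 0 is returned by failTimedOut(130_000L), while a client that reported at 100_000 is still considered alive at that point. In this model the failed client learns nothing by itself; it has to notice on its own side that it has been dropped and then reconnect.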

I'm not sure why these nodes were unable to detect that they are no longer in the cluster. Network issues could be the cause. Normally this should not happen: a client node should detect that it has been dropped and try to reconnect.

Regards,
--
Ilya Kasnacheev


Wed, 10 Feb 2021 at 09:17, Charlin S <[email protected]>:

> Hi,
> Please find the log files here, as requested.
> Note: The client node is connected to two different grids (Replicated Mode
> and Partition Mode). We are facing issues with the partition mode grid.
>
> Thanks & Regards,
> Charlin
>
>
> On Fri, 5 Feb 2021 at 16:18, Ilya Kasnacheev <[email protected]>
> wrote:
>
>> Hello!
>>
>> Please share a complete log from both nodes. I expect that one of the
>> nodes was previously segmented.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> Fri, 5 Feb 2021 at 12:41, Charlin S <[email protected]>:
>>
>>> Hi,
>>>
>>> I'm running an ASP.NET application with Ignite 2.8.1. I'm seeing the
>>> error details below in the Ignite client log, and my web site stopped
>>> working. It's back to normal after restarting the application pool.
>>>
>>> [15:07:42,689][SEVERE][Thread-219][TcpCommunicationSpi] Failed to send message to remote node [node=TcpDiscoveryNode [id=83fd7c70-839d-46ca-969f-bbb9661d6ab2, consistentId=127.1.1.1:57500, addrs=ArrayList [127.1.1.1], sockAddrs=HashSet [test.com/127.1.1.1:57500], discPort=57500, order=1, intOrder=1, lastExchangeTime=1612397256785, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicFullUpdateRequest [keys=ArrayList [UserKeyCacheObjectImpl [part=292, val=TestModel:TEST|bbf4da4d-c3d7-4b46-98b6-0de70c30f668, hasValBytes=true]], conflictTtls=null, conflictExpireTimes=null, expiryPlc=org.apache.ignite.internal.processors.platform.cache.expiry.PlatformExpiryPolicy@3fb1b76e, initSize=1, filter=null, parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=keepBinary]]]]
>>> class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Remote node does not observe current node in topology : 83fd7c70-839d-46ca-969f-bbb9661d6ab2
>>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
>>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
>>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
>>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
>>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
>>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
>>>     at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
>>>     at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
>>>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1257)
>>>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1296)
>>>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:312)
>>>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:486)
>>>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:446)
>>>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:249)
>>>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1164)
>>>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:624)
>>>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2580)
>>>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2557)
>>>     at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1299)
>>>     at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:856)
>>>     at org.apache.ignite.internal.processors.platform.cache.PlatformCache.processInStreamOutLong(PlatformCache.java:432)
>>>     at org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutLong(PlatformTargetProxyImpl.java:67)
>>>
>>> It's working fine after restarting the website.
>>>
>>> Thanks & Regards,
>>> Charlin
>>>
>>
