Hi Kevin,

It looks like the topology broke for some reason. Could you please attach the logs from all nodes so that I can investigate further?
Vladimir.

On Wed, Apr 27, 2016 at 1:46 PM, Zhengqingzheng <zhengqingzh...@huawei.com> wrote:

> Hi there,
>
> When I tried to clear one specific cache, service nodes closed unexpectedly.
>
> visor> cache -clear -c=@c7
> [17:26:35] Topology snapshot [ver=62, servers=9, clients=0, CPUs=16, heap=63.0GB]
> [17:26:38,009][SEVERE][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
>     at java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:519)
>     at java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:682)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:5779)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2161)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> [17:26:38] Topology snapshot [ver=71, servers=1, clients=0, CPUs=8, heap=7.0GB]
> [17:26:43] Topology snapshot [ver=62, servers=9, clients=0, CPUs=16, heap=63.0GB]
> [17:26:43] Topology snapshot [ver=62, servers=9, clients=0, CPUs=16, heap=63.0GB]
> [17:26:44] Topology snapshot [ver=62, servers=9, clients=0, CPUs=16, heap=63.0GB]
> [17:27:19] Topology snapshot [ver=63, servers=8, clients=0, CPUs=16, heap=56.0GB]
> [17:27:19] Topology snapshot [ver=63, servers=8, clients=0, CPUs=16, heap=56.0GB]
> [17:27:19] Topology snapshot [ver=64, servers=7, clients=0, CPUs=16, heap=49.0GB]
> [17:27:19] Topology snapshot [ver=64, servers=7, clients=0, CPUs=16, heap=49.0GB]
> [17:27:19] Topology snapshot [ver=65, servers=6, clients=0, CPUs=16, heap=42.0GB]
> [17:27:19] Topology snapshot [ver=65, servers=6, clients=0, CPUs=16, heap=42.0GB]
> [17:27:19] Topology snapshot [ver=67, servers=5, clients=0, CPUs=16, heap=35.0GB]
> [17:27:19] Topology snapshot [ver=67, servers=4, clients=0, CPUs=16, heap=28.0GB]
> [17:27:19] Topology snapshot [ver=67, servers=5, clients=0, CPUs=16, heap=35.0GB]
> [17:27:19] Topology snapshot [ver=67, servers=4, clients=0, CPUs=16, heap=28.0GB]
> [17:27:23,326][SEVERE][sys-#19%null%][GridCachePartitionExchangeManager] Failed to send local partition map to node [node=TcpDiscoveryNode [id=b247699c-8545-40a4-9b9c-aa478ea3ca55, addrs=[0:0:0:0:0:0:0:1%lo, 10.120.70.122, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47501, /0:0:0:0:0:0:0:1%lo:47501, /10.120.70.122:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1461673945474, loc=false, ver=1.5.0#20151229-sha1:f1f8cda2, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=62, minorTopVer=0], nodeId=fe76324d, evt=NODE_FAILED]]
> class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=b247699c-8545-40a4-9b9c-aa478ea3ca55, addrs=[0:0:0:0:0:0:0:1%lo, 10.120.70.122, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47501, /0:0:0:0:0:0:0:1%lo:47501, /10.120.70.122:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1461673945474, loc=false, ver=1.5.0#20151229-sha1:f1f8cda2, isClient=false], topic=TOPIC_CACHE, msg=GridDhtPartitionsSingleMessage [parts={1=GridDhtPartitionMap2 [moving=12, size=121], -2146922738=GridDhtPartitionMap2 [moving=12, size=121], 745661760=GridDhtPartitionMap2 [moving=12, size=121], -2100569601=GridDhtPartitionMap2 [moving=0, size=100], -1071296927=GridDhtPartitionMap2 [moving=12, size=121], -1667118441=GridDhtPartitionMap2 [moving=12, size=121], 689859866=GridDhtPartitionMap2 [moving=12, size=121], 810756007=GridDhtPartitionMap2 [moving=12, size=121], -1582327725=GridDhtPartitionMap2 [moving=12, size=121], 1316949047=GridDhtPartitionMap2 [moving=12, size=121], 1325947219=GridDhtPartitionMap2 [moving=0, size=20]}, partCntrs=null, client=false, super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=62, minorTopVer=0], nodeId=fe76324d, evt=NODE_FAILED], lastVer=GridCacheVersion [topVer=73153881, nodeOrderDrId=8, globalTime=1461749161426, order=1461727649806], super=GridCacheMessage [msgId=12782, depInfo=null, err=null, skipPrepare=false, cacheId=0, cacheId=0]]], policy=2]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1082)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1146)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.sendNoRetry(GridCacheIoManager.java:873)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.sendLocalPartitions(GridCachePartitionExchangeManager.java:814)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionRequest(GridCachePartitionExchangeManager.java:1087)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1100(GridCachePartitionExchangeManager.java:107)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$4.onMessage(GridCachePartitionExchangeManager.java:291)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$4.onMessage(GridCachePartitionExchangeManager.java:289)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:1635)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:1617)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:582)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:280)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:204)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:80)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:163)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:821)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$1600(GridIoManager.java:103)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:784)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=b247699c-8545-40a4-9b9c-aa478ea3ca55, addrs=[0:0:0:0:0:0:0:1%lo, 10.120.70.122, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47501, /0:0:0:0:0:0:0:1%lo:47501, /10.120.70.122:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1461673945474, loc=false, ver=1.5.0#20151229-sha1:f1f8cda2, isClient=false]
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1959)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1899)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1077)
>     ... 20 more
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each GridComputeTask and GridCacheTransaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=b247699c-8545-40a4-9b9c-aa478ea3ca55, addrs=[/10.120.70.122:47101, /0:0:0:0:0:0:0:1%lo:47101, /127.0.0.1:47101]]
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2462)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2103)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:1997)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1933)
>     ... 22 more
>     Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /10.120.70.122:47101
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2467)
>         ... 25 more
>     Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read remote node recovery handshake (connection closed).
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2672)
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2334)
>         ... 25 more
>     Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /0:0:0:0:0:0:0:1%lo:47101
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2467)
>         ... 25 more
>     Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=b247699c-8545-40a4-9b9c-aa478ea3ca55, rcvd=58708335-cd7e-4e54-b86a-73a63da9ed4d]
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2577)
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2334)
>         ... 25 more
>     Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /127.0.0.1:47101
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2467)
>         ... 25 more
>     Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=b247699c-8545-40a4-9b9c-aa478ea3ca55, rcvd=58708335-cd7e-4e54-b86a-73a63da9ed4d]
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2577)
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2334)
>         ... 25 more
> [17:27:27] Topology snapshot [ver=71, servers=1, clients=0, CPUs=8, heap=7.0GB]
> [17:27:28,403][SEVERE][exchange-worker-#50%null%][GridCachePartitionExchangeManager] Failed to send local partition map to node [node=TcpDiscoveryNode [id=98ba7f1a-2815-4a05-b083-420066840ce5, addrs=[0:0:0:0:0:0:0:1%lo, 10.120.70.122, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /0:0:0:0:0:0:0:1%lo:47500, /10.120.70.122:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1461673938529, loc=false, ver=1.5.0#20151229-sha1:f1f8cda2, isClient=false], exchId=null]
> class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=98ba7f1a-2815-4a05-b083-420066840ce5, addrs=[0:0:0:0:0:0:0:1%lo, 10.120.70.122, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /0:0:0:0:0:0:0:1%lo:47500, /10.120.70.122:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1461673938529, loc=false, ver=1.5.0#20151229-sha1:f1f8cda2, isClient=false], topic=TOPIC_CACHE, msg=GridDhtPartitionsSingleMessage [parts={1=GridDhtPartitionMap2 [moving=0, size=111], -2146922738=GridDhtPartitionMap2 [moving=0, size=111], 745661760=GridDhtPartitionMap2 [moving=0, size=111], -2100569601=GridDhtPartitionMap2 [moving=0, size=100], -1071296927=GridDhtPartitionMap2 [moving=0, size=111], -1667118441=GridDhtPartitionMap2 [moving=0, size=111], 689859866=GridDhtPartitionMap2 [moving=0, size=111], 810756007=GridDhtPartitionMap2 [moving=0, size=111], -1582327725=GridDhtPartitionMap2 [moving=0, size=111], 1316949047=GridDhtPartitionMap2 [moving=0, size=111], 1325947219=GridDhtPartitionMap2 [moving=0, size=20]}, partCntrs=null, client=false, super=GridDhtPartitionsAbstractMessage [exchId=null, lastVer=GridCacheVersion [topVer=73153881, nodeOrderDrId=7, globalTime=1461749161553, order=1461727649806], super=GridCacheMessage [msgId=12784, depInfo=null, err=null, skipPrepare=false, cacheId=0, cacheId=0]]], policy=2]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1082)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1146)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.sendNoRetry(GridCacheIoManager.java:873)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.sendLocalPartitions(GridCachePartitionExchangeManager.java:814)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:705)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:724)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1600(GridCachePartitionExchangeManager.java:107)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1267)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=98ba7f1a-2815-4a05-b083-420066840ce5, addrs=[0:0:0:0:0:0:0:1%lo, 10.120.70.122, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /0:0:0:0:0:0:0:1%lo:47500, /10.120.70.122:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1461673938529, loc=false, ver=1.5.0#20151229-sha1:f1f8cda2, isClient=false]
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1959)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1899)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1077)
>     ... 9 more
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each GridComputeTask and GridCacheTransaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=98ba7f1a-2815-4a05-b083-420066840ce5, addrs=[/10.120.70.122:47100, /0:0:0:0:0:0:0:1%lo:47100, /127.0.0.1:47100]]
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2462)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2103)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:1997)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1933)
>     ... 11 more
>     Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /10.120.70.122:47100
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2467)
>         ... 14 more
>     Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read remote node recovery handshake (connection closed).
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2672)
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2334)
>         ... 14 more
>     Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /0:0:0:0:0:0:0:1%lo:47100
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2467)
>         ... 14 more
>     Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=98ba7f1a-2815-4a05-b083-420066840ce5, rcvd=8cbb2885-4f9f-4547-8b2b-b55e64cb3579]
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2577)
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2334)
>         ... 14 more
>     Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /127.0.0.1:47100
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2467)
>         ... 14 more
>     Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=98ba7f1a-2815-4a05-b083-420066840ce5, rcvd=8cbb2885-4f9f-4547-8b2b-b55e64cb3579]
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2577)
>         at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2334)
>         ... 14 more