OK. I have found steps to reproduce. For reproducing we need Ignite with slow disk. Or I known how to emulate slow hard disk:
Add to partitions cycle in GridCacheDatabaseSharedManager#restorePartitionStates: //... for (int i = 0; i < grp.affinity().partitions(); i++) { try { log.error("Wait"); Thread.sleep(10000); } catch (InterruptedException e) { e.printStackTrace(); } //... - Now we can start server - Then start client - Wait message "Join cluster while cluster state transition is in progress, waiting when transition finish." - Kill server - Wait repeatable exception java.net.ConnectException: Connection refused (Connection refused) - Start server and I have 100% chance to reproduce on my computer Thank you. On 11/29/18 14:03, Stanislav Lukyanov wrote: > > Hi, > > > > The interesting (and disappointing) part is the NPE: > > java.lang.NullPointerException: null > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequest(ClientImpl.java:666) > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl.joinTopology(ClientImpl.java:546) > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl.access$900(ClientImpl.java:128) > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.tryJoin(ClientImpl.java:1846) > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1757) > > > > Which version do you use? > > Is this reproducible? Every time? > > > > Thanks, > > Stan > > > > > > *From: *Dmitry Lazurkin <mailto:dila...@gmail.com> > *Sent: *20 ноября 2018 г. 15:44 > *To: *user@ignite.apache.org <mailto:user@ignite.apache.org> > *Subject: *Client stucks and doesn't connect > > > > Hello. > > > > Ignite client stops connecting to server after exception: > > > > 2018-11-19 16:00:49,257 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Resolved addresses from IP finder: > > [/10.48.14.1:47500] > > 2018-11-19 16:00:49,257 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Send join request > > [addr=/10.48.14.1:47500, reconnect=true, > > locNodeId=cd323c53-d1de-4608-8eec-e373b1f68b71] > > 2018-11-19 16:00:49,258 [tcp-client-disco-reconnector-#5] ERROR > > o.a.i.s.d.tcp.TcpDiscoverySpi - Exception on joining: Connection refused > > (Connection refused) > > java.net.ConnectException: Connection refused (Connection refused) > > at java.net.PlainSocketImpl.socketConnect(Native Method) > > at > > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) > > at > > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) > > at > > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > > at java.net.Socket.connect(Socket.java:589) > > at > > org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1450) > > at > > org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1413) > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequest(ClientImpl.java:637) > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl.joinTopology(ClientImpl.java:546) > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl.access$900(ClientImpl.java:128) > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl$Reconnector.body(ClientImpl.java:1408) > > at > > org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) > > 2018-11-19 16:00:49,258 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Failed to join to address > > [addr=/10.48.14.1:47500, recon=true, errs=[java.net.ConnectException: > > Connection refused (Connection refused)]] > > 2018-11-19 16:00:51,344 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Resolved addresses from IP finder: > > [/10.48.14.1:47500] > > 2018-11-19 16:00:51,344 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Send join request > > [addr=/10.48.14.1:47500, reconnect=true, > > locNodeId=cd323c53-d1de-4608-8eec-e373b1f68b71] > > 2018-11-19 16:00:51,364 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Message has been sent to address > > [msg=TcpDiscoveryClientReconnectMessage > > [routerNodeId=9c68d70f-883e-4a21-938b-05f3f6f98d20, > > lastMsgId=c42b1bc2761-bbc61e78-97e8-49cd-844e-2dc3e8aacc68, > > super=TcpDiscoveryAbstractMessage [sndNodeId=null, > > id=48722bc2761-cd323c53-d1de-4608-8eec-e373b1f68b71, > > verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, > > isClient=true]], addr=/10.48.14.1:47500, > > rmtNodeId=9c68d70f-883e-4a21-938b-05f3f6f98d20] > > 2018-11-19 16:00:51,365 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Received response to join request > > [addr=/10.48.14.1:47500, res=100] > > 2018-11-19 16:00:51,365 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Will wait before retry join. > > 2018-11-19 16:00:53,365 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Send join request > > [addr=/10.48.14.1:47500, reconnect=true, > > locNodeId=cd323c53-d1de-4608-8eec-e373b1f68b71] > > 2018-11-19 16:00:53,368 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Message has been sent to address > > [msg=TcpDiscoveryClientReconnectMessage > > [routerNodeId=9c68d70f-883e-4a21-938b-05f3f6f98d20, > > lastMsgId=c42b1bc2761-bbc61e78-97e8-49cd-844e-2dc3e8aacc68, > > super=TcpDiscoveryAbstractMessage [sndNodeId=null, > > id=a8722bc2761-cd323c53-d1de-4608-8eec-e373b1f68b71, > > verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, > > isClient=true]], addr=/10.48.14.1:47500, > > rmtNodeId=9c68d70f-883e-4a21-938b-05f3f6f98d20] > > 2018-11-19 16:00:53,368 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Received response to join request > > [addr=/10.48.14.1:47500, res=1] > > 2018-11-19 16:00:53,372 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Received reconnect response > > [success=false, msg=TcpDiscoveryClientReconnectMessage > > [routerNodeId=9c68d70f-883e-4a21-938b-05f3f6f98d20, > > lastMsgId=c42b1bc2761-bbc61e78-97e8-49cd-844e-2dc3e8aacc68, > > super=TcpDiscoveryAbstractMessage [sndNodeId=null, > > id=a8722bc2761-cd323c53-d1de-4608-8eec-e373b1f68b71, > > verifierNodeId=9c68d70f-883e-4a21-938b-05f3f6f98d20, topVer=0, > > pendingIdx=0, failedNodes=null, isClient=true]]] > > 2018-11-19 16:00:53,372 [tcp-client-disco-reconnector-#5] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Grid runnable finished normally: > > tcp-client-disco-reconnector-#5 > > 2018-11-19 16:00:53,373 [tcp-client-disco-msg-worker-#4] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Failed to restore closed connection, > > will try to reconnect [networkTimeout=360000, joinTimeout=0, failMsg=null] > > 2018-11-19 16:00:53,373 [tcp-client-disco-msg-worker-#4] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Discovery notification > > [node=TcpDiscoveryNode [id=cd323c53-d1de-4608-8eec-e373b1f68b71, > > addrs=[10.37.92.216], > > sockAddrs=[kafka-ignite-streamer-5bdb96f7d6-dfbhc/10.37.92.216:0], > > discPort=0, order=5, intOrder=0, lastExchangeTime=1542643135842, > > loc=true, ver=2.6.0#20180710-sha1:669feacc, isClient=true], > > type=CLIENT_NODE_DISCONNECTED, topVer=6] > > 2018-11-19 16:00:53,443 [tcp-client-disco-msg-worker-#4] INFO > > o.a.i.s.d.tcp.TcpDiscoverySpi - Client node disconnected from cluster, > > will try to reconnect with new id > > [newId=af2e6c3b-7d68-44f6-ad64-e403e560b800, > > prevId=cd323c53-d1de-4608-8eec-e373b1f68b71, locNode=TcpDiscoveryNode > > [id=cd323c53-d1de-4608-8eec-e373b1f68b71, addrs=[10.37.92.216], > > sockAddrs=[kafka-ignite-streamer-5bdb96f7d6-dfbhc/10.37.92.216:0], > > discPort=0, order=5, intOrder=0, lastExchangeTime=1542643135842, > > loc=true, ver=2.6.0#20180710-sha1:669feacc, isClient=true]] > > 2018-11-19 16:00:53,482 [tcp-client-disco-msg-worker-#4] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Resolved addresses from IP finder: > > [/10.48.14.1:47500] > > 2018-11-19 16:00:53,482 [tcp-client-disco-msg-worker-#4] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Send join request > > [addr=/10.48.14.1:47500, reconnect=false, > > locNodeId=af2e6c3b-7d68-44f6-ad64-e403e560b800] > > 2018-11-19 16:00:53,553 [tcp-client-disco-msg-worker-#4] ERROR > > o.a.i.s.d.tcp.TcpDiscoverySpi - Runtime error caught during grid > > runnable execution: IgniteSpiThread [name=tcp-client-disco-msg-worker-#4] > > java.lang.NullPointerException: null > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequest(ClientImpl.java:666) > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl.joinTopology(ClientImpl.java:546) > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl.access$900(ClientImpl.java:128) > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.tryJoin(ClientImpl.java:1846) > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1757) > > at > > org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) > > 2018-11-19 16:00:53,553 [tcp-client-disco-msg-worker-#4] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Grid runnable finished normally: > > tcp-client-disco-msg-worker-#4 > > 2018-11-19 16:06:27,487 [tcp-client-disco-sock-writer-#2] DEBUG > > o.a.i.s.d.tcp.TcpDiscoverySpi - Failed to get acknowledge for message, > > will try to reconnect [msg=TcpDiscoveryClientMetricsUpdateMessage > > [super=TcpDiscoveryAbstractMessage [sndNodeId=null, > > id=47722bc2761-cd323c53-d1de-4608-8eec-e373b1f68b71, > > verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, > > isClient=true]], failureDetectionTimeout=360000] > > 2018-11-19 16:06:27,487 [tcp-client-disco-sock-writer-#2] ERROR > > o.a.i.s.d.tcp.TcpDiscoverySpi - Failed to send message: null > > java.io.IOException: Failed to get acknowledge for message: > > TcpDiscoveryClientMetricsUpdateMessage > > [super=TcpDiscoveryAbstractMessage [sndNodeId=null, > > id=47722bc2761-cd323c53-d1de-4608-8eec-e373b1f68b71, > > verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, > > isClient=true]] > > at > > org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter.body(ClientImpl.java:1302) > > at > > org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) > > > > >