Issue with patch: https://issues.apache.org/jira/browse/IGNITE-10513
On 12/3/18 17:43, Dmitry Lazurkin wrote: > OK. I have found steps to reproduce. > > For reproducing we need Ignite with slow disk. Or I known how to > emulate slow hard disk: > > Add to partitions cycle in > GridCacheDatabaseSharedManager#restorePartitionStates: > //... > for (int i = 0; i < grp.affinity().partitions(); i++) { > try { > log.error("Wait"); > Thread.sleep(10000); > } catch (InterruptedException e) { > e.printStackTrace(); > } > //... > > - Now we can start server > - Then start client > - Wait message "Join cluster while cluster state transition is in > progress, waiting when transition finish." > - Kill server > - Wait repeatable exception java.net.ConnectException: Connection > refused (Connection refused) > - Start server and I have 100% chance to reproduce on my computer > > Thank you. > > On 11/29/18 14:03, Stanislav Lukyanov wrote: >> >> Hi, >> >> >> >> The interesting (and disappointing) part is the NPE: >> >> java.lang.NullPointerException: null >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequest(ClientImpl.java:666) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl.joinTopology(ClientImpl.java:546) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl.access$900(ClientImpl.java:128) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.tryJoin(ClientImpl.java:1846) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1757) >> >> >> >> Which version do you use? >> >> Is this reproducible? Every time? >> >> >> >> Thanks, >> >> Stan >> >> >> >> >> >> *From: *Dmitry Lazurkin <mailto:dila...@gmail.com> >> *Sent: *20 ноября 2018 г. 15:44 >> *To: *user@ignite.apache.org <mailto:user@ignite.apache.org> >> *Subject: *Client stucks and doesn't connect >> >> >> >> Hello. >> >> >> >> Ignite client stops connecting to server after exception: >> >> >> >> 2018-11-19 16:00:49,257 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Resolved addresses from IP finder: >> >> [/10.48.14.1:47500] >> >> 2018-11-19 16:00:49,257 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Send join request >> >> [addr=/10.48.14.1:47500, reconnect=true, >> >> locNodeId=cd323c53-d1de-4608-8eec-e373b1f68b71] >> >> 2018-11-19 16:00:49,258 [tcp-client-disco-reconnector-#5] ERROR >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Exception on joining: Connection refused >> >> (Connection refused) >> >> java.net.ConnectException: Connection refused (Connection refused) >> >> at java.net.PlainSocketImpl.socketConnect(Native Method) >> >> at >> >> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) >> >> at >> >> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) >> >> at >> >> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) >> >> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) >> >> at java.net.Socket.connect(Socket.java:589) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1450) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1413) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequest(ClientImpl.java:637) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl.joinTopology(ClientImpl.java:546) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl.access$900(ClientImpl.java:128) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl$Reconnector.body(ClientImpl.java:1408) >> >> at >> >> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) >> >> 2018-11-19 16:00:49,258 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Failed to join to address >> >> [addr=/10.48.14.1:47500, recon=true, errs=[java.net.ConnectException: >> >> Connection refused (Connection refused)]] >> >> 2018-11-19 16:00:51,344 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Resolved addresses from IP finder: >> >> [/10.48.14.1:47500] >> >> 2018-11-19 16:00:51,344 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Send join request >> >> [addr=/10.48.14.1:47500, reconnect=true, >> >> locNodeId=cd323c53-d1de-4608-8eec-e373b1f68b71] >> >> 2018-11-19 16:00:51,364 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Message has been sent to address >> >> [msg=TcpDiscoveryClientReconnectMessage >> >> [routerNodeId=9c68d70f-883e-4a21-938b-05f3f6f98d20, >> >> lastMsgId=c42b1bc2761-bbc61e78-97e8-49cd-844e-2dc3e8aacc68, >> >> super=TcpDiscoveryAbstractMessage [sndNodeId=null, >> >> id=48722bc2761-cd323c53-d1de-4608-8eec-e373b1f68b71, >> >> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, >> >> isClient=true]], addr=/10.48.14.1:47500, >> >> rmtNodeId=9c68d70f-883e-4a21-938b-05f3f6f98d20] >> >> 2018-11-19 16:00:51,365 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Received response to join request >> >> [addr=/10.48.14.1:47500, res=100] >> >> 2018-11-19 16:00:51,365 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Will wait before retry join. >> >> 2018-11-19 16:00:53,365 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Send join request >> >> [addr=/10.48.14.1:47500, reconnect=true, >> >> locNodeId=cd323c53-d1de-4608-8eec-e373b1f68b71] >> >> 2018-11-19 16:00:53,368 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Message has been sent to address >> >> [msg=TcpDiscoveryClientReconnectMessage >> >> [routerNodeId=9c68d70f-883e-4a21-938b-05f3f6f98d20, >> >> lastMsgId=c42b1bc2761-bbc61e78-97e8-49cd-844e-2dc3e8aacc68, >> >> super=TcpDiscoveryAbstractMessage [sndNodeId=null, >> >> id=a8722bc2761-cd323c53-d1de-4608-8eec-e373b1f68b71, >> >> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, >> >> isClient=true]], addr=/10.48.14.1:47500, >> >> rmtNodeId=9c68d70f-883e-4a21-938b-05f3f6f98d20] >> >> 2018-11-19 16:00:53,368 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Received response to join request >> >> [addr=/10.48.14.1:47500, res=1] >> >> 2018-11-19 16:00:53,372 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Received reconnect response >> >> [success=false, msg=TcpDiscoveryClientReconnectMessage >> >> [routerNodeId=9c68d70f-883e-4a21-938b-05f3f6f98d20, >> >> lastMsgId=c42b1bc2761-bbc61e78-97e8-49cd-844e-2dc3e8aacc68, >> >> super=TcpDiscoveryAbstractMessage [sndNodeId=null, >> >> id=a8722bc2761-cd323c53-d1de-4608-8eec-e373b1f68b71, >> >> verifierNodeId=9c68d70f-883e-4a21-938b-05f3f6f98d20, topVer=0, >> >> pendingIdx=0, failedNodes=null, isClient=true]]] >> >> 2018-11-19 16:00:53,372 [tcp-client-disco-reconnector-#5] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Grid runnable finished normally: >> >> tcp-client-disco-reconnector-#5 >> >> 2018-11-19 16:00:53,373 [tcp-client-disco-msg-worker-#4] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Failed to restore closed connection, >> >> will try to reconnect [networkTimeout=360000, joinTimeout=0, >> failMsg=null] >> >> 2018-11-19 16:00:53,373 [tcp-client-disco-msg-worker-#4] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Discovery notification >> >> [node=TcpDiscoveryNode [id=cd323c53-d1de-4608-8eec-e373b1f68b71, >> >> addrs=[10.37.92.216], >> >> sockAddrs=[kafka-ignite-streamer-5bdb96f7d6-dfbhc/10.37.92.216:0], >> >> discPort=0, order=5, intOrder=0, lastExchangeTime=1542643135842, >> >> loc=true, ver=2.6.0#20180710-sha1:669feacc, isClient=true], >> >> type=CLIENT_NODE_DISCONNECTED, topVer=6] >> >> 2018-11-19 16:00:53,443 [tcp-client-disco-msg-worker-#4] INFO >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Client node disconnected from cluster, >> >> will try to reconnect with new id >> >> [newId=af2e6c3b-7d68-44f6-ad64-e403e560b800, >> >> prevId=cd323c53-d1de-4608-8eec-e373b1f68b71, locNode=TcpDiscoveryNode >> >> [id=cd323c53-d1de-4608-8eec-e373b1f68b71, addrs=[10.37.92.216], >> >> sockAddrs=[kafka-ignite-streamer-5bdb96f7d6-dfbhc/10.37.92.216:0], >> >> discPort=0, order=5, intOrder=0, lastExchangeTime=1542643135842, >> >> loc=true, ver=2.6.0#20180710-sha1:669feacc, isClient=true]] >> >> 2018-11-19 16:00:53,482 [tcp-client-disco-msg-worker-#4] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Resolved addresses from IP finder: >> >> [/10.48.14.1:47500] >> >> 2018-11-19 16:00:53,482 [tcp-client-disco-msg-worker-#4] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Send join request >> >> [addr=/10.48.14.1:47500, reconnect=false, >> >> locNodeId=af2e6c3b-7d68-44f6-ad64-e403e560b800] >> >> 2018-11-19 16:00:53,553 [tcp-client-disco-msg-worker-#4] ERROR >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Runtime error caught during grid >> >> runnable execution: IgniteSpiThread [name=tcp-client-disco-msg-worker-#4] >> >> java.lang.NullPointerException: null >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequest(ClientImpl.java:666) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl.joinTopology(ClientImpl.java:546) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl.access$900(ClientImpl.java:128) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.tryJoin(ClientImpl.java:1846) >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1757) >> >> at >> >> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) >> >> 2018-11-19 16:00:53,553 [tcp-client-disco-msg-worker-#4] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Grid runnable finished normally: >> >> tcp-client-disco-msg-worker-#4 >> >> 2018-11-19 16:06:27,487 [tcp-client-disco-sock-writer-#2] DEBUG >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Failed to get acknowledge for message, >> >> will try to reconnect [msg=TcpDiscoveryClientMetricsUpdateMessage >> >> [super=TcpDiscoveryAbstractMessage [sndNodeId=null, >> >> id=47722bc2761-cd323c53-d1de-4608-8eec-e373b1f68b71, >> >> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, >> >> isClient=true]], failureDetectionTimeout=360000] >> >> 2018-11-19 16:06:27,487 [tcp-client-disco-sock-writer-#2] ERROR >> >> o.a.i.s.d.tcp.TcpDiscoverySpi - Failed to send message: null >> >> java.io.IOException: Failed to get acknowledge for message: >> >> TcpDiscoveryClientMetricsUpdateMessage >> >> [super=TcpDiscoveryAbstractMessage [sndNodeId=null, >> >> id=47722bc2761-cd323c53-d1de-4608-8eec-e373b1f68b71, >> >> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, >> >> isClient=true]] >> >> at >> >> org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter.body(ClientImpl.java:1302) >> >> at >> >> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) >> >> >> >> >> >