I am using two Azure virtual machines, both part of the same VNET. The network security policy is applied at the network level and allows all traffic in/out from the VNET.
I am trying to run a cluster across both machines. Since Azure does not support broadcast, I am using a static IP list containing the private IPs of my two machines, like so:

<discoverySpi type='TcpDiscoverySpi'>
  <ipFinder type='TcpDiscoveryStaticIpFinder'>
    <endpoints>
      <string>10.0.2.5:47500..47509</string>
      <string>10.0.2.11:47500..47509</string>
    </endpoints>
  </ipFinder>
</discoverySpi>

There is not much in the way of error messages, but the connection from 10.0.2.11 to 10.0.2.5 must keep dropping, or some other issue is preventing that node from joining. On the 10.0.2.11 machine I have these log entries, which indicate that it has found the other node but then seems to lose the connection immediately:

/[06:50:17,775][INFO][tcp-disco-msg-worker-#2][GridEncryptionManager] Joining node doesn't have encryption data [node=41518d17-a16a-48a0-9656-cd3d2b6e0042]
[06:50:17,800][INFO][tcp-disco-msg-worker-#2][TcpDiscoverySpi] New next node [newNext=TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042, addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1], sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500, /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500, host.docker.internal/10.0.2.11:47500], discPort=47500, order=0, intOrder=2, lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=false]]
[06:50:22,779][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/10.0.2.5, rmtPort=52234]
[06:50:22,779][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/10.0.2.5, rmtPort=52234]
[06:50:22,780][INFO][tcp-disco-sock-reader-#7][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/10.0.2.5:52234, rmtPort=52234]
[06:50:22,788][INFO][tcp-disco-sock-reader-#7][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.0.2.5:52234, rmtPort=52234
[06:50:22,805][WARNING][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Failed to send message to next node
[msg=TcpDiscoveryNodeAddedMessage [node=TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042, addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1], sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500, /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500, host.docker.internal/10.0.2.11:47500], discPort=47500, order=0, intOrder=2, lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=false], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@8b3b66a, discardMsgId=null, discardCustomMsgId=null, top=null, clientTop=null, gridStartTime=1574750964272, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=aa79676ae61-e3b1d49a-a023-435a-961c-13394c08ad0b, verifierNodeId=e3b1d49a-a023-435a-961c-13394c08ad0b, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042, addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1], sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500, /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500, host.docker.internal/10.0.2.11:47500], discPort=47500, order=0, intOrder=2, lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryNodeAddedMessage [node=TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042, addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1], sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500, /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500, host.docker.internal/10.0.2.11:47500], discPort=47500, order=0, intOrder=2, lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=false], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@8b3b66a, discardMsgId=null, discardCustomMsgId=null, top=null, clientTop=null, gridStartTime=1574750964272, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=aa79676ae61-e3b1d49a-a023-435a-961c-13394c08ad0b, 
verifierNodeId=e3b1d49a-a023-435a-961c-13394c08ad0b, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042, order=0, addr=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1], daemon=false]]]
[06:50:22,806][WARNING][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'
[06:50:22,812][INFO][disco-event-worker-#42][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042, addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1], sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500, /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500, host.docker.internal/10.0.2.11:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=false]
[06:50:22,812][INFO][disco-event-worker-#42][GridDiscoveryManager] Topology snapshot [ver=2, locNode=e3b1d49a, servers=2, clients=0, state=ACTIVE, CPUs=8, offheap=6.4GB, heap=7.1GB]
[06:50:22,816][INFO][exchange-worker-#43][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], mvccCrd=MvccCoordinator [nodeId=e3b1d49a-a023-435a-961c-13394c08ad0b, crdVer=1574750964273, topVer=AffinityTopologyVersion [topVer=1, minorTopVer=0]], mvccCrdChange=false, crd=true, evt=NODE_JOINED, evtNode=41518d17-a16a-48a0-9656-cd3d2b6e0042, customEvt=null, allowMerge=true]
[06:50:22,816][WARNING][disco-event-worker-#42][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042, addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1], sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500, /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500, host.docker.internal/10.0.2.11:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1574751017748, loc=false,
ver=2.7.6#20190911-sha1:21f7ca41, isClient=false]
[06:50:22,819][INFO][disco-event-worker-#42][GridDiscoveryManager] Topology snapshot [ver=3, locNode=e3b1d49a, servers=1, clients=0, state=ACTIVE, CPUs=4, offheap=3.2GB, heap=3.6GB] /

On the other machine the log shows only this:

/[06:50:22,774][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]/

I tried changing the timeout to 2000 but it made no difference. Any ideas? This seems like an incredibly simple setup, so I must be missing something obvious.
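
To rule out plain TCP reachability between the two machines, a check along the following lines could be run on each VM while the other node is up. This is only a rough sketch using the standard Java socket API, not anything from Ignite itself: the class name `DiscoveryPortCheck`, its helper methods, and the 2000 ms connect timeout are made up for illustration, and the endpoints are simply the two entries from the static IP list above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Apache Ignite.
public class DiscoveryPortCheck {

    // Expands "host:from..to" (or "host:port") into one address per port.
    public static List<InetSocketAddress> expand(String endpoint) {
        int colon = endpoint.lastIndexOf(':');
        String host = endpoint.substring(0, colon);
        String ports = endpoint.substring(colon + 1);
        int dots = ports.indexOf("..");
        int from = Integer.parseInt(dots < 0 ? ports : ports.substring(0, dots));
        int to = dots < 0 ? from : Integer.parseInt(ports.substring(dots + 2));
        List<InetSocketAddress> result = new ArrayList<>();
        for (int port = from; port <= to; port++) {
            result.add(new InetSocketAddress(host, port));
        }
        return result;
    }

    // Returns true if a TCP connection can be opened within timeoutMs.
    public static boolean canConnect(InetSocketAddress addr, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(addr, timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unreachable
        }
    }

    public static void main(String[] args) {
        // Same endpoints as in the static IP finder configuration.
        String[] endpoints = {"10.0.2.5:47500..47509", "10.0.2.11:47500..47509"};
        for (String endpoint : endpoints) {
            for (InetSocketAddress addr : expand(endpoint)) {
                String state = canConnect(addr, 2000) ? "open" : "closed/unreachable";
                System.out.println(addr.getHostString() + ":" + addr.getPort() + " -> " + state);
            }
        }
    }
}
```

Only the port a node is actually listening on (47500 in the logs above) would be expected to show as open; the rest of the range should normally be closed. If 47500 is reachable in both directions, the problem is presumably not at the raw network level.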