I am using two Azure virtual machines, both part of the same VNET. The network
security policy is applied at the network level and allows all traffic
in and out of the VNET.

I am trying to run a cluster across both machines. Since Azure does not
support broadcast, I am using a static IP list containing the private IPs of
my two machines:

    <discoverySpi type='TcpDiscoverySpi'>
        <ipFinder type='TcpDiscoveryStaticIpFinder'>
            <endpoints>
                <string>10.0.2.5:47500..47509</string>
                <string>10.0.2.11:47500..47509</string>
            </endpoints>
        </ipFinder>
    </discoverySpi>
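
To rule out plain network reachability, a check along these lines could be run from each machine against the endpoints listed above. This is only a rough sketch: `expand_endpoint`, `is_reachable` and `report` are hypothetical helper names of my own, not part of Ignite, and an open port only shows that something is listening there, not that discovery will succeed.

```python
import socket


def expand_endpoint(endpoint):
    """Expand 'host:47500..47509' (or 'host:47500') into (host, port) pairs."""
    host, _, ports = endpoint.rpartition(":")
    first, _, last = ports.partition("..")
    last = last or first  # a single port has no '..' range part
    return [(host, port) for port in range(int(first), int(last) + 1)]


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable addresses.
        return False


def report(endpoints, timeout=2.0):
    """Print one line per host:port stating whether a TCP connect succeeded."""
    for endpoint in endpoints:
        for host, port in expand_endpoint(endpoint):
            state = "open" if is_reachable(host, port, timeout) else "closed/unreachable"
            print(f"{host}:{port} {state}")
```

Calling `report(["10.0.2.5:47500..47509", "10.0.2.11:47500..47509"])` on each machine would show which discovery ports accept connections from that side. Only the ports a node actually bound are expected to be open, so most of each range will normally show as closed.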

There is not much in the way of error messages, but the connection from
10.0.2.11 -> 10.0.2.5 seems to keep dropping, or some other issue is
preventing that node from joining.

On the 10.0.2.11 machine I have these log entries, indicating it has found
the other node but then seems to lose the connection immediately:

/[06:50:17,775][INFO][tcp-disco-msg-worker-#2][GridEncryptionManager]
Joining node doesn't have encryption data
[node=41518d17-a16a-48a0-9656-cd3d2b6e0042]
[06:50:17,800][INFO][tcp-disco-msg-worker-#2][TcpDiscoverySpi] New next node
[newNext=TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042,
addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1],
sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500,
/0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500], discPort=47500, order=0, intOrder=2,
lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=false]]
[06:50:22,779][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
accepted incoming connection [rmtAddr=/10.0.2.5, rmtPort=52234]
[06:50:22,779][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
spawning a new thread for connection [rmtAddr=/10.0.2.5, rmtPort=52234]
[06:50:22,780][INFO][tcp-disco-sock-reader-#7][TcpDiscoverySpi] Started
serving remote node connection [rmtAddr=/10.0.2.5:52234, rmtPort=52234]
[06:50:22,788][INFO][tcp-disco-sock-reader-#7][TcpDiscoverySpi] Finished
serving remote node connection [rmtAddr=/10.0.2.5:52234, rmtPort=52234
[06:50:22,805][WARNING][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Failed to
send message to next node [msg=TcpDiscoveryNodeAddedMessage
[node=TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042,
addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1],
sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500,
/0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500], discPort=47500, order=0, intOrder=2,
lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=false],
dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@8b3b66a,
discardMsgId=null, discardCustomMsgId=null, top=null, clientTop=null,
gridStartTime=1574750964272, super=TcpDiscoveryAbstractMessage
[sndNodeId=null, id=aa79676ae61-e3b1d49a-a023-435a-961c-13394c08ad0b,
verifierNodeId=e3b1d49a-a023-435a-961c-13394c08ad0b, topVer=0, pendingIdx=0,
failedNodes=null, isClient=false]], next=TcpDiscoveryNode
[id=41518d17-a16a-48a0-9656-cd3d2b6e0042, addrs=[0:0:0:0:0:0:0:1, 10.0.2.5,
10.0.75.1, 127.0.0.1], sockAddrs=[camyakoubCPU/10.0.2.5:47500,
/10.0.75.1:47500, /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500], discPort=47500, order=0, intOrder=2,
lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=false], errMsg=Failed to send message to next node
[msg=TcpDiscoveryNodeAddedMessage [node=TcpDiscoveryNode
[id=41518d17-a16a-48a0-9656-cd3d2b6e0042, addrs=[0:0:0:0:0:0:0:1, 10.0.2.5,
10.0.75.1, 127.0.0.1], sockAddrs=[camyakoubCPU/10.0.2.5:47500,
/10.0.75.1:47500, /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500], discPort=47500, order=0, intOrder=2,
lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=false],
dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@8b3b66a,
discardMsgId=null, discardCustomMsgId=null, top=null, clientTop=null,
gridStartTime=1574750964272, super=TcpDiscoveryAbstractMessage
[sndNodeId=null, id=aa79676ae61-e3b1d49a-a023-435a-961c-13394c08ad0b,
verifierNodeId=e3b1d49a-a023-435a-961c-13394c08ad0b, topVer=0, pendingIdx=0,
failedNodes=null, isClient=false]], next=ClusterNode
[id=41518d17-a16a-48a0-9656-cd3d2b6e0042, order=0, addr=[0:0:0:0:0:0:0:1,
10.0.2.5, 10.0.75.1, 127.0.0.1], daemon=false]]]
[06:50:22,806][WARNING][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Local node
has detected failed nodes and started cluster-wide procedure. To speed up
failure detection please see 'Failure Detection' section under javadoc for
'TcpDiscoverySpi'
[06:50:22,812][INFO][disco-event-worker-#42][GridDiscoveryManager] Added new
node to topology: TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042,
addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1],
sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500,
/0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500], discPort=47500, order=2, intOrder=2,
lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=false]
[06:50:22,812][INFO][disco-event-worker-#42][GridDiscoveryManager] Topology
snapshot [ver=2, locNode=e3b1d49a, servers=2, clients=0, state=ACTIVE,
CPUs=8, offheap=6.4GB, heap=7.1GB]
[06:50:22,816][INFO][exchange-worker-#43][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=e3b1d49a-a023-435a-961c-13394c08ad0b,
crdVer=1574750964273, topVer=AffinityTopologyVersion [topVer=1,
minorTopVer=0]], mvccCrdChange=false, crd=true, evt=NODE_JOINED,
evtNode=41518d17-a16a-48a0-9656-cd3d2b6e0042, customEvt=null,
allowMerge=true]
[06:50:22,816][WARNING][disco-event-worker-#42][GridDiscoveryManager] Node
FAILED: TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042,
addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1],
sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500,
/0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500], discPort=47500, order=2, intOrder=2,
lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=false]
[06:50:22,819][INFO][disco-event-worker-#42][GridDiscoveryManager] Topology
snapshot [ver=3, locNode=e3b1d49a, servers=1, clients=0, state=ACTIVE,
CPUs=4, offheap=3.2GB, heap=3.6GB]
/

On the other machine (10.0.2.5) the log shows only this:

/[06:50:22,774][WARNING][main][TcpDiscoverySpi] Node has not been connected
to topology and will repeat join process. Check remote nodes logs for
possible error messages. Note that large topology may require significant
time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration
property if getting this message on the starting nodes
[networkTimeout=5000]/

I tried changing the timeout to 2000 but it made no difference.

Any ideas? This seems like an incredibly simple setup, so I must be missing
something obvious.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
