Thanks for the suggestion, but unfortunately it makes no difference. All three nodes are now using the same configuration, except that I've put each machine's local IP address at the top of the list:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                        <property name="addresses">
                            <list>
                                <value>192.168.56.1</value>   <!-- windows/host -->
                                <value>192.168.56.101</value> <!-- linux1 -->
                                <value>192.168.56.102</value> <!-- linux2 -->
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>

I've noticed something interesting. If I start the Windows node first, followed by just one of the Linux nodes, the Linux node doesn't seem to be able to maintain a stable connection, and repeatedly connects and then disconnects:

[10:00:32] Topology snapshot [ver=1, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:01:00] Topology snapshot [ver=3, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:01:41] Topology snapshot [ver=7, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:01:41] Topology snapshot [ver=7, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:02:21] Topology snapshot [ver=11, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:02:21] Topology snapshot [ver=11, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:02:42] Topology snapshot [ver=13, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:02:42] Topology snapshot [ver=13, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:06:25] Topology snapshot [ver=35, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:06:25] Topology snapshot [ver=35, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:07:46] Topology snapshot [ver=43, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:07:46] Topology snapshot [ver=43, servers=1, clients=0, CPUs=8, heap=1.0GB]

This is from the log (it happens every 20 seconds):

[10:07:46,035][INFO][disco-event-worker-#46%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=a5982ff4-a30e-479d-b4c4-d2f18880d100, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.2.15, 127.0.0.1, 192.168.56.102], sockAddrs=[/192.168.56.102:47500, /0:0:0:0:0:0:0:1%lo:47500, /10.0.2.15:47500, /10.0.2.15:47500, /127.0.0.1:47500, /192.168.56.102:47500], discPort=47500, order=42, intOrder=22, lastExchangeTime=1464080845973, loc=false, ver=1.6.0#20160518-sha1:0b22c45b, isClient=false]
[10:07:46,035][INFO][disco-event-worker-#46%null%][GridDiscoveryManager] Topology snapshot [ver=43, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:07:46,036][WARNING][disco-event-worker-#46%null%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=a5982ff4-a30e-479d-b4c4-d2f18880d100, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.2.15, 127.0.0.1, 192.168.56.102], sockAddrs=[/192.168.56.102:47500, /0:0:0:0:0:0:0:1%lo:47500, /10.0.2.15:47500, /10.0.2.15:47500, /127.0.0.1:47500, /192.168.56.102:47500], discPort=47500, order=42, intOrder=22, lastExchangeTime=1464080845973, loc=false, ver=1.6.0#20160518-sha1:0b22c45b, isClient=false]
[10:07:46,036][INFO][disco-event-worker-#46%null%][GridDiscoveryManager] Topology snapshot [ver=43, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:07:46,043][INFO][exchange-worker-#49%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=42, minorTopVer=0], evt=NODE_JOINED, node=a5982ff4-a30e-479d-b4c4-d2f18880d100]
[10:07:46,049][INFO][exchange-worker-#49%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=43, minorTopVer=0], evt=NODE_FAILED, node=a5982ff4-a30e-479d-b4c4-d2f18880d100]
[10:07:56,298][WARNING][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout. Current timeout: 9760.

Thanks,
Graham

On 23 May 2016 at 16:00, vkulichenko <[email protected]> wrote:
> Graham,
>
> Default config means that multicast is used for discovery. Can you try
> static IP configuration [1] and see if the issue is reproduced?
>
> [1]
> https://apacheignite.readme.io/docs/cluster-config#static-ip-based-discovery
>
> -Val
>
>
>
> --
> View this message in context:
> http://apache-ignite-users.70518.x6.nabble.com/Nodes-running-on-different-operating-systems-tp5098p5126.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
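P.S. In case it's useful context: the warning above suggests increasing 'ackTimeout', and as far as I can tell from the docs that property lives on TcpDiscoverySpi, so it could be raised in the same bean as the IP finder. I haven't tried this yet, and the 30-second value below is just an arbitrary guess on my part, not a recommendation:

```xml
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
    <!-- Untested sketch: raise the discovery ack timeout (milliseconds)
         as hinted by the "Timed out waiting for message delivery receipt"
         warning. 30000 is an arbitrary placeholder value. -->
    <property name="ackTimeout" value="30000"/>
    <property name="ipFinder">
        <!-- same TcpDiscoveryVmIpFinder bean as in the config above -->
    </property>
</bean>
```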
