Hi,
I am trying to create a cluster of Apache Ignite server containers, but I am unable to bring it up.

*Setup:*
To start, I created two VMs on two separate host machines and am trying to launch one Apache Ignite server container (Docker) on each VM. The VMs are accessible through floating IPs (e.g., VM1-172.26.116.67, VM2-172.26.116.150), and the containers use host networking. The containers can also ping each other.

*Testing:*
I am using $IGNITE_HOME/bin/ignite.sh, but I have changed the default configuration to enable discovery.

<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="cacheConfiguration">
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            *<property name="offHeapMaxMemory" value="0"/>*
        </bean>
    </property>
    <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="localPort" value="47500"/>
            *<property name="networkTimeout" value="20000" />*
            <property name="ipFinder">
                <!--bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder" -->
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.cloud.TcpDiscoveryCloudIpFinder"/>
                <property name="addresses">
                    <list>
                        <!-- In distributed environment, replace with actual host IP address.> -->
                        *<value>127.0.0.1:47100..47509</value>*
                        *<value>172.26.116.67:47100..47509</value>*
                    </list>
                </property>
                </bean>
            </property>
            <property name="ackTimeout" value="50"/>
            <property name="socketTimeout" value="200"/>
            <property name="heartbeatFrequency" value="100"/>
        </bean>
    </property>
    <property name="communicationSpi">
        <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
            <!-- Override local port. -->
            <property name="localPort" value="47100"/>
            *<property name="sharedMemoryPort" value="-1"/>*
        </bean>
    </property>
</bean>

*Issue:*
When I start the 1st Apache Ignite server container on VM1, I see warnings about GC pauses on a remote node, even though I have tuned off-heap memory (<property name="offHeapMaxMemory" value="0"/>) and no remote node is running (verified through Visor).

[22:55:24,132][INFO][main][TcpCommunicationSpi] Successfully bound to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0]
[22:55:24,854][INFO][main][TcpDiscoverySpi] Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0]
*[22:55:26,379][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=50]*
*[22:55:26,482][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=100]*
*[22:55:26,684][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=200]*
*[22:55:27,086][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=400]*
*[22:55:27,888][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=800]*
*[22:55:29,491][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=1600]*
*[22:55:32,696][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=3200]*
*[22:55:39,098][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=6400]*
*[22:55:51,904][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=12800]*
*[22:56:17,523][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=25600]*
[22:56:18,033][WARNING][main][GridCacheProcessor] *Eviction policy not enabled with ONHEAP_TIERED mode for cache* (entries will not be moved to off-heap store): default
[22:56:18,120][SEVERE][grid-nio-worker-1-#38%null%][GridDirectParser] *Failed to read message* [msg=null, buf=java.nio.DirectByteBuffer[pos=5 lim=420 cap=32768], reader=null, ses=GridSelectorNioSessionImpl [selectorIdx=1, queueSize=1, writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5 lim=420 cap=32768], recovery=null, super=GridNioSessionImpl [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:48819, createTime=1480460126370, closeTime=0, bytesSent=0, bytesRcvd=420, sndSchedTime=1480460178019, lastSndTime=1480460178114, lastRcvTime=1480460178114, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@330f5ec2, directMode=true], GridConnectionBytesVerifyFilter], accepted=true]]]
……………
[22:56:18,127][WARNING][grid-nio-worker-1-#38%null%][TcpCommunicationSpi] *Failed to process selector key (will close):* GridSelectorNioSessionImpl [selectorIdx=1, queueSize=1, writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5 lim=420 cap=32768], recovery=null, super=GridNioSessionImpl [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:48819, createTime=1480460126370, closeTime=0, bytesSent=0, bytesRcvd=420, sndSchedTime=1480460178019, lastSndTime=1480460178114, lastRcvTime=1480460178114, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@330f5ec2, directMode=true], GridConnectionBytesVerifyFilter], accepted=true]]
[22:56:18,127][SEVERE][grid-nio-worker-1-#38%null%][TcpCommunicationSpi] Closing NIO session because of unhandled exception.
……………..

After a few seconds, this node starts with the logs below:

[22:56:18,573][INFO][exchange-worker-#47%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=1, minorTopVer=0], evt=NODE_JOINED, node=6a09c54d-9cbd-42f8-bdb9-522dc438ce1c]
………
[22:56:18,633][INFO][main][GridDiscoveryManager] *Topology snapshot [ver=1, servers=1, clients=0, CPUs=2, heap=1.0GB]*
[22:56:28,053][INFO][ignite-update-notifier-timer][GridUpdateNotifier] Your version is up to date.

As soon as I start the 2nd Apache Ignite server container on VM2, I get the logs below, although I have increased the default network timeout (<property name="networkTimeout" value="20000" />):

[23:18:10,313][INFO][main][TcpCommunicationSpi] Successfully bound to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0]
[23:18:10,946][INFO][main][TcpDiscoverySpi] Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0]
*[23:18:12,410][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=50]*
*[23:18:12,512][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=100]*
*[23:18:12,715][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=200]*
*[23:18:13,117][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=400]*
*[23:18:13,919][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=800]*
*[23:18:15,523][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=1600]*
*[23:18:18,728][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=3200]*
*[23:18:25,136][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=6400]*
*[23:18:37,942][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=12800]*
*[23:19:03,569][WARNING][main][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node) [curTimeout=25600]*
…………….
[23:19:55,021][SEVERE][grid-nio-worker-1-#38%null%][GridDirectParser] *Failed to read message* [msg=null, buf=java.nio.DirectByteBuffer[pos=5 lim=420 cap=32768], reader=null, ses=GridSelectorNioSessionImpl [selectorIdx=1, queueSize=1, writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5 lim=420 cap=32768], recovery=null, super=GridNioSessionImpl [locAddr=/172.20.29.33:47100, rmtAddr=/172.26.116.67:41952, createTime=1480461492404, closeTime=0, bytesSent=0, bytesRcvd=420, sndSchedTime=1480461594989, lastSndTime=1480461595001, lastRcvTime=1480461595001, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@9df8cc3, directMode=true], GridConnectionBytesVerifyFilter], accepted=true]]]
class org.apache.ignite.IgniteException: Invalid message type: -84
……………….
[23:19:55,058][WARNING][grid-nio-worker-0-#37%null%][TcpCommunicationSpi] *Failed to process selector key* (will close): GridSelectorNioSessionImpl [selectorIdx=0, queueSize=1, writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5 lim=420 cap=32768], recovery=null, super=GridNioSessionImpl [locAddr=/172.20.29.33:47100, rmtAddr=/172.26.116.67:54660, createTime=1480461492362, closeTime=0, bytesSent=0, bytesRcvd=420, sndSchedTime=1480461594989, lastSndTime=1480461595011, lastRcvTime=1480461595011, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@9df8cc3, directMode=true], GridConnectionBytesVerifyFilter], accepted=true]]
[23:19:55,058][SEVERE][grid-nio-worker-0-#37%null%][TcpCommunicationSpi] Closing NIO session because of unhandled exception.
………………
[23:19:55,610][INFO][exchange-worker-#47%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=1, minorTopVer=0], evt=NODE_JOINED, node=1af31843-483e-42b0-90e3-0afc7f772d61]
………………
[23:19:55,351][WARNING][main][GridCacheProcessor] Eviction policy not enabled with ONHEAP_TIERED mode for cache (entries will not be moved to off-heap store): default
………………….
[23:19:55,810][INFO][main][GridDiscoveryManager] *Topology snapshot [ver=1, servers=1, clients=0, CPUs=2, heap=1.0GB]*
[23:20:05,051][INFO][ignite-update-notifier-timer][GridUpdateNotifier] Your version is up to date.

The servers are not joining each other. Can you please help? I am attaching both log files for reference. Let me know if you need any further information.
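
To make the address list in the configuration above concrete, the sketch below expands the `172.26.116.67:47100..47509` entry into individual `host:port` pairs and checks which of the two ports configured above (discovery `localPort` 47500, communication `localPort` 47100) fall inside it. This is plain Java, not Ignite code: the class `AddressRangeSketch` and the method `expand` are hypothetical names, and the sketch assumes that the `first..last` notation denotes an inclusive port range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it does not call any Ignite API.
class AddressRangeSketch {

    // Expands "host:first..last", "host:port", or a bare "host" into a list
    // of "host:port" strings (a bare host is returned unchanged).
    // Assumption: "first..last" is an inclusive port range.
    static List<String> expand(String entry) {
        List<String> out = new ArrayList<>();
        int colon = entry.lastIndexOf(':');
        if (colon < 0) {
            out.add(entry);
            return out;
        }
        String host = entry.substring(0, colon);
        String ports = entry.substring(colon + 1);
        int dots = ports.indexOf("..");
        int first = Integer.parseInt(dots < 0 ? ports : ports.substring(0, dots));
        int last = dots < 0 ? first : Integer.parseInt(ports.substring(dots + 2));
        for (int p = first; p <= last; p++) {
            out.add(host + ":" + p);
        }
        return out;
    }

    public static void main(String[] args) {
        // Entry copied from the <list> of addresses in the configuration above.
        List<String> addrs = expand("172.26.116.67:47100..47509");
        System.out.println("addresses in range: " + addrs.size());
        // Discovery SPI localPort from the configuration above.
        System.out.println("contains 47500: " + addrs.contains("172.26.116.67:47500"));
        // Communication SPI localPort from the configuration above.
        System.out.println("contains 47100: " + addrs.contains("172.26.116.67:47100"));
    }
}
```

Under that assumption, the entry covers 410 ports, and both the discovery port (47500) and the communication port (47100) lie inside the range.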

Thanks & Regards,
Piali Mazumder Nath

VM1-ignite-6a09c54d.0.log (169K) <http://apache-ignite-users.70518.x6.nabble.com/attachment/9287/0/VM1-ignite-6a09c54d.0.log>
VM2-ignite-1af31843.0.log (152K) <http://apache-ignite-users.70518.x6.nabble.com/attachment/9287/1/VM2-ignite-1af31843.0.log>

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Unable-to-create-cluster-of-Apache-Ignite-Server-Containers-running-on-individual-VMs-tp9287.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.