Unable to create cluster of Apache Ignite Server Containers running on individual VMs

piali Tue, 29 Nov 2016 20:27:39 -0800

Hi,


I am trying to create a cluster of Apache Ignite server containers, but I
am unable to bring it up.


*Setup:*

To start, I have created two VMs on two separate host machines and am trying
to launch one Apache Ignite server container (Docker) on each VM.

The VMs are accessible via floating IPs (e.g., VM1: 172.26.116.67,
VM2: 172.26.116.150), and the containers use host networking.

The containers can also ping each other.


*Testing:*

I am starting the nodes with $IGNITE_HOME/bin/ignite.sh, but have changed
the default configuration to enable discovery.



    <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
      <property name="cacheConfiguration">
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
          <property name="offHeapMaxMemory" value="0"/>
        </bean>
      </property>

      <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
      <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
          <property name="localPort" value="47500"/>
          <property name="networkTimeout" value="20000" />
          <property name="ipFinder">
            <!--bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder"-->
            <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.cloud.TcpDiscoveryCloudIpFinder">
              <property name="addresses">
                <list>
                  <!-- In distributed environment, replace with actual host IP address. -->
                  <value>127.0.0.1:47100..47509</value>
                  <value>172.26.116.67:47100..47509</value>
                </list>
              </property>
            </bean>
          </property>
          <property name="ackTimeout" value="50"/>
          <property name="socketTimeout" value="200"/>
          <property name="heartbeatFrequency" value="100"/>
        </bean>
      </property>

      <property name="communicationSpi">
        <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
          <!-- Override local port. -->
          <property name="localPort" value="47100"/>
          <property name="sharedMemoryPort" value="-1"/>
        </bean>
      </property>
    </bean>
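
As a sanity check on the address list above, here is a minimal Python sketch of which ports the range 47100..47509 covers. It assumes the range syntax is inclusive on both ends; the helper name expand_port_range is hypothetical and is not part of Ignite.

```python
def expand_port_range(address: str) -> list[int]:
    """Expand 'host:first..last' into the list of ports it covers.

    Assumption: both ends of the range are inclusive.
    """
    _host, _, ports = address.rpartition(":")
    first, _, last = ports.partition("..")
    last = last or first  # a single port has no '..' part
    return list(range(int(first), int(last) + 1))


ports = expand_port_range("172.26.116.67:47100..47509")
print(len(ports))        # how many ports each configured address covers
print(47100 in ports)    # localPort of TcpCommunicationSpi in the config
print(47500 in ports)    # localPort of TcpDiscoverySpi in the config
```

Under that assumption, each configured address covers both the communication port (47100) and the discovery port (47500), in case that is relevant to the problem.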

*Issue:*


When I start the first Apache Ignite server container on VM1, I see warnings
about long GC pauses on a remote node, even though I tuned off-heap memory
(<property name="offHeapMaxMemory" value="0"/>) and no remote node is
running (verified through Visor).


[22:55:24,132][INFO][main][TcpCommunicationSpi] Successfully bound to TCP
port [port=47100, locHost=0.0.0.0/0.0.0.0]

[22:55:24,854][INFO][main][TcpDiscoverySpi] Successfully bound to TCP port
[port=47500, localHost=0.0.0.0/0.0.0.0]

*[22:55:26,379][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=50]*

*[22:55:26,482][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=100]*

*[22:55:26,684][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=200]*

*[22:55:27,086][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=400]*

*[22:55:27,888][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=800]*

*[22:55:29,491][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=1600]*

*[22:55:32,696][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=3200]*

*[22:55:39,098][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=6400]*

*[22:55:51,904][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=12800]*

*[22:56:17,523][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=25600]*

[22:56:18,033][WARNING][main][GridCacheProcessor] *Eviction policy not
enabled with ONHEAP_TIERED mode for cache* (entries will not be moved to
off-heap store): default

[22:56:18,120][SEVERE][grid-nio-worker-1-#38%null%][GridDirectParser] *Failed
to read message* [msg=null, buf=java.nio.DirectByteBuffer[pos=5 lim=420
cap=32768], reader=null, ses=GridSelectorNioSessionImpl [selectorIdx=1,
queueSize=1, writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768
cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5 lim=420 cap=32768],
recovery=null, super=GridNioSessionImpl [locAddr=/127.0.0.1:47100,
rmtAddr=/127.0.0.1:48819, createTime=1480460126370, closeTime=0,
bytesSent=0, bytesRcvd=420, sndSchedTime=1480460178019,
lastSndTime=1480460178114, lastRcvTime=1480460178114, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@330f5ec2, directMode=true],
GridConnectionBytesVerifyFilter], accepted=true]]]

……………

[22:56:18,127][WARNING][grid-nio-worker-1-#38%null%][TcpCommunicationSpi]
*Failed to process selector key (will close):* GridSelectorNioSessionImpl
[selectorIdx=1, queueSize=1, writeBuf=java.nio.DirectByteBuffer[pos=0
lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5 lim=420
cap=32768], recovery=null, super=GridNioSessionImpl
[locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:48819,
createTime=1480460126370, closeTime=0, bytesSent=0, bytesRcvd=420,
sndSchedTime=1480460178019, lastSndTime=1480460178114,
lastRcvTime=1480460178114, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@330f5ec2, directMode=true],
GridConnectionBytesVerifyFilter], accepted=true]]

[22:56:18,127][SEVERE][grid-nio-worker-1-#38%null%][TcpCommunicationSpi]
Closing NIO session because of unhandled exception.

……………..
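
The curTimeout values in the warnings above double each time, starting from the configured ackTimeout of 50. A small Python sketch that reproduces the sequence seen in the log; the doubling rule is inferred from the log output only, not taken from the Ignite source, and timeout_sequence is a hypothetical helper.

```python
def timeout_sequence(initial_ms: int, attempts: int) -> list[int]:
    """Timeouts that start at initial_ms and double on every attempt."""
    return [initial_ms * 2 ** i for i in range(attempts)]


# ackTimeout=50 from the configuration, 10 warnings in the log
seq = timeout_sequence(50, 10)
total_ms = sum(seq)
print(seq)
print(total_ms)
```

The total comes to roughly 51 seconds, which is in line with the first warning at 22:55:26 and the last one at 22:56:17.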



After a few seconds, this node starts with the following logs:

[22:56:18,573][INFO][exchange-worker-#47%null%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=1, minorTopVer=0], evt=NODE_JOINED,
node=6a09c54d-9cbd-42f8-bdb9-522dc438ce1c]

………

[22:56:18,633][INFO][main][GridDiscoveryManager] *Topology snapshot [ver=1,
servers=1, clients=0, CPUs=2, heap=1.0GB]*

[22:56:28,053][INFO][ignite-update-notifier-timer][GridUpdateNotifier] Your
version is up to date.



As soon as I start the second Apache Ignite server container on VM2, I get
the logs below, although I have increased the default network timeout
(<property name="networkTimeout" value="20000" />):

[23:18:10,313][INFO][main][TcpCommunicationSpi] Successfully bound to TCP
port [port=47100, locHost=0.0.0.0/0.0.0.0]

[23:18:10,946][INFO][main][TcpDiscoverySpi] Successfully bound to TCP port
[port=47500, localHost=0.0.0.0/0.0.0.0]

*[23:18:12,410][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=50]*

*[23:18:12,512][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=100]*

*[23:18:12,715][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=200]*

*[23:18:13,117][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=400]*

*[23:18:13,919][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=800]*

*[23:18:15,523][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=1600]*

*[23:18:18,728][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=3200]*

*[23:18:25,136][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=6400]*

*[23:18:37,942][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=12800]*

*[23:19:03,569][WARNING][main][TcpDiscoverySpi] Timed out waiting for
message to be read (most probably, the reason is in long GC pauses on
remote node) [curTimeout=25600]*

…………….

[23:19:55,021][SEVERE][grid-nio-worker-1-#38%null%][GridDirectParser] *Failed
to read message* [msg=null, buf=java.nio.DirectByteBuffer[pos=5 lim=420
cap=32768], reader=null, ses=GridSelectorNioSessionImpl [selectorIdx=1,
queueSize=1, writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768
cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5 lim=420 cap=32768],
recovery=null, super=GridNioSessionImpl [locAddr=/172.20.29.33:47100,
rmtAddr=/172.26.116.67:41952, createTime=1480461492404, closeTime=0,
bytesSent=0, bytesRcvd=420, sndSchedTime=1480461594989,
lastSndTime=1480461595001, lastRcvTime=1480461595001, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@9df8cc3, directMode=true],
GridConnectionBytesVerifyFilter], accepted=true]]]

class org.apache.ignite.IgniteException: Invalid message type: -84

……………….

[23:19:55,058][WARNING][grid-nio-worker-0-#37%null%][TcpCommunicationSpi]
*Failed to process selector key* (will close): GridSelectorNioSessionImpl
[selectorIdx=0, queueSize=1, writeBuf=java.nio.DirectByteBuffer[pos=0
lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5 lim=420
cap=32768], recovery=null, super=GridNioSessionImpl
[locAddr=/172.20.29.33:47100, rmtAddr=/172.26.116.67:54660,
createTime=1480461492362, closeTime=0, bytesSent=0, bytesRcvd=420,
sndSchedTime=1480461594989, lastSndTime=1480461595011,
lastRcvTime=1480461595011, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@9df8cc3, directMode=true],
GridConnectionBytesVerifyFilter], accepted=true]]

[23:19:55,058][SEVERE][grid-nio-worker-0-#37%null%][TcpCommunicationSpi]
Closing NIO session because of unhandled exception.

………………

 
[23:19:55,610][INFO][exchange-worker-#47%null%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=1, minorTopVer=0], evt=NODE_JOINED,
node=1af31843-483e-42b0-90e3-0afc7f772d61]

………………

[23:19:55,351][WARNING][main][GridCacheProcessor] Eviction policy not
enabled with ONHEAP_TIERED mode for cache (entries will not be moved to
off-heap store): default

………………….

[23:19:55,810][INFO][main][GridDiscoveryManager] *Topology snapshot [ver=1,
servers=1, clients=0, CPUs=2, heap=1.0GB]*

[23:20:05,051][INFO][ignite-update-notifier-timer][GridUpdateNotifier] Your
version is up to date.
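
Regarding "Invalid message type: -84" in the VM2 log above: read as an unsigned byte, -84 is 0xAC. The Python sketch below only does that conversion and compares it with the first byte of the standard Java serialization stream header (java.io.ObjectStreamConstants.STREAM_MAGIC, 0xACED). Whether the byte really comes from a serialized Java object is only a guess on my side.

```python
import struct

signed_type = -84                      # from "Invalid message type: -84"
unsigned_type = signed_type & 0xFF     # reinterpret the signed byte as unsigned
print(hex(unsigned_type))

JAVA_STREAM_MAGIC = 0xACED             # standard Java serialization header
magic_first_byte = struct.pack(">H", JAVA_STREAM_MAGIC)[0]
print(unsigned_type == magic_first_byte)
```

If that guess is right, the communication port (47100) may be receiving something other than a communication message, but I cannot confirm this from the logs alone.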



The two servers are not joining each other; each one reports a topology
with servers=1.

Can you please help?


I am attaching both the log files for reference.

Let me know if you need any further information.



Thanks & Regards,

Piali Mazumder Nath


VM1-ignite-6a09c54d.0.log (169K) 
<http://apache-ignite-users.70518.x6.nabble.com/attachment/9287/0/VM1-ignite-6a09c54d.0.log>
VM2-ignite-1af31843.0.log (152K) 
<http://apache-ignite-users.70518.x6.nabble.com/attachment/9287/1/VM2-ignite-1af31843.0.log>




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Unable-to-create-cluster-of-Apache-Ignite-Server-Containers-running-on-individual-VMs-tp9287.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
