Ignite 2.5 nodes do not rejoin the cluster after restart

szj Tue, 05 Jun 2018 13:39:20 -0700

Hi 

I'm completely new to Ignite. I installed Ignite 2.5 and created a simple
3-node cluster (tried with 2 nodes too). I set a custom ConsistentID for
each node (same as the hostname). I also enabled *persistence* (does it
matter?). The cluster looks active and contains all 3 nodes. I print the
baseline and get:


Cluster state: active 
Current topology version: 39 

Baseline nodes: 
    ConsistentID=hostname1, STATE=ONLINE 
    ConsistentID=hostname2, STATE=ONLINE 
    ConsistentID=hostname3, STATE=ONLINE 

Using sqlline.sh I created an SQL table there WITH "template=replicated". I
added a simple row to the table. Using ignitevisorcmd.sh I can see that the
row I added is in the cache of each node (cache -scan
-c=SQL_PUBLIC_tablename -id8=nodeid) so it does replicate as expected. 

Now on any node I run ignitevisorcmd.sh -e="'open;kill -r -al'" which
triggers the restart of the local node. I also tried kill -k and then manual
startup. I cannot use systemctl because the system does not have systemd
(don't ask why...). This shouldn't be relevant though, right? Systemd would
just kill the process tree. Anyway, to my amazement the node refuses to
rejoin with: 

[22:14:50,844][SEVERE][tcp-disco-msg-worker-#2][TcpDiscoverySpi]
TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node
in order to prevent cluster wide instability. 
class org.apache.ignite.IgniteException: Node with BaselineTopology cannot
join mixed cluster running in compatibility mode 
... big stack trace cut ... 

I tried to google the whole Internet to no avail. The doc at
https://apacheignite.readme.io/docs/baseline-topology#section-usage-scenarios
clearly states: 

/Node Restart 
Scenario: a node needs to be restarted. The downtime will be short. 

Steps: 

Stop the node. 
Start the node again. After that: 
No baseline topology-related changes are required. The node preserves its
consistentId after the restart. So the cluster and the baseline topology
just takes the node back. 
If during the node's downtime some data was updated, the data from modified
partitions will be copied to the restarted node from the others. 
/

This obviously doesn't happen in my case. The only way I can start a stopped
node again is to remove it from the baseline by running control.sh --baseline
remove consistentIDofThatHost on another node (the stopped host shows as
OFFLINE in the baseline). After this the node starts, and I then need to add
it back to the baseline.
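
To make the order of operations explicit, here is a minimal sketch of that
workaround as a shell helper. It only prints the commands and runs nothing
itself. The function name rejoin_steps and the CONTROL variable are made up
for illustration, and it assumes that control.sh --baseline add is the way to
put the node back into the baseline:

```shell
#!/bin/sh
# Hypothetical helper: prints the workaround steps for one node.
# CONTROL is a placeholder for the path to Ignite's control.sh.
CONTROL="${CONTROL:-control.sh}"

rejoin_steps() {
    node_id="$1"   # consistentId of the node that refuses to rejoin

    # Step 1: on a node that is still running, drop the OFFLINE node.
    echo "$CONTROL --baseline remove $node_id"

    # Step 2: done by hand on the stopped host, so printed as a comment.
    echo "# start Ignite on $node_id again"

    # Step 3: once the node has joined, put it back into the baseline.
    echo "$CONTROL --baseline add $node_id"

    # Step 4: print the baseline to check the node state.
    echo "$CONTROL --baseline"
}
```

Calling rejoin_steps hostname2 prints the four steps for that node.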

What am I doing wrong? I can't be THAT dumb, or can I? And what is a "mixed
cluster running in compatibility mode" anyway?

If that matters, my config is practically default, with only consistentId
set on each node and persistenceEnabled:

        <property name="consistentId" value="hostnameX"/>

        <property name="dataStorageConfiguration">
          <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
            <property name="defaultDataRegionConfiguration">
              <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="persistenceEnabled" value="true"/>
              </bean>
            </property>
          </bean>
        </property>



