Hi,



I'm testing scaling the Raft cluster up and down, but Ratis is not working as 
expected in the new cluster. My steps are:




* Initialize a cluster with 5 nodes. The size and peers of the cluster are 
configured in a configuration file, say N=5. The cluster works perfectly, and 
Raft logs are synchronized across the cluster.

* Start 6 new nodes with the new configuration N=11, while keeping the previous 
nodes running.

* Recreate the previous nodes with N=11, one by one.




According to the Raft paper, Raft should be able to handle configuration changes 
by design, but after the above steps, what I've found is that:




- New nodes are not able to join the cluster

- Old nodes still report a size of 5 (via 
client.getGroupManagementApi(peerId).info(groupId))




So how should I scale the cluster correctly? A few thoughts of mine:




* The old cluster should definitely not be stopped while starting the new 
nodes; otherwise the new nodes might be able to elect a new leader on their own 
(e.g. N=11 with 6 new nodes), and the Raft logs on the old nodes would be 
overwritten.

* It's possible to update the old configuration first by using 
client.admin().setConfiguration(), say setting N=11 first, and then start the 
new nodes. However, since 5 < 11/2 (a majority of 11 is 6), the cluster won't 
be able to elect a leader until at least 1 new node joins.

* Or maybe we should limit how many nodes are added per step? For example, 
N=5 -> N=7 -> N=9 -> N=11.
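
To make the majority arithmetic behind these thoughts concrete, here is a 
minimal sketch in plain Java. It does not call Ratis at all: QuorumSketch, 
majority, hasQuorum and stepwiseSizes are hypothetical names I made up for 
illustration, and it assumes a simple strict-majority quorum (n/2 + 1), which 
is my reading of the Raft paper rather than anything confirmed about Ratis 
internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Ratis: plain quorum arithmetic only.
final class QuorumSketch {
    // Smallest number of voters that forms a strict majority of n.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // True if `live` voters can form a majority in a configuration of `total`.
    static boolean hasQuorum(int live, int total) {
        return live >= majority(total);
    }

    // Cluster sizes visited when growing from `from` to `to`,
    // adding at most `step` peers per configuration change.
    static List<Integer> stepwiseSizes(int from, int to, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int n = from; n < to; n += step) {
            sizes.add(n);
        }
        sizes.add(to);
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(majority(11));            // 6
        System.out.println(hasQuorum(5, 11));        // false: 5 old nodes alone
        System.out.println(hasQuorum(6, 11));        // true: 6 new nodes alone
        System.out.println(stepwiseSizes(5, 11, 2)); // [5, 7, 9, 11]
    }
}
```

Under that assumption, the 6 new nodes alone already form a majority of N=11, 
while the 5 old nodes do not. With steps of 2, the existing members always hold 
a majority of the next configuration (5 of 7, 7 of 9, 9 of 11).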




Thanks,

Riguz Lee
