Hi Riguz,

> Start 6 new nodes with new configuration N=11, while keeping the previous nodes running

This step probably won't work as expected, since it will create a new group rather than add nodes to the original group. To change the configuration (add or remove nodes), we must use the setConfiguration API; see
https://github.com/apache/ratis/blob/bd83e7d7fd41540c8bda6bd92a52ac99ccec2076/ratis-client/src/main/java/org/apache/ratis/client/api/AdminApi.java#L35

Hope it helps. Thanks a lot for trying Ratis!
Tsz-Wo

On Fri, Jul 1, 2022 at 12:30 AM Riguz Lee <[email protected]> wrote:
> Hi,
>
> I'm testing scaling the Raft cluster up and down, but Ratis is not working
> as expected in the new cluster. My steps are:
>
> * Initialize a cluster with 5 nodes. The size and peers of the cluster are
>   configured in a configuration file, let's say N=5. The cluster works
>   perfectly, and Raft logs are synchronized across the cluster.
>
> * Start 6 new nodes with new configuration N=11, while keeping the
>   previous nodes running
>
> * Recreate the previous nodes with N=11 one by one
>
> According to the Raft paper, Raft should be able to handle configuration
> changes by design, but after the above steps, what I've found is that:
>
> - New nodes are not able to join the cluster
>
> - Old nodes still have a size of 5 (by
>   *client.getGroupManagementApi(peerId).info(groupId)*)
>
> So how should I scale the cluster correctly? A few thoughts of mine:
>
> * Definitely the old cluster should not be stopped while starting new
>   nodes, otherwise the new nodes might be able to elect a new leader (e.g.
>   N=11 with 6 new nodes) and the Raft logs in the old nodes will be
>   overwritten.
>
> * It's possible to update the old configuration first by using
>   client.admin().setConfiguration(), let's say set N=11 first, then start
>   the new nodes. However, since 5 < 11/2, the cluster won't be able to
>   elect a leader until at least 1 new node joins.
>
> * Or maybe we should limit the count when scaling? From N=5 -> N=7 -> N=9
>   -> N=11.
>
> Thanks,
>
> Riguz Lee
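
As a side note on the quorum arithmetic in the quoted message (5 < 11/2, and growing from N=5 -> N=7 -> N=9 -> N=11), here is a minimal sketch of the majority calculation. It is plain Java and does not use the Ratis API; the class and method names (QuorumMath, majority, hasQuorum) are hypothetical and chosen only for illustration. It says nothing about how Ratis itself stages a configuration change.

```java
// Sketch only: plain majority arithmetic, not part of Ratis.
// QuorumMath, majority and hasQuorum are hypothetical names.
final class QuorumMath {
    private QuorumMath() {}

    /** Smallest number of peers that forms a strict majority of n peers. */
    static int majority(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        return n / 2 + 1; // integer division: 5 -> 3, 11 -> 6
    }

    /** True if `live` reachable peers are a majority of an n-peer configuration. */
    static boolean hasQuorum(int live, int n) {
        return live >= majority(n);
    }

    public static void main(String[] args) {
        // The 5 original nodes measured against an 11-peer configuration.
        System.out.println("majority(11) = " + majority(11));
        System.out.println("5 of 11 has quorum: " + hasQuorum(5, 11));
        System.out.println("6 of 11 has quorum: " + hasQuorum(6, 11));

        // Growing two peers at a time: the existing peers alone remain
        // a majority of each next configuration.
        int[] sizes = {5, 7, 9, 11};
        for (int i = 0; i + 1 < sizes.length; i++) {
            System.out.println(sizes[i] + " of " + sizes[i + 1] + " has quorum: "
                    + hasQuorum(sizes[i], sizes[i + 1]));
        }
    }
}
```

Under this arithmetic, the 5 original nodes fall one short of the 6 needed for N=11, while each step of the 5 -> 7 -> 9 -> 11 path leaves the already-running nodes with a majority of the next configuration.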
