Thanks for your explanation. Taking a closer look, I can see that after the new server installs the snapshot, it will call the state machine's `reinitialize` method and recover the state.
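The catch-up flow discussed in this thread (leader snapshots periodically, purges old log entries, and a newly joined server installs the snapshot before replaying the remaining entries) can be sketched as a small simulation. All names below are illustrative, not Ratis APIs:

```python
# Illustrative simulation of catch-up: the leader snapshots and purges old
# entries; a new follower first installs the snapshot (reinitializing its
# state machine), then applies the remaining log entries.
# These classes are made up for this sketch; they are not Ratis classes.

class Leader:
    def __init__(self):
        self.log = []            # entries since the last snapshot
        self.snapshot = None     # (last_included_index, state)
        self.state = 0
        self.last_index = -1

    def append(self, delta):
        self.last_index += 1
        self.log.append((self.last_index, delta))
        self.state += delta

    def take_snapshot(self):
        # Snapshot the current state, then purge the covered log entries.
        self.snapshot = (self.last_index, self.state)
        self.log = []

class Follower:
    def __init__(self):
        self.state = 0
        self.applied_index = -1

    def install_snapshot(self, snapshot):
        # Analogous to reinitialize(): recover state from the snapshot.
        self.applied_index, self.state = snapshot

    def apply(self, entry):
        index, delta = entry
        self.state += delta
        self.applied_index = index

def catch_up(leader, follower):
    # Leader sends the snapshot first (if any), then the remaining entries.
    if leader.snapshot is not None:
        follower.install_snapshot(leader.snapshot)
    for entry in leader.log:
        follower.apply(entry)

leader = Leader()
for delta in (1, 2, 3):
    leader.append(delta)
leader.take_snapshot()           # entries 0..2 purged, state 6 kept
for delta in (4, 5):
    leader.append(delta)

follower = Follower()
catch_up(leader, follower)
print(follower.state)            # 15: snapshot state (6) plus entries 4 and 5
```

The point of the simulation is that the follower never replays the purged entries one by one; it jumps straight to the snapshot state and only applies what the leader still retains.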
Best,
tison.

On Mon, Jul 4, 2022 at 1:51 AM, Tsz Wo Sze <[email protected]> wrote:

> Hi tison,
>
> > When a new node joins an existing cluster in this case, how exactly does
> > it apply logs? Will it apply raft logs from the beginning, one by one?
>
> The leader takes a snapshot from time to time and then purges the old log
> entries. When a new server joins the group, the leader will first send a
> snapshot, if there is any, and then send the remaining log entries.
>
> Tsz-Wo
>
>
> On Sun, Jul 3, 2022 at 9:26 AM tison <[email protected]> wrote:
>
>> Hi Tsz-Wo,
>>
>> I have one question that may be a bit off topic:
>>
>> When a new node joins an existing cluster in this case, how exactly does
>> it apply logs? Will it apply raft logs from the beginning, one by one? In
>> this case TiKV sends a snapshot of the state machine from the leader to
>> the new node to speed up the catch-up process, and I'd like to know how
>> Ratis implements catch-up.
>>
>> Best,
>> tison.
>>
>>
>> On Sat, Jul 2, 2022 at 1:31 AM, Tsz Wo Sze <[email protected]> wrote:
>>
>>> > * It's possible to update the old configuration first by using
>>> > client.admin().setConfiguration(), let's say set N=11 first, then
>>> > start the new nodes. However, since 5 < 11/2, the cluster won't be
>>> > able to elect a leader until at least 1 new node joins.
>>>
>>> Yes, you are right. Also, even if one node has joined, the group has to
>>> wait for it to catch up with the previous log entries in order to obtain
>>> a majority for committing new entries.
>>>
>>> > * Or maybe we should limit the count when scaling? From N=5 -> N=7 ->
>>> > N=9 -> N=11.
>>>
>>> We may start the 6 new nodes as listeners first. Listeners receive log
>>> entries but they are not voting members and they won't be counted
>>> towards the majority. When the listeners catch up, we may change them to
>>> normal nodes so that they become voting members.
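Both of Tsz-Wo's points above come down to quorum arithmetic: only voting members count toward a majority, so listeners can catch up without affecting leader election. A minimal sketch of that arithmetic (illustrative, not Ratis code):

```python
# Quorum arithmetic behind the listener suggestion. A leader needs votes
# from a strict majority of voting members; listeners are not counted.

def majority(num_voting):
    return num_voting // 2 + 1

def can_elect_leader(voting_up, num_voting):
    return voting_up >= majority(num_voting)

# Jump straight to N=11 voting members while only the 5 old nodes are up:
print(can_elect_leader(5, 11))   # False: 5 < 6

# Start the 6 new nodes as listeners instead: the voting set stays N=5,
# so the 5 old nodes still form a quorum while the listeners catch up.
print(can_elect_leader(5, 5))    # True: 5 >= 3
```

Once the listeners have caught up, promoting them to voting members is a configuration change that does not risk losing quorum, because they are already current with the log.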
>>>
>>> Tsz-Wo
>>>
>>>
>>> On Fri, Jul 1, 2022 at 10:23 AM Tsz Wo Sze <[email protected]> wrote:
>>>
>>>> Hi Riguz,
>>>>
>>>> > Start 6 new nodes with new configuration N=11, while keeping the
>>>> > previous nodes running
>>>>
>>>> This step probably won't work as expected since it will create a new
>>>> group rather than adding nodes to the original group. We must use the
>>>> setConfiguration API to change the configuration (add/remove nodes);
>>>> see
>>>> https://github.com/apache/ratis/blob/bd83e7d7fd41540c8bda6bd92a52ac99ccec2076/ratis-client/src/main/java/org/apache/ratis/client/api/AdminApi.java#L35
>>>>
>>>> Hope it helps. Thanks a lot for trying Ratis!
>>>>
>>>> Tsz-Wo
>>>>
>>>> On Fri, Jul 1, 2022 at 12:30 AM Riguz Lee <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm testing scaling the raft cluster up and down, but Ratis is not
>>>>> working as expected in the new cluster. My steps are:
>>>>>
>>>>> * Initialize a cluster with 5 nodes; the size and peers of the cluster
>>>>> are configured in a configuration file, let's say N=5. The cluster
>>>>> works perfectly, and raft logs are synchronized across the cluster.
>>>>>
>>>>> * Start 6 new nodes with a new configuration N=11, while keeping the
>>>>> previous nodes running.
>>>>>
>>>>> * Recreate the previous nodes with N=11 one by one.
>>>>>
>>>>> According to the raft paper, raft should be able to handle
>>>>> configuration changes by design, but after the above steps, what I've
>>>>> found is that:
>>>>>
>>>>> - New nodes are not able to join the cluster
>>>>>
>>>>> - Old nodes still report a size of 5 (by
>>>>> *client.getGroupManagementApi(peerId).info(groupId)*)
>>>>>
>>>>> So how should I scale the cluster correctly? A few thoughts of mine:
>>>>>
>>>>> * Definitely the old cluster should not be stopped while starting new
>>>>> nodes; otherwise the new nodes might be able to elect a new leader
>>>>> (e.g. N=11 with 6 new nodes) and the raft logs in the old nodes would
>>>>> be overridden.
>>>>>
>>>>> * It's possible to update the old configuration first by using
>>>>> client.admin().setConfiguration(), let's say set N=11 first, then
>>>>> start the new nodes. However, since 5 < 11/2, the cluster won't be
>>>>> able to elect a leader until at least 1 new node joins.
>>>>>
>>>>> * Or maybe we should limit the count when scaling? From N=5 -> N=7 ->
>>>>> N=9 -> N=11.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Riguz Lee
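Riguz's stepwise idea (N=5 -> N=7 -> N=9 -> N=11) can be checked with simple majority arithmetic. The sketch below (illustrative, not Ratis code) shows which target sizes leave the 5 original nodes with a quorum before any newly added node has caught up:

```python
# Check whether the 5 original voters still form a quorum at each target
# size, before any newly added node has caught up and can vote usefully.

def majority(n):
    return n // 2 + 1

old_up = 5
for n in (7, 9, 11):
    ok = old_up >= majority(n)
    print(f"N={n}: majority={majority(n)}, old nodes alone form quorum: {ok}")
# N=7  -> majority 4, True
# N=9  -> majority 5, True
# N=11 -> majority 6, False (why jumping straight to N=11 stalls)
```

So incremental steps of at most +2 keep the old members able to commit the configuration change on their own, which matches Riguz's intuition; the listener approach suggested earlier in the thread avoids the issue entirely by keeping the voting set unchanged until the new nodes are current.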
