Hi Alexey,

In the current design, the group information (RaftConfiguration) is persisted
to the RaftLog, and doing so requires a Leader.

In the test, the Leader had not yet been elected before the server was
stopped. As a result, the group was not persisted.

This problem only happens when starting a new server, since its RaftLog is
empty. When the log is non-empty, the first log entry contains the group
information. As a workaround, could you simply pass the group peers to the
RaftServer.Builder even for RECOVER?
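A minimal sketch of the startup pattern discussed in this thread: FORMAT on the first start, RECOVER afterwards, and the full bootstrap peer list passed on every start, following the workaround above. To stay compilable without Ratis on the classpath, it uses a local StartupOption enum as a stand-in for the FORMAT/RECOVER startup options and plain strings for peers. StartupPlanner, StartupPlan, and the <storageDir>/<groupId> directory layout are hypothetical, for illustration only, not Ratis APIs.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.List;
import java.util.UUID;

// Hypothetical helper, not part of Ratis: decides how a single-group server
// should start and always carries the full bootstrap peer list.
public class StartupPlanner {
  // Local stand-in for the FORMAT/RECOVER startup options, so the sketch
  // compiles without Ratis on the classpath.
  public enum StartupOption { FORMAT, RECOVER }

  // What would be handed to RaftServer.Builder: the startup option and peers.
  public record StartupPlan(StartupOption option, List<String> peers) {}

  // Assumes the group's data lives in <storageDir>/<groupId>. A missing
  // directory means this is the first start (FORMAT); otherwise RECOVER.
  public static StartupPlan plan(File storageDir, UUID groupId, List<String> bootstrapPeers) {
    File groupDir = new File(storageDir, groupId.toString());
    StartupOption option = groupDir.isDirectory() ? StartupOption.RECOVER : StartupOption.FORMAT;
    // The peers are returned for RECOVER too, following the workaround of
    // passing them to the builder even when recovering.
    return new StartupPlan(option, List.copyOf(bootstrapPeers));
  }

  public static void main(String[] args) throws Exception {
    File storage = Files.createTempDirectory("ratis-sketch").toFile();
    UUID groupId = UUID.randomUUID();
    List<String> peers = List.of("s1:9872", "s2:9872", "s3:9872");

    System.out.println(plan(storage, groupId, peers).option()); // FORMAT

    // Simulate the directory created by the first (FORMAT) start.
    new File(storage, groupId.toString()).mkdirs();
    System.out.println(plan(storage, groupId, peers).option()); // RECOVER
  }
}
```

Per the thread, RECOVER takes the group from the RaftLog once the log is non-empty, so the peers passed on restart should only matter while the log is still empty.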

Tsz-Wo


On Fri, May 23, 2025 at 9:30 AM Alexey Goncharuk <[email protected]>
wrote:

> I tried to rework my code to pass the initial group configuration to
> the RaftServer builder, but the bootstrap group is still lost if the server
> is stopped quickly enough (it seems that the initial configuration is
> persisted asynchronously). I created RATIS-2306 for this.
> I am not sure about the complexity of the fix, but I can give it a shot if
> you think it is simple enough and can point me in the right direction.
>
> --Alexey
>
> On Wed, May 21, 2025 at 06:41, Tsz-Wo Nicholas Sze <[email protected]> wrote:
>
>> Hi Alexey,
>>
>> We have [1] for "Ratis Membership Change".  It would be great if you
>> could update it to include the single group case.  We may rename the
>> title to "Raft Group Membership".
>>
>> Thanks a lot for offering your help!
>>
>> Tsz-Wo
>> [1]
>> https://github.com/apache/ratis/blob/master/ratis-docs/src/site/markdown/membership-change.md
>>
>>
>> On Tue, May 20, 2025 at 12:49 PM Alexey Goncharuk <
>> [email protected]> wrote:
>>
>>> Got it, thanks! If there is a documentation page I can update (unless
>>> this info is already there and I missed it), I'll be happy to summarize
>>> this.
>>>
>>> On Tue, May 20, 2025 at 18:51, Tsz-Wo Nicholas Sze <[email protected]> wrote:
>>>
>>>> Hi Alexey,
>>>>
>>>> > No, I do not need multi-raft. ...
>>>>
>>>> For the non-multi-raft case, just build the RaftServer with the group.
>>>> We need to FORMAT it the first time and then keep using RECOVER.
>>>>
>>>> > ... the builder accepts a group during construction, but it also
>>>> > accepts the RECOVER startup option.  ...
>>>>
>>>> When a server starts, it will use the group id in the specified group
>>>> to read the corresponding local directory.  Then, it either formats
>>>> (creates a new directory) or recovers from an existing directory.
>>>>
>>>> For RECOVER, it reads the latest group information from the local
>>>> storage.  You are right that the group peers passed to the builder will be
>>>> ignored.  Only the group id is used.
>>>>
>>>> Tsz-Wo
>>>>
>>>> On Mon, May 19, 2025 at 2:18 AM Alexey Goncharuk <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Tsz-Wo, thanks for the reply!
>>>>>
>>>>> No, I do not need multi-raft. I saw the server builder; however, I am a
>>>>> bit confused about how to build the server. I see that the builder accepts
>>>>> a group during construction, but it also accepts the RECOVER startup
>>>>> option. Given that there is no way to know the last committed
>>>>> configuration until the server is started, does this mean that the group
>>>>> passed to the server builder should be treated as a 'bootstrap' group that
>>>>> is ignored when the server recovers and knows that the group has been
>>>>> reconfigured?
>>>>>
>>>>> --Alexey
>>>>>
>>>>> On Sat, May 17, 2025 at 21:19, Tsz-Wo Nicholas Sze <[email protected]> wrote:
>>>>>
>>>>>> Hi Alexey,
>>>>>>
>>>>>> First of all, does your application need multi-Raft, i.e. multiple
>>>>>> Raft groups?  For the single group case (non-multi-Raft), we should build
>>>>>> the servers with the group instead of using addGroup.
>>>>>>
>>>>>> As specified in the javadoc of GroupManagementApi, addGroup is an
>>>>>> operation that applies to a particular server; see [1]. You may take a
>>>>>> look at GroupManagementBaseTest [2].
>>>>>>
>>>>>> Hope it helps!  Please feel free to let us know if you have more
>>>>>> questions.
>>>>>>
>>>>>> Tsz-Wo
>>>>>> [1]
>>>>>> https://github.com/apache/ratis/blob/65fd4445335d0500fd372f37c8b7cb3c39259e87/ratis-client/src/main/java/org/apache/ratis/client/api/GroupManagementApi.java#L29
>>>>>> [2]
>>>>>> https://github.com/apache/ratis/blob/master/ratis-server/src/test/java/org/apache/ratis/server/impl/GroupManagementBaseTest.java
>>>>>>
>>>>>> On Fri, May 16, 2025 at 10:44 AM Alexey Goncharuk <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hello Ratis community,
>>>>>>>
>>>>>>> I am trying to understand the proper way to initialize a Ratis group.
>>>>>>> My expectation is that the group itself should maintain the list of
>>>>>>> peers through configuration changes and the raft log, and thus setting
>>>>>>> the list of peers is required only during the initial group setup.
>>>>>>> However, what I observe in tests is that the initial list of peers
>>>>>>> passed to the addGroup request is restored only when the corresponding
>>>>>>> configuration change has been committed by Ratis. More specifically, I
>>>>>>> see the following behavior:
>>>>>>>  * I create a Raft server S1 and initialize a group G1 [peers: S1, S2].
>>>>>>> I get a reply that the group was successfully added, but the
>>>>>>> configuration change is not committed because S2 does not exist and
>>>>>>> thus no leader can be elected.
>>>>>>>  * I stop the server S1 and then restart it with the RECOVER startup
>>>>>>> option. The group G1 is restored, but with an empty peers list.
>>>>>>>
>>>>>>> I was wondering whether this is expected behavior. I fully understand
>>>>>>> that subsequent configuration changes must go through the regular Raft
>>>>>>> protocol, but I would expect the initial configuration setup to be
>>>>>>> 'committed' unconditionally, and to be resettable with the FORMAT
>>>>>>> startup option if required.
>>>>>>> If this is expected behavior, what is the suggested way to do the
>>>>>>> initial group setup? The potential issue I have in mind is as follows:
>>>>>>> let's say I am setting up a 3-node cluster with a proper initial
>>>>>>> configuration, and the addGroup request succeeds on all nodes, but
>>>>>>> shortly afterwards one of the nodes gets disconnected and restarted.
>>>>>>> The two remaining nodes will be able to commit the proposed
>>>>>>> configuration; however, the third node will restart with an empty peers
>>>>>>> list, so it will require another addGroup request to join the cluster.
>>>>>>> Or am I missing something?
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Alexey
>>>>>>>
>>>>>>
