Hi Alexey,

The current design is that the group info (RaftConfiguration) is
persisted to the RaftLog, and it requires a Leader to do that.
In the test, the Leader had not yet been elected before the server was
stopped. As a result, the group was not persisted. This problem happens
only when starting a new server, since its RaftLog is empty. When the log
is non-empty, the first log entry has the group information.

Could you simply pass the group peers to the RaftServer.Builder even for
RECOVER as a workaround?

Tsz-Wo

On Fri, May 23, 2025 at 9:30 AM Alexey Goncharuk <[email protected]> wrote:

> I tried to rework my code to pass the initial group configuration to
> the RaftServer builder, but the bootstrap group is still lost if the
> server is stopped quickly enough (it seems that the initial
> configuration is persisted asynchronously). I created a ticket,
> RATIS-2306, for this.
> I am not sure about the complexity of the fix, but I can give it a shot
> if you think it is simple enough and point me in the right direction.
>
> --Alexey
>
> On Wed, May 21, 2025 at 06:41, Tsz-Wo Nicholas Sze <[email protected]> wrote:
>
>> Hi Alexey,
>>
>> We have [1] for "Ratis Membership Change". It would be great if you
>> could update it to include the single group case. We may rename the
>> title to "Raft Group Membership".
>>
>> Thanks a lot for offering your help!
>>
>> Tsz-Wo
>> [1] https://github.com/apache/ratis/blob/master/ratis-docs/src/site/markdown/membership-change.md
>>
>> On Tue, May 20, 2025 at 12:49 PM Alexey Goncharuk <[email protected]> wrote:
>>
>>> Got it, thanks! If there is a documentation page I can update (unless
>>> this info is already there and I missed it), I'll be happy to
>>> summarize this.
>>>
>>> On Tue, May 20, 2025 at 18:51, Tsz-Wo Nicholas Sze <[email protected]> wrote:
>>>
>>>> Hi Alexey,
>>>>
>>>> > No, I do not need multi-raft. ...
>>>>
>>>> For non-multi-raft, just build the RaftServer with the group. We
>>>> need to FORMAT it the first time and then keep using RECOVER.
>>>>
>>>> > ... the builder accepts a group during construction, but it also
>>>> > accepts the RECOVER startup option. ...
>>>>
>>>> When a server starts, it will use the group id in the specified
>>>> group to read the corresponding local directory. Then, it either
>>>> formats (creates a new directory) or recovers from an existing
>>>> directory.
>>>>
>>>> For RECOVER, it reads the latest group information from the local
>>>> storage. You are right that the group peers passed to the builder
>>>> will be ignored. Only the group id is used.
>>>>
>>>> Tsz-Wo
>>>>
>>>> On Mon, May 19, 2025 at 2:18 AM Alexey Goncharuk <[email protected]> wrote:
>>>>
>>>>> Hi Tsz-Wo, thanks for the reply!
>>>>>
>>>>> No, I do not need multi-raft. I saw the server builder; however, I
>>>>> am a bit confused about building the server. I see that the
>>>>> builder accepts a group during construction, but it also accepts
>>>>> the RECOVER startup option. Given that there is no way to know the
>>>>> last committed configuration unless the server is started, does it
>>>>> mean that the group passed to the server builder should be treated
>>>>> as a 'bootstrap' group, and that it is ignored when the server
>>>>> recovers and knows that there was a group reconfiguration?
>>>>>
>>>>> --Alexey
>>>>>
>>>>> On Sat, May 17, 2025 at 21:19, Tsz-Wo Nicholas Sze <[email protected]> wrote:
>>>>>
>>>>>> Hi Alexey,
>>>>>>
>>>>>> First of all, does your application need multi-Raft, i.e.
>>>>>> multiple Raft groups? For the single group case
>>>>>> (non-multi-Raft), we should build the servers with the group but
>>>>>> not using addGroup.
>>>>>>
>>>>>> As specified in the javadoc of GroupManagementApi, addGroup is an
>>>>>> operation applying to a particular server; see [1]. You may take
>>>>>> a look at GroupManagementBaseTest [2].
>>>>>>
>>>>>> Hope it helps!
>>>>>> Please feel free to let us know if you have more questions.
>>>>>>
>>>>>> Tsz-Wo
>>>>>> [1] https://github.com/apache/ratis/blob/65fd4445335d0500fd372f37c8b7cb3c39259e87/ratis-client/src/main/java/org/apache/ratis/client/api/GroupManagementApi.java#L29
>>>>>> [2] https://github.com/apache/ratis/blob/master/ratis-server/src/test/java/org/apache/ratis/server/impl/GroupManagementBaseTest.java
>>>>>>
>>>>>> On Fri, May 16, 2025 at 10:44 AM Alexey Goncharuk <[email protected]> wrote:
>>>>>>
>>>>>>> Hello Ratis community,
>>>>>>>
>>>>>>> I am trying to understand the proper way to initialize a Ratis
>>>>>>> group. My expectation is that the group itself should maintain
>>>>>>> the list of peers through configuration changes and the raft
>>>>>>> log, and thus setting the list of peers is required only during
>>>>>>> the initial group setup. However, what I observe in tests is
>>>>>>> that the initial list of peers passed to the addGroup request is
>>>>>>> restored only when the corresponding configuration change has
>>>>>>> been committed by Ratis. More specifically, I see the following
>>>>>>> behavior:
>>>>>>> * Create a Raft server S1, initialize a group G1 [peers: S1,
>>>>>>>   S2]. I get a reply that the group was successfully added, but
>>>>>>>   the configuration change is not committed because S2 does not
>>>>>>>   exist and thus no leader can be elected.
>>>>>>> * I stop the server S1 and then restart it with the RECOVER
>>>>>>>   startup option. The group G1 is restored, but it is restored
>>>>>>>   with an empty peers list.
>>>>>>>
>>>>>>> I was wondering whether this is expected behavior. I fully
>>>>>>> understand that subsequent configuration changes must go through
>>>>>>> the regular raft protocol, but I would expect that the initial
>>>>>>> configuration setup is 'committed' unconditionally and can be
>>>>>>> reset with the FORMAT startup option if required.
>>>>>>>
>>>>>>> If this is expected behavior, what is the suggested way to do
>>>>>>> the initial group setup? The potential issue I have in mind is
>>>>>>> as follows: let's say I am setting up a 3-node cluster with a
>>>>>>> proper initial configuration, and the addGroup request succeeds
>>>>>>> on all nodes, but shortly after, one of the nodes gets
>>>>>>> disconnected and restarted. The two remaining nodes will be able
>>>>>>> to commit the proposed configuration; however, the third node
>>>>>>> will restart with an empty peers list, so it will require
>>>>>>> another addGroup request to join the cluster. Or am I missing
>>>>>>> something?
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Alexey
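
Editorial sketch of the advice quoted above ("FORMAT it the first time
and then keep using RECOVER"). It is a minimal illustration, not Ratis
code. The StartupOption enum is a hypothetical stand-in for the options
named in the thread, and it assumes the group's storage lives in a
directory named after the group id directly under the storage root,
which may not match the layout your Ratis version uses.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;

public class StartupOptionSketch {

  /** Hypothetical stand-in for the FORMAT/RECOVER options; not a Ratis type. */
  public enum StartupOption { FORMAT, RECOVER }

  /**
   * Picks FORMAT when no directory exists yet for the group, RECOVER otherwise.
   * Assumption: the group directory is storageRoot/groupId.
   */
  public static StartupOption choose(Path storageRoot, String groupId) {
    Path groupDir = storageRoot.resolve(groupId);
    return Files.isDirectory(groupDir) ? StartupOption.RECOVER : StartupOption.FORMAT;
  }

  public static void main(String[] args) {
    // Use a fresh, not-yet-existing directory under the system temp dir.
    Path root = new File(System.getProperty("java.io.tmpdir"),
        "startup-sketch-" + System.nanoTime()).toPath();
    String groupId = "example-group"; // hypothetical group id

    // First start: nothing on disk yet, so FORMAT is chosen.
    System.out.println(choose(root, groupId));

    // Simulate the directory that a FORMAT start would leave behind.
    root.resolve(groupId).toFile().mkdirs();

    // Later starts: the directory exists, so RECOVER is chosen.
    System.out.println(choose(root, groupId));
  }
}
```

As the newest reply in the thread notes, an existing directory does not
guarantee that the group peers were persisted (see RATIS-2306), so the
suggested workaround of passing the group peers to the builder even for
RECOVER still applies alongside a check like this.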
