Hi,
I’m currently integrating Ratis as the consensus backbone in Apache IoTDB, and 
I’ve run into a strange situation that drives the system into a livelock:

My original configuration contains a single group (1) with a single member (1), 
which is necessarily the leader. Now I want to add a new member (follower 2) to 
this group, and I implemented it as follows:

Client.getGroupManagementApi(2).add(group(1));
Client.admin().setConfiguration([1,2]);

Then I observed the following event sequence, which causes the livelock:
1. addGroup succeeds and follower 2's lifecycle is STARTING.
2. Leader 1 sends the latest snapshot to follower 2, which contains the **old 
conf [1]**.
3. Follower 2 successfully installs the snapshot, discovers that it is excluded 
from the conf, and moves its lifecycle to CLOSE.
4. Leader 1 receives the installSnapshot reply, appends the new conf [1,2] to 
the log, and applies it.
5. Since follower 2 is closed, leader 1 steps down to follower due to 
LOST_MAJORITY_HEARTBEATS, and the group can no longer serve requests.
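
To make the sequence concrete, here is a toy model of steps 1-5 in plain Java. It is not Ratis code: the class LivelockSketch, the Peer type, and the rule "a peer closes when the installed snapshot's conf excludes it" are all hypothetical names and simplifications based only on what I observed above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of the observed event sequence; NOT the Ratis implementation.
class LivelockSketch {
    /** Hypothetical peer that stays alive only while the conf it learns includes it. */
    static final class Peer {
        final int id;
        boolean alive = true;

        Peer(int id) { this.id = id; }

        // Assumed rule mirroring step 3: a peer excluded from the snapshot's conf closes.
        void installSnapshot(Set<Integer> snapshotConf) {
            if (!snapshotConf.contains(id)) {
                alive = false;
            }
        }
    }

    /** True when the live members of conf form a strict majority of conf. */
    static boolean hasMajority(Set<Integer> conf, Peer... peers) {
        int live = 0;
        for (Peer p : peers) {
            if (conf.contains(p.id) && p.alive) {
                live++;
            }
        }
        return live > conf.size() / 2;
    }

    /** Replays steps 1-5 and reports whether the leader still has a majority. */
    static boolean replay() {
        Peer leader = new Peer(1);
        Peer follower = new Peer(2);                                  // step 1: follower 2 starts
        Set<Integer> oldConf = new HashSet<>(Arrays.asList(1));
        follower.installSnapshot(oldConf);                            // steps 2-3: snapshot carries old conf [1]
        Set<Integer> newConf = new HashSet<>(Arrays.asList(1, 2));    // step 4: leader applies [1,2]
        return hasMajority(newConf, leader, follower);                // step 5: majority check
    }

    public static void main(String[] args) {
        System.out.println("leader keeps majority: " + replay());
    }
}
```

In this model the leader is the only live member of a two-member conf, so the majority check fails, which matches the step-down I see in step 5.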

Am I using the groupManagementApi or adminApi incorrectly? How can I solve this problem?


William Song
Apache IoTDB
