To run the ovsdb-cluster tests, just:
# make check-ovsdb-cluster

You can also specify which test case to run:
# make check-ovsdb-cluster TESTSUITEFLAGS="<test case number>"
You can list all cases by:
# make check-ovsdb-cluster TESTSUITEFLAGS="--list"

On Tue, Mar 10, 2020 at 1:13 AM txfh2007 <[email protected]> wrote:
> Hi Han:
>
> Thanks for your kind reply! I have tried your patch, and the
> candidate problem is fixed in my env. Now my 3-node Raft env works well.
> Another question: I found you have also submitted an ovsdb-cluster testsuite.
> How could I run these tests on my own setup?
>
> Thanks
> Timo
>
> ------------------------------------------------------------------
>
> On Sat, Mar 7, 2020 at 2:33 AM txfh2007 <[email protected]> wrote:
> >
> > Hi Han:
> >
> > Thanks for your reply! There is one point on which I can't agree with
> > you: "If S2 or S3 already becomes leader, their term won't be lower than
> > S2." In my test, in step 3, S3 is leader and its term is lower than S2's.
> > The reason is that while S2 is disconnected from S1 and S3, S2 keeps
> > increasing its term and sending vote requests until its connection
> > recovers. At the same time, S3 becomes leader and doesn't increase its
> > term. So it is possible that S2's term is larger than S3's, and that's
> > why in step 3, S2 replies "stale term" to S3's append-entry request.
>
> Hi Timo,
>
> Sorry that my answer wasn't accurate enough and caused confusion. My
> answer focused on the "candidate forever" scenario you reported, so
> I didn't take the more common scenario (that a reconnected server can have
> a larger term) into account, but of course the more common scenario does
> exist. Please see my rephrased answer below, and let me know if it clears
> up the confusion.
>
> Thanks,
> Han
>
> >
> > Timo
> >
> >
> > On Fri, Mar 6, 2020 at 1:13 AM txfh2007 via discuss <[email protected]> wrote:
> > >
> > > Hi Han && all:
> > >
> > > I have a question about RAFT: I have tried the latest OVN-2.30,
> > > and found that under some conditions, there is one node whose role is
> > > always "Candidate" (as shown by the cluster/status command), but which
> > > acts as a Follower.
> > > My cluster still works well, but it seems odd that a server's role is
> > > always Candidate. As far as I know, a server's role is normally
> > > Follower or Leader.
> >
> > Hi Timo, I happened to fix the problem yesterday, and here is the patch:
> > https://patchwork.ozlabs.org/patch/1250116/. Details of my analysis are in
> > the commit message, and a test case is added to cover this scenario.
> >
> > >
> > > After digging into the related code, I think I can describe how
> > > to reproduce this scenario:
> > > 1. It is a three-server cluster: one Leader (S2) and two
> > >    Followers (S1, S3).
> > > 2. Disconnect the Leader (S2) from the other two servers, so S2
> > >    increases its term and sends vote requests, while S1 and S3
> > >    choose a new Leader (let's say it's S3).
> >
> > When S1 and S3 choose a new leader, they (one of them, or both) would
> > have to increase the term, too.
> >
> > > 3. Restore the connection between S2 and the other two nodes. Then, if
> > >    S2 receives an append-entry request from S3, because S3's term is
> > >    lower, S2 will reply "stale term".
> >
> > If S2 or S3 already becomes leader, their term won't be lower than S2.
> > From this point on, the steps below shouldn't happen. But instead, it is
> > possible that when S2 receives an append-request from the new leader, it
> > has the same term, and it updates the leader without switching from
> > candidate to follower, thus resulting in the candidate state forever.
>
> Rephrase:
>
> If S1 (not S2, sorry for the typo above) or S3 already becomes leader, it
> is possible that its term is the same as S2's when S2's connection is
> restored. When S2 receives an append-request from the new leader, because
> it observes the same term, it updates the leader without switching from
> candidate to follower (which is a bug in the implementation, fixed in the
> patch I posted, which was merged yesterday), thus resulting in the
> candidate state forever.
> In this situation, the candidate no longer increases its term or initiates
> vote-requests, because it receives append-requests (heartbeats) regularly
> and responds to them, like a follower. The only difference is that it
> announces itself as "disconnected from cluster" to its clients, so all the
> clients will be disconnected from it.
>
> On the other hand, if S2's connection is restored after more election
> timer timeouts, its term can be larger than the new leader's. In this case,
> it won't trigger the "candidate forever" problem. First, the candidate
> will send a vote-request with a larger term, but the new leader will reject
> the vote-request because it is the leader itself, and the follower will
> also reject the vote-request because of the logic of
> "raft_should_suppress_disruptive_server()". However, the candidate will
> receive an append-request from the new leader, which has a smaller term. It
> sends an append-reply with reason "stale term", carrying its own term
> number. When the leader receives this reply, it sees a term number larger
> than its own, so it updates its term to the larger term and steps down to
> follower. The cluster then starts an election again, which ends up with
> one leader and two followers as usual.
>
> > > 4. After S3 gets S2's reply, S3 will change its term to S2's value
> > >    and change its role to follower and then candidate (at this point,
> > >    S1/S2/S3 are all in the candidate role).
> > > 5. Then if S2 gets S3's vote request and votes for S3, S3 will
> > >    become the new leader, but S2's role is still candidate.
> >
> > If all 3 ended up as candidates in the same term, as mentioned in your
> > step 4, each of them only votes for itself, so no leader is elected in
> > that term, and they will have to increase the term (at random times) and
> > re-elect again. In my understanding, the only way to end up with a
> > candidate forever is when 2 servers enter the candidate state competing
> > in the *same term*.
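The two reconnection cases Han describes follow the standard Raft rules for append-requests. Below is a minimal C sketch of those rules, assuming simplified, hypothetical types and functions (`struct server`, `maybe_step_down()`, `handle_append_request()`, `handle_append_reply()`). It is not the OVS `raft.c` code or the merged patch, only an illustration of the behavior the fix is meant to restore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified server state; not the structures in OVS raft.c. */
enum role { FOLLOWER, CANDIDATE, LEADER };

struct server {
    uint64_t term;   /* current term */
    enum role role;
    int leader;      /* id of the known leader, -1 if unknown */
};

/* Standard Raft rule: a message carrying a higher term makes the
 * receiver adopt that term and become a follower. */
void maybe_step_down(struct server *s, uint64_t msg_term)
{
    if (msg_term > s->term) {
        s->term = msg_term;
        s->role = FOLLOWER;
        s->leader = -1;
    }
}

/* Receiver side of an append-request (heartbeat) from server `from`.
 * Returns false for a "stale term" rejection. In both cases *reply_term
 * is set to the receiver's term, so the sender can learn a larger term. */
bool handle_append_request(struct server *s, int from, uint64_t msg_term,
                           uint64_t *reply_term)
{
    assert(from >= 0);
    if (msg_term < s->term) {
        *reply_term = s->term;      /* "stale term": the sender is behind */
        return false;
    }
    maybe_step_down(s, msg_term);
    /* Equal term: a candidate must still convert to follower. Recording
     * the leader while staying a candidate is the "candidate forever"
     * behavior discussed in the thread. */
    s->role = FOLLOWER;
    s->leader = from;
    *reply_term = s->term;
    return true;
}

/* Leader side: an append-reply carrying a larger term forces a step-down,
 * after which a new election resolves the cluster. */
void handle_append_reply(struct server *s, uint64_t reply_term)
{
    maybe_step_down(s, reply_term);
}
```

For example, a candidate S2 in term 5 that accepts an append-request from S3 in term 5 ends up as a follower with leader 3. If S2 is instead at term 7, it rejects S3's term-5 request with reply term 7, and S3 steps down once `handle_append_reply()` sees that term.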
> >
> > > I guess the reason is that the term of S3's vote request is equal to
> > > S2's term. S2 will change to follower only if it receives a vote
> > > request whose term is larger than its own.
> > > Am I right? And is the candidate role (while it actually acts as a
> > > follower) reasonable?
> > >
> > > Thanks
> > > Timo
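Timo's question concerns how a candidate treats vote requests. Under standard Raft, a server grants at most one vote per term, and a candidate has already voted for itself, so a same-term request from a rival candidate is rejected and the candidate stays a candidate until it sees a higher term or a valid leader. The minimal C sketch below assumes hypothetical names (`start_election()`, `handle_vote_request()`, and a `heard_from_leader` flag used as a simplified stand-in for the check Han attributes to `raft_should_suppress_disruptive_server()`, assuming it behaves like the disruptive-server rule in the Raft dissertation). It omits Raft's log up-to-date check and is not the OVS implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified state; not the OVS raft.c structures. */
enum role { FOLLOWER, CANDIDATE, LEADER };

struct server {
    int id;
    uint64_t term;
    enum role role;
    int voted_for;            /* id voted for in `term`, -1 if none */
    bool heard_from_leader;   /* hypothetical: heard from a leader within
                                 the minimum election timeout */
};

/* Election timeout fired: start a new term and vote for ourselves. */
void start_election(struct server *s)
{
    assert(s->role != LEADER);
    s->term++;
    s->role = CANDIDATE;
    s->voted_for = s->id;
}

/* Returns true if the vote is granted. The log up-to-date check that
 * Raft also requires is omitted to keep the sketch short. */
bool handle_vote_request(struct server *s, int candidate, uint64_t msg_term)
{
    /* Simplified disruptive-server guard: a leader, or a server that has
     * recently heard from one, ignores the request and keeps its term. */
    if (s->role == LEADER || s->heard_from_leader) {
        return false;
    }
    if (msg_term < s->term) {
        return false;
    }
    if (msg_term > s->term) {       /* higher term: adopt it, become follower */
        s->term = msg_term;
        s->role = FOLLOWER;
        s->voted_for = -1;
    }
    /* At most one vote per term. A candidate has already voted for itself,
     * so it rejects a same-term request from a rival candidate. */
    if (s->voted_for == -1 || s->voted_for == candidate) {
        s->voted_for = candidate;
        return true;
    }
    return false;
}
```

With two candidates in the same term, each rejects the other's request and neither steps down; only a request (or append-request) with a higher term turns a candidate back into a follower.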
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
