Re: [ovs-dev] [PATCH] ovsdb: raft: Support pre-vote mechanism to deal with disruptive server.

Ilya Maximets Wed, 12 Jul 2023 10:51:06 -0700

On 7/7/23 07:14, Han Zhou wrote:
> 
> 
> On Mon, Jul 3, 2023 at 9:46 PM Ilya Maximets <[email protected]> wrote:
>>
>> On 7/1/23 04:43, Han Zhou wrote:
>> >
>> >
>> > On Fri, Jun 30, 2023 at 4:30 PM Ilya Maximets <[email protected]> wrote:
>> >>
>> >> On 6/25/23 19:35, Han Zhou wrote:
>> >> > When a server becomes unstable due to overloading or intermittent
>> >> > partitioning, it may miss some heartbeats and then start an election with
>> >> > a new term, which disrupts the otherwise healthy cluster formed by
>> >> > the rest of the nodes.  Such a situation may persist for a long time
>> >> > until the "flapping" server is shut down or recovers completely, which
>> >> > can severely impact the availability of the cluster.  The pre-vote
>> >> > mechanism introduced in the raft paper section 9.6 can prevent such
>> >> > problems.  This patch implements the pre-vote mechanism, in a way that is
>> >> > backward compatible and safe for upgrades.
>> >>
>> >> Thanks, Han!  This is definitely interesting.  I don't think that this
>> >> should be a solution for overloading, though; overloading should be fixed
>> >> by, well, reducing the load or increasing the election timer.  But
>> >> intermittent network issues are indeed a valid case.
>> >
>> > Sorry that I wasn't clear about "overloading". I was referring to the case 
>> > when the machine is overloaded, not necessarily because of the 
>> > ovsdb-server process but because of other applications/VMs sharing the 
>> > same physical resources, which is likely what we hit in production.
>> > You may argue it is improper resource allocation, but from the OVSDB point 
>> > of view, that is no excuse for the whole RAFT cluster going down. Pre-vote 
>> > can avoid such a disaster, regardless of the factors affecting the 
>> > problematic server that we can't control.
>>
>> OK.  Makes sense.
>>
>> >
>> >>
>> >> I looked through the code and I don't see any significant problems right
>> >> away, but I would like to experiment with it a bit more next week, since
>> >> the voting logic is fairly complex in general.
>> >>
>> > Appreciated!
>> >
>> >> It's nice to see you found a use case for the stop-raft-rpc failure
>> >> test. :)
>> >>
>> >> See some comments inline.
>> >>
>> >> Best regards, Ilya Maximets.
>> >>
>> >> >
>> >> > Signed-off-by: Han Zhou <[email protected]>
>> >> > ---
>> >> >  ovsdb/raft-rpc.c       | 19 ++++++++-
>> >> >  ovsdb/raft-rpc.h       |  3 ++
>> >> >  ovsdb/raft.c           | 88 ++++++++++++++++++++++++++++++------------
>> >> >  tests/ovsdb-cluster.at | 42 ++++++++++++++++++++
>> >> >  4 files changed, 127 insertions(+), 25 deletions(-)
>> >> >
>> >> > diff --git a/ovsdb/raft-rpc.c b/ovsdb/raft-rpc.c
>> >> > index dd14d81091fc..f750ba897046 100644
>> >> > --- a/ovsdb/raft-rpc.c
>> >> > +++ b/ovsdb/raft-rpc.c
>> >> > @@ -283,6 +283,10 @@ raft_vote_request_to_jsonrpc(const struct raft_vote_request *rq,
>> >> >          json_object_put(args, "leadership_transfer",
>> >> >                          json_boolean_create(true));
>> >> >      }
>> >> > +    if (rq->is_prevote) {
>> >> > +        json_object_put(args, "is_prevote",
>> >> > +                        json_boolean_create(true));
>> >> > +    }
>>
>> Hmm.  I don't think this is fully backward compatible.  If the other
>> server is the old one, the ovsdb_parser_finish() in the
>> raft_rpc_from_jsonrpc() will report an error, because not all the members
>> of the json message were consumed.  And hence the pre-vote will never
>> succeed in a cluster where upgraded servers do not have a quorum.
>>
>> Let's say we have a cluster of 5.  The leader + 2 old servers + 2 new.
>> Leader goes down.  2 new servers don't have a quorum and old ones do
>> not reply to pre-vote requests.  I suppose, one of the old ones will
>> have to initiate an election then, and the new ones will reply to the
>> actual vote.  So, one of the old servers can become a leader, but new
>> servers can not.  This is probably not an issue since the cluster can
>> still elect a leader.  What do you think?
>>
>> Maybe a clarification on that will be useful in the commit message.
>>
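To illustrate the strict-parsing concern above, here is a minimal, self-contained sketch.  It is not the actual ovsdb_parser code: the toy_* names and the fixed member list are hypothetical, and the only behavior it models is "every member of the message must be consumed by the receiver, otherwise parsing fails".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_MEMBERS 16

/* Hypothetical stand-in for a strict message parser: it remembers which
 * members of the incoming message have been consumed. */
struct toy_parser {
    const char *members[TOY_MAX_MEMBERS]; /* Member names in the message. */
    bool used[TOY_MAX_MEMBERS];           /* Consumed by the receiver? */
    size_t n;
};

static void
toy_parser_init(struct toy_parser *p, const char *const *members, size_t n)
{
    assert(n <= TOY_MAX_MEMBERS);
    memset(p, 0, sizeof *p);
    for (size_t i = 0; i < n; i++) {
        p->members[i] = members[i];
    }
    p->n = n;
}

/* Marks 'name' as consumed.  Returns true if the member was present. */
static bool
toy_parse_member(struct toy_parser *p, const char *name)
{
    for (size_t i = 0; i < p->n; i++) {
        if (!strcmp(p->members[i], name)) {
            p->used[i] = true;
            return true;
        }
    }
    return false;
}

/* Returns the first member that nobody consumed, or NULL if all were. */
static const char *
toy_parser_finish(const struct toy_parser *p)
{
    for (size_t i = 0; i < p->n; i++) {
        if (!p->used[i]) {
            return p->members[i];
        }
    }
    return NULL;
}

/* Parses a vote request as an old server ('knows_prevote' false) or an
 * upgraded one would.  Returns the offending member, or NULL on success. */
static const char *
toy_parse_vote_request(const char *const *members, size_t n,
                       bool knows_prevote)
{
    struct toy_parser p;

    toy_parser_init(&p, members, n);
    toy_parse_member(&p, "term");
    toy_parse_member(&p, "last_log_index");
    toy_parse_member(&p, "last_log_term");
    toy_parse_member(&p, "leadership_transfer");   /* Optional. */
    if (knows_prevote) {
        toy_parse_member(&p, "is_prevote");        /* Optional. */
    }
    return toy_parser_finish(&p);
}
```

With a message that carries "is_prevote", toy_parse_vote_request() reports that member as unconsumed for an old server and returns NULL for an upgraded one.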
> 
> Thanks for pointing this out. I just checked the logs from an upgrade test, 
> and the old cluster node did indeed log the error:
> 
> 2023-07-07T04:46:24.466Z|00070|raft|INFO|unix#6: syntax "{"cluster":"a5cc9100-9d30-466a-b97c-55e72300b510","from":"7aaec107-2fe2-431e-a7f8-136f17ef9600","is_prevote":true,"last_log_index":9,"last_log_term":1,"term":1,"to":"b8ea2911-fcc3-4048-b140-f609918b7e67"}": syntax error: Parsing raft vote_request RPC failed: Member 'is_prevote' is present but not allowed here.
> 
> I hadn't considered this aspect of backward compatibility. I was mainly trying 
> to avoid crashing the old nodes with unknown message types, and didn't know 
> that ovsdb RPC doesn't allow unknown fields. Fortunately, the node can continue 
> working after this error, and just as you mentioned, the old node then 
> initiated the vote and the new one accepted it. So, maybe it is not a real 
> issue apart from the unpleasant error log. I will update the commit message, and 
> probably mention this in NEWS as well.
> 
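For the record, the mixed-version scenario discussed above (a cluster of 5: a dead leader, 2 old servers, 2 upgraded ones) can be checked with a small counting model.  This is only a sketch under simplifying assumptions: every live peer that can parse the request grants its vote, old servers never answer pre-vote requests, and all toy_* names are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one server during a rolling upgrade. */
struct toy_server {
    bool alive;
    bool understands_prevote;   /* True for upgraded servers. */
};

/* Counts the votes a candidate collects, including its own.  A pre-vote
 * request is only answered by peers that understand "is_prevote"; old
 * peers reject the message as a syntax error and never reply. */
static size_t
toy_count_votes(const struct toy_server *servers, size_t n,
                size_t candidate, bool is_prevote)
{
    assert(candidate < n && servers[candidate].alive);

    size_t votes = 1;   /* The candidate votes for itself. */
    for (size_t i = 0; i < n; i++) {
        if (i == candidate || !servers[i].alive) {
            continue;
        }
        if (is_prevote && !servers[i].understands_prevote) {
            continue;
        }
        votes++;
    }
    return votes;
}

static bool
toy_has_quorum(size_t votes, size_t n_servers)
{
    return votes > n_servers / 2;
}

/* Returns true if 'candidate' can become leader.  Upgraded candidates
 * must pass the pre-vote first; old candidates go straight to the vote. */
static bool
toy_can_win(const struct toy_server *servers, size_t n, size_t candidate)
{
    if (servers[candidate].understands_prevote
        && !toy_has_quorum(toy_count_votes(servers, n, candidate, true),
                           n)) {
        return false;
    }
    return toy_has_quorum(toy_count_votes(servers, n, candidate, false), n);
}
```

In that model an upgraded candidate collects only 2 pre-votes out of 5 and cannot proceed, while an old candidate collects 4 real votes and wins, which matches what the upgrade test showed.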
>>
>> >> >  }
>> >> >
>> >> >  static void
>> >> > @@ -294,6 +298,8 @@ raft_vote_request_from_jsonrpc(struct ovsdb_parser *p,
>> >> >      rq->last_log_term = raft_parse_required_uint64(p, "last_log_term");
>> >> >      rq->leadership_transfer
>> >> >          = raft_parse_optional_boolean(p, "leadership_transfer") == 1;
>> >> > +    rq->is_prevote
>> >> > +        = raft_parse_optional_boolean(p, "is_prevote") == 1;
>> >> >  }
>> >> >
>> >> >  static void
>> >> > @@ -305,6 +311,9 @@ raft_format_vote_request(const struct raft_vote_request *rq, struct ds *s)
>> >> >      if (rq->leadership_transfer) {
>> >> >          ds_put_cstr(s, " leadership_transfer=true");
>> >> >      }
>> >> > +    if (rq->is_prevote) {
>> >> > +        ds_put_cstr(s, " is_prevote=true");
>> >> > +    }
>> >> >  }
>> >> >
>> >> >  /* raft_vote_reply. */
>> >> > @@ -326,6 +335,9 @@ raft_vote_reply_to_jsonrpc(const struct raft_vote_reply *rpy,
>> >> >  {
>> >> >      raft_put_uint64(args, "term", rpy->term);
>> >> >      json_object_put_format(args, "vote", UUID_FMT, UUID_ARGS(&rpy->vote));
>> >> > +    if (rpy->is_prevote) {
>> >> > +        json_object_put(args, "is_prevote", json_boolean_create(true));
>> >> > +    }
>> >> >  }
>> >> >
>> >> >  static void
>> >> > @@ -334,6 +346,7 @@ raft_vote_reply_from_jsonrpc(struct ovsdb_parser *p,
>> >> >  {
>> >> >      rpy->term = raft_parse_required_uint64(p, "term");
>> >> >      rpy->vote = raft_parse_required_uuid(p, "vote");
>> >> > +    rpy->is_prevote = raft_parse_optional_boolean(p, "is_prevote") == 1;
>> >> >  }
>> >> >
>> >> >  static void
>> >> > @@ -341,6 +354,9 @@ raft_format_vote_reply(const struct raft_vote_reply *rpy, struct ds *s)
>> >> >  {
>> >> >      ds_put_format(s, " term=%"PRIu64, rpy->term);
>> >> >      ds_put_format(s, " vote="SID_FMT, SID_ARGS(&rpy->vote));
>> >> > +    if (rpy->is_prevote) {
>> >> > +        ds_put_cstr(s, " is_prevote=true");
>> >> > +    }
>> >> >  }
>> >> >
>> >> >  /* raft_add_server_request */
>> >> > @@ -1008,7 +1024,8 @@ raft_rpc_get_vote(const union raft_rpc *rpc)
>> >> >          return NULL;
>> >> >
>> >> >      case RAFT_RPC_VOTE_REPLY:
>> >> > -        return &raft_vote_reply_cast(rpc)->vote;
>> >> > +        return raft_vote_reply_cast(rpc)->is_prevote ? NULL :
>> >> > +            &raft_vote_reply_cast(rpc)->vote;
>> >>
>> >> It might be better to create a pointer variable here and access it
>> >> instead of casting twice.
>> >>
>> > That was my first attempt, but I can't define a local variable under the 
>> > 'case' statement unless I add a pair of '{}' around the whole block, which 
>> > looks ugly :(
>> > I will check whether there are similar patterns in the code base.
>>
>> We have a few places with braces around the case body, it should be
>> fine.  You may also define a pointer at the top of the function
>> and assign it in the case.  RAFT_RPC_VOTE_REPLY is the main use-case
>> for this function, so having a pointer for it declared at the top
>> should be fine as well.
>>
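A rough sketch of the "pointer declared at the top of the function" variant suggested above.  The types here are simplified, hypothetical stand-ins (toy_* names), not the real definitions from raft-rpc.h, and abort() stands in for OVS_NOT_REACHED().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the real types; only what the example needs. */
struct toy_uuid {
    unsigned int parts[4];
};

enum toy_rpc_type {
    TOY_RPC_VOTE_REQUEST,
    TOY_RPC_VOTE_REPLY
};

struct toy_vote_reply {
    struct toy_uuid vote;
    bool is_prevote;
};

struct toy_rpc {
    enum toy_rpc_type type;
    struct toy_vote_reply vote_reply;
};

static const struct toy_vote_reply *
toy_vote_reply_cast(const struct toy_rpc *rpc)
{
    assert(rpc->type == TOY_RPC_VOTE_REPLY);
    return &rpc->vote_reply;
}

/* Returns the vote carried by 'rpc', or NULL if it carries none.  A
 * pre-vote reply is treated as carrying no vote. */
static const struct toy_uuid *
toy_rpc_get_vote(const struct toy_rpc *rpc)
{
    const struct toy_vote_reply *rpy;   /* Assigned in the case below. */

    switch (rpc->type) {
    case TOY_RPC_VOTE_REQUEST:
        return NULL;

    case TOY_RPC_VOTE_REPLY:
        rpy = toy_vote_reply_cast(rpc);
        return rpy->is_prevote ? NULL : &rpy->vote;

    default:
        abort();
    }
}
```

This keeps a single cast per call without adding braces around the case body.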
> 
> Ack
> 
>> >
>> >> >
>> >> >      default:
>> >> >          OVS_NOT_REACHED();
>> >> > diff --git a/ovsdb/raft-rpc.h b/ovsdb/raft-rpc.h
>> >> > index 221f24d00128..10f30618e3e0 100644
>> >> > --- a/ovsdb/raft-rpc.h
>> >> > +++ b/ovsdb/raft-rpc.h
>> >> > @@ -125,12 +125,15 @@ struct raft_vote_request {
>> >> >      uint64_t last_log_index; /* Index of candidate's last log entry. */
>> >> >      uint64_t last_log_term;  /* Term of candidate's last log entry. */
>> >> >      bool leadership_transfer;  /* True to override minimum election timeout. */
>> >> > +    bool is_prevote;         /* True: pre-vote; False: real vote (default). */
>> >> >  };
>> >> >
>> >> >  struct raft_vote_reply {
>> >> >      struct raft_rpc_common common;
>> >> >      uint64_t term;          /* Current term, for candidate to update itself. */
>> >> >      struct uuid vote;       /* Server ID of vote. */
>> >> > +    bool is_prevote;        /* Copy the is_prevote from the request, primarily
>> >> > +                               for validation. */
>> >>
>> >> Each comment line should start with a '*'.
>> >>
>> >> "Copy of the ..." ?
>> >>
>> >
>> > Ack.
>> >
>> >> >  };
>> >> >
>> >> >  struct raft_add_server_request {
>> >> > diff --git a/ovsdb/raft.c b/ovsdb/raft.c
>> >> > index b2361b1737a2..4d7a3f112cad 100644
>> >> > --- a/ovsdb/raft.c
>> >> > +++ b/ovsdb/raft.c
>> >> > @@ -305,6 +305,11 @@ struct raft {
>> >> >
>> >> >      /* Candidates only.  Reinitialized at start of election. */
>> >> >      int n_votes;                /* Number of votes for me. */
>> >> > +    bool prevote_passed;        /* Indicates if it passed pre-vote phase.
>> >> > +                                   Pre-vote mechanism is introduced in raft
>> >> > +                                   paper section 9.6. We implement it as a
>> >> > +                                   sub-state of candidate to minize the change
>> >> > +                                   and keep backward compatibility. */
>> >>
>> >> Each comment line should start with a '*'.
>> >>
>> >> And s/minize/minimize/.
>> >>
>> > Ack.
>> >
>> >> >
>> >> >      /* Followers and candidates only. */
>> >> >      bool candidate_retrying;    /* The earlier election timed-out and we are
>> >> > @@ -372,7 +377,8 @@ static void raft_become_follower(struct raft *);
>> >> >  static void raft_reset_election_timer(struct raft *);
>> >> >  static void raft_reset_ping_timer(struct raft *);
>> >> >  static void raft_send_heartbeats(struct raft *);
>> >> > -static void raft_start_election(struct raft *, bool leadership_transfer);
>> >> > +static void raft_start_election(struct raft *, bool is_prevote,
>> >> > +                                bool leadership_transfer);
>> >> >  static bool raft_truncate(struct raft *, uint64_t new_end);
>> >> >  static void raft_get_servers_from_log(struct raft *, enum vlog_level);
>> >> >  static void raft_get_election_timer_from_log(struct raft *);
>> >> > @@ -1069,7 +1075,8 @@ raft_open(struct ovsdb_log *log, struct raft **raftp)
>> >> >          /* If there's only one server, start an election right away so that the
>> >> >           * cluster bootstraps quickly. */
>> >> >          if (hmap_count(&raft->servers) == 1) {
>> >> > -            raft_start_election(raft, false);
>> >> > +            /* No pre-vote needed since we are the only one. */
>> >> > +            raft_start_election(raft, false, false);
>> >> >          }
>> >> >      } else {
>> >> >          raft->join_timeout = time_msec() + 1000;
>> >> > @@ -1360,7 +1367,7 @@ void
>> >> >  raft_take_leadership(struct raft *raft)
>> >> >  {
>> >> >      if (raft->role != RAFT_LEADER) {
>> >> > -        raft_start_election(raft, true);
>> >> > +        raft_start_election(raft, false, true);
>> >> >      }
>> >> >  }
>> >> >
>> >> > @@ -1766,12 +1773,12 @@ raft_set_term(struct raft *raft, uint64_t term, const struct uuid *vote)
>> >> >      return true;
>> >> >  }
>> >> >
>> >> > -static void
>> >> > +static bool
>> >> >  raft_accept_vote(struct raft *raft, struct raft_server *s,
>> >> >                   const struct uuid *vote)
>> >> >  {
>> >> >      if (uuid_equals(&s->vote, vote)) {
>> >> > -        return;
>> >> > +        return false;
>> >> >      }
>> >> >      if (!uuid_is_zero(&s->vote)) {
>> >> >          static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>> >> > @@ -1785,13 +1792,18 @@ raft_accept_vote(struct raft *raft, struct raft_server *s,
>> >> >      s->vote = *vote;
>> >> >      if (uuid_equals(vote, &raft->sid)
>> >> >          && ++raft->n_votes > hmap_count(&raft->servers) / 2) {
>> >> > -        raft_become_leader(raft);
>> >> > +        return true;
>> >> >      }
>> >> > +    return false;
>> >> >  }
>> >> >
>> >> >  static void
>> >> > -raft_start_election(struct raft *raft, bool leadership_transfer)
>> >> > +raft_start_election(struct raft *raft, bool is_prevote,
>> >> > +                    bool leadership_transfer)
>> >> >  {
>> >> > +    /* Leadership transfer doesn't use prevote. */
>> >> > +    ovs_assert(!is_prevote || !leadership_transfer);
>> >> > +
>> >> >      if (raft->leaving) {
>> >> >          return;
>> >> >      }
>> >> > @@ -1801,7 +1813,7 @@ raft_start_election(struct raft *raft, bool leadership_transfer)
>> >> >          return;
>> >> >      }
>> >> >
>> >> > -    if (!raft_set_term(raft, raft->term + 1, &raft->sid)) {
>> >> > +    if (!is_prevote && !raft_set_term(raft, raft->term + 1, &raft->sid)) {
>> >> >          return;
>> >> >      }
>> >> >
>> >> > @@ -1809,26 +1821,33 @@ raft_start_election(struct raft *raft, bool leadership_transfer)
>> >> >
>> >> >      raft->leader_sid = UUID_ZERO;
>> >> >      raft->role = RAFT_CANDIDATE;
>> >> > -    /* If there was no leader elected since last election, we know we are
>> >> > -     * retrying now. */
>> >> > -    raft->candidate_retrying = !raft->had_leader;
>> >> > -    raft->had_leader = false;
>> >> > +    raft->prevote_passed = !is_prevote;
>> >> > +
>> >> > +    if (is_prevote || leadership_transfer) {
>> >> > +        /* If there was no leader elected since last election, we know we are
>> >> > +         * retrying now. */
>> >> > +        raft->candidate_retrying = !raft->had_leader;
>> >> > +        raft->had_leader = false;
>> >> > +
>> >> > +        raft->election_start = time_msec();
>> >> > +        raft->election_won = 0;
>> >> > +    }
>> >> >
>> >> >      raft->n_votes = 0;
>> >> >
>> >> > -    raft->election_start = time_msec();
>> >> > -    raft->election_won = 0;
>> >> >      raft->leadership_transfer = leadership_transfer;
>> >> >
>> >> >      static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>> >> >      if (!VLOG_DROP_INFO(&rl)) {
>> >> >          long long int now = time_msec();
>> >> > +        char *comment = is_prevote ? "prevote" : "vote";
>> >>
>> >> Should user-visible logs say "pre-vote" instead of "prevote" ?
>> >
>> > I am ok either way. I was thinking about the situation when searching the 
>> > logs with \<vote\> in vim, but it shouldn't matter much :) I can change it 
>> > if pre-vote looks better to users.
>>
>> It was just a bit confusing that all the comments use 'pre-vote',
>> but the log entry is using 'prevote'.
>>
>> An alternative might be to use "starting election" for an actual
>> vote and "starting preliminary election" for a pre-vote.
>>
>> We may also use 'preliminary': true/false, instead of "is_prevote".
>> Might be more user-friendly.
>>
>> 'prevote_passed' may be turned into 'preliminary_vote_passed'.
>> It's on the edge of being too long of a name, not sure what would
>> be a better one though.
>>
>> What do you think?
> 
> I'd prefer "pre-vote" since it is the terminology used in the RAFT paper. I 
> will update the log to "pre-vote" in v2.
> 
>>
>> >>
>> >> >          if (now >= raft->election_timeout) {
>> >> >              VLOG_INFO("term %"PRIu64": %lld ms timeout expired, "
>> >> > -                      "starting election",
>> >> > -                      raft->term, now - raft->election_base);
>> >> > +                      "starting election (%s)",
>> >> > +                      raft->term, now - raft->election_base, comment);
>> >> >          } else {
>> >> > -            VLOG_INFO("term %"PRIu64": starting election", raft->term);
>> >> > +            VLOG_INFO("term %"PRIu64": starting election (%s)",
>> >> > +                      raft->term, comment);
>> >> >          }
>> >> >      }
>> >> >      raft_reset_election_timer(raft);
>> >> > @@ -1853,6 +1872,7 @@ raft_start_election(struct raft *raft, bool leadership_transfer)
>> >> >                      ? raft->entries[raft->log_end - raft->log_start - 1].term
>> >> >                      : raft->snap.term),
>> >> >                  .leadership_transfer = leadership_transfer,
>> >> > +                .is_prevote = is_prevote,
>> >> >              },
>> >> >          };
>> >> >          if (failure_test != FT_DONT_SEND_VOTE_REQUEST) {
>> >> > @@ -1861,7 +1881,13 @@ raft_start_election(struct raft *raft, bool leadership_transfer)
>> >> >      }
>> >> >
>> >> >      /* Vote for ourselves. */
>> >> > -    raft_accept_vote(raft, me, &raft->sid);
>> >> > +    if (raft_accept_vote(raft, me, &raft->sid)) {
>> >> > +        /* We just started vote, so it shouldn't be accepted yet unless this is
>> >> > +         * a one-node cluster. In such case we don't do pre-vote, and become
>> >> > +         * leader immediately. */
>> >> > +        ovs_assert(!is_prevote);
>> >> > +        raft_become_leader(raft);
>> >> > +    }
>> >> >  }
>> >> >
>> >> >  static void
>> >> > @@ -2041,10 +2067,10 @@ raft_run(struct raft *raft)
>> >> >                  raft_reset_election_timer(raft);
>> >> >              } else {
>> >> >                  raft_become_follower(raft);
>> >> > -                raft_start_election(raft, false);
>> >> > +                raft_start_election(raft, true, false);
>> >> >              }
>> >> >          } else {
>> >> > -            raft_start_election(raft, false);
>> >> > +            raft_start_election(raft, true, false);
>> >> >          }
>> >> >
>> >> >      }
>> >> > @@ -3673,6 +3699,10 @@ raft_handle_vote_request__(struct raft *raft,
>> >> >          return false;
>> >> >      }
>> >> >
>> >> > +    if (rq->is_prevote) {
>> >> > +        return true;
>> >> > +    }
>> >> > +
>> >> >      /* Record a vote for the peer. */
>> >> >      if (!raft_set_term(raft, raft->term, &rq->common.sid)) {
>> >> >          return false;
>> >> > @@ -3685,7 +3715,7 @@ raft_handle_vote_request__(struct raft *raft,
>> >> >
>> >> >  static void
>> >> >  raft_send_vote_reply(struct raft *raft, const struct uuid *dst,
>> >> > -                     const struct uuid *vote)
>> >> > +                     const struct uuid *vote, bool is_prevote)
>> >> >  {
>> >> >      union raft_rpc rpy = {
>> >> >          .vote_reply = {
>> >> > @@ -3695,6 +3725,7 @@ raft_send_vote_reply(struct raft *raft, const struct uuid *dst,
>> >> >              },
>> >> >              .term = raft->term,
>> >> >              .vote = *vote,
>> >> > +            .is_prevote = is_prevote,
>> >> >          },
>> >> >      };
>> >> >      raft_send(raft, &rpy);
>> >> > @@ -3705,7 +3736,9 @@ raft_handle_vote_request(struct raft *raft,
>> >> >                           const struct raft_vote_request *rq)
>> >> >  {
>> >> >      if (raft_handle_vote_request__(raft, rq)) {
>> >> > -        raft_send_vote_reply(raft, &rq->common.sid, &raft->vote);
>> >> > +        raft_send_vote_reply(raft, &rq->common.sid,
>> >> > +                             rq->is_prevote ? &rq->common.sid : &raft->vote,
>> >> > +                             rq->is_prevote);
>> >> >      }
>> >> >  }
>> >> >
>> >> > @@ -3723,7 +3756,14 @@ raft_handle_vote_reply(struct raft *raft,
>> >> >
>> >> >      struct raft_server *s = raft_find_peer(raft, &rpy->common.sid);
>> >> >      if (s) {
>> >> > -        raft_accept_vote(raft, s, &rpy->vote);
>> >> > +        if (raft_accept_vote(raft, s, &rpy->vote)) {
>> >>
>> >> IIUC, the raft_accept_vote() here can't succeed for pre-vote because
>> >> everyone votes for themselves on pre-vote.  Is that correct?
>> >>
>> > pre-vote happens the same way as the real vote, except that pre-vote 
>> > doesn't increase the term.
>> > This function is for the candidate to handle a vote-reply from a peer for 
>> > a vote/pre-vote request sent by itself.
>> > So, raft_accept_vote should succeed in normal cases, for both pre-vote and 
>> > vote.
>> > If it is pre-vote and it is successful, it starts the real vote 
>> > immediately.
>> >
>> >> What happens if we received a real vote during a pre-vote phase for
>> >> some reason?  Will this logic fail?  Is this scenario possible?
>> >
>> > The logic of handling the vote/pre-vote reply message depends only on the 
>> > state of the server itself, regardless of the message's is_prevote field. 
>> > So we don't really need to care whether it is a pre-vote reply or a vote reply. 
>> > The is_prevote field in the reply was added only for validation purposes; 
>> > it is not needed by the RPC itself, as mentioned in the comments.
>>
>> Oh, thanks for the explanation, I misread the raft_handle_vote_request()
>> the first time.  I'll take a closer look.
>>
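As I understand the explanation above, the candidate-side flow is roughly the following.  This is a deliberately simplified model, not the code from the patch: the toy_* names are hypothetical, each granted reply is counted once without the per-server bookkeeping that raft_accept_vote() does, and timers and message sending are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_role { TOY_FOLLOWER, TOY_CANDIDATE, TOY_LEADER };

/* Hypothetical, heavily reduced candidate state. */
struct toy_raft {
    enum toy_role role;
    uint64_t term;
    int n_servers;
    int n_votes;
    bool prevote_passed;
};

static void
toy_start_election(struct toy_raft *r, bool is_prevote)
{
    if (!is_prevote) {
        r->term++;              /* Only a real election bumps the term. */
    }
    r->role = TOY_CANDIDATE;
    r->prevote_passed = !is_prevote;
    r->n_votes = 1;             /* Vote for ourselves. */

    if (r->n_votes > r->n_servers / 2) {
        /* One-node cluster: no pre-vote is used, become leader at once. */
        assert(!is_prevote);
        r->role = TOY_LEADER;
    }
}

/* Handles one granted reply.  The decision depends only on the
 * candidate's own state, not on any flag carried by the reply. */
static void
toy_handle_vote_granted(struct toy_raft *r)
{
    if (r->role != TOY_CANDIDATE) {
        return;
    }
    if (++r->n_votes > r->n_servers / 2) {
        if (r->prevote_passed) {
            r->role = TOY_LEADER;
        } else {
            toy_start_election(r, false);   /* Start the real election. */
        }
    }
}
```

In a 3-node model, the first granted reply completes the pre-vote and starts the real election with term + 1, and the next one makes the candidate a leader.  A candidate that never hears back stays in its current term, which is the property the new test checks.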
> No problem. Thanks for your review. Do you have other comments? If not, I 
> will send v2 with all the comments addressed.


I don't have any further comments at the moment,
feel free to send a new version.

Best regards, Ilya Maximets.

> 
> Regards,
> Han
> 
>> >
>> > Thanks,
>> > Han
>> >>
>> >> > +            if (raft->prevote_passed) {
>> >> > +                raft_become_leader(raft);
>> >> > +            } else {
>> >> > +                /* Start the real election. */
>> >> > +                raft_start_election(raft, false, false);
>> >> > +            }
>> >> > +        }
>> >> >      }
>> >> >  }
>> >> >
>> >> > @@ -4357,7 +4397,7 @@ raft_handle_become_leader(struct raft *raft,
>> >> >          VLOG_INFO("received leadership transfer from %s in term %"PRIu64,
>> >> >                    raft_get_nickname(raft, &rq->common.sid, buf, sizeof buf),
>> >> >                    rq->term);
>> >> > -        raft_start_election(raft, true);
>> >> > +        raft_start_election(raft, false, true);
>> >> >      }
>> >> >  }
>> >> >
>> >> > diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
>> >> > index 9fbf5dc897f2..8a81136999e0 100644
>> >> > --- a/tests/ovsdb-cluster.at
>> >> > +++ b/tests/ovsdb-cluster.at
>> >> > @@ -715,6 +715,48 @@ done
>> >> >
>> >> >  AT_CLEANUP
>> >> >
>> >> > +
>> >> > +AT_SETUP([OVSDB cluster - disruptive server])
>> >> > +AT_KEYWORDS([ovsdb server negative unix cluster disruptive])
>> >> > +
>> >> > +n=3
>> >> > +schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
>> >> > +ordinal_schema > schema
>> >> > +AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [], [stderr])
>> >> > +cid=`ovsdb-tool db-cid s1.db`
>> >> > +schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
>> >> > +for i in `seq 2 $n`; do
>> >> > +    AT_CHECK([ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft unix:s1.raft])
>> >> > +done
>> >> > +
>> >> > +on_exit 'kill `cat *.pid`'
>> >> > +for i in `seq $n`; do
>> >> > +    AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb s$i.db])
>> >> > +done
>> >> > +for i in `seq $n`; do
>> >> > +    AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
>> >> > +done
>> >> > +
>> >> > +# An unstable follower shouldn't disrupt the healthy cluster - shouldn't
>> >> > +# trigger term change.
>> >> > +AT_CHECK([ovs-appctl -t "`pwd`"/s2 cluster/failure-test stop-raft-rpc], [0], [ignore])
>> >> > +OVS_WAIT_UNTIL([ovs-appctl -t "`pwd`"/s2 cluster/status $schema_name | grep "Role: candidate"])
>> >> > +AT_CHECK([ovs-appctl -t "`pwd`"/s2 cluster/failure-test clear], [0], [ignore])
>> >> > +
>> >> > +# Should step back to follower.
>> >> > +OVS_WAIT_UNTIL([ovs-appctl -t "`pwd`"/s2 cluster/status $schema_name | grep "Role: follower"])
>> >> > +
>> >> > +# No term change.
>> >> > +for i in `seq $n`; do
>> >> > +    AT_CHECK([ovs-appctl -t "`pwd`"/s$i cluster/status $schema_name | grep "Term: 1"], [0], [ignore])
>> >> > +done
>> >> > +
>> >> > +for i in `seq $n`; do
>> >> > +    OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
>> >> > +done
>> >> > +
>> >> > +AT_CLEANUP
>> >> > +
>> >> >
>> >> >  AT_BANNER([OVSDB - cluster tests])
>> >> >
>> >>
>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
