I found and fixed some bugs in the Raft implementation. The patch
series is currently waiting for review:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=76115
On Fri, Nov 09, 2018 at 04:22:51PM +0000, Paul Greenberg wrote:
> Hi Ben,
>
> The issues are sporadic. I have set up the Prometheus ovn_exporter to capture
> the relevant Raft metrics. Once I see the issues reappear, I will post log
> entries from around the times when cluster members are in the Leader,
> Candidate, and Follower states.
> (During normal operation, there is one Leader and two Followers.)
>
> Best Regards,
> Paul Greenberg
>
> ________________________________
> From: Ben Pfaff <[email protected]>
> Sent: Monday, November 5, 2018 4:21 PM
> To: Paul Greenberg
> Cc: Yifeng Sun; ovs dev
> Subject: Re: [ovs-dev] [PATCH] ovsdb: Clarify that a server that leaves a
> cluster may never rejoin.
>
> On Fri, Nov 02, 2018 at 07:08:34PM +0000, Paul Greenberg wrote:
> > Let me clarify. Based on my observation, once a server loses touch
> > with the rest of the cluster, you have to rejoin it.
> > At the same time, it is not readily apparent that you have to do the
> > cleanup (removal of the old server) yourself. For example, suppose you
> > have a cluster of 3 nodes, and after some time one node drops out. You
> > go to that node and do a cluster join. My understanding is that a new
> > server ID gets generated.
> >
> > Now, if you do not do a cleanup, you end up with a 4-node cluster. While
> > 3 out of 4 are working, there is no issue, but it is almost mandatory to
> > do a cleanup after the join.
>
> The design intent is that, if a server goes out of contact with the rest
> of the cluster, and later comes back into contact, then it gracefully
> catches up and becomes a productive member of the cluster.
>
> It seems like you're encountering bugs that I don't understand yet. Can
> you help me to reproduce them?
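The quorum arithmetic behind the 3-node versus 4-node scenario above can be sketched as follows. This is a minimal illustration, assuming (as the thread describes) that the stale server ID stays in the cluster configuration as a voting member after the node rejoins under a new ID; the helper names are hypothetical and are not part of the OVSDB code base.

```python
def majority(n_members: int) -> int:
    """Smallest number of voting servers that forms a Raft majority."""
    return n_members // 2 + 1


def tolerated_failures(n_members: int) -> int:
    """How many voting servers can be unreachable while a majority remains."""
    return n_members - majority(n_members)


def has_quorum(live: int, n_members: int) -> bool:
    """True if the live servers alone can still elect a leader and commit."""
    return live >= majority(n_members)


# Healthy 3-server cluster: majority is 2, so one failure is tolerated.
# After a rejoin without cleanup, the configuration (hypothetically) holds
# 4 IDs, one of them the dead, stale ID: majority is 3, and the stale entry
# already consumes the single tolerated failure.
for n in (3, 4):
    print(n, majority(n), tolerated_failures(n))

print(has_quorum(3, 4))  # 3 live of 4: still works, as observed in the thread
print(has_quorum(2, 4))  # one more real failure: quorum is lost
```

In other words, a 4-member configuration tolerates no more failures than a 3-member one, and with a stale member it has no remaining slack, which is why removing the old server ID after a rejoin matters in this scenario.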
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev