Hi Ben,

The issues are sporadic. I now have the Prometheus ovn_exporter running to capture the relevant RAFT metrics. Once the issues reappear, I will post the log entries from around the time the cluster members are in the Leader, Candidate, and Follower states. (During normal operation, there is one Leader and two Followers.)
Best regards,
Paul Greenberg

________________________________
From: Ben Pfaff <[email protected]>
Sent: Monday, November 5, 2018 4:21 PM
To: Paul Greenberg
Cc: Yifeng Sun; ovs dev
Subject: Re: [ovs-dev] [PATCH] ovsdb: Clarify that a server that leaves a cluster may never rejoin.

On Fri, Nov 02, 2018 at 07:08:34PM +0000, Paul Greenberg wrote:
> Let me clarify. Based on my observation, once a server loses touch
> with the rest of the cluster, you have to rejoin it.
> At the same time, it is not readily apparent that you have to do the
> cleanup (removal) yourself. For example, you had a cluster of 3
> nodes, then after some time one node drops out. You go to that node and
> do a cluster join. My understanding is that a new server ID gets
> generated.
>
> Now, if you did not do a cleanup, you end up with a 4-node cluster. When 3 out
> of 4 are working, there is no issue. It is almost mandatory to do a cleanup
> after the join.

The design intent is that, if a server goes out of contact with the rest of the cluster, and later comes back into contact, then it gracefully catches up and becomes a productive member of the cluster.

It seems like you're encountering bugs that I don't understand yet. Can you help me to reproduce them?

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
