Hi,

Could you see if this patch fixes your problem?
https://patchwork.ozlabs.org/patch/1203951/
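For context, the "mismatch past end of log" rejection below is Raft's standard
AppendEntries consistency check, and once the leader has compacted its log the
only way to catch up a follower that is missing the compacted entries is to
send it a snapshot. Here is a minimal, self-contained C sketch of those two
rules (all names are hypothetical; this is not the actual ovsdb-server raft.c
code):

    /*
     * Illustrative sketch only -- NOT the ovsdb-server implementation.
     *
     *   1. A follower rejects an append_request when it has no entry matching
     *      the leader's (prev_index, prev_term) -- the "mismatch past end of
     *      log" rejection in the logs below.
     *   2. After compaction the leader no longer has old entries in its log,
     *      so a follower that is too far behind must be caught up with a
     *      snapshot (install_snapshot) instead of append_entries.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct follower_log {
        uint64_t log_start;   /* Index of first entry still in the log. */
        uint64_t log_end;     /* Index just past the last entry. */
        uint64_t *terms;      /* terms[i - log_start] = term of entry i. */
    };

    /* Follower side: can an append_request with the given
     * (prev_index, prev_term) be accepted? */
    static bool
    accept_append_request(const struct follower_log *log,
                          uint64_t prev_index, uint64_t prev_term)
    {
        if (prev_index >= log->log_end) {
            /* "mismatch past end of log": we have never seen prev_index. */
            return false;
        }
        if (prev_index >= log->log_start
            && log->terms[prev_index - log->log_start] != prev_term) {
            /* Entry exists but with a different term: conflicting entry. */
            return false;
        }
        return true;
    }

    /* Leader side: how to catch up a follower whose next index is
     * 'next_index' when the leader's own log now starts at
     * 'leader_log_start' because of compaction. */
    static const char *
    catch_up_method(uint64_t next_index, uint64_t leader_log_start)
    {
        return next_index < leader_log_start
               ? "install_snapshot"   /* Needed entries were compacted away. */
               : "append_entries";
    }

    int
    main(void)
    {
        uint64_t terms[] = { 1101, 1102, 1103 };
        struct follower_log log = { .log_start = 14888, .log_end = 14891,
                                    .terms = terms };

        /* Leader appends after entry (term=1103, index=50975), but this
         * follower's log ends at 14891 -> rejected, as in the INFO message. */
        printf("accept? %d\n", accept_append_request(&log, 50975, 1103));

        /* The leader's log starts well past what the follower has, so the
         * follower must be caught up with a snapshot. */
        printf("catch up via: %s\n", catch_up_method(14891, 42052));
        return 0;
    }

The only point of the sketch is that a follower whose log ends before the
leader's compaction point cannot be repaired by append_request alone, which
matches the symptom in your logs.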
Thanks,
Han

On Mon, Dec 2, 2019 at 12:28 AM Han Zhou <[email protected]> wrote:

> Sorry for the late reply. It was a holiday here.
> I didn't see such a problem when there is no compaction. Did you see this
> problem when DB compaction didn't happen? The difference is that after
> compaction the RAFT log doesn't have any entries and all the data is in
> the snapshot.
>
> On Fri, Nov 29, 2019 at 12:11 AM taoyunupt <[email protected]> wrote:
>
>> Hi, Han
>> Hope to receive your reply.
>>
>> Thanks,
>> Yun
>>
>> On 2019-11-28 16:17:07, "taoyunupt" <[email protected]> wrote:
>>
>> Hi, Han
>> Another question, with NO compaction: if a follower is restarted and
>> the leader sends some entries while it is down, does this problem also
>> happen once it comes back up? What is the difference between a simple
>> restart and a restart with compaction?
>>
>> Thanks,
>> Yun
>>
>> On 2019-11-28 13:58:36, "taoyunupt" <[email protected]> wrote:
>>
>> Hi, Han
>> Thanks for your reply. I think maybe we can disconnect the failed
>> follower from HAProxy, then synchronize the data, and reconnect it to
>> HAProxy after that completes. But I do not know how to actually do the
>> synchronization.
>> It is just my naive idea. Do you have any suggestions about how to
>> fix this problem? If it is not very complicated, I will have a try.
>>
>> Thanks,
>> Yun
>>
>> On 2019-11-28 11:47:55, "Han Zhou" <[email protected]> wrote:
>>
>> On Wed, Nov 27, 2019 at 7:22 PM taoyunupt <[email protected]> wrote:
>> >
>> > Hi,
>> > My OVN cluster has 3 OVN-northd nodes; they are proxied by HAProxy
>> > with a VIP. Recently, I have been restarting the OVN cluster
>> > frequently, and one of the members reports the logs below.
>> > After reading the code and the RAFT paper, this seems to be the
>> > normal process: if the follower does not find an entry in its log
>> > with the same index and term, it refuses the new entries.
>> > I think it is reasonable to refuse. But since we cannot control
>> > HAProxy (or whatever proxy is in front of the cluster), an error
>> > will happen whenever a session is assigned to the failed follower.
>> >
>> > Is there some means or way to solve this problem? Maybe we can kick
>> > off the failed follower, or disconnect it from HAProxy and then
>> > synchronize the data? Hope to hear your suggestion.
>> >
>> > 2019-11-27T14:22:17.060Z|00240|raft|INFO|rejecting append_request
>> > because previous entry 1103,50975 not in local log (mismatch past end
>> > of log)
>> > 2019-11-27T14:22:17.064Z|00241|raft|ERR|Dropped 34 log messages in
>> > last 12 seconds (most recently, 0 seconds ago) due to excessive rate
>> > 2019-11-27T14:22:17.064Z|00242|raft|ERR|internal error: deferred
>> > append_reply message completed but not ready to send because message
>> > index 14890 is past last synced index 0: a2b2 append_reply "mismatch
>> > past end of log": term=1103 log_end=14891 result="inconsistency"
>> > 2019-11-27T14:22:17.402Z|00243|raft|INFO|rejecting append_request
>> > because previous entry 1103,50975 not in local log (mismatch past end
>> > of log)
>> >
>> > [root@ovn1 ~]# ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound
>> > a2b2
>> > Name: OVN_Southbound
>> > Cluster ID: 4c54 (4c546513-77e3-4602-b211-2e200014ad79)
>> > Server ID: a2b2 (a2b2a9c5-cf58-4724-8421-88fd5ca5d94d)
>> > Address: tcp:10.254.8.209:6644
>> > Status: cluster member
>> > Role: leader
>> > Term: 1103
>> > Leader: self
>> > Vote: self
>> >
>> > Log: [42052, 51009]
>> > Entries not yet committed: 0
>> > Entries not yet applied: 0
>> > Connections: ->beaf ->9a33 <-9a33 <-beaf
>> > Servers:
>> >     a2b2 (a2b2 at tcp:10.254.8.209:6644) (self) next_index=15199 match_index=51008
>> >     beaf (beaf at tcp:10.254.8.208:6644) next_index=51009 match_index=0
>> >     9a33 (9a33 at tcp:10.254.8.210:6644) next_index=51009 match_index=51008
>>
>> I think it is a bug. I noticed that this problem happens when the
>> cluster is restarted after DB compaction. I mentioned it in one of the
>> test cases:
>> https://github.com/openvswitch/ovs/blob/master/tests/ovsdb-cluster.at#L252
>> I also mentioned another problem related to compaction:
>> https://github.com/openvswitch/ovs/blob/master/tests/ovsdb-cluster.at#L239
>> I was planning to debug these but didn't get the time yet. I will try
>> to find some time next week (it would be great if you could figure it
>> out and submit patches).
>>
>> Thanks,
>> Han
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
