Hi, Han
Thanks for your reply. I think maybe we can disconnect the failed
follower from HAProxy, synchronize the data, and then reconnect it to HAProxy
once everything is done. But I do not actually know how to do the
synchronization; it is just my naive idea. Do you have any suggestions on how
to fix this problem? If it is not too complicated, I will give it a try.
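Something like the following is what I have in mind, but it is only a rough
sketch and I am not sure it is right (the database/ctl paths are the defaults
on my hosts, and <failed-server-id>/<this-node-ip>/<other-member-ip> are just
placeholders):

    # 1. Remove the failed follower from the HAProxy backend (depends on your
    #    HAProxy config).

    # 2. On a healthy member, kick the failed server out of the cluster, using
    #    the server ID shown by cluster/status:
    ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/kick OVN_Southbound <failed-server-id>

    # 3. On the failed node, stop ovsdb-server, remove its stale database file,
    #    and re-create it as a new member that joins the existing cluster
    #    (local address first, then the remaining members):
    rm /etc/openvswitch/ovnsb_db.db
    ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound \
        tcp:<this-node-ip>:6644 tcp:<other-member-ip>:6644 tcp:<other-member-ip>:6644

    # 4. Start ovsdb-server again so it syncs the data from the leader, then
    #    add the node back to HAProxy.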
Thanks
Yun
On 2019-11-28 11:47:55, "Han Zhou" <[email protected]> wrote:
On Wed, Nov 27, 2019 at 7:22 PM taoyunupt <[email protected]> wrote:
>
> Hi,
> My OVN cluster has 3 OVN-northd nodes, proxied by HAProxy behind a VIP.
> Recently I have been restarting the OVN cluster frequently, and one of the
> members reports the logs below.
> After reading the code and the Raft paper, this looks like the normal process:
> if the follower does not find an entry in its log with the same index and
> term, it refuses the new entries.
> I think refusing is reasonable. But since we cannot control HAProxy (or
> whatever proxy is in front), errors occur whenever a session is assigned to
> the failed follower.
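> For example, comparing "cluster/status" on the leader and on the rejecting
> follower shows the gap directly (ctl path and DB name as in my setup):
>
>     # run on each member and compare the "Log:" ranges; the failed follower's
>     # log ends far before the first entry the leader still has after compaction
>     ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound | grep -E 'Role:|Log:'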
>
> Is there some means or way to solve this problem? Maybe we could kick the
> failed follower out, or disconnect it from HAProxy and then synchronize the
> data? I hope to hear your suggestions.
>
>
> 2019-11-27T14:22:17.060Z|00240|raft|INFO|rejecting append_request because
> previous entry 1103,50975 not in local log (mismatch past end of log)
> 2019-11-27T14:22:17.064Z|00241|raft|ERR|Dropped 34 log messages in last 12
> seconds (most recently, 0 seconds ago) due to excessive rate
> 2019-11-27T14:22:17.064Z|00242|raft|ERR|internal error: deferred append_reply
> message completed but not ready to send because message index 14890 is past
> last synced index 0: a2b2 append_reply "mismatch past end of log": term=1103
> log_end=14891 result="inconsistency"
> 2019-11-27T14:22:17.402Z|00243|raft|INFO|rejecting append_request because
> previous entry 1103,50975 not in local log (mismatch past end of log)
>
>
> [root@ovn1 ~]# ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl
> cluster/status OVN_Southbound
> a2b2
> Name: OVN_Southbound
> Cluster ID: 4c54 (4c546513-77e3-4602-b211-2e200014ad79)
> Server ID: a2b2 (a2b2a9c5-cf58-4724-8421-88fd5ca5d94d)
> Address: tcp:10.254.8.209:6644
> Status: cluster member
> Role: leader
> Term: 1103
> Leader: self
> Vote: self
>
> Log: [42052, 51009]
> Entries not yet committed: 0
> Entries not yet applied: 0
> Connections: ->beaf ->9a33 <-9a33 <-beaf
> Servers:
> a2b2 (a2b2 at tcp:10.254.8.209:6644) (self) next_index=15199
> match_index=51008
> beaf (beaf at tcp:10.254.8.208:6644) next_index=51009 match_index=0
> 9a33 (9a33 at tcp:10.254.8.210:6644) next_index=51009 match_index=51008
>
I think it is a bug. I noticed that this problem happens when the cluster is
restarted after DB compaction. I mentioned it in one of the test cases:
https://github.com/openvswitch/ovs/blob/master/tests/ovsdb-cluster.at#L252
I also mentioned another problem related to compaction:
https://github.com/openvswitch/ovs/blob/master/tests/ovsdb-cluster.at#L239
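For reference, the sequence that seems to trigger the first issue is roughly
"compact, then restart", something like this (ctl path assumed to match your
environment):

    # compact the southbound DB on the cluster members
    ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact OVN_Southbound
    # then restart the members; after the restart a follower can start
    # rejecting append_requests with "mismatch past end of log"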
I was planning to debug these but haven't had the time yet. I will try to find
some time next week (it would be great if you could figure it out and submit
patches).
Thanks,
Han
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev