On Wed, Aug 5, 2020 at 3:05 PM Han Zhou <[email protected]> wrote:

> On Wed, Aug 5, 2020 at 12:51 PM Winson Wang <[email protected]>
> wrote:
>
>> Hello OVN Experts:
>>
>> In a large-scale ovn-k8s cluster, several conditions can move the
>> ovn-controller clients' connections to SB central from a balanced
>> state to an unbalanced state.
>> Is there an ongoing project to address this problem?
>> If not, I have one proposal, though I am not sure it is doable.
>> Please share your thoughts.
>>
>> The issue:
>>
>> With an OVN SB RAFT 3-node cluster, all the ovn-controller clients
>> at first connect to the 3 nodes in a balanced state.
>>
>> The following conditions make the connections unbalanced:
>>
>> - One RAFT node restarts, causing all the ovn-controller clients to
>>   reconnect to the two remaining cluster nodes.
>>
>> - In ovn-k8s, after a rolling upgrade of the SB raft pods, the last
>>   raft pod has no client connections.
>>
>> RAFT clients in an unbalanced state put more stress on the raft
>> cluster, which makes raft less stable under stress than it is in a
>> balanced state.
>>
>> The proposed solution:
>>
>> ovn-controller adds a new unix command, "reconnect", that takes the
>> preferred SB node IP as its argument.
>>
>> When an unbalanced state happens, the unix command can trigger
>> ovn-controller to reconnect to a new SB raft node with fast sync,
>> which doesn't trigger the whole DB download process.
>>
>
> Thanks Winson. The proposal sounds good to me. Will you implement it?
>
Han/Winson,

The fast re-sync is for an ovsdb-server restart and does not apply to an
ovn-controller restart, right? If the ovsdb client (ovn-controller)
restarts, it loses all its state, and when it starts again it still needs
to download logical_flows, port_bindings, and the other tables it cares
about. So fast re-sync may not apply to this case.

Also, ovn-controller should stash the IP address of the SB server it is
connected to in the external_ids column of the Open_vSwitch table. It would
update this field whenever it reconnects to a different SB server (because
that ovsdb-server instance failed or restarted). When ovn-controller itself
restarts, it could check the value in this field and try to connect to that
server first, and on failure fall back to the default connection approach.

Regards,
~Girish

>
> Han
>
>>
>> --
>> Winson
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "ovn-kubernetes" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/ovn-kubernetes/CAMu6iS--iOW0LxxtkOhJpRT49E-9bJVy0iXraC1LMDUWeu6kLA%40mail.gmail.com
>> .
>>
> _______________________________________________
> discuss mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
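
To make the "try the remembered server first, then fall back" idea concrete, here is a minimal Python sketch of just the ordering logic. It is not ovn-controller code. The function names, the `try_connect` callback, and the idea that the remembered address is stored as a remote string such as `tcp:10.0.0.3:6642` are all assumptions made for illustration; how the value is read from or written to external_ids is left out.

```python
from typing import Callable, List, Optional, Sequence


def order_remotes(default_remotes: Sequence[str],
                  preferred: Optional[str]) -> List[str]:
    """Return the remotes to try, remembered one first, without duplicates."""
    ordered: List[str] = []
    if preferred:
        # Hypothetical: value previously stashed by the client before restart.
        ordered.append(preferred)
    for remote in default_remotes:
        if remote not in ordered:
            ordered.append(remote)
    return ordered


def connect_with_preference(default_remotes: Sequence[str],
                            preferred: Optional[str],
                            try_connect: Callable[[str], bool]) -> Optional[str]:
    """Try the remembered remote first, then fall back to the default list.

    try_connect is a caller-supplied stand-in for the real connection logic;
    it returns True on success. Returns the remote that accepted the
    connection, or None if every attempt failed.
    """
    for remote in order_remotes(default_remotes, preferred):
        if try_connect(remote):
            return remote
    return None
```

On a successful connection to a different remote, the caller would then store the returned value back so the next restart prefers it. Note that this only preserves whatever distribution existed before the restart; it does not rebalance connections on its own.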
