Hi folks,

While working with an OpenStack environment running OVN, with ovsdb-server in an active/passive (A/P) configuration managed by Pacemaker, we hit an issue that has probably been around for a long time. The bug itself seems to be related to ovsdb-server not updating its read-only flag properly.
With a 3-node cluster running ovsdb-server in active/passive mode, when we restart the master node, Pacemaker promotes another node to master and moves the associated IPaddr2 resource to it. At this point, ovn-controller instances across the cloud reconnect to the new node, but there is a window during which ovsdb-server on that node is still running as backup. For the ovn-controller instances that reconnect within that window, every attempt to write to the OVSDB fails with "operation not allowed when database server is in read only mode". This state persists indefinitely unless a reconnection is forced: restarting ovn-controller or killing the connection (for example, with tcpkill) makes things work again.

A workaround in the OVN OCF script could be to make the ovsdb_server_promote function wait until that instance reports 'running/active'.

Another open question is what clients (in this case, ovn-controller) should do in such a situation. Should they log an error and attempt a reconnection (rate-limited)?

Thoughts?

Thanks a lot,
Daniel

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
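The proposed promote-time workaround (wait until the instance is really active before reporting success) can be sketched as a bounded polling loop. This is a minimal sketch in Python rather than the actual OCF shell code, under these assumptions: the status is read with the `ovsdb-server/sync-status` unixctl command via `ovs-appctl`, its output contains "state: active" once promotion has taken effect (assumed to correspond to the 'running/active' state mentioned above), and the control socket path passed as `target` is hypothetical. The helper names `ovs_sync_status` and `wait_until_active` are illustrative only.

```python
import subprocess
import time
from typing import Callable


def ovs_sync_status(target: str) -> str:
    """Return the raw sync-status text for one ovsdb-server instance.

    Assumption: `target` is the ovsdb-server control socket (hypothetical path).
    """
    result = subprocess.run(
        ["ovs-appctl", "-t", target, "ovsdb-server/sync-status"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def wait_until_active(
    get_status: Callable[[], str],
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the status reports 'state: active', or give up at the deadline.

    Returns True once the server is active, False if the timeout expires while
    it is still in backup (read-only) mode. `clock` and `sleep` are injectable
    so the loop can be exercised without a real ovsdb-server.
    """
    deadline = clock() + timeout
    while True:
        if "state: active" in get_status():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In a promote action this would run after the server has been switched to active mode, with the action reporting failure (so Pacemaker can react) if `wait_until_active(lambda: ovs_sync_status(sock))` returns False. Holding the promote until the read-only flag is cleared narrows the window in which clients can reconnect to a backup, but it does not answer the client-side question of how ovn-controller should recover if it still lands on a read-only server.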