I tried a simple patch and it fixes the issue (see below). The question now is, do we want to do this? I think it makes sense to drop *all* the connections when the role changes but I'm curious to see what other people think:

diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
index 4dda63a..ddbbc2e 100644
--- a/ovsdb/jsonrpc-server.c
+++ b/ovsdb/jsonrpc-server.c
@@ -365,7 +365,7 @@ ovsdb_jsonrpc_server_set_read_only(struct ovsdb_jsonrpc_server *svr,
 {
     if (svr->read_only != read_only) {
         svr->read_only = read_only;
-        ovsdb_jsonrpc_server_reconnect(svr, false,
+        ovsdb_jsonrpc_server_reconnect(svr, true,
                 xstrdup(read_only
                         ? "making server read-only"
                         : "making server read/write"));

$ export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
$ ovn-nbctl ls-add sw0
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: active
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server tcp:192.0.2.2:6641
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: backup
connecting: tcp:192.0.2.2:6641
$ ovn-nbctl ls-add sw1
ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}

On Mon, Jul 8, 2019 at 1:25 PM Daniel Alvarez Sanchez <dalva...@redhat.com> wrote:
>
> I *think* that it may not a bug in ovsdb-server but a problem with
> ovn-controller as it doesn't seem to be a DB change aware client.
>
> When the role changes from master to backup or viceversa, connections
> are expected to be reestablished for all clients except those that are
> not aware of db changes [0] (note the 'false' argument). This flag is
> explained here [1] and looks like since ovn-controller is not
> monitoring the Database table in the _Server database, then the
> connection with it is not re-established. This is just a blind guess
> but I can give it a shot :)
>
> [0] https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
> [1] https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456
>
> On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique <nusid...@redhat.com> wrote:
> >
> > On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez <dalva...@redhat.com> wrote:
> >>
> >> Hi folks,
> >>
> >> While working with an OpenStack environment running OVN and
> >> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
> >> has been probably around for a long time. The bug itself seems to be
> >> related with ovsdb-server not updating the read-only flag properly.
> >>
> >> With a 3 nodes cluster running ovsdb-server in active/passive mode,
> >> when we restart the master-node, pacemaker promotes another node as
> >> master and moves the associated IPAddr2 resource to it.
> >> At this point, ovn-controller instances across the cloud reconnect to
> >> the new node but there's a window where ovsdb-server is still running
> >> as backup.
> >>
> >> For those ovn-controller instances that reconnect within that window,
> >> every attempt to write in the OVSDB will fail with "operation not
> >> allowed when database server is in read only mode". This state will
> >> remain forever unless a reconnection is forced. Restarting
> >> ovn-controller or killing the connection (for example with tcpkill)
> >> will make things work again.
> >>
> >> A workaround in OVN OCF script could be to wait for the
> >> ovsdb_server_promote function to wait until we get 'running/active' on
> >> that instance.
> >>
> >> Another open question is what should clients (in this case,
> >> ovn-controller) do in such situation? Shall they log an error and
> >> attempt a reconnection (rate limited)?
> >
> > Thanks for reporting this issue Daniel.
> >
> > I can easily reproduce the issue with the below commands.
> >
> > $ <start the sandbox with --ovn>
> > $ export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> > $ ovn-nbctl ls-add sw0
> > $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: active
> > $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server tcp:192.0.2.2:6641
> > $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> > $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: backup
> > connecting: tcp:192.0.2.2:6641
> > $ ovn-nbctl ls-add sw1
> >   --> This should have failed. Since OVN_NB_DAEMON is set, ovn-nbctl
> >       talks to the ovn-nbctl daemon and it is able to create a logical
> >       switch even though the db is in backup mode
> > $ unset OVN_NB_DAEMON
> > $ ovn-nbctl ls-add sw2
> > ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
> >
> > I looked into the ovsdb-server code, when the user changes the state of the
> > ovsdb-server, the read_only param of active ovsdb_server_sessions
> > are not updated.
> >
> > Thanks
> > Numan
> >
> >>
> >> Thoughts?
> >>
> >> Thanks a lot,
> >> Daniel
> >> _______________________________________________
> >> discuss mailing list
> >> disc...@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
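
For reference, the OCF-side workaround mentioned in the original report (have the promote step wait until the instance reports itself as active) could look roughly like the sketch below. This is only an illustration under assumptions: wait_until_active and STATUS_CMD are hypothetical names that do not exist in the OVN OCF script, and the default control socket path is simply the sandbox one used in the reproduction steps. The only real interface it relies on is the "state: active" line printed by ovsdb-server/sync-status.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the OVN OCF script.
# STATUS_CMD is an assumed knob: the command whose output is inspected.
# It defaults to the sync-status call from the reproduction steps.
: "${STATUS_CMD:=ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status}"

# Poll once per second until the status output contains "state: active".
# $1 is the timeout in seconds (default 30).
# Returns 0 if the server reported active in time, 1 otherwise.
wait_until_active() {
    timeout=${1:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # STATUS_CMD is left unquoted on purpose so it splits into words.
        if $STATUS_CMD 2>/dev/null | grep -q '^state: active'; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

A promote step would then call something like "wait_until_active 30 || exit 1" before reporting success. Note that this only narrows the window described in the report; clients that connected while the server was still a backup would still need the reconnect that the patch forces.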