On Tue, Aug 1, 2017 at 9:21 PM, Numan Siddique <nusid...@redhat.com> wrote: > > > On Wed, Aug 2, 2017 at 1:18 AM, Russell Bryant <russ...@ovn.org> wrote: >> >> On Tue, Aug 1, 2017 at 3:26 PM, Han Zhou <zhou...@gmail.com> wrote: >> > >> > >> > On Tue, Aug 1, 2017 at 9:19 AM, Russell Bryant <russ...@ovn.org> wrote: >> >> >> >> Add native support for active-standby HA in ovn-northd by having each >> >> instance attempt to acquire an OVSDB lock. Only the instance of >> >> ovn-northd that currently holds the lock will make active changes to >> >> the OVN databases. >> >> >> >> Signed-off-by: Russell Bryant <russ...@ovn.org> >> >> --- >> >> NEWS | 1 + >> >> ovn/northd/ovn-northd.8.xml | 9 +++++++++ >> >> ovn/northd/ovn-northd.c | 40 >> >> +++++++++++++++++++++++++++++++--------- >> >> 3 files changed, 41 insertions(+), 9 deletions(-) >> >> >> >> diff --git a/NEWS b/NEWS >> >> index facea0228..f3cdd2443 100644 >> >> --- a/NEWS >> >> +++ b/NEWS >> >> @@ -49,6 +49,7 @@ Post-v2.7.0 >> >> one chassis is specified, OVN will manage high availability for >> >> that >> >> gateway. >> >> * Add support for ACL logging. >> >> + * ovn-northd now has native support for active-standby high >> >> availability. >> >> - Tracing with ofproto/trace now traces through recirculation. >> >> - OVSDB: >> >> * New support for role-based access control (see >> >> ovsdb-server(1)). >> >> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml >> >> index 1527e8a60..0d85ec0d2 100644 >> >> --- a/ovn/northd/ovn-northd.8.xml >> >> +++ b/ovn/northd/ovn-northd.8.xml >> >> @@ -72,6 +72,15 @@ >> >> </dl> >> >> </p> >> >> >> >> + <h1>Active-Standby for High Availability</h1> >> >> + <p> >> >> + You may run <code>ovn-northd</code> more than once in an OVN >> >> deployment. >> >> + OVN will automatically ensure that only one of them is active at >> >> a >> >> time. >> >> + If multiple instances of <code>ovn-northd</code> are running and >> >> the >> >> + active <code>ovn-northd</code> fails, one of the hot standby >> >> instances >> >> + of <code>ovn-northd</code> will automatically take over. >> >> + </p> >> >> + >> >> <h1>Logical Flow Table Structure</h1> >> >> >> >> <p> >> >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c >> >> index 10e0c7ce0..3d2be4267 100644 >> >> --- a/ovn/northd/ovn-northd.c >> >> +++ b/ovn/northd/ovn-northd.c >> >> @@ -6531,6 +6531,12 @@ main(int argc, char *argv[]) >> >> ovsdb_idl_add_column(ovnsb_idl_loop.idl, >> >> &sbrec_chassis_col_nb_cfg); >> >> ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name); >> >> >> >> + /* Ensure that only a single ovn-northd is active in the >> >> deployment >> >> by >> >> + * acquiring a lock called "ovn_northd" on the southbound database >> >> + * and then only performing DB transactions if the lock is held. >> >> */ >> >> + ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd"); >> >> + bool had_lock = false; >> >> + >> >> /* Main loop. */ >> >> exiting = false; >> >> while (!exiting) { >> >> @@ -6541,15 +6547,29 @@ main(int argc, char *argv[]) >> >> .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop), >> >> }; >> >> >> >> - struct chassis_index chassis_index; >> >> - chassis_index_init(&chassis_index, ctx.ovnsb_idl); >> >> + if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { >> >> + VLOG_INFO("ovn-northd lock acquired. " >> >> + "This ovn-northd instance is now active."); >> >> + had_lock = true; >> >> + } else if (had_lock && >> >> !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { >> >> + VLOG_INFO("ovn-northd lock lost. " >> >> + "This ovn-northd instance is now on standby."); >> > >> > Should it try lock again, if we want it to be standby? Otherwise, this >> > instance won't have a chance to be active any more. >> >> Good question ... I was assuming this scenario was due to a lost >> connection, and that the IDL would automatically try to re-acquire the >> lock. >> >> I tested to make sure I saw a second ovn-northd go from standby to >> active, but I have not tested active -> standby -> active again. >> >> I'll take a closer look at this before applying the patch. >> > > I tested it and it works fine. active -> standby -> active scenario also > works fine. > I also tested by restarting southbound ovsdb-server. Once ovsdb-server is up > again, the idl clients try to > acquire the lock and one of the ovn-northd instance becomes active again. > I don't think it is required to try lock again as idl client takes care of > it. > > How about starting another instance of ovn-northd in the sandbox/test > environment so that active/standby > scenario gets tested for all the ovn tests ? > > Acked-by: Numan Siddique <nusid...@redhat.com>
Thanks! I updated the test suite to always run a backup ovn-northd to help ensure it doesn't break anything. I then applied this to master. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev