On Tue, Aug 1, 2017 at 9:21 PM, Numan Siddique <nusid...@redhat.com> wrote:
>
>
> On Wed, Aug 2, 2017 at 1:18 AM, Russell Bryant <russ...@ovn.org> wrote:
>>
>> On Tue, Aug 1, 2017 at 3:26 PM, Han Zhou <zhou...@gmail.com> wrote:
>> >
>> >
>> > On Tue, Aug 1, 2017 at 9:19 AM, Russell Bryant <russ...@ovn.org> wrote:
>> >>
>> >> Add native support for active-standby HA in ovn-northd by having each
>> >> instance attempt to acquire an OVSDB lock.  Only the instance of
>> >> ovn-northd that currently holds the lock will make active changes to
>> >> the OVN databases.
>> >>
>> >> Signed-off-by: Russell Bryant <russ...@ovn.org>
>> >> ---
>> >>  NEWS                        |  1 +
>> >>  ovn/northd/ovn-northd.8.xml |  9 +++++++++
>> >>  ovn/northd/ovn-northd.c     | 40 +++++++++++++++++++++++++++++++---------
>> >>  3 files changed, 41 insertions(+), 9 deletions(-)
>> >>
>> >> diff --git a/NEWS b/NEWS
>> >> index facea0228..f3cdd2443 100644
>> >> --- a/NEWS
>> >> +++ b/NEWS
>> >> @@ -49,6 +49,7 @@ Post-v2.7.0
>> >>         one chassis is specified, OVN will manage high availability for that
>> >>         gateway.
>> >>       * Add support for ACL logging.
>> >> +     * ovn-northd now has native support for active-standby high availability.
>> >>     - Tracing with ofproto/trace now traces through recirculation.
>> >>     - OVSDB:
>> >>       * New support for role-based access control (see ovsdb-server(1)).
>> >> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
>> >> index 1527e8a60..0d85ec0d2 100644
>> >> --- a/ovn/northd/ovn-northd.8.xml
>> >> +++ b/ovn/northd/ovn-northd.8.xml
>> >> @@ -72,6 +72,15 @@
>> >>        </dl>
>> >>      </p>
>> >>
>> >> +    <h1>Active-Standby for High Availability</h1>
>> >> +    <p>
>> >> +      You may run <code>ovn-northd</code> more than once in an OVN deployment.
>> >> +      OVN will automatically ensure that only one of them is active at a time.
>> >> +      If multiple instances of <code>ovn-northd</code> are running and the
>> >> +      active <code>ovn-northd</code> fails, one of the hot standby instances
>> >> +      of <code>ovn-northd</code> will automatically take over.
>> >> +    </p>
>> >> +
>> >>      <h1>Logical Flow Table Structure</h1>
>> >>
>> >>      <p>
>> >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> >> index 10e0c7ce0..3d2be4267 100644
>> >> --- a/ovn/northd/ovn-northd.c
>> >> +++ b/ovn/northd/ovn-northd.c
>> >> @@ -6531,6 +6531,12 @@ main(int argc, char *argv[])
>> >>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
>> >>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
>> >>
>> >> +    /* Ensure that only a single ovn-northd is active in the deployment by
>> >> +     * acquiring a lock called "ovn_northd" on the southbound database
>> >> +     * and then only performing DB transactions if the lock is held. */
>> >> +    ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd");
>> >> +    bool had_lock = false;
>> >> +
>> >>      /* Main loop. */
>> >>      exiting = false;
>> >>      while (!exiting) {
>> >> @@ -6541,15 +6547,29 @@ main(int argc, char *argv[])
>> >>              .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>> >>          };
>> >>
>> >> -        struct chassis_index chassis_index;
>> >> -        chassis_index_init(&chassis_index, ctx.ovnsb_idl);
>> >> +        if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
>> >> +            VLOG_INFO("ovn-northd lock acquired. "
>> >> +                      "This ovn-northd instance is now active.");
>> >> +            had_lock = true;
>> >> +        } else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
>> >> +            VLOG_INFO("ovn-northd lock lost. "
>> >> +                      "This ovn-northd instance is now on standby.");
>> >
>> > Should it try to acquire the lock again, if we want it to be standby?
>> > Otherwise, this instance won't have a chance to become active again.
>>
>> Good question ... I was assuming this scenario was due to a lost
>> connection, and that the IDL would automatically try to re-acquire the
>> lock.
>>
>> I tested to make sure I saw a second ovn-northd go from standby to
>> active, but I have not tested active -> standby -> active again.
>>
>> I'll take a closer look at this before applying the patch.
>>
>
> I tested it and it works fine. The active -> standby -> active scenario also
> works fine.
> I also tested by restarting the southbound ovsdb-server. Once ovsdb-server is
> up again, the IDL clients try to acquire the lock and one of the ovn-northd
> instances becomes active again.
> I don't think it is required to try to lock again, as the IDL client takes
> care of it.
>
> How about starting another instance of ovn-northd in the sandbox/test
> environment so that the active/standby scenario gets tested for all the
> OVN tests?
>
> Acked-by: Numan Siddique <nusid...@redhat.com>

Thanks!  I updated the test suite to always run a backup ovn-northd to
help ensure it doesn't break anything.  I then applied this to master.
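
For anyone following along, here is a minimal standalone sketch of the
active/standby bookkeeping discussed above. It does not use the OVS
libraries: the has_lock argument stands in for the result of
ovsdb_idl_has_lock(), and every name in it (northd_ha_state,
northd_ha_update, northd_ha_may_write, the enum values) is hypothetical
and chosen only for illustration. It also assumes that had_lock is
cleared when the lock is lost, which the quoted hunk is cut off before
showing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; none of these exist in OVS. */
enum northd_transition {
    NORTHD_NO_CHANGE,       /* Lock state is the same as last iteration. */
    NORTHD_BECAME_ACTIVE,   /* Lock was just acquired. */
    NORTHD_BECAME_STANDBY,  /* Lock was just lost. */
};

struct northd_ha_state {
    bool had_lock;          /* Whether the lock was held last iteration. */
};

/* Records the current lock status and reports whether this instance just
 * changed role.  'has_lock' models what ovsdb_idl_has_lock() would return
 * in the real main loop. */
static enum northd_transition
northd_ha_update(struct northd_ha_state *state, bool has_lock)
{
    assert(state != NULL);

    if (!state->had_lock && has_lock) {
        state->had_lock = true;
        return NORTHD_BECAME_ACTIVE;
    } else if (state->had_lock && !has_lock) {
        /* Assumption: the flag is reset here so that a later
         * re-acquisition is reported as a new transition. */
        state->had_lock = false;
        return NORTHD_BECAME_STANDBY;
    }
    return NORTHD_NO_CHANGE;
}

/* Only the current lock holder should commit database transactions. */
static bool
northd_ha_may_write(const struct northd_ha_state *state)
{
    assert(state != NULL);
    return state->had_lock;
}
```

Calling northd_ha_update() once per main-loop iteration with the lock
status gives one transition per role change, so the active -> standby ->
active case Numan tested shows up as three separate transitions rather
than repeated log messages.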
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
