On Tue, Aug 1, 2017 at 3:26 PM, Han Zhou <[email protected]> wrote:
>
>
> On Tue, Aug 1, 2017 at 9:19 AM, Russell Bryant <[email protected]> wrote:
>>
>> Add native support for active-standby HA in ovn-northd by having each
>> instance attempt to acquire an OVSDB lock.  Only the instance of
>> ovn-northd that currently holds the lock will make active changes to
>> the OVN databases.
>>
>> Signed-off-by: Russell Bryant <[email protected]>
>> ---
>>  NEWS                        |  1 +
>>  ovn/northd/ovn-northd.8.xml |  9 +++++++++
>>  ovn/northd/ovn-northd.c     | 40 +++++++++++++++++++++++++++++++---------
>>  3 files changed, 41 insertions(+), 9 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index facea0228..f3cdd2443 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -49,6 +49,7 @@ Post-v2.7.0
>>         one chassis is specified, OVN will manage high availability for
>> that
>>         gateway.
>>       * Add support for ACL logging.
>> +     * ovn-northd now has native support for active-standby high
>> availability.
>>     - Tracing with ofproto/trace now traces through recirculation.
>>     - OVSDB:
>>       * New support for role-based access control (see ovsdb-server(1)).
>> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
>> index 1527e8a60..0d85ec0d2 100644
>> --- a/ovn/northd/ovn-northd.8.xml
>> +++ b/ovn/northd/ovn-northd.8.xml
>> @@ -72,6 +72,15 @@
>>        </dl>
>>      </p>
>>
>> +    <h1>Active-Standby for High Availability</h1>
>> +    <p>
>> +      You may run <code>ovn-northd</code> more than once in an OVN
>> deployment.
>> +      OVN will automatically ensure that only one of them is active at a
>> time.
>> +      If multiple instances of <code>ovn-northd</code> are running and
>> the
>> +      active <code>ovn-northd</code> fails, one of the hot standby
>> instances
>> +      of <code>ovn-northd</code> will automatically take over.
>> +    </p>
>> +
>>      <h1>Logical Flow Table Structure</h1>
>>
>>      <p>
>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> index 10e0c7ce0..3d2be4267 100644
>> --- a/ovn/northd/ovn-northd.c
>> +++ b/ovn/northd/ovn-northd.c
>> @@ -6531,6 +6531,12 @@ main(int argc, char *argv[])
>>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
>>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
>>
>> +    /* Ensure that only a single ovn-northd is active in the deployment
>> by
>> +     * acquiring a lock called "ovn_northd" on the southbound database
>> +     * and then only performing DB transactions if the lock is held. */
>> +    ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd");
>> +    bool had_lock = false;
>> +
>>      /* Main loop. */
>>      exiting = false;
>>      while (!exiting) {
>> @@ -6541,15 +6547,29 @@ main(int argc, char *argv[])
>>              .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>>          };
>>
>> -        struct chassis_index chassis_index;
>> -        chassis_index_init(&chassis_index, ctx.ovnsb_idl);
>> +        if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
>> +            VLOG_INFO("ovn-northd lock acquired. "
>> +                      "This ovn-northd instance is now active.");
>> +            had_lock = true;
>> +        } else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
>> +            VLOG_INFO("ovn-northd lock lost. "
>> +                      "This ovn-northd instance is now on standby.");
>
> Should it try lock again, if we want it to be standby? Otherwise, this
> instance won't have a chance to be active any more.

Good question ... I was assuming this scenario was due to a lost
connection, and that the IDL would automatically try to re-acquire the
lock.

I tested to make sure I saw a second ovn-northd go from standby to
active, but I have not tested active -> standby -> active again.

I'll take a closer look at this before applying the patch.

>
>> +            had_lock = false;
>> +        }
>>
>> -        ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
>> -        ovnsb_db_run(&ctx, &ovnsb_idl_loop);
>> -        if (ctx.ovnsb_txn) {
>> -            check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
>> -            check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
>> -            check_and_update_rbac(&ctx);
>> +        struct chassis_index chassis_index;
>> +        bool destroy_chassis_index = false;
>> +        if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
>> +            chassis_index_init(&chassis_index, ctx.ovnsb_idl);
>> +            destroy_chassis_index = true;
>> +
>> +            ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
>> +            ovnsb_db_run(&ctx, &ovnsb_idl_loop);
>> +            if (ctx.ovnsb_txn) {
>> +                check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
>> +                check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
>> +                check_and_update_rbac(&ctx);
>> +            }
>>          }
>>
>>          unixctl_server_run(unixctl);
>> @@ -6565,7 +6585,9 @@ main(int argc, char *argv[])
>>              exiting = true;
>>          }
>>
>> -        chassis_index_destroy(&chassis_index);
>> +        if (destroy_chassis_index) {
>> +            chassis_index_destroy(&chassis_index);
>> +        }
>>      }
>>
>>      unixctl_server_destroy(unixctl);
>> --
>> 2.13.3
>>
>
> Acked-by: Han Zhou <[email protected]>

Thanks for the review!

-- 
Russell Bryant
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to