On Tue, Aug 1, 2017 at 3:26 PM, Han Zhou <[email protected]> wrote:
>
>
> On Tue, Aug 1, 2017 at 9:19 AM, Russell Bryant <[email protected]> wrote:
>>
>> Add native support for active-standby HA in ovn-northd by having each
>> instance attempt to acquire an OVSDB lock. Only the instance of
>> ovn-northd that currently holds the lock will make active changes to
>> the OVN databases.
>>
>> Signed-off-by: Russell Bryant <[email protected]>
>> ---
>> NEWS | 1 +
>> ovn/northd/ovn-northd.8.xml | 9 +++++++++
>> ovn/northd/ovn-northd.c | 40 +++++++++++++++++++++++++++++++---------
>> 3 files changed, 41 insertions(+), 9 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index facea0228..f3cdd2443 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -49,6 +49,7 @@ Post-v2.7.0
>> one chassis is specified, OVN will manage high availability for
>> that
>> gateway.
>> * Add support for ACL logging.
>> + * ovn-northd now has native support for active-standby high
>> availability.
>> - Tracing with ofproto/trace now traces through recirculation.
>> - OVSDB:
>> * New support for role-based access control (see ovsdb-server(1)).
>> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
>> index 1527e8a60..0d85ec0d2 100644
>> --- a/ovn/northd/ovn-northd.8.xml
>> +++ b/ovn/northd/ovn-northd.8.xml
>> @@ -72,6 +72,15 @@
>> </dl>
>> </p>
>>
>> + <h1>Active-Standby for High Availability</h1>
>> + <p>
>> + You may run <code>ovn-northd</code> more than once in an OVN
>> deployment.
>> + OVN will automatically ensure that only one of them is active at a
>> time.
>> + If multiple instances of <code>ovn-northd</code> are running and
>> the
>> + active <code>ovn-northd</code> fails, one of the hot standby
>> instances
>> + of <code>ovn-northd</code> will automatically take over.
>> + </p>
>> +
>> <h1>Logical Flow Table Structure</h1>
>>
>> <p>
>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> index 10e0c7ce0..3d2be4267 100644
>> --- a/ovn/northd/ovn-northd.c
>> +++ b/ovn/northd/ovn-northd.c
>> @@ -6531,6 +6531,12 @@ main(int argc, char *argv[])
>> ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
>> ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
>>
>> + /* Ensure that only a single ovn-northd is active in the deployment
>> by
>> + * acquiring a lock called "ovn_northd" on the southbound database
>> + * and then only performing DB transactions if the lock is held. */
>> + ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd");
>> + bool had_lock = false;
>> +
>> /* Main loop. */
>> exiting = false;
>> while (!exiting) {
>> @@ -6541,15 +6547,29 @@ main(int argc, char *argv[])
>> .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>> };
>>
>> - struct chassis_index chassis_index;
>> - chassis_index_init(&chassis_index, ctx.ovnsb_idl);
>> + if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
>> + VLOG_INFO("ovn-northd lock acquired. "
>> + "This ovn-northd instance is now active.");
>> + had_lock = true;
>> + } else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
>> + VLOG_INFO("ovn-northd lock lost. "
>> + "This ovn-northd instance is now on standby.");
>
> Should it try lock again, if we want it to be standby? Otherwise, this
> instance won't have a chance to be active any more.
Good question ... I was assuming this scenario was due to a lost
connection, and that the IDL would automatically try to re-acquire the
lock.
I tested to make sure I saw a second ovn-northd go from standby to
active, but I have not tested active -> standby -> active again.
I'll take a closer look at this before applying the patch.
>
>> + had_lock = false;
>> + }
>>
>> - ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
>> - ovnsb_db_run(&ctx, &ovnsb_idl_loop);
>> - if (ctx.ovnsb_txn) {
>> - check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
>> - check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
>> - check_and_update_rbac(&ctx);
>> + struct chassis_index chassis_index;
>> + bool destroy_chassis_index = false;
>> + if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
>> + chassis_index_init(&chassis_index, ctx.ovnsb_idl);
>> + destroy_chassis_index = true;
>> +
>> + ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
>> + ovnsb_db_run(&ctx, &ovnsb_idl_loop);
>> + if (ctx.ovnsb_txn) {
>> + check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
>> + check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
>> + check_and_update_rbac(&ctx);
>> + }
>> }
>>
>> unixctl_server_run(unixctl);
>> @@ -6565,7 +6585,9 @@ main(int argc, char *argv[])
>> exiting = true;
>> }
>>
>> - chassis_index_destroy(&chassis_index);
>> + if (destroy_chassis_index) {
>> + chassis_index_destroy(&chassis_index);
>> + }
>> }
>>
>> unixctl_server_destroy(unixctl);
>> --
>> 2.13.3
>>
>
> Acked-by: Han Zhou <[email protected]>
Thanks for the review!
--
Russell Bryant
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev