When OVN moves a port binding to another node, options:requested-chassis is updated to point to the new chassis, and northd translates the change into SB updates. The old chassis then winds down the flows related to the binding, and the new chassis configures the binding in the new location. This works, but it takes some time and is perceived as temporary network downtime.
To reduce the downtime, this patch introduces a new option called 'migration-destination' that can be set on an LS port. It makes the second chassis pre-configure the port binding in the new location while the old binding still exists and serves the user. Once everything is ready on the other side, the user may move their payload to the new location and, finally, wind down the original binding on the first chassis.

We must guard the new binding location from prematurely communicating with the outside world. This is achieved by installing drop flows into tables 8 and 40 at the new location. These flows are removed when the port at the new location issues a RARP packet, which indicates that the payload has moved and is ready to serve from the new chassis. The packet is caught by a separate controller() action flow. The action handler then removes the drop flows, as well as the RARP controller() action flow. The handler also tags the binding with options:migration-unblocked=true.

Once this happens, the user may complete the migration by unwinding the binding at the original location. This is achieved by setting options:requested-chassis to point to the new chassis, and unsetting options:migration-destination.

Note: the design was inspired by / re-invented based on previous discussions of the use scenario, e.g. found at:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-March/329865.html
https://etherpad.opendev.org/p/ovn_live_migration
https://bugzilla.redhat.com/show_bug.cgi?id=2012179

This is expected to be utilized by OpenStack Neutron for VM live migration.

TODO: perhaps postpone RARP to when drop flows are deleted from the switch; then re-inject into pipeline. We can't just re-inject the packet with a continuation pin inside the unblock_migration handler because there's a lag between delete flow messages being queued and applied by the switch.
TODO: implement ddlog.

Signed-off-by: Ihar Hrachyshka <[email protected]>
--
v1: initial commit.
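For reviewers, the gating behaviour described above can be modelled as a tiny state machine. This is only an illustrative sketch, not code from the patch: `struct dest_port`, `enum verdict` and `classify_ingress()` are hypothetical names, and the real implementation uses OpenFlow flows (priority 1010 RARP-to-controller, priority 1000 drop) rather than a C function. It assumes, per the first TODO, that the triggering RARP is consumed by the handler and not re-injected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_TYPE_RARP 0x8035  /* EtherType of Reverse ARP. */

/* Hypothetical model of the binding at the migration destination. */
struct dest_port {
    bool unblocked;  /* Mirrors options:migration-unblocked. */
};

enum verdict {
    VERDICT_DROP,     /* Matches the priority-1000 drop flow. */
    VERDICT_UNBLOCK,  /* RARP caught by the controller() flow. */
    VERDICT_FORWARD   /* Blocking flows are gone; normal pipeline. */
};

/* Decides what happens to a packet entering from the migrating port. */
static enum verdict
classify_ingress(struct dest_port *port, uint16_t eth_type)
{
    assert(port != NULL);

    if (port->unblocked) {
        return VERDICT_FORWARD;
    }
    if (eth_type == ETH_TYPE_RARP) {
        /* The handler removes the drop flows and tags the binding.
         * The RARP itself is consumed, not re-injected. */
        port->unblocked = true;
        return VERDICT_UNBLOCK;
    }
    return VERDICT_DROP;
}
```

In this model an IPv4 frame sent before the RARP is dropped, the RARP flips the port to unblocked, and every later frame is forwarded. The lag between queued and applied flow deletions mentioned in the TODO is deliberately not modelled.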
--- controller/binding.c | 39 +++++---- controller/lport.c | 26 +++++- controller/ovn-controller.c | 9 ++ controller/physical.c | 67 +++++++++++++- controller/pinctrl.c | 153 +++++++++++++++++++++++++++++++- controller/vif-plug.c | 22 ++++- controller/vif-plug.h | 1 + include/ovn/actions.h | 14 +++ lib/actions.c | 37 ++++++++ northd/northd.c | 40 +++++++++ northd/ovn-northd.c | 5 +- northd/ovn_northd.dl | 24 ++++-- ovn-architecture.7.xml | 39 +++++++++ ovn-sb.ovsschema | 6 +- ovn-sb.xml | 24 ++++++ tests/ovn.at | 168 ++++++++++++++++++++++++++++++++++++ utilities/ovn-trace.c | 2 + 17 files changed, 642 insertions(+), 34 deletions(-) diff --git a/controller/binding.c b/controller/binding.c index 4d62b0858..7957cebdb 100644 --- a/controller/binding.c +++ b/controller/binding.c @@ -929,18 +929,23 @@ claim_lport(const struct sbrec_port_binding *pb, return false; } - if (pb->chassis) { - VLOG_INFO("Changing chassis for lport %s from %s to %s.", - pb->logical_port, pb->chassis->name, - chassis_rec->name); - } else { - VLOG_INFO("Claiming lport %s for this chassis.", pb->logical_port); - } - for (int i = 0; i < pb->n_mac; i++) { - VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]); - } + /* Update chassis only when we don't migrate port to the chassis. 
*/ + if (!pb->migration_destination || + strcmp(pb->migration_destination->name, chassis_rec->name)) { + if (pb->chassis) { + VLOG_INFO("Changing chassis for lport %s from %s to %s.", + pb->logical_port, pb->chassis->name, + chassis_rec->name); + } else { + VLOG_INFO("Claiming lport %s for this chassis.", + pb->logical_port); + } + for (int i = 0; i < pb->n_mac; i++) { + VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]); + } - sbrec_port_binding_set_chassis(pb, chassis_rec); + sbrec_port_binding_set_chassis(pb, chassis_rec); + } if (tracked_datapaths) { update_lport_tracking(pb, tracked_datapaths, true); @@ -1094,14 +1099,16 @@ consider_vif_lport_(const struct sbrec_port_binding *pb, /* We could, but can't claim the lport. */ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); VLOG_INFO_RL(&rl, - "Not claiming lport %s, chassis %s " - "requested-chassis %s", + "Not claiming lport %s, chassis %s, " + "requested-chassis %s, migration-destination %s", pb->logical_port, b_ctx_in->chassis_rec->name, pb->requested_chassis ? - pb->requested_chassis->name : "(option points at " - "non-existent " - "chassis)"); + pb->requested_chassis->name : "" + "(option points at non-existent chassis)", + pb->migration_destination ? + pb->migration_destination->name : "" + "(option points at non-existent chassis)"); } } diff --git a/controller/lport.c b/controller/lport.c index 5ad40f6d3..f013ad893 100644 --- a/controller/lport.c +++ b/controller/lport.c @@ -113,12 +113,13 @@ lport_can_bind_on_this_chassis(const struct sbrec_chassis *chassis_rec, const struct sbrec_port_binding *pb) { /* We need to check for presence of the requested-chassis option in - * addittion to checking the pb->requested_chassis column because this + * addition to checking the pb->requested_chassis column because this * column will be set to NULL whenever the option points to a non-existent * chassis. 
As the controller routinely clears its own chassis record this * might occur more often than one might think. */ const char *requested_chassis_option = smap_get(&pb->options, "requested-chassis"); + bool requested = false; if (requested_chassis_option && requested_chassis_option[0] && !pb->requested_chassis) { /* The requested-chassis option is set, but the requested_chassis @@ -126,11 +127,28 @@ lport_can_bind_on_this_chassis(const struct sbrec_chassis *chassis_rec, * points to is currently not running, or is in the process of starting * up. In this case we must fall back to comparing the strings to * avoid release/claim thrashing. */ - return !strcmp(requested_chassis_option, chassis_rec->name) + requested = !strcmp(requested_chassis_option, chassis_rec->name) || !strcmp(requested_chassis_option, chassis_rec->hostname); + } else { + requested = !requested_chassis_option || !requested_chassis_option[0] + || chassis_rec == pb->requested_chassis; + } + + /* Alternatively, the upcoming migration destination chassis may also bind + * the port. 
*/ + if (!requested) { + const char *migration_destination_option = smap_get( + &pb->options, "migration-destination"); + if (migration_destination_option && migration_destination_option[0]) { + requested = ( + !strcmp(migration_destination_option, chassis_rec->name) || + !strcmp(migration_destination_option, chassis_rec->hostname) + ); + } else { + requested = chassis_rec == pb->migration_destination; + } } - return !requested_chassis_option || !requested_chassis_option[0] - || chassis_rec == pb->requested_chassis; + return requested; } const struct sbrec_datapath_binding * diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 5069aedfc..ccbf0bc6a 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -215,6 +215,9 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, sbrec_port_binding_add_clause_requested_chassis( &pb, OVSDB_F_EQ, &chassis->header_.uuid); + sbrec_port_binding_add_clause_migration_destination( + &pb, OVSDB_F_EQ, &chassis->header_.uuid); + /* Ensure that we find out about l2gateway and l3gateway ports that * should be present on this chassis. 
Otherwise, we might never find * out about those ports, if their datapaths don't otherwise have a VIF @@ -3129,6 +3132,10 @@ main(int argc, char *argv[]) struct ovsdb_idl_index *sbrec_port_binding_by_requested_chassis = ovsdb_idl_index_create1(ovnsb_idl_loop.idl, &sbrec_port_binding_col_requested_chassis); + struct ovsdb_idl_index *sbrec_port_binding_by_migration_destination + = ovsdb_idl_index_create1( + ovnsb_idl_loop.idl, + &sbrec_port_binding_col_migration_destination); struct ovsdb_idl_index *sbrec_datapath_binding_by_key = ovsdb_idl_index_create1(ovnsb_idl_loop.idl, &sbrec_datapath_binding_col_tunnel_key); @@ -3669,6 +3676,8 @@ main(int argc, char *argv[]) sbrec_port_binding_by_name, .sbrec_port_binding_by_requested_chassis = sbrec_port_binding_by_requested_chassis, + .sbrec_port_binding_by_migration_destination = + sbrec_port_binding_by_migration_destination, .ovsrec_port_by_interfaces = ovsrec_port_by_interfaces, .ovs_table = ovs_table, diff --git a/controller/physical.c b/controller/physical.c index 6bfa2304d..a51daac4e 100644 --- a/controller/physical.c +++ b/controller/physical.c @@ -40,6 +40,7 @@ #include "lib/mcast-group-index.h" #include "lib/ovn-sb-idl.h" #include "lib/ovn-util.h" +#include "ovn/actions.h" #include "physical.h" #include "openvswitch/shash.h" #include "simap.h" @@ -885,6 +886,68 @@ get_binding_peer(struct ovsdb_idl_index *sbrec_port_binding_by_name, return peer; } +static void +handle_migration_destination(const struct sbrec_port_binding *binding, + const struct sbrec_chassis *chassis, + struct ovn_desired_flow_table *flow_table, + struct ofpbuf *ofpacts_p) +{ + /* Block all traffic for the migrating port until it sends a RARP. 
*/ + const char *migration_destination_option = smap_get( + &binding->options, "migration-destination"); + if (migration_destination_option && migration_destination_option[0] && + !strcmp(migration_destination_option, chassis->name)) { + if (!smap_get_bool(&binding->options, "migration-unblocked", false)) { + struct match match = MATCH_CATCHALL_INITIALIZER; + uint32_t dp_key = binding->datapath->tunnel_key; + uint32_t port_key = binding->tunnel_key; + + /* Unblock the port on ingress RARP. */ + match_set_metadata(&match, htonll(dp_key)); + match_set_dl_type(&match, htons(ETH_TYPE_RARP)); + match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0, port_key); + ofpbuf_clear(ofpacts_p); + + size_t ofs = ofpacts_p->size; + struct ofpact_controller *oc = ofpact_put_CONTROLLER(ofpacts_p); + oc->max_len = UINT16_MAX; + oc->reason = OFPR_ACTION; + oc->pause = true; + + struct action_header ah = { + .opcode = htonl(ACTION_OPCODE_UNBLOCK_MIGRATION) + }; + ofpbuf_put(ofpacts_p, &ah, sizeof ah); + + ofpacts_p->header = oc; + oc->userdata_len = ofpacts_p->size - (ofs + sizeof *oc); + ofpact_finish_CONTROLLER(ofpacts_p, &oc); + + ofctrl_add_flow(flow_table, OFTABLE_LOG_INGRESS_PIPELINE, 1010, + binding->header_.uuid.parts[0], + &match, ofpacts_p, &binding->header_.uuid); + ofpbuf_clear(ofpacts_p); + + /* Block all non-RARP traffic for the port, both directions. 
*/ + match_init_catchall(&match); + match_set_metadata(&match, htonll(dp_key)); + match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0, port_key); + + ofctrl_add_flow(flow_table, OFTABLE_LOG_INGRESS_PIPELINE, 1000, + binding->header_.uuid.parts[0], + &match, ofpacts_p, &binding->header_.uuid); + + match_init_catchall(&match); + match_set_metadata(&match, htonll(dp_key)); + match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key); + + ofctrl_add_flow(flow_table, OFTABLE_LOG_EGRESS_PIPELINE, 1000, + binding->header_.uuid.parts[0], + &match, ofpacts_p, &binding->header_.uuid); + } + } +} + static void consider_port_binding(struct ovsdb_idl_index *sbrec_port_binding_by_name, enum mf_field_id mff_ovn_geneve, @@ -902,11 +965,13 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_port_binding_by_name, uint32_t dp_key = binding->datapath->tunnel_key; uint32_t port_key = binding->tunnel_key; struct local_datapath *ld; + struct match match; if (!(ld = get_local_datapath(local_datapaths, dp_key))) { return; } - struct match match; + handle_migration_destination(binding, chassis, flow_table, ofpacts_p); + if (!strcmp(binding->type, "patch") || (!strcmp(binding->type, "l3gateway") && binding->chassis == chassis)) { diff --git a/controller/pinctrl.c b/controller/pinctrl.c index d2bb7f441..6ede23a72 100644 --- a/controller/pinctrl.c +++ b/controller/pinctrl.c @@ -29,10 +29,12 @@ #include "lport.h" #include "mac-learn.h" #include "nx-match.h" +#include "ofctrl.h" #include "latch.h" #include "lib/packets.h" #include "lib/sset.h" #include "openvswitch/ofp-actions.h" +#include "openvswitch/ofp-flow.h" #include "openvswitch/ofp-msgs.h" #include "openvswitch/ofp-packet.h" #include "openvswitch/ofp-print.h" @@ -152,8 +154,8 @@ VLOG_DEFINE_THIS_MODULE(pinctrl); * and pinctrl_run(). 
* 'pinctrl_handler_seq' is used by pinctrl_run() to * wake up pinctrl_handler thread from poll_block() if any changes happened - * in 'send_garp_rarp_data', 'ipv6_ras' and 'buffered_mac_bindings' - * structures. + * in 'send_garp_rarp_data', 'ipv6_ras', 'buffered_mac_bindings' and + * 'unblocked_migration_ports' structures. * * 'pinctrl_main_seq' is used by pinctrl_handler() thread to wake up * the main thread from poll_block() when mac bindings/igmp groups need to @@ -294,6 +296,18 @@ static void pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow, struct dp_packet *pkt_in, const struct match *md); + +static void pinctrl_unblock_migration(struct rconn *swconn, + const struct match *md); +static void init_unblocked_migration_ports(void); +static void destroy_unblocked_migration_ports(void); +static void wait_unblocked_migration_ports( + struct ovsdb_idl_txn *ovnsb_idl_txn); +static void run_unblocked_migration_ports(struct ovsdb_idl_txn *ovnsb_idl_txn, + struct ovsdb_idl_index *sbrec_datapath_binding_by_key, + struct ovsdb_idl_index *sbrec_port_binding_by_name) + OVS_REQUIRES(pinctrl_mutex); + static void init_svc_monitors(void); static void destroy_svc_monitors(void); static void sync_svc_monitors( @@ -522,6 +536,7 @@ pinctrl_init(void) init_ipv6_ras(); init_ipv6_prefixd(); init_buffered_packets_map(); + init_unblocked_migration_ports(); init_event_table(); ip_mcast_snoop_init(); init_put_vport_bindings(); @@ -3234,6 +3249,12 @@ process_packet_in(struct rconn *swconn, const struct ofp_header *msg) ovs_mutex_unlock(&pinctrl_mutex); break; + case ACTION_OPCODE_UNBLOCK_MIGRATION: + ovs_mutex_lock(&pinctrl_mutex); + pinctrl_unblock_migration(swconn, &pin.flow_metadata); + ovs_mutex_unlock(&pinctrl_mutex); + break; + default: VLOG_WARN_RL(&rl, "unrecognized packet-in opcode %"PRIu32, ntohl(ah->opcode)); @@ -3498,6 +3519,9 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, bfd_monitor_run(ovnsb_idl_txn, bfd_table, sbrec_port_binding_by_name, 
chassis, active_tunnels); run_put_fdbs(ovnsb_idl_txn, sbrec_fdb_by_dp_key_mac); + run_unblocked_migration_ports( + ovnsb_idl_txn, sbrec_datapath_binding_by_key, + sbrec_port_binding_by_key); ovs_mutex_unlock(&pinctrl_mutex); } @@ -4026,6 +4050,7 @@ pinctrl_wait(struct ovsdb_idl_txn *ovnsb_idl_txn) int64_t new_seq = seq_read(pinctrl_main_seq); seq_wait(pinctrl_main_seq, new_seq); wait_put_fdbs(ovnsb_idl_txn); + wait_unblocked_migration_ports(ovnsb_idl_txn); } /* Called by ovn-controller. */ @@ -4040,6 +4065,7 @@ pinctrl_destroy(void) destroy_ipv6_ras(); destroy_ipv6_prefixd(); destroy_buffered_packets_map(); + destroy_unblocked_migration_ports(); event_table_destroy(); destroy_put_mac_bindings(); destroy_put_vport_bindings(); @@ -7719,6 +7745,129 @@ pinctrl_handle_svc_check(struct rconn *swconn, const struct flow *ip_flow, } } +static struct ofpbuf * +encode_flow_mod(struct ofputil_flow_mod *fm) +{ + fm->buffer_id = UINT32_MAX; + fm->out_port = OFPP_ANY; + fm->out_group = OFPG_ANY; + return ofputil_encode_flow_mod(fm, OFPUTIL_P_OF15_OXM); +} + +struct port_pair { + uint32_t dp_key; + uint32_t port_key; + struct ovs_list list; +}; + +static struct ovs_list unblocked_migration_ports; + +static void +init_unblocked_migration_ports(void) +{ + ovs_list_init(&unblocked_migration_ports); +} + +static void +destroy_unblocked_migration_ports(void) +{ + struct port_pair *pp; + LIST_FOR_EACH_POP (pp, list, &unblocked_migration_ports) { + free(pp); + } +} + +static void +wait_unblocked_migration_ports(struct ovsdb_idl_txn *ovnsb_idl_txn) +{ + if (ovnsb_idl_txn && !ovs_list_is_empty(&unblocked_migration_ports)) { + poll_immediate_wake(); + } +} + +static void +run_unblocked_migration_ports( + struct ovsdb_idl_txn *ovnsb_idl_txn, + struct ovsdb_idl_index *sbrec_datapath_binding_by_key, + struct ovsdb_idl_index *sbrec_port_binding_by_key) + OVS_REQUIRES(pinctrl_mutex) +{ + if (!ovnsb_idl_txn) { + return; + } + + const struct port_pair *pp; + LIST_FOR_EACH (pp, list, 
&unblocked_migration_ports) { + const struct sbrec_port_binding *pb = lport_lookup_by_key( + sbrec_datapath_binding_by_key, sbrec_port_binding_by_key, + pp->dp_key, pp->port_key); + if (pb) { + sbrec_port_binding_update_options_setkey( + pb, "migration-unblocked", "true"); + } + } + destroy_unblocked_migration_ports(); +} + + +static void +pinctrl_unblock_migration(struct rconn *swconn, const struct match *md) + OVS_REQUIRES(pinctrl_mutex) +{ + struct match match; + struct minimatch mmatch; + + /* Delete inport controller flow (the one that got us here). */ + match_init_catchall(&match); + match_set_metadata(&match, md->flow.metadata); + match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0, + md->flow.regs[MFF_LOG_INPORT - MFF_REG0]); + match_set_dl_type(&match, htons(ETH_TYPE_RARP)); + minimatch_init(&mmatch, &match); + + /* Remove the flow that got us here. */ + struct ofputil_flow_mod fm = { + .match = mmatch, + .priority = 1010, + .table_id = OFTABLE_LOG_INGRESS_PIPELINE, + .command = OFPFC_DELETE_STRICT, + }; + queue_msg(swconn, encode_flow_mod(&fm)); + minimatch_destroy(&mmatch); + + /* Delete [in|e]gress drop-all flows to unblock the port. */ + match_init_catchall(&match); + match_set_metadata(&match, md->flow.metadata); + match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0, + md->flow.regs[MFF_LOG_INPORT - MFF_REG0]); + minimatch_init(&mmatch, &match); + + fm.match = mmatch; + fm.priority = 1000; + queue_msg(swconn, encode_flow_mod(&fm)); + minimatch_destroy(&mmatch); + + match_init_catchall(&match); + match_set_metadata(&match, md->flow.metadata); + match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, + md->flow.regs[MFF_LOG_INPORT - MFF_REG0]); + minimatch_init(&mmatch, &match); + + fm.match = mmatch; + fm.table_id = OFTABLE_LOG_EGRESS_PIPELINE; + queue_msg(swconn, encode_flow_mod(&fm)); + minimatch_destroy(&mmatch); + + /* Tag the port as migration-unblocked.
*/ + struct port_pair *pp = xmalloc(sizeof *pp); + pp->port_key = md->flow.regs[MFF_LOG_INPORT - MFF_REG0]; + pp->dp_key = ntohll(md->flow.metadata); + ovs_list_push_front(&unblocked_migration_ports, &pp->list); + + /* Notify the main thread about pending migration-unblocked updates. */ + notify_pinctrl_main(); +} + static struct hmap put_fdbs; /* MAC learning (fdb) related functions. Runs within the main diff --git a/controller/vif-plug.c b/controller/vif-plug.c index 62b75263c..37a0eca34 100644 --- a/controller/vif-plug.c +++ b/controller/vif-plug.c @@ -407,7 +407,8 @@ consider_plug_lport(const struct sbrec_port_binding *pb, { bool ret = true; if (lport_can_bind_on_this_chassis(vif_plug_ctx_in->chassis_rec, pb) - && pb->requested_chassis == vif_plug_ctx_in->chassis_rec) { + && (pb->requested_chassis == vif_plug_ctx_in->chassis_rec || + pb->migration_destination == vif_plug_ctx_in->chassis_rec)) { const char *vif_plug_type = smap_get(&pb->options, VIF_PLUG_OPTION_TYPE); if (!vif_plug_type) { @@ -560,6 +561,7 @@ vif_plug_run(struct vif_plug_ctx_in *vif_plug_ctx_in, !vif_plug_prime_idl_count); } + /* Handle requested-chassis. */ struct sbrec_port_binding *target = sbrec_port_binding_index_init_row( vif_plug_ctx_in->sbrec_port_binding_by_requested_chassis); @@ -577,6 +579,24 @@ vif_plug_run(struct vif_plug_ctx_in *vif_plug_ctx_in, } } sbrec_port_binding_index_destroy_row(target); + + /* Handle migration-destination. 
*/ + target = + sbrec_port_binding_index_init_row( + vif_plug_ctx_in->sbrec_port_binding_by_migration_destination); + sbrec_port_binding_index_set_migration_destination( + target, + vif_plug_ctx_in->chassis_rec); + SBREC_PORT_BINDING_FOR_EACH_EQUAL ( + pb, target, + vif_plug_ctx_in->sbrec_port_binding_by_migration_destination) { + enum en_lport_type lport_type = get_lport_type(pb); + if (lport_type == LP_VIF) { + vif_plug_handle_lport_vif(pb, vif_plug_ctx_in, vif_plug_ctx_out, + !vif_plug_prime_idl_count); + } + } + sbrec_port_binding_index_destroy_row(target); } static void diff --git a/controller/vif-plug.h b/controller/vif-plug.h index 76063591b..b957eab68 100644 --- a/controller/vif-plug.h +++ b/controller/vif-plug.h @@ -33,6 +33,7 @@ struct vif_plug_ctx_in { struct ovsdb_idl_txn *ovs_idl_txn; struct ovsdb_idl_index *sbrec_port_binding_by_name; struct ovsdb_idl_index *sbrec_port_binding_by_requested_chassis; + struct ovsdb_idl_index *sbrec_port_binding_by_migration_destination; struct ovsdb_idl_index *ovsrec_port_by_interfaces; const struct ovsrec_open_vswitch_table *ovs_table; const struct ovsrec_bridge *br_int; diff --git a/include/ovn/actions.h b/include/ovn/actions.h index cdef5fb03..45c367e81 100644 --- a/include/ovn/actions.h +++ b/include/ovn/actions.h @@ -113,6 +113,7 @@ struct ovn_extend_table; OVNACT(PUT_FDB, ovnact_put_fdb) \ OVNACT(GET_FDB, ovnact_get_fdb) \ OVNACT(LOOKUP_FDB, ovnact_lookup_fdb) \ + OVNACT(UNBLOCK_MIGRATION, ovnact_unblock_migration) \ /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */ enum OVS_PACKED_ENUM ovnact_type { @@ -411,6 +412,11 @@ struct ovnact_handle_svc_check { struct expr_field port; /* Logical port name. */ }; +/* OVNACT_UNBLOCK_MIGRATION. */ +struct ovnact_unblock_migration { + struct ovnact ovnact; +}; + /* OVNACT_FWD_GROUP. 
*/ struct ovnact_fwd_group { struct ovnact ovnact; @@ -635,6 +641,14 @@ enum action_opcode { * MFF_LOG_INPORT = port */ ACTION_OPCODE_HANDLE_SVC_CHECK, + + /* "unblock_migration()"." + * + * Remove flows that block ingress and egress for the port. + * Used in live migration scenarios. + */ + ACTION_OPCODE_UNBLOCK_MIGRATION, + /* handle_dhcpv6_reply { ...actions ...}." * * The actions, in OpenFlow 1.3 format, follow the action_header. diff --git a/lib/actions.c b/lib/actions.c index d5d8391bb..058cef00a 100644 --- a/lib/actions.c +++ b/lib/actions.c @@ -3565,6 +3565,40 @@ ovnact_handle_svc_check_free(struct ovnact_handle_svc_check *sc OVS_UNUSED) { } +static void +parse_unblock_migration(struct action_context *ctx OVS_UNUSED) +{ + if (!lexer_force_match(ctx->lexer, LEX_T_LPAREN)) { + return; + } + + ovnact_put_UNBLOCK_MIGRATION(ctx->ovnacts); + lexer_force_match(ctx->lexer, LEX_T_RPAREN); +} + +static void +format_UNBLOCK_MIGRATION( + const struct ovnact_unblock_migration *unblock_dm OVS_UNUSED, + struct ds *s) +{ + ds_put_cstr(s, "unblock_migration();"); +} + +static void +encode_UNBLOCK_MIGRATION( + const struct ovnact_unblock_migration *unblock_dm OVS_UNUSED, + const struct ovnact_encode_params *ep, + struct ofpbuf *ofpacts) +{ + encode_controller_op(ACTION_OPCODE_UNBLOCK_MIGRATION, + ep->ctrl_meter_id, ofpacts); +} + +static void +ovnact_unblock_migration_free(struct ovnact_unblock_migration *sc OVS_UNUSED) +{ +} + static void parse_fwd_group_action(struct action_context *ctx) { @@ -4113,6 +4147,8 @@ parse_action(struct action_context *ctx) parse_bind_vport(ctx); } else if (lexer_match_id(ctx->lexer, "handle_svc_check")) { parse_handle_svc_check(ctx); + } else if (lexer_match_id(ctx->lexer, "unblock_migration")) { + parse_unblock_migration(ctx); } else if (lexer_match_id(ctx->lexer, "fwd_group")) { parse_fwd_group_action(ctx); } else if (lexer_match_id(ctx->lexer, "handle_dhcpv6_reply")) { @@ -4356,6 +4392,7 @@ ovnact_op_to_string(uint32_t ovnact_opc) 
ACTION_OPCODE(BIND_VPORT) \ ACTION_OPCODE(DHCP6_SERVER) \ ACTION_OPCODE(HANDLE_SVC_CHECK) \ + ACTION_OPCODE(UNBLOCK_MIGRATION) \ ACTION_OPCODE(BFD_MSG) #define ACTION_OPCODE(ENUM) \ case ACTION_OPCODE_##ENUM: return xstrdup(#ENUM); diff --git a/northd/northd.c b/northd/northd.c index c0ecf2346..5753ff464 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -3278,6 +3278,45 @@ ovn_port_update_sbrec(struct northd_input *input_data, smap_add(&options, "vlan-passthru", "true"); } + const char *migration_destination; + bool reset_migration_destination = false; + migration_destination = smap_get(&op->nbsp->options, + "migration-destination"); + if (migration_destination) { + const struct sbrec_chassis *chassis; /* May be NULL. */ + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, + migration_destination); + chassis = chassis ? chassis : chassis_lookup_by_hostname( + sbrec_chassis_by_hostname, + migration_destination); + + if (chassis) { + sbrec_port_binding_set_migration_destination(op->sb, + chassis); + } else { + reset_migration_destination = true; + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT( + 1, 1); + VLOG_WARN_RL( + &rl, + "Unknown chassis '%s' set as " + "options:migration-destination on LSP '%s'.", + migration_destination, op->nbsp->name); + } + } else if (op->sb->migration_destination) { + reset_migration_destination = true; + } + if (reset_migration_destination) { + sbrec_port_binding_set_migration_destination(op->sb, NULL); + } + if (!reset_migration_destination) { + /* Retain migration-unblocked. 
*/ + if (smap_get_bool(&op->sb->options, + "migration-unblocked", false)) { + smap_add(&options, "migration-unblocked", "true"); + } + } + sbrec_port_binding_set_options(op->sb, &options); smap_destroy(&options); if (ovn_is_known_nb_lsp_type(op->nbsp->type)) { @@ -3339,6 +3378,7 @@ ovn_port_update_sbrec(struct northd_input *input_data, if (reset_requested_chassis) { sbrec_port_binding_set_requested_chassis(op->sb, NULL); } + } else { const char *chassis = NULL; if (op->peer && op->peer->od && op->peer->od->nbr) { diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 793135ede..3e1992142 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -100,7 +100,10 @@ static const char *rbac_fdb_update[] = static const char *rbac_port_binding_auth[] = {""}; static const char *rbac_port_binding_update[] = - {"chassis", "encap", "up", "virtual_parent"}; + {"chassis", "encap", "up", "virtual_parent", + /* NOTE: we only need to update the migration-unblocked key, + * but RBAC_Role doesn't support mutate operation. 
*/ + "options"}; static const char *rbac_mac_binding_auth[] = {""}; diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 2fe73959c..7dd298860 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -137,7 +137,8 @@ relation OutProxy_Port_Binding ( mac: Set<istring>, nat_addresses: Set<istring>, external_ids: Map<istring,istring>, - requested_chassis: Option<uuid> + requested_chassis: Option<uuid>, + migration_destination: Option<uuid> ) /* Case 1a: Create a Port_Binding per logical switch port that is not of type @@ -154,7 +155,8 @@ OutProxy_Port_Binding(._uuid = lsp._uuid, .mac = lsp.addresses, .nat_addresses = set_empty(), .external_ids = eids, - .requested_chassis = None) :- + .requested_chassis = None, + .migration_destination = None) :- sp in &SwitchPort(.lsp = lsp, .sw = sw), SwitchPortNewDynamicTag(lsp._uuid, opt_tag), var tag = match (opt_tag) { @@ -195,7 +197,8 @@ OutProxy_Port_Binding(._uuid = lsp._uuid, .mac = lsp.addresses, .nat_addresses = set_empty(), .external_ids = eids, - .requested_chassis = Some{requested_chassis}) :- + .requested_chassis = Some{requested_chassis}, + .migration_destination = None) :- sp in &SwitchPort(.lsp = lsp, .sw = sw), SwitchPortNewDynamicTag(lsp._uuid, opt_tag), var tag = match (opt_tag) { @@ -237,7 +240,8 @@ OutProxy_Port_Binding(._uuid = lsp._uuid, .mac = lsp.addresses, .nat_addresses = set_empty(), .external_ids = eids, - .requested_chassis = None) :- + .requested_chassis = None, + .migration_destination = None) :- sp in &SwitchPort(.lsp = lsp, .sw = sw), SwitchPortNewDynamicTag(lsp._uuid, opt_tag), var tag = match (opt_tag) { @@ -292,7 +296,8 @@ OutProxy_Port_Binding(._uuid = lsp._uuid, .mac = lsp.addresses, .nat_addresses = nat_addresses, .external_ids = eids, - .requested_chassis = None) :- + .requested_chassis = None, + .migration_destination = None) :- SwitchPortLBIPs(.port = &SwitchPort{.lsp = lsp, .sw = sw, .peer = peer}, .lbips = lbips), var eids = { @@ -387,7 +392,8 @@ 
OutProxy_Port_Binding(._uuid = lrp._uuid, .mac = set_singleton(i"${lrp.mac} ${lrp.networks.map(ival).to_vec().join(\" \")}"), .nat_addresses = set_empty(), .external_ids = lrp.external_ids, - .requested_chassis = None) :- + .requested_chassis = None, + .migration_destination = None) :- rp in &RouterPort(.lrp = lrp, .router = router, .peer = peer), RouterPortRAOptionsComplete(lrp._uuid, options0), (var __type, var options1) = match (router.options.get(i"chassis")) { @@ -583,7 +589,8 @@ OutProxy_Port_Binding(._uuid = cr_lrp_uuid, .mac = set_singleton(i"${lrp.mac} ${lrp.networks.map(ival).to_vec().join(\" \")}"), .nat_addresses = set_empty(), .external_ids = lrp.external_ids, - .requested_chassis = None) :- + .requested_chassis = None, + .migration_destination = None) :- DistributedGatewayPort(lrp, lr_uuid, cr_lrp_uuid), DistributedGatewayPortHAChassisGroup(lrp, hacg_uuid), var redirect_type = match (lrp.options.get(i"redirect-type")) { @@ -629,7 +636,8 @@ sb::Out_Port_Binding(._uuid = pbinding._uuid, .nat_addresses = pbinding.nat_addresses, .external_ids = pbinding.external_ids, .up = Some{up}, - .requested_chassis = pbinding.requested_chassis) :- + .requested_chassis = pbinding.requested_chassis, + .migration_destination = pbinding.migration_destination) :- pbinding in OutProxy_Port_Binding(), PortTunKeyAllocation(pbinding._uuid, tunkey), QueueIDAllocation(pbinding._uuid, qid), diff --git a/ovn-architecture.7.xml b/ovn-architecture.7.xml index ef8d669a2..f871f262e 100644 --- a/ovn-architecture.7.xml +++ b/ovn-architecture.7.xml @@ -1157,6 +1157,45 @@ </li> </ol> + <h2>Migration Life Cycle of a VIF</h2> + + <p> + This section describes how a port is migrated to a different chassis. + </p> + + <p> + Sometimes a user may want to precisely control port binding location. In + this case, a logical switch port <code>options:requested-chassis</code> + property may be used.
When set, the option specifies the name of the + chassis that should bind the port, and no other chassis will make any + attempts to bind it. When <code>options:requested-chassis</code> changes, + the old chassis will unbind the port and the new one will bind the port + instead. + + This process requires database object translation and flow setup, which + takes time and may result in perceived network downtime. To avoid it, + a user may use the <code>options:migration-destination</code> property on + a logical switch port. When set, the chassis that this option points to + will pre-configure the port binding, including all relevant flows, but + will keep it deactivated, which means all traffic incoming or outgoing + from the port will be blocked (corresponding <code>drop</code> flows + are installed in tables 8 and 40). When CMS is ready to pass ownership of + the binding to the new chassis, it should send a RARP (Reverse ARP) + packet from the port, in which case a special controller action handler + attached to the <code>rarp</code> matching flow will unblock incoming and + outgoing traffic for the port by removing previously installed + <code>drop</code> flows. It will also set + <code>options:migration-unblocked</code> to <code>true</code> for the + port binding. + + At this point the port binding is active on both chassis, and it's + assumed that the user made sure that the original chassis wouldn't + send any more packets using the original port. It's expected that in + due course the user will complete port migration by setting + <code>options:requested-chassis</code> to point to the new chassis + and removing <code>options:migration-destination</code>.
+ </p> + <h2>Architectural Physical Life Cycle of a Packet</h2> <p> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema index 122614dd5..520a4127c 100644 --- a/ovn-sb.ovsschema +++ b/ovn-sb.ovsschema @@ -1,7 +1,7 @@ { "name": "OVN_Southbound", "version": "20.21.0", - "cksum": "2362446865 26963", + "cksum": "3647452942 27260", "tables": { "SB_Global": { "columns": { @@ -236,6 +236,10 @@ "requested_chassis": {"type": {"key": {"type": "uuid", "refTable": "Chassis", "refType": "weak"}, + "min": 0, "max": 1}}, + "migration_destination": {"type": {"key": {"type": "uuid", + "refTable": "Chassis", + "refType": "weak"}, "min": 0, "max": 1}}}, "indexes": [["datapath", "tunnel_key"], ["logical_port"]], "isRoot": true}, diff --git a/ovn-sb.xml b/ovn-sb.xml index 9ddacdf09..3a7b01bbb 100644 --- a/ovn-sb.xml +++ b/ovn-sb.xml @@ -3048,6 +3048,30 @@ tcp.flags = RST; is defined and contains a string matching the name or hostname of an existing chassis. </column> + <column name="migration_destination"> + This column exists so that the ovn-controller can effectively monitor + all <ref table="Port_Binding"/> records destined for migration to it, + and is a supplement to the <ref + table="Port_Binding" + column="options" + key="migration-destination"/> option. The option is still required so + that the ovn-controller can check the CMS intent when the chassis + pointed to does not currently exist, which for example occurs when the + ovn-controller is stopped without passing the --restart argument. + + This option implies that <ref table="Port_Binding" column="options" + key="requested-chassis"/> is also set. + + This column must be a + <ref table="Chassis"/> record. This is populated by + <code>ovn-northd</code> when the <ref + table="Logical_Switch_Port" + column="options" + key="migration-destination" + db="OVN_Northbound"/> + is defined and contains a string matching the name or hostname of an + existing chassis. 
+      </column>
     </group>
 
     <group title="Patch Options">
diff --git a/tests/ovn.at b/tests/ovn.at
index 957eb7850..f6e07053c 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -13654,6 +13654,174 @@ OVN_CLEANUP([hv1],[hv2])
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([options:migration-destination for logical port])
+ovn_start
+
+net_add n1
+
+ovn-nbctl ls-add ls0 -- add Logical_Switch ls0 other_config vlan-passthru=true
+ovn-nbctl lsp-add ls0 lsp0
+ovn-nbctl lsp-set-addresses lsp0 "00:00:00:00:00:01 10.0.0.1"
+
+ovn-nbctl lsp-add ls0 lsp1
+ovn-nbctl lsp-set-addresses lsp1 "00:00:00:00:00:10 10.0.0.10"
+
+# create two hypervisors, each with one vif port for the same LSP
+sim_add hv1
+as hv1
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.11
+ovs-vsctl -- add-port br-int hv1-vif0 -- \
+set Interface hv1-vif0 ofport-request=1 \
+    external-ids:iface-id=lsp0 \
+    options:tx_pcap=hv1/vif0-tx.pcap \
+    options:rxq_pcap=hv1/vif0-rx.pcap
+
+sim_add hv2
+as hv2
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.12
+ovs-vsctl -- add-port br-int hv2-vif0 -- \
+set Interface hv2-vif0 ofport-request=1 \
+    external-ids:iface-id=lsp0 \
+    options:tx_pcap=hv2/vif0-tx.pcap \
+    options:rxq_pcap=hv2/vif0-rx.pcap
+
+# create another hypervisor to receive packets from the migrating LSP
+sim_add hv3
+as hv3
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.13
+ovs-vsctl -- add-port br-int vif1 -- \
+set Interface vif1 ofport-request=2
+ovs-vsctl set interface vif1 external-ids:iface-id=lsp1 \
+    options:tx_pcap=vif1-tx.pcap \
+    options:rxq_pcap=vif1-rx.pcap
+
+# Allow only chassis hv2 to bind logical port lsp0.
+ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+check ovn-nbctl --wait=hv sync
+
+# Check that migration destination is not set for port binding
+hv1_uuid=$(fetch_column Chassis _uuid name=hv1)
+hv2_uuid=$(fetch_column Chassis _uuid name=hv2)
+pb_uuid=$(fetch_column Port_Binding _uuid logical_port=lsp0)
+migration_destination=$(ovn-sbctl get port_binding $pb_uuid migration_destination)
+AT_CHECK([test x"${migration_destination}" = x"[[]]"], [0], [])
+
+# Migrate port hv2 -> hv1: both hypervisors are bound
+check ovn-nbctl --wait=hv lsp-set-options lsp0 requested-chassis=hv2 migration-destination=hv1
+check ovn-nbctl --wait=hv sync
+
+# Check that migration destination is set now
+migration_destination=$(ovn-sbctl get port_binding $pb_uuid migration_destination)
+AT_CHECK([test x"${migration_destination}" = x"${hv1_uuid}"], [0], [])
+
+# Check that both vifs got flows set
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [0], [ignore])
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [0], [ignore])
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [0], [ignore])
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [0], [ignore])
+
+# Check that hv1 has a flow to circumvent RARP
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=8 | grep priority=1010 | grep rarp | grep actions=controller], [0], [ignore])
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=8 | grep priority=1000 | grep actions=drop], [0], [ignore])
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=40 | grep priority=1000 | grep actions=drop], [0], [ignore])
+
+# Check that hv2 doesn't have these flows
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=8 | grep priority=1010 | grep rarp | grep actions=controller], [1], [ignore])
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=8 | grep priority=1000 | grep actions=drop], [1], [ignore])
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=40 | grep priority=1000 | grep actions=drop], [1], [ignore])
+
+OVN_POPULATE_ARP
+
+: > expected
+
+send_garp() {
+    local hv=$1 inport=$2 eth_src=$3 eth_dst=$4 spa=$5 tpa=$6 succ=$7
+    local request=${eth_dst}${eth_src}08060001080006040001${eth_src}${spa}${eth_dst}${tpa}
+    as ${hv} ovs-appctl netdev-dummy/receive $inport $request
+    if [[ x${succ} = x1 ]]; then
+        echo ${request} >> expected
+    fi
+}
+
+send_rarp() {
+    local hv=$1 inport=$2 eth_src=$3 eth_dst=$4 spa=$5 tpa=$6
+    local request=${eth_dst}${eth_src}80350001080006040001${eth_src}${spa}${eth_dst}${tpa}
+    as ${hv} ovs-appctl netdev-dummy/receive $inport $request
+    echo ${request} >> expected
+}
+
+reset_pcap_file() {
+    local iface=$1
+    local pcap_file=$2
+    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
+options:rxq_pcap=dummy-rx.pcap
+    rm -f ${pcap_file}*.pcap
+    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
+options:rxq_pcap=${pcap_file}-rx.pcap
+}
+
+# Send three packets from each port binding, only one will allow them
+spa=$(ip_to_hex 10 0 0 1)
+tpa=$(ip_to_hex 10 0 0 10)
+for i in 1 2 3; do
+    send_garp hv1 hv1-vif0 000000000001 ffffffffffff $spa $tpa 0
+    send_garp hv2 hv2-vif0 000000000001 ffffffffffff $spa $tpa 1
+done
+
+# Check that migrating destination didn't observe RARP activation yet
+migration_unblocked=$(ovn-sbctl get port_binding $pb_uuid options:migration-unblocked | tr -d '""')
+AT_CHECK([test x"${migration_unblocked}" = x""], [0], [])
+
+OVN_CHECK_PACKETS([vif1-tx.pcap], [expected])
+as hv3 reset_pcap_file vif1 vif1
+
+# Now "activate" hv1 binding with a RARP sent by migration-destination vif
+send_rarp hv1 hv1-vif0 000000000001 ffffffffffff $spa $tpa
+
+# Check that the binding is now tagged as observed, meaning traffic is unblocked
+migration_unblocked=$(ovn-sbctl get port_binding $pb_uuid options:migration-unblocked | tr -d '""')
+AT_CHECK([test x"${migration_unblocked}" = x"true"], [0], [])
+
+# Check that flows that blocked traffic for the migration destination port are now gone
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=8 | grep priority=1010 | grep rarp | grep actions=controller], [1], [ignore])
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=8 | grep priority=1000 | grep actions=drop], [1], [ignore])
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=40 | grep priority=1000 | grep actions=drop], [1], [ignore])
+
+: > expected
+
+# Send three packets from each port binding, now expect both allowed
+for i in 1 2 3; do
+    send_garp hv1 hv1-vif0 000000000001 ffffffffffff $spa $tpa 1
+    send_garp hv2 hv2-vif0 000000000001 ffffffffffff $spa $tpa 1
+done
+
+OVN_CHECK_PACKETS([vif1-tx.pcap], [expected])
+
+# Complete migration: destination is bound
+check ovn-nbctl --wait=hv lsp-set-options lsp0 requested-chassis=hv1
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [1], [])
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [1], [])
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [0], [ignore])
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [0], [ignore])
+
+# Check that migration-destination and migration-unblocked are reset
+migration_unblocked=$(ovn-sbctl get port_binding $pb_uuid options:migration-unblocked | tr -d '""')
+AT_CHECK([test x"${migration_unblocked}" = x""], [0], [])
+
+migration_destination=$(ovn-sbctl get port_binding $pb_uuid migration_destination)
+AT_CHECK([test x"${migration_destination}" = x'[[]]'], [0], [])
+
+OVN_CLEANUP([hv1],[hv2])
+
+AT_CLEANUP
+])
+
 OVN_FOR_EACH_NORTHD([
 AT_SETUP([options:requested-chassis with hostname])
 
diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c
index 0795913d3..1c5cb3132 100644
--- a/utilities/ovn-trace.c
+++ b/utilities/ovn-trace.c
@@ -2799,6 +2799,8 @@ trace_actions(const struct ovnact *ovnacts, size_t ovnacts_len,
 
         case OVNACT_HANDLE_SVC_CHECK:
             break;
+        case OVNACT_UNBLOCK_MIGRATION:
+            break;
 
         case OVNACT_FWD_GROUP:
             break;
-- 
2.31.1
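
For readers who want to see what the activation packet looks like outside the testsuite, here is a minimal standalone sketch of the frame that send_rarp assembles in the test above. The function name build_rarp_frame and the rarp_* variables are hypothetical, and ip_to_hex is re-declared locally on the assumption that it simply prints four octets as hex. The header bytes are copied as-is from the test; this is an illustration, not part of the patch.

```shell
#!/bin/sh
# Assumed stand-in for the testsuite helper: four decimal octets -> 8 hex digits.
ip_to_hex() {
    printf "%02x%02x%02x%02x" "$@"
}

# Hypothetical helper mirroring the frame layout used by send_rarp in the test.
# Arguments: source MAC, destination MAC, sender IP, target IP (all as hex).
build_rarp_frame() {
    rarp_eth_src=$1
    rarp_eth_dst=$2
    rarp_spa=$3
    rarp_tpa=$4
    # Ethernet header (dst, src, ethertype 0x8035 = RARP), followed by the
    # fixed fields the test uses (0001 0800 06 04 0001) and the four addresses.
    printf '%s\n' "${rarp_eth_dst}${rarp_eth_src}80350001080006040001${rarp_eth_src}${rarp_spa}${rarp_eth_dst}${rarp_tpa}"
}

# Same addresses as in the test: lsp0 (10.0.0.1) broadcasting towards 10.0.0.10.
frame=$(build_rarp_frame 000000000001 ffffffffffff \
    "$(ip_to_hex 10 0 0 1)" "$(ip_to_hex 10 0 0 10)")
echo "$frame"
```

In the sandbox, the test injects such a frame on the destination chassis with "as hv1 ovs-appctl netdev-dummy/receive hv1-vif0 $request", after which the drop flows in tables 8 and 40 are expected to disappear.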
