On Thu, Jul 18, 2019 at 2:47 AM Guru Shetty <[email protected]> wrote:
>
>
> On Tue, 16 Jul 2019 at 17:48, <[email protected]> wrote:
>
>> From: Numan Siddique <[email protected]>
>>
>> This new type is added for the following reasons:
>>
>> - When a load balancer is created in an OpenStack deployment with Octavia
>> service, it creates a logical port 'VIP' for the virtual ip.
>>
>> - This logical port is not bound to any VIF.
>>
>> - Octavia service creates a service VM (with another logical port 'P' which
>> belongs to the same logical switch)
>>
>> - The virtual ip 'VIP' is configured on this service VM.
>>
>
> Does this mean that the VIP is the IP address of this service VM?
> Or does the service VM have a different IP but also responds to VIP?
>
The latter is correct. The service VM will have its own IP address but
will also be configured with the VIP.
>
>
>>
>> - This service VM provides the load balancing for the VIP with the configured
>> backend IPs.
>>
>> - Octavia service can be configured to create few service VMs with
>> active-standby mode with the active VM configured with the VIP. The VIP
>> can move between these service nodes.
>>
>> Presently there are few problems:
>>
>> - When a floating ip (externally reachable IP) is associated to the VIP and if
>> the compute nodes have external connectivity then the external traffic cannot
>> reach the VIP using the floating ip as the VIP logical port would be down.
>> dnat_and_snat entry in NAT table for this vip will have 'external_mac' and
>> 'logical_port' configured.
>>
>
> So floating ip is the DNAT ip in OVN and VIP is the logical port IP?
>
That's right.
>
> It would have worked if NAT table had logical_port column set as service
> VM? But you don't do it because you don't know which service VM is active?
>
That is right.
>
>
>
>>
>> - The only way to make it work is to clear the 'external_mac' entry so that
>> the gateway chassis does the DNAT for the VIP.
>>
> In this case, floating IP would convert to VIP and OVN will just send it
> to whichever logical port will respond to arp of that VIP? And in this case
> it is the "active" service VM?
>
That is right.
>
>
>
>>
>> To solve these problems, this patch proposes a new logical port type - virtual.
>> CMS when creating the logical port for the VIP, should
>>
>
> So your goal of this patch is to not do centralized routing? And you want
> to do distributed routing and hence jumping through hoops?
>
Correct.
>
>
>
>>
>> - set the type as 'virtual'
>>
>> - configure the VIP in the options - Logical_Switch_Port.options:virtual-ip
>>
>> - And set the virtual parents in the options
>> Logical_Switch_Port.options:virtual-parents.
>> These virtual parents are the one which can be configured with the VIP.
>>
>> If suppose the virtual_ip is configured to 10.0.0.10 on a virtual logical
>> port 'sw0-vip' and the virtual_parents are set to - [sw0-p1, sw0-p2] then
>> below logical flows are added in the lsp_in_arp_rsp logical switch pipeline
>>
>
>> - table=11(ls_in_arp_rsp), priority=100,
>> match=(inport == "sw0-p1" && !is_chassis_resident("sw0-vip") &&
>> ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) ||
>> (arp.op == 2 && arp.spa == 10.0.0.10))),
>> action=(bind_vport("sw0-vip", inport); next;)
>>
>
> 1. I haven't looked at ovn-northd code for a while, so this is probably
> stupid question. "sw0-p1" which has a IP of 10.0.0.10 is sending a ARP
> request with spa and tpa of 10.0.0.10? I don't understand what is happening
> here. Can you explain?
>
Sure.
Take the example logical port "sw0-vip" above. The virtual_ip configured on
this logical port is 10.0.0.10.
Let's say sw0-p1 and sw0-p2 are its virtual parents, with sw0-p1 configured
with IP 10.0.0.3 and sw0-p2 configured with IP 10.0.0.4.
To make it clear, below is the output of ovn-nbctl show:
*************
$ ovn-nbctl show
switch 1c45db56-d128-4fbe-9402-ac6fd8e26b80 (sw0)
    port sw0-vip
        type: virtual
        addresses: ["50:54:00:00:00:10 10.0.0.10"]
    port sw0-p1
        addresses: ["50:54:00:00:00:03 10.0.0.3"]
    port sw0-p2
        addresses: ["50:54:00:00:00:04 10.0.0.4"]
    port sw0-p3
        addresses: ["50:54:00:00:00:05 10.0.0.5"]
*************
Let's say the IP address 10.0.0.10 is currently configured only on sw0-p1.
sw0-p1 resides on chassis ch1 and sw0-p2 resides on chassis ch2.
The above logical flow handles two scenarios.
Scenario 1: Let's say sw0-p3 wants to ping 10.0.0.10. When it sends an ARP
request for 10.0.0.10, sw0-p1 will respond with an ARP reply. That reply
hits the above logical flow through the match
(arp.op == 2 && arp.spa == 10.0.0.10). The expression
!is_chassis_resident("sw0-vip") will be true because chassis ch1 has not
bound the logical port "sw0-vip" yet.
The bind_vport action causes the ARP reply packet from sw0-p1 to be sent to
the ovn-controller on chassis ch1, which then binds the logical port
sw0-vip.
Subsequent ARP replies with (arp.op == 2 && arp.spa == 10.0.0.10) will not
hit this logical flow on chassis ch1, because "sw0-vip" is bound on this
chassis and the expression !is_chassis_resident("sw0-vip") will be false.
Let's say sw0-p1 goes down for some reason, and the VIP 10.0.0.10 is now
configured on the sw0-p2 logical port. When someone sends an ARP request for
10.0.0.10, sw0-p2 will send the ARP reply and the logical port sw0-vip will
now be bound to chassis ch2.
Scenario 2: When keepalived is used to manage the VIPs, it will be running
on the sw0-p1 and sw0-p2 VMs, and after the leader election the leader will
configure the VIP 10.0.0.10. If sw0-p1 wins the election, keepalived
configures the VIP 10.0.0.10 on it and sends a GARP for 10.0.0.10. The
logical flow with the match
(arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) will be hit
and sw0-vip will be bound by chassis ch1.
If sw0-p1 goes down for some reason, keepalived on sw0-p2 detects it,
becomes the leader and configures 10.0.0.10 on sw0-p2. It will then send a
GARP for 10.0.0.10 and sw0-vip will be bound on chassis ch2.
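
To make the two scenarios concrete, here is a rough standalone C sketch of
the condition that the logical flow's match expresses. It is only an
illustration and not code from the patch: struct arp_pkt, the IP4() macro
and should_bind_vport() are hypothetical names, and the vport_resident flag
simply stands in for is_chassis_resident("sw0-vip").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of the fields the logical flow matches on. */
struct arp_pkt {
    const char *inport;   /* Logical port the packet was received on. */
    uint16_t op;          /* arp.op: 1 = request, 2 = reply. */
    uint32_t spa;         /* arp.spa, host byte order. */
    uint32_t tpa;         /* arp.tpa, host byte order. */
};

/* Builds an IPv4 address in host byte order (illustration only). */
#define IP4(a, b, c, d) \
    (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | \
     ((uint32_t) (c) << 8) | (uint32_t) (d))

/* Returns true if the flow generated for virtual parent 'parent' would
 * match 'pkt' and therefore run bind_vport().  'vport_resident' models
 * is_chassis_resident() for the virtual port on the local chassis. */
static bool
should_bind_vport(const struct arp_pkt *pkt, const char *parent,
                  uint32_t vip, bool vport_resident)
{
    assert(pkt && parent);

    /* inport == parent && !is_chassis_resident(vport) */
    if (strcmp(pkt->inport, parent) || vport_resident) {
        return false;
    }

    /* Scenario 2: GARP announcing the VIP (request with spa == tpa). */
    bool garp = pkt->op == 1 && pkt->spa == vip && pkt->tpa == vip;
    /* Scenario 1: ARP reply sent from the VIP. */
    bool reply = pkt->op == 2 && pkt->spa == vip;

    return garp || reply;
}
```

With the addresses from the example, an ARP reply from sw0-p1 with
spa 10.0.0.10 satisfies the check only while sw0-vip is not yet resident on
the chassis, and an ordinary ARP request from sw0-p3 never does.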
> 2. After bind_vport(sw0-vip) is executed , does
> is_chassis_resident(sw0-vip) return true?
>
Correct.
>
> Also, a unit test "ovn -- 1 LR with distributed router gateway port" fails
> consistently for me.
>
I ran the tests multiple times and they pass. I think the failure could be
caused by timing issues.
Can you please rebase to the latest master and try again? A bug in
ovn-northd that caused some timeouts in the tests was fixed recently.
If it still fails, can you please share the output of testsuite.log?
Please let me know if you have any further questions.
I will submit v8 with the suggestions from Ben.
Thanks
Numan
>
>> - table=11(ls_in_arp_rsp), priority=100,
>> match=(inport == "sw0-p2" && !is_chassis_resident("sw0-vip") &&
>> ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) ||
>> (arp.op == 2 && arp.spa == 10.0.0.10))),
>> action=(bind_vport("sw0-vip", inport); next;)
>>
>> The action bind_vport will claim the logical port - sw0-vip on the chassis
>> where this action is executed. Since the port - sw0-vip is claimed by a
>> chassis, the dnat_and_snat rule for the VIP will be handled by the compute
>> node.
>>
>> Signed-off-by: Numan Siddique <[email protected]>
>> ---
>>
>> v6 -> v7
>> ========
>> * Resolved merge conflicts.
>>
>> v5 -> v6
>> ========
>> * Resolved conflicts after rebasing to latest master in tests/ovn.at
>>
>> v4 -> v5
>> =======
>> * Rebased to master to resolve merge conflicts.
>>
>> v3 -> v4
>> =======
>> * Addressed the review comment and removed the code in northd which
>> referenced the Southbound db state while adding the logical flows.
>> Instead
>> using the ovn match - is_chassis_resident() - which I should have used
>> it in the first place.
>>
>> v2 -> v3
>> =======
>> * Addressed the review comments from Ben - deleted the new columns -
>> virtual_ip and virtual_parents from Logical_Switch_Port and instead
>> is making use of options column for this purpose.
>>
>> v1 -> v2
>> ========
>> * In v1, was not updating the 'put_vport_binding' struct if it already
>> exists in the put_vport_bindings hmap in the function -
>> pinctrl_handle_bind_vport().
>> In v2 handled it.
>> * Improved the if else check in binding.c when releasing the lports.
>>
>> include/ovn/actions.h | 18 ++-
>> ovn/controller/binding.c | 30 +++-
>> ovn/controller/pinctrl.c | 174 ++++++++++++++++++++
>> ovn/lib/actions.c | 60 +++++++
>> ovn/lib/ovn-util.c | 1 +
>> ovn/northd/ovn-northd.8.xml | 61 ++++++-
>> ovn/northd/ovn-northd.c | 306 +++++++++++++++++++++++++++---------
>> ovn/ovn-nb.xml | 45 ++++++
>> ovn/ovn-sb.ovsschema | 6 +-
>> ovn/ovn-sb.xml | 46 ++++++
>> ovn/utilities/ovn-trace.c | 3 +
>> tests/ovn.at | 281 +++++++++++++++++++++++++++++++++
>> tests/test-ovn.c | 1 +
>> 13 files changed, 945 insertions(+), 87 deletions(-)
>>
>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>> index 63d3907d8..0ca06537c 100644
>> --- a/include/ovn/actions.h
>> +++ b/include/ovn/actions.h
>> @@ -85,7 +85,8 @@ struct ovn_extend_table;
>> OVNACT(SET_METER, ovnact_set_meter) \
>> OVNACT(OVNFIELD_LOAD, ovnact_load) \
>> OVNACT(CHECK_PKT_LARGER, ovnact_check_pkt_larger) \
>> - OVNACT(TRIGGER_EVENT, ovnact_controller_event)
>> + OVNACT(TRIGGER_EVENT, ovnact_controller_event) \
>> + OVNACT(BIND_VPORT, ovnact_bind_vport)
>>
>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>> enum OVS_PACKED_ENUM ovnact_type {
>> @@ -328,6 +329,13 @@ struct ovnact_controller_event {
>> size_t n_options;
>> };
>>
>> +/* OVNACT_BIND_VPORT. */
>> +struct ovnact_bind_vport {
>> + struct ovnact ovnact;
>> + char *vport;
>> + struct expr_field vport_parent; /* Logical virtual port's port name. */
>> +};
>> +
>> /* Internal use by the helpers below. */
>> void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
>> void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
>> @@ -505,6 +513,14 @@ enum action_opcode {
>> * Snoop IGMP, learn the multicast participants
>> */
>> ACTION_OPCODE_IGMP,
>> +
>> + /* "bind_vport(vport, vport_parent)".
>> + *
>> + * 'vport' follows the action_header, in the format - 32-bit field.
>> + * 'vport_parent' is passed through the packet metadata as
>> + * MFF_LOG_INPORT.
>> + */
>> + ACTION_OPCODE_BIND_VPORT,
>> };
>>
>> /* Header. */
>> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
>> index ace0f811b..dfe002b60 100644
>> --- a/ovn/controller/binding.c
>> +++ b/ovn/controller/binding.c
>> @@ -571,11 +571,31 @@ consider_local_datapath(struct ovsdb_idl_txn *ovnsb_idl_txn,
>> sbrec_port_binding_set_encap(binding_rec, encap_rec);
>> }
>> } else if (binding_rec->chassis == chassis_rec) {
>> - VLOG_INFO("Releasing lport %s from this chassis.",
>> - binding_rec->logical_port);
>> - if (binding_rec->encap)
>> - sbrec_port_binding_set_encap(binding_rec, NULL);
>> - sbrec_port_binding_set_chassis(binding_rec, NULL);
>> + if (!strcmp(binding_rec->type, "virtual")) {
>> + /* pinctrl module takes care of binding the ports
>> + * of type 'virtual'.
>> + * Release such ports if their virtual parents are no
>> + * longer claimed by this chassis. */
>> + const struct sbrec_port_binding *parent
>> + = lport_lookup_by_name(sbrec_port_binding_by_name,
>> + binding_rec->virtual_parent);
>> + if (!parent || parent->chassis != chassis_rec) {
>> + VLOG_INFO("Releasing lport %s from this chassis.",
>> + binding_rec->logical_port);
>> + if (binding_rec->encap) {
>> + sbrec_port_binding_set_encap(binding_rec, NULL);
>> + }
>> + sbrec_port_binding_set_chassis(binding_rec, NULL);
>> + sbrec_port_binding_set_virtual_parent(binding_rec, NULL);
>> + }
>> + } else {
>> + VLOG_INFO("Releasing lport %s from this chassis.",
>> + binding_rec->logical_port);
>> + if (binding_rec->encap) {
>> + sbrec_port_binding_set_encap(binding_rec, NULL);
>> + }
>> + sbrec_port_binding_set_chassis(binding_rec, NULL);
>> + }
>> } else if (our_chassis) {
>> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
>> VLOG_INFO_RL(&rl,
>> diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
>> index d857067a5..357050eb5 100644
>> --- a/ovn/controller/pinctrl.c
>> +++ b/ovn/controller/pinctrl.c
>> @@ -273,9 +273,22 @@ static void pinctrl_ip_mcast_handle_igmp(
>>
>> static bool may_inject_pkts(void);
>>
>> +static void init_put_vport_bindings(void);
>> +static void destroy_put_vport_bindings(void);
>> +static void run_put_vport_bindings(
>> + struct ovsdb_idl_txn *ovnsb_idl_txn,
>> + struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
>> + struct ovsdb_idl_index *sbrec_port_binding_by_key,
>> + const struct sbrec_chassis *chassis)
>> + OVS_REQUIRES(pinctrl_mutex);
>> +static void wait_put_vport_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn);
>> +static void pinctrl_handle_bind_vport(const struct flow *md,
>> + struct ofpbuf *userdata);
>> +
>> COVERAGE_DEFINE(pinctrl_drop_put_mac_binding);
>> COVERAGE_DEFINE(pinctrl_drop_buffered_packets_map);
>> COVERAGE_DEFINE(pinctrl_drop_controller_event);
>> +COVERAGE_DEFINE(pinctrl_drop_put_vport_binding);
>>
>> struct empty_lb_backends_event {
>> struct hmap_node hmap_node;
>> @@ -432,6 +445,7 @@ pinctrl_init(void)
>> init_buffered_packets_map();
>> init_event_table();
>> ip_mcast_snoop_init();
>> + init_put_vport_bindings();
>> pinctrl.br_int_name = NULL;
>> pinctrl_handler_seq = seq_create();
>> pinctrl_main_seq = seq_create();
>> @@ -1957,6 +1971,12 @@ process_packet_in(struct rconn *swconn, const struct ofp_header *msg)
>> ovs_mutex_unlock(&pinctrl_mutex);
>> break;
>>
>> + case ACTION_OPCODE_BIND_VPORT:
>> + ovs_mutex_lock(&pinctrl_mutex);
>> + pinctrl_handle_bind_vport(&pin.flow_metadata.flow, &userdata);
>> + ovs_mutex_unlock(&pinctrl_mutex);
>> + break;
>> +
>> default:
>> VLOG_WARN_RL(&rl, "unrecognized packet-in opcode %"PRIu32,
>> ntohl(ah->opcode));
>> @@ -2135,6 +2155,8 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>> run_put_mac_bindings(ovnsb_idl_txn, sbrec_datapath_binding_by_key,
>> sbrec_port_binding_by_key,
>> sbrec_mac_binding_by_lport_ip);
>> + run_put_vport_bindings(ovnsb_idl_txn, sbrec_datapath_binding_by_key,
>> + sbrec_port_binding_by_key, chassis);
>> send_garp_prepare(sbrec_port_binding_by_datapath,
>> sbrec_port_binding_by_name, br_int, chassis,
>> local_datapaths, active_tunnels);
>> @@ -2481,6 +2503,7 @@ pinctrl_wait(struct ovsdb_idl_txn *ovnsb_idl_txn)
>> {
>> wait_put_mac_bindings(ovnsb_idl_txn);
>> wait_controller_event(ovnsb_idl_txn);
>> + wait_put_vport_bindings(ovnsb_idl_txn);
>> int64_t new_seq = seq_read(pinctrl_main_seq);
>> seq_wait(pinctrl_main_seq, new_seq);
>> }
>> @@ -2498,6 +2521,7 @@ pinctrl_destroy(void)
>> destroy_buffered_packets_map();
>> event_table_destroy();
>> destroy_put_mac_bindings();
>> + destroy_put_vport_bindings();
>> destroy_dns_cache();
>> ip_mcast_snoop_destroy();
>> seq_destroy(pinctrl_main_seq);
>> @@ -4341,3 +4365,153 @@ pinctrl_handle_event(struct ofpbuf *userdata)
>> return;
>> }
>> }
>> +
>> +struct put_vport_binding {
>> + struct hmap_node hmap_node;
>> +
>> + /* Key and value. */
>> + uint32_t dp_key;
>> + uint32_t vport_key;
>> +
>> + uint32_t vport_parent_key;
>> +};
>> +
>> +/* Contains "struct put_vport_binding"s. */
>> +static struct hmap put_vport_bindings;
>> +
>> +static void
>> +init_put_vport_bindings(void)
>> +{
>> + hmap_init(&put_vport_bindings);
>> +}
>> +
>> +static void
>> +flush_put_vport_bindings(void)
>> +{
>> + struct put_vport_binding *vport_b;
>> + HMAP_FOR_EACH_POP (vport_b, hmap_node, &put_vport_bindings) {
>> + free(vport_b);
>> + }
>> +}
>> +
>> +static void
>> +destroy_put_vport_bindings(void)
>> +{
>> + flush_put_vport_bindings();
>> + hmap_destroy(&put_vport_bindings);
>> +}
>> +
>> +static void
>> +wait_put_vport_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn)
>> +{
>> + if (ovnsb_idl_txn && !hmap_is_empty(&put_vport_bindings)) {
>> + poll_immediate_wake();
>> + }
>> +}
>> +
>> +static struct put_vport_binding *
>> +pinctrl_find_put_vport_binding(uint32_t dp_key, uint32_t vport_key,
>> + uint32_t hash)
>> +{
>> + struct put_vport_binding *vpb;
>> + HMAP_FOR_EACH_WITH_HASH (vpb, hmap_node, hash, &put_vport_bindings) {
>> + if (vpb->dp_key == dp_key && vpb->vport_key == vport_key) {
>> + return vpb;
>> + }
>> + }
>> + return NULL;
>> +}
>> +
>> +static void
>> +run_put_vport_binding(struct ovsdb_idl_txn *ovnsb_idl_txn OVS_UNUSED,
>> + struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
>> + struct ovsdb_idl_index *sbrec_port_binding_by_key,
>> + const struct sbrec_chassis *chassis,
>> + const struct put_vport_binding *vpb)
>> +{
>> + /* Convert logical datapath and logical port key into lport. */
>> + const struct sbrec_port_binding *pb = lport_lookup_by_key(
>> + sbrec_datapath_binding_by_key, sbrec_port_binding_by_key,
>> + vpb->dp_key, vpb->vport_key);
>> + if (!pb) {
>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>> +
>> + VLOG_WARN_RL(&rl, "unknown logical port with datapath %"PRIu32" "
>> + "and port %"PRIu32, vpb->dp_key, vpb->vport_key);
>> + return;
>> + }
>> +
>> + /* pinctrl module updates the port binding only for type 'virtual'. */
>> + if (!strcmp(pb->type, "virtual")) {
>> + const struct sbrec_port_binding *parent = lport_lookup_by_key(
>> + sbrec_datapath_binding_by_key, sbrec_port_binding_by_key,
>> + vpb->dp_key, vpb->vport_parent_key);
>> + if (parent) {
>> + VLOG_INFO("Claiming virtual lport %s for this chassis "
>> + "with the virtual parent %s",
>> + pb->logical_port, parent->logical_port);
>> + sbrec_port_binding_set_chassis(pb, chassis);
>> + sbrec_port_binding_set_virtual_parent(pb, parent->logical_port);
>> + }
>> + }
>> +}
>> +
>> +/* Called by pinctrl_run(). Runs with in the main ovn-controller
>> + * thread context. */
>> +static void
>> +run_put_vport_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn,
>> + struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
>> + struct ovsdb_idl_index *sbrec_port_binding_by_key,
>> + const struct sbrec_chassis *chassis)
>> + OVS_REQUIRES(pinctrl_mutex)
>> +{
>> + if (!ovnsb_idl_txn) {
>> + return;
>> + }
>> +
>> + const struct put_vport_binding *vpb;
>> + HMAP_FOR_EACH (vpb, hmap_node, &put_vport_bindings) {
>> + run_put_vport_binding(ovnsb_idl_txn, sbrec_datapath_binding_by_key,
>> + sbrec_port_binding_by_key, chassis, vpb);
>> + }
>> +
>> + flush_put_vport_bindings();
>> +}
>> +
>> +/* Called with in the pinctrl_handler thread context. */
>> +static void
>> +pinctrl_handle_bind_vport(
>> + const struct flow *md, struct ofpbuf *userdata)
>> + OVS_REQUIRES(pinctrl_mutex)
>> +{
>> + /* Get the datapath key from the packet metadata. */
>> + uint32_t dp_key = ntohll(md->metadata);
>> + uint32_t vport_parent_key = md->regs[MFF_LOG_INPORT - MFF_REG0];
>> +
>> + /* Get the virtual port key from the userdata buffer. */
>> + uint32_t *vport_key = ofpbuf_try_pull(userdata, sizeof *vport_key);
>> +
>> + if (!vport_key) {
>> + return;
>> + }
>> +
>> + uint32_t hash = hash_2words(dp_key, *vport_key);
>> +
>> + struct put_vport_binding *vpb
>> + = pinctrl_find_put_vport_binding(dp_key, *vport_key, hash);
>> + if (!vpb) {
>> + if (hmap_count(&put_vport_bindings) >= 1000) {
>> + COVERAGE_INC(pinctrl_drop_put_vport_binding);
>> + return;
>> + }
>> +
>> + vpb = xmalloc(sizeof *vpb);
>> + hmap_insert(&put_vport_bindings, &vpb->hmap_node, hash);
>> + }
>> +
>> + vpb->dp_key = dp_key;
>> + vpb->vport_key = *vport_key;
>> + vpb->vport_parent_key = vport_parent_key;
>> +
>> + notify_pinctrl_main();
>> +}
>> diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
>> index 4eacc44ed..318d80d5d 100644
>> --- a/ovn/lib/actions.c
>> +++ b/ovn/lib/actions.c
>> @@ -2599,6 +2599,64 @@ ovnact_check_pkt_larger_free(struct ovnact_check_pkt_larger *cipl OVS_UNUSED)
>> {
>> }
>>
>> +static void
>> +parse_bind_vport(struct action_context *ctx)
>> +{
>> + if (!lexer_force_match(ctx->lexer, LEX_T_LPAREN)) {
>> + return;
>> + }
>> +
>> + if (ctx->lexer->token.type != LEX_T_STRING) {
>> + lexer_error(ctx->lexer,
>> + "bind_vport requires port name to be specified.");
>> + return;
>> + }
>> +
>> + struct ovnact_bind_vport *bind_vp = ovnact_put_BIND_VPORT(ctx->ovnacts);
>> + bind_vp->vport = xstrdup(ctx->lexer->token.s);
>> + lexer_get(ctx->lexer);
>> + lexer_force_match(ctx->lexer, LEX_T_COMMA);
>> + action_parse_field(ctx, 0, false, &bind_vp->vport_parent);
>> + lexer_force_match(ctx->lexer, LEX_T_RPAREN);
>> +}
>> +
>> +static void
>> +format_BIND_VPORT(const struct ovnact_bind_vport *bind_vp,
>> + struct ds *s )
>> +{
>> + ds_put_format(s, "bind_vport(\"%s\", ", bind_vp->vport);
>> + expr_field_format(&bind_vp->vport_parent, s);
>> + ds_put_cstr(s, ");");
>> +}
>> +
>> +static void
>> +encode_BIND_VPORT(const struct ovnact_bind_vport *vp,
>> + const struct ovnact_encode_params *ep,
>> + struct ofpbuf *ofpacts)
>> +{
>> + uint32_t vport_key;
>> + if (!ep->lookup_port(ep->aux, vp->vport, &vport_key)) {
>> + return;
>> + }
>> +
>> + const struct arg args[] = {
>> + { expr_resolve_field(&vp->vport_parent), MFF_LOG_INPORT },
>> + };
>> + encode_setup_args(args, ARRAY_SIZE(args), ofpacts);
>> + size_t oc_offset = encode_start_controller_op(ACTION_OPCODE_BIND_VPORT,
>> + false, NX_CTLR_NO_METER,
>> + ofpacts);
>> + ofpbuf_put(ofpacts, &vport_key, sizeof(uint32_t));
>> + encode_finish_controller_op(oc_offset, ofpacts);
>> + encode_restore_args(args, ARRAY_SIZE(args), ofpacts);
>> +}
>> +
>> +static void
>> +ovnact_bind_vport_free(struct ovnact_bind_vport *bp)
>> +{
>> + free(bp->vport);
>> +}
>> +
>> /* Parses an assignment or exchange or put_dhcp_opts action. */
>> static void
>> parse_set_action(struct action_context *ctx)
>> @@ -2706,6 +2764,8 @@ parse_action(struct action_context *ctx)
>> parse_set_meter_action(ctx);
>> } else if (lexer_match_id(ctx->lexer, "trigger_event")) {
>> parse_trigger_event(ctx, ovnact_put_TRIGGER_EVENT(ctx->ovnacts));
>> + } else if (lexer_match_id(ctx->lexer, "bind_vport")) {
>> + parse_bind_vport(ctx);
>> } else {
>> lexer_syntax_error(ctx->lexer, "expecting action");
>> }
>> diff --git a/ovn/lib/ovn-util.c b/ovn/lib/ovn-util.c
>> index 0f07d80ac..de745d73f 100644
>> --- a/ovn/lib/ovn-util.c
>> +++ b/ovn/lib/ovn-util.c
>> @@ -326,6 +326,7 @@ static const char *OVN_NB_LSP_TYPES[] = {
>> "router",
>> "vtep",
>> "external",
>> + "virtual",
>> };
>>
>> bool
>> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
>> index d2267de0e..6ff7aaff1 100644
>> --- a/ovn/northd/ovn-northd.8.xml
>> +++ b/ovn/northd/ovn-northd.8.xml
>> @@ -519,6 +519,34 @@
>> some additional flow cost for this and the value appears limited.
>> </li>
>>
>> + <li>
>> + <p>
>> + If inport <code>V</code> is of type <code>virtual</code> adds a
>> + priority-100 logical flow for each <var>P</var> configured in
>> the
>> + <ref table="Logical_Switch_Port"
>> column="options:virtual-parents"/>
>> + column with the match
>> + </p>
>> + <pre>
>> +<code>inport == <var>P</var> &amp;&amp; !is_chassis_resident(<var>V</var>) &amp;&amp; ((arp.op == 1 &amp;&amp; arp.spa == <var>VIP</var> &amp;&amp; arp.tpa == <var>VIP</var>) || (arp.op == 2 &amp;&amp; arp.spa == <var>VIP</var>))</code>
>> + </pre>
>> +
>> + <p>
>> + and applies the action
>> + </p>
>> + <pre>
>> +<code>bind_vport(<var>V</var>, inport);</code>
>> + </pre>
>> +
>> + <p>
>> + and advances the packet to the next table.
>> + </p>
>> +
>> + <p>
>> + Where <var>VIP</var> is the virtual ip configured in the column
>> + <ref table="Logical_Switch_Port" column="options:virtual-ip"/>.
>> + </p>
>> + </li>
>> +
>> <li>
>> <p>
>> Priority-50 flows that match ARP requests to each known IP
>> address
>> @@ -541,7 +569,8 @@ output;
>>
>> <p>
>> These flows are omitted for logical ports (other than router
>> ports or
>> - <code>localport</code> ports) that are down.
>> + <code>localport</code> ports) that are down and for logical
>> ports of
>> + type <code>virtual</code>.
>> </p>
>> </li>
>>
>> @@ -588,7 +617,8 @@ nd_na_router {
>>
>> <p>
>> These flows are omitted for logical ports (other than router
>> ports or
>> - <code>localport</code> ports) that are down.
>> + <code>localport</code> ports) that are down and for logical
>> ports of
>> + type <code>virtual</code>.
>> </p>
>> </li>
>>
>> @@ -2031,6 +2061,33 @@ next;
>> <code>eth.dst = <var>E</var>; next;</code>.
>> </p>
>>
>> + <p>
>> + For each virtual ip <var>A</var> configured on a logical port
>> + of type <code>virtual</code> and its virtual parent set in
>> + its corresponding <ref db="OVN_Southbound" table="Port_Binding"/>
>> + record and the virtual parent with the Ethernet address <var>E</var>
>> + and the virtual ip is reachable via the router port <var>P</var>, a
>> + priority-100 flow with match <code>outport === <var>P</var>
>> + &amp;&amp; reg0 == <var>A</var></code> has actions
>> + <code>eth.dst = <var>E</var>; next;</code>.
>> + </p>
>> +
>> + <p>
>> + For each virtual ip <var>A</var> configured on a logical port
>> + of type <code>virtual</code> and its virtual parent <code>not</code>
>> + set in its corresponding
>> + <ref db="OVN_Southbound" table="Port_Binding"/>
>> + record and the virtual ip <var>A</var> is reachable via the
>> + router port <var>P</var>, a
>> + priority-100 flow with match <code>outport === <var>P</var>
>> + &amp;&amp; reg0 == <var>A</var></code> has actions
>> + <code>eth.dst = <var>00:00:00:00:00:00</var>; next;</code>.
>> + This flow is added so that the ARP is always resolved for the
>> + virtual ip <var>A</var> by generating ARP request and
>> + <code>not</code> consulting the MAC_Binding table as it can have
>> + incorrect value for the virtual ip <var>A</var>.
>> + </p>
>> +
>> <p>
>> For each IPv6 address <var>A</var> whose host is known to have
>> Ethernet address <var>E</var> on router port <var>P</var>, a
>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> index eb6c47cad..ae09cf338 100644
>> --- a/ovn/northd/ovn-northd.c
>> +++ b/ovn/northd/ovn-northd.c
>> @@ -4878,96 +4878,146 @@ build_lswitch_flows(struct hmap *datapaths,
>> struct hmap *ports,
>> continue;
>> }
>>
>> - /*
>> - * Add ARP/ND reply flows if either the
>> - * - port is up or
>> - * - port type is router or
>> - * - port type is localport
>> - */
>> - if (!lsp_is_up(op->nbsp) && strcmp(op->nbsp->type, "router") &&
>> - strcmp(op->nbsp->type, "localport")) {
>> - continue;
>> - }
>> + if (!strcmp(op->nbsp->type, "virtual")) {
>> + /* Handle
>> + * - GARPs for virtual ip which belongs to a logical port
>> + * of type 'virtual' and bind that port.
>> + *
>> + * - ARP reply from the virtual ip which belongs to a
>> logical
>> + * port of type 'virtual' and bind that port.
>> + * */
>> + ovs_be32 ip;
>> + const char *virtual_ip = smap_get(&op->nbsp->options,
>> + "virtual-ip");
>> + const char *virtual_parents = smap_get(&op->nbsp->options,
>> + "virtual-parents");
>> + if (!virtual_ip || !virtual_parents ||
>> + !ip_parse(virtual_ip, &ip)) {
>> + continue;
>> + }
>>
>> - if (lsp_is_external(op->nbsp)) {
>> - continue;
>> - }
>> + char *tokstr = xstrdup(virtual_parents);
>> + char *save_ptr = NULL;
>> + char *vparent;
>> + for (vparent = strtok_r(tokstr, ",", &save_ptr); vparent !=
>> NULL;
>> + vparent = strtok_r(NULL, ",", &save_ptr)) {
>> + struct ovn_port *vp = ovn_port_find(ports, vparent);
>> + if (!vp || vp->od != op->od) {
>> + /* vparent name should be valid and it should belong
>> + * to the same logical switch. */
>> + continue;
>> + }
>>
>> - for (size_t i = 0; i < op->n_lsp_addrs; i++) {
>> - for (size_t j = 0; j < op->lsp_addrs[i].n_ipv4_addrs; j++) {
>> ds_clear(&match);
>> - ds_put_format(&match, "arp.tpa == %s && arp.op == 1",
>> - op->lsp_addrs[i].ipv4_addrs[j].addr_s);
>> + ds_put_format(&match, "inport == \"%s\" && "
>> + "!is_chassis_resident(%s) && "
>> + "((arp.op == 1 && arp.spa == %s && "
>> + "arp.tpa == %s) || (arp.op == 2 && "
>> + "arp.spa == %s))",
>> + vparent, op->json_key, virtual_ip,
>> virtual_ip,
>> + virtual_ip);
>> ds_clear(&actions);
>> ds_put_format(&actions,
>> - "eth.dst = eth.src; "
>> - "eth.src = %s; "
>> - "arp.op = 2; /* ARP reply */ "
>> - "arp.tha = arp.sha; "
>> - "arp.sha = %s; "
>> - "arp.tpa = arp.spa; "
>> - "arp.spa = %s; "
>> - "outport = inport; "
>> - "flags.loopback = 1; "
>> - "output;",
>> - op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s,
>> - op->lsp_addrs[i].ipv4_addrs[j].addr_s);
>> - ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 50,
>> + "bind_vport(%s, inport); "
>> + "next;",
>> + op->json_key);
>> + ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP,
>> 100,
>> ds_cstr(&match), ds_cstr(&actions));
>> + }
>>
>> - /* Do not reply to an ARP request from the port that
>> owns the
>> - * address (otherwise a DHCP client that ARPs to check
>> for a
>> - * duplicate address will fail). Instead, forward it
>> the usual
>> - * way.
>> - *
>> - * (Another alternative would be to simply drop the
>> packet. If
>> - * everything is working as it is configured, then this
>> would
>> - * produce equivalent results, since no one should reply
>> to the
>> - * request. But ARPing for one's own IP address is
>> intended to
>> - * detect situations where the network is not working as
>> - * configured, so dropping the request would frustrate
>> that
>> - * intent.) */
>> - ds_put_format(&match, " && inport == %s", op->json_key);
>> - ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP,
>> 100,
>> - ds_cstr(&match), "next;");
>> + free(tokstr);
>> + } else {
>> + /*
>> + * Add ARP/ND reply flows if either the
>> + * - port is up or
>> + * - port type is router or
>> + * - port type is localport
>> + */
>> + if (!lsp_is_up(op->nbsp) && strcmp(op->nbsp->type, "router")
>> &&
>> + strcmp(op->nbsp->type, "localport")) {
>> + continue;
>> }
>>
>> - /* For ND solicitations, we need to listen for both the
>> - * unicast IPv6 address and its all-nodes multicast address,
>> - * but always respond with the unicast IPv6 address. */
>> - for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs; j++) {
>> - ds_clear(&match);
>> - ds_put_format(&match,
>> - "nd_ns && ip6.dst == {%s, %s} && nd.target ==
>> %s",
>> - op->lsp_addrs[i].ipv6_addrs[j].addr_s,
>> - op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s,
>> - op->lsp_addrs[i].ipv6_addrs[j].addr_s);
>> + if (lsp_is_external(op->nbsp)) {
>> + continue;
>> + }
>>
>> - ds_clear(&actions);
>> - ds_put_format(&actions,
>> - "%s { "
>> + for (size_t i = 0; i < op->n_lsp_addrs; i++) {
>> + for (size_t j = 0; j < op->lsp_addrs[i].n_ipv4_addrs;
>> j++) {
>> + ds_clear(&match);
>> + ds_put_format(&match, "arp.tpa == %s && arp.op == 1",
>> + op->lsp_addrs[i].ipv4_addrs[j].addr_s);
>> + ds_clear(&actions);
>> + ds_put_format(&actions,
>> + "eth.dst = eth.src; "
>> "eth.src = %s; "
>> - "ip6.src = %s; "
>> - "nd.target = %s; "
>> - "nd.tll = %s; "
>> + "arp.op = 2; /* ARP reply */ "
>> + "arp.tha = arp.sha; "
>> + "arp.sha = %s; "
>> + "arp.tpa = arp.spa; "
>> + "arp.spa = %s; "
>> "outport = inport; "
>> "flags.loopback = 1; "
>> - "output; "
>> - "};",
>> - !strcmp(op->nbsp->type, "router") ?
>> - "nd_na_router" : "nd_na",
>> - op->lsp_addrs[i].ea_s,
>> - op->lsp_addrs[i].ipv6_addrs[j].addr_s,
>> - op->lsp_addrs[i].ipv6_addrs[j].addr_s,
>> - op->lsp_addrs[i].ea_s);
>> - ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 50,
>> - ds_cstr(&match), ds_cstr(&actions));
>> + "output;",
>> + op->lsp_addrs[i].ea_s, op->lsp_addrs[i].ea_s,
>> + op->lsp_addrs[i].ipv4_addrs[j].addr_s);
>> + ovn_lflow_add(lflows, op->od,
>> S_SWITCH_IN_ARP_ND_RSP, 50,
>> + ds_cstr(&match), ds_cstr(&actions));
>> +
>> + /* Do not reply to an ARP request from the port that
>> owns
>> + * the address (otherwise a DHCP client that ARPs to
>> check
>> + * for a duplicate address will fail). Instead,
>> forward
>> + * it the usual way.
>> + *
>> + * (Another alternative would be to simply drop the
>> packet.
>> + * If everything is working as it is configured,
>> then this
>> + * would produce equivalent results, since no one
>> should
>> + * reply to the request. But ARPing for one's own IP
>> + * address is intended to detect situations where the
>> + * network is not working as configured, so dropping
>> the
>> + * request would frustrate that intent.) */
>> + ds_put_format(&match, " && inport == %s",
>> op->json_key);
>> + ovn_lflow_add(lflows, op->od,
>> S_SWITCH_IN_ARP_ND_RSP, 100,
>> + ds_cstr(&match), "next;");
>> + }
>>
>> - /* Do not reply to a solicitation from the port that
>> owns the
>> - * address (otherwise DAD detection will fail). */
>> - ds_put_format(&match, " && inport == %s", op->json_key);
>> - ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP,
>> 100,
>> - ds_cstr(&match), "next;");
>> + /* For ND solicitations, we need to listen for both the
>> + * unicast IPv6 address and its solicited-node multicast
>> address,
>> + * but always respond with the unicast IPv6 address. */
>> + for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs;
>> j++) {
>> + ds_clear(&match);
>> + ds_put_format(&match,
>> + "nd_ns && ip6.dst == {%s, %s} && nd.target
>> == %s",
>> + op->lsp_addrs[i].ipv6_addrs[j].addr_s,
>> + op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s,
>> + op->lsp_addrs[i].ipv6_addrs[j].addr_s);
>> +
>> + ds_clear(&actions);
>> + ds_put_format(&actions,
>> + "%s { "
>> + "eth.src = %s; "
>> + "ip6.src = %s; "
>> + "nd.target = %s; "
>> + "nd.tll = %s; "
>> + "outport = inport; "
>> + "flags.loopback = 1; "
>> + "output; "
>> + "};",
>> + !strcmp(op->nbsp->type, "router") ?
>> + "nd_na_router" : "nd_na",
>> + op->lsp_addrs[i].ea_s,
>> + op->lsp_addrs[i].ipv6_addrs[j].addr_s,
>> + op->lsp_addrs[i].ipv6_addrs[j].addr_s,
>> + op->lsp_addrs[i].ea_s);
>> + ovn_lflow_add(lflows, op->od,
>> S_SWITCH_IN_ARP_ND_RSP, 50,
>> + ds_cstr(&match), ds_cstr(&actions));
>> +
>> + /* Do not reply to a solicitation from the port that
>> owns
>> + * the address (otherwise DAD detection will fail).
>> */
>> + ds_put_format(&match, " && inport == %s",
>> op->json_key);
>> + ovn_lflow_add(lflows, op->od,
>> S_SWITCH_IN_ARP_ND_RSP, 100,
>> + ds_cstr(&match), "next;");
>> + }
>> }
>> }
>> }
>> @@ -7504,7 +7554,8 @@ build_lrouter_flows(struct hmap *datapaths, struct
>> hmap *ports,
>> 100, ds_cstr(&match),
>> ds_cstr(&actions));
>> }
>> }
>> - } else if (op->od->n_router_ports && strcmp(op->nbsp->type,
>> "router")) {
>> + } else if (op->od->n_router_ports && strcmp(op->nbsp->type,
>> "router")
>> + && strcmp(op->nbsp->type, "virtual")) {
>> /* This is a logical switch port that backs a VM or a
>> container.
>> * Extract its addresses. For each of the address, go
>> through all
>> * the router ports attached to the switch (to which this
>> port
>> @@ -7581,6 +7632,105 @@ build_lrouter_flows(struct hmap *datapaths,
>> struct hmap *ports,
>> }
>> }
>> }
>> + } else if (op->od->n_router_ports && strcmp(op->nbsp->type,
>> "router")
>> + && !strcmp(op->nbsp->type, "virtual")) {
>> + /* This is a virtual port. Add ARP replies for the virtual
>> ip with
>> + * the mac of the present active virtual parent.
>> + * If the logical port doesn't have virtual parent set in
>> + * Port_Binding table, then add the flow to set eth.dst to
>> + * 00:00:00:00:00:00 and advance to next table so that ARP is
>> + * resolved by router pipeline using the arp{} action.
>> + * The MAC_Binding entry for the virtual ip might be
>> invalid. */
>> + ovs_be32 ip;
>> +
>> + const char *vip = smap_get(&op->nbsp->options,
>> + "virtual-ip");
>> + const char *virtual_parents = smap_get(&op->nbsp->options,
>> + "virtual-parents");
>> + if (!vip || !virtual_parents ||
>> + !ip_parse(vip, &ip) || !op->sb) {
>> + continue;
>> + }
>> +
>> + if (!op->sb->virtual_parent || !op->sb->virtual_parent[0] ||
>> + !op->sb->chassis) {
>> + /* The virtual port is not claimed yet. */
>> + for (size_t i = 0; i < op->od->n_router_ports; i++) {
>> + const char *peer_name = smap_get(
>> + &op->od->router_ports[i]->nbsp->options,
>> + "router-port");
>> + if (!peer_name) {
>> + continue;
>> + }
>> +
>> + struct ovn_port *peer = ovn_port_find(ports,
>> peer_name);
>> + if (!peer || !peer->nbrp) {
>> + continue;
>> + }
>> +
>> + if (find_lrp_member_ip(peer, vip)) {
>> + ds_clear(&match);
>> + ds_put_format(&match, "outport == %s && reg0 ==
>> %s",
>> + peer->json_key, vip);
>> +
>> + ds_clear(&actions);
>> + ds_put_format(&actions,
>> + "eth.dst = 00:00:00:00:00:00;
>> next;");
>> + ovn_lflow_add(lflows, peer->od,
>> + S_ROUTER_IN_ARP_RESOLVE, 100,
>> + ds_cstr(&match),
>> ds_cstr(&actions));
>> + break;
>> + }
>> + }
>> + } else {
>> + struct ovn_port *vp =
>> + ovn_port_find(ports, op->sb->virtual_parent);
>> + if (!vp || !vp->nbsp) {
>> + continue;
>> + }
>> +
>> + for (size_t i = 0; i < vp->n_lsp_addrs; i++) {
>> + bool found_vip_network = false;
>> + const char *ea_s = vp->lsp_addrs[i].ea_s;
>> + for (size_t j = 0; j < vp->od->n_router_ports; j++) {
>> + /* Get the Logical_Router_Port that the
>> + * Logical_Switch_Port is connected to, as
>> + * 'peer'. */
>> + const char *peer_name = smap_get(
>> + &vp->od->router_ports[j]->nbsp->options,
>> + "router-port");
>> + if (!peer_name) {
>> + continue;
>> + }
>> +
>> + struct ovn_port *peer =
>> + ovn_port_find(ports, peer_name);
>> + if (!peer || !peer->nbrp) {
>> + continue;
>> + }
>> +
>> + if (!find_lrp_member_ip(peer, vip)) {
>> + continue;
>> + }
>> +
>> + ds_clear(&match);
>> + ds_put_format(&match, "outport == %s && reg0 ==
>> %s",
>> + peer->json_key, vip);
>> +
>> + ds_clear(&actions);
>> + ds_put_format(&actions, "eth.dst = %s; next;",
>> ea_s);
>> + ovn_lflow_add(lflows, peer->od,
>> + S_ROUTER_IN_ARP_RESOLVE, 100,
>> + ds_cstr(&match),
>> ds_cstr(&actions));
>> + found_vip_network = true;
>> + break;
>> + }
>> +
>> + if (found_vip_network) {
>> + break;
>> + }
>> + }
>> + }
>> } else if (!strcmp(op->nbsp->type, "router")) {
>> /* This is a logical switch port that connects to a router.
>> */
>>
>> @@ -9256,6 +9406,8 @@ main(int argc, char *argv[])
>> &sbrec_port_binding_col_gateway_chassis);
>> ovsdb_idl_add_column(ovnsb_idl_loop.idl,
>> &sbrec_port_binding_col_ha_chassis_group);
>> + ovsdb_idl_add_column(ovnsb_idl_loop.idl,
>> + &sbrec_port_binding_col_virtual_parent);
>> ovsdb_idl_add_column(ovnsb_idl_loop.idl,
>> &sbrec_gateway_chassis_col_chassis);
>> ovsdb_idl_add_column(ovnsb_idl_loop.idl,
>> &sbrec_gateway_chassis_col_name);
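
To summarize the lr_in_arp_resolve change above: when the virtual port has a virtual parent set, eth.dst is set to that parent's MAC; otherwise eth.dst is zeroed so the router pipeline falls back to arp{}. Below is a minimal sketch of just that decision, not the ovn-northd code itself. The helper name vip_arp_resolve_action is hypothetical, and the patch additionally requires the Port_Binding chassis to be set before it treats the port as claimed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of ovn-northd): formats the action of the
 * lr_in_arp_resolve flow for a virtual IP.  'parent_mac' is the MAC of the
 * port named in Port_Binding.virtual_parent, or NULL/"" when the virtual
 * port has not been claimed yet. */
static int
vip_arp_resolve_action(const char *parent_mac, char *buf, size_t size)
{
    /* Unclaimed: zero eth.dst so that ARP is resolved later by arp{}. */
    const char *mac = (parent_mac && parent_mac[0])
                      ? parent_mac : "00:00:00:00:00:00";

    return snprintf(buf, size, "eth.dst = %s; next;", mac);
}
```

With the MAC of sw0-p1 from the test this yields the same action string that the test expects after hv1 claims sw0-vir.
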
>> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
>> index 57b6edbf8..f5f10a5c1 100644
>> --- a/ovn/ovn-nb.xml
>> +++ b/ovn/ovn-nb.xml
>> @@ -465,6 +465,31 @@
>> </li>
>> </ul>
>> </dd>
>> +
>> + <dt><code>virtual</code></dt>
>> + <dd>
>> + <p>
>> + Represents a logical port which does not have an OVS
>> + port in the integration bridge and has a virtual ip
>> configured
>> + in the <ref column="options:virtual-ip"/> column. This
>> virtual ip
>> + can move around between the logical ports configured in
>> + the <ref column="options:virtual-parents"/> column.
>> + </p>
>> +
>> + <p>
>> + One use case for <code>virtual</code>
>> + ports is the following.
>> + </p>
>> +
>> + <ul>
>> + <li>
>> + The <code>virtual ip</code> represents a load balancer
>> vip
>> + and the <code>virtual parents</code> provide load
>> balancer
>> + service in an active-standby setup with the active
>> virtual
>> + parent owning the <code>virtual ip</code>.
>> + </li>
>> + </ul>
>> + </dd>
>> </dl>
>> </column>
>> </group>
>> @@ -618,6 +643,26 @@
>> interface, in bits.
>> </column>
>> </group>
>> +
>> + <group title="Virtual port Options">
>> + <p>
>> + These options apply when <ref column="type"/> is
>> + <code>virtual</code>.
>> + </p>
>> +
>> + <column name="options" key="virtual-ip">
>> + This option represents the virtual IPv4 address.
>> + </column>
>> +
>> + <column name="options" key="virtual-parents">
>> + This option represents a set of logical port names (within
>> the same
>> + logical switch) which can own the <code>virtual ip</code>
>> configured
>> + in the <ref column="options:virtual-ip"/>. All these virtual
>> parents
>> + should add the <code>virtual ip</code> in the
>> + <ref column="port_security"/> if port security addresses are
>> enabled.
>> + </column>
>> + </group>
>> +
>> </group>
>>
>> <group title="Containers">
>> diff --git a/ovn/ovn-sb.ovsschema b/ovn/ovn-sb.ovsschema
>> index 2b7bc57a7..5c013b17e 100644
>> --- a/ovn/ovn-sb.ovsschema
>> +++ b/ovn/ovn-sb.ovsschema
>> @@ -1,7 +1,7 @@
>> {
>> "name": "OVN_Southbound",
>> - "version": "2.4.0",
>> - "cksum": "3059284885 20260",
>> + "version": "2.5.0",
>> + "cksum": "1257419092 20387",
>> "tables": {
>> "SB_Global": {
>> "columns": {
>> @@ -173,6 +173,8 @@
>> "minInteger": 1,
>> "maxInteger": 4095},
>> "min": 0, "max": 1}},
>> + "virtual_parent": {"type": {"key": "string", "min": 0,
>> + "max": 1}},
>> "chassis": {"type": {"key": {"type": "uuid",
>> "refTable": "Chassis",
>> "refType": "weak"},
>> diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
>> index 544a071fa..17c45bbac 100644
>> --- a/ovn/ovn-sb.xml
>> +++ b/ovn/ovn-sb.xml
>> @@ -2017,6 +2017,24 @@ tcp.flags = RST;
>> </p>
>> <p><b>Prerequisite:</b> <code>igmp</code></p>
>> </dd>
>> +
>> + <dt><code>bind_vport(<var>V</var>, <var>P</var>);</code></dt>
>> + <dd>
>> + <p>
>> + <b>Parameters</b>: logical port string field <var>V</var>
>> + of type <code>virtual</code>, logical port string field
>> + <var>P</var>.
>> + </p>
>> +
>> + <p>
>> + Binds the virtual logical port <var>V</var> and sets the
>> + <ref table="Port_Binding" column="chassis"/> column and
>> + <ref table="Port_Binding" column="virtual_parent"/> of
>> + the table <ref table="Port_Binding"/>.
>> + <ref table="Port_Binding" column="virtual_parent"/> is
>> + set to <var>P</var>.
>> + </p>
>> + </dd>
>> </dl>
>> </column>
>>
>> @@ -2480,6 +2498,13 @@ tcp.flags = RST;
>> the <code>outport</code> will be reset to the value of the
>> distributed port.
>> </dd>
>> +
>> + <dt><code>virtual</code></dt>
>> + <dd>
>> + Represents a logical port with a <code>virtual ip</code>.
>> + This <code>virtual ip</code> can be configured on a
>> + logical port (which is referred to as the virtual parent).
>> + </dd>
>> </dl>
>> </column>
>> </group>
>> @@ -2720,6 +2745,27 @@ tcp.flags = RST;
>> </column>
>> </group>
>>
>> + <group title="Virtual ports">
>> + <column name="virtual_parent">
>> + <p>
>> + This column is set by <code>ovn-controller</code> with one of
>> the
>> + values from the
>> + <ref table="Logical_Switch_Port"
>> column="options:virtual-parents"
>> + db="OVN_Northbound"/> in the OVN_Northbound database's
>> + <ref table="Logical_Switch_Port" db="OVN_Northbound"/> table
>> + when the OVN action <code>bind_vport</code> is executed.
>> + <code>ovn-controller</code> also sets the
>> + <ref column="chassis"/> column when it executes this action
>> + with its chassis id.
>> + </p>
>> +
>> + <p>
>> + <code>ovn-controller</code> sets this column only if the
>> + <ref column="type"/> is "virtual".
>> + </p>
>> + </column>
>> + </group>
>> +
>> <group title="Naming">
>> <column name="external_ids" key="name">
>> <p>
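
For readers of the bind_vport documentation above, the ARP part of the match that triggers the action (as it appears in the expected ls_in_arp_rsp flows in tests/ovn.at) can be modelled as a small predicate. This is only a sketch: the function name arp_claims_vip is hypothetical, and the real logical flow also requires that inport is one of the virtual parents and that !is_chassis_resident() holds for the virtual port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { ARP_OP_REQUEST = 1, ARP_OP_REPLY = 2 };

/* Hypothetical model of the ARP conditions in the bind_vport flow:
 *   (arp.op == 1 && arp.spa == VIP && arp.tpa == VIP)
 *   || (arp.op == 2 && arp.spa == VIP)
 * Addresses are IPv4 values in host byte order. */
static bool
arp_claims_vip(uint16_t arp_op, uint32_t spa, uint32_t tpa, uint32_t vip)
{
    if (arp_op == ARP_OP_REQUEST) {
        /* Gratuitous ARP request: sender and target are both the VIP. */
        return spa == vip && tpa == vip;
    }
    /* ARP reply sent with the VIP as the sender protocol address. */
    return arp_op == ARP_OP_REPLY && spa == vip;
}
```

An ordinary ARP request for another address sent from the VIP does not match, which is why the test uses a GARP or an ARP reply to move the VIP between parents.
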
>> diff --git a/ovn/utilities/ovn-trace.c b/ovn/utilities/ovn-trace.c
>> index 044eb1cc2..b532b8eaf 100644
>> --- a/ovn/utilities/ovn-trace.c
>> +++ b/ovn/utilities/ovn-trace.c
>> @@ -2144,6 +2144,9 @@ trace_actions(const struct ovnact *ovnacts, size_t
>> ovnacts_len,
>>
>> case OVNACT_CHECK_PKT_LARGER:
>> break;
>> +
>> + case OVNACT_BIND_VPORT:
>> + break;
>> }
>> }
>> ds_destroy(&s);
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index cb380d275..2837be167 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -1368,6 +1368,15 @@ reg0 = check_pkt_larger(foo);
>> reg0[0] = check_pkt_larger(foo);
>> Syntax error at `foo' expecting `;'.
>>
>> +# bind_vport
>> +# lsp1's port key is 0x11.
>> +bind_vport("lsp1", inport);
>> + encodes as controller(userdata=00.00.00.11.00.00.00.00.11.00.00.00)
>> +
>> +# lsp2 doesn't exist. So it should be encoded as drop.
>> +bind_vport("lsp2", inport);
>> + encodes as drop
>> +
>> # Miscellaneous negative tests.
>> ;
>> Syntax error at `;'.
>> @@ -14345,6 +14354,278 @@ OVN_CLEANUP([hv1],[hv2])
>>
>> AT_CLEANUP
>>
>> +AT_SETUP([ovn -- virtual ports])
>> +AT_KEYWORDS([virtual ports])
>> +AT_SKIP_IF([test $HAVE_PYTHON = no])
>> +ovn_start
>> +
>> +send_garp() {
>> + local hv=$1 inport=$2 eth_src=$3 eth_dst=$4 spa=$5 tpa=$6
>> + local
>> request=${eth_dst}${eth_src}08060001080006040001${eth_src}${spa}${eth_dst}${tpa}
>> + as hv$hv ovs-appctl netdev-dummy/receive hv${hv}-vif$inport $request
>> +}
>> +
>> +send_arp_reply() {
>> + local hv=$1 inport=$2 eth_src=$3 eth_dst=$4 spa=$5 tpa=$6
>> + local
>> request=${eth_dst}${eth_src}08060001080006040002${eth_src}${spa}${eth_dst}${tpa}
>> + as hv$hv ovs-appctl netdev-dummy/receive hv${hv}-vif$inport $request
>> +}
>> +
>> +net_add n1
>> +
>> +sim_add hv1
>> +as hv1
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.1
>> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
>> + set interface hv1-vif1 external-ids:iface-id=sw0-p1 \
>> + options:tx_pcap=hv1/vif1-tx.pcap \
>> + options:rxq_pcap=hv1/vif1-rx.pcap \
>> + ofport-request=1
>> +ovs-vsctl -- add-port br-int hv1-vif2 -- \
>> + set interface hv1-vif2 external-ids:iface-id=sw0-p3 \
>> + options:tx_pcap=hv1/vif2-tx.pcap \
>> + options:rxq_pcap=hv1/vif2-rx.pcap \
>> + ofport-request=2
>> +
>> +sim_add hv2
>> +as hv2
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.2
>> +ovs-vsctl -- add-port br-int hv2-vif1 -- \
>> + set interface hv2-vif1 external-ids:iface-id=sw0-p2 \
>> + options:tx_pcap=hv2/vif1-tx.pcap \
>> + options:rxq_pcap=hv2/vif1-rx.pcap \
>> + ofport-request=1
>> +ovs-vsctl -- add-port br-int hv2-vif2 -- \
>> + set interface hv2-vif2 external-ids:iface-id=sw1-p1 \
>> + options:tx_pcap=hv2/vif2-tx.pcap \
>> + options:rxq_pcap=hv2/vif2-rx.pcap \
>> + ofport-request=2
>> +
>> +ovn-nbctl ls-add sw0
>> +
>> +ovn-nbctl lsp-add sw0 sw0-vir
>> +ovn-nbctl lsp-set-addresses sw0-vir "50:54:00:00:00:10 10.0.0.10"
>> +ovn-nbctl lsp-set-port-security sw0-vir "50:54:00:00:00:10 10.0.0.10"
>> +ovn-nbctl lsp-set-type sw0-vir virtual
>> +ovn-nbctl set logical_switch_port sw0-vir options:virtual-ip=10.0.0.10
>> +ovn-nbctl set logical_switch_port sw0-vir
>> options:virtual-parents=sw0-p1,sw0-p2
>> +
>> +ovn-nbctl lsp-add sw0 sw0-p1
>> +ovn-nbctl lsp-set-addresses sw0-p1 "50:54:00:00:00:03 10.0.0.3"
>> +ovn-nbctl lsp-set-port-security sw0-p1 "50:54:00:00:00:03 10.0.0.3
>> 10.0.0.10"
>> +
>> +ovn-nbctl lsp-add sw0 sw0-p2
>> +ovn-nbctl lsp-set-addresses sw0-p2 "50:54:00:00:00:04 10.0.0.4"
>> +ovn-nbctl lsp-set-port-security sw0-p2 "50:54:00:00:00:04 10.0.0.4
>> 10.0.0.10"
>> +
>> +ovn-nbctl lsp-add sw0 sw0-p3
>> +ovn-nbctl lsp-set-addresses sw0-p3 "50:54:00:00:00:05 10.0.0.5"
>> +ovn-nbctl lsp-set-port-security sw0-p3 "50:54:00:00:00:05 10.0.0.5"
>> +
>> +# Create the second logical switch with one port
>> +ovn-nbctl ls-add sw1
>> +ovn-nbctl lsp-add sw1 sw1-p1
>> +ovn-nbctl lsp-set-addresses sw1-p1 "40:54:00:00:00:03 20.0.0.3"
>> +ovn-nbctl lsp-set-port-security sw1-p1 "40:54:00:00:00:03 20.0.0.3"
>> +
>> +# Create a logical router and attach both logical switches
>> +ovn-nbctl lr-add lr0
>> +ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
>> +ovn-nbctl lsp-add sw0 sw0-lr0
>> +ovn-nbctl lsp-set-type sw0-lr0 router
>> +ovn-nbctl lsp-set-addresses sw0-lr0 00:00:00:00:ff:01
>> +ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
>> +
>> +ovn-nbctl lrp-add lr0 lr0-sw1 00:00:00:00:ff:02 20.0.0.1/24
>> +ovn-nbctl lsp-add sw1 sw1-lr0
>> +ovn-nbctl lsp-set-type sw1-lr0 router
>> +ovn-nbctl lsp-set-addresses sw1-lr0 00:00:00:00:ff:02
>> +ovn-nbctl lsp-set-options sw1-lr0 router-port=lr0-sw1
>> +
>> +OVN_POPULATE_ARP
>> +ovn-nbctl --wait=hv sync
>> +
>> +# Check that logical flows are added for sw0-vir in ls_in_arp_rsp
>> pipeline
>> +# with bind_vport action.
>> +
>> +ovn-sbctl dump-flows sw0 | grep ls_in_arp_rsp | grep bind_vport >
>> lflows.txt
>> +
>> +AT_CHECK([cat lflows.txt], [0], [dnl
>> + table=11(ls_in_arp_rsp ), priority=100 , match=(inport ==
>> "sw0-p1" && !is_chassis_resident("sw0-vir") && ((arp.op == 1 && arp.spa ==
>> 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa ==
>> 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
>> + table=11(ls_in_arp_rsp ), priority=100 , match=(inport ==
>> "sw0-p2" && !is_chassis_resident("sw0-vir") && ((arp.op == 1 && arp.spa ==
>> 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa ==
>> 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
>> +])
>> +
>> +ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 ==
>> 10.0.0.10" \
>> +> lflows.txt
>> +
>> +# Since the sw0-vir is not claimed by any chassis, eth.dst should be set
>> to
>> +# zero if the ip4.dst is the virtual ip in the router pipeline.
>> +AT_CHECK([cat lflows.txt], [0], [dnl
>> + table=9 (lr_in_arp_resolve ), priority=100 , match=(outport ==
>> "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 00:00:00:00:00:00; next;)
>> +])
>> +
>> +ip_to_hex() {
>> + printf "%02x%02x%02x%02x" "$@"
>> +}
>> +
>> +hv1_ch_uuid=`ovn-sbctl --bare --columns _uuid find chassis name="hv1"`
>> +hv2_ch_uuid=`ovn-sbctl --bare --columns _uuid find chassis name="hv2"`
>> +
>> +AT_CHECK([test x$(ovn-sbctl --bare --columns chassis find port_binding \
>> +logical_port=sw0-vir) = x], [0], [])
>> +
>> +AT_CHECK([test x$(ovn-sbctl --bare --columns virtual_parent find
>> port_binding \
>> +logical_port=sw0-vir) = x])
>> +
>> +# From sw0-p1 send GARP for 10.0.0.10. hv1 should claim sw0-vir
>> +# and sw0-p1 should be its virtual_parent.
>> +eth_src=505400000003
>> +eth_dst=ffffffffffff
>> +spa=$(ip_to_hex 10 0 0 10)
>> +tpa=$(ip_to_hex 10 0 0 10)
>> +send_garp 1 1 $eth_src $eth_dst $spa $tpa
>> +
>> +OVS_WAIT_UNTIL([test x$(ovn-sbctl --bare --columns chassis find
>> port_binding \
>> +logical_port=sw0-vir) = x$hv1_ch_uuid], [0], [])
>> +
>> +AT_CHECK([test x$(ovn-sbctl --bare --columns virtual_parent find
>> port_binding \
>> +logical_port=sw0-vir) = xsw0-p1])
>> +
>> +ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 ==
>> 10.0.0.10" \
>> +> lflows.txt
>> +
>> +# There should be an arp resolve flow to resolve the virtual_ip with the
>> +# sw0-p1's MAC.
>> +AT_CHECK([cat lflows.txt], [0], [dnl
>> + table=9 (lr_in_arp_resolve ), priority=100 , match=(outport ==
>> "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:03; next;)
>> +])
>> +
>> +# send the garp from sw0-p2 (in hv2). hv2 should claim sw0-vir
>> +# and sw0-p2 should be its virtual_parent.
>> +eth_src=505400000004
>> +eth_dst=ffffffffffff
>> +spa=$(ip_to_hex 10 0 0 10)
>> +tpa=$(ip_to_hex 10 0 0 10)
>> +send_garp 2 1 $eth_src $eth_dst $spa $tpa
>> +
>> +OVS_WAIT_UNTIL([test x$(ovn-sbctl --bare --columns chassis find
>> port_binding \
>> +logical_port=sw0-vir) = x$hv2_ch_uuid], [0], [])
>> +
>> +AT_CHECK([test x$(ovn-sbctl --bare --columns virtual_parent find
>> port_binding \
>> +logical_port=sw0-vir) = xsw0-p2])
>> +
>> +ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 ==
>> 10.0.0.10" \
>> +> lflows.txt
>> +
>> +# There should be an arp resolve flow to resolve the virtual_ip with the
>> +# sw0-p2's MAC.
>> +AT_CHECK([cat lflows.txt], [0], [dnl
>> + table=9 (lr_in_arp_resolve ), priority=100 , match=(outport ==
>> "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:04; next;)
>> +])
>> +
>> +# Now send arp reply from sw0-p1. hv1 should claim sw0-vir
>> +# and sw0-p1 should be its virtual_parent.
>> +eth_src=505400000003
>> +eth_dst=ffffffffffff
>> +spa=$(ip_to_hex 10 0 0 10)
>> +tpa=$(ip_to_hex 10 0 0 4)
>> +send_arp_reply 1 1 $eth_src $eth_dst $spa $tpa
>> +
>> +OVS_WAIT_UNTIL([test x$(ovn-sbctl --bare --columns chassis find
>> port_binding \
>> +logical_port=sw0-vir) = x$hv1_ch_uuid], [0], [])
>> +
>> +AT_CHECK([test x$(ovn-sbctl --bare --columns virtual_parent find
>> port_binding \
>> +logical_port=sw0-vir) = xsw0-p1])
>> +
>> +ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 ==
>> 10.0.0.10" \
>> +> lflows.txt
>> +
>> +AT_CHECK([cat lflows.txt], [0], [dnl
>> + table=9 (lr_in_arp_resolve ), priority=100 , match=(outport ==
>> "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:03; next;)
>> +])
>> +
>> +# Delete hv1-vif1 port. hv1 should release sw0-vir
>> +as hv1 ovs-vsctl del-port hv1-vif1
>> +
>> +OVS_WAIT_UNTIL([test x$(ovn-sbctl --bare --columns chassis find
>> port_binding \
>> +logical_port=sw0-vir) = x], [0], [])
>> +
>> +AT_CHECK([test x$(ovn-sbctl --bare --columns virtual_parent find
>> port_binding \
>> +logical_port=sw0-vir) = x])
>> +
>> +# Since the sw0-vir is not claimed by any chassis, eth.dst should be set
>> to
>> +# zero if the ip4.dst is the virtual ip.
>> +ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 ==
>> 10.0.0.10" \
>> +> lflows.txt
>> +
>> +AT_CHECK([cat lflows.txt], [0], [dnl
>> + table=9 (lr_in_arp_resolve ), priority=100 , match=(outport ==
>> "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 00:00:00:00:00:00; next;)
>> +])
>> +
>> +# Now send arp reply from sw0-p2. hv2 should claim sw0-vir
>> +# and sw0-p2 should be its virtual_parent.
>> +eth_src=505400000004
>> +eth_dst=ffffffffffff
>> +spa=$(ip_to_hex 10 0 0 10)
>> +tpa=$(ip_to_hex 10 0 0 3)
>> +send_arp_reply 2 1 $eth_src $eth_dst $spa $tpa
>> +
>> +OVS_WAIT_UNTIL([test x$(ovn-sbctl --bare --columns chassis find
>> port_binding \
>> +logical_port=sw0-vir) = x$hv2_ch_uuid], [0], [])
>> +
>> +AT_CHECK([test x$(ovn-sbctl --bare --columns virtual_parent find
>> port_binding \
>> +logical_port=sw0-vir) = xsw0-p2])
>> +
>> +ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 ==
>> 10.0.0.10" \
>> +> lflows.txt
>> +
>> +AT_CHECK([cat lflows.txt], [0], [dnl
>> + table=9 (lr_in_arp_resolve ), priority=100 , match=(outport ==
>> "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:04; next;)
>> +])
>> +
>> +# Delete sw0-p2 logical port
>> +ovn-nbctl lsp-del sw0-p2
>> +
>> +OVS_WAIT_UNTIL([test x$(ovn-sbctl --bare --columns chassis find
>> port_binding \
>> +logical_port=sw0-vir) = x], [0], [])
>> +
>> +AT_CHECK([test x$(ovn-sbctl --bare --columns virtual_parent find
>> port_binding \
>> +logical_port=sw0-vir) = x])
>> +
>> +# Clear the virtual-ip option of sw0-vir. There should be no bind_vport
>> flows.
>> +ovn-nbctl --wait=hv remove logical_switch_port sw0-vir options virtual-ip
>> +
>> +ovn-sbctl dump-flows sw0 | grep ls_in_arp_rsp | grep bind_vport >
>> lflows.txt
>> +
>> +AT_CHECK([cat lflows.txt], [0], [dnl
>> +])
>> +
>> +# Add back virtual_ip and clear virtual_parents.
>> +ovn-nbctl --wait=hv set logical_switch_port sw0-vir
>> options:virtual-ip=10.0.0.10
>> +
>> +ovn-sbctl dump-flows sw0 | grep ls_in_arp_rsp | grep bind_vport >
>> lflows.txt
>> +
>> +AT_CHECK([cat lflows.txt], [0], [dnl
>> + table=11(ls_in_arp_rsp ), priority=100 , match=(inport ==
>> "sw0-p1" && !is_chassis_resident("sw0-vir") && ((arp.op == 1 && arp.spa ==
>> 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa ==
>> 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
>> +])
>> +
>> +ovn-nbctl --wait=hv remove logical_switch_port sw0-vir options
>> virtual-parents
>> +ovn-sbctl dump-flows sw0 | grep ls_in_arp_rsp | grep bind_vport >
>> lflows.txt
>> +
>> +AT_CHECK([cat lflows.txt], [0], [dnl
>> +])
>> +
>> +ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 ==
>> 10.0.0.10" \
>> +> lflows.txt
>> +
>> +AT_CHECK([cat lflows.txt], [0], [dnl
>> +])
>> +
>> +OVN_CLEANUP([hv1], [hv2])
>> +AT_CLEANUP
>> +
>> # Run ovn-nbctl in daemon mode, change to a backup database and verify
>> that
>> # an insert operation is not allowed.
>> AT_SETUP([ovn -- can't write to a backup database server instance])
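
The send_garp and send_arp_reply helpers in the test above differ only in the ARP opcode. As a rough illustration of the frame layout they concatenate (Ethernet header, then ARP header with htype 0001, ptype 0800, hlen 06, plen 04, opcode, and the four address fields), here is a C sketch that produces the same hex string. The function names are hypothetical and mirror the shell helpers; this is not code from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the test's ip_to_hex shell helper: four octets to 8 hex digits. */
static void
ip_to_hex(unsigned a, unsigned b, unsigned c, unsigned d, char out[9])
{
    snprintf(out, 9, "%02x%02x%02x%02x",
             a & 0xff, b & 0xff, c & 0xff, d & 0xff);
}

/* Builds the hex string fed to netdev-dummy/receive.  'arp_op' is 1 for a
 * request (send_garp) or 2 for a reply (send_arp_reply).  As in the shell
 * helpers, eth_dst is reused as the ARP target hardware address. */
static int
build_arp_frame(const char *eth_src, const char *eth_dst, unsigned arp_op,
                const char *spa, const char *tpa, char *buf, size_t size)
{
    return snprintf(buf, size,
                    "%s%s"          /* Ethernet dst, src */
                    "0806"          /* ethertype: ARP */
                    "0001" "0800"   /* htype: Ethernet, ptype: IPv4 */
                    "06" "04"       /* hlen, plen */
                    "%04x"          /* ARP opcode */
                    "%s%s%s%s",     /* sha, spa, tha, tpa */
                    eth_dst, eth_src, arp_op & 0xffff,
                    eth_src, spa, eth_dst, tpa);
}
```

Calling build_arp_frame with eth_src 505400000003, eth_dst ffffffffffff, opcode 1 and 10.0.0.10 for both spa and tpa corresponds to the first GARP the test sends from sw0-p1.
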
>> diff --git a/tests/test-ovn.c b/tests/test-ovn.c
>> index 0b9e8246e..cf1bc5432 100644
>> --- a/tests/test-ovn.c
>> +++ b/tests/test-ovn.c
>> @@ -1253,6 +1253,7 @@ test_parse_actions(struct ovs_cmdl_context *ctx
>> OVS_UNUSED)
>> simap_put(&ports, "eth0", 5);
>> simap_put(&ports, "eth1", 6);
>> simap_put(&ports, "LOCAL", ofp_to_u16(OFPP_LOCAL));
>> + simap_put(&ports, "lsp1", 0x11);
>>
>> ds_init(&input);
>> while (!ds_get_test_line(&input, stdin)) {
>> --
>> 2.21.0
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>