Load_Balancer health checks were silently failing for baremetal pool
members whose backend Logical_Switch_Port has type=external on a
Logical_Switch that has a localnet port (typical for Neutron and the
ovn-octavia-provider baremetal driver on a provider VLAN).

When the controller emits a health-check probe, it stamps the source
MAC with $svc_monitor_mac (or the LRP MAC) and sends it out the LRP.
The reply from the baremetal member re-enters br-int via the localnet
port, so at S_SWITCH_IN_L2_LKUP MFF_LOG_INPORT carries the localnet
LSP's tunnel_key, not the backend LSP's. The per-backend reply
lflow's match "inport == <backend> && ..." never fires, and even the
generic per-LS "eth.dst == $svc_monitor_mac" lflow that calls
handle_svc_check(inport) feeds pinctrl the localnet tunnel_key,
which pinctrl_find_svc_monitor() cannot resolve to a service monitor.
The CMS therefore concludes the member is down.

Fix the issue inside the logical switch ingress pipeline at the
earliest possible stage so that every downstream lflow (HM reply,
ARP responder, DHCP, FDB, ACL, ...) observes a sensible inport for
the external LSP, as suggested by Numan Siddique.

For each external LSP that lives on a switch with a localnet port,
install a new lflow at S_SWITCH_IN_CHECK_PORT_SEC priority 75:

    match : inport == <localnet_port> && eth.src == <external_mac>
    action: flags.localnet = 1; inport = <external_lsp>; next;

The match is specific enough (combined eth.src + localnet inport)
that it does not affect any other localnet traffic. Setting
flags.localnet here preserves the semantics that
build_lswitch_from_localnet_op() and build_lswitch_learn_fdb_op()
previously provided at S_SWITCH_IN_LOOKUP_FDB, which would no longer
fire for external-LSP-sourced packets after the rewrite.

With this rewrite in place:

  * The original per-backend HM reply lflow at S_SWITCH_IN_L2_LKUP
    ("inport == <backend>" / "handle_svc_check(inport);") works
    without modification.
  * The generic per-LS "eth.dst == $svc_monitor_mac" lflow uses the
    backend LSP's tunnel_key when calling handle_svc_check(inport),
    so pinctrl_find_svc_monitor() succeeds for the
    $svc_monitor_mac-sourced probe case as well.

Two follow-on adjustments are required in code that depended on
MFF_LOG_INPORT being the localnet port for external-LSP traffic:

  * build_lswitch_dhcp_options_and_response() now always calls
    build_dhcpv4/v6_options_flows() with op (the LSP itself) as the
    inport, for external ports too, eliminating the previous
    per-localnet-port enumeration.
  * build_drop_arp_nd_flows_for_unbound_router_ports() now matches
    on op->json_key (the external LSP) instead of the localnet
    port.

tests/ovn-northd.at gains a unit test that exercises a regular VIF
backend on a tenant LS and a type=external backend on a provider LS
with a localnet port, asserting that the HM reply lflows keep their
original form and that the new inport-rewrite lflow is installed.

tests/ovn.at "external logical port" is updated to assert that the
DHCPv4/v6 controller OF flows installed for an external port carry
reg14 == external_lsp_key (the rewritten inport), not the localnet
port's key.

Signed-off-by: JayGue Lee <[email protected]>
---
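
To illustrate the failure mode and the fix described in the commit
message, here is a minimal, self-contained sketch. It is not OVN code:
toy_monitor, toy_packet, toy_find_monitor() and toy_rewrite_inport()
are hypothetical names. The only properties taken from the patch are
that the service-monitor lookup is keyed on (dp_key, port_key) and
that the rewrite fires on "localnet inport + eth.src of the external
LSP".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not OVN data structures. */
struct toy_monitor {
    uint32_t dp_key;    /* Datapath tunnel key. */
    uint32_t port_key;  /* Backend LSP tunnel key. */
};

struct toy_packet {
    uint32_t dp_key;
    uint32_t inport_key;  /* Models MFF_LOG_INPORT. */
    uint8_t eth_src[6];
};

/* Lookup keyed on (dp_key, port_key).  Returns NULL when no monitor is
 * registered for the given pair. */
static const struct toy_monitor *
toy_find_monitor(const struct toy_monitor *mons, size_t n,
                 uint32_t dp_key, uint32_t port_key)
{
    assert(mons != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (mons[i].dp_key == dp_key && mons[i].port_key == port_key) {
            return &mons[i];
        }
    }
    return NULL;
}

/* Models the inport rewrite: when the packet entered through the
 * localnet port and its source MAC belongs to the external LSP,
 * substitute the external LSP's key.  Returns true if rewritten. */
static bool
toy_rewrite_inport(struct toy_packet *pkt, uint32_t localnet_key,
                   uint32_t external_key, const uint8_t external_mac[6])
{
    if (pkt->inport_key == localnet_key
        && !memcmp(pkt->eth_src, external_mac, 6)) {
        pkt->inport_key = external_key;
        return true;
    }
    return false;
}
```

In this model, a reply that carries the localnet key misses the
lookup, and the same reply hits it once toy_rewrite_inport() has run;
packets with any other source MAC keep the localnet key.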
NEWS | 12 +++++
northd/northd.c | 96 +++++++++++++++++++++++++++++---------
tests/ovn-northd.at | 111 ++++++++++++++++++++++++++++++++++++++++++++
tests/ovn.at | 16 +++++--
4 files changed, 207 insertions(+), 28 deletions(-)
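
For readers who want to see the shape of the strings the new lflow
carries, the sketch below formats the match and action with plain
snprintf(). It is an illustration only: build_rewrite_flow() is a
hypothetical helper, the patch itself uses ds_put_format() on struct
ds, and the key arguments are assumed to be already JSON-quoted port
names (as the expected output in tests/ovn-northd.at suggests).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of northd.c.  Formats the match and
 * action of the inport-rewrite lflow into caller-supplied buffers.
 * 'localnet_key' and 'external_key' are assumed to be JSON-quoted port
 * names; 'eth_src' is a MAC in xx:xx:xx:xx:xx:xx form.
 * Returns false if the MAC has the wrong length or a buffer is too
 * small. */
static bool
build_rewrite_flow(const char *localnet_key, const char *external_key,
                   const char *eth_src,
                   char *match, size_t match_len,
                   char *actions, size_t actions_len)
{
    assert(localnet_key && external_key && eth_src && match && actions);

    if (strlen(eth_src) != 17) {
        return false;
    }

    int m = snprintf(match, match_len, "inport == %s && eth.src == %s",
                     localnet_key, eth_src);
    int a = snprintf(actions, actions_len,
                     "flags.localnet = 1; inport = %s; next;",
                     external_key);

    /* snprintf() returns the length it wanted to write; anything at or
     * beyond the buffer size means the output was truncated. */
    return m >= 0 && (size_t) m < match_len
           && a >= 0 && (size_t) a < actions_len;
}
```

Called with "\"prov-localnet\"", "\"bm-port\"" and 00:00:00:00:02:0a,
this produces the same match and action text that the new unit test
expects at ls_in_check_port_sec.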
diff --git a/NEWS b/NEWS
index 68cdbff..b5f514d 100644
--- a/NEWS
+++ b/NEWS
@@ -6,6 +6,18 @@ Post v26.03.0
* Add ECMP/multi-homing support for EVPN FDB entries. FDB entries
backed by a kernel nexthop group are load-balanced via OpenFlow
select groups with weighted buckets.
+ - Fixed Load_Balancer health check replies failing silently for
+ baremetal pool members whose backend LSP is type=external on a
+ Logical_Switch that has a localnet port. ovn-northd now installs
+ an early inport-rewrite lflow at ls_in_check_port_sec that
+ substitutes MFF_LOG_INPORT from the localnet port to the external
+ LSP when eth.src matches the external port's MAC, so every
+ downstream pipeline stage (including the per-backend HM reply
+ lflow and the generic per-LS svc_monitor_mac lflow) observes
+ inport == <external_lsp> and pinctrl_find_svc_monitor() succeeds.
+ The DHCP and unbound-router ARP/ND drop lflows for external
+ ports were updated to key on the external LSP's inport
+ accordingly.
OVN v26.03.0 - xxx xx xxxx
--------------------------
diff --git a/northd/northd.c b/northd/northd.c
index 8305e04..9d889ae 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -8876,7 +8876,7 @@ build_lb_health_check_response_lflows(
const struct ovn_datapaths *lr_datapaths,
const struct shash *meter_groups,
struct ds *match,
- struct ds *action)
+ struct ds *action OVS_UNUSED)
{
/* For each LB backend that is monitored by a source_ip belonging
* to a real LRP, install rule that punts service check replies to the
@@ -8923,7 +8923,6 @@ build_lb_health_check_response_lflows(
}
ds_clear(match);
- ds_clear(action);
/* icmp6 type 1 and icmp4 type 3 are included in the match, because
* the controller is using them to detect unreachable ports. */
@@ -10100,6 +10099,13 @@ build_drop_arp_nd_flows_for_unbound_router_ports(struct ovn_port *op,
{
struct ds match = DS_EMPTY_INITIALIZER;
+ /* With the early inport rewrite installed at
+ * S_SWITCH_IN_CHECK_PORT_SEC, packets from the external LSP arrive
+ * here with MFF_LOG_INPORT == op (the external LSP), not the
+ * localnet port (which was the value at table 0). The match is
+ * therefore keyed on op->json_key. The 'port' (localnet) argument
+ * is still used for incremental processing tagging through
+ * WITH_IO_PORT below. */
for (size_t i = 0; i < op->n_lsp_addrs; i++) {
struct ovn_port *rp;
VECTOR_FOR_EACH (&op->od->router_ports, rp) {
@@ -10110,7 +10116,7 @@ build_drop_arp_nd_flows_for_unbound_router_ports(struct ovn_port *op,
&match, "inport == %s && eth.src == %s"
" && !is_chassis_resident(%s)"
" && arp.tpa == %s && arp.op == 1",
- port->json_key,
+ op->json_key,
op->lsp_addrs[i].ea_s, op->json_key,
rp->lsp_addrs[k].ipv4_addrs[l].addr_s);
ovn_lflow_add(lflows, op->od, S_SWITCH_IN_EXTERNAL_PORT,
@@ -10126,7 +10132,7 @@ build_drop_arp_nd_flows_for_unbound_router_ports(struct ovn_port *op,
&match, "inport == %s && eth.src == %s"
" && !is_chassis_resident(%s)"
" && nd_ns && ip6.dst == {%s, %s} && nd.target == %s",
- port->json_key,
+ op->json_key,
op->lsp_addrs[i].ea_s, op->json_key,
rp->lsp_addrs[k].ipv6_addrs[l].addr_s,
rp->lsp_addrs[k].ipv6_addrs[l].sn_addr_s,
@@ -10144,7 +10150,7 @@ build_drop_arp_nd_flows_for_unbound_router_ports(struct ovn_port *op,
&match, "inport == %s && eth.src == %s"
" && eth.dst == %s"
" && !is_chassis_resident(%s)",
- port->json_key,
+ op->json_key,
op->lsp_addrs[i].ea_s, rp->lsp_addrs[k].ea_s,
op->json_key);
ovn_lflow_add(lflows, op->od, S_SWITCH_IN_EXTERNAL_PORT, 100,
@@ -10929,24 +10935,21 @@ build_lswitch_dhcp_options_and_response(struct ovn_port *op,
}
for (size_t i = 0; i < op->n_lsp_addrs; i++) {
- if (is_external) {
- struct ovn_port *lp;
- VECTOR_FOR_EACH (&op->od->localnet_ports, lp) {
- build_dhcpv4_options_flows(
- op, &op->lsp_addrs[i], lp, is_external,
- meter_groups, lflows, op->lflow_ref);
- build_dhcpv6_options_flows(
- op, &op->lsp_addrs[i], lp, is_external,
- meter_groups, lflows, op->lflow_ref);
- }
- } else {
- build_dhcpv4_options_flows(op, &op->lsp_addrs[i], op,
- is_external, meter_groups,
- lflows, op->lflow_ref);
- build_dhcpv6_options_flows(op, &op->lsp_addrs[i], op,
- is_external, meter_groups,
- lflows, op->lflow_ref);
- }
+ /* For both regular VIF and type=external LSPs we
+ * now pass the LSP itself (op) as the inport. For external
+ * ports, the inport rewrite added in
+ * build_lswitch_external_lsp_inport_rewrite() at
+ * S_SWITCH_IN_CHECK_PORT_SEC has already substituted
+ * MFF_LOG_INPORT from the localnet port to the external LSP by
+ * the time we reach S_SWITCH_IN_DHCP_OPTIONS. So a single set
+ * of dhcp lflows keyed on the external LSP is enough; we no
+ * longer need to enumerate every localnet port. */
+ build_dhcpv4_options_flows(op, &op->lsp_addrs[i], op,
+ is_external, meter_groups,
+ lflows, op->lflow_ref);
+ build_dhcpv6_options_flows(op, &op->lsp_addrs[i], op,
+ is_external, meter_groups,
+ lflows, op->lflow_ref);
}
}
@@ -11025,6 +11028,52 @@ build_lswitch_external_port(struct ovn_port *op,
}
}
+/* For each external LSP on a switch with a
+ * localnet port, rewrite MFF_LOG_INPORT from the localnet port to the
+ * external LSP when eth.src matches one of the external port's MACs.
+ * This makes downstream stages observe inport == <external_lsp> for
+ * traffic originating from that baremetal MAC. Intentionally placed
+ * at S_SWITCH_IN_CHECK_PORT_SEC priority 75 so it fires before the
+ * existing priority-70 generic port-sec rules but does not collide
+ * with the priority-100 disabled-port drop. */
+static void
+build_lswitch_external_lsp_inport_rewrite(struct ovn_port *op,
+ struct lflow_table *lflows,
+ struct ds *match,
+ struct ds *actions)
+{
+ ovs_assert(op->nbsp);
+ if (!lsp_is_external(op->nbsp)) {
+ return;
+ }
+ if (!ls_has_localnet_port(op->od)) {
+ return;
+ }
+ /* Also set flags.localnet here. The
+ * existing S_SWITCH_IN_LOOKUP_FDB lflow generated by
+ * build_lswitch_learn_fdb_op() sets flags.localnet = 1 only when
+ * inport == <localnet> at that table; once we have rewritten
+ * inport to the external LSP, that match no longer fires. Copy
+ * the assignment into our rewrite action so downstream stages
+ * keyed on flags.localnet == 1 continue to work for the external
+ * LSP case. */
+ struct ovn_port *lp;
+ VECTOR_FOR_EACH (&op->od->localnet_ports, lp) {
+ for (size_t i = 0; i < op->n_lsp_addrs; i++) {
+ ds_clear(match);
+ ds_clear(actions);
+ ds_put_format(match, "inport == %s && eth.src == %s",
+ lp->json_key, op->lsp_addrs[i].ea_s);
+ ds_put_format(actions,
+ "flags.localnet = 1; inport = %s; next;",
+ op->json_key);
+ ovn_lflow_add(lflows, op->od, S_SWITCH_IN_CHECK_PORT_SEC, 75,
+ ds_cstr(match), ds_cstr(actions),
+ op->lflow_ref);
+ }
+ }
+}
+
/* Ingress table 30: Destination lookup, broadcast and multicast handling
* (priority 70 - 100). */
static void
@@ -19583,6 +19632,7 @@ build_lswitch_and_lrouter_iterate_by_lsp(struct ovn_port *op,
meter_groups, actions, match);
build_lswitch_dhcp_options_and_response(op, lflows, meter_groups);
build_lswitch_external_port(op, lflows);
+ build_lswitch_external_lsp_inport_rewrite(op, lflows, match, actions);
build_lswitch_icmp_packet_toobig_admin_flows(op, lflows, match, actions);
build_lswitch_ip_unicast_lookup(op, lflows, actions,
match);
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 074b152..dee022c 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -1687,6 +1687,117 @@ OVN_CLEANUP_NORTHD
AT_CLEANUP
])
+OVN_FOR_EACH_NORTHD_NO_HV_PARALLELIZATION([
+AT_SETUP([Load balancer health check reply lflow for type=external backend on localnet LS])
+ovn_start
+
+# Topology:
+#
+# lr0 --(lr0-sw0)-- sw0 (regular tenant LS, 10.0.0.0/24)
+# `-- vm-port (type="", regular VIF backend)
+#
+# lr0 --(lr0-prov)-- prov (provider LS with localnet)
+# |-- prov-localnet (type=localnet)
+# `-- bm-port (type=external, baremetal pool member)
+#
+# A baremetal LB pool member's LSP is type=external; replies to HM probes
+# re-enter br-int via the localnet port, so MFF_LOG_INPORT carries the
+# localnet LSP's tunnel_key and the original
+# "inport == <bm-port> && ... ; handle_svc_check(inport);"
+# reply lflow never matches. pinctrl_find_svc_monitor() is keyed on
+# (dp_key, port_key) where port_key = backend LSP's tunnel_key, so
+# MFF_LOG_INPORT must hold that tunnel_key when the controller op fires.
+#
+# Approach: install an inport-rewrite lflow at
+# S_SWITCH_IN_CHECK_PORT_SEC priority 75 keyed on
+# (inport == <localnet> && eth.src == <bm_mac>) which assigns
+# flags.localnet = 1; inport = "<bm-port>"; next;
+# Once that fires, every downstream stage (including the original
+# per-backend handle_svc_check lflow at S_SWITCH_IN_L2_LKUP and the
+# generic per-LS svc_monitor_mac lflow) sees inport == <bm-port> and
+# works without further modification.
+
+check ovn-nbctl lr-add lr0
+check ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:01:01 10.0.0.1/24
+check ovn-nbctl lrp-add lr0 lr0-prov 00:00:00:00:02:01 10.0.50.1/24
+
+check ovn-nbctl ls-add sw0
+check ovn-nbctl --wait=sb lsp-add sw0 sw0-lr0 \
+ -- lsp-set-type sw0-lr0 router \
+ -- lsp-set-options sw0-lr0 router-port=lr0-sw0 \
+ -- lsp-set-addresses sw0-lr0 router
+check ovn-nbctl --wait=sb lsp-add sw0 vm-port \
+ -- lsp-set-addresses vm-port "00:00:00:00:01:02 10.0.0.10"
+
+check ovn-nbctl ls-add prov
+check ovn-nbctl --wait=sb lsp-add prov prov-lr0 \
+ -- lsp-set-type prov-lr0 router \
+ -- lsp-set-options prov-lr0 router-port=lr0-prov \
+ -- lsp-set-addresses prov-lr0 router
+check ovn-nbctl --wait=sb lsp-add prov prov-localnet \
+ -- lsp-set-type prov-localnet localnet \
+ -- lsp-set-options prov-localnet network_name=physnet1 \
+ -- lsp-set-addresses prov-localnet unknown
+check ovn-nbctl --wait=sb lsp-add prov bm-port \
+ -- lsp-set-type bm-port external \
+ -- lsp-set-addresses bm-port "00:00:00:00:02:0a 10.0.50.10"
+
+check ovn-sbctl chassis-add hv1 geneve 127.0.0.1
+check ovn-sbctl lsp-bind vm-port hv1
+check ovn-sbctl lsp-bind bm-port hv1
+
+# LB has both a regular-VIF backend on sw0 and a type=external backend on prov.
+check ovn-nbctl lb-add lb1 192.168.0.10:80 10.0.0.10:80,10.0.50.10:80 tcp
+check ovn-nbctl --wait=sb set load_balancer lb1 \
+ ip_port_mappings:10.0.0.10=vm-port:10.0.0.1
+check ovn-nbctl --wait=sb set load_balancer lb1 \
+ ip_port_mappings:10.0.50.10=bm-port:10.0.50.1
+
+check_uuid ovn-nbctl --wait=sb -- --id=@hc create Load_Balancer_Health_Check \
+ vip="192.168.0.10\:80" -- add Load_Balancer lb1 health_check @hc
+
+check ovn-nbctl lr-lb-add lr0 lb1
+check ovn-nbctl ls-lb-add sw0 lb1
+check ovn-nbctl ls-lb-add prov lb1
+check ovn-nbctl --wait=sb sync
+
+# Regular backend on sw0: original "inport == <vm-port>" / "handle_svc_check(inport);"
+# behavior unchanged.
+AT_CAPTURE_FILE([sw0_lflows])
+ovn-sbctl dump-flows sw0 | grep ls_in_l2_lkup | grep handle_svc_check \
+ > sw0_lflows
+AT_CHECK([cat sw0_lflows | ovn_strip_lflows], [0], [dnl
+ table=??(ls_in_l2_lkup ), priority=110 , match=(eth.dst == $svc_monitor_mac && (tcp || icmp || icmp6)), action=(handle_svc_check(inport);)
+ table=??(ls_in_l2_lkup ), priority=110 , match=(inport == "vm-port" && ip4.dst == 10.0.0.1 && ip4.src == 10.0.0.10 && eth.dst == 00:00:00:00:01:01 && tcp.src == 80), action=(handle_svc_check(inport);)
+])
+
+# type=external backend on prov (localnet LS): the per-backend reply
+# lflow keeps the ORIGINAL inport-based match because the
+# inport-rewrite at S_SWITCH_IN_CHECK_PORT_SEC has already substituted
+# MFF_LOG_INPORT to <bm-port> by the time the packet reaches L2_LKUP.
+AT_CAPTURE_FILE([prov_lflows])
+ovn-sbctl dump-flows prov | grep ls_in_l2_lkup | grep handle_svc_check \
+ > prov_lflows
+AT_CHECK([cat prov_lflows | ovn_strip_lflows], [0], [dnl
+ table=??(ls_in_l2_lkup ), priority=110 , match=(eth.dst == $svc_monitor_mac && (tcp || icmp || icmp6)), action=(handle_svc_check(inport);)
+ table=??(ls_in_l2_lkup ), priority=110 , match=(inport == "bm-port" && ip4.dst == 10.0.50.1 && ip4.src == 10.0.50.10 && eth.dst == 00:00:00:00:02:01 && tcp.src == 80), action=(handle_svc_check(inport);)
+])
+
+# Inport-rewrite lflow at ls_in_check_port_sec priority 75: turns
+# (inport == <prov-localnet> && eth.src == <bm_mac>) into
+# (flags.localnet = 1; inport = "<bm-port>"; next;)
+AT_CAPTURE_FILE([prov_rewrite_lflows])
+ovn-sbctl dump-flows prov | grep ls_in_check_port_sec \
+ | grep 'priority=75 ' | grep 'inport = ' \
+ > prov_rewrite_lflows
+AT_CHECK([cat prov_rewrite_lflows | ovn_strip_lflows], [0], [dnl
+ table=??(ls_in_check_port_sec), priority=75 , match=(inport == "prov-localnet" && eth.src == 00:00:00:00:02:0a), action=(flags.localnet = 1; inport = "bm-port"; next;)
+])
+
+OVN_CLEANUP_NORTHD
+AT_CLEANUP
+])
+
OVN_FOR_EACH_NORTHD_NO_HV([
AT_SETUP([Load balancer VIP in NAT entries])
AT_SKIP_IF([test $HAVE_PYTHON = no])
diff --git a/tests/ovn.at b/tests/ovn.at
index fbaa63d..74747f4 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -21474,6 +21474,11 @@ grep controller | grep tp_src=546 | grep \
check ovn-nbctl --wait=hv lsp-add-localnet-port ls1 ln-public phys
ln_public_key=$(fetch_column Port_Binding tunnel_key logical_port=ln-public)
+# DHCP lflows for external ports now match on the
+# external LSP's inport (after the inport-rewrite at table 0), so the
+# OF flow's reg14 value is the external LSP's tunnel_key, not the
+# localnet's.
+lp_ext1_key=$(fetch_column Port_Binding tunnel_key logical_port=ls1-lp_ext1)
# The ls1-lp_ext1 should be bound to hv1 as only hv1 is part of the
# ha chassis group.
@@ -21485,13 +21490,13 @@ wait_for_ports_up ls1-lp_ext1
(ovn-sbctl dump-flows lr0; ovn-sbctl dump-flows ls1) > sbflows
as hv1 ovs-ofctl dump-flows br-int > brintflows
AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | \
-grep controller | grep "0a.00.00.06" | grep reg14=0x$ln_public_key | \
+grep controller | grep "0a.00.00.06" | grep reg14=0x$lp_ext1_key | \
wc -l], [0], [1
])
AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | \
grep controller | grep tp_src=546 | grep \
"ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.06" | \
-grep reg14=0x$ln_public_key | wc -l], [0], [1
+grep reg14=0x$lp_ext1_key | wc -l], [0], [1
])
# There should be no DHCPv4/v6 flows for ls1-lp_ext1 on hv2
@@ -21736,14 +21741,15 @@ wait_row_count Port_Binding 1 logical_port=ls1-lp_ext1 chassis=$hv2_uuid
wait_for_ports_up ls1-lp_ext1
# There should be OF flows for DHCP4/v6 for the ls1-lp_ext1 port in hv2
+# reg14 carries the external LSP's tunnel_key.
AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | \
-grep controller | grep "0a.00.00.06" | grep reg14=0x$ln_public_key | \
+grep controller | grep "0a.00.00.06" | grep reg14=0x$lp_ext1_key | \
wc -l], [0], [1
])
AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | \
grep controller | grep tp_src=546 | grep \
"ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.06" | \
-grep reg14=0x$ln_public_key | wc -l], [0], [1
+grep reg14=0x$lp_ext1_key | wc -l], [0], [1
])
# There should be no DHCPv4/v6 flows for ls1-lp_ext1 on hv1
@@ -21753,7 +21759,7 @@ grep controller | grep "0a.00.00.06" | wc -l], [0], [0
AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | \
grep controller | grep tp_src=546 | grep \
"ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.06" | \
-grep reg14=0x$ln_public_key | wc -l], [0], [0
+grep reg14=0x$lp_ext1_key | wc -l], [0], [0
])
# Send DHCPDISCOVER again for hv1/ext1. The DHCP response should come from
--
2.49.0