On 8/15/24 05:02, [email protected] wrote:
> From: Numan Siddique <[email protected]>
>
> IPv6 ND Solicitation (NS) responder logical flows match on ip6.dst
> field. These flows when translated to datapath flows also match on
> ip6.dst, which means a separate datapath flow per destination IP
> address. This may cause significant performance issues in some
> setups (particularly ovs-dpdk telco deployments).
>
> This patch addresses this issue by matching on eth.mcast6 so that
> datapath flows for normal IPv6 traffic doesn't have to match on
> ip6.dst. IPv6 NS packets are generally multicast. A new logical
> match "nd_ns_mcast" is added for this purpose.
>
> After this patch, We no longer respond to IPv6 NS unicast packets.
> Let the target reply to it, so that the sender has the ability to
> monitor the targe liveness via the unicast ND solicitations.
> This behavior now matches the IPv4 ARP responder flows. Note that
> after the commit [1] which was recently added we now only respond
> to IPv4 ARP broadcast packets.
>
> A recent patch [2] from Ilya partially addressed the same datapath
> flow explosion issue by matching on eth.mcast6 for MLD packets.
> With this patch, we now completely address the datapath flow
> explosion issue for IPv6 traffic provided all the logical ports
> of a logical switch are not configured with port security.
>
> [1] - c48ed1736a58 ("Do not reply on unicast arps for IPv4 targets.")
> [2] - 43c34f2e6676 ("logical-fields: Add missing multicast matches for MLD
> and IGMP.")
>
> Note: Documentation for 'eth.mcastv6' and 'ip6.mcast' predicates were
> missing from ovn-sb.xml and this patch adds it.
>
> Reported-at: https://issues.redhat.com/browse/FDP-728
> Reported-by: Mike Pattrick <[email protected]>
> Signed-off-by: Numan Siddique <[email protected]>
> ---
>
> v1 -> v2
> -------
> - Addressed review comments from Ilya.
>
Hi Numan,
For proxy-arp we still match on unicast ND as far as I can tell. I'm
not sure if we can remove that part though. Maybe it's something to
consider as a follow up?
build_lswitch_arp_nd_responder_known_ips()
[...]
/* Add IPv6 NDP responses.
* For ND solicitations, we need to listen for both the
* unicast IPv6 address and its all-nodes multicast address,
* but always respond with the unicast IPv6 address. */
if (op->proxy_arp_addrs.n_ipv6_addrs) {
struct ds ip6_dst_match = DS_EMPTY_INITIALIZER;
struct ds nd_target_match = DS_EMPTY_INITIALIZER;
for (size_t j = 0; j < op->proxy_arp_addrs.n_ipv6_addrs; j++) {
ds_put_format(&ip6_dst_match, "%s/%u, %s/%u, ",
op->proxy_arp_addrs.ipv6_addrs[j].addr_s,
op->proxy_arp_addrs.ipv6_addrs[j].plen,
op->proxy_arp_addrs.ipv6_addrs[j].sn_addr_s,
op->proxy_arp_addrs.ipv6_addrs[j].plen);
ds_put_format(&nd_target_match, "%s/%u, ",
op->proxy_arp_addrs.ipv6_addrs[j].addr_s,
op->proxy_arp_addrs.ipv6_addrs[j].plen);
}
Aside from that I have a few small comments on the test. In general
the change makes sense to me.
Thanks,
Dumitru
> lib/logical-fields.c | 3 ++
> northd/northd.c | 21 +++++---
> ovn-sb.xml | 3 ++
> tests/ovn-northd.at | 4 +-
> tests/ovn.at | 4 +-
> tests/system-ovn.at | 123 +++++++++++++++++++++++++++++++++++++++++++
> 6 files changed, 146 insertions(+), 12 deletions(-)
>
> diff --git a/lib/logical-fields.c b/lib/logical-fields.c
> index 2c9d3c61bf..5a8b53f2b6 100644
> --- a/lib/logical-fields.c
> +++ b/lib/logical-fields.c
> @@ -293,6 +293,9 @@ ovn_init_symtab(struct shash *symtab)
> "icmp6.type == {135, 136} && icmp6.code == 0 && ip.ttl ==
> 255");
> expr_symtab_add_predicate(symtab, "nd_ns",
> "icmp6.type == 135 && icmp6.code == 0 && ip.ttl == 255");
> + expr_symtab_add_predicate(symtab, "nd_ns_mcast",
> + "ip6.mcast && icmp6.type == 135 && icmp6.code == 0 && "
> + "ip.ttl == 255");
> expr_symtab_add_predicate(symtab, "nd_na",
> "icmp6.type == 136 && icmp6.code == 0 && ip.ttl == 255");
> expr_symtab_add_predicate(symtab, "nd_rs",
> diff --git a/northd/northd.c b/northd/northd.c
> index c4824aae3b..c9ae1e958f 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -9633,16 +9633,21 @@ build_lswitch_arp_nd_responder_known_ips(struct
> ovn_port *op,
> op->lflow_ref);
> }
>
> - /* For ND solicitations, we need to listen for both the
> - * unicast IPv6 address and its all-nodes multicast address,
> - * but always respond with the unicast IPv6 address. */
> + /* For ND solicitations:
> + * - Reply only for the all-nodes multicast address(es) of the
> + * logical port IPv6 address(es).
> + *
> + * - Do not reply for unicast ND solicitations. Let the target
> + * reply it, so that the sender has the ability to monitor
> + * the targe liveness via the unicast ND solicitations.
> + */
> for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs; j++) {
> ds_clear(match);
> - ds_put_format(match,
> - "nd_ns && ip6.dst == {%s, %s} && nd.target == %s",
> - op->lsp_addrs[i].ipv6_addrs[j].addr_s,
> - op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s,
> - op->lsp_addrs[i].ipv6_addrs[j].addr_s);
> + ds_put_format(
> + match,
> + "nd_ns_mcast && ip6.dst == %s && nd.target == %s",
> + op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s,
> + op->lsp_addrs[i].ipv6_addrs[j].addr_s);
>
> ds_clear(actions);
> ds_put_format(actions,
> diff --git a/ovn-sb.xml b/ovn-sb.xml
> index de0bd636fd..c11296d7c4 100644
> --- a/ovn-sb.xml
> +++ b/ovn-sb.xml
> @@ -1146,6 +1146,7 @@
> <ul>
> <li><code>eth.bcast</code> expands to <code>eth.dst ==
> ff:ff:ff:ff:ff:ff</code></li>
> <li><code>eth.mcast</code> expands to <code>eth.dst[40]</code></li>
> + <li><code>eth.mcastv6</code> expands to <code>eth.dst[32..47] ==
> 0x3333</code></li>
> <li><code>vlan.present</code> expands to
> <code>vlan.tci[12]</code></li>
> <li><code>ip4</code> expands to <code>eth.type == 0x800</code></li>
> <li><code>ip4.src_mcast</code> expands to
> @@ -1161,8 +1162,10 @@
> <li><code>ip.first_frag</code> expands to <code>ip.is_frag
> && !ip.later_frag</code></li>
> <li><code>arp</code> expands to <code>eth.type == 0x806</code></li>
> <li><code>rarp</code> expands to <code>eth.type == 0x8035</code></li>
> + <li><code>ip6.mcast</code> expands to <code>eth.mcastv6 &&
> ip6.dst[120..127] == 0xff</code></li>
> <li><code>nd</code> expands to <code>icmp6.type == {135, 136}
> && icmp6.code == 0 && ip.ttl == 255</code></li>
> <li><code>nd_ns</code> expands to <code>icmp6.type == 135 &&
> icmp6.code == 0 && ip.ttl == 255</code></li>
> + <li><code>nd_ns_mcast</code> expands to <code>ip6.mcast &&
> icmp6.type == 135 && icmp6.code == 0 && ip.ttl ==
> 255</code></li>
> <li><code>nd_na</code> expands to <code>icmp6.type == 136 &&
> icmp6.code == 0 && ip.ttl == 255</code></li>
> <li><code>nd_rs</code> expands to <code>icmp6.type == 133 &&
> icmp6.code == 0 && ip.ttl == 255</code></li>
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index 3c49c0d43b..93ccbce6b0 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -9503,9 +9503,9 @@ AT_CAPTURE_FILE([S1flows])
> AT_CHECK([grep -e "ls_in_arp_rsp" S1flows | ovn_strip_lflows], [0], [dnl
> table=??(ls_in_arp_rsp ), priority=0 , match=(1), action=(next;)
> table=??(ls_in_arp_rsp ), priority=100 , match=(arp.tpa ==
> 192.168.0.10 && arp.op == 1 && eth.dst == ff:ff:ff:ff:ff:ff && inport ==
> "S1-vm1"), action=(next;)
> - table=??(ls_in_arp_rsp ), priority=100 , match=(nd_ns && ip6.dst ==
> {fd00::10, ff02::1:ff00:10} && nd.target == fd00::10 && inport == "S1-vm1"),
> action=(next;)
> + table=??(ls_in_arp_rsp ), priority=100 , match=(nd_ns_mcast &&
> ip6.dst == ff02::1:ff00:10 && nd.target == fd00::10 && inport == "S1-vm1"),
> action=(next;)
> table=??(ls_in_arp_rsp ), priority=50 , match=(arp.tpa ==
> 192.168.0.10 && arp.op == 1 && eth.dst == ff:ff:ff:ff:ff:ff), action=(eth.dst
> = eth.src; eth.src = 50:54:00:00:00:10; arp.op = 2; /* ARP reply */ arp.tha =
> arp.sha; arp.sha = 50:54:00:00:00:10; arp.tpa = arp.spa; arp.spa =
> 192.168.0.10; outport = inport; flags.loopback = 1; output;)
> - table=??(ls_in_arp_rsp ), priority=50 , match=(nd_ns && ip6.dst ==
> {fd00::10, ff02::1:ff00:10} && nd.target == fd00::10), action=(nd_na {
> eth.src = 50:54:00:00:00:10; ip6.src = fd00::10; nd.target = fd00::10; nd.tll
> = 50:54:00:00:00:10; outport = inport; flags.loopback = 1; output; };)
> + table=??(ls_in_arp_rsp ), priority=50 , match=(nd_ns_mcast &&
> ip6.dst == ff02::1:ff00:10 && nd.target == fd00::10), action=(nd_na { eth.src
> = 50:54:00:00:00:10; ip6.src = fd00::10; nd.target = fd00::10; nd.tll =
> 50:54:00:00:00:10; outport = inport; flags.loopback = 1; output; };)
> ])
>
> #Set the disable_arp_nd_rsp option and verify the flow
> diff --git a/tests/ovn.at b/tests/ovn.at
> index a1d689e849..7583da7e07 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -8617,7 +8617,7 @@ done
> # Complete Neighbor Solicitation packet and Neighbor Advertisement packet
> # vif1 -> NS -> vif2. vif1 <- NA <- ovn-controller.
> # vif2 will not receive NS packet, since ovn-controller will reply for it.
> -ns_packet=3333ffa1f9aefa163e94059886dd6000000000203afffd81ce49a9480000f8163efffe940598fd81ce49a9480000f8163efffea1f9ae8700e01160000000fd81ce49a9480000f8163efffea1f9ae0101fa163e940598
> +ns_packet=3333ffa1f9aefa163e94059886dd6000000000203afffd81ce49a9480000f8163efffe940598ff0200000000000000000001ffa1f9ae8700e01160000000fd81ce49a9480000f8163efffea1f9ae0101fa163e940598
>
> na_packet=fa163e940598fa163ea1f9ae86dd6000000000203afffd81ce49a9480000f8163efffea1f9aefd81ce49a9480000f8163efffe9405988800e9ed60000000fd81ce49a9480000f8163efffea1f9ae0201fa163ea1f9ae
>
> as hv1 ovs-appctl netdev-dummy/receive vif1 $ns_packet
> @@ -14417,7 +14417,7 @@ test_ns_na() {
> local inport=$1 src_mac=$2 dst_mac=$3 src_ip=$4 dst_ip=$5
>
> packet=$(fmt_pkt "
> - Ether(dst='ff:ff:ff:ff:ff:ff', src='${src_mac}') /
> + Ether(dst='33:33:ff:ff:ff:ff', src='${src_mac}') /
> IPv6(src='${src_ip}', dst='ff02::1:ff00:2') /
> ICMPv6ND_NS(tgt='${dst_ip}')
> ")
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index 6e4ec42473..e43244ca60 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -13899,3 +13899,126 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
> /.*terminating with signal 15.*/d"])
> AT_CLEANUP
> ])
> +
> +dnl This test sends IPv6 TCP packets between two lports of the same
> +dnl logical switch and checks that the datapath flows don't match on
> +dnl IPv6 source and destination addresses.
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([ovn -- IPv6 switching - datapath flows])
> +AT_KEYWORDS([IPv6])
> +
> +ovn_start
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-int])
> +
> +dnl Set external-ids in br-int needed for ovn-controller
> +check ovs-vsctl \
> + -- set Open_vSwitch . external-ids:system-id=hv1 \
> + -- set Open_vSwitch .
> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
> + -- set bridge br-int fail-mode=secure
> other-config:disable-in-band=true
> +
> +dnl Start ovn-controller
> +start_daemon ovn-controller
> +
> +check ovn-nbctl ls-add sw0
> +
> +check ovn-nbctl lsp-add sw0 vm0
> +check ovn-nbctl lsp-set-addresses vm0 "f0:00:0f:01:02:03 10.0.0.3 1000::3"
> +
> +check ovn-nbctl lsp-add sw0 vm1
> +check ovn-nbctl lsp-set-addresses vm1 "f0:00:0f:01:02:04 10.0.0.4 1000::4"
> +
> +check ovn-nbctl --wait=hv sync
> +
> +ADD_NAMESPACES(vm0)
> +ADD_VETH(vm0, vm0, br-int, "1000::3/64", "f0:00:0f:01:02:03", "1000::1",
> "nodad")
> +
> +ADD_NAMESPACES(vm1)
> +ADD_VETH(vm1, vm1, br-int, "1000::4/64", "f0:00:0f:01:02:04", "1000::1",
> "nodad")
> +
It's probably best to ensure OVN bound the ports here, e.g.:
dnl Wait for ovn-controller to catch up.
wait_for_ports_up
check ovn-nbctl --wait=hv sync
The sync before ADD_NAMESPACES can probably be removed then.
> +dnl Checks the datapath flows and makes sure that there are no matches to
> +dnl IPv6 src or dst fields.
> +check_dp_flows() {
> + ipv6_src_dst_match=$1
> + dnl Dump the flows. Will be helpful while debugging.
> + ovs-appctl dpctl/dump-flows
It might be better to change this to something like:
ovs-appctl dpctl/dump-flows > dp-flows
AT_CAPTURE_FILE([dp-flows])
> +
> + eth_match1="eth(src=f0:00:0f:01:02:03,dst=f0:00:0f:01:02:04)"
> + eth_match2="eth(src=f0:00:0f:01:02:04,dst=f0:00:0f:01:02:03)"
> + ipv6_src_match="ipv6(src="
> + ipv6_dst_match="ipv6(dst="
> +
> + AT_CHECK([ovs-appctl dpctl/dump-flows | grep "$eth_match1" -c], [0], [dnl
> +2
> +])
> +
> + AT_CHECK([ovs-appctl dpctl/dump-flows | grep "$eth_match2" -c], [0], [dnl
> +2
> +])
> +
> + if [[ "$ipv6_src_dst_match" == "yes" ]]; then
> + dnl Check that the dp flow for
> eth(src=f0:00:0f:01:02:03,dst=f0:00:0f:01:02:04)
> + dnl matches on IPv6 src or dst.
> + AT_CHECK([ovs-appctl dpctl/dump-flows | grep "$eth_match1" | grep -e
> "$ipv6_src_match" -e "$ipv6_dst_match" -c], [0], [dnl
> +2
> +])
> +
> + dnl Check that the dp flow for
> eth(src=f0:00:0f:01:02:04,dst=f0:00:0f:01:02:03)
> + dnl matches on IPv6 src or dst.
> + AT_CHECK([ovs-appctl dpctl/dump-flows | grep "$eth_match2" | grep -e
> "$ipv6_src_match" -e "$ipv6_dst_match" -c], [0], [dnl
> +2
> +])
> + else
> + dnl Check that the dp flow for
> eth(src=f0:00:0f:01:02:03,dst=f0:00:0f:01:02:04)
> + dnl doesn't match on IPv6 src or dst.
> + AT_CHECK([ovs-appctl dpctl/dump-flows | grep "$eth_match1" | grep
> "$ipv6_src_match"], [1], [])
> + AT_CHECK([ovs-appctl dpctl/dump-flows | grep "$eth_match1" | grep
> "$ipv6_dst_match"], [1], [])
> +
> + dnl Check that the dp flow for
> eth(src=f0:00:0f:01:02:04,dst=f0:00:0f:01:02:03)
> + AT_CHECK([ovs-appctl dpctl/dump-flows | grep "$eth_match2" | grep
> "$ipv6_src_match"], [1], [])
> + AT_CHECK([ovs-appctl dpctl/dump-flows | grep "$eth_match2" | grep
> "$ipv6_dst_match"], [1], [])
> + fi
> +}
> +
> +NS_CHECK_EXEC([vm0], [ping6 -q -c 3 -i 0.3 -w 2 1000::4 | FORMAT_PING], \
> +[0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +# Start IPv6 TCP server on vm0.
> +NETNS_DAEMONIZE([vm0], [nc -6 -l -k 1000::3 4242], [nc-vm1-1.pid])
> +NS_CHECK_EXEC([vm1], [nc -6 1000::3 4242 -z], [0], [ignore], [ignore])
> +
> +check_dp_flows no
> +
> +# Now configure port security on vm0. dp flows should now match on IPv6
> fields.
> +check ovn-nbctl lsp-set-port-security vm0 "f0:00:0f:01:02:03 10.0.0.3
> 1000::3"
> +check ovn-nbctl --wait=hv sync
> +
> +NS_CHECK_EXEC([vm0], [ping6 -q -c 3 -i 0.3 -w 2 1000::4 | FORMAT_PING], \
> +[0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +NS_CHECK_EXEC([vm1], [nc -6 1000::3 4242 -z], [0], [ignore], [ignore])
> +check_dp_flows yes
> +
The test failed here with the userspace datapath in the ovsrobot run:
https://github.com/ovsrobot/ovn/actions/runs/10398228623/job/28795300825
Is it a matter of waiting for output instead of checking for datapath flows
immediately? Alternatively do we need to wait for the revalidator to run
(ovs-appctl revalidator/wait)?
> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> +
> +as ovn-sb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as ovn-nb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as northd
> +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> +
> +as
> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
> +/failed to query port patch-.*/d
> +/.*terminating with signal 15.*/d"])
> +AT_CLEANUP
> +])
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev