On 3/28/23 08:42, Nobuhiro MIKI wrote:
> SRv6 (Segment Routing IPv6) tunnel vport is responsible
> for encapsulating and decapsulating inner packets with an
> IPv6 header and an extension header called SRH
> (Segment Routing Header). See the spec at:
>
> https://datatracker.ietf.org/doc/html/rfc8754
>
> This patch implements SRv6 tunneling in userspace datapath.
> It uses the `remote_ip` and `local_ip` options as with existing
> tunnel protocols. It also adds a dedicated `srv6_segs` option
> to define a sequence of routers, called a segment list.
>
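Not a blocker, but for anyone trying this out from the commit message
alone: IIUC the segment list is configured like any other tunnel
option, along these lines (addresses made up, the exact syntax is in
the FAQ addition in this patch):

    $ ovs-vsctl add-port br0 srv6_0 -- set interface srv6_0 type=srv6 \
          options:remote_ip=fc00:100::1 \
          options:srv6_segs="fc00:100::1,fc00:200::1,fc00:300::1"

(Note that the first segment has to match remote_ip, per the check in
netdev_srv6_build_header() below.)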
> Signed-off-by: Nobuhiro MIKI <[email protected]>
> ---
>  Documentation/faq/configuration.rst |  21 +++++
>  Documentation/faq/releases.rst      |   1 +
>  NEWS                                |   2 +
>  include/linux/openvswitch.h         |   1 +
>  include/sparse/netinet/in.h         |   1 +
>  lib/dpif-netlink-rtnl.c             |   5 ++
>  lib/dpif-netlink.c                  |   5 ++
>  lib/netdev-native-tnl.c             | 127 ++++++++++++++++++++++++++++
>  lib/netdev-native-tnl.h             |  10 +++
>  lib/netdev-vport.c                  |  53 ++++++++++++
>  lib/netdev.h                        |   4 +
>  lib/packets.h                       |  11 +++
>  lib/tnl-ports.c                     |   5 +-
>  ofproto/ofproto-dpif-xlate.c        |   3 +
>  tests/system-kmod-macros.at         |   8 ++
>  tests/system-traffic.at             | 119 ++++++++++++++++++++++++++
>  tests/system-userspace-macros.at    |   6 ++
>  tests/tunnel.at                     |  56 ++++++++++++
>  18 files changed, 437 insertions(+), 1 deletion(-)
>
<snip>
> +int
> +netdev_srv6_build_header(const struct netdev *netdev,
> +                         struct ovs_action_push_tnl *data,
> +                         const struct netdev_tnl_build_header_params *params)
> +{
> +    struct netdev_vport *dev = netdev_vport_cast(netdev);
> +    struct netdev_tunnel_config *tnl_cfg;
> +    const struct in6_addr *segs;
> +    struct srv6_base_hdr *srh;
> +    struct in6_addr *s;
> +    ovs_be16 dl_type;
> +    int err = 0;
> +    int nr_segs;
> +    int i;
> +
> +    ovs_mutex_lock(&dev->mutex);
> +    tnl_cfg = &dev->tnl_cfg;
> +
> +    if (tnl_cfg->srv6_num_segs) {
> +        nr_segs = tnl_cfg->srv6_num_segs;
> +        segs = tnl_cfg->srv6_segs;
> +    } else {
> +        /*
> +         * If the explicit segment list setting is omitted, the tunnel
> +         * destination is considered to be the first segment.
> +         */
> +        nr_segs = 1;
> +        segs = &params->flow->tunnel.ipv6_dst;
> +    }
> +
> +    if (!ipv6_addr_equals(&segs[0], &params->flow->tunnel.ipv6_dst)) {
> +        err = EINVAL;
> +        goto out;
> +    }
> +
> +    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING);
> +    srh->rt_hdr.segments_left = nr_segs - 1;
> +    srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
> +    srh->rt_hdr.hdrlen = 2 * nr_segs;
> +    srh->last_entry = nr_segs - 1;
> +    srh->flags = 0;
> +    srh->tag = 0;
> +
> +    dl_type = params->flow->dl_type;
> +    if (dl_type == htons(ETH_TYPE_IP)) {
> +        srh->rt_hdr.nexthdr = IPPROTO_IPIP;
> +    } else if (dl_type == htons(ETH_TYPE_IPV6)) {
> +        srh->rt_hdr.nexthdr = IPPROTO_IPV6;
> +    }
We should probably error out here if, for some reason, it's neither
IPv4 nor IPv6.
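E.g. something along these lines (untested), reusing the existing
'err'/'out' cleanup path:

    } else {
        err = EOPNOTSUPP;
        goto out;
    }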
> +
> +    s = ALIGNED_CAST(struct in6_addr *,
> +                     (char *) srh + sizeof *srh);
> +    for (i = 0; i < nr_segs; i++) {
> +        /* Segment list is written to the header in reverse order. */
> +        memcpy(s, &segs[nr_segs - i - 1], sizeof *s);
> +        s++;
> +    }
> +
> +    data->header_len += sizeof *srh + 8 * srh->rt_hdr.hdrlen;
> +    data->tnl_type = OVS_VPORT_TYPE_SRV6;
> +out:
> +    ovs_mutex_unlock(&dev->mutex);
> +
> +    return err;
> +}
> +
> +void
> +netdev_srv6_push_header(const struct netdev *netdev OVS_UNUSED,
> +                        struct dp_packet *packet OVS_UNUSED,
> +                        const struct ovs_action_push_tnl *data OVS_UNUSED)
'packet' and 'data' are actually used by the function, so the
OVS_UNUSED markers on them should be dropped.
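I.e., presumably just:

    netdev_srv6_push_header(const struct netdev *netdev OVS_UNUSED,
                            struct dp_packet *packet,
                            const struct ovs_action_push_tnl *data)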
> +{
> +    int ip_tot_size;
> +
> +    netdev_tnl_push_ip_header(packet, data->header,
> +                              data->header_len, &ip_tot_size);
> +}
> +
<snip>
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index a9cf3cbee0be..15c814d6285b 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3632,6 +3632,9 @@ propagate_tunnel_data_to_flow(struct xlate_ctx *ctx, struct eth_addr dmac,
>      case OVS_VPORT_TYPE_BAREUDP:
>          nw_proto = IPPROTO_UDP;
>          break;
> +    case OVS_VPORT_TYPE_SRV6:
> +        nw_proto = IPPROTO_IPIP;
Sorry, just noticed that we're always setting nw_proto to
IPPROTO_IPIP after encapsulation, even if the inner packet was IPv6.
We should check the dl_type of the original flow and set
nw_proto accordingly.
You can check that the ping6 test fails if we request to
drop encapsulated IPv4 packets like this:
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 65bae736f..33d60244e 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1241,7 +1241,8 @@ dnl using veth pair. Kernel side tunnel endpoint (SID) is
 dnl 'fc00:a::1/128', so add it to the route.
 ADD_BR([br-underlay])
 ADD_VETH(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
-AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "priority=1,actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "priority=100,ipv6,nw_proto=4,actions=drop"])
 AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
 AT_CHECK([ip link set dev br-underlay up])
 AT_CHECK([ip -6 route add fc00:a::1/128 dev br-underlay via fc00::1])
---
Even though the packet itself has nw_proto=41, the flow structure
that tracks it has nw_proto=4, and that value is what gets used for
flow matching after encapsulation.
A simple ternary operator solves the issue for me:
     case OVS_VPORT_TYPE_SRV6:
-        nw_proto = IPPROTO_IPIP;
+        nw_proto = (flow->dl_type == htons(ETH_TYPE_IP))
+                       ? IPPROTO_IPIP : IPPROTO_IPV6;
         break;
It might be good to have this case covered in tests as well,
either by something similar to what I did above or by a
separate test.
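If it goes the separate-test route and the IPv4 and IPv6 pings end up
in separate tests, each could install the matching guard up front,
e.g. (untested sketch, same rule syntax as in my diff above):

    dnl IPv4-in-SRv6 test: the inner IPv4 flow must not be classified as proto 41.
    AT_CHECK([ovs-ofctl add-flow br-underlay "priority=100,ipv6,nw_proto=41,actions=drop"])

    dnl IPv6-in-SRv6 test: the inner IPv6 flow must not be classified as proto 4.
    AT_CHECK([ovs-ofctl add-flow br-underlay "priority=100,ipv6,nw_proto=4,actions=drop"])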
What do you think?
<snip>
> +dnl Set up tunnel endpoints on OVS outside the namespace.
> +ADD_OVS_TUNNEL6([srv6], [br0], [at_srv6], [fc00:a::1], [fc00:100::100/64])
> +AT_CHECK([ovs-vsctl set bridge br0 other_config:hwaddr=aa:55:aa:55:00:00])
> +dnl [sleep infinity]
This line returned. :)
The rest of the set looks fine to me. Thanks!
Best regards,
Ilya Maximets.