On 3/28/23 08:42, Nobuhiro MIKI wrote:
> SRv6 (Segment Routing IPv6) tunnel vport is responsible
> for encapsulation and decapsulation the inner packets with
> IPv6 header and an extended header called SRH
> (Segment Routing Header). See spec in:
> 
> https://datatracker.ietf.org/doc/html/rfc8754
> 
> This patch implements SRv6 tunneling in userspace datapath.
> It uses `remote_ip` and `local_ip` options as with existing
> tunnel protocols. It also adds a dedicated `srv6_segs` option
> to define a sequence of routers called segment list.
> 
> Signed-off-by: Nobuhiro MIKI <[email protected]>
> ---
>  Documentation/faq/configuration.rst |  21 +++++
>  Documentation/faq/releases.rst      |   1 +
>  NEWS                                |   2 +
>  include/linux/openvswitch.h         |   1 +
>  include/sparse/netinet/in.h         |   1 +
>  lib/dpif-netlink-rtnl.c             |   5 ++
>  lib/dpif-netlink.c                  |   5 ++
>  lib/netdev-native-tnl.c             | 127 ++++++++++++++++++++++++++++
>  lib/netdev-native-tnl.h             |  10 +++
>  lib/netdev-vport.c                  |  53 ++++++++++++
>  lib/netdev.h                        |   4 +
>  lib/packets.h                       |  11 +++
>  lib/tnl-ports.c                     |   5 +-
>  ofproto/ofproto-dpif-xlate.c        |   3 +
>  tests/system-kmod-macros.at         |   8 ++
>  tests/system-traffic.at             | 119 ++++++++++++++++++++++++++
>  tests/system-userspace-macros.at    |   6 ++
>  tests/tunnel.at                     |  56 ++++++++++++
>  18 files changed, 437 insertions(+), 1 deletion(-)
> 

<snip>

> +int
> +netdev_srv6_build_header(const struct netdev *netdev,
> +                         struct ovs_action_push_tnl *data,
> +                         const struct netdev_tnl_build_header_params *params)
> +{
> +    struct netdev_vport *dev = netdev_vport_cast(netdev);
> +    struct netdev_tunnel_config *tnl_cfg;
> +    const struct in6_addr *segs;
> +    struct srv6_base_hdr *srh;
> +    struct in6_addr *s;
> +    ovs_be16 dl_type;
> +    int err = 0;
> +    int nr_segs;
> +    int i;
> +
> +    ovs_mutex_lock(&dev->mutex);
> +    tnl_cfg = &dev->tnl_cfg;
> +
> +    if (tnl_cfg->srv6_num_segs) {
> +        nr_segs = tnl_cfg->srv6_num_segs;
> +        segs = tnl_cfg->srv6_segs;
> +    } else {
> +        /*
> +         * If explicit segment list setting is omitted, tunnel destination
> +         * is considered to be the first segment list.
> +         */
> +        nr_segs = 1;
> +        segs = &params->flow->tunnel.ipv6_dst;
> +    }
> +
> +    if (!ipv6_addr_equals(&segs[0], &params->flow->tunnel.ipv6_dst)) {
> +        err = EINVAL;
> +        goto out;
> +    }
> +
> +    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING);
> +    srh->rt_hdr.segments_left = nr_segs - 1;
> +    srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
> +    srh->rt_hdr.hdrlen = 2 * nr_segs;
> +    srh->last_entry = nr_segs - 1;
> +    srh->flags = 0;
> +    srh->tag = 0;
> +
> +    dl_type = params->flow->dl_type;
> +    if (dl_type == htons(ETH_TYPE_IP)) {
> +        srh->rt_hdr.nexthdr = IPPROTO_IPIP;
> +    } else if (dl_type == htons(ETH_TYPE_IPV6)) {
> +        srh->rt_hdr.nexthdr = IPPROTO_IPV6;
> +    }

We should probably error out here if for some reason it's not an IP or IPv6.

> +
> +    s = ALIGNED_CAST(struct in6_addr *,
> +                     (char *) srh + sizeof *srh);
> +    for (i = 0; i < nr_segs; i++) {
> +        /* Segment list is written to the header in reverse order. */
> +        memcpy(s, &segs[nr_segs - i - 1], sizeof *s);
> +        s++;
> +    }
> +
> +    data->header_len += sizeof *srh + 8 * srh->rt_hdr.hdrlen;
> +    data->tnl_type = OVS_VPORT_TYPE_SRV6;
> +out:
> +    ovs_mutex_unlock(&dev->mutex);
> +
> +    return err;
> +}
> +
> +void
> +netdev_srv6_push_header(const struct netdev *netdev OVS_UNUSED,
> +                        struct dp_packet *packet OVS_UNUSED,
> +                        const struct ovs_action_push_tnl *data OVS_UNUSED)

'packet' and 'data' are actually used by the function.

> +{
> +    int ip_tot_size;
> +
> +    netdev_tnl_push_ip_header(packet, data->header,
> +                              data->header_len, &ip_tot_size);
> +}
> +

<snip>

> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index a9cf3cbee0be..15c814d6285b 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3632,6 +3632,9 @@ propagate_tunnel_data_to_flow(struct xlate_ctx *ctx, 
> struct eth_addr dmac,
>      case OVS_VPORT_TYPE_BAREUDP:
>          nw_proto = IPPROTO_UDP;
>          break;
> +    case OVS_VPORT_TYPE_SRV6:
> +        nw_proto = IPPROTO_IPIP;

Sorry, just noticed that we're setting nw_proto to IPIP
always after encapsulation, even if the packet was IPv6.
We should check the dl_type of the original flow and set
the nw_proto appropriately.

You can check that the ping6 test fails if we request to
drop encapsulated IPv4 packets like this:

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 65bae736f..33d60244e 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1241,7 +1241,8 @@ dnl using veth pair. Kernel side tunnel endpoint (SID) is
 dnl 'fc00:a::1/128', so add it to the route.
 ADD_BR([br-underlay])
 ADD_VETH(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
-AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "priority=1,actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay 
"priority=100,ipv6,nw_proto=4,actions=drop"])
 AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
 AT_CHECK([ip link set dev br-underlay up])
 AT_CHECK([ip -6 route add fc00:a::1/128 dev br-underlay via fc00::1])
---

Even though the packet itself has nw_proto=41, the flow structure
that tracks it has nw_proto=4, and it is used for flow matching
after encapsulation.

A simple ternary operator solves the issue for me:

     case OVS_VPORT_TYPE_SRV6:
-        nw_proto = IPPROTO_IPIP;
+        nw_proto = (flow->dl_type == htons(ETH_TYPE_IP))
+                   ? IPPROTO_IPIP : IPPROTO_IPV6;
         break;

Might be good to have this case covered in tests as well.
Either by something similar to what I did above, or by a
separate test.

What do you think?

<snip>

> +dnl Set up tunnel endpoints on OVS outside the namespace.
> +ADD_OVS_TUNNEL6([srv6], [br0], [at_srv6], [fc00:a::1], [fc00:100::100/64])
> +AT_CHECK([ovs-vsctl set bridge br0 other_config:hwaddr=aa:55:aa:55:00:00])
> +dnl [sleep infinity]

This line returned. :)

The rest of the set looks fine to me.  Thanks!

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to