On 2023/03/28 21:31, Ilya Maximets wrote:
> On 3/28/23 08:42, Nobuhiro MIKI wrote:
>> SRv6 (Segment Routing IPv6) tunnel vport is responsible
>> for encapsulation and decapsulation the inner packets with
>> IPv6 header and an extended header called SRH
>> (Segment Routing Header). See spec in:
>>
>> https://datatracker.ietf.org/doc/html/rfc8754
>>
>> This patch implements SRv6 tunneling in userspace datapath.
>> It uses `remote_ip` and `local_ip` options as with existing
>> tunnel protocols. It also adds a dedicated `srv6_segs` option
>> to define a sequence of routers called segment list.
>>
>> Signed-off-by: Nobuhiro MIKI <[email protected]>
>> ---
>>  Documentation/faq/configuration.rst |  21 +++++
>>  Documentation/faq/releases.rst      |   1 +
>>  NEWS                                |   2 +
>>  include/linux/openvswitch.h         |   1 +
>>  include/sparse/netinet/in.h         |   1 +
>>  lib/dpif-netlink-rtnl.c             |   5 ++
>>  lib/dpif-netlink.c                  |   5 ++
>>  lib/netdev-native-tnl.c             | 127 ++++++++++++++++++++++++++++
>>  lib/netdev-native-tnl.h             |  10 +++
>>  lib/netdev-vport.c                  |  53 ++++++++++++
>>  lib/netdev.h                        |   4 +
>>  lib/packets.h                       |  11 +++
>>  lib/tnl-ports.c                     |   5 +-
>>  ofproto/ofproto-dpif-xlate.c        |   3 +
>>  tests/system-kmod-macros.at         |   8 ++
>>  tests/system-traffic.at             | 119 ++++++++++++++++++++++++++
>>  tests/system-userspace-macros.at    |   6 ++
>>  tests/tunnel.at                     |  56 ++++++++++++
>>  18 files changed, 437 insertions(+), 1 deletion(-)
>>
> 
> <snip>
> 
>> +int
>> +netdev_srv6_build_header(const struct netdev *netdev,
>> +                         struct ovs_action_push_tnl *data,
>> +                         const struct netdev_tnl_build_header_params 
>> *params)
>> +{
>> +    struct netdev_vport *dev = netdev_vport_cast(netdev);
>> +    struct netdev_tunnel_config *tnl_cfg;
>> +    const struct in6_addr *segs;
>> +    struct srv6_base_hdr *srh;
>> +    struct in6_addr *s;
>> +    ovs_be16 dl_type;
>> +    int err = 0;
>> +    int nr_segs;
>> +    int i;
>> +
>> +    ovs_mutex_lock(&dev->mutex);
>> +    tnl_cfg = &dev->tnl_cfg;
>> +
>> +    if (tnl_cfg->srv6_num_segs) {
>> +        nr_segs = tnl_cfg->srv6_num_segs;
>> +        segs = tnl_cfg->srv6_segs;
>> +    } else {
>> +        /*
>> +         * If explicit segment list setting is omitted, tunnel destination
>> +         * is considered to be the first segment list.
>> +         */
>> +        nr_segs = 1;
>> +        segs = &params->flow->tunnel.ipv6_dst;
>> +    }
>> +
>> +    if (!ipv6_addr_equals(&segs[0], &params->flow->tunnel.ipv6_dst)) {
>> +        err = EINVAL;
>> +        goto out;
>> +    }
>> +
>> +    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING);
>> +    srh->rt_hdr.segments_left = nr_segs - 1;
>> +    srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
>> +    srh->rt_hdr.hdrlen = 2 * nr_segs;
>> +    srh->last_entry = nr_segs - 1;
>> +    srh->flags = 0;
>> +    srh->tag = 0;
>> +
>> +    dl_type = params->flow->dl_type;
>> +    if (dl_type == htons(ETH_TYPE_IP)) {
>> +        srh->rt_hdr.nexthdr = IPPROTO_IPIP;
>> +    } else if (dl_type == htons(ETH_TYPE_IPV6)) {
>> +        srh->rt_hdr.nexthdr = IPPROTO_IPV6;
>> +    }
> 
> We should probably error out here if for some reason it's not an IP or IPv6.
> 

For cases other than IP or IPv6, fix to return EOPNOTSUPP, which means 
unsupported.

>> +
>> +    s = ALIGNED_CAST(struct in6_addr *,
>> +                     (char *) srh + sizeof *srh);
>> +    for (i = 0; i < nr_segs; i++) {
>> +        /* Segment list is written to the header in reverse order. */
>> +        memcpy(s, &segs[nr_segs - i - 1], sizeof *s);
>> +        s++;
>> +    }
>> +
>> +    data->header_len += sizeof *srh + 8 * srh->rt_hdr.hdrlen;
>> +    data->tnl_type = OVS_VPORT_TYPE_SRV6;
>> +out:
>> +    ovs_mutex_unlock(&dev->mutex);
>> +
>> +    return err;
>> +}
>> +
>> +void
>> +netdev_srv6_push_header(const struct netdev *netdev OVS_UNUSED,
>> +                        struct dp_packet *packet OVS_UNUSED,
>> +                        const struct ovs_action_push_tnl *data OVS_UNUSED)
> 
> 'packet' and 'data' are actually used by the function.
> 

Thanks.
I'll remove the annotation OVS_UNUSED from 'packet' and 'data'.

>> +{
>> +    int ip_tot_size;
>> +
>> +    netdev_tnl_push_ip_header(packet, data->header,
>> +                              data->header_len, &ip_tot_size);
>> +}
>> +
> 
> <snip>
> 
>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>> index a9cf3cbee0be..15c814d6285b 100644
>> --- a/ofproto/ofproto-dpif-xlate.c
>> +++ b/ofproto/ofproto-dpif-xlate.c
>> @@ -3632,6 +3632,9 @@ propagate_tunnel_data_to_flow(struct xlate_ctx *ctx, 
>> struct eth_addr dmac,
>>      case OVS_VPORT_TYPE_BAREUDP:
>>          nw_proto = IPPROTO_UDP;
>>          break;
>> +    case OVS_VPORT_TYPE_SRV6:
>> +        nw_proto = IPPROTO_IPIP;
> 
> Sorry, just noticed that we're setting nw_proto to IPIP
> always after encapsulation, even if the packet was IPv6.
> We should check the dl_type of the original flow and set
> the nw_proto appropriately.
> 
> You can check that the ping6 test fails if we request to
> drop encapsulated IPv4 packets like this:
> 
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 65bae736f..33d60244e 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -1241,7 +1241,8 @@ dnl using veth pair. Kernel side tunnel endpoint (SID) 
> is
>  dnl 'fc00:a::1/128', so add it to the route.
>  ADD_BR([br-underlay])
>  ADD_VETH(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> -AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "priority=1,actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay 
> "priority=100,ipv6,nw_proto=4,actions=drop"])
>  AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
>  AT_CHECK([ip link set dev br-underlay up])
>  AT_CHECK([ip -6 route add fc00:a::1/128 dev br-underlay via fc00::1])
> ---
> 
> Even though the packet itself has nw_proto=41, the flow structure
> that tracks it has nw_proto=4, and it is used for flow matching
> after encapsulation.
> 
> A simple ternary operator solves the issue for me:
> 
>      case OVS_VPORT_TYPE_SRV6:
> -        nw_proto = IPPROTO_IPIP;
> +        nw_proto = (flow->dl_type == htons(ETH_TYPE_IP))
> +                   ? IPPROTO_IPIP : IPPROTO_IPV6;
>          break;
> 
> Might be good to have this case covered in tests as well.
> Either by something similar to what I did above, or by a
> separate test.
> 
> What do you think?
> 

I was able to reproduce this problem in my environment.
And I think your solution with ternary operator is great, so I will adopt it.

As for testing, I think it would be more robust if we implement a whitelist
to allow only IPPROTO_IPV6(41) and IPPROTO_ICMPV6(58) as shown below:

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 165be2db5c3b..63753adbccce 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1239,9 +1242,12 @@ NS_EXEC([at_ns0], [sysctl -w 
net.ipv6.conf.all.forwarding=1])
 dnl Set up underlay link from host into the namespace 'at_ns0'
 dnl using veth pair. Kernel side tunnel endpoint (SID) is
 dnl 'fc00:a::1/128', so add it to the route.
+dnl Only IPPROTO_IPV6(41) and IPPROTO_ICMPV6(58) are needed in underlay link.
 ADD_BR([br-underlay])
 ADD_VETH(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
-AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "priority=1,actions=drop"])
+AT_CHECK([ovs-ofctl add-flow br-underlay 
"priority=100,ipv6,nw_proto=41,actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay 
"priority=100,ipv6,nw_proto=58,actions=normal"])


> <snip>
> 
>> +dnl Set up tunnel endpoints on OVS outside the namespace.
>> +ADD_OVS_TUNNEL6([srv6], [br0], [at_srv6], [fc00:a::1], [fc00:100::100/64])
>> +AT_CHECK([ovs-vsctl set bridge br0 other_config:hwaddr=aa:55:aa:55:00:00])
>> +dnl [sleep infinity]
> 
> This line returned. :)

Oops, this line will simply be deleted.

> 
> The rest of the set looks fine to me.  Thanks!
> 
> Best regards, Ilya Maximets.

Again, thanks for your review!

Best Regards,
Nobuhiro MIKI
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to