On 2023/03/28 21:31, Ilya Maximets wrote:
> On 3/28/23 08:42, Nobuhiro MIKI wrote:
>> SRv6 (Segment Routing IPv6) tunnel vport is responsible
>> for encapsulation and decapsulation the inner packets with
>> IPv6 header and an extended header called SRH
>> (Segment Routing Header). See spec in:
>>
>> https://datatracker.ietf.org/doc/html/rfc8754
>>
>> This patch implements SRv6 tunneling in userspace datapath.
>> It uses `remote_ip` and `local_ip` options as with existing
>> tunnel protocols. It also adds a dedicated `srv6_segs` option
>> to define a sequence of routers called segment list.
>>
>> Signed-off-by: Nobuhiro MIKI <[email protected]>
>> ---
>> Documentation/faq/configuration.rst | 21 +++++
>> Documentation/faq/releases.rst | 1 +
>> NEWS | 2 +
>> include/linux/openvswitch.h | 1 +
>> include/sparse/netinet/in.h | 1 +
>> lib/dpif-netlink-rtnl.c | 5 ++
>> lib/dpif-netlink.c | 5 ++
>> lib/netdev-native-tnl.c | 127 ++++++++++++++++++++++++++++
>> lib/netdev-native-tnl.h | 10 +++
>> lib/netdev-vport.c | 53 ++++++++++++
>> lib/netdev.h | 4 +
>> lib/packets.h | 11 +++
>> lib/tnl-ports.c | 5 +-
>> ofproto/ofproto-dpif-xlate.c | 3 +
>> tests/system-kmod-macros.at | 8 ++
>> tests/system-traffic.at | 119 ++++++++++++++++++++++++++
>> tests/system-userspace-macros.at | 6 ++
>> tests/tunnel.at | 56 ++++++++++++
>> 18 files changed, 437 insertions(+), 1 deletion(-)
>>
>
> <snip>
>
>> +int
>> +netdev_srv6_build_header(const struct netdev *netdev,
>> + struct ovs_action_push_tnl *data,
>> + const struct netdev_tnl_build_header_params
>> *params)
>> +{
>> + struct netdev_vport *dev = netdev_vport_cast(netdev);
>> + struct netdev_tunnel_config *tnl_cfg;
>> + const struct in6_addr *segs;
>> + struct srv6_base_hdr *srh;
>> + struct in6_addr *s;
>> + ovs_be16 dl_type;
>> + int err = 0;
>> + int nr_segs;
>> + int i;
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> + tnl_cfg = &dev->tnl_cfg;
>> +
>> + if (tnl_cfg->srv6_num_segs) {
>> + nr_segs = tnl_cfg->srv6_num_segs;
>> + segs = tnl_cfg->srv6_segs;
>> + } else {
>> + /*
>> + * If explicit segment list setting is omitted, tunnel destination
>> + * is considered to be the first segment list.
>> + */
>> + nr_segs = 1;
>> + segs = ¶ms->flow->tunnel.ipv6_dst;
>> + }
>> +
>> + if (!ipv6_addr_equals(&segs[0], ¶ms->flow->tunnel.ipv6_dst)) {
>> + err = EINVAL;
>> + goto out;
>> + }
>> +
>> + srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING);
>> + srh->rt_hdr.segments_left = nr_segs - 1;
>> + srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
>> + srh->rt_hdr.hdrlen = 2 * nr_segs;
>> + srh->last_entry = nr_segs - 1;
>> + srh->flags = 0;
>> + srh->tag = 0;
>> +
>> + dl_type = params->flow->dl_type;
>> + if (dl_type == htons(ETH_TYPE_IP)) {
>> + srh->rt_hdr.nexthdr = IPPROTO_IPIP;
>> + } else if (dl_type == htons(ETH_TYPE_IPV6)) {
>> + srh->rt_hdr.nexthdr = IPPROTO_IPV6;
>> + }
>
> We should probably error out here if for some reason it's not an IP or IPv6.
>
For cases other than IP or IPv6, fix to return EOPNOTSUPP, which means
unsupported.
>> +
>> + s = ALIGNED_CAST(struct in6_addr *,
>> + (char *) srh + sizeof *srh);
>> + for (i = 0; i < nr_segs; i++) {
>> + /* Segment list is written to the header in reverse order. */
>> + memcpy(s, &segs[nr_segs - i - 1], sizeof *s);
>> + s++;
>> + }
>> +
>> + data->header_len += sizeof *srh + 8 * srh->rt_hdr.hdrlen;
>> + data->tnl_type = OVS_VPORT_TYPE_SRV6;
>> +out:
>> + ovs_mutex_unlock(&dev->mutex);
>> +
>> + return err;
>> +}
>> +
>> +void
>> +netdev_srv6_push_header(const struct netdev *netdev OVS_UNUSED,
>> + struct dp_packet *packet OVS_UNUSED,
>> + const struct ovs_action_push_tnl *data OVS_UNUSED)
>
> 'packet' and 'data' are actually used by the function.
>
Thanks.
I'll remove the annotation OVS_UNUSED from 'packet' and 'data'.
>> +{
>> + int ip_tot_size;
>> +
>> + netdev_tnl_push_ip_header(packet, data->header,
>> + data->header_len, &ip_tot_size);
>> +}
>> +
>
> <snip>
>
>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>> index a9cf3cbee0be..15c814d6285b 100644
>> --- a/ofproto/ofproto-dpif-xlate.c
>> +++ b/ofproto/ofproto-dpif-xlate.c
>> @@ -3632,6 +3632,9 @@ propagate_tunnel_data_to_flow(struct xlate_ctx *ctx,
>> struct eth_addr dmac,
>> case OVS_VPORT_TYPE_BAREUDP:
>> nw_proto = IPPROTO_UDP;
>> break;
>> + case OVS_VPORT_TYPE_SRV6:
>> + nw_proto = IPPROTO_IPIP;
>
> Sorry, just noticed that we're setting nw_proto to IPIP
> always after encapsulation, even if the packet was IPv6.
> We should check the dl_type of the original flow and set
> the nw_proto appropriately.
>
> You can check that the ping6 test fails if we request to
> drop encapsulated IPv4 packets like this:
>
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 65bae736f..33d60244e 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -1241,7 +1241,8 @@ dnl using veth pair. Kernel side tunnel endpoint (SID)
> is
> dnl 'fc00:a::1/128', so add it to the route.
> ADD_BR([br-underlay])
> ADD_VETH(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> -AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "priority=1,actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay
> "priority=100,ipv6,nw_proto=4,actions=drop"])
> AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> AT_CHECK([ip link set dev br-underlay up])
> AT_CHECK([ip -6 route add fc00:a::1/128 dev br-underlay via fc00::1])
> ---
>
> Even though the packet itself has nw_proto=41, the flow structure
> that tracks it has nw_proto=4, and it is used for flow matching
> after encapsulation.
>
> A simple ternary operator solves the issue for me:
>
> case OVS_VPORT_TYPE_SRV6:
> - nw_proto = IPPROTO_IPIP;
> + nw_proto = (flow->dl_type == htons(ETH_TYPE_IP))
> + ? IPPROTO_IPIP : IPPROTO_IPV6;
> break;
>
> Might be good to have this case covered in tests as well.
> Either by something similar to what I did above, or by a
> separate test.
>
> What do you think?
>
I was able to reproduce this problem in my environment.
And I think your solution with ternary operator is great, so I will adopt it.
As for testing, I think it would be more robust if we implement a whitelist
to allow only IPPROTO_IPV6(41) and IPPROTO_ICMPV6(58) as shown below:
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 165be2db5c3b..63753adbccce 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1239,9 +1242,12 @@ NS_EXEC([at_ns0], [sysctl -w
net.ipv6.conf.all.forwarding=1])
dnl Set up underlay link from host into the namespace 'at_ns0'
dnl using veth pair. Kernel side tunnel endpoint (SID) is
dnl 'fc00:a::1/128', so add it to the route.
+dnl Only IPPROTO_IPV6(41) and IPPROTO_ICMPV6(58) are needed in underlay link.
ADD_BR([br-underlay])
ADD_VETH(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
-AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "priority=1,actions=drop"])
+AT_CHECK([ovs-ofctl add-flow br-underlay
"priority=100,ipv6,nw_proto=41,actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay
"priority=100,ipv6,nw_proto=58,actions=normal"])
> <snip>
>
>> +dnl Set up tunnel endpoints on OVS outside the namespace.
>> +ADD_OVS_TUNNEL6([srv6], [br0], [at_srv6], [fc00:a::1], [fc00:100::100/64])
>> +AT_CHECK([ovs-vsctl set bridge br0 other_config:hwaddr=aa:55:aa:55:00:00])
>> +dnl [sleep infinity]
>
> This line returned. :)
Oops, this line will simply be deleted.
>
> The rest of the set looks fine to me. Thanks!
>
> Best regards, Ilya Maximets.
Again, thanks for your review!
Best Regards,
Nobuhiro MIKI
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev