On 5/9/23 11:38, Nobuhiro MIKI wrote:
> It supports flowlabel based load balancing by controlling the flowlabel
> of outer IPv6 header, which is already implemented in Linux kernel as
> seg6_flowlabel sysctl [1].
> 
> [1]: https://docs.kernel.org/networking/seg6-sysctl.html
> 
> Signed-off-by: Nobuhiro MIKI <[email protected]>
> ---

Hi.  Thanks for the patch!  See some comments inline.

Best regards, Ilya Maximets.

>  lib/flow.c                    | 24 +++++++++++
>  lib/flow.h                    |  1 +
>  lib/netdev-native-tnl.c       | 22 +++++++++-
>  lib/netdev-vport.c            |  8 ++++
>  lib/netdev.h                  | 12 ++++++
>  tests/tunnel-push-pop-ipv6.at | 79 +++++++++++++++++++++++++++++++++++
>  vswitchd/vswitch.xml          | 29 +++++++++++++
>  7 files changed, 174 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/flow.c b/lib/flow.c
> index 9501a259e9d4..f27ec4795bc7 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -2734,6 +2734,30 @@ flow_hash_in_wildcards(const struct flow *flow,
>      return hash_finish(hash, 8 * FLOW_U64S);
>  }
>  
> +uint32_t
> +flow_hash_srv6_flowlabel(const struct flow *flow, uint32_t basis)
> +{
> +    uint32_t hash = basis;
> +
> +    if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
> +        const uint64_t *flow_u64 = (const uint64_t *) flow;
> +        int ofs = offsetof(struct flow, ipv6_src) / 8;
> +        int end = ofs + 2 * sizeof flow->ipv6_src / 8;
> +
> +        for (;ofs < end; ofs++) {
> +            hash = hash_add64(hash, flow_u64[ofs]);
> +        }
> +
> +        hash = hash_add(hash, flow->nw_proto);
> +        hash = hash_add(hash, (OVS_FORCE uint32_t) flow->ipv6_label);
> +    } else if (flow->dl_type == htons(ETH_TYPE_IP)
> +               || flow->dl_type == htons(ETH_TYPE_ARP)) {
> +        hash = flow_hash_5tuple(flow, basis);
> +    }
> +
> +    return hash_finish(hash, 42) & IPV6_LABEL_MASK; /* Arbitrary number. */
> +}

Is it necessary to use a custom hash function for this?
I guess, above can be replaced with a call to flow_hash_5tuple using the
ipv6_label as a basis.  OTOH, the seg6_make_flowlabel() in the kernel
just using the skb hash which likely doesn't include label and even just
a random number of locally originated traffic, so maybe mixing in the
label is not really necessary?

See comments below though.

> +
>  /* Sets the VLAN VID that 'flow' matches to 'vid', which is interpreted as an
>   * OpenFlow 1.0 "dl_vlan" value:
>   *
> diff --git a/lib/flow.h b/lib/flow.h
> index a9d026e1ce3b..7b8ef5164465 100644
> --- a/lib/flow.h
> +++ b/lib/flow.h
> @@ -258,6 +258,7 @@ bool flow_hash_fields_valid(enum nx_hash_fields);
>  uint32_t flow_hash_in_wildcards(const struct flow *,
>                                  const struct flow_wildcards *,
>                                  uint32_t basis);
> +uint32_t flow_hash_srv6_flowlabel(const struct flow *, uint32_t basis);
>  
>  bool flow_equal_except(const struct flow *a, const struct flow *b,
>                         const struct flow_wildcards *);
> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> index 55e1bd567fa1..18bd9df57175 100644
> --- a/lib/netdev-native-tnl.c
> +++ b/lib/netdev-native-tnl.c
> @@ -856,6 +856,7 @@ netdev_srv6_build_header(const struct netdev *netdev,
>      struct netdev_tunnel_config *tnl_cfg;
>      const struct in6_addr *segs;
>      struct srv6_base_hdr *srh;
> +    uint32_t ipv6_label = 0;
>      struct in6_addr *s;
>      ovs_be16 dl_type;
>      int err = 0;
> @@ -882,7 +883,26 @@ netdev_srv6_build_header(const struct netdev *netdev,
>          goto out;
>      }
>  
> -    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING, 0);
> +    switch (tnl_cfg->srv6_flowlabel) {
> +    case SRV6_FLOWLABEL_COPY:
> +        ipv6_label = ntohl(params->flow->ipv6_label);
> +        break;
> +
> +    case SRV6_FLOWLABEL_ZERO:
> +        ipv6_label = 0;
> +        break;
> +
> +    case SRV6_FLOWLABEL_COMPUTE:
> +        ipv6_label = flow_hash_srv6_flowlabel(params->flow, 0);
> +        break;
> +
> +    default:
> +        err = EINVAL;
> +        goto out;
> +    }
> +
> +    srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING,
> +                                     ipv6_label);

We can't really do that on the header build stage.  The main reason
is that we're building the header based on the packet information.
In case of COPY we're using the label from the original packaet, in case
of COMPUTE we're using majority of the packet header to produce the value.
But if we're using packet fields, we have to add these fields to the
match criteria.  And that is not happening here.

I didn't try, but I'm sure that if you'll look at results of ofproto/trace
calls in your tests below, you'll see that even though produced actions
are different, the match criteria is exactly the same.  That means that
when the first packet with label A will hit the datapath, the flow that
pushes SRv6 header with flow label A will be added to the datapath.
If the next packet will have label B it will match the installed datapath
flow (because it doesn't match on the label) and the SRv6 header with
label A will be pushed.

Matching on the actual fields used to produce the action is necessary.
However, there is a workaround.  That is to generate these fileds
on a header push, and not on the header build.

See for example the netdev_tnl_push_udp_header() function.  It is updating
the UDP source port on header push.  IIUC, this is for the same reason
the SRv6 flow label is updated - to ensure load balancing of UDP flows
in tunnels.  The way update is done is by using packet's RSS hash.
See the netdev_tnl_get_src_port().  By using RSS hash we can avoid
re-parsing and re-hashing of the packet.  And packets from the same session
should normally have the same RSS hash, so that works.  And we don't need
to match on any extra fileds, because the hash of actual packet is used
while pushing the header.

Another example is netdev_gre_push_header() where we update a sequence
number while pushing a header (doesn't look thread-safe though :) ).

So, we should probbaly just use RSS hash as a flow label for SRv6 and
set it while pushing SRv6 header (in netdev_tnl_push_ip_header()) in the
COMPUTE case and parse out and use the ipv6_label of the current packet
in case of COPY.  The actual packet is available in the header push
context.  And kernel's seg6_make_flowlabel() seems to do a very similar
thing.  Full packet parsing will not be good for performance reasons.

Does that make sense?

>      srh->rt_hdr.segments_left = nr_segs - 1;
>      srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
>      srh->rt_hdr.hdrlen = 2 * nr_segs;
> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> index 663ee8606c3b..2141621cf23e 100644
> --- a/lib/netdev-vport.c
> +++ b/lib/netdev-vport.c
> @@ -792,6 +792,14 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
> *args, char **errp)
>                                name, node->value);
>                  break;
>              }
> +        } else if (!strcmp(node->key, "srv6_flowlabel")) {
> +            if (!strcmp(node->value, "zero")) {
> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_ZERO;
> +            } else if (!strcmp(node->value, "compute")) {
> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COMPUTE;
> +            } else {
> +                tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COPY;
> +            }
>          } else if (!strcmp(node->key, "payload_type")) {
>              if (!strcmp(node->value, "mpls")) {
>                   tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
> diff --git a/lib/netdev.h b/lib/netdev.h
> index ff207f56c28c..743a56ca1629 100644
> --- a/lib/netdev.h
> +++ b/lib/netdev.h
> @@ -97,6 +97,17 @@ enum netdev_pt_mode {
>      NETDEV_PT_LEGACY_L3,
>  };
>  
> +enum netdev_srv6_flowlabel {
> +    /* Copy the flowlabel from inner packet. */
> +    SRV6_FLOWLABEL_COPY,
> +
> +    /* Simply set flowlabel to 0. */
> +    SRV6_FLOWLABEL_ZERO,
> +
> +    /* Calculate a hash for some fields and set the result to flowlabel. */
> +    SRV6_FLOWLABEL_COMPUTE,
> +};
> +
>  /* Configuration specific to tunnels. */
>  struct netdev_tunnel_config {
>      ovs_be64 in_key;
> @@ -144,6 +155,7 @@ struct netdev_tunnel_config {
>      uint8_t srv6_num_segs;
>      #define SRV6_MAX_SEGS 6
>      struct in6_addr srv6_segs[SRV6_MAX_SEGS];
> +    enum netdev_srv6_flowlabel srv6_flowlabel;
>  };
>  
>  void netdev_run(void);
> diff --git a/tests/tunnel-push-pop-ipv6.at b/tests/tunnel-push-pop-ipv6.at
> index e300fe3a0d26..33edc8319eed 100644
> --- a/tests/tunnel-push-pop-ipv6.at
> +++ b/tests/tunnel-push-pop-ipv6.at
> @@ -1,5 +1,84 @@
>  AT_BANNER([tunnel_push_pop_ipv6])
>  
> +AT_SETUP([tunnel_push_pop_ipv6 - srv6])
> +
> +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy 
> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
> +AT_CHECK([ovs-vsctl add-br int-br1 -- set bridge int-br1 
> datapath_type=dummy], [0])
> +AT_CHECK([ovs-vsctl add-br int-br2 -- set bridge int-br2 
> datapath_type=dummy], [0])
> +AT_CHECK([ovs-vsctl add-br int-br3 -- set bridge int-br3 
> datapath_type=dummy], [0])
> +AT_CHECK([ovs-vsctl add-port int-br1 t1 -- set Interface t1 type=srv6 \
> +                       options:remote_ip=2001:cafe::91 ofport_request=2 \
> +                       options:srv6_flowlabel=copy \
> +                       ], [0])
> +AT_CHECK([ovs-vsctl add-port int-br2 t2 -- set Interface t2 type=srv6 \
> +                       options:remote_ip=2001:cafe::92 ofport_request=3 \
> +                       options:srv6_flowlabel=zero \
> +                       ], [0])
> +AT_CHECK([ovs-vsctl add-port int-br3 t3 -- set Interface t3 type=srv6 \
> +                       options:remote_ip=2001:cafe::93 ofport_request=4 \
> +                       options:srv6_flowlabel=compute \
> +                       ], [0])
> +
> +dnl First setup dummy interface IP address, then add the route
> +dnl so that tnl-port table can get valid IP address for the device.
> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::88/24], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 2001:cafe::0/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::91 aa:55:aa:55:00:01], 
> [0], [OK
> +])
> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::92 aa:55:aa:55:00:02], 
> [0], [OK
> +])
> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::93 aa:55:aa:55:00:03], 
> [0], [OK
> +])
> +AT_CHECK([ovs-ofctl add-flow br0 action=1])
> +AT_CHECK([ovs-ofctl add-flow int-br1 action=2])
> +AT_CHECK([ovs-ofctl add-flow int-br2 action=3])
> +AT_CHECK([ovs-ofctl add-flow int-br3 action=4])
> +
> +dnl Check "srv6_flowlabel=copy".
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 
> 'in_port(2),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'],
>  [0], [stdout])
> +cat stdout
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=12345
> +])
> +
> +dnl Check "srv6_flowlabel=zero".
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 
> 'in_port(3),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'],
>  [0], [stdout])
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=0
> +])
> +
> +dnl Check "srv6_flowlabel=compute" for IPv4 in IPv6 tunnels.
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 
> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9)'],
>  [0], [stdout])
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=944785
> +])
> +
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 
> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=800,dst=900)'],
>  [0], [stdout])
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=772289
> +])
> +
> +dnl Check "srv6_flowlabel=compute" for IPv6 in IPv6 tunnels.
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 
> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=0,proto=47,tclass=0x0,hlimit=64)'],
>  [0], [stdout])
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=468935
> +])
> +
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 
> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'],
>  [0], [stdout])
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=1012207
> +])
> +
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 
> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::3,label=0,proto=47,tclass=0x0,hlimit=64)'],
>  [0], [stdout])
> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
> +label=629290
> +])
> +
> +OVS_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_SETUP([tunnel_push_pop_ipv6 - ip6gre])
>  
>  OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy 
> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index edb5eafa04c3..7a0682503233 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3287,6 +3287,35 @@
>             <ref column="options" key="remote_ip"/>.
>          </p>
>        </column>
> +      <column name="options" key="srv6_flowlabel"
> +              type='{"type": "string",
> +                     "enum": ["set", ["zero", "copy", "compute"]]}'>
> +        <p>
> +          Optional.
> +          This option controls how flowlabel in outer IPv6 header is
> +          configured. This would give you the benefit of IPv6 flow label

We should avoid pronouns like 'you' in the documentation.  It should describe
the functionality without trying to talk to the reader.  E.g. 'It gives'
instead of 'it will give you'.

> +          based load balancing, which is supported by some popular vendor
> +          appliances. You can choose from the following three types, as

This also should be somehow re-worded to not have 'You'.  If nothing works,
can just be replaced with 'Users'.

> +          in net.ipv6.seg6_flowlabel syscall.

s/syscall/sysconfig/

> +        </p>
> +        <ul>
> +          <li>
> +            By default, or if this option is <code>copy</code>, copy the
> +            flowlabel of inner IPv6 header to the flowlabel of outer IPv6
> +            header. If inner header is not IPv6, it is set to 0.
> +          </li>
> +          <li>
> +            If this option is <code>zero</code>, simply set flowlabel to 0.
> +          </li>
> +          <li>
> +            If this option is <code>compute</code>, calculate a hash for
> +            some fields in inner header and set the result to flowlabel.

'some fields' is a bit strange thing to say.  "hash over fields of the inner
header" or smething like that.  Same applie to the comment in the code.

> +            If inner packet is IPv6, src_ip, dst_ip, L4 proto, and
> +            flowlabel are the targets of hash calculation. If it is IPv4,
> +            src_ip, dst_ip, L4 proto, src_port, and dst_port are the targets.

I'm not sure if we can actually say which fields are going to be used
if we go with RSS hash.

> +          </li>
> +        </ul>
> +      </column>
>      </group>
>  
>      <group title="Patch Options">

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to