On 2023/05/11 6:33, Ilya Maximets wrote:
> On 5/9/23 11:38, Nobuhiro MIKI wrote:
>> It supports flowlabel based load balancing by controlling the flowlabel
>> of outer IPv6 header, which is already implemented in Linux kernel as
>> seg6_flowlabel sysctl [1].
>>
>> [1]: https://docs.kernel.org/networking/seg6-sysctl.html
>>
>> Signed-off-by: Nobuhiro MIKI <[email protected]>
>> ---
>
> Hi. Thanks for the patch! See some comments inline.
>
Hi Ilya,
Thanks for your review!
>
>> lib/flow.c | 24 +++++++++++
>> lib/flow.h | 1 +
>> lib/netdev-native-tnl.c | 22 +++++++++-
>> lib/netdev-vport.c | 8 ++++
>> lib/netdev.h | 12 ++++++
>> tests/tunnel-push-pop-ipv6.at | 79 +++++++++++++++++++++++++++++++++++
>> vswitchd/vswitch.xml | 29 +++++++++++++
>> 7 files changed, 174 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/flow.c b/lib/flow.c
>> index 9501a259e9d4..f27ec4795bc7 100644
>> --- a/lib/flow.c
>> +++ b/lib/flow.c
>> @@ -2734,6 +2734,30 @@ flow_hash_in_wildcards(const struct flow *flow,
>> return hash_finish(hash, 8 * FLOW_U64S);
>> }
>>
>> +uint32_t
>> +flow_hash_srv6_flowlabel(const struct flow *flow, uint32_t basis)
>> +{
>> + uint32_t hash = basis;
>> +
>> + if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
>> + const uint64_t *flow_u64 = (const uint64_t *) flow;
>> + int ofs = offsetof(struct flow, ipv6_src) / 8;
>> + int end = ofs + 2 * sizeof flow->ipv6_src / 8;
>> +
>> + for (;ofs < end; ofs++) {
>> + hash = hash_add64(hash, flow_u64[ofs]);
>> + }
>> +
>> + hash = hash_add(hash, flow->nw_proto);
>> + hash = hash_add(hash, (OVS_FORCE uint32_t) flow->ipv6_label);
>> + } else if (flow->dl_type == htons(ETH_TYPE_IP)
>> + || flow->dl_type == htons(ETH_TYPE_ARP)) {
>> + hash = flow_hash_5tuple(flow, basis);
>> + }
>> +
>> + return hash_finish(hash, 42) & IPV6_LABEL_MASK; /* Arbitrary number. */
>> +}
>
> Is it necessary to use a custom hash function for this?
> I guess, above can be replaced with a call to flow_hash_5tuple using the
> ipv6_label as a basis. OTOH, the seg6_make_flowlabel() in the kernel
> just using the skb hash which likely doesn't include label and even just
> a random number of locally originated traffic, so maybe mixing in the
> label is not really necessary?
>
> See comments below though.
>
I also did a quick check seg6_make_flowlabel() in the kernel.
It calls skb_get_hash() and flowlabel seems to not be included.
I guess I was mistaken. Sorry.
Af far as I know, the purpose is load balancing, so I feel 5tuple
hash is sufficient. And there is SRV6_FLOWLABEL_COPY if someone want
flowlabel based hash instead 5tuple hash.
>> +
>> /* Sets the VLAN VID that 'flow' matches to 'vid', which is interpreted as
>> an
>> * OpenFlow 1.0 "dl_vlan" value:
>> *
>> diff --git a/lib/flow.h b/lib/flow.h
>> index a9d026e1ce3b..7b8ef5164465 100644
>> --- a/lib/flow.h
>> +++ b/lib/flow.h
>> @@ -258,6 +258,7 @@ bool flow_hash_fields_valid(enum nx_hash_fields);
>> uint32_t flow_hash_in_wildcards(const struct flow *,
>> const struct flow_wildcards *,
>> uint32_t basis);
>> +uint32_t flow_hash_srv6_flowlabel(const struct flow *, uint32_t basis);
>>
>> bool flow_equal_except(const struct flow *a, const struct flow *b,
>> const struct flow_wildcards *);
>> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
>> index 55e1bd567fa1..18bd9df57175 100644
>> --- a/lib/netdev-native-tnl.c
>> +++ b/lib/netdev-native-tnl.c
>> @@ -856,6 +856,7 @@ netdev_srv6_build_header(const struct netdev *netdev,
>> struct netdev_tunnel_config *tnl_cfg;
>> const struct in6_addr *segs;
>> struct srv6_base_hdr *srh;
>> + uint32_t ipv6_label = 0;
>> struct in6_addr *s;
>> ovs_be16 dl_type;
>> int err = 0;
>> @@ -882,7 +883,26 @@ netdev_srv6_build_header(const struct netdev *netdev,
>> goto out;
>> }
>>
>> - srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING, 0);
>> + switch (tnl_cfg->srv6_flowlabel) {
>> + case SRV6_FLOWLABEL_COPY:
>> + ipv6_label = ntohl(params->flow->ipv6_label);
>> + break;
>> +
>> + case SRV6_FLOWLABEL_ZERO:
>> + ipv6_label = 0;
>> + break;
>> +
>> + case SRV6_FLOWLABEL_COMPUTE:
>> + ipv6_label = flow_hash_srv6_flowlabel(params->flow, 0);
>> + break;
>> +
>> + default:
>> + err = EINVAL;
>> + goto out;
>> + }
>> +
>> + srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING,
>> + ipv6_label);
>
> We can't really do that on the header build stage. The main reason
> is that we're building the header based on the packet information.
> In case of COPY we're using the label from the original packaet, in case
> of COMPUTE we're using majority of the packet header to produce the value.
> But if we're using packet fields, we have to add these fields to the
> match criteria. And that is not happening here.
>
> I didn't try, but I'm sure that if you'll look at results of ofproto/trace
> calls in your tests below, you'll see that even though produced actions
> are different, the match criteria is exactly the same. That means that
> when the first packet with label A will hit the datapath, the flow that
> pushes SRv6 header with flow label A will be added to the datapath.
> If the next packet will have label B it will match the installed datapath
> flow (because it doesn't match on the label) and the SRv6 header with
> label A will be pushed.
>
> Matching on the actual fields used to produce the action is necessary.
> However, there is a workaround. That is to generate these fileds
> on a header push, and not on the header build.
>
> See for example the netdev_tnl_push_udp_header() function. It is updating
> the UDP source port on header push. IIUC, this is for the same reason
> the SRv6 flow label is updated - to ensure load balancing of UDP flows
> in tunnels. The way update is done is by using packet's RSS hash.
> See the netdev_tnl_get_src_port(). By using RSS hash we can avoid
> re-parsing and re-hashing of the packet. And packets from the same session
> should normally have the same RSS hash, so that works. And we don't need
> to match on any extra fileds, because the hash of actual packet is used
> while pushing the header.
>
> Another example is netdev_gre_push_header() where we update a sequence
> number while pushing a header (doesn't look thread-safe though :) ).
>
> So, we should probbaly just use RSS hash as a flow label for SRv6 and
> set it while pushing SRv6 header (in netdev_tnl_push_ip_header()) in the
> COMPUTE case and parse out and use the ipv6_label of the current packet
> in case of COPY. The actual packet is available in the header push
> context. And kernel's seg6_make_flowlabel() seems to do a very similar
> thing. Full packet parsing will not be good for performance reasons.
>
> Does that make sense?
>
Thanks very much for your kind explanation.
It makes sense to me.
I would like to add a test for the flowlabel for the second and
subsequent packets in the packet capture by specifying options:pcap
for the dummy port and sending the packet with "ovs-appctl
netdev-dummy/receive".
I have tried this and it seems to work.
Thanks also for the UDP and GRE examples. I now understand that both
rewrite fields at the time of header push.
Since header push is performed in the context of the actual packet,
I assume that header parsing and hashing is done for all packets.
For performance reasons, it makes sense to avoid parsing and hashing
and instead use the RSS hash. I would replace it with RSS hash.
And it can probably be retrieved with dp_packet_get_rss_hash().
>> srh->rt_hdr.segments_left = nr_segs - 1;
>> srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
>> srh->rt_hdr.hdrlen = 2 * nr_segs;
>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>> index 663ee8606c3b..2141621cf23e 100644
>> --- a/lib/netdev-vport.c
>> +++ b/lib/netdev-vport.c
>> @@ -792,6 +792,14 @@ set_tunnel_config(struct netdev *dev_, const struct
>> smap *args, char **errp)
>> name, node->value);
>> break;
>> }
>> + } else if (!strcmp(node->key, "srv6_flowlabel")) {
>> + if (!strcmp(node->value, "zero")) {
>> + tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_ZERO;
>> + } else if (!strcmp(node->value, "compute")) {
>> + tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COMPUTE;
>> + } else {
>> + tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COPY;
>> + }
>> } else if (!strcmp(node->key, "payload_type")) {
>> if (!strcmp(node->value, "mpls")) {
>> tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
>> diff --git a/lib/netdev.h b/lib/netdev.h
>> index ff207f56c28c..743a56ca1629 100644
>> --- a/lib/netdev.h
>> +++ b/lib/netdev.h
>> @@ -97,6 +97,17 @@ enum netdev_pt_mode {
>> NETDEV_PT_LEGACY_L3,
>> };
>>
>> +enum netdev_srv6_flowlabel {
>> + /* Copy the flowlabel from inner packet. */
>> + SRV6_FLOWLABEL_COPY,
>> +
>> + /* Simply set flowlabel to 0. */
>> + SRV6_FLOWLABEL_ZERO,
>> +
>> + /* Calculate a hash for some fields and set the result to flowlabel. */
>> + SRV6_FLOWLABEL_COMPUTE,
>> +};
>> +
>> /* Configuration specific to tunnels. */
>> struct netdev_tunnel_config {
>> ovs_be64 in_key;
>> @@ -144,6 +155,7 @@ struct netdev_tunnel_config {
>> uint8_t srv6_num_segs;
>> #define SRV6_MAX_SEGS 6
>> struct in6_addr srv6_segs[SRV6_MAX_SEGS];
>> + enum netdev_srv6_flowlabel srv6_flowlabel;
>> };
>>
>> void netdev_run(void);
>> diff --git a/tests/tunnel-push-pop-ipv6.at b/tests/tunnel-push-pop-ipv6.at
>> index e300fe3a0d26..33edc8319eed 100644
>> --- a/tests/tunnel-push-pop-ipv6.at
>> +++ b/tests/tunnel-push-pop-ipv6.at
>> @@ -1,5 +1,84 @@
>> AT_BANNER([tunnel_push_pop_ipv6])
>>
>> +AT_SETUP([tunnel_push_pop_ipv6 - srv6])
>> +
>> +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy
>> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>> +AT_CHECK([ovs-vsctl add-br int-br1 -- set bridge int-br1
>> datapath_type=dummy], [0])
>> +AT_CHECK([ovs-vsctl add-br int-br2 -- set bridge int-br2
>> datapath_type=dummy], [0])
>> +AT_CHECK([ovs-vsctl add-br int-br3 -- set bridge int-br3
>> datapath_type=dummy], [0])
>> +AT_CHECK([ovs-vsctl add-port int-br1 t1 -- set Interface t1 type=srv6 \
>> + options:remote_ip=2001:cafe::91 ofport_request=2 \
>> + options:srv6_flowlabel=copy \
>> + ], [0])
>> +AT_CHECK([ovs-vsctl add-port int-br2 t2 -- set Interface t2 type=srv6 \
>> + options:remote_ip=2001:cafe::92 ofport_request=3 \
>> + options:srv6_flowlabel=zero \
>> + ], [0])
>> +AT_CHECK([ovs-vsctl add-port int-br3 t3 -- set Interface t3 type=srv6 \
>> + options:remote_ip=2001:cafe::93 ofport_request=4 \
>> + options:srv6_flowlabel=compute \
>> + ], [0])
>> +
>> +dnl First setup dummy interface IP address, then add the route
>> +dnl so that tnl-port table can get valid IP address for the device.
>> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::88/24], [0], [OK
>> +])
>> +AT_CHECK([ovs-appctl ovs/route/add 2001:cafe::0/24 br0], [0], [OK
>> +])
>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::91 aa:55:aa:55:00:01],
>> [0], [OK
>> +])
>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::92 aa:55:aa:55:00:02],
>> [0], [OK
>> +])
>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::93 aa:55:aa:55:00:03],
>> [0], [OK
>> +])
>> +AT_CHECK([ovs-ofctl add-flow br0 action=1])
>> +AT_CHECK([ovs-ofctl add-flow int-br1 action=2])
>> +AT_CHECK([ovs-ofctl add-flow int-br2 action=3])
>> +AT_CHECK([ovs-ofctl add-flow int-br3 action=4])
>> +
>> +dnl Check "srv6_flowlabel=copy".
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>> 'in_port(2),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'],
>> [0], [stdout])
>> +cat stdout
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=12345
>> +])
>> +
>> +dnl Check "srv6_flowlabel=zero".
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>> 'in_port(3),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'],
>> [0], [stdout])
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=0
>> +])
>> +
>> +dnl Check "srv6_flowlabel=compute" for IPv4 in IPv6 tunnels.
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9)'],
>> [0], [stdout])
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=944785
>> +])
>> +
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=800,dst=900)'],
>> [0], [stdout])
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=772289
>> +])
>> +
>> +dnl Check "srv6_flowlabel=compute" for IPv6 in IPv6 tunnels.
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=0,proto=47,tclass=0x0,hlimit=64)'],
>> [0], [stdout])
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=468935
>> +])
>> +
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'],
>> [0], [stdout])
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=1012207
>> +])
>> +
>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::3,label=0,proto=47,tclass=0x0,hlimit=64)'],
>> [0], [stdout])
>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>> +label=629290
>> +])
>> +
>> +OVS_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>> AT_SETUP([tunnel_push_pop_ipv6 - ip6gre])
>>
>> OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy
>> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index edb5eafa04c3..7a0682503233 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -3287,6 +3287,35 @@
>> <ref column="options" key="remote_ip"/>.
>> </p>
>> </column>
>> + <column name="options" key="srv6_flowlabel"
>> + type='{"type": "string",
>> + "enum": ["set", ["zero", "copy", "compute"]]}'>
>> + <p>
>> + Optional.
>> + This option controls how flowlabel in outer IPv6 header is
>> + configured. This would give you the benefit of IPv6 flow label
>
> We should avoid pronouns like 'you' in the documentation. It should describe
> the functionality without trying to talk to the reader. E.g. 'It gives'
> instead of 'it will give you'.
>
OK. I'll fix.
>> + based load balancing, which is supported by some popular vendor
>> + appliances. You can choose from the following three types, as
>
> This also should be somehow re-worded to not have 'You'. If nothing works,
> can just be replaced with 'Users'.
>
OK.
>> + in net.ipv6.seg6_flowlabel syscall.
>
> s/syscall/sysconfig/
>
OK.
>> + </p>
>> + <ul>
>> + <li>
>> + By default, or if this option is <code>copy</code>, copy the
>> + flowlabel of inner IPv6 header to the flowlabel of outer IPv6
>> + header. If inner header is not IPv6, it is set to 0.
>> + </li>
>> + <li>
>> + If this option is <code>zero</code>, simply set flowlabel to 0.
>> + </li>
>> + <li>
>> + If this option is <code>compute</code>, calculate a hash for
>> + some fields in inner header and set the result to flowlabel.
>
> 'some fields' is a bit strange thing to say. "hash over fields of the inner
> header" or smething like that. Same applie to the comment in the code.
>
OK.
>> + If inner packet is IPv6, src_ip, dst_ip, L4 proto, and
>> + flowlabel are the targets of hash calculation. If it is IPv4,
>> + src_ip, dst_ip, L4 proto, src_port, and dst_port are the
>> targets.
>
> I'm not sure if we can actually say which fields are going to be used
> if we go with RSS hash.
>
I will only mention that RSS hash is used, not field names.
>> + </li>
>> + </ul>
>> + </column>
>> </group>
>>
>> <group title="Patch Options">
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev