On 2023/05/11 20:50, Ilya Maximets wrote:
> On 5/11/23 12:39, Nobuhiro MIKI wrote:
>> On 2023/05/11 6:33, Ilya Maximets wrote:
>>> On 5/9/23 11:38, Nobuhiro MIKI wrote:
>>>> It supports flowlabel based load balancing by controlling the flowlabel
>>>> of outer IPv6 header, which is already implemented in Linux kernel as
>>>> seg6_flowlabel sysctl [1].
>>>>
>>>> [1]: https://docs.kernel.org/networking/seg6-sysctl.html
>>>>
>>>> Signed-off-by: Nobuhiro MIKI <[email protected]>
>>>> ---
>>>
>>> Hi. Thanks for the patch! See some comments inline.
>>>
>>
>> Hi Ilya,
>>
>> Thanks for your review!
>>
>>>
>>>> lib/flow.c | 24 +++++++++++
>>>> lib/flow.h | 1 +
>>>> lib/netdev-native-tnl.c | 22 +++++++++-
>>>> lib/netdev-vport.c | 8 ++++
>>>> lib/netdev.h | 12 ++++++
>>>> tests/tunnel-push-pop-ipv6.at | 79 +++++++++++++++++++++++++++++++++++
>>>> vswitchd/vswitch.xml | 29 +++++++++++++
>>>> 7 files changed, 174 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/flow.c b/lib/flow.c
>>>> index 9501a259e9d4..f27ec4795bc7 100644
>>>> --- a/lib/flow.c
>>>> +++ b/lib/flow.c
>>>> @@ -2734,6 +2734,30 @@ flow_hash_in_wildcards(const struct flow *flow,
>>>> return hash_finish(hash, 8 * FLOW_U64S);
>>>> }
>>>>
>>>> +uint32_t
>>>> +flow_hash_srv6_flowlabel(const struct flow *flow, uint32_t basis)
>>>> +{
>>>> + uint32_t hash = basis;
>>>> +
>>>> + if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
>>>> + const uint64_t *flow_u64 = (const uint64_t *) flow;
>>>> + int ofs = offsetof(struct flow, ipv6_src) / 8;
>>>> + int end = ofs + 2 * sizeof flow->ipv6_src / 8;
>>>> +
>>>> + for (;ofs < end; ofs++) {
>>>> + hash = hash_add64(hash, flow_u64[ofs]);
>>>> + }
>>>> +
>>>> + hash = hash_add(hash, flow->nw_proto);
>>>> + hash = hash_add(hash, (OVS_FORCE uint32_t) flow->ipv6_label);
>>>> + } else if (flow->dl_type == htons(ETH_TYPE_IP)
>>>> + || flow->dl_type == htons(ETH_TYPE_ARP)) {
>>>> + hash = flow_hash_5tuple(flow, basis);
>>>> + }
>>>> +
>>>> + return hash_finish(hash, 42) & IPV6_LABEL_MASK; /* Arbitrary number.
>>>> */
>>>> +}
>>>
>>> Is it necessary to use a custom hash function for this?
>>> I guess, above can be replaced with a call to flow_hash_5tuple using the
>>> ipv6_label as a basis. OTOH, the seg6_make_flowlabel() in the kernel
>>> just using the skb hash which likely doesn't include label and even just
>>> a random number of locally originated traffic, so maybe mixing in the
>>> label is not really necessary?
>>>
>>> See comments below though.
>>>
>>
>> I also did a quick check seg6_make_flowlabel() in the kernel.
>> It calls skb_get_hash() and flowlabel seems to not be included.
>> I guess I was mistaken. Sorry.
>>
>> Af far as I know, the purpose is load balancing, so I feel 5tuple
>> hash is sufficient. And there is SRV6_FLOWLABEL_COPY if someone want
>> flowlabel based hash instead 5tuple hash.
>>
>>>> +
>>>> /* Sets the VLAN VID that 'flow' matches to 'vid', which is interpreted
>>>> as an
>>>> * OpenFlow 1.0 "dl_vlan" value:
>>>> *
>>>> diff --git a/lib/flow.h b/lib/flow.h
>>>> index a9d026e1ce3b..7b8ef5164465 100644
>>>> --- a/lib/flow.h
>>>> +++ b/lib/flow.h
>>>> @@ -258,6 +258,7 @@ bool flow_hash_fields_valid(enum nx_hash_fields);
>>>> uint32_t flow_hash_in_wildcards(const struct flow *,
>>>> const struct flow_wildcards *,
>>>> uint32_t basis);
>>>> +uint32_t flow_hash_srv6_flowlabel(const struct flow *, uint32_t basis);
>>>>
>>>> bool flow_equal_except(const struct flow *a, const struct flow *b,
>>>> const struct flow_wildcards *);
>>>> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
>>>> index 55e1bd567fa1..18bd9df57175 100644
>>>> --- a/lib/netdev-native-tnl.c
>>>> +++ b/lib/netdev-native-tnl.c
>>>> @@ -856,6 +856,7 @@ netdev_srv6_build_header(const struct netdev *netdev,
>>>> struct netdev_tunnel_config *tnl_cfg;
>>>> const struct in6_addr *segs;
>>>> struct srv6_base_hdr *srh;
>>>> + uint32_t ipv6_label = 0;
>>>> struct in6_addr *s;
>>>> ovs_be16 dl_type;
>>>> int err = 0;
>>>> @@ -882,7 +883,26 @@ netdev_srv6_build_header(const struct netdev *netdev,
>>>> goto out;
>>>> }
>>>>
>>>> - srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING, 0);
>>>> + switch (tnl_cfg->srv6_flowlabel) {
>>>> + case SRV6_FLOWLABEL_COPY:
>>>> + ipv6_label = ntohl(params->flow->ipv6_label);
>>>> + break;
>>>> +
>>>> + case SRV6_FLOWLABEL_ZERO:
>>>> + ipv6_label = 0;
>>>> + break;
>>>> +
>>>> + case SRV6_FLOWLABEL_COMPUTE:
>>>> + ipv6_label = flow_hash_srv6_flowlabel(params->flow, 0);
>>>> + break;
>>>> +
>>>> + default:
>>>> + err = EINVAL;
>>>> + goto out;
>>>> + }
>>>> +
>>>> + srh = netdev_tnl_ip_build_header(data, params, IPPROTO_ROUTING,
>>>> + ipv6_label);
>>>
>>> We can't really do that on the header build stage. The main reason
>>> is that we're building the header based on the packet information.
>>> In case of COPY we're using the label from the original packaet, in case
>>> of COMPUTE we're using majority of the packet header to produce the value.
>>> But if we're using packet fields, we have to add these fields to the
>>> match criteria. And that is not happening here.
>>>
>>> I didn't try, but I'm sure that if you'll look at results of ofproto/trace
>>> calls in your tests below, you'll see that even though produced actions
>>> are different, the match criteria is exactly the same. That means that
>>> when the first packet with label A will hit the datapath, the flow that
>>> pushes SRv6 header with flow label A will be added to the datapath.
>>> If the next packet will have label B it will match the installed datapath
>>> flow (because it doesn't match on the label) and the SRv6 header with
>>> label A will be pushed.
>>>
>>> Matching on the actual fields used to produce the action is necessary.
>>> However, there is a workaround. That is to generate these fileds
>>> on a header push, and not on the header build.
>>>
>>> See for example the netdev_tnl_push_udp_header() function. It is updating
>>> the UDP source port on header push. IIUC, this is for the same reason
>>> the SRv6 flow label is updated - to ensure load balancing of UDP flows
>>> in tunnels. The way update is done is by using packet's RSS hash.
>>> See the netdev_tnl_get_src_port(). By using RSS hash we can avoid
>>> re-parsing and re-hashing of the packet. And packets from the same session
>>> should normally have the same RSS hash, so that works. And we don't need
>>> to match on any extra fileds, because the hash of actual packet is used
>>> while pushing the header.
>>>
>>> Another example is netdev_gre_push_header() where we update a sequence
>>> number while pushing a header (doesn't look thread-safe though :) ).
>>>
>>> So, we should probbaly just use RSS hash as a flow label for SRv6 and
>>> set it while pushing SRv6 header (in netdev_tnl_push_ip_header()) in the
>>> COMPUTE case and parse out and use the ipv6_label of the current packet
>>> in case of COPY. The actual packet is available in the header push
>>> context. And kernel's seg6_make_flowlabel() seems to do a very similar
>>> thing. Full packet parsing will not be good for performance reasons.
>>>
>>> Does that make sense?
>>>
>>
>> Thanks very much for your kind explanation.
>> It makes sense to me.
>>
>> I would like to add a test for the flowlabel for the second and
>> subsequent packets in the packet capture by specifying options:pcap
>> for the dummy port and sending the packet with "ovs-appctl
>> netdev-dummy/receive".
>> I have tried this and it seems to work.
>
> Just make sure that you're not checking the exact hash value.
> Hashing implementations depend on compiler flags and CPU
> architecture. So, the exact value of the flowlabel will be
> different depending on these factors. For example, OVS built
> with -msse4.2 will use special CPU instructions for crc hash
> calculations. Without this flag it will use implementation
> of Murmurhash.
>
> You may find one test example for the UDP port number being
> carved out in tests/tunnel-push-pop.at by looking for
> "Source port is based on a packet hash".
>
> So, you may extract the flow label from one packet and compare
> with the other, but you can't directly compare with some
> pre-computed value.
>
Thanks for the explanation.
Instead of using pre-computed values, hash values carved out of
several packets should be compared at runtime.
>>
>> Thanks also for the UDP and GRE examples. I now understand that both
>> rewrite fields at the time of header push.
>>
>> Since header push is performed in the context of the actual packet,
>> I assume that header parsing and hashing is done for all packets.
>> For performance reasons, it makes sense to avoid parsing and hashing
>> and instead use the RSS hash. I would replace it with RSS hash.
>> And it can probably be retrieved with dp_packet_get_rss_hash().
>
> Right. That's what netdev_tnl_get_src_port() does, for example.
>
I'll implement flowlabel hash as well.
Again, thanks for your review. I'll start working on v4.
>>
>>>> srh->rt_hdr.segments_left = nr_segs - 1;
>>>> srh->rt_hdr.type = IPV6_SRCRT_TYPE_4;
>>>> srh->rt_hdr.hdrlen = 2 * nr_segs;
>>>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>>>> index 663ee8606c3b..2141621cf23e 100644
>>>> --- a/lib/netdev-vport.c
>>>> +++ b/lib/netdev-vport.c
>>>> @@ -792,6 +792,14 @@ set_tunnel_config(struct netdev *dev_, const struct
>>>> smap *args, char **errp)
>>>> name, node->value);
>>>> break;
>>>> }
>>>> + } else if (!strcmp(node->key, "srv6_flowlabel")) {
>>>> + if (!strcmp(node->value, "zero")) {
>>>> + tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_ZERO;
>>>> + } else if (!strcmp(node->value, "compute")) {
>>>> + tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COMPUTE;
>>>> + } else {
>>>> + tnl_cfg.srv6_flowlabel = SRV6_FLOWLABEL_COPY;
>>>> + }
>>>> } else if (!strcmp(node->key, "payload_type")) {
>>>> if (!strcmp(node->value, "mpls")) {
>>>> tnl_cfg.payload_ethertype = htons(ETH_TYPE_MPLS);
>>>> diff --git a/lib/netdev.h b/lib/netdev.h
>>>> index ff207f56c28c..743a56ca1629 100644
>>>> --- a/lib/netdev.h
>>>> +++ b/lib/netdev.h
>>>> @@ -97,6 +97,17 @@ enum netdev_pt_mode {
>>>> NETDEV_PT_LEGACY_L3,
>>>> };
>>>>
>>>> +enum netdev_srv6_flowlabel {
>>>> + /* Copy the flowlabel from inner packet. */
>>>> + SRV6_FLOWLABEL_COPY,
>>>> +
>>>> + /* Simply set flowlabel to 0. */
>>>> + SRV6_FLOWLABEL_ZERO,
>>>> +
>>>> + /* Calculate a hash for some fields and set the result to flowlabel.
>>>> */
>>>> + SRV6_FLOWLABEL_COMPUTE,
>>>> +};
>>>> +
>>>> /* Configuration specific to tunnels. */
>>>> struct netdev_tunnel_config {
>>>> ovs_be64 in_key;
>>>> @@ -144,6 +155,7 @@ struct netdev_tunnel_config {
>>>> uint8_t srv6_num_segs;
>>>> #define SRV6_MAX_SEGS 6
>>>> struct in6_addr srv6_segs[SRV6_MAX_SEGS];
>>>> + enum netdev_srv6_flowlabel srv6_flowlabel;
>>>> };
>>>>
>>>> void netdev_run(void);
>>>> diff --git a/tests/tunnel-push-pop-ipv6.at b/tests/tunnel-push-pop-ipv6.at
>>>> index e300fe3a0d26..33edc8319eed 100644
>>>> --- a/tests/tunnel-push-pop-ipv6.at
>>>> +++ b/tests/tunnel-push-pop-ipv6.at
>>>> @@ -1,5 +1,84 @@
>>>> AT_BANNER([tunnel_push_pop_ipv6])
>>>>
>>>> +AT_SETUP([tunnel_push_pop_ipv6 - srv6])
>>>> +
>>>> +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy
>>>> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>>>> +AT_CHECK([ovs-vsctl add-br int-br1 -- set bridge int-br1
>>>> datapath_type=dummy], [0])
>>>> +AT_CHECK([ovs-vsctl add-br int-br2 -- set bridge int-br2
>>>> datapath_type=dummy], [0])
>>>> +AT_CHECK([ovs-vsctl add-br int-br3 -- set bridge int-br3
>>>> datapath_type=dummy], [0])
>>>> +AT_CHECK([ovs-vsctl add-port int-br1 t1 -- set Interface t1 type=srv6 \
>>>> + options:remote_ip=2001:cafe::91 ofport_request=2 \
>>>> + options:srv6_flowlabel=copy \
>>>> + ], [0])
>>>> +AT_CHECK([ovs-vsctl add-port int-br2 t2 -- set Interface t2 type=srv6 \
>>>> + options:remote_ip=2001:cafe::92 ofport_request=3 \
>>>> + options:srv6_flowlabel=zero \
>>>> + ], [0])
>>>> +AT_CHECK([ovs-vsctl add-port int-br3 t3 -- set Interface t3 type=srv6 \
>>>> + options:remote_ip=2001:cafe::93 ofport_request=4 \
>>>> + options:srv6_flowlabel=compute \
>>>> + ], [0])
>>>> +
>>>> +dnl First setup dummy interface IP address, then add the route
>>>> +dnl so that tnl-port table can get valid IP address for the device.
>>>> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::88/24], [0], [OK
>>>> +])
>>>> +AT_CHECK([ovs-appctl ovs/route/add 2001:cafe::0/24 br0], [0], [OK
>>>> +])
>>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::91 aa:55:aa:55:00:01],
>>>> [0], [OK
>>>> +])
>>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::92 aa:55:aa:55:00:02],
>>>> [0], [OK
>>>> +])
>>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::93 aa:55:aa:55:00:03],
>>>> [0], [OK
>>>> +])
>>>> +AT_CHECK([ovs-ofctl add-flow br0 action=1])
>>>> +AT_CHECK([ovs-ofctl add-flow int-br1 action=2])
>>>> +AT_CHECK([ovs-ofctl add-flow int-br2 action=3])
>>>> +AT_CHECK([ovs-ofctl add-flow int-br3 action=4])
>>>> +
>>>> +dnl Check "srv6_flowlabel=copy".
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>>>> 'in_port(2),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'],
>>>> [0], [stdout])
>>>> +cat stdout
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=12345
>>>> +])
>>>> +
>>>> +dnl Check "srv6_flowlabel=zero".
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>>>> 'in_port(3),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'],
>>>> [0], [stdout])
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=0
>>>> +])
>>>> +
>>>> +dnl Check "srv6_flowlabel=compute" for IPv4 in IPv6 tunnels.
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>>>> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9)'],
>>>> [0], [stdout])
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=944785
>>>> +])
>>>> +
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>>>> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=6,tos=0,ttl=64,frag=no),tcp(src=800,dst=900)'],
>>>> [0], [stdout])
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=772289
>>>> +])
>>>> +
>>>> +dnl Check "srv6_flowlabel=compute" for IPv6 in IPv6 tunnels.
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>>>> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=0,proto=47,tclass=0x0,hlimit=64)'],
>>>> [0], [stdout])
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=468935
>>>> +])
>>>> +
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>>>> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::2,label=12345,proto=47,tclass=0x0,hlimit=64)'],
>>>> [0], [stdout])
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=1012207
>>>> +])
>>>> +
>>>> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy
>>>> 'in_port(4),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:beef::1,dst=2001:beef::3,label=0,proto=47,tclass=0x0,hlimit=64)'],
>>>> [0], [stdout])
>>>> +AT_CHECK([tail -1 stdout | grep -o 'label=[[0-9]]*'], [0], [dnl
>>>> +label=629290
>>>> +])
>>>> +
>>>> +OVS_VSWITCHD_STOP
>>>> +AT_CLEANUP
>>>> +
>>>> AT_SETUP([tunnel_push_pop_ipv6 - ip6gre])
>>>>
>>>> OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy
>>>> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>>> index edb5eafa04c3..7a0682503233 100644
>>>> --- a/vswitchd/vswitch.xml
>>>> +++ b/vswitchd/vswitch.xml
>>>> @@ -3287,6 +3287,35 @@
>>>> <ref column="options" key="remote_ip"/>.
>>>> </p>
>>>> </column>
>>>> + <column name="options" key="srv6_flowlabel"
>>>> + type='{"type": "string",
>>>> + "enum": ["set", ["zero", "copy", "compute"]]}'>
>>>> + <p>
>>>> + Optional.
>>>> + This option controls how flowlabel in outer IPv6 header is
>>>> + configured. This would give you the benefit of IPv6 flow label
>>>
>>> We should avoid pronouns like 'you' in the documentation. It should
>>> describe
>>> the functionality without trying to talk to the reader. E.g. 'It gives'
>>> instead of 'it will give you'.
>>>
>>
>> OK. I'll fix.
>>
>>>> + based load balancing, which is supported by some popular vendor
>>>> + appliances. You can choose from the following three types, as
>>>
>>> This also should be somehow re-worded to not have 'You'. If nothing works,
>>> can just be replaced with 'Users'.
>>>
>>
>> OK.
>>
>>>> + in net.ipv6.seg6_flowlabel syscall.
>>>
>>> s/syscall/sysconfig/
>>>
>>
>> OK.
>>
>>>> + </p>
>>>> + <ul>
>>>> + <li>
>>>> + By default, or if this option is <code>copy</code>, copy the
>>>> + flowlabel of inner IPv6 header to the flowlabel of outer IPv6
>>>> + header. If inner header is not IPv6, it is set to 0.
>>>> + </li>
>>>> + <li>
>>>> + If this option is <code>zero</code>, simply set flowlabel to
>>>> 0.
>>>> + </li>
>>>> + <li>
>>>> + If this option is <code>compute</code>, calculate a hash for
>>>> + some fields in inner header and set the result to flowlabel.
>>>
>>> 'some fields' is a bit strange thing to say. "hash over fields of the inner
>>> header" or smething like that. Same applie to the comment in the code.
>>>
>>
>> OK.
>>
>>>> + If inner packet is IPv6, src_ip, dst_ip, L4 proto, and
>>>> + flowlabel are the targets of hash calculation. If it is IPv4,
>>>> + src_ip, dst_ip, L4 proto, src_port, and dst_port are the
>>>> targets.
>>>
>>> I'm not sure if we can actually say which fields are going to be used
>>> if we go with RSS hash.
>>>
>>
>> I will only mention that RSS hash is used, not field names.
>>
>>>> + </li>
>>>> + </ul>
>>>> + </column>
>>>> </group>
>>>>
>>>> <group title="Patch Options">
>>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev