On 10/19/22 00:07, Jan Scheurich wrote:
> Hi guys,
>
> I am afraid that commit is too long ago that would remember any details that
> caused us to change the code in beb75a40fdc2 ("userspace: Switching of L3
> packets in L2 pipeline"). What I vaguely remember was that I couldn’t
> comprehend the original code and it was not working correctly in some of the
> cases we needed/tested. But perhaps the changes we introduced also had corner
> cases we didn't consider.
>
> Question though:
>
>>> The datapath supports installing wider flows, and OVS relies on this
>>> behavior. For example if ipv4(src=1.1.1.1/192.0.0.0,
>>> dst=1.1.1.2/192.0.0.0) exists, a wider flow (smaller mask) of
>>> ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/128.0.0.0) is allowed to be
>>> added.
>
> That sounds strange to me. I always believed the datapath only supports
> non-overlapping flows, i.e. a packet can match at most one datapath flow.
> Only with that pre-requisite the dpcls classifier can work without
> priorities. Have I been wrong in this? What would be the semantics of adding
> a wider flow to the datapath? To my knowledge there is no guarantee that the
> dpcls subtables are visited in any specific order that would honor the mask
> width. And the first match will win.
Yes, you're right that classifier is always choosing the first match,
so it can not work correctly in general if there are multiple flows
that match the same packet.
What this change is trying to achieve, IIUC, is a bit of an odd case
where we have multiple flows that match on the same packet, but one
of them is a superset of another. And these flows should have identical
actions, of course, if we want the system to still function correctly.
(Should we actually compare actions to be identical and matches to
be a superset? If actions are not correct, the flow must be soon
revalidated, so that might not be a big concern. If the mask is not
a superset, the flow_wildcards_has_extra() check should catch that
and revalidate one of them, so should not be a problem either, I guess.
But we need to check that flow mods are always using ufid generated
from a full key and not updating actions on an incorrect flow. Same
for flow deletions, we need to be sure that while removing flows we're
actually finding and removing the one that we need.)
We should actually re-check the classier and the caches to make sure
that there will not be any surprises and they are capable of dealing
with possible duplicates...
> Please clarify this. And in which sense OVS relies on this behavior?
The problem is a revalidation process. During revalidation OVS removes
too generic flows from the datapath to avoid incorrect matches, but
allows too narrow flows to stay in the datapath to avoid the dataplane
disruption and also to avoid constant flow deletions if the datapath
ignores wildcards on certain fields/bits. See flow_wildcards_has_extra()
check in the revalidate_ukey__() function.
The problem here is that we have too narrow flow installed and now
OpenFlow rules got changed, so the actual flow should be more generic.
Revalidators will not remove the narrow flow, we will eventually get
an upcall on the packet that doesn't match the narrow flow, but we will
not be able to install a more generic flow because after masking with
the new wider mask the key matches on the narrow flow, so we get EEXIST.
This particular scenario is probably not an actual problem for the
userspace datapath, because upcalls are handled in a different code
path, so the wider flow should be installed just fine alongside with
the too narrow flow (We still need to check that all parts of the
code can handle this). So, it is only for a "slow" path here where
upper layers may decide to add a wider flow via dpif_flow_put(), but
will fail.
It's more of an issue for a kernel datapath where we seem to have
a similar code inside of a kernel.
In any case, if the "fast" upcall path is able to add such flows
(I didn't verify that), the "slow" dpif_flow_put() case should work
the same way to avoid confusion and potential more serious issues in
the future.
Best regards, Ilya Maximets.
>
> BR, Jan
>
>> -----Original Message-----
>> From: Ilya Maximets <[email protected]>
>> Sent: Tuesday, 18 October 2022 21:40
>> To: Eelco Chaudron <[email protected]>; [email protected]
>> Cc: [email protected]; Jan Scheurich <[email protected]>
>> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Use unmasked key when adding
>> datapath flows.
>>
>> On 10/18/22 18:42, Eelco Chaudron wrote:
>>> The datapath supports installing wider flows, and OVS relies on this
>>> behavior. For example if ipv4(src=1.1.1.1/192.0.0.0,
>>> dst=1.1.1.2/192.0.0.0) exists, a wider flow (smaller mask) of
>>> ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/128.0.0.0) is allowed to be
>>> added.
>>>
>>> However, if we try to add a wildcard rule, the installation fails:
>>>
>>> # ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
>>> ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2 #
>>> ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
>>> ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
>>> ovs-vswitchd: updating flow table (File exists)
>>>
>>> The reason is that the key used to determine if the flow is already
>>> present in the system uses the original key ANDed with the mask.
>>> This results in the IP address not being part of the (miniflow) key,
>>> i.e., being substituted with an all-zero value. When doing the actual
>>> lookup, this results in the key wrongfully matching the first flow,
>>> and therefore the flow does not get installed. The solution is to use
>>> the unmasked key for the existence check, the same way this is handled
>>> in the userspace datapath.
>>>
>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>> ---
>>> lib/dpif-netdev.c | 33 +++++++++++++++++++++++++++++----
>>> tests/dpif-netdev.at | 14 ++++++++++++++
>>> 2 files changed, 43 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
>>> a45b46014..daa00aa2f 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -3321,6 +3321,28 @@ netdev_flow_key_init_masked(struct
>> netdev_flow_key *dst,
>>> (dst_u64 - miniflow_get_values(&dst->mf))
>>> * 8); }
>>>
>>> +/* Initializes 'dst' as a copy of 'flow'. */ static inline void
>>> +netdev_flow_key_init(struct netdev_flow_key *key,
>>> + const struct flow *flow) {
>>> + uint64_t *dst = miniflow_values(&key->mf);
>>> + uint32_t hash = 0;
>>> + uint64_t value;
>>> +
>>> + miniflow_map_init(&key->mf, flow);
>>> + miniflow_init(&key->mf, flow);
>>> +
>>> + size_t n = dst - miniflow_get_values(&key->mf);
>>> +
>>> + FLOW_FOR_EACH_IN_MAPS (value, flow, key->mf.map) {
>>> + hash = hash_add64(hash, value);
>>> + }
>>> +
>>> + key->hash = hash_finish(hash, n * 8);
>>> + key->len = netdev_flow_key_size(n); }
>>> +
>>> static inline void
>>> emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow,
>>> const struct netdev_flow_key *key) @@ -4195,7
>>> +4217,7 @@ static int dpif_netdev_flow_put(struct dpif *dpif, const
>>> struct dpif_flow_put *put) {
>>> struct dp_netdev *dp = get_dp_netdev(dpif);
>>> - struct netdev_flow_key key, mask;
>>> + struct netdev_flow_key key;
>>> struct dp_netdev_pmd_thread *pmd;
>>> struct match match;
>>> ovs_u128 ufid;
>>> @@ -4244,9 +4266,12 @@ dpif_netdev_flow_put(struct dpif *dpif, const
>>> struct dpif_flow_put *put)
>>>
>>> /* Must produce a netdev_flow_key for lookup.
>>> * Use the same method as employed to create the key when adding
>>> - * the flow to the dplcs to make sure they match. */
>>> - netdev_flow_mask_init(&mask, &match);
>>> - netdev_flow_key_init_masked(&key, &match.flow, &mask);
>>> + * the flow to the dplcs to make sure they match.
>>> + * We need to put in the unmasked key as flow_put_on_pmd() will first
>> try
>>> + * to see if an entry exists doing a packet type lookup. As masked-out
>>> + * fields are interpreted as zeros, they could falsely match a wider IP
>>> + * address mask. Installation of the flow will use the match variable.
>>> */
>>> + netdev_flow_key_init(&key, &match.flow);
>>
>>
>> Hi, Eelco. Thanks for the patch.
>> It is indeed an issue that we can lookup the incorrect flow due to new mask
>> being wider, but I'm wondering if that will cause any other issues for L3
>> traffic as you seem to mainly revert part of this previous commit:
>> beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline") The
>> commit message was tlking about something not being properly masked...
>>
>> CC: Jan, maybe you remember some details about that?
>>
>> Also, the "Use the same method" doesn't make a lot of sense with the
>> change.
>>
>> Best regards, Ilya Maximets.
>>
>>>
>>> if (put->pmd_id == PMD_ID_NULL) {
>>> if (cmap_count(&dp->poll_threads) == 0) { diff --git
>>> a/tests/dpif-netdev.at b/tests/dpif-netdev.at index
>>> 3179e1645..32054c52e 100644
>>> --- a/tests/dpif-netdev.at
>>> +++ b/tests/dpif-netdev.at
>>> @@ -636,6 +636,20 @@ OVS_VSWITCHD_STOP(["/flow: in_port is not an
>>> exact match/d /failed to put/d"]) AT_CLEANUP
>>>
>>> +AT_SETUP([dpif-netdev - check dpctl/add-flow wider ip match])
>>> +OVS_VSWITCHD_START(
>>> + [add-port br0 p1 \
>>> + -- set interface p1 type=dummy
>> options:pstream=punix:$OVS_RUNDIR/p0.sock \
>>> + -- set bridge br0 datapath-type=dummy])
>>> +
>>> +AT_CHECK([ovs-appctl revalidator/pause]) AT_CHECK([ovs-appctl
>>> +dpctl/add-flow
>>> +"in_port(1),eth_type(0x0800),ipv4(src=0.0.0.0/192.0.0.0,dst=0.0.0.0/1
>>> +92.0.0.0,frag=no)" "3"]) AT_CHECK([ovs-appctl dpctl/add-flow
>>> +"in_port(1),eth_type(0x0800),ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.1/
>>> +0.0.0.0,frag=no)" "3"]) AT_CHECK([ovs-appctl revalidator/resume])
>>> +
>>> +OVS_VSWITCHD_STOP
>>> +AT_CLEANUP
>>> +
>>> # SEND_UDP_PKTS([p_name], [p_ofport]) # # Sends 128 packets to port
>>> 'p_name' with different UDP destination ports.
>>>
>>> _______________________________________________
>>> dev mailing list
>>> [email protected]
>>> https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-
>> 45444
>>> 5555731-d4c0eb6f205e38a3&q=1&e=8e5a091e-ba71-48db-af86-
>> f51651461be7&u=
>>> https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev
>>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev