On 19 Oct 2022, at 1:08, Ilya Maximets wrote:
> On 10/19/22 00:07, Jan Scheurich wrote:
>> Hi guys,
>>
>> I am afraid that commit is too long ago that would remember any details that
>> caused us to change the code in beb75a40fdc2 ("userspace: Switching of L3
>> packets in L2 pipeline"). What I vaguely remember was that I couldn’t
>> comprehend the original code and it was not working correctly in some of the
>> cases we needed/tested. But perhaps the changes we introduced also had
>> corner cases we didn't consider.
First of all, thanks Ilya, for following this up while I was asleep and digging
into who made the change :) I was trying to get the patch out quickly for some
feedback and forgot this step :(
My change does not seem to cause any failures in the test cases you changed.
But it looks like you did not have any specific test cases for the problem you
were trying to solve.
>> Question though:
>>
>>>> The datapath supports installing wider flows, and OVS relies on this
>>>> behavior. For example if ipv4(src=1.1.1.1/192.0.0.0,
>>>> dst=1.1.1.2/192.0.0.0) exists, a wider flow (smaller mask) of
>>>> ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/128.0.0.0) is allowed to be
>>>> added.
>>
>> That sounds strange to me. I always believed the datapath only supports
>> non-overlapping flows, i.e. a packet can match at most one datapath flow.
>> Only with that pre-requisite the dpcls classifier can work without
>> priorities. Have I been wrong in this? What would be the semantics of adding
>> a wider flow to the datapath? To my knowledge there is no guarantee that the
>> dpcls subtables are visited in any specific order that would honor the mask
>> width. And the first match will win.
>
> Yes, you're right that classifier is always choosing the first match,
> so it can not work correctly in general if there are multiple flows
> that match the same packet.
>
> What this change is trying to achieve, IIUC, is a bit of an odd case
> where we have multiple flows that match on the same packet, but one
> of them is a superset of another. And these flows should have identical
> actions, of course, if we want the system to still function correctly.
> (Should we actually compare actions to be identical and matches to
> be a superset? If actions are not correct, the flow must be soon
> revalidated, so that might not be a big concern. If the mask is not
> a superset, the flow_wildcards_has_extra() check should catch that
> and revalidate one of them, so should not be a problem either, I guess.
> But we need to check that flow mods are always using ufid generated
> from a full key and not updating actions on an incorrect flow. Same
> for flow deletions, we need to be sure that while removing flows we're
> actually finding and removing the one that we need.)
Let me take another look at this delete code, as the mod code uses the same
path as add, so that should be taken care of.
> We should actually re-check the classier and the caches to make sure
> that there will not be any surprises and they are capable of dealing
> with possible duplicates...
I did take a brief look at it and could not find any red flags, but if you have
something specific in mind please let me know and I recheck.
Also as the current PMD datapath code already uses it in this way (see below),
we might have caught issues earlier if the exist.
>> Please clarify this. And in which sense OVS relies on this behavior?
>
> The problem is a revalidation process. During revalidation OVS removes
> too generic flows from the datapath to avoid incorrect matches, but
> allows too narrow flows to stay in the datapath to avoid the dataplane
> disruption and also to avoid constant flow deletions if the datapath
> ignores wildcards on certain fields/bits. See flow_wildcards_has_extra()
> check in the revalidate_ukey__() function.
>
> The problem here is that we have too narrow flow installed and now
> OpenFlow rules got changed, so the actual flow should be more generic.
> Revalidators will not remove the narrow flow, we will eventually get
> an upcall on the packet that doesn't match the narrow flow, but we will
> not be able to install a more generic flow because after masking with
> the new wider mask the key matches on the narrow flow, so we get EEXIST.
>
> This particular scenario is probably not an actual problem for the
> userspace datapath, because upcalls are handled in a different code
> path, so the wider flow should be installed just fine alongside with
> the too narrow flow (We still need to check that all parts of the
> code can handle this). So, it is only for a "slow" path here where
> upper layers may decide to add a wider flow via dpif_flow_put(), but
> will fail.
Thats why I had the remark in the “the same way this is handled the userspace
datapath.” in the commit message.
So the dpif_netdev_flow_put() seems to be only used in the dummy datapath used
by all the test cases, but the actual dpif-netdev path seems to use the
handle_packet_upcall().
Where in handle_packet_upcall() it will do a dp_netdev_pmd_lookup_flow() using
the unmasked packet key (like my change below), and then does the
dp_netdev_flow_add() using the masked ukey.
Jan, I’m wondering why you only changed the dpif_netdev_flow_put() and not the
PMD path, as this is probably the path used in real life. Or is there some odd
kernel tunnel combination that is causing this path to be utilized? If so, do
you have a test case you can run on this change? The default dp make checks all
pass.
> It's more of an issue for a kernel datapath where we seem to have
> a similar code inside of a kernel.
Yes, I do have a kernel patch ready also for this. But I waited because I
wanted to get the userspace patch in/discussed first (before someone in the
kernel does a quick ack and commit ;)
> In any case, if the "fast" upcall path is able to add such flows
> (I didn't verify that), the "slow" dpif_flow_put() case should work
> the same way to avoid confusion and potential more serious issues in
> the future.
See above, I can confirm the fast path works similar to this change, i.e. using
the packet ukey.
> Best regards, Ilya Maximets.
>
>>
>> BR, Jan
>>
>>> -----Original Message-----
>>> From: Ilya Maximets <[email protected]>
>>> Sent: Tuesday, 18 October 2022 21:40
>>> To: Eelco Chaudron <[email protected]>; [email protected]
>>> Cc: [email protected]; Jan Scheurich <[email protected]>
>>> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Use unmasked key when adding
>>> datapath flows.
>>>
>>> On 10/18/22 18:42, Eelco Chaudron wrote:
>>>> The datapath supports installing wider flows, and OVS relies on this
>>>> behavior. For example if ipv4(src=1.1.1.1/192.0.0.0,
>>>> dst=1.1.1.2/192.0.0.0) exists, a wider flow (smaller mask) of
>>>> ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/128.0.0.0) is allowed to be
>>>> added.
>>>>
>>>> However, if we try to add a wildcard rule, the installation fails:
>>>>
>>>> # ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
>>>> ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2 #
>>>> ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
>>>> ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
>>>> ovs-vswitchd: updating flow table (File exists)
>>>>
>>>> The reason is that the key used to determine if the flow is already
>>>> present in the system uses the original key ANDed with the mask.
>>>> This results in the IP address not being part of the (miniflow) key,
>>>> i.e., being substituted with an all-zero value. When doing the actual
>>>> lookup, this results in the key wrongfully matching the first flow,
>>>> and therefore the flow does not get installed. The solution is to use
>>>> the unmasked key for the existence check, the same way this is handled
>>>> in the userspace datapath.
>>>>
>>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>>> ---
>>>> lib/dpif-netdev.c | 33 +++++++++++++++++++++++++++++----
>>>> tests/dpif-netdev.at | 14 ++++++++++++++
>>>> 2 files changed, 43 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
>>>> a45b46014..daa00aa2f 100644
>>>> --- a/lib/dpif-netdev.c
>>>> +++ b/lib/dpif-netdev.c
>>>> @@ -3321,6 +3321,28 @@ netdev_flow_key_init_masked(struct
>>> netdev_flow_key *dst,
>>>> (dst_u64 - miniflow_get_values(&dst->mf))
>>>> * 8); }
>>>>
>>>> +/* Initializes 'dst' as a copy of 'flow'. */ static inline void
>>>> +netdev_flow_key_init(struct netdev_flow_key *key,
>>>> + const struct flow *flow) {
>>>> + uint64_t *dst = miniflow_values(&key->mf);
>>>> + uint32_t hash = 0;
>>>> + uint64_t value;
>>>> +
>>>> + miniflow_map_init(&key->mf, flow);
>>>> + miniflow_init(&key->mf, flow);
>>>> +
>>>> + size_t n = dst - miniflow_get_values(&key->mf);
>>>> +
>>>> + FLOW_FOR_EACH_IN_MAPS (value, flow, key->mf.map) {
>>>> + hash = hash_add64(hash, value);
>>>> + }
>>>> +
>>>> + key->hash = hash_finish(hash, n * 8);
>>>> + key->len = netdev_flow_key_size(n); }
>>>> +
>>>> static inline void
>>>> emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow,
>>>> const struct netdev_flow_key *key) @@ -4195,7
>>>> +4217,7 @@ static int dpif_netdev_flow_put(struct dpif *dpif, const
>>>> struct dpif_flow_put *put) {
>>>> struct dp_netdev *dp = get_dp_netdev(dpif);
>>>> - struct netdev_flow_key key, mask;
>>>> + struct netdev_flow_key key;
>>>> struct dp_netdev_pmd_thread *pmd;
>>>> struct match match;
>>>> ovs_u128 ufid;
>>>> @@ -4244,9 +4266,12 @@ dpif_netdev_flow_put(struct dpif *dpif, const
>>>> struct dpif_flow_put *put)
>>>>
>>>> /* Must produce a netdev_flow_key for lookup.
>>>> * Use the same method as employed to create the key when adding
>>>> - * the flow to the dplcs to make sure they match. */
>>>> - netdev_flow_mask_init(&mask, &match);
>>>> - netdev_flow_key_init_masked(&key, &match.flow, &mask);
>>>> + * the flow to the dplcs to make sure they match.
>>>> + * We need to put in the unmasked key as flow_put_on_pmd() will first
>>> try
>>>> + * to see if an entry exists doing a packet type lookup. As masked-out
>>>> + * fields are interpreted as zeros, they could falsely match a wider
>>>> IP
>>>> + * address mask. Installation of the flow will use the match
>>>> variable. */
>>>> + netdev_flow_key_init(&key, &match.flow);
>>>
>>>
>>> Hi, Eelco. Thanks for the patch.
>>> It is indeed an issue that we can lookup the incorrect flow due to new mask
>>> being wider, but I'm wondering if that will cause any other issues for L3
>>> traffic as you seem to mainly revert part of this previous commit:
>>> beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline") The
>>> commit message was tlking about something not being properly masked...
>>>
>>> CC: Jan, maybe you remember some details about that?
>>>
>>> Also, the "Use the same method" doesn't make a lot of sense with the
>>> change.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>>>
>>>> if (put->pmd_id == PMD_ID_NULL) {
>>>> if (cmap_count(&dp->poll_threads) == 0) { diff --git
>>>> a/tests/dpif-netdev.at b/tests/dpif-netdev.at index
>>>> 3179e1645..32054c52e 100644
>>>> --- a/tests/dpif-netdev.at
>>>> +++ b/tests/dpif-netdev.at
>>>> @@ -636,6 +636,20 @@ OVS_VSWITCHD_STOP(["/flow: in_port is not an
>>>> exact match/d /failed to put/d"]) AT_CLEANUP
>>>>
>>>> +AT_SETUP([dpif-netdev - check dpctl/add-flow wider ip match])
>>>> +OVS_VSWITCHD_START(
>>>> + [add-port br0 p1 \
>>>> + -- set interface p1 type=dummy
>>> options:pstream=punix:$OVS_RUNDIR/p0.sock \
>>>> + -- set bridge br0 datapath-type=dummy])
>>>> +
>>>> +AT_CHECK([ovs-appctl revalidator/pause]) AT_CHECK([ovs-appctl
>>>> +dpctl/add-flow
>>>> +"in_port(1),eth_type(0x0800),ipv4(src=0.0.0.0/192.0.0.0,dst=0.0.0.0/1
>>>> +92.0.0.0,frag=no)" "3"]) AT_CHECK([ovs-appctl dpctl/add-flow
>>>> +"in_port(1),eth_type(0x0800),ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.1/
>>>> +0.0.0.0,frag=no)" "3"]) AT_CHECK([ovs-appctl revalidator/resume])
>>>> +
>>>> +OVS_VSWITCHD_STOP
>>>> +AT_CLEANUP
>>>> +
>>>> # SEND_UDP_PKTS([p_name], [p_ofport]) # # Sends 128 packets to port
>>>> 'p_name' with different UDP destination ports.
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> [email protected]
>>>> https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-
>>> 45444
>>>> 5555731-d4c0eb6f205e38a3&q=1&e=8e5a091e-ba71-48db-af86-
>>> f51651461be7&u=
>>>> https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev
>>>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev