On 6 Jul 2023, at 4:32, Peng He wrote:

> Eelco Chaudron <echau...@redhat.com> 于2023年7月5日周三 22:16写道:
>
>>
>>
>> On 1 Jul 2023, at 4:43, Peng He wrote:
>>
>>> OVS allows overlapping megaflows, as long as the actions of these
>>> megaflows are equal. However, the current implementation of action
>>> modification relies on flow_lookup instead of ufid, this could result
>>> in looking up a wrong megaflow and make the ukeys and megaflows
>> inconsistent
>>>
>>> Just like the test case in the patch, at first we have a rule with the
>>> prefix:
>>>
>>> 10.1.2.0/24
>>>
>>> and we will get a megaflow with prefixes 10.1.2.2/24 when a packet with
>> IP
>>> 10.1.2.2 is received.
>>>
>>> Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep
>> the
>>> 10.1.2.2/24 megaflow and just changes its action instead of extending
>>> the prefix into 10.1.2.2/16.
>>>
>>> then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
>>> this time, we will have an overlapping megaflow with the right prefix:
>>> 10.1.0.2/16
>>>
>>> now we have two megaflows:
>>> 10.1.2.2/24
>>> 10.1.0.2/16
>>>
>>> last, suppose we have changed the ruleset again. The revalidator this
>>> time still decides to change the actions of both megaflows instead of
>>> deleting them.
>>>
>>> The dpif_netdev_flow_put will search the megaflow to modify with unmasked
>>> keys, however it might lookup the wrong megaflow as the key 10.1.2.2
>> matches
>>> both 10.1.2.2/24 and 10.1.0.2/16!
>>>
>>> This patch changes the megaflow lookup code in modification path into
>>> relying the ufid to find the correct megaflow instead of key lookup.
>>
>> Thanks for fixing Ilya’s comments! I’ve also copied in some of the v3
>> discussion, so we can wrap it up here.
>>
>> Acked-by: Eelco Chaudron <echau...@redhat.com>
>>
>> //Eelco
>>
>>> Signed-off-by: Peng He <hepeng.0...@bytedance.com>
>>> ---
>>>  lib/dpif-netdev.c | 62 ++++++++++++++++++++++++++---------------------
>>>  tests/pmd.at      | 48 ++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 82 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index 70b953ae6..b90ed1542 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -4198,36 +4198,43 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>>>          memset(stats, 0, sizeof *stats);
>>>      }
>>>
>>> -    ovs_mutex_lock(&pmd->flow_mutex);
>>> -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
>>> -    if (!netdev_flow) {
>>> -        if (put->flags & DPIF_FP_CREATE) {
>>> +    if (put->flags & DPIF_FP_CREATE) {
>>
>>
>> EC> Should this not be:
>> EC>
>> EC> if (put->flags & DPIF_FP_CREATE && !(put->flags & DPIF_FP_MODIFY))
>> EC>
>> EC> I know this will break the fix, but I’m wondering what the possible
>> EC> combinations are.
>> EC> Quickly looking at the code the following ones seem to exist:
>> EC>
>> EC> DPIF_FP_CREATE  -> Create a flow, if one exists return EEXIST
>> EC> DPIF_FP_MODIFY  -> Modify a flow, if it does not exist return ENONT
>> EC> DPIF_FP_CREATE | DPIF_FP_MODIFY   -> If it exists modify it, if it does
>> EC> not exists create it.
>> EC>
>> EC> Could the last combination not potentially fail the lookup?
>> EC>
>> EC> Or are we assuming only standalone DPIF_FP_MODIFY’s are the problem? In
>> EC> theory, it could also be the combination.
>> EC>
>>
>> PH> sorry, I was wrong. Such a combination does exist in
>> PH>the dpif_probe_feature, and it probes the datapath by trying to put
>> flows:
>> PH>
>> PH>     error = dpif_flow_put(dpif, DPIF_FP_CREATE | DPIF_FP_MODIFY |
>> PH>DPIF_FP_PROBE,
>> PH>                           key->data, key->size, NULL, 0,
>> PH>                           nl_actions, nl_actions_size,
>> PH>                           ufid, NON_PMD_CORE_ID, NULL);
>>
>> PH> so we CANNOT change the code into:
>>
>> PH> if (put->flags & DPIF_FP_CREATE && !(put->flags & DPIF_FP_MODIFY))
>>
>> PH> as the issue the patch tries to fix only exist in MODIFY alone path.
>> PH> If CREATE bit is set, we need to go through the CREATE path no matter
>> if
>> PH> MODIFY exist or not.
>>
>> Ok, if this is only used by the probe function we should be fine. I did
>> quickly search the code and it seems to be the way. However, if it’s ever
>> used by any part of ovs with both flags set, it might fail the lookup and
>> we run into the same problem.
>>
>
> By "the same problem", you mean the problem listed in this patch or this
> fix might lead to a potential failure in other part of OVS?

If you have a request with both DPIF_FP_MODIFY and DPIF_FP_CREATE set, the 
lookup could still return the wrong entry. But if this is not used it should be 
fine for now.

>
>>
>>> +        ovs_mutex_lock(&pmd->flow_mutex);
>>> +        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
>>> +        if (!netdev_flow) {
>>>              dp_netdev_flow_add(pmd, match, ufid, put->actions,
>>>                                 put->actions_len, ODPP_NONE);
>>>          } else {
>>> -            error = ENOENT;
>>> +            error = EEXIST;
>>>          }
>>> +        ovs_mutex_unlock(&pmd->flow_mutex);
>>>      } else {
>>> +        netdev_flow = dp_netdev_pmd_find_flow(pmd, ufid,
>>> +                                              put->key, put->key_len);
>>> +
>>>          if (put->flags & DPIF_FP_MODIFY) {
>>> -            struct dp_netdev_actions *new_actions;
>>> -            struct dp_netdev_actions *old_actions;
>>> +            if (!netdev_flow) {
>>> +                error = ENOENT;
>>> +            } else {
>>> +                struct dp_netdev_actions *new_actions;
>>> +                struct dp_netdev_actions *old_actions;
>>>
>>> -            new_actions = dp_netdev_actions_create(put->actions,
>>> -                                                   put->actions_len);
>>> +                new_actions = dp_netdev_actions_create(put->actions,
>>> +
>>  put->actions_len);
>>>
>>> -            old_actions = dp_netdev_flow_get_actions(netdev_flow);
>>> -            ovsrcu_set(&netdev_flow->actions, new_actions);
>>> +                old_actions = dp_netdev_flow_get_actions(netdev_flow);
>>> +                ovsrcu_set(&netdev_flow->actions, new_actions);
>>>
>>> -            queue_netdev_flow_put(pmd, netdev_flow, match,
>>> -                                  put->actions, put->actions_len,
>>> -                                  DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
>>> -            log_netdev_flow_change(netdev_flow, match, old_actions,
>>> -                                   put->actions, put->actions_len);
>>> +                queue_netdev_flow_put(pmd, netdev_flow, match,
>>> +                                      put->actions, put->actions_len,
>>> +                                      DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
>>> +                log_netdev_flow_change(netdev_flow, match, old_actions,
>>> +                                       put->actions, put->actions_len);
>>>
>>> -            if (stats) {
>>> -                get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
>>> -            }
>>> -            if (put->flags & DPIF_FP_ZERO_STATS) {
>>> +                if (stats) {
>>> +                    get_dpif_flow_status(pmd->dp, netdev_flow, stats,
>> NULL);
>>> +                }
>>> +                if (put->flags & DPIF_FP_ZERO_STATS) {
>>>                  /* XXX: The userspace datapath uses thread local
>> statistics
>>>                   * (for flows), which should be updated only by the
>> owning
>>>                   * thread.  Since we cannot write on stats memory here,
>>> @@ -4237,18 +4244,17 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>>>                   * - Should the need arise, this operation can be
>> implemented
>>>                   *   by keeping a base value (to be update here) for
>> each
>>>                   *   counter, and subtracting it before outputting the
>> stats */
>>> -                error = EOPNOTSUPP;
>>> +                    error = EOPNOTSUPP;
>>> +                }
>>> +                ovsrcu_postpone(dp_netdev_actions_free, old_actions);
>>>              }
>>> -
>>> -            ovsrcu_postpone(dp_netdev_actions_free, old_actions);
>>> -        } else if (put->flags & DPIF_FP_CREATE) {
>>> -            error = EEXIST;
>>>          } else {
>>> -            /* Overlapping flow. */
>>> -            error = EINVAL;
>>> +            if (netdev_flow) {
>>> +                /* Overlapping flow. */
>>> +                error = EINVAL;
>>> +            }
>>>          }
>>>      }
>>> -    ovs_mutex_unlock(&pmd->flow_mutex);
>>>      return error;
>>>  }
>>>
>>> diff --git a/tests/pmd.at b/tests/pmd.at
>>> index 48f3d432d..85211a30f 100644
>>> --- a/tests/pmd.at
>>> +++ b/tests/pmd.at
>>> @@ -1300,3 +1300,51 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM
>> ovs-vswitchd.log | grep "PMD load based sleeps
>>>
>>>  OVS_VSWITCHD_STOP
>>>  AT_CLEANUP
>>> +
>>> +AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
>>> +
>>> +OVS_VSWITCHD_START(
>>> +[add-port br0 p1 \
>>> +   -- set interface p1 type=dummy-pmd \
>>> +   -- set bridge br0 datapath-type=dummy \
>>> +   -- add-port br0 p2 \
>>> +   -- set interface p2 type=dummy-pmd --
>>> +])
>>> +
>>> +dnl Add one openflow rule and generate a megaflow.
>>> +AT_CHECK([ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=
>> 10.1.2.0/24,actions=p2'])
>>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
>> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
>>> +
>>> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'],
>> [0], [
>>>
>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
>> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2
>>> +])
>>> +
>>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
>> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
>>> +dnl Replace openflow rules, trigger the revalidation.
>>> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16
>> actions=ct(commit)' | dnl
>>> +ovs-ofctl --bundle replace-flows br0 -])
>>> +AT_CHECK([ovs-appctl revalidator/wait])
>>> +
>>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
>> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
>>> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
>> strip_xout_keep_actions], [0], [
>>>
>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
>> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never,
>> actions:ct(commit)
>>>
>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
>> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s,
>> actions:ct(commit)
>>> +])
>>> +
>>> +dnl Hold the prefix 10.1.2.2/24 by another 10s.
>>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
>> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
>>> +dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24
>> tuple in pvector of subtable.
>>> +for i in `seq 0 256`;do
>>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
>> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
>>> +done
>>> +
>>> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2' |
>> dnl
>>> +ovs-ofctl --bundle replace-flows br0 -])
>>> +
>>> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
>> strip_xout_keep_actions], [0], [
>>>
>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
>> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
>>>
>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
>> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
>>> +])
>>> +
>>> +OVS_VSWITCHD_STOP
>>> +AT_CLEANUP
>>> --
>>> 2.25.1
>>
>>
>
> -- 
> hepeng

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to