Eelco Chaudron <echau...@redhat.com> 于2023年7月6日周四 15:52写道:

>
>
> On 6 Jul 2023, at 4:32, Peng He wrote:
>
> > Eelco Chaudron <echau...@redhat.com> 于2023年7月5日周三 22:16写道:
> >
> >>
> >>
> >> On 1 Jul 2023, at 4:43, Peng He wrote:
> >>
> >>> OVS allows overlapping megaflows, as long as the actions of these
> >>> megaflows are equal. However, the current implementation of action
> >>> modification relies on flow_lookup instead of ufid, this could result
> >>> in looking up a wrong megaflow and make the ukeys and megaflows
> >> inconsistent
> >>>
> >>> Just like the test case in the patch, at first we have a rule with the
> >>> prefix:
> >>>
> >>> 10.1.2.0/24
> >>>
> >>> and we will get a megaflow with prefixes 10.1.2.2/24 when a packet
> with
> >> IP
> >>> 10.1.2.2 is received.
> >>>
> >>> Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep
> >> the
> >>> 10.1.2.2/24 megaflow and just changes its action instead of extending
> >>> the prefix into 10.1.2.2/16.
> >>>
> >>> then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
> >>> this time, we will have an overlapping megaflow with the right prefix:
> >>> 10.1.0.2/16
> >>>
> >>> now we have two megaflows:
> >>> 10.1.2.2/24
> >>> 10.1.0.2/16
> >>>
> >>> last, suppose we have changed the ruleset again. The revalidator this
> >>> time still decides to change the actions of both megaflows instead of
> >>> deleting them.
> >>>
> >>> The dpif_netdev_flow_put will search the megaflow to modify with
> unmasked
> >>> keys, however it might lookup the wrong megaflow as the key 10.1.2.2
> >> matches
> >>> both 10.1.2.2/24 and 10.1.0.2/16!
> >>>
> >>> This patch changes the megaflow lookup code in modification path into
> >>> relying the ufid to find the correct megaflow instead of key lookup.
> >>
> >> Thanks for fixing Ilya’s comments! I’ve also copied in some of the v3
> >> discussion, so we can wrap it up here.
> >>
> >> Acked-by: Eelco Chaudron <echau...@redhat.com>
> >>
> >> //Eelco
> >>
> >>> Signed-off-by: Peng He <hepeng.0...@bytedance.com>
> >>> ---
> >>>  lib/dpif-netdev.c | 62 ++++++++++++++++++++++++++---------------------
> >>>  tests/pmd.at      | 48 ++++++++++++++++++++++++++++++++++++
> >>>  2 files changed, 82 insertions(+), 28 deletions(-)
> >>>
> >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >>> index 70b953ae6..b90ed1542 100644
> >>> --- a/lib/dpif-netdev.c
> >>> +++ b/lib/dpif-netdev.c
> >>> @@ -4198,36 +4198,43 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread
> *pmd,
> >>>          memset(stats, 0, sizeof *stats);
> >>>      }
> >>>
> >>> -    ovs_mutex_lock(&pmd->flow_mutex);
> >>> -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> >>> -    if (!netdev_flow) {
> >>> -        if (put->flags & DPIF_FP_CREATE) {
> >>> +    if (put->flags & DPIF_FP_CREATE) {
> >>
> >>
> >> EC> Should this not be:
> >> EC>
> >> EC> if (put->flags & DPIF_FP_CREATE && !(put->flags & DPIF_FP_MODIFY))
> >> EC>
> >> EC> I know this will break the fix, but I’m wondering what the possible
> >> EC> combinations are.
> >> EC> Quickly looking at the code the following ones seem to exist:
> >> EC>
> >> EC> DPIF_FP_CREATE  -> Create a flow, if one exists return EEXIST
> >> EC> DPIF_FP_MODIFY  -> Modify a flow, if it does not exist return ENONT
> >> EC> DPIF_FP_CREATE | DPIF_FP_MODIFY   -> If it exists modify it, if it
> does
> >> EC> not exists create it.
> >> EC>
> >> EC> Could the last combination not potentially fail the lookup?
> >> EC>
> >> EC> Or are we assuming only standalone DPIF_FP_MODIFY’s are the
> problem? In
> >> EC> theory, it could also be the combination.
> >> EC>
> >>
> >> PH> sorry, I was wrong. Such a combination does exist in
> >> PH>the dpif_probe_feature, and it probes the datapath by trying to put
> >> flows:
> >> PH>
> >> PH>     error = dpif_flow_put(dpif, DPIF_FP_CREATE | DPIF_FP_MODIFY |
> >> PH>DPIF_FP_PROBE,
> >> PH>                           key->data, key->size, NULL, 0,
> >> PH>                           nl_actions, nl_actions_size,
> >> PH>                           ufid, NON_PMD_CORE_ID, NULL);
> >>
> >> PH> so we CANNOT change the code into:
> >>
> >> PH> if (put->flags & DPIF_FP_CREATE && !(put->flags & DPIF_FP_MODIFY))
> >>
> >> PH> as the issue the patch tries to fix only exist in MODIFY alone path.
> >> PH> If CREATE bit is set, we need to go through the CREATE path no
> matter
> >> if
> >> PH> MODIFY exist or not.
> >>
> >> Ok, if this is only used by the probe function we should be fine. I did
> >> quickly search the code and it seems to be the way. However, if it’s
> ever
> >> used by any part of ovs with both flags set, it might fail the lookup
> and
> >> we run into the same problem.
> >>
> >
> > By "the same problem", you mean the problem listed in this patch or this
> > fix might lead to a potential failure in other part of OVS?
>
> If you have a request with both DPIF_FP_MODIFY and DPIF_FP_CREATE set, the
> lookup could still return the wrong entry. But if this is not used it
> should be fine for now.
>


OK. get.

>
> >
> >>
> >>> +        ovs_mutex_lock(&pmd->flow_mutex);
> >>> +        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> >>> +        if (!netdev_flow) {
> >>>              dp_netdev_flow_add(pmd, match, ufid, put->actions,
> >>>                                 put->actions_len, ODPP_NONE);
> >>>          } else {
> >>> -            error = ENOENT;
> >>> +            error = EEXIST;
> >>>          }
> >>> +        ovs_mutex_unlock(&pmd->flow_mutex);
> >>>      } else {
> >>> +        netdev_flow = dp_netdev_pmd_find_flow(pmd, ufid,
> >>> +                                              put->key, put->key_len);
> >>> +
> >>>          if (put->flags & DPIF_FP_MODIFY) {
> >>> -            struct dp_netdev_actions *new_actions;
> >>> -            struct dp_netdev_actions *old_actions;
> >>> +            if (!netdev_flow) {
> >>> +                error = ENOENT;
> >>> +            } else {
> >>> +                struct dp_netdev_actions *new_actions;
> >>> +                struct dp_netdev_actions *old_actions;
> >>>
> >>> -            new_actions = dp_netdev_actions_create(put->actions,
> >>> -                                                   put->actions_len);
> >>> +                new_actions = dp_netdev_actions_create(put->actions,
> >>> +
> >>  put->actions_len);
> >>>
> >>> -            old_actions = dp_netdev_flow_get_actions(netdev_flow);
> >>> -            ovsrcu_set(&netdev_flow->actions, new_actions);
> >>> +                old_actions = dp_netdev_flow_get_actions(netdev_flow);
> >>> +                ovsrcu_set(&netdev_flow->actions, new_actions);
> >>>
> >>> -            queue_netdev_flow_put(pmd, netdev_flow, match,
> >>> -                                  put->actions, put->actions_len,
> >>> -                                  DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
> >>> -            log_netdev_flow_change(netdev_flow, match, old_actions,
> >>> -                                   put->actions, put->actions_len);
> >>> +                queue_netdev_flow_put(pmd, netdev_flow, match,
> >>> +                                      put->actions, put->actions_len,
> >>> +                                      DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
> >>> +                log_netdev_flow_change(netdev_flow, match,
> old_actions,
> >>> +                                       put->actions,
> put->actions_len);
> >>>
> >>> -            if (stats) {
> >>> -                get_dpif_flow_status(pmd->dp, netdev_flow, stats,
> NULL);
> >>> -            }
> >>> -            if (put->flags & DPIF_FP_ZERO_STATS) {
> >>> +                if (stats) {
> >>> +                    get_dpif_flow_status(pmd->dp, netdev_flow, stats,
> >> NULL);
> >>> +                }
> >>> +                if (put->flags & DPIF_FP_ZERO_STATS) {
> >>>                  /* XXX: The userspace datapath uses thread local
> >> statistics
> >>>                   * (for flows), which should be updated only by the
> >> owning
> >>>                   * thread.  Since we cannot write on stats memory
> here,
> >>> @@ -4237,18 +4244,17 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread
> *pmd,
> >>>                   * - Should the need arise, this operation can be
> >> implemented
> >>>                   *   by keeping a base value (to be update here) for
> >> each
> >>>                   *   counter, and subtracting it before outputting the
> >> stats */
> >>> -                error = EOPNOTSUPP;
> >>> +                    error = EOPNOTSUPP;
> >>> +                }
> >>> +                ovsrcu_postpone(dp_netdev_actions_free, old_actions);
> >>>              }
> >>> -
> >>> -            ovsrcu_postpone(dp_netdev_actions_free, old_actions);
> >>> -        } else if (put->flags & DPIF_FP_CREATE) {
> >>> -            error = EEXIST;
> >>>          } else {
> >>> -            /* Overlapping flow. */
> >>> -            error = EINVAL;
> >>> +            if (netdev_flow) {
> >>> +                /* Overlapping flow. */
> >>> +                error = EINVAL;
> >>> +            }
> >>>          }
> >>>      }
> >>> -    ovs_mutex_unlock(&pmd->flow_mutex);
> >>>      return error;
> >>>  }
> >>>
> >>> diff --git a/tests/pmd.at b/tests/pmd.at
> >>> index 48f3d432d..85211a30f 100644
> >>> --- a/tests/pmd.at
> >>> +++ b/tests/pmd.at
> >>> @@ -1300,3 +1300,51 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM
> >> ovs-vswitchd.log | grep "PMD load based sleeps
> >>>
> >>>  OVS_VSWITCHD_STOP
> >>>  AT_CLEANUP
> >>> +
> >>> +AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
> >>> +
> >>> +OVS_VSWITCHD_START(
> >>> +[add-port br0 p1 \
> >>> +   -- set interface p1 type=dummy-pmd \
> >>> +   -- set bridge br0 datapath-type=dummy \
> >>> +   -- add-port br0 p2 \
> >>> +   -- set interface p2 type=dummy-pmd --
> >>> +])
> >>> +
> >>> +dnl Add one openflow rule and generate a megaflow.
> >>> +AT_CHECK([ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=
> >> 10.1.2.0/24,actions=p2'])
> >>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> >> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
> >>> +
> >>> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'],
> >> [0], [
> >>>
> >>
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> >> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never,
> actions:2
> >>> +])
> >>> +
> >>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> >> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
> >>> +dnl Replace openflow rules, trigger the revalidation.
> >>> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16
> >> actions=ct(commit)' | dnl
> >>> +ovs-ofctl --bundle replace-flows br0 -])
> >>> +AT_CHECK([ovs-appctl revalidator/wait])
> >>> +
> >>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> >> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
> >>> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
> >> strip_xout_keep_actions], [0], [
> >>>
> >>
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> >> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never,
> >> actions:ct(commit)
> >>>
> >>
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> >> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s,
> >> actions:ct(commit)
> >>> +])
> >>> +
> >>> +dnl Hold the prefix 10.1.2.2/24 by another 10s.
> >>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> >> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
> >>> +dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24
> >> tuple in pvector of subtable.
> >>> +for i in `seq 0 256`;do
> >>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> >> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
> >>> +done
> >>> +
> >>> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2'
> |
> >> dnl
> >>> +ovs-ofctl --bundle replace-flows br0 -])
> >>> +
> >>> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
> >> strip_xout_keep_actions], [0], [
> >>>
> >>
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> >> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> >>>
> >>
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> >> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s,
> actions:2
> >>> +])
> >>> +
> >>> +OVS_VSWITCHD_STOP
> >>> +AT_CLEANUP
> >>> --
> >>> 2.25.1
> >>
> >>
> >
> > --
> > hepeng
>
>

-- 
hepeng
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to