On 6 Jul 2023, at 4:32, Peng He wrote:
> Eelco Chaudron <echau...@redhat.com> 于2023年7月5日周三 22:16写道: > >> >> >> On 1 Jul 2023, at 4:43, Peng He wrote: >> >>> OVS allows overlapping megaflows, as long as the actions of these >>> megaflows are equal. However, the current implementation of action >>> modification relies on flow_lookup instead of ufid, this could result >>> in looking up a wrong megaflow and make the ukeys and megaflows >> inconsistent >>> >>> Just like the test case in the patch, at first we have a rule with the >>> prefix: >>> >>> 10.1.2.0/24 >>> >>> and we will get a megaflow with prefixes 10.1.2.2/24 when a packet with >> IP >>> 10.1.2.2 is received. >>> >>> Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep >> the >>> 10.1.2.2/24 megaflow and just changes its action instead of extending >>> the prefix into 10.1.2.2/16. >>> >>> then suppose we have a 10.1.0.2 packet, since it misses the megaflow, >>> this time, we will have an overlapping megaflow with the right prefix: >>> 10.1.0.2/16 >>> >>> now we have two megaflows: >>> 10.1.2.2/24 >>> 10.1.0.2/16 >>> >>> last, suppose we have changed the ruleset again. The revalidator this >>> time still decides to change the actions of both megaflows instead of >>> deleting them. >>> >>> The dpif_netdev_flow_put will search the megaflow to modify with unmasked >>> keys, however it might lookup the wrong megaflow as the key 10.1.2.2 >> matches >>> both 10.1.2.2/24 and 10.1.0.2/16! >>> >>> This patch changes the megaflow lookup code in modification path into >>> relying the ufid to find the correct megaflow instead of key lookup. >> >> Thanks for fixing Ilya’s comments! I’ve also copied in some of the v3 >> discussion, so we can wrap it up here. >> >> Acked-by: Eelco Chaudron <echau...@redhat.com> >> >> //Eelco >> >>> Signed-off-by: Peng He <hepeng.0...@bytedance.com> >>> --- >>> lib/dpif-netdev.c | 62 ++++++++++++++++++++++++++--------------------- >>> tests/pmd.at | 48 ++++++++++++++++++++++++++++++++++++ >>> 2 files changed, 82 insertions(+), 28 deletions(-) >>> >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >>> index 70b953ae6..b90ed1542 100644 >>> --- a/lib/dpif-netdev.c >>> +++ b/lib/dpif-netdev.c >>> @@ -4198,36 +4198,43 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd, >>> memset(stats, 0, sizeof *stats); >>> } >>> >>> - ovs_mutex_lock(&pmd->flow_mutex); >>> - netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL); >>> - if (!netdev_flow) { >>> - if (put->flags & DPIF_FP_CREATE) { >>> + if (put->flags & DPIF_FP_CREATE) { >> >> >> EC> Should this not be: >> EC> >> EC> if (put->flags & DPIF_FP_CREATE && !(put->flags & DPIF_FP_MODIFY)) >> EC> >> EC> I know this will break the fix, but I’m wondering what the possible >> EC> combinations are. >> EC> Quickly looking at the code the following ones seem to exist: >> EC> >> EC> DPIF_FP_CREATE -> Create a flow, if one exists return EEXIST >> EC> DPIF_FP_MODIFY -> Modify a flow, if it does not exist return ENONT >> EC> DPIF_FP_CREATE | DPIF_FP_MODIFY -> If it exists modify it, if it does >> EC> not exists create it. >> EC> >> EC> Could the last combination not potentially fail the lookup? >> EC> >> EC> Or are we assuming only standalone DPIF_FP_MODIFY’s are the problem? In >> EC> theory, it could also be the combination. >> EC> >> >> PH> sorry, I was wrong. Such a combination does exist in >> PH>the dpif_probe_feature, and it probes the datapath by trying to put >> flows: >> PH> >> PH> error = dpif_flow_put(dpif, DPIF_FP_CREATE | DPIF_FP_MODIFY | >> PH>DPIF_FP_PROBE, >> PH> key->data, key->size, NULL, 0, >> PH> nl_actions, nl_actions_size, >> PH> ufid, NON_PMD_CORE_ID, NULL); >> >> PH> so we CANNOT change the code into: >> >> PH> if (put->flags & DPIF_FP_CREATE && !(put->flags & DPIF_FP_MODIFY)) >> >> PH> as the issue the patch tries to fix only exist in MODIFY alone path. >> PH> If CREATE bit is set, we need to go through the CREATE path no matter >> if >> PH> MODIFY exist or not. >> >> Ok, if this is only used by the probe function we should be fine. I did >> quickly search the code and it seems to be the way. However, if it’s ever >> used by any part of ovs with both flags set, it might fail the lookup and >> we run into the same problem. >> > > By "the same problem", you mean the problem listed in this patch or this > fix might lead to a potential failure in other part of OVS? If you have a request with both DPIF_FP_MODIFY and DPIF_FP_CREATE set, the lookup could still return the wrong entry. But if this is not used it should be fine for now. > >> >>> + ovs_mutex_lock(&pmd->flow_mutex); >>> + netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL); >>> + if (!netdev_flow) { >>> dp_netdev_flow_add(pmd, match, ufid, put->actions, >>> put->actions_len, ODPP_NONE); >>> } else { >>> - error = ENOENT; >>> + error = EEXIST; >>> } >>> + ovs_mutex_unlock(&pmd->flow_mutex); >>> } else { >>> + netdev_flow = dp_netdev_pmd_find_flow(pmd, ufid, >>> + put->key, put->key_len); >>> + >>> if (put->flags & DPIF_FP_MODIFY) { >>> - struct dp_netdev_actions *new_actions; >>> - struct dp_netdev_actions *old_actions; >>> + if (!netdev_flow) { >>> + error = ENOENT; >>> + } else { >>> + struct dp_netdev_actions *new_actions; >>> + struct dp_netdev_actions *old_actions; >>> >>> - new_actions = dp_netdev_actions_create(put->actions, >>> - put->actions_len); >>> + new_actions = dp_netdev_actions_create(put->actions, >>> + >> put->actions_len); >>> >>> - old_actions = dp_netdev_flow_get_actions(netdev_flow); >>> - ovsrcu_set(&netdev_flow->actions, new_actions); >>> + old_actions = dp_netdev_flow_get_actions(netdev_flow); >>> + ovsrcu_set(&netdev_flow->actions, new_actions); >>> >>> - queue_netdev_flow_put(pmd, netdev_flow, match, >>> - put->actions, put->actions_len, >>> - DP_NETDEV_FLOW_OFFLOAD_OP_MOD); >>> - log_netdev_flow_change(netdev_flow, match, old_actions, >>> - put->actions, put->actions_len); >>> + queue_netdev_flow_put(pmd, netdev_flow, match, >>> + put->actions, put->actions_len, >>> + DP_NETDEV_FLOW_OFFLOAD_OP_MOD); >>> + log_netdev_flow_change(netdev_flow, match, old_actions, >>> + put->actions, put->actions_len); >>> >>> - if (stats) { >>> - get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL); >>> - } >>> - if (put->flags & DPIF_FP_ZERO_STATS) { >>> + if (stats) { >>> + get_dpif_flow_status(pmd->dp, netdev_flow, stats, >> NULL); >>> + } >>> + if (put->flags & DPIF_FP_ZERO_STATS) { >>> /* XXX: The userspace datapath uses thread local >> statistics >>> * (for flows), which should be updated only by the >> owning >>> * thread. Since we cannot write on stats memory here, >>> @@ -4237,18 +4244,17 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd, >>> * - Should the need arise, this operation can be >> implemented >>> * by keeping a base value (to be update here) for >> each >>> * counter, and subtracting it before outputting the >> stats */ >>> - error = EOPNOTSUPP; >>> + error = EOPNOTSUPP; >>> + } >>> + ovsrcu_postpone(dp_netdev_actions_free, old_actions); >>> } >>> - >>> - ovsrcu_postpone(dp_netdev_actions_free, old_actions); >>> - } else if (put->flags & DPIF_FP_CREATE) { >>> - error = EEXIST; >>> } else { >>> - /* Overlapping flow. */ >>> - error = EINVAL; >>> + if (netdev_flow) { >>> + /* Overlapping flow. */ >>> + error = EINVAL; >>> + } >>> } >>> } >>> - ovs_mutex_unlock(&pmd->flow_mutex); >>> return error; >>> } >>> >>> diff --git a/tests/pmd.at b/tests/pmd.at >>> index 48f3d432d..85211a30f 100644 >>> --- a/tests/pmd.at >>> +++ b/tests/pmd.at >>> @@ -1300,3 +1300,51 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM >> ovs-vswitchd.log | grep "PMD load based sleeps >>> >>> OVS_VSWITCHD_STOP >>> AT_CLEANUP >>> + >>> +AT_SETUP([PMD - revalidator wrongly modify userspace megaflows]) >>> + >>> +OVS_VSWITCHD_START( >>> +[add-port br0 p1 \ >>> + -- set interface p1 type=dummy-pmd \ >>> + -- set bridge br0 datapath-type=dummy \ >>> + -- add-port br0 p2 \ >>> + -- set interface p2 type=dummy-pmd -- >>> +]) >>> + >>> +dnl Add one openflow rule and generate a megaflow. >>> +AT_CHECK([ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst= >> 10.1.2.0/24,actions=p2']) >>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1 >> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)']) >>> + >>> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'], >> [0], [ >>> >> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst= >> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2 >>> +]) >>> + >>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1 >> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)']) >>> +dnl Replace openflow rules, trigger the revalidation. >>> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 >> actions=ct(commit)' | dnl >>> +ovs-ofctl --bundle replace-flows br0 -]) >>> +AT_CHECK([ovs-appctl revalidator/wait]) >>> + >>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1 >> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)']) >>> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | >> strip_xout_keep_actions], [0], [ >>> >> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst= >> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never, >> actions:ct(commit) >>> >> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst= >> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, >> actions:ct(commit) >>> +]) >>> + >>> +dnl Hold the prefix 10.1.2.2/24 by another 10s. >>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1 >> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)']) >>> +dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24 >> tuple in pvector of subtable. >>> +for i in `seq 0 256`;do >>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1 >> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)']) >>> +done >>> + >>> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2' | >> dnl >>> +ovs-ofctl --bundle replace-flows br0 -]) >>> + >>> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | >> strip_xout_keep_actions], [0], [ >>> >> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst= >> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2 >>> >> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst= >> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2 >>> +]) >>> + >>> +OVS_VSWITCHD_STOP >>> +AT_CLEANUP >>> -- >>> 2.25.1 >> >> > > -- > hepeng _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev