On 10/30/21 09:04, Sriharsha Basavapatna wrote:
> On Mon, Aug 9, 2021 at 6:28 PM Ilya Maximets <[email protected]> wrote:
>>
>> There are cases where users might want simple forwarding or drop rules
>> for all packets received from a specific port, e.g ::
>>
>> "in_port=1,actions=2"
>> "in_port=2,actions=IN_PORT"
>> "in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
>> "in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"
>>
>> There are also cases where complex OpenFlow rules can be simplified
>> down to datapath flows with very simple match criteria.
>>
>> In theory, for very simple forwarding, OVS doesn't need to parse
>> packets at all in order to follow these rules. "Simple match" lookup
>> optimization is intended to speed up packet forwarding in these cases.
>>
>> Design:
>>
>> Due to various implementation constraints userspace datapath has
>> following flow fields always in exact match (i.e. it's required to
>> match at least these fields of a packet even if the OF rule doesn't
>> need that):
>>
>> - recirc_id
>> - in_port
>> - packet_type
>> - dl_type
>> - vlan_tci (CFI + VID) - in most cases
>> - nw_frag - for ip packets
>>
>> Not all of these fields are related to packet itself. We already
>> know the current 'recirc_id' and the 'in_port' before starting the
>> packet processing. It also seems safe to assume that we're working
>> with Ethernet packets. So, for the simple OF rule we need to match
>> only on 'dl_type', 'vlan_tci' and 'nw_frag'.
>>
>> 'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be
>> combined in a single 64bit integer (mark) that can be used as a
>> hash in hash map. We are using only VID and CFI form the 'vlan_tci',
>> flows that need to match on PCP will not qualify for the optimization.
>> Workaround for matching on non-existence of vlan updated to match on
>> CFI and VID only in order to qualify for the optimization. CFI is
>> always set by OVS if vlan is present in a packet, so there is no need
>> to match on PCP in this case. 'nw_frag' takes 2 bits of PCP inside
>> the simple match mark.
>>
>> New per-PMD flow table 'simple_match_table' introduced to store
>> simple match flows only. 'dp_netdev_flow_add' adds flow to the
>> usual 'flow_table' and to the 'simple_match_table' if the flow
>> meets following constraints:
>>
>> - 'recirc_id' in flow match is 0.
>> - 'packet_type' in flow match is Ethernet.
>> - Flow wildcards contains only minimal set of non-wildcarded fields
>> (listed above).
>>
>> If the number of flows for current 'in_port' in a regular 'flow_table'
>> equals number of flows for current 'in_port' in a 'simple_match_table',
>> we may use simple match optimization, because all the flows we have
>> are simple match flows. This means that we only need to parse
>> 'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching.
>> Now we make the unique flow mark from the 'in_port', 'dl_type',
>> 'nw_frag' and 'vlan_tci' and looking for it in the 'simple_match_table'.
>> On successful lookup we don't need to run full 'miniflow_extract()'.
>>
>> Unsuccessful lookup technically means that we have no suitable flow
>> in the datapath and upcall will be required. So, in this case EMC and
>> SMC lookups are disabled. We may optimize this path in the future by
>> bypassing the dpcls lookup too.
>>
>> Performance improvement of this solution on a 'simple match' flows
>> should be comparable with partial HW offloading, because it parses same
>> packet fields and uses similar flow lookup scheme.
>> However, unlike partial HW offloading, it works for all port types
>> including virtual ones.
>>
>> Performance results when compared to EMC:
>>
>> Test setup:
>>
>> virtio-user OVS virtio-user
>> Testpmd1 ------------> pmd1 ------------> Testpmd2
>> (txonly) x<------ pmd2 <------------ (mac swap)
>>
>> Single stream of 64byte packets. Actions:
>> in_port=vhost0,actions=vhost1
>> in_port=vhost1,actions=vhost0
>>
>> Stats collected from pmd1 and pmd2, so there are 2 scenarios:
>> Virt-to-Virt : Testpmd1 ------> pmd1 ------> Testpmd2.
>> Virt-to-NoCopy : Testpmd2 ------> pmd2 --->x Testpmd1.
>> Here the packet sent from pmd2 to Testpmd1 is always dropped, because
>> the virtqueue is full since Testpmd1 is in txonly mode and doesn't
>> receive any packets. This should be closer to the performance of a
>> VM-to-Phy scenario.
>>
>> Test performed on machine with Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
>> Table below represents improvement in throughput when compared to EMC.
>>
>> +----------------+------------------------+------------------------+
>> | | Default (-g -O2) | "-Ofast -march=native" |
>> | Scenario +------------+-----------+------------+-----------+
>> | | GCC | Clang | GCC | Clang |
>> +----------------+------------+-----------+------------+-----------+
>> | Virt-to-Virt | +18.9% | +25.5% | +10.8% | +16.7% |
>> | Virt-to-NoCopy | +24.3% | +33.7% | +14.9% | +22.0% |
>> +----------------+------------+-----------+------------+-----------+
>>
>> For Phy-to-Phy case performance improvement should be even higher, but
>> it's not the main use-case for this functionality. Performance
>> difference for the non-simple flows is within a margin of error.
>>
>> Signed-off-by: Ilya Maximets <[email protected]>
>> ---
>>
>> Version 3:
>> - Removed artificial restriction on flow actions. Any set of actions
>> is supported. Hence the patch re-named from "direct output" to
>> "simple match".
>> - VLAN VID+CFI added to the mark. Previous implementation from v1/v2
>> was not correct in a way that it ignored match on VLAN if match on
>> non-existence of VLAN was requested. Workaround for netlink
>> expression of the match on non-existance of VLAN improved to avoid
>> match on PCP, so we have extra bits inside the mark to fit
>> fragmentation flags. Technically, we can keep only match on CFI,
>> but we have enough bits, so we don't need to reduce the mask
>> further. We parsed VLAN header from a packet anyway in v1 and v2,
>> so this doesn't really affect performance of the solution, but
>> increases the variety of flows that can be optimized.
>> - Since we're matching on VLAN, the optimization now works for flows
>> with VLAN matches (VID+CFI, no PCP matches).
>> - Added statistics, documentation and performance test results.
>>
>> Version 2 (Harsha):
>> - Rebase on 2.15
>> - Added a coverage counter.
>> -
>> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
>>
>> Documentation/topics/dpdk/bridge.rst | 24 ++
>> NEWS | 3 +
>> lib/dpif-netdev-avx512.c | 3 +-
>> lib/dpif-netdev-perf.c | 41 ++--
>> lib/dpif-netdev-perf.h | 1 +
>> lib/dpif-netdev-private-flow.h | 5 +-
>> lib/dpif-netdev-private-thread.h | 13 +-
>> lib/dpif-netdev-unixctl.man | 12 +-
>> lib/dpif-netdev.c | 316 +++++++++++++++++++++++----
>> lib/flow.c | 22 +-
>> lib/flow.h | 3 +-
>> lib/netdev-offload-dpdk.c | 2 +-
>> tests/dpif-netdev.at | 16 +-
>> tests/nsh.at | 2 +-
>> tests/pmd.at | 6 +-
>> 15 files changed, 386 insertions(+), 83 deletions(-)
>>
<snip>
>
> Hi Ilya,
>
> We are trying to test this feature and running into an issue in which
> the ofctl rule (in_port=1,actions=2) is resulting in a
> datapath rule with the 'dl_dst' inserted in the match field. So the
> datapath rule ends up failing the simple match criteria.
>
> The test case involves forwarding traffic from the VM (virtio) via the
> vhost-user port and VxLAN tunnel in OVS.
>
> br-vhost br-vxlan
> br-phy
> vhost-user<-->VF VF-Rep<-->VxLAN uplink-port
>
> The simple match ofctl rule is added to the br-vhost bridge to forward
> traffic to/from the vhost-user port. The problem disappears if the
> vxlan bridge is removed.
>
> Debugging this further, I see this code in
> xlate_actions() .... --> compose_output_action__():
>
> } else if (terminate_native_tunnel(ctx, flow, wc,
> &odp_tnl_port)) {
>
> In terminate_native_tunnel():
>
> if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)) {
> *tnl_port = tnl_port_map_lookup(flow, wc);
>
> When I dump wc->masks.dl_dst before and after tnl_port_map_lookup(),
> it changes from all zeros to all 0xff (wc to em).
>
> So it seems like while there's a tunnel configuration, though the
> ctx->xbridge in question is the vhost bridge,
> ovs_native_tunneling_is_on() returns true and in tnl_port_map_lookup()
> the wc mask gets updated for dl_dst.
>
> Thanks,
> -Harsha
>
Hi. Thanks for reporting the issue. I think, I have a solution for it.
Please, take a look at the following patch:
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
Bets regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev