On Mon, Jan 29, 2018 at 02:59:42PM +0800, Yuanhan Liu wrote:
> Hi,
> 
> Here is joint work from Mellanox and Napatech to enable flow HW
> offload with the DPDK generic flow interface (rte_flow).
> 
> The basic idea is to associate the flow with a mark id (a uint32_t number).
> Later, we get the flow directly from the mark id, which lets us bypass
> some heavy CPU operations, including but not limited to miniflow extract,
> EMC lookup, dpcls lookup, etc.
> 
> The association is done with CMAP in patch 1. The CPU workload bypassing
> is done in patch 2. The flow offload is done in patch 3, which mainly does
> two things:
> 
> - translate the OVS match to DPDK rte_flow patterns
> - bind those patterns with an RSS + MARK action.
> 
> Patch 5 moves the offload work to another thread, to keep the
> datapath as light as possible.
> 
> A PHY-PHY forwarding test with 1000 megaflows (udp,tp_src=1000-1999) and
> 1 million streams (tp_src=1000-1999, tp_dst=2000-2999) shows a performance
> boost of more than 260%.
> 
> Note that it's disabled by default; it can be enabled with:

Hi,

First of all, thanks for working on this feature.

I have some general comments I'd like to discuss before going deeper
into the review.

The documentation is too brief.  It should mention the HW requirements
for using this feature, as well as important limitations such as the
lack of support for IP fragments, MPLS, or conntrack.

It seems it would be possible to keep the HW offloading code outside
of dpif-netdev.c, which is already quite long. I hope that would also
improve isolation and code clarity.

So far there is no synchronization between PMDs in the fast path.
However, this series adds a new mutex to sync PMDs and a new thread to
manage. Even without the patch adding the thread, there would be a new
mutex in the fast path.  It seems the slow path already causes issues
today, so maybe the whole upcall processing could be pushed to another
thread. I realize this is outside the scope of this patchset, but it
is something we should consider.

As an alternative, maybe we could use a DPDK ring (rte_ring) as a
lockless way to push flows to the auxiliary thread.
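To make the ring idea a bit more concrete, here is a minimal sketch of the hand-off pattern (not the rte_ring implementation): several PMD threads enqueue offload requests, a single offload thread dequeues them, and no mutex is taken on either side. It uses plain C11 atomics so it builds without DPDK; all names (offload_ring, ring_enqueue, ring_dequeue, RING_SIZE) are hypothetical. The per-slot sequence numbers follow Dmitry Vyukov's bounded MPMC queue, reduced here to a single consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Capacity of the hypothetical offload request ring; must be a power of
 * two so that "pos & (RING_SIZE - 1)" maps a position to a slot. */
#define RING_SIZE 1024
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
              "RING_SIZE must be a power of two");

/* Each slot carries a sequence number telling producers and the
 * consumer whose turn it is to touch the slot. */
struct ring_slot {
    _Atomic size_t seq;
    void *data;               /* e.g. a pointer to an offload request */
};

struct offload_ring {
    struct ring_slot slots[RING_SIZE];
    _Atomic size_t enq_pos;   /* shared by all producer (PMD) threads */
    _Atomic size_t deq_pos;   /* advanced only by the offload thread */
};

static void
ring_init(struct offload_ring *r)
{
    for (size_t i = 0; i < RING_SIZE; i++) {
        atomic_init(&r->slots[i].seq, i);
        r->slots[i].data = NULL;
    }
    atomic_init(&r->enq_pos, 0);
    atomic_init(&r->deq_pos, 0);
}

/* Multi-producer enqueue, callable from any PMD thread without a lock.
 * Returns false if the ring is full; the caller never blocks. */
static bool
ring_enqueue(struct offload_ring *r, void *data)
{
    size_t pos = atomic_load_explicit(&r->enq_pos, memory_order_relaxed);

    for (;;) {
        struct ring_slot *s = &r->slots[pos & (RING_SIZE - 1)];
        size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
        intptr_t diff = (intptr_t) seq - (intptr_t) pos;

        if (diff == 0) {
            /* Slot is free for this position: try to claim it.  On
             * failure, 'pos' is reloaded with the current enq_pos. */
            if (atomic_compare_exchange_weak_explicit(
                    &r->enq_pos, &pos, pos + 1,
                    memory_order_relaxed, memory_order_relaxed)) {
                s->data = data;
                /* Publish the filled slot to the consumer. */
                atomic_store_explicit(&s->seq, pos + 1,
                                      memory_order_release);
                return true;
            }
        } else if (diff < 0) {
            return false;     /* Consumer has not freed this slot yet. */
        } else {
            /* Another producer claimed 'pos'; catch up and retry. */
            pos = atomic_load_explicit(&r->enq_pos, memory_order_relaxed);
        }
    }
}

/* Single-consumer dequeue: only the offload thread may call this.
 * Returns false if the ring is empty. */
static bool
ring_dequeue(struct offload_ring *r, void **data)
{
    size_t pos = atomic_load_explicit(&r->deq_pos, memory_order_relaxed);
    struct ring_slot *s = &r->slots[pos & (RING_SIZE - 1)];
    size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);

    if ((intptr_t) seq - (intptr_t) (pos + 1) < 0) {
        return false;         /* No producer has published this slot. */
    }
    *data = s->data;
    /* Hand the slot back to producers for the next lap of the ring. */
    atomic_store_explicit(&s->seq, pos + RING_SIZE, memory_order_release);
    atomic_store_explicit(&r->deq_pos, pos + 1, memory_order_relaxed);
    return true;
}
```

With DPDK itself, the equivalent would presumably be a ring created by rte_ring_create() with the RING_F_SC_DEQ flag (multi-producer, single-consumer), filled by the PMDs with rte_ring_enqueue() and drained by the offload thread with rte_ring_dequeue(). Either way the enqueue side never blocks, so the PMD needs a policy for a full ring, such as leaving the flow in the software datapath.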

There are some memory allocations and deallocations in the fast path
that use OVS functions.  Perhaps it would be better to use the rte_*
functions instead (another reason to split the code out of
dpif-netdev.c).

I am curious why there is no support for flow dump or flush.

The debugging helper (dump_flow_pattern) should check up front and
return as soon as possible if debug logging is not enabled, e.g.:
    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);

    if (VLOG_DROP_DBG(&rl)) {
        return;
    }
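A self-contained sketch of the same idea shows that all of the formatting cost disappears when debug is off. Here a hypothetical debug_would_drop() stands in for VLOG_DROP_DBG(), and a hypothetical pattern_item type stands in for the rte_flow items; neither is the real OVS or DPDK API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: 'debug_enabled' models the log level and
 * debug_would_drop() models VLOG_DROP_DBG(). */
static bool debug_enabled = false;
static int items_formatted = 0;   /* counts the expensive work done */

static bool
debug_would_drop(void)
{
    return !debug_enabled;
}

/* Hypothetical pattern item, loosely modeled on an rte_flow item. */
struct pattern_item {
    int type;
};

static void
dump_flow_pattern(const struct pattern_item *items, int n_items)
{
    /* Bail out before doing any string formatting. */
    if (debug_would_drop()) {
        return;
    }

    char buf[256] = "";
    size_t len = 0;

    /* Stop once the buffer is full; snprintf always NUL-terminates. */
    for (int i = 0; i < n_items && len < sizeof buf; i++) {
        len += snprintf(buf + len, sizeof buf - len,
                        "item[%d]: type=%d\n", i, items[i].type);
        items_formatted++;
    }
    fputs(buf, stderr);
}
```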

I am still wrapping my head around the RSS+MARK action and the rte_flow
API, so I can't really comment on those yet.

Thanks!
fbl

> 
>     $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> 
> v7: - fixed the wrong hash for mark_to_flow, which was refactored in v6
>     - set the rss_conf for the RSS action to NULL, to work around a mlx5
>       change in DPDK v17.11. Note that it will obey the RSS settings
>       OVS-DPDK set at the beginning. Thus, nothing should be affected.
> 
> v6: - fixed a sparse warning
>     - added documentation
>     - used hash_int to compute mark to flow hash
>     - added more comments
>     - added lock for port lookup
>     - rebased on top of the latest code
> 
> v5: - fixed an issue where repeated flow add/remove took too long
>     - removed an unused mutex lock
>     - turned most of the log messages down to DBG level
>     - rebased on top of the latest code
> 
> v4: - use RSS action instead of QUEUE action with MARK
>     - make it work with multiple queues (see patch 1)
>     - rebased on top of the latest code
> 
> v3: - The mark and id association is done with an array instead of CMAP.
>     - Added a thread to do hw offload operations
>     - Removed macros completely
>     - dropped the patch to set FDIR_CONF, which is a workaround for some
>       Intel NICs.
>     - Added a debug patch to show all flow patterns we have created.
>     - Misc fixes
> 
> v2: - worked around the queue action issue
>     - fixed the tcp_flags being skipped issue, which also fixed the
>       build warnings
>     - fixed l2 patterns for Intel NICs
>     - Converted some macros to functions
>     - did not hardcode the max number of flow/action
>     - rebased on top of the latest code
> 
> Thanks.
> 
>     --yliu
> 
> ---
> Finn Christensen (1):
>   netdev-dpdk: implement flow offload with rte flow
> 
> Yuanhan Liu (5):
>   dpif-netdev: associate flow with a mark id
>   dpif-netdev: retrieve flow directly from the flow mark
>   netdev-dpdk: add debug for rte flow patterns
>   dpif-netdev: do hw flow offload in a thread
>   Documentation: document ovs-dpdk flow offload
> 
>  Documentation/howto/dpdk.rst |  17 +
>  NEWS                         |   1 +
>  lib/dp-packet.h              |  13 +
>  lib/dpif-netdev.c            | 495 ++++++++++++++++++++++++++++-
>  lib/flow.c                   | 155 +++++++--
>  lib/flow.h                   |   1 +
>  lib/netdev-dpdk.c            | 736 ++++++++++++++++++++++++++++++++++++++++++-
>  lib/netdev.h                 |   6 +
>  8 files changed, 1385 insertions(+), 39 deletions(-)
> 
> -- 
> 2.7.4
> 
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
Flavio
