On 9/25/17, 10:37 PM, "Yuanhan Liu" <[email protected]> wrote:

    Some highlights in v3
    ======================
    
    Here is v3, with 2 major changes and further testing (including
    many flows). This took more effort than I expected, so the v3
    publication has been delayed for a while.
    
    The first major change is that the mark-to-flow association is now
    done with an array instead of a CMAP. This gives us a further
    performance gain: it can be up to 70% now (please see the exact
    numbers below).
    
    This change also makes the code a bit more complex though, due to
    the locking issue. RCU is used (though I'm not quite sure it's used
    correctly). For now, RCU only protects updates of the array base
    address (due to reallocation); it doesn't protect changes to the
    array items (array[i] = xx). I think it's buggy and I need to
    rethink it.
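
    Roughly, the pattern looks like the sketch below. It is a simplified
    illustration only (not the patch code), assuming OVS's ovs-rcu
    helpers; the names (flow_mark_map, mark_to_flow, ...) are made up
    for the example.

        #include <stdlib.h>
        #include <string.h>
        #include "ovs-rcu.h"
        #include "util.h"

        struct dp_netdev_flow;          /* The datapath flow (opaque here). */

        /* Size and slots live in one allocation, so a reader always sees a
         * consistent pair after dereferencing the RCU pointer. */
        struct flow_mark_map {
            size_t size;
            struct dp_netdev_flow *flows[];
        };

        static OVSRCU_TYPE(struct flow_mark_map *) flow_mark_map;

        /* Data path reader: the mark from the mbuf indexes the array. */
        static inline struct dp_netdev_flow *
        mark_to_flow(uint32_t mark)
        {
            struct flow_mark_map *map
                = ovsrcu_get(struct flow_mark_map *, &flow_mark_map);

            return map && mark < map->size ? map->flows[mark] : NULL;
        }

        /* Writer (caller holds the map mutex): grow and publish a new base
         * address; the old array is freed after a grace period.  Plain
         * stores like map->flows[mark] = flow are *not* covered by this,
         * which is the open issue mentioned above. */
        static void
        flow_mark_map_grow(size_t new_size)
        {
            struct flow_mark_map *old, *new;

            old = ovsrcu_get_protected(struct flow_mark_map *,
                                       &flow_mark_map);
            new = xzalloc(sizeof *new + new_size * sizeof new->flows[0]);
            new->size = new_size;
            if (old) {
                memcpy(new->flows, old->flows,
                       old->size * sizeof old->flows[0]);
            }
            ovsrcu_set(&flow_mark_map, new);
            if (old) {
                ovsrcu_postpone(free, old);
            }
        }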
    
    The second major change is that a dedicated thread has been
    introduced to do the real flow offload. This reduces the overhead of
    hw flow offload installation/deletion in the data path. See patch 9
    for more detailed info.
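
    In shape it is a simple producer/consumer queue: the PMD threads
    only enqueue requests, and the offload thread does the (slow)
    rte_flow calls. Below is a simplified sketch (not patch 9 itself;
    the names are made up for the example):

        #include <pthread.h>
        #include "openvswitch/list.h"
        #include "ovs-thread.h"
        #include "util.h"

        struct offload_request {
            struct ovs_list node;       /* In 'offload_queue'. */
            /* ... flow match, actions, mark, put/del operation ... */
        };

        static struct ovs_mutex offload_mutex = OVS_MUTEX_INITIALIZER;
        static pthread_cond_t offload_cond = PTHREAD_COND_INITIALIZER;
        static struct ovs_list offload_queue
            = OVS_LIST_INITIALIZER(&offload_queue);

        /* Called from the data path: just queue the request and return. */
        static void
        offload_enqueue(struct offload_request *req)
        {
            ovs_mutex_lock(&offload_mutex);
            ovs_list_push_back(&offload_queue, &req->node);
            xpthread_cond_signal(&offload_cond);
            ovs_mutex_unlock(&offload_mutex);
        }

        /* The offload thread: performs the hw flow install/delete. */
        static void *
        offload_thread_main(void *arg OVS_UNUSED)
        {
            for (;;) {
                struct offload_request *req;

                ovs_mutex_lock(&offload_mutex);
                while (ovs_list_is_empty(&offload_queue)) {
                    ovs_mutex_cond_wait(&offload_cond, &offload_mutex);
                }
                req = CONTAINER_OF(ovs_list_pop_front(&offload_queue),
                                   struct offload_request, node);
                ovs_mutex_unlock(&offload_mutex);

                /* ... do the rte_flow create/destroy, then free 'req' ... */
            }
            return NULL;
        }

        /* Started once, e.g.:
         *   ovs_thread_create("hw_offload", offload_thread_main, NULL); */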

[Darrell] There might be other options to handle this, such as dividing time
to HWOL in a given PMD.
Another option is PMD isolation.
    
    In the last discussion, using the RSS action was suggested to get
    rid of the QUEUE action workaround. Unfortunately, it didn't work:
    the flow creation failed with the MLX5 PMD driver, and that's the
    only driver in DPDK that supports the RSS action so far.


[Darrell]
I wonder if we should take a pause before jumping into the next version of
the code.

If workarounds are required in OVS code, then so be it, as long as they are
not overly disruptive to the existing code and hard to maintain.
In the case of RTE_FLOW_ACTION_TYPE_RSS, we might have a reasonable option
to avoid some unpleasant OVS workarounds. This would make a significant
difference in the code paths if we supported it, so we need to be sure as
early as possible. The support needed would be in some drivers and seems
reasonably doable. Moreover, this was discussed in the last DPDK meeting,
and the support was indicated as existing(?), although I only verified the
MLX code myself.

I had seen the MLX code supporting the _RSS action, and there are some
checks for supported cases; when you say “it didn't work”, what was the
issue? Let us also have a discussion about the Intel NIC side and the
Napatech side. It seems reasonable to ask where the disconnect is and
whether this support can be added, and then make a decision based on the
answers.

What do you think?

    
    I also tested many flows this time. The result is more exciting: it
    can be up to a 267% boost with 512 megaflows (each with 512 exact
    match flows, thus 512*512=256K flows in total), with one core and
    one queue doing PHY-PHY forwarding. For the offload case, the
    performance stays the same as with one flow only, because the cost
    of the mark-to-flow translation is constant no matter how many flows
    are inserted (as long as they are all offloaded). However, for
    vanilla OVS-DPDK, the more flows, the worse the performance. In
    other words, the more flows, the bigger the difference we will see.
    
    There were a lot of discussions on the last version. I'm sorry if I
    missed some comments and didn't make the corresponding changes in
    v3. Please let me know if I made such mistakes.
    
    And below is the formal cover letter introduction, for those seeing
    this patchset for the first time.
    
    ---
    Hi,
    
    Here is joint work from Mellanox and Napatech to enable flow hw
    offload with the DPDK generic flow interface (rte_flow).
    
    The basic idea is to associate the flow with a mark id (a uint32_t
    number). Later, we can then get the flow directly from the mark id,
    bypassing the heavy emc processing, including miniflow_extract.
    
    The association is done with an array in patch 1. It also reuses the
    flow APIs introduced while adding the tc offloads. The emc bypassing
    is done in patch 2. The flow offload is done in patch 4, which
    mainly does two things:
    
    - translate the ovs match to DPDK rte flow patterns
    - bind those patterns with a MARK action.
    
    Afterwards, the NIC will set the mark id in every pkt's mbuf when it
    matches the flow. That's basically how we can get the flow directly
    from the received mbuf.
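
    For illustration, the offload of a single flow looks roughly like
    the sketch below (simplified, not the actual translation code in
    netdev-dpdk.c; offload_one_flow and its parameters are made up for
    the example). Note that some PMDs refuse a MARK action without a
    fate action, which is what the QUEUE workaround (and the RSS
    discussion) is about.

        #include <rte_flow.h>

        static struct rte_flow *
        offload_one_flow(uint16_t port_id, uint32_t dst_ip_be,
                         uint32_t mark, uint16_t rxq,
                         struct rte_flow_error *error)
        {
            const struct rte_flow_attr attr = { .ingress = 1 };

            /* Match: eth / ipv4 dst == dst_ip_be (network byte order). */
            const struct rte_flow_item_ipv4 ipv4_spec = {
                .hdr.dst_addr = dst_ip_be,
            };
            const struct rte_flow_item_ipv4 ipv4_mask = {
                .hdr.dst_addr = 0xffffffff,
            };
            const struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_ETH },
                { .type = RTE_FLOW_ITEM_TYPE_IPV4,
                  .spec = &ipv4_spec, .mask = &ipv4_mask },
                { .type = RTE_FLOW_ITEM_TYPE_END },
            };

            /* Actions: tag matching pkts with 'mark', steer to 'rxq'. */
            const struct rte_flow_action_mark mark_conf = { .id = mark };
            const struct rte_flow_action_queue queue_conf = { .index = rxq };
            const struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark_conf },
                { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue_conf },
                { .type = RTE_FLOW_ACTION_TYPE_END },
            };

            return rte_flow_create(port_id, &attr, pattern, actions, error);
        }

    On the rx side, the mark is delivered in the mbuf, so the lookup
    becomes just:

        if (mbuf->ol_flags & PKT_RX_FDIR_ID) {
            uint32_t mark = mbuf->hash.fdir.hi;
            /* mark -> flow is a plain array lookup; no miniflow_extract. */
        }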
    
    While testing PHY-PHY forwarding with one core, one queue and one
    flow, I got about a 70% performance boost. For PHY-vhost forwarding,
    I got about a 50% performance boost. It's basically the performance
    I got with v1, when tcp_flags was ignored. In summary, the CMAP to
    array change gives us yet another 16% performance boost.
    
    The major issue mentioned in v1 has also been worked around: the
    queue index is no longer blindly set to 0, but is set to the rxq
    that first receives the upcall pkt.
    
    Note that hw offload is disabled by default; it can be enabled by:
    
        $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
    
    v3: - The mark-to-flow association is done with an array instead of
          a CMAP.
        - Added a thread to do hw offload operations
        - Removed macros completely
        - dropped the patch to set FDIR_CONF, which is a workaround for
          some Intel NICs.
        - Added a debug patch to show all flow patterns we have created.
        - Misc fixes
    
    v2: - workaround the queue action issue
        - fixed the tcp_flags being skipped issue, which also fixed the
          build warnings
        - fixed l2 patterns for Intel NICs
        - Converted some macros to functions
        - did not hardcode the max number of flow/action
        - rebased on top of the latest code
    
    Thanks.
    
        --yliu
    
    ---
    Finn Christensen (2):
      netdev-dpdk: implement flow put with rte flow
      netdev-dpdk: retry with queue action
    
    Shachar Beiser (1):
      dpif-netdev: record rx queue id for the upcall
    
    Yuanhan Liu (6):
      dpif-netdev: associate flow with a mark id
      dpif-netdev: retrieve flow directly from the flow mark
      netdev-dpdk: convert ufid to dpdk flow
      netdev-dpdk: remove offloaded flow on deletion
      netdev-dpdk: add debug for rte flow patterns
      dpif-netdev: do hw flow offload in another thread
    
     lib/dp-packet.h   |  14 +
     lib/dpif-netdev.c | 421 ++++++++++++++++++++++++++++-
     lib/flow.c        | 155 ++++++++---
     lib/flow.h        |   1 +
     lib/netdev-dpdk.c | 776 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
     lib/netdev.c      |   1 +
     lib/netdev.h      |   7 +
     7 files changed, 1331 insertions(+), 44 deletions(-)
    
    -- 
    2.7.4
    
    
