Re: [ovs-dev] [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.

Eli Britstein Fri, 25 Jun 2021 06:57:17 -0700


On 6/25/2021 3:09 PM, Balazs Nemeth wrote:

*External email: Use caution opening links or attachments*



Ilya,

Sure, I'll rebase my patch and also ensure netdev_is_flow_api_enabled()
is called once per batch. I don't expect any issue with that.


Thanks Balazs.


Regards,
Balazs

On Fri, Jun 25, 2021 at 2:00 PM Ilya Maximets <[email protected]<mailto:[email protected]>> wrote:


    On 6/25/21 1:04 PM, Ferriter, Cian wrote:
    > Hi Eli,
    >
    > I have some concerns about how we plug the packet state recover
    logic into dfc_processing() here. My concerns are inline below.
    >
    > I'm concerned that we are hurting performance in the partial
    HWOL case, as this patchset is introducing new direct (non-inline)
    function calls per packet to the software datapath. We can reduce
    performance impact in this area, see specific suggestions below.
    >
    > Thanks,
    > Cian
    >
    >> -----Original Message-----
    >> From: Eli Britstein <[email protected] <mailto:[email protected]>>
    >> Sent: Wednesday 23 June 2021 16:53
    >> To: [email protected] <mailto:[email protected]>; Ilya
    Maximets <[email protected] <mailto:[email protected]>>
    >> Cc: Gaetan Rivet <[email protected]
    <mailto:[email protected]>>; Majd Dibbiny <[email protected]
    <mailto:[email protected]>>; Sriharsha Basavapatna
    >> <[email protected]
    <mailto:[email protected]>>; Hemal Shah
    <[email protected] <mailto:[email protected]>>; Ivan Malov
    >> <[email protected] <mailto:[email protected]>>;
    Ferriter, Cian <[email protected]
    <mailto:[email protected]>>; Eli Britstein <[email protected]
    <mailto:[email protected]>>;
    >> Finn, Emma <[email protected] <mailto:[email protected]>>;
    Kovacevic, Marko <[email protected]
    <mailto:[email protected]>>
    >> Subject: [PATCH V7 06/13] dpif-netdev: Add HW miss packet state
    recover logic.
    >>
    >> Recover the packet if it was partially processed by the HW.
    Fallback to
    >> lookup flow by mark association.
    >>
    >> Signed-off-by: Eli Britstein <[email protected]
    <mailto:[email protected]>>
    >> Reviewed-by: Gaetan Rivet <[email protected]
    <mailto:[email protected]>>
    >> Acked-by: Sriharsha Basavapatna
    <[email protected]
    <mailto:[email protected]>>
    >> Tested-by: Emma Finn <[email protected]
    <mailto:[email protected]>>
    >> Tested-by: Marko Kovacevic <[email protected]
    <mailto:[email protected]>>
    >> Signed-off-by: Ilya Maximets <[email protected]
    <mailto:[email protected]>>
    >> ---
    >>  lib/dpif-netdev.c | 45
    +++++++++++++++++++++++++++++++++++++++++----
    >>  1 file changed, 41 insertions(+), 4 deletions(-)
    >>
    >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
    >> index 8fa7eb6d4..d5b7d5b73 100644
    >> --- a/lib/dpif-netdev.c
    >> +++ b/lib/dpif-netdev.c
    >> @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
    >>  COVERAGE_DEFINE(datapath_drop_invalid_bond);
    >>  COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
    >>  COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
    >> +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
    >>
    >>  /* Protects against changes to 'dp_netdevs'. */
    >>  static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
    >> @@ -7062,6 +7063,39 @@ smc_lookup_batch(struct
    dp_netdev_pmd_thread *pmd,
    >> pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT,
    n_smc_hit);
    >>  }
    >>
    >> +static struct tx_port * pmd_send_port_cache_lookup(
    >> +    const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
    >> +
    >> +static inline int
    >> +dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
    >> +                  odp_port_t port_no,
    >> +                  struct dp_packet *packet,
    >> +                  struct dp_netdev_flow **flow)
    >> +{
    >> +    struct tx_port *p;
    >> +    uint32_t mark;
    >> +
    >
    >
    > Start of full HWOL recovery block
    >
    >> +    /* Restore the packet if HW processing was terminated
    before completion. */
    >> +    p = pmd_send_port_cache_lookup(pmd, port_no);
    >> +    if (OVS_LIKELY(p)) {
    >> +        int err =
    netdev_hw_miss_packet_recover(p->port->netdev, packet);
    >> +
    >> +        if (err && err != EOPNOTSUPP) {
    >> + COVERAGE_INC(datapath_drop_hw_miss_recover);
    >> +            return -1;
    >> +        }
    >> +    }
    >
    > End of full HWOL recovery block
    >
    > IIUC, the above is only relevant to full HWOL where the packet
    is partially processed by the HW. In a partial HWOL testcase, we
    see a performance drop as a result of the above code. The
    performance after the patchset is applied is 0.94x what the
    performance was before.

General speaking, adding new code in the datapath is probable to havesome degradation affect, that cannot be avoided completely.

I think performance optimizations for the partial offloads (or to SWdatapath in general, even w/o offloads), can be done on top, like theone suggested by Balazs above, on top of it.

    While reviewing the patch set I noticed this part too, but
    this code was tested twice by Intel engineers, so I figured
    that it doesn't hurt performance of partial offloading.

    In general, it should be easy to re-order partial and full
    offloading checks like this (didn't test):

    diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
    index c5ab35d2a..36a5976f2 100644
    --- a/lib/dpif-netdev.c
    +++ b/lib/dpif-netdev.c
    @@ -7113,6 +7113,14 @@ dp_netdev_hw_flow(const struct
    dp_netdev_pmd_thread *pmd,
         struct tx_port *p;
         uint32_t mark;

    +    /* If no mark, no flow to find. */
    +    if (dp_packet_has_flow_mark(packet, &mark)) {
    +        *flow = mark_to_flow_find(pmd, mark);
    +        return 0;
    +    }
    +
    +    *flow = NULL;
    +
         /* Restore the packet if HW processing was terminated before
    completion. */
         p = pmd_send_port_cache_lookup(pmd, port_no);
         if (OVS_LIKELY(p)) {
    @@ -7123,14 +7131,6 @@ dp_netdev_hw_flow(const struct
    dp_netdev_pmd_thread *pmd,
                 return -1;
             }
         }
    -
    -    /* If no mark, no flow to find. */
    -    if (!dp_packet_has_flow_mark(packet, &mark)) {
    -        *flow = NULL;
    -        return 0;
    -    }
    -
    -    *flow = mark_to_flow_find(pmd, mark);
         return 0;
     }
    ---

    If everyone agrees, I can do this change.

No, this is wrong.

mlx5 PMD uses mark (internally set) for the recover. Doing it like thiswill discover a mark (the internal one by the PMD), but won't find anyflow associated with it.


    >
    > We should branch over this code in the partial HWOL scenario,
    where we don't need to get the call to
    pmd_send_port_cache_lookup() and netdev_hw_miss_packet_recover().
    We want this branch to be transparent to the user. Since both full
    and partial HWOL falls under the other_config:hw-offload=true
    switch, we might need a configure time check NIC capabilities
    solution or something similar to branch over full HWOL packet
    recovery code. Does this make sense?

    Compile time check of capabilities doesn't make sense as it's unknown
    in vast majority of cases on which HW the package will run. Code
    should
    be as generic as possible.

    >
    >
    > perf top showing cycles spent per function in my partial HWOL
    scenario. We can see netdev_hw_miss_packet_recover(),
    netdev_offload_dpdk_hw_miss_packet_recover() and
    netdev_is_flow_api_enabled() taking cycles:
    >     28.79%  pmd-c01/id:8  ovs-vswitchd        [.] dp_netdev_input__
    >     13.72%  pmd-c01/id:8  ovs-vswitchd        [.] parse_tcp_flags
    >     11.07%  pmd-c01/id:8  ovs-vswitchd        [.]
    i40e_recv_pkts_vec_avx2
    >     10.94%  pmd-c01/id:8  ovs-vswitchd        [.]
    i40e_xmit_fixed_burst_vec_avx2
    >      7.48%  pmd-c01/id:8  ovs-vswitchd        [.] cmap_find
    >      4.94%  pmd-c01/id:8  ovs-vswitchd        [.]
    netdev_hw_miss_packet_recover
    >      4.77%  pmd-c01/id:8  ovs-vswitchd        [.]
    dp_execute_output_action
    >      3.81%  pmd-c01/id:8  ovs-vswitchd        [.]
    dp_netdev_pmd_flush_output_on_port
    >      3.40%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_send
    >      2.49%  pmd-c01/id:8  ovs-vswitchd        [.]
    netdev_dpdk_eth_send
    >      1.16%  pmd-c01/id:8  ovs-vswitchd        [.]
    netdev_dpdk_rxq_recv
    >      0.90%  pmd-c01/id:8  ovs-vswitchd        [.]
    pmd_perf_end_iteration
    >      0.75%  pmd-c01/id:8  ovs-vswitchd        [.]
    dp_netdev_process_rxq_port
    >      0.68%  pmd-c01/id:8  ovs-vswitchd        [.]
    netdev_is_flow_api_enabled
    >      0.55%  pmd-c01/id:8  ovs-vswitchd        [.]
    netdev_offload_dpdk_hw_miss_packet_recover
    >
    >> +
    >> +    /* If no mark, no flow to find. */
    >> +    if (!dp_packet_has_flow_mark(packet, &mark)) {
    >> +        *flow = NULL;
    >> +        return 0;
    >> +    }
    >> +
    >> +    *flow = mark_to_flow_find(pmd, mark);
    >> +    return 0;
    >> +}
    >> +
    >>  /* Try to process all ('cnt') the 'packets' using only the
    datapath flow cache
    >>   * 'pmd->flow_cache'. If a flow is not found for a packet
    'packets[i]', the
    >>   * miniflow is copied into 'keys' and the packet pointer is
    moved at the
    >> @@ -7106,7 +7140,6 @@ dfc_processing(struct
    dp_netdev_pmd_thread *pmd,
    >>
    >>      DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
    >>          struct dp_netdev_flow *flow;
    >> -        uint32_t mark;
    >>
    >>          if (OVS_UNLIKELY(dp_packet_size(packet) <
    ETH_HEADER_LEN)) {
    >>              dp_packet_delete(packet);
    >> @@ -7125,9 +7158,13 @@ dfc_processing(struct
    dp_netdev_pmd_thread *pmd,
    >>              pkt_metadata_init(&packet->md, port_no);
    >>          }
    >>
    >> -        if ((*recirc_depth_get() == 0) &&
    >> -            dp_packet_has_flow_mark(packet, &mark)) {
    >> -            flow = mark_to_flow_find(pmd, mark);
    >> +        if (netdev_is_flow_api_enabled() &&
    *recirc_depth_get() == 0) {
    >
    > Here we have a per packet call to netdev_is_flow_api_enabled().
    I think that netdev_is_flow_api_enabled() should be inlined if
    it's going to be called per packet. We can see from the above
    "perf top" that it isn't inlined since it shows up as a separate
    function.

    I'd consider "inlining" and moving a lot of stuff to headers harmful
    for the code base as it makes it less readable and it's really hard
    to preserve this kind of things during code modification.
    It's much better to fix the logic instead of hammering the code with
    blind low level optimizations.

    For this particular issue we already have a solution here:
    
https://patchwork.ozlabs.org/project/openvswitch/patch/c0b99b4b9be6c290a6aa3cb00049fac4ebfac5d8.1618390390.git.bnem...@redhat.com/
    
<https://patchwork.ozlabs.org/project/openvswitch/patch/c0b99b4b9be6c290a6aa3cb00049fac4ebfac5d8.1618390390.git.bnem...@redhat.com/>
    In short, we only need to check once per batch.  I think, Balazs
    will be able to re-base his patch on top of this series including
    the check for netdev_is_flow_api_enabled().

    Balazs, will you?

    Best regards, Ilya Maximets.

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.

Reply via email to