On 6/25/21 3:56 PM, Eli Britstein wrote:
>
> On 6/25/2021 3:09 PM, Balazs Nemeth wrote:
>> *External email: Use caution opening links or attachments*
>>
>>
>> Ilya,
>>
>> Sure, I'll rebase my patch and also ensure netdev_is_flow_api_enabled()
>> is called once per batch. I don't expect any issue with that.
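(A rough, untested sketch of what such a once-per-batch check could look like
in dfc_processing(); the 'flow_api_enabled' name is only illustrative and the
actual rebased patch may differ:)

    /* Evaluated once before the per-packet loop; the result cannot change
     * in the middle of a batch. */
    const bool flow_api_enabled = netdev_is_flow_api_enabled();

    DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
        struct dp_netdev_flow *flow;
        ...
        /* Was: if (netdev_is_flow_api_enabled() && *recirc_depth_get() == 0) */
        if (flow_api_enabled && *recirc_depth_get() == 0) {
            ...
        }
        ...
    }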
Thanks, Balazs!  The series is applied now, so you can work on the updated patch.

>
> Thanks Balazs.
>
>>
>> Regards,
>> Balazs
>>
>> On Fri, Jun 25, 2021 at 2:00 PM Ilya Maximets <[email protected]> wrote:
>>
>> On 6/25/21 1:04 PM, Ferriter, Cian wrote:
>> > Hi Eli,
>> >
>> > I have some concerns about how we plug the packet state recovery logic
>> > into dfc_processing() here. My concerns are inline below.
>> >
>> > I'm concerned that we are hurting performance in the partial HWOL case,
>> > as this patchset is introducing new direct (non-inline) function calls
>> > per packet to the software datapath. We can reduce the performance impact
>> > in this area; see specific suggestions below.
>> >
>> > Thanks,
>> > Cian
>> >
>> >> -----Original Message-----
>> >> From: Eli Britstein <[email protected]>
>> >> Sent: Wednesday 23 June 2021 16:53
>> >> To: [email protected]; Ilya Maximets <[email protected]>
>> >> Cc: Gaetan Rivet <[email protected]>; Majd Dibbiny <[email protected]>;
>> >> Sriharsha Basavapatna <[email protected]>; Hemal Shah <[email protected]>;
>> >> Ivan Malov <[email protected]>; Ferriter, Cian <[email protected]>;
>> >> Eli Britstein <[email protected]>; Finn, Emma <[email protected]>;
>> >> Kovacevic, Marko <[email protected]>
>> >> Subject: [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.
>> >>
>> >> Recover the packet if it was partially processed by the HW. Fall back to
>> >> a flow lookup by mark association.
>> >>
>> >> Signed-off-by: Eli Britstein <[email protected]>
>> >> Reviewed-by: Gaetan Rivet <[email protected]>
>> >> Acked-by: Sriharsha Basavapatna <[email protected]>
>> >> Tested-by: Emma Finn <[email protected]>
>> >> Tested-by: Marko Kovacevic <[email protected]>
>> >> Signed-off-by: Ilya Maximets <[email protected]>
>> >> ---
>> >>  lib/dpif-netdev.c | 45 +++++++++++++++++++++++++++++++++++++++++----
>> >>  1 file changed, 41 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> >> index 8fa7eb6d4..d5b7d5b73 100644
>> >> --- a/lib/dpif-netdev.c
>> >> +++ b/lib/dpif-netdev.c
>> >> @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
>> >>  COVERAGE_DEFINE(datapath_drop_invalid_bond);
>> >>  COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
>> >>  COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
>> >> +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
>> >>
>> >>  /* Protects against changes to 'dp_netdevs'. */
>> >>  static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
>> >> @@ -7062,6 +7063,39 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd,
>> >>      pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, n_smc_hit);
>> >>  }
>> >>
>> >> +static struct tx_port * pmd_send_port_cache_lookup(
>> >> +    const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
>> >> +
>> >> +static inline int
>> >> +dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>> >> +                  odp_port_t port_no,
>> >> +                  struct dp_packet *packet,
>> >> +                  struct dp_netdev_flow **flow)
>> >> +{
>> >> +    struct tx_port *p;
>> >> +    uint32_t mark;
>> >> +
>> >
>> > Start of full HWOL recovery block
>> >
>> >> +    /* Restore the packet if HW processing was terminated before completion. */
>> >> +    p = pmd_send_port_cache_lookup(pmd, port_no);
>> >> +    if (OVS_LIKELY(p)) {
>> >> +        int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
>> >> +
>> >> +        if (err && err != EOPNOTSUPP) {
>> >> +            COVERAGE_INC(datapath_drop_hw_miss_recover);
>> >> +            return -1;
>> >> +        }
>> >> +    }
>> >
>> > End of full HWOL recovery block
>> >
>> > IIUC, the above is only relevant to full HWOL, where the packet is partially
>> > processed by the HW. In a partial HWOL testcase, we see a performance drop
>> > as a result of the above code. The performance after the patchset is applied
>> > is 0.94x what it was before.
>
> Generally speaking, adding new code to the datapath is likely to cause some
> degradation that cannot be avoided completely.
>
> I think performance optimizations for partial offloads (or for the SW datapath
> in general, even without offloads) can be done on top, like the one suggested
> by Balazs above.

+1
Anyway, I applied the v7 as-is.  Performance optimizations could be done
on top of it.

>
>> While reviewing the patch set I noticed this part too, but
>> this code was tested twice by Intel engineers, so I figured
>> that it doesn't hurt performance of partial offloading.
>>
>> In general, it should be easy to re-order partial and full
>> offloading checks like this (didn't test):
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index c5ab35d2a..36a5976f2 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -7113,6 +7113,14 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>>      struct tx_port *p;
>>      uint32_t mark;
>>
>> +    /* If no mark, no flow to find. */
>> +    if (dp_packet_has_flow_mark(packet, &mark)) {
>> +        *flow = mark_to_flow_find(pmd, mark);
>> +        return 0;
>> +    }
>> +
>> +    *flow = NULL;
>> +
>>      /* Restore the packet if HW processing was terminated before completion. */
>>      p = pmd_send_port_cache_lookup(pmd, port_no);
>>      if (OVS_LIKELY(p)) {
>> @@ -7123,14 +7131,6 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>>              return -1;
>>          }
>>      }
>> -
>> -    /* If no mark, no flow to find. */
>> -    if (!dp_packet_has_flow_mark(packet, &mark)) {
>> -        *flow = NULL;
>> -        return 0;
>> -    }
>> -
>> -    *flow = mark_to_flow_find(pmd, mark);
>>      return 0;
>>  }
>> ---
>>
>> If everyone agrees, I can do this change.
>>
> No, this is wrong.
>
> The mlx5 PMD uses a mark (set internally) for the recovery. Doing it like this
> will discover a mark (the internal one set by the PMD), but won't find any
> flow associated with it.

Oh, and netdev_offload_dpdk_hw_miss_packet_recover() resets offloads
for that reason, right?  Thanks for the explanation.
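(For context, this is the ordering the applied patch keeps, shown as a sketch
only; the added comment below is illustrative and not part of the applied code:)

    static inline int
    dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
                      odp_port_t port_no,
                      struct dp_packet *packet,
                      struct dp_netdev_flow **flow)
    {
        struct tx_port *p;
        uint32_t mark;

        /* Flow marks can be used for HW miss recovery and may be re-set in
         * the process (e.g. the mlx5 PMD sets marks internally), so the flow
         * mark check must stay after netdev_hw_miss_packet_recover(). */
        p = pmd_send_port_cache_lookup(pmd, port_no);
        if (OVS_LIKELY(p)) {
            int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);

            if (err && err != EOPNOTSUPP) {
                COVERAGE_INC(datapath_drop_hw_miss_recover);
                return -1;
            }
        }

        /* Only now is it safe to trust the mark on the packet. */
        if (!dp_packet_has_flow_mark(packet, &mark)) {
            *flow = NULL;
            return 0;
        }

        *flow = mark_to_flow_find(pmd, mark);
        return 0;
    }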
I think it would be great if you could prepare a separate patch with some
comments in netdev_offload_dpdk_hw_miss_packet_recover() regarding the
offloads reset and, maybe, a small comment in dp_netdev_hw_flow(), so that
we don't re-order the checks in the future (it's not obvious that there is
a dependency).  Something like: "Flow marks can be used for HW miss recovery
and could be re-set in the process, therefore the flow mark check should
always be done after netdev_hw_miss_packet_recover()."

What do you think?

>
>>
>> >
>> > We should branch over this code in the partial HWOL scenario, where we
>> > don't need the call to pmd_send_port_cache_lookup() and
>> > netdev_hw_miss_packet_recover(). We want this branch to be transparent to
>> > the user. Since both full and partial HWOL fall under the
>> > other_config:hw-offload=true switch, we might need a configure-time NIC
>> > capabilities check or something similar to branch over the full HWOL
>> > packet recovery code. Does this make sense?
>>
>> A compile-time check of capabilities doesn't make sense, as in the vast
>> majority of cases it's unknown which HW the package will run on.  Code
>> should be as generic as possible.
>>
>> >
>> > perf top showing cycles spent per function in my partial HWOL scenario.
>> > We can see netdev_hw_miss_packet_recover(),
>> > netdev_offload_dpdk_hw_miss_packet_recover() and
>> > netdev_is_flow_api_enabled() taking cycles:
>> >  28.79%  pmd-c01/id:8  ovs-vswitchd  [.] dp_netdev_input__
>> >  13.72%  pmd-c01/id:8  ovs-vswitchd  [.] parse_tcp_flags
>> >  11.07%  pmd-c01/id:8  ovs-vswitchd  [.] i40e_recv_pkts_vec_avx2
>> >  10.94%  pmd-c01/id:8  ovs-vswitchd  [.] i40e_xmit_fixed_burst_vec_avx2
>> >   7.48%  pmd-c01/id:8  ovs-vswitchd  [.] cmap_find
>> >   4.94%  pmd-c01/id:8  ovs-vswitchd  [.] netdev_hw_miss_packet_recover
>> >   4.77%  pmd-c01/id:8  ovs-vswitchd  [.] dp_execute_output_action
>> >   3.81%  pmd-c01/id:8  ovs-vswitchd  [.] dp_netdev_pmd_flush_output_on_port
>> >   3.40%  pmd-c01/id:8  ovs-vswitchd  [.] netdev_send
>> >   2.49%  pmd-c01/id:8  ovs-vswitchd  [.] netdev_dpdk_eth_send
>> >   1.16%  pmd-c01/id:8  ovs-vswitchd  [.] netdev_dpdk_rxq_recv
>> >   0.90%  pmd-c01/id:8  ovs-vswitchd  [.] pmd_perf_end_iteration
>> >   0.75%  pmd-c01/id:8  ovs-vswitchd  [.] dp_netdev_process_rxq_port
>> >   0.68%  pmd-c01/id:8  ovs-vswitchd  [.] netdev_is_flow_api_enabled
>> >   0.55%  pmd-c01/id:8  ovs-vswitchd  [.] netdev_offload_dpdk_hw_miss_packet_recover
>> >
>> >> +
>> >> +    /* If no mark, no flow to find. */
>> >> +    if (!dp_packet_has_flow_mark(packet, &mark)) {
>> >> +        *flow = NULL;
>> >> +        return 0;
>> >> +    }
>> >> +
>> >> +    *flow = mark_to_flow_find(pmd, mark);
>> >> +    return 0;
>> >> +}
>> >> +
>> >>  /* Try to process all ('cnt') the 'packets' using only the datapath flow cache
>> >>  * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
>> >>  * miniflow is copied into 'keys' and the packet pointer is moved at the
>> >> @@ -7106,7 +7140,6 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>> >>
>> >>      DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
>> >>          struct dp_netdev_flow *flow;
>> >> -        uint32_t mark;
>> >>
>> >>          if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) {
>> >>              dp_packet_delete(packet);
>> >> @@ -7125,9 +7158,13 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>> >>              pkt_metadata_init(&packet->md, port_no);
>> >>          }
>> >>
>> >> -        if ((*recirc_depth_get() == 0) &&
>> >> -            dp_packet_has_flow_mark(packet, &mark)) {
>> >> -            flow = mark_to_flow_find(pmd, mark);
>> >> +        if (netdev_is_flow_api_enabled() && *recirc_depth_get() == 0) {
>> >
>> > Here we have a per-packet call to netdev_is_flow_api_enabled(). I think
>> > netdev_is_flow_api_enabled() should be inlined if it's going to be called
>> > per packet. We can see from the above "perf top" that it isn't inlined,
>> > since it shows up as a separate function.
>>
>> I'd consider "inlining" and moving a lot of stuff to headers harmful
>> for the code base, as it makes it less readable and it's really hard
>> to preserve this kind of thing during code modification.
>> It's much better to fix the logic instead of hammering the code with
>> blind low-level optimizations.
>>
>> For this particular issue we already have a solution here:
>> https://patchwork.ozlabs.org/project/openvswitch/patch/c0b99b4b9be6c290a6aa3cb00049fac4ebfac5d8.1618390390.git.bnem...@redhat.com/
>>
>> In short, we only need to check once per batch.  I think Balazs
>> will be able to rebase his patch on top of this series, including
>> the check for netdev_is_flow_api_enabled().
>>
>> Balazs, will you?
>>
>> Best regards, Ilya Maximets.

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
