Hi Levi,

Very impressive work. A few points I can think of:

I think iperf3 is single-threaded, which means that running it in parallel mode will not increase the overall throughput on the sending side; it will only increase the OVS workload. Maybe iperf3 itself is the bottleneck here.
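If the sender really is the limit, one way around it is to launch several independent iperf3 client processes instead of a single process with -P (which, at least in older iperf3 releases, multiplexes all streams in one thread). A minimal Python sketch, assuming one iperf3 server listener per port on the other side; the server address, base port, and stream count are placeholders:

```python
#!/usr/bin/env python3
"""Sketch: drive several independent iperf3 client processes at once.

Each process gets its own sending thread, so the client side can use
multiple cores. The server side is assumed to run one listener per
port, e.g. `iperf3 -s -p 5201` ... `iperf3 -s -p 5204`.
"""
import subprocess

SERVER = "192.168.1.2"   # placeholder: address of the iperf3 server
BASE_PORT = 5201         # placeholder: first server listening port
STREAMS = 4              # one client process per stream

procs = [
    subprocess.Popen(
        ["iperf3", "-c", SERVER, "-p", str(BASE_PORT + i), "-t", "30"],
        stdout=subprocess.PIPE, text=True,
    )
    for i in range(STREAMS)
]

# Wait for all clients and print each one's report; the aggregate
# throughput is the sum over the per-process results.
for i, p in enumerate(procs):
    out, _ = p.communicate()
    print(f"--- stream {i} (port {BASE_PORT + i}) ---")
    print(out)
```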
High throughput is achieved with big packet sizes, and this reduces the PPS. OVS in SW (like in HW) has some work to do per packet, so smaller packet sizes might show the difference between HW and SW here. The packet rate on 25G with an MTU of 1500 is about 2 MPPS; if, for example, you use 128-byte packets, it will be about 24 MPPS (a quick sketch of this arithmetic is at the end of this message). HW can handle this simple scenario with no problem, while SW, I guess, won't (with 4 cores, as you described).

Another point is concurrency. OVS-DPDK has the EMC, which has a very nice impact at low concurrency; high concurrency, even thousands of flows (I think), will drop the performance.

The last point is the complexity of the processing: for example, if you have to decapsulate and modify the MAC, it will have a higher impact on SW performance. But I guess in this blog you wanted to focus on the most simplified flow.

On Mon, May 3, 2021 at 1:47 PM Levente Csikor <levente.csi...@gmail.com> wrote:
>
> Hi,
>
> I have been playing around with OvS(-DPDK) for a while, and nowadays, I
> am investigating its performance on SmartNICs.
> More precisely, the recent SoC-based Mellanox / NVIDIA Bluefield-2
> SmartNIC (or DPU, as NVIDIA started to call its product) heavily uses
> an OVS(-DPDK) running on its ARM cores when it processes packets from
> and to the host system (in SmartNIC mode).
>
> On this SmartNIC, the OVS kernel datapath can be offloaded to the
> hardware with TC flowers. If OVS-DPDK is running on the SmartNIC, it
> can also be offloaded to the hardware - essentially, it is done via
> DPDK rte_flow (according to this OVS Conf talk -
> https://www.openvswitch.org/support/ovscon2019/day2/0951-hw_offload_ovs_con_19-Oz-Mellanox.pdf
> )
>
> So, despite the different offloading approaches, when the datapath
> is offloaded to the hardware and all packets are processed by the
> hardware exclusively, the performance should be the same, right?

Basically yes, however OVS-DPDK and OVS-Kernel are a bit different in some use cases. For example, when using VXLAN, the datapath rules will look different. In the presented example it should be the same, AFAIK.

> In other words, while OVS-DPDK performs much better than the kernel
> datapath running on a host system, once offloaded, they are essentially
> the same, as the same "hardware block" implements the corresponding
> (part of the) datapath. Does this interpretation make sense?

Yes, it works the same for the basic stuff.

> Or, since the megaflow cache algorithm is slightly different in each
> implementation, in some corner cases (like in the discrepancy of the
> megaflow cache presentation -
> https://www.youtube.com/watch?v=DSC3m-Bww64), the DPDK-based offloading
> should perform better?

The caches only affect the SW, so if a flow is offloaded and you generate an attack, it should not affect the throughput at all, because everything is in HW.

Thanks,
Roni

> More information about my measurements (from which this question has
> been born) can be found in a blogpost:
> https://medium.com/codex/nvidia-mellanox-bluefield-2-smartnic-hands-on-tutorial-rig-for-dive-part-vii-1417e2e625bf
>
> Thank you,
> Levi

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
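To make the packet-rate arithmetic above concrete, here is a minimal Python sketch. It assumes the quoted sizes are full Ethernet frames (headers + payload + FCS) on a 25 Gbit/s link; the preamble/SFD/inter-frame-gap overhead values are the standard Ethernet figures, not anything from this thread.

```python
#!/usr/bin/env python3
"""Back-of-the-envelope packet rates on a 25G link.

On the wire, every frame additionally costs a 7-byte preamble, a
1-byte start-of-frame delimiter, and a 12-byte inter-frame gap, so
the achievable rate is somewhat lower than size-only arithmetic
suggests. The round numbers in the mail (about 2 MPPS at 1500 bytes,
about 24 MPPS at 128 bytes) ignore that overhead; both variants are
printed below.
"""

LINK_BPS = 25e9             # 25 Gbit/s link
WIRE_OVERHEAD = 7 + 1 + 12  # preamble + SFD + inter-frame gap, in bytes

def pps(frame_bytes: int, include_overhead: bool = True) -> float:
    """Theoretical packets per second for back-to-back frames."""
    on_wire = frame_bytes + (WIRE_OVERHEAD if include_overhead else 0)
    return LINK_BPS / (on_wire * 8)

# The classic 64-byte line-rate figure (37.2 MPPS at 25G) falls out
# of the with-overhead column as a sanity check.
for size in (1500, 128, 64):
    print(f"{size:5d}-byte frames: {pps(size, False)/1e6:6.2f} MPPS "
          f"(frame bits only), {pps(size)/1e6:6.2f} MPPS (on the wire)")
```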