That's definitely interesting. Hopefully someone from Mellanox can comment on the performance impact, since I haven't seen it quantified.
On Thu, Feb 28, 2019, 18:57 Arvind Narayanan <[email protected]> wrote:
>
> On Thu, Feb 28, 2019, 8:23 PM Cliff Burdick <[email protected]> wrote:
>
>> What size packets are you using? I've only steered to 2 rx queues by IP
>> dst match, and was able to hit 100Gbps. That's with a 4KB jumboframe.
>>
>
> 64 bytes. Agreed this is small; what seems interesting is that l3fwd is able
> to handle 64B but rte_flow suffers (a lot) - suggesting offloading is
> expensive?!
>
> I'm doing something similar, steering to different queues based on
> dst_ip. However, my tests have around 80 rules, each rule steering to one
> of the 20 rx_queues. I have a one-to-one rx_queue-to-core_id mapping.
>
> Arvind
>
>
>> On Thu, Feb 28, 2019, 17:42 Arvind Narayanan <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I am using DPDK 18.11 on Ubuntu 18.04, with a Mellanox ConnectX-5 100G
>>> EN (MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64).
>>> Packet generator: t-rex 2.49 running on another machine.
>>>
>>> I am able to achieve 100G line rate with the l3fwd application (frame
>>> size 64B) using the parameters suggested in their performance report.
>>> (
>>> https://fast.dpdk.org/doc/perf/DPDK_18_11_Mellanox_NIC_performance_report.pdf
>>> )
>>>
>>> However, as soon as I install rte_flow rules to steer packets to
>>> different queues and/or use rte_flow's mark action, the throughput
>>> drops to ~41G. I also modified DPDK's flow_filtering example
>>> application, and am getting the same reduced throughput of around 41G
>>> out of 100G. Without rte_flow, it reaches 100G.
>>>
>>> I didn't change any OS/kernel parameters to test l3fwd or the
>>> application that uses rte_flow. I also ensure the application is
>>> NUMA-aware and use 20 cores to handle 100G traffic.
>>>
>>> Upon further investigation (using the Mellanox NIC counters), the drop
>>> in throughput is due to mbuf allocation errors.
>>>
>>> Is such performance degradation normal when performing hw acceleration
>>> using rte_flow?
>>> Has anyone tested throughput performance using rte_flow @ 100G?
>>>
>>> It's surprising to see hardware offloading degrading the
>>> performance, unless I am doing something wrong.
>>>
>>> Thanks,
>>> Arvind
>>>
>>
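
For context, the kind of rule discussed above (match on IPv4 destination, steer to one rx queue, tag with MARK) can be expressed with rte_flow roughly as in the sketch below. This is illustrative only and assumes DPDK 18.11; the function name install_dst_ip_rule and its parameters are made up for the example, not taken from Arvind's application.

/*
 * Minimal sketch: install one rte_flow rule on port `port_id` that matches
 * a given IPv4 destination address and steers matching packets to rx queue
 * `queue_idx`, tagging them with a MARK id. Assumes the port has already
 * been configured and started with enough rx queues.
 */
#include <rte_ethdev.h>
#include <rte_flow.h>
#include <rte_byteorder.h>

static struct rte_flow *
install_dst_ip_rule(uint16_t port_id, uint16_t queue_idx,
                    uint32_t dst_ip_be, uint32_t mark_id,
                    struct rte_flow_error *err)
{
        struct rte_flow_attr attr = { .ingress = 1 };

        /* Pattern: ETH / IPV4 (dst addr only, full /32 mask) / END */
        struct rte_flow_item_ipv4 ip_spec = { .hdr.dst_addr = dst_ip_be };
        struct rte_flow_item_ipv4 ip_mask = {
                .hdr.dst_addr = RTE_BE32(0xffffffff),
        };
        struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_ETH },
                { .type = RTE_FLOW_ITEM_TYPE_IPV4,
                  .spec = &ip_spec, .mask = &ip_mask },
                { .type = RTE_FLOW_ITEM_TYPE_END },
        };

        /* Actions: MARK (value is delivered in the mbuf on rx), then QUEUE
         * to steer to one specific rx queue, then END. */
        struct rte_flow_action_mark mark = { .id = mark_id };
        struct rte_flow_action_queue queue = { .index = queue_idx };
        struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
                { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
                { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        /* Validate first, then create; both fill *err on failure. */
        if (rte_flow_validate(port_id, &attr, pattern, actions, err))
                return NULL;
        return rte_flow_create(port_id, &attr, pattern, actions, err);
}

With the setup described in the thread (~80 such rules spread across 20 rx queues), each rule is a one-time rte_flow_create() call at startup, so the per-packet matching cost should sit in the NIC hardware; that is what makes the drop from 100G to ~41G with 64B frames surprising.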
