On Tue, Jun 30, 2020 at 2:26 AM Yanqin Wei <yanqin....@arm.com> wrote:
>
> Hi, every contributor
>
> These patches could significantly improve multi-flow throughput of userspace
> datapath. If you feel it will take too much time to review all patches, I
> suggest you could look at the 2nd/3rd first, which have the major improvement
> in these patches.
> [ovs-dev][PATCH v1 2/6] dpif-netdev: add tunnel_valid flag to skip ip/ipv6
> address comparison
> [ovs-dev][PATCH v1 3/6] dpif-netdev: improve emc lookup performance by
> contiguous storage of hash value.
>
> Any comments from anyone are appreciated.
>
> Best Regards,
> Wei Yanqin
>
> > -----Original Message-----
> > From: Yanqin Wei <yanqin....@arm.com>
> > Sent: Tuesday, June 2, 2020 3:10 PM
> > To: d...@openvswitch.org
> > Cc: nd <n...@arm.com>; i.maxim...@ovn.org; u9012...@gmail.com; Malvika
> > Gupta <malvika.gu...@arm.com>; Lijian Zhang <lijian.zh...@arm.com>;
> > Ruifeng Wang <ruifeng.w...@arm.com>; Lance Yang
> > <lance.y...@arm.com>; Yanqin Wei <yanqin....@arm.com>
> > Subject: [ovs-dev][PATCH v1 0/6] Memory access optimization for flow
> > scalability of userspace datapath.
> >
> > OVS userspace datapath is a program with heavy memory access. It needs to
> > load/store a large number of memory, including packet header, metadata,
> > EMC/SMC/DPCLS tables and so on. It causes a lot of cache line missing and
> > refilling, which has a great impact on flow scalability. And in some cases,
> > EMC has a negative impact on the overall performance. It is difficult for
> > user to dynamically manage the enabling of EMC.
> >
> > This series of patches improve memory access of userspace datapath as
> > follows:
> > 1. Reduce the number of metadata cache line accessed by non-tunnel traffic.
> > 2. Decrease unnecessary memory load/store for batch/flow.
> > 3. Modify the layout of EMC data struct. Centralize the storage of hash
> > value.
> >
> > In the NIC2NIC traffic tests, the overall performance improvement is
> > observed, especially in multi-flow cases.
> > Flows           delta
> > 1-1K flows      5-10%
> > 10K flows       20%
> > 100K flows      40%
> > EMC disable     10%

Thanks for submitting the patch series. I applied the series and I do see
the performance improvement you describe.
btw, are your numbers from an ARM server or x86?

Below are my numbers using a single flow and a drop action on
Intel(R) Xeon(R) CPU @ 2.00GHz.
In summary, I see around 10% improvement using 1 flow.

=== master ===
root@instance-3:~/ovs# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 0:
  packets received: 96269888
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 87513839
  smc hits: 0
  megaflow hits: 8755584
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 432
  avg. packets per output batch: 0.00
  idle cycles: 0 (0.00%)
  processing cycles: 20083008856 (100.00%)
  avg cycles per packet: 208.61 (20083008856/96269888)
  avg processing cycles per packet: 208.61 (20083008856/96269888)

=== master without EMC ===
pmd thread numa_id 0 core_id 1:
  packets received: 90775936
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 0
  smc hits: 0
  megaflow hits: 90775424
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 479
  avg. packets per output batch: 0.00
  idle cycles: 0 (0.00%)
  processing cycles: 21239087946 (100.00%)
  avg cycles per packet: 233.97 (21239087946/90775936)
  avg processing cycles per packet: 233.97 (21239087946/90775936)

=== yanqin v1: ===
pmd thread numa_id 0 core_id 1:
  packets received: 156582112
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 142344109
  smc hits: 0
  megaflow hits: 14237554
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 448
  avg. packets per output batch: 0.00
  idle cycles: 4320112 (0.01%)
  processing cycles: 30503055968 (99.99%)
  avg cycles per packet: 194.83 (30507376080/156582112)
  avg processing cycles per packet: 194.81 (30503055968/156582112)

=== yanqin v1 without EMC: ===
pmd thread numa_id 0 core_id 0:
  packets received: 48441664
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 0
  smc hits: 0
  megaflow hits: 48441182
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 449
  avg. packets per output batch: 0.00
  idle cycles: 0 (0.00%)
  processing cycles: 10513468302 (100.00%)
  avg cycles per packet: 217.03 (10513468302/48441664)
  avg processing cycles per packet: 217.03 (10513468302/48441664)

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
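
For anyone who wants to re-derive the comparison from the pmd-stats-show
output above, here is a minimal sketch in Python. It only uses the
"processing cycles" and "packets received" counters copied from the four
runs; the dictionary layout and the helper names (cycles_per_packet,
throughput_gain) are made up for illustration and are not part of OVS. It
assumes that fewer processing cycles per packet translates directly into
proportionally more packets per second on a fully busy PMD core, which is
an approximation.

```python
# Counters copied from the pmd-stats-show output in this mail:
# (processing cycles, packets received) for each run.
runs = {
    "master": (20083008856, 96269888),
    "master without EMC": (21239087946, 90775936),
    "yanqin v1": (30503055968, 156582112),
    "yanqin v1 without EMC": (10513468302, 48441664),
}


def cycles_per_packet(processing_cycles, packets):
    """Average processing cycles spent per received packet."""
    return processing_cycles / packets


def throughput_gain(baseline_cpp, patched_cpp):
    """Relative packets-per-cycle gain implied by fewer cycles per packet."""
    return baseline_cpp / patched_cpp - 1.0


cpp = {name: cycles_per_packet(*counters) for name, counters in runs.items()}

# Compare each patched run against the matching baseline run.
comparisons = [
    ("master", "yanqin v1"),
    ("master without EMC", "yanqin v1 without EMC"),
]
gains = {}
for base, patched in comparisons:
    gains[(base, patched)] = throughput_gain(cpp[base], cpp[patched])
    print(f"{base} -> {patched}: "
          f"{cpp[base]:.2f} -> {cpp[patched]:.2f} cycles/pkt, "
          f"{gains[(base, patched)]:+.1%}")
```

With these counters the implied gain works out to roughly 7 to 8% in both
the EMC and the no-EMC case, which is in the same range as the "around 10%"
summary, though a little lower. A direct packets-per-second measurement may
differ from this cycle-based estimate.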