>-----Original Message-----
>From: Daniele Di Proietto [mailto:diproiet...@ovn.org]
>Sent: Monday, November 14, 2016 10:45 PM
>To: Bodireddy, Bhanuprakash <bhanuprakash.bodire...@intel.com>
>Cc: d...@openvswitch.org; Jarno Rajahalme <ja...@ovn.org>
>Subject: Re: [ovs-dev] [PATCH v3 00/12] Improve performance of OVS-DPDK
>classifier.
>
>
>
>2016-11-14 4:10 GMT-08:00 Bodireddy, Bhanuprakash
><bhanuprakash.bodire...@intel.com>:
>Hello daniele,
>
>Did you get a chance to review v4 of the remaining 4 patches in this
>series? Also I have sent v5 of patch "dpcls: Use 32 packet batches for
>lookups" separately based on your comments.
>
>Bhanu Prakash.
>
>Hi Bhanu,
>I merged almost everything to master, with minor style fixes (I had to convert
>some tabs to spaces).
>Thanks for your work!
>The only patch I left out is "cmap: Remove prefetching in cmap_find_batch()."
>With the patch applied (and emc disabled), compared to current master I see:
>* A very small improvement with a single megaflow, single stream of 64 bytes
>UDP packets
>* A more significant drop with 1000 megaflows, 1000 streams of 64 bytes UDP
>packets
>I just remembered that we also have a benchmark for the cmap:
>
>tests/ovstest test-cmap benchmark 2000000 1 0.1 32
>I run it with and without the patch and the patch seems to make the
>benchmark slower (in particular I'm talking about the "batch search:" row)

Thanks, Daniele, for merging the patches. We ran some benchmarks with tens of
flows and found no drop in performance. However, it's fine to drop this patch
if you see different behavior in your benchmarks. I didn't know that we have a
benchmark test for cmap; that's really helpful.

Regards,
Bhanu Prakash.

>
>As expected, it appears that prefetching introduces overhead for small cmaps
>(1 or 2 flows) but it makes the performance better with bigger cmaps.
>What do you think? Have you tried this with bigger flow tables?
>Thanks,
>Daniele
>
>
>>-----Original Message-----
>>From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Bodireddy,
>>Bhanuprakash
>>Sent: Tuesday, October 18, 2016 5:24 PM
>>To: Daniele Di Proietto <diproiet...@ovn.org>
>>Cc: d...@openvswitch.org
>>Subject: Re: [ovs-dev] [PATCH v3 00/12] Improve performance of OVS-DPDK
>>classifier.
>>
>>Thanks daniele. Will send on the remaining patches with appropriate tags.
>>
>>Regards,
>>Bhanu Prakash.
>>
>>>-----Original Message-----
>>>From: Daniele Di Proietto [mailto:diproiet...@ovn.org]
>>>Sent: Tuesday, October 18, 2016 4:04 AM
>>>To: Bodireddy, Bhanuprakash <bhanuprakash.bodire...@intel.com>
>>>Cc: d...@openvswitch.org
>>>Subject: Re: [ovs-dev] [PATCH v3 00/12] Improve performance of OVS-DPDK
>>>classifier.
>>>
>>>Thanks for the series, I applied most of it to master.
>>>I sent some comments on the few remaining patches.
>>>Thanks again,
>>>Daniele
>>>
>>>2016-10-14 7:37 GMT-07:00 Bhanuprakash Bodireddy
>>><bhanuprakash.bodire...@intel.com>:
>>>This patch series is aimed at improving the performance of OVS-DPDK
>>>dpcls.
>>>
>>>With few thousand flows installed, the EMC becomes inefficient due to
>>>thrashing and the bottleneck moves to the dpcls. In EMC disabled case,
>>>through VTune we found that significant performance degradation is due
>>>to LLC thrashing, memory latency, machine clears and expensive hash
>>>computation.
>>>
>>>This first patch-set improves the dpcls performance by 15% (+1 Mpps)
>>>when EMC is disabled and OVS-DPDK built with CFLAGS="-O2 -g".
>>>
>>>Bhanuprakash Bodireddy (12):
>>>  dpcls: Use 32 packet batches for lookups.
>>>    Comment: ~120k performance throughput improvement.
>>>
>>>  flow: Add comments to mf_get_next_in_map().
>>>    Comment: Add comments to the function.
>>>
>>>  flow: Skip invoking expensive count_1bits() with zero input.
>>>    Comment: ~630k performance throughput improvement.
>>>
>>>  hash: Skip invoking mhash_add__() with zero input.
>>>    Comment: ~150k performance throughput improvement.
>>>
>>>  dpif-netdev: Add comments to dp_netdev_input__().
>>>    Comment: Add comments to the function.
>>>
>>>  cmap: Remove prefetching in cmap_find_batch().
>>>    Comment: ~39k performance throughput improvement.
>>>
>>>  dpif-netdev: Cache align netdev_flow_keys.
>>>    Comment: ~170k performance throughput improvement in EMC
>>>    enabled case.
>>>
>>>  dpif-netdev: Reorder elements in dp_netdev_port structure.
>>>  dpif: Reorder elements in dpif_upcall structure.
>>>  ovsdb: Reorder elements in ovsdb_table_schema structure.
>>>  netlink-socket: Reorder elements in nl_dump structure.
>>>  timeval: Reorder elements in clock structure.
>>>    Comment: Reorder member variables of the structures to reduce
>>>    pad bytes and thereby the memory footprint.
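
For readers unfamiliar with the "Reorder elements" patches listed above, here
is a minimal sketch of the general idea in C. The struct names and members
(demo_padded, demo_reordered, flag_a, and so on) are hypothetical and are not
the actual OVS structures; the sketch only assumes a typical ABI where a
member is aligned to its own size or to the machine word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (not an OVS structure): small members are interleaved
 * with larger ones, so the compiler has to insert pad bytes to keep each
 * larger member naturally aligned. */
struct demo_padded {
    uint8_t  flag_a;    /* 1 byte, followed by padding before 'counter'. */
    uint64_t counter;
    uint8_t  flag_b;    /* 1 byte, followed by padding before 'id'. */
    uint32_t id;
    uint16_t type;      /* Followed by tail padding. */
};

/* Same members, ordered from largest to smallest.  Every member now starts
 * on its natural alignment without any pad bytes in between. */
struct demo_reordered {
    uint64_t counter;
    uint32_t id;
    uint16_t type;
    uint8_t  flag_a;
    uint8_t  flag_b;
};

/* Returns how many bytes per object the reordering saves on this ABI. */
size_t
demo_bytes_saved(void)
{
    return sizeof(struct demo_padded) - sizeof(struct demo_reordered);
}
```

The reordered struct occupies exactly the sum of its member sizes (16 bytes),
while the interleaved one is larger because of the pad bytes. A tool such as
pahole can report the holes in real structures.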
>>>
>>> lib/cmap.c           |   8 +---
>>> lib/dpif-netdev.c    | 123 +++++++++++++++++++++++---------------------------
>>> lib/dpif.h           |   5 ++-
>>> lib/flow.h           |  47 +++++++++++++++-----
>>> lib/hash.h           |   5 +++
>>> lib/netlink-socket.h |   6 +--
>>> lib/timeval.c        |   4 +-
>>> ovsdb/table.h        |   4 +-
>>> 8 files changed, 111 insertions(+), 91 deletions(-)
>>>
>>>--
>>>2.4.11
>>>
>>>_______________________________________________
>>>dev mailing list
>>>d...@openvswitch.org
>>>http://openvswitch.org/mailman/listinfo/dev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev