On Thu, Jun 13, 2019 at 06:38:07PM +0800, Yanqin Wei wrote:
> The userspace datapath needs to traverse miniflow values many times. In
> this process, the 'count_1bits' operation for 'Flowmap' significantly
> impacts performance. On Arm, this function was defined with a portable
> implementation because GCC for Arm does not support the popcnt feature.
> However, on aarch64, the NEON VCNT instruction can accelerate
> "count_1bits". From GCC 7 on, the built-in function is implemented with
> NEON instructions. In this patch, the count_1bits function is implemented
> with the GCC built-in from GCC 7 on, and with NEON intrinsics in GCC 6.
> Performance tests were run on two aarch64 machines. In the NIC2NIC test,
> the one-tuple dpcls lookup case achieves around a 4% throughput
> improvement and the 10-tuple (average) case achieves around a 5%
> improvement.
>
> Tested-by: Malvika Gupta <[email protected]>
> Signed-off-by: Yanqin Wei <[email protected]>
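The compiler-dependent selection described in the patch can be illustrated with a minimal sketch. This is not the applied patch: the names count_1bits_sketch and count_1bits_portable are hypothetical, it assumes a 64-bit map word, and it assumes the GCC version checks can be expressed with __GNUC__ (excluding clang, which also defines __GNUC__). On GCC 7+ for aarch64 it uses __builtin_popcountll, on GCC 6 for aarch64 it uses the NEON intrinsics vcnt_u8/vaddv_u8, and everywhere else it falls back to a portable bit-twiddling count.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__) \
    && __GNUC__ == 6
#include <arm_neon.h>
#endif

/* Portable SWAR population count (hypothetical helper): sums bits in
 * 2-bit, then 4-bit, then 8-bit fields, and finally adds all byte
 * counts into the top byte with a multiply. */
static inline unsigned int
count_1bits_portable(uint64_t x)
{
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
    x = (x & UINT64_C(0x3333333333333333))
        + ((x >> 2) & UINT64_C(0x3333333333333333));
    x = (x + (x >> 4)) & UINT64_C(0x0f0f0f0f0f0f0f0f);
    return (unsigned int) ((x * UINT64_C(0x0101010101010101)) >> 56);
}

/* Hypothetical dispatcher mirroring the idea in the patch description. */
static inline unsigned int
count_1bits_sketch(uint64_t x)
{
#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__) \
    && __GNUC__ >= 7
    /* Per the patch description, GCC 7+ implements this built-in with
     * NEON instructions on aarch64. */
    return (unsigned int) __builtin_popcountll(x);
#elif defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__) \
    && __GNUC__ == 6
    /* GCC 6: per-byte bit count (CNT), then horizontal add (ADDV).
     * The total is at most 64, so it fits in the uint8_t result. */
    return vaddv_u8(vcnt_u8(vcreate_u8(x)));
#else
    /* Other compilers/architectures: portable fallback. */
    return count_1bits_portable(x);
#endif
}
```

For example, count_1bits_sketch(0xF0F0) yields 8 on every path, so the accelerated branches can be checked against count_1bits_portable on the same inputs.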
Thanks! I applied this to master.

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
