TL;DR:
1) As per v7, the function pointer rework for dpcls.
2) The last 2 patches include specialized scalar optimizations.
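To make item 1 concrete, here is a minimal sketch of a subtable that carries its own lookup function pointer. All names (`struct subtable`, `lookup_func`, `lookup_generic()`, `subtable_lookup()`) are hypothetical simplifications, not the real types in lib/dpif-netdev.h, and matching is reduced to a single masked compare instead of the real hash-and-cmap probe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a dpcls subtable. */
struct subtable;

/* Signature of a pluggable lookup routine (illustrative only). Returns a
 * bitmask with bit i set when key i of the batch matched. */
typedef uint32_t (*lookup_func)(const struct subtable *st,
                                const uint64_t *keys, size_t n_keys);

struct subtable {
    uint64_t mask;       /* Key bits this subtable cares about. */
    uint64_t value;      /* Masked value a key must equal to match. */
    lookup_func lookup;  /* Implementation chosen at runtime. */
};

/* Generic fallback that works for any subtable. */
static uint32_t
lookup_generic(const struct subtable *st, const uint64_t *keys,
               size_t n_keys)
{
    uint32_t hits = 0;

    assert(n_keys <= 32);  /* Result must fit in the 32-bit bitmask. */
    for (size_t i = 0; i < n_keys; i++) {
        if ((keys[i] & st->mask) == st->value) {
            hits |= UINT32_C(1) << i;
        }
    }
    return hits;
}

/* The caller no longer knows which implementation runs; it simply calls
 * through the pointer stored in the subtable. */
static uint32_t
subtable_lookup(const struct subtable *st, const uint64_t *keys,
                size_t n_keys)
{
    return st->lookup(st, keys, n_keys);
}
```

Because the caller only dereferences `st->lookup`, a faster implementation can be plugged into an individual subtable at runtime without touching the classifier loop.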
v9: use count_1bits() and ALWAYS_INLINE; rebased.
v8: fixed variable-length array issues.

Running Eth/IPv4/UDP traffic with EMC/SMC disabled (so that all traffic hits DPCLS) should show a performance improvement; on a simple test case I see a speedup of more than 15%. Please test this patchset and report back numbers!

Patchset details:
The code is split into 5 patches to keep it traceable during review, as the resulting code is quite different from today's dpcls_lookup. Checkpatch flags two warnings, which I believe cannot be sanely fixed due to the way macros accept arguments. Running TESTSUITE shows all tests passing, with ~22 tests skipped, the same as before this patchset.

I've tried to run sparse locally, but I'm having issues getting it working. I'll dig in more, but didn't want to delay sending this patchset. As the VLAs have been removed (they were the only warnings I saw in the sparse output), I think it should be clean now.

Per-patch details:
1) Refactor dpcls_lookup and the subtable for flexibility. In particular, add a function pointer to the subtable structure, which enables "plugging in" a lookup function at runtime. This enables a number of future optimizations.
2) and 3) With the function pointer in place, refactor the existing dpcls_lookup matching code into its own function, and later into its own file. Splitting it into its own file requires making various dpcls data structures available in the dpif-netdev.h header.
4) Refactor the existing code to favour computing flat arrays of miniflows instead of the macro-based iteration. This simplifies the code itself, and makes future optimizations possible thanks to simpler loop structures and loop trip counts passed in via function arguments. See the commit message for more details.
5) Implement a select few specialized functions for handling miniflows with 5-1, 4-1, and 4-0 miniflow unit bit patterns.
More of these functions can (and should) be added to accelerate other subtable lookup patterns! See the commit message for more details.

As always: feedback, suggestions, and performance numbers are all welcome!

Regards,
-Harry

Harry van Haaren (5):
  dpif-netdev: implement function pointers/subtable
  dpif-netdev: move dpcls lookup structures to .h
  dpif-netdev: split out generic lookup function
  dpif-netdev: refactor generic implementation
  dpif-netdev: add specialized generic scalar functions

 lib/automake.mk                  |   1 +
 lib/dpif-netdev-lookup-generic.c | 298 +++++++++++++++++++++++++++++++
 lib/dpif-netdev.c                | 195 ++++++++++----------
 lib/dpif-netdev.h                |  94 ++++++++++
 4 files changed, 491 insertions(+), 97 deletions(-)
 create mode 100644 lib/dpif-netdev-lookup-generic.c

-- 
2.17.1
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
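A minimal sketch of the specialization idea from patches 4 and 5, assuming that "5-1" means 5 bits set in miniflow unit 0 and 1 bit set in unit 1. Every name here is hypothetical: `popcount64()` stands in for OVS's count_1bits(), a local `ALWAYS_INLINE` macro is defined for GCC/Clang, and the "hash" is a toy loop over a flat array of blocks standing in for the real per-block mask-and-hash work.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins (assumptions), not the OVS definitions. */
#define ALWAYS_INLINE inline __attribute__((always_inline))

static inline uint32_t
popcount64(uint64_t x)
{
    return (uint32_t) __builtin_popcountll(x);
}

/* Toy per-subtable work over a flat array of u0 + u1 blocks. The loop
 * trip count comes from the arguments, not from macro iteration. */
static ALWAYS_INLINE uint64_t
hash_blocks_impl(const uint64_t *blocks, uint32_t u0, uint32_t u1)
{
    uint64_t h = 0;

    for (uint32_t i = 0; i < u0 + u1; i++) {
        h = h * 31 + blocks[i];
    }
    return h;
}

typedef uint64_t (*hash_func)(const uint64_t *blocks,
                              uint32_t u0, uint32_t u1);

/* Generic version: trip counts are only known at runtime. */
static uint64_t
hash_generic(const uint64_t *blocks, uint32_t u0, uint32_t u1)
{
    return hash_blocks_impl(blocks, u0, u1);
}

/* Specialized versions: constant trip counts let the compiler unroll the
 * inlined loop. The runtime count arguments are ignored. */
static uint64_t
hash_u0_5_u1_1(const uint64_t *blocks, uint32_t u0, uint32_t u1)
{
    (void) u0;
    (void) u1;
    return hash_blocks_impl(blocks, 5, 1);
}

static uint64_t
hash_u0_4_u1_1(const uint64_t *blocks, uint32_t u0, uint32_t u1)
{
    (void) u0;
    (void) u1;
    return hash_blocks_impl(blocks, 4, 1);
}

static uint64_t
hash_u0_4_u1_0(const uint64_t *blocks, uint32_t u0, uint32_t u1)
{
    (void) u0;
    (void) u1;
    return hash_blocks_impl(blocks, 4, 0);
}

/* Probe: pick an implementation from the subtable's miniflow unit maps,
 * falling back to the generic version for any other pattern. */
static hash_func
hash_probe(uint64_t unit0_map, uint64_t unit1_map)
{
    uint32_t u0 = popcount64(unit0_map);
    uint32_t u1 = popcount64(unit1_map);

    if (u0 == 5 && u1 == 1) {
        return hash_u0_5_u1_1;
    }
    if (u0 == 4 && u1 == 1) {
        return hash_u0_4_u1_1;
    }
    if (u0 == 4 && u1 == 0) {
        return hash_u0_4_u1_0;
    }
    return hash_generic;
}
```

The specialized wrappers feed constant trip counts into the always-inlined body, so the compiler is free to fully unroll the loop; adding support for another pattern is just one more wrapper plus one more branch in the probe.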