> -----Original Message----- > From: Malvika Gupta [mailto:malvika.gu...@arm.com] > Sent: Wednesday, June 19, 2019 5:17 PM > To: Van Haaren, Harry <harry.van.haa...@intel.com> > Cc: Yanqin Wei (Arm Technology China) <yanqin....@arm.com>; nd <n...@arm.com>; > d...@openvswitch.org > Subject: RE: [ovs-dev] [PATCH v9 5/5] dpif-netdev: add specialized generic > scalar functions > > Hi Harry,
Hi Malvika, > I would like to test your patch on ARM platforms and report any performance > improvement. Can you please tell me the test vectors you used to match the > subtables that call the specialized lookup functions, i.e., > dpcls_subtable_lookup_mf_u0w5_u1w1, dpcls_subtable_lookup_mf_u0w4_u1w1, and > dpcls_subtable_lookup_mf_u0w4_u1w0? And how did you make that distinction > with traffic patterns, i.e., which traffic pattern would match with which > subtable lookup? I was just running Eth/IPv4/UDP data, and using OvS to do Phy to Phy forwarding. ./utilities/ovs-ofctl add-flow br0 in_port=1,action=output:2 ./utilities/ovs-ofctl add-flow br0 in_port=2,action=output:1 This is an extremely simple setup, however it is enough to showcase value of the patches here. By changing traffic to Eth/IPv4 and Eth alone, you can change the # of miniflow items, and hit the different u0_w5_u1_w0 cases :) If you have suggestions for other subtables to special case, please post the u0_wX_u1_wY values that you'd like to see, and I'll add them to the patchset. > Thank you, > Malvika Gupta Testing welcomed, thanks! -Harry > > -----Original Message----- > > From: ovs-dev-boun...@openvswitch.org <ovs-dev- > > boun...@openvswitch.org> On Behalf Of Harry van Haaren > > Sent: Wednesday, May 8, 2019 11:13 PM > > To: ovs-dev@openvswitch.org > > Cc: i.maxim...@samsung.com > > Subject: [ovs-dev] [PATCH v9 5/5] dpif-netdev: add specialized generic > scalar > > functions > > > > This commit adds a number of specialized functions, that handle common > > miniflow fingerprints. This enables compiler optimization, resulting in > higher > > performance. Below a quick description of how this optimization actually > > works; > > > > "Specialized functions" are "instances" of the generic implementation, but > > the compiler is given extra context when compiling. In the case of > iterating > > miniflow datastructures, the most interesting value to enable compile time > > optimizations is the loop trip count per unit. > > > > In order to create a specialized function, there is a generic > implementation, > > which uses a for() loop without the compiler knowing the loop trip count > at > > compile time. The loop trip count is passed in as an argument to the > function: > > > > uint32_t miniflow_impl_generic(struct miniflow *mf, uint32_t loop_count) { > > for(uint32_t i = 0; i < loop_count; i++) > > // do work > > } > > > > In order to "specialize" the function, we call the generic implementation > with > > hard-coded numbers - these are compile time constants! > > > > uint32_t miniflow_impl_loop5(struct miniflow *mf, uint32_t loop_count) { > > // use hard coded constant for compile-time constant-propogation > > return miniflow_impl_generic(mf, 5); } > > > > Given the compiler is aware of the loop trip count at compile time, it can > > perform an optimization known as "constant propogation". Combined with > > inlining of the miniflow_impl_generic() function, the compiler is now > enabled > > to *compile time* unroll the loop 5x, and produce "flat" code. > > > > The last step to using the specialized functions is to utilize a function- > pointer > > to choose the specialized (or generic) implementation. > > The selection of the function pointer is performed at subtable creation > time, > > when miniflow fingerprint of the subtable is known. This technique is > known > > as "multiple dispatch" in some literature, as it uses multiple items of > > information (miniflow bit counts) to select the dispatch function. > > > > By pointing the function pointer at the optimized implementation, OvS > > benefits from the compile time optimizations at runtime. > > > > Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com> > > > > --- > > > > v8: > > - Rework to use blocks_cache from the dpcls instance, to avoid variable > > lenght arrays in the data-path. > > --- > > lib/dpif-netdev-lookup-generic.c | 83 +++++++++++++++++++++++++++++--- > > lib/dpif-netdev.c | 9 +++- > > lib/dpif-netdev.h | 8 +++ > > 3 files changed, 90 insertions(+), 10 deletions(-) > > > > diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup- > > generic.c > > index 901d28ff0..ba2d024cc 100644 > > --- a/lib/dpif-netdev-lookup-generic.c > > +++ b/lib/dpif-netdev-lookup-generic.c > > @@ -100,11 +100,11 @@ netdev_flow_key_flatten(const struct > > netdev_flow_key * restrict key, > > > > /* Unit 0 flattening */ > > netdev_flow_key_flatten_unit(&pkt_blocks[0], > > - &tbl_blocks[0], > > - &mf_masks[0], > > - &blocks_scratch[0], > > - pkt_bits_u0, > > - u0_count); > > + &tbl_blocks[0], > > + &mf_masks[0], > > + &blocks_scratch[0], > > + pkt_bits_u0, > > + u0_count); > > > > /* Unit 1 flattening: > > * Move the pointers forward in the arrays based on u0 offsets, NOTE: > > @@ -225,7 +225,74 @@ dpcls_subtable_lookup_generic(struct > > dpcls_subtable *subtable, > > const struct netdev_flow_key *keys[], > > struct dpcls_rule **rules) { > > - return lookup_generic_impl(subtable, blocks_scratch, keys_map, > keys, > > - rules, subtable->mf_bits_set_unit0, > > - subtable->mf_bits_set_unit1); > > + /* Here the runtime subtable->mf_bits counts are used, which forces > the > > + * compiler to iterate normal for() loops. Due to this limitation in > the > > + * compilers available optimizations, this function has lower > performance > > + * than the below specialized functions. > > + */ > > + return lookup_generic_impl(subtable, blocks_scratch, keys_map, keys, > > rules, > > + subtable->mf_bits_set_unit0, > > + subtable->mf_bits_set_unit1); } > > + > > +static uint32_t > > +dpcls_subtable_lookup_mf_u0w5_u1w1(struct dpcls_subtable *subtable, > > + uint64_t *blocks_scratch, > > + uint32_t keys_map, > > + const struct netdev_flow_key *keys[], > > + struct dpcls_rule **rules) { > > + /* hard coded bit counts - enables compile time loop unrolling, and > > + * generating of optimized code-sequences due to loop unrolled code. > > + */ > > + return lookup_generic_impl(subtable, blocks_scratch, keys_map, keys, > > rules, > > + 5, 1); > > +} > > + > > +static uint32_t > > +dpcls_subtable_lookup_mf_u0w4_u1w1(struct dpcls_subtable *subtable, > > + uint64_t *blocks_scratch, > > + uint32_t keys_map, > > + const struct netdev_flow_key *keys[], > > + struct dpcls_rule **rules) { > > + return lookup_generic_impl(subtable, blocks_scratch, keys_map, keys, > > rules, > > + 4, 1); > > +} > > + > > +static uint32_t > > +dpcls_subtable_lookup_mf_u0w4_u1w0(struct dpcls_subtable *subtable, > > + uint64_t *blocks_scratch, > > + uint32_t keys_map, > > + const struct netdev_flow_key *keys[], > > + struct dpcls_rule **rules) { > > + return lookup_generic_impl(subtable, blocks_scratch, keys_map, keys, > > rules, > > + 4, 0); > > +} > > + > > +/* Probe function to lookup an available specialized function. > > + * If capable to run the requested miniflow fingerprint, this function > > +returns > > + * the most optimal implementation for that miniflow fingerprint. > > + * @retval FunctionAddress A valid function to handle the miniflow bit > > +pattern > > + * @retval 0 The requested miniflow is not supported here, NULL is > > +returned */ dpcls_subtable_lookup_func > > +dpcls_subtable_generic_probe(uint32_t u0_bits, uint32_t u1_bits) { > > + dpcls_subtable_lookup_func f = NULL; > > + > > + if (u0_bits == 5 && u1_bits == 1) { > > + f = dpcls_subtable_lookup_mf_u0w5_u1w1; > > + } else if (u0_bits == 4 && u1_bits == 1) { > > + f = dpcls_subtable_lookup_mf_u0w4_u1w1; > > + } else if (u0_bits == 4 && u1_bits == 0) { > > + f = dpcls_subtable_lookup_mf_u0w4_u1w0; > > + } > > + > > + if (f) { > > + VLOG_INFO("Subtable using Generic Optimized for u0 %d, u1 %d\n", > > + u0_bits, u1_bits); > > + } > > + return f; > > } > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index > 33b93cfdf..23fc5b7a6 > > 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -7623,8 +7623,13 @@ dpcls_create_subtable(struct dpcls *cls, const > > struct netdev_flow_key *mask) > > cls->blocks_scratch_size = blocks_required_per_pkt; > > } > > > > - /* Assign the generic lookup - this works with any miniflow > fingerprint */ > > - subtable->lookup_func = dpcls_subtable_lookup_generic; > > + /* Probe for an optimmized variant */ > > + subtable->lookup_func = dpcls_subtable_generic_probe(unit0, unit1); > > + > > + /* If not set, assign generic lookup - this works with any miniflow > */ > > + if (!subtable->lookup_func) { > > + subtable->lookup_func = dpcls_subtable_lookup_generic; > > + } > > > > cmap_insert(&cls->subtables_map, &subtable->cmap_node, mask->hash); > > /* Add the new subtable at the end of the pvector (with no hits yet) > */ diff > > --git a/lib/dpif-netdev.h b/lib/dpif-netdev.h index 9263256a9..123eabad7 > > 100644 > > --- a/lib/dpif-netdev.h > > +++ b/lib/dpif-netdev.h > > @@ -70,6 +70,14 @@ typedef uint32_t > > (*dpcls_subtable_lookup_func)(struct dpcls_subtable *subtable, > > const struct netdev_flow_key *keys[], > > struct dpcls_rule **rules); > > > > +/* Probe function to select a specialized version of the generic lookup > > + * implementation. This provides performance benefit due to > > +compile-time > > + * optimizations such as loop-unrolling. These are enabled by the > > +compile-time > > + * constants in the specific function implementations. > > + */ > > +dpcls_subtable_lookup_func > > +dpcls_subtable_generic_probe(uint32_t u0_bit_count, uint32_t > > +u1_bit_count); > > + > > /* Prototype for generic lookup func, using same code path as before */ > > uint32_t dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable, > > -- > > 2.17.1 > > > > _______________________________________________ > > dev mailing list > > d...@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > IMPORTANT NOTICE: The contents of this email and any attachments are > confidential and may also be privileged. If you are not the intended > recipient, please notify the sender immediately and do not disclose the > contents to any other person, use it for any purpose, or store or copy the > information in any medium. Thank you. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev