Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search
Hi All, thanks for your feedback. We published a patchset v1 at http://patchwork.ozlabs.org/patch/775505/ please feel free to review. Thanks, Antonio > -Original Message- > From: Wang, Yipeng1 > Sent: Wednesday, May 3, 2017 12:04 AM > To: Darrell Ball <db...@vmware.com>; d...@openvswitch.org; ja...@ovn.org; > jan.scheur...@ericsson.com > Cc: Tai, Charlie <charlie@intel.com>; Wang, Ren <ren.w...@intel.com>; > Gobriel, Sameh <sameh.gobr...@intel.com>; Fischetti, Antonio > <antonio.fische...@intel.com> > Subject: RE: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to > Accelerate Megaflow Search > > Thank you Darrell for the comment, we collect some data with the scalar > version, please see my reply inlined. Our newest results show good > speedup for both scalar and AVX version. > > We are still waiting for more feedback before implementing version 2. > Please feel free to comment on the patch. > > Thank you. > > > -Original Message- > > From: Darrell Ball [mailto:db...@vmware.com] > > Sent: Wednesday, April 26, 2017 10:04 PM > > To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org > > Cc: Tai, Charlie <charlie....@intel.com>; Wang, Ren > <ren.w...@intel.com>; > > Gobriel, Sameh <sameh.gobr...@intel.com> > > Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor > to > > Accelerate Megaflow Search > > > > > > > > On 4/14/17, 6:10 PM, "Wang, Yipeng1" <yipeng1.w...@intel.com> wrote: > > > > Thank you Darrell for the comments. Please take a look at my reply > inlined. > > > > > > > > > -----Original Message----- > > > > > From: Darrell Ball [mailto:db...@vmware.com] > > > > > Sent: Thursday, April 13, 2017 10:36 PM > > > > > To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org > > > > > Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo > Distributor > > to > > > > > Accelerate Megaflow Search > > > > > > > > > > > > > > > > > > > > On 4/6/17, 2:48 PM, "ovs-dev-boun...@openvswitch.org on behalf of > > > > > yipeng1.w...@intel.com" <ovs-dev-boun...@openvswitch.org on > > behalf of > > > > > yipeng1.w...@intel.com> wrote: > > > > > > > > > > From: Yipeng Wang <yipeng1.w...@intel.com> > > > > > > > > > > The Datapath Classifier uses tuple space search for flow > classification. > > > > > The rules are arranged into a set of tuples/subtables (each > with a > > > > > distinct mask). Each subtable is implemented as a hash table > and > > lookup > > > > > is done with flow keys formed by selecting the bits from the > packet > > header > > > > > based on each subtable's mask. Tuple space search will > sequentially > > search > > > > > each subtable until a match is found. With a large number of > subtables, > > a > > > > > sequential search of the subtables could consume a lot of CPU > cycles. > > In > > > > > a testbench with a uniform traffic pattern equally distributed > across 20 > > > > > subtables, we measured that up to 65% of total execution time > is > > attributed > > > > > to the megaflow cache lookup. > > > > > > > > > > This patch presents the idea of the two-layer hierarchical > lookup, > > where a > > > > > low overhead first level of indirection is accessed first, we > call this > > > > > level cuckoo distributor (CD). If a flow key has been inserted > in the flow > > > > > table the first level will indicate with high probability that > which > > > > > subtable to look into. A lookup is performed on the second > level (the > > > > > target subtable) to retrieve the result. If the key doesn’t > have a match, > > > > > then we revert back to the sequential search of subtables. > > > > > > > > > > This patch can improve the already existing Subtable Ranking > when > > traffic > > > > > data has high entropy. Subtable Ranking helps minimize the > number of > >
Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search
Thank you Darrell for the comment, we collect some data with the scalar version, please see my reply inlined. Our newest results show good speedup for both scalar and AVX version. We are still waiting for more feedback before implementing version 2. Please feel free to comment on the patch. Thank you. > -Original Message- > From: Darrell Ball [mailto:db...@vmware.com] > Sent: Wednesday, April 26, 2017 10:04 PM > To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org > Cc: Tai, Charlie <charlie@intel.com>; Wang, Ren <ren.w...@intel.com>; > Gobriel, Sameh <sameh.gobr...@intel.com> > Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to > Accelerate Megaflow Search > > > > On 4/14/17, 6:10 PM, "Wang, Yipeng1" <yipeng1.w...@intel.com> wrote: > > Thank you Darrell for the comments. Please take a look at my reply > inlined. > > > > > -Original Message- > > > From: Darrell Ball [mailto:db...@vmware.com] > > > Sent: Thursday, April 13, 2017 10:36 PM > > > To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org > > > Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor > to > > > Accelerate Megaflow Search > > > > > > > > > > > > On 4/6/17, 2:48 PM, "ovs-dev-boun...@openvswitch.org on behalf of > > > yipeng1.w...@intel.com" <ovs-dev-boun...@openvswitch.org on > behalf of > > > yipeng1.w...@intel.com> wrote: > > > > > > From: Yipeng Wang <yipeng1.w...@intel.com> > > > > > > The Datapath Classifier uses tuple space search for flow > classification. > > > The rules are arranged into a set of tuples/subtables (each with a > > > distinct mask). Each subtable is implemented as a hash table and > lookup > > > is done with flow keys formed by selecting the bits from the packet > header > > > based on each subtable's mask. Tuple space search will sequentially > search > > > each subtable until a match is found. With a large number of > subtables, > a > > > sequential search of the subtables could consume a lot of CPU > cycles. > In > > > a testbench with a uniform traffic pattern equally distributed > across 20 > > > subtables, we measured that up to 65% of total execution time is > attributed > > > to the megaflow cache lookup. > > > > > > This patch presents the idea of the two-layer hierarchical lookup, > where a > > > low overhead first level of indirection is accessed first, we call > this > > > level cuckoo distributor (CD). If a flow key has been inserted in > the flow > > > table the first level will indicate with high probability that which > > > subtable to look into. A lookup is performed on the second level > (the > > > target subtable) to retrieve the result. If the key doesn’t have a > match, > > > then we revert back to the sequential search of subtables. > > > > > > This patch can improve the already existing Subtable Ranking when > traffic > > > data has high entropy. Subtable Ranking helps minimize the number of > > > traversed subtables when most of the traffic hit the same subtable. > > > However, in the case of high entropy traffic such as traffic coming > from > > > a physical port, multiple subtables could be hit with a similar > frequency. > > > In this case the average subtable lookups per hit would be much > greater > > > than 1. In addition, CD can adaptively turn off when it finds the > traffic > > > mostly hit one subtable. Thus, CD will not be an overhead when > Subtable > > > Ranking works well. > > > > > > Scheme: > > > > > > --- > > > | CD | > > > --- > > >\ > > > \ > > > - - - > > > |sub ||sub |...|sub | > > > |table||table| |table| > > > - - - > > > > > > Evaluation: > > > > > > We create set of rules with various src IP. We feed traffic > containi
Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search
On 4/14/17, 6:10 PM, "Wang, Yipeng1" <yipeng1.w...@intel.com> wrote: Thank you Darrell for the comments. Please take a look at my reply inlined. > -Original Message- > From: Darrell Ball [mailto:db...@vmware.com] > Sent: Thursday, April 13, 2017 10:36 PM > To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org > Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to > Accelerate Megaflow Search > > > > On 4/6/17, 2:48 PM, "ovs-dev-boun...@openvswitch.org on behalf of > yipeng1.w...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of > yipeng1.w...@intel.com> wrote: > > From: Yipeng Wang <yipeng1.w...@intel.com> > > The Datapath Classifier uses tuple space search for flow classification. > The rules are arranged into a set of tuples/subtables (each with a > distinct mask). Each subtable is implemented as a hash table and lookup > is done with flow keys formed by selecting the bits from the packet header > based on each subtable's mask. Tuple space search will sequentially search > each subtable until a match is found. With a large number of subtables, a > sequential search of the subtables could consume a lot of CPU cycles. In > a testbench with a uniform traffic pattern equally distributed across 20 > subtables, we measured that up to 65% of total execution time is attributed > to the megaflow cache lookup. > > This patch presents the idea of the two-layer hierarchical lookup, where a > low overhead first level of indirection is accessed first, we call this > level cuckoo distributor (CD). If a flow key has been inserted in the flow > table the first level will indicate with high probability that which > subtable to look into. A lookup is performed on the second level (the > target subtable) to retrieve the result. If the key doesn’t have a match, > then we revert back to the sequential search of subtables. > > This patch can improve the already existing Subtable Ranking when traffic > data has high entropy. Subtable Ranking helps minimize the number of > traversed subtables when most of the traffic hit the same subtable. > However, in the case of high entropy traffic such as traffic coming from > a physical port, multiple subtables could be hit with a similar frequency. > In this case the average subtable lookups per hit would be much greater > than 1. In addition, CD can adaptively turn off when it finds the traffic > mostly hit one subtable. Thus, CD will not be an overhead when Subtable > Ranking works well. > > Scheme: > > --- > | CD | > --- >\ > \ > - - - > |sub ||sub |...|sub | > |table||table| |table| > - - - > > Evaluation: > > We create set of rules with various src IP. We feed traffic containing 1 > million flows with various src IP and dst IP. All the flows hit 10/20/30 > rules creating 10/20/30 subtables. > > The table below shows the preliminary continuous testing results (full line > speed test) we collected with a uni-directional port-to-port setup. The > machine we tested on is a Xeon E5 server running with 2.2GHz cores. OvS > runs with 1 PMD. We use Spirent as the hardware traffic generator. > > no.subtable: 10 20 30 > cd-ovs 3895961 3170530 2968555 > orig-ovs 2683455 1646227 1240501 > speedup 1.45x 1.92x 2.39x > > > I have a few initial comments. > 1) Can you present the numbers with and without __AVX2__ “enabled”.' [Wang, Yipeng] We mainly test with AVX2 to find the upper-bound performance speedup of the design. Throughput-wise, we have not optimized for the scalar version thus we did not present the results. If people are interested in this patch, we will update the implementation to consider the performance for bo
Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search
Thank you Darrell for the comments. Please take a look at my reply inlined. > -Original Message- > From: Darrell Ball [mailto:db...@vmware.com] > Sent: Thursday, April 13, 2017 10:36 PM > To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org > Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to > Accelerate Megaflow Search > > > > On 4/6/17, 2:48 PM, "ovs-dev-boun...@openvswitch.org on behalf of > yipeng1.w...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of > yipeng1.w...@intel.com> wrote: > > From: Yipeng Wang <yipeng1.w...@intel.com> > > The Datapath Classifier uses tuple space search for flow classification. > The rules are arranged into a set of tuples/subtables (each with a > distinct mask). Each subtable is implemented as a hash table and lookup > is done with flow keys formed by selecting the bits from the packet header > based on each subtable's mask. Tuple space search will sequentially search > each subtable until a match is found. With a large number of subtables, a > sequential search of the subtables could consume a lot of CPU cycles. In > a testbench with a uniform traffic pattern equally distributed across 20 > subtables, we measured that up to 65% of total execution time is > attributed > to the megaflow cache lookup. > > This patch presents the idea of the two-layer hierarchical lookup, where a > low overhead first level of indirection is accessed first, we call this > level cuckoo distributor (CD). If a flow key has been inserted in the flow > table the first level will indicate with high probability that which > subtable to look into. A lookup is performed on the second level (the > target subtable) to retrieve the result. If the key doesn’t have a match, > then we revert back to the sequential search of subtables. > > This patch can improve the already existing Subtable Ranking when traffic > data has high entropy. Subtable Ranking helps minimize the number of > traversed subtables when most of the traffic hit the same subtable. > However, in the case of high entropy traffic such as traffic coming from > a physical port, multiple subtables could be hit with a similar frequency. > In this case the average subtable lookups per hit would be much greater > than 1. In addition, CD can adaptively turn off when it finds the traffic > mostly hit one subtable. Thus, CD will not be an overhead when Subtable > Ranking works well. > > Scheme: > > --- > | CD | > --- >\ > \ > - - - > |sub ||sub |...|sub | > |table||table| |table| > - - - > > Evaluation: > > We create set of rules with various src IP. We feed traffic containing 1 > million flows with various src IP and dst IP. All the flows hit 10/20/30 > rules creating 10/20/30 subtables. > > The table below shows the preliminary continuous testing results (full > line > speed test) we collected with a uni-directional port-to-port setup. The > machine we tested on is a Xeon E5 server running with 2.2GHz cores. OvS > runs with 1 PMD. We use Spirent as the hardware traffic generator. > > no.subtable: 10 20 30 > cd-ovs 3895961 3170530 2968555 > orig-ovs 2683455 1646227 1240501 > speedup 1.45x 1.92x 2.39x > > > I have a few initial comments. > 1) Can you present the numbers with and without __AVX2__ “enabled”.' [Wang, Yipeng] We mainly test with AVX2 to find the upper-bound performance speedup of the design. Throughput-wise, we have not optimized for the scalar version thus we did not present the results. If people are interested in this patch, we will update the implementation to consider the performance for both AVX and scalar in Version 2 and report the results. We may design different structure (mainly different entry count per bucket) for scalar and AVX to optimize the performance. > 2) Can you present the numbers with say 2 and say 10 flows for some > comparison. [Wang, Yipeng] As long as flows cannot all fit in EMC, CD should benefit. Generally, CD benefit more when there are more flows fall out of EMC. We collect the new results and report them as following: 2 flows: no.subtable: 10 20 30 cd-ovs 4267332 3478251 3126763 orig-ovs 3260883 2174551 1689981 speedup 1.31x 1.60x1.85x 10 flows: no.subtable: 10 20 30 cd-ovs 40157833276100 297064
Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search
Thank you Jarno for pointing it out. I think we have been using default master head for comparison with conditional EMC insert for both orig and cd-ovs. To double-check I explicitly set emc_insert_inv_prob=100 and test again. The results are similar to the previous one. CD should benefit throughput as long as there are EMC misses. Since our test case includes 1M flows, most of flows will miss in EMC no matter what the insertion rate is. Thanks Yipeng -Original Message- From: Jarno Rajahalme [mailto:ja...@ovn.org] Sent: Tuesday, April 11, 2017 2:45 PM To: Fischetti, Antonio <antonio.fische...@intel.com> Cc: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org; jan.scheur...@ericsson.com Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search Thanks for your contribution! I haven’t looked at the patch yet, but based on the description I’d ask you to characterize the difference and/or feature interaction with the Exact Match Cache (EMC). Particularly, does “orig-ovs” and/or “cd-ovs” have the conditional EMC feature (merged to master on Feb 16th, 2017), and what values of “emc-insert-inv-prob” were used for comparison? Jarno > On Apr 11, 2017, at 12:21 AM, Fischetti, Antonio > <antonio.fische...@intel.com> wrote: > > Any comment on this patch? > As this is a sort of an addition to Subtable Ranking, maybe Jarno > and/or Jan could have some comments? > > Thanks, > Antonio > >> -Original Message- >> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- >> boun...@openvswitch.org] On Behalf Of yipeng1.w...@intel.com >> Sent: Thursday, April 6, 2017 10:48 PM >> To: d...@openvswitch.org >> Subject: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to >> Accelerate Megaflow Search >> >> From: Yipeng Wang <yipeng1.w...@intel.com> >> >> The Datapath Classifier uses tuple space search for flow classification. >> The rules are arranged into a set of tuples/subtables (each with a >> distinct mask). Each subtable is implemented as a hash table and >> lookup is done with flow keys formed by selecting the bits from the >> packet header based on each subtable's mask. Tuple space search will >> sequentially search each subtable until a match is found. With a >> large number of subtables, a sequential search of the subtables could >> consume a lot of CPU cycles. In a testbench with a uniform traffic >> pattern equally distributed across 20 subtables, we measured that up >> to 65% of total execution time is attributed to the megaflow cache >> lookup. >> >> This patch presents the idea of the two-layer hierarchical lookup, >> where a low overhead first level of indirection is accessed first, we >> call this level cuckoo distributor (CD). If a flow key has been >> inserted in the flow table the first level will indicate with high >> probability that which subtable to look into. A lookup is performed >> on the second level (the target subtable) to retrieve the result. If >> the key doesn’t have a match, then we revert back to the sequential search >> of subtables. >> >> This patch can improve the already existing Subtable Ranking when >> traffic data has high entropy. Subtable Ranking helps minimize the >> number of traversed subtables when most of the traffic hit the same subtable. >> However, in the case of high entropy traffic such as traffic coming >> from a physical port, multiple subtables could be hit with a similar >> frequency. >> In this case the average subtable lookups per hit would be much >> greater than 1. In addition, CD can adaptively turn off when it finds >> the traffic mostly hit one subtable. Thus, CD will not be an overhead >> when Subtable Ranking works well. >> >> Scheme: >> >> --- >>| CD | >> --- >> \ >>\ >> - - - >> |sub ||sub |...|sub | >> |table||table| |table| >> - - - >> >> Evaluation: >> >> We create set of rules with various src IP. We feed traffic >> containing 1 million flows with various src IP and dst IP. All the >> flows hit 10/20/30 rules creating 10/20/30 subtables. >> >> The table below shows the preliminary continuous testing results >> (full line speed test) we collected with a uni-directional >> port-to-port setup. The machine we tested on is a Xeon E5 server >> running with 2.2GHz cores. OvS runs with 1 PMD. We use Spirent as the >> hardware traffic generator. >> >> no.subtable: 10 20 30 >> c
Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search
Thanks for your contribution! I haven’t looked at the patch yet, but based on the description I’d ask you to characterize the difference and/or feature interaction with the Exact Match Cache (EMC). Particularly, does “orig-ovs” and/or “cd-ovs” have the conditional EMC feature (merged to master on Feb 16th, 2017), and what values of “emc-insert-inv-prob” were used for comparison? Jarno > On Apr 11, 2017, at 12:21 AM, Fischetti, Antonio > <antonio.fische...@intel.com> wrote: > > Any comment on this patch? > As this is a sort of an addition to Subtable Ranking, maybe > Jarno and/or Jan could have some comments? > > Thanks, > Antonio > >> -Original Message- >> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- >> boun...@openvswitch.org] On Behalf Of yipeng1.w...@intel.com >> Sent: Thursday, April 6, 2017 10:48 PM >> To: d...@openvswitch.org >> Subject: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to >> Accelerate Megaflow Search >> >> From: Yipeng Wang <yipeng1.w...@intel.com> >> >> The Datapath Classifier uses tuple space search for flow classification. >> The rules are arranged into a set of tuples/subtables (each with a >> distinct mask). Each subtable is implemented as a hash table and lookup >> is done with flow keys formed by selecting the bits from the packet header >> based on each subtable's mask. Tuple space search will sequentially search >> each subtable until a match is found. With a large number of subtables, a >> sequential search of the subtables could consume a lot of CPU cycles. In >> a testbench with a uniform traffic pattern equally distributed across 20 >> subtables, we measured that up to 65% of total execution time is >> attributed >> to the megaflow cache lookup. >> >> This patch presents the idea of the two-layer hierarchical lookup, where a >> low overhead first level of indirection is accessed first, we call this >> level cuckoo distributor (CD). If a flow key has been inserted in the flow >> table the first level will indicate with high probability that which >> subtable to look into. A lookup is performed on the second level (the >> target subtable) to retrieve the result. If the key doesn’t have a match, >> then we revert back to the sequential search of subtables. >> >> This patch can improve the already existing Subtable Ranking when traffic >> data has high entropy. Subtable Ranking helps minimize the number of >> traversed subtables when most of the traffic hit the same subtable. >> However, in the case of high entropy traffic such as traffic coming from >> a physical port, multiple subtables could be hit with a similar frequency. >> In this case the average subtable lookups per hit would be much greater >> than 1. In addition, CD can adaptively turn off when it finds the traffic >> mostly hit one subtable. Thus, CD will not be an overhead when Subtable >> Ranking works well. >> >> Scheme: >> >> --- >>| CD | >> --- >> \ >>\ >> - - - >> |sub ||sub |...|sub | >> |table||table| |table| >> - - - >> >> Evaluation: >> >> We create set of rules with various src IP. We feed traffic containing 1 >> million flows with various src IP and dst IP. All the flows hit 10/20/30 >> rules creating 10/20/30 subtables. >> >> The table below shows the preliminary continuous testing results (full >> line >> speed test) we collected with a uni-directional port-to-port setup. The >> machine we tested on is a Xeon E5 server running with 2.2GHz cores. OvS >> runs with 1 PMD. We use Spirent as the hardware traffic generator. >> >> no.subtable: 10 20 30 >> cd-ovs 3895961 3170530 2968555 >> orig-ovs 2683455 1646227 1240501 >> speedup 1.45x 1.92x 2.39x >> >> Signed-off-by: Yipeng Wang >> Signed-off-by: Charlie Tai >> Co-authored-by: Charlie Tai >> Signed-off-by: Sameh Gobriel >> Co-authored-by: Sameh Gobriel >> Signed-off-by: Ren Wang >> Co-authored-by: Ren Wang >> Signed-off-by: Antonio Fischetti >> Co-authored-by: Antonio Fischetti >> --- >> lib/dpif-netdev.c | 654 >> -- >> tests/ofproto-dpif.at | 3 +- >> 2 files changed, 633 insertions(+), 24 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index a14a2eb..d9a883b 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netd
Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search
Any comment on this patch? As this is a sort of an addition to Subtable Ranking, maybe Jarno and/or Jan could have some comments? Thanks, Antonio > -Original Message- > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > boun...@openvswitch.org] On Behalf Of yipeng1.w...@intel.com > Sent: Thursday, April 6, 2017 10:48 PM > To: d...@openvswitch.org > Subject: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to > Accelerate Megaflow Search > > From: Yipeng Wang <yipeng1.w...@intel.com> > > The Datapath Classifier uses tuple space search for flow classification. > The rules are arranged into a set of tuples/subtables (each with a > distinct mask). Each subtable is implemented as a hash table and lookup > is done with flow keys formed by selecting the bits from the packet header > based on each subtable's mask. Tuple space search will sequentially search > each subtable until a match is found. With a large number of subtables, a > sequential search of the subtables could consume a lot of CPU cycles. In > a testbench with a uniform traffic pattern equally distributed across 20 > subtables, we measured that up to 65% of total execution time is > attributed > to the megaflow cache lookup. > > This patch presents the idea of the two-layer hierarchical lookup, where a > low overhead first level of indirection is accessed first, we call this > level cuckoo distributor (CD). If a flow key has been inserted in the flow > table the first level will indicate with high probability that which > subtable to look into. A lookup is performed on the second level (the > target subtable) to retrieve the result. If the key doesn’t have a match, > then we revert back to the sequential search of subtables. > > This patch can improve the already existing Subtable Ranking when traffic > data has high entropy. Subtable Ranking helps minimize the number of > traversed subtables when most of the traffic hit the same subtable. > However, in the case of high entropy traffic such as traffic coming from > a physical port, multiple subtables could be hit with a similar frequency. > In this case the average subtable lookups per hit would be much greater > than 1. In addition, CD can adaptively turn off when it finds the traffic > mostly hit one subtable. Thus, CD will not be an overhead when Subtable > Ranking works well. > > Scheme: > > --- > | CD | > --- >\ > \ > - - - > |sub ||sub |...|sub | > |table||table| |table| > - - - > > Evaluation: > > We create set of rules with various src IP. We feed traffic containing 1 > million flows with various src IP and dst IP. All the flows hit 10/20/30 > rules creating 10/20/30 subtables. > > The table below shows the preliminary continuous testing results (full > line > speed test) we collected with a uni-directional port-to-port setup. The > machine we tested on is a Xeon E5 server running with 2.2GHz cores. OvS > runs with 1 PMD. We use Spirent as the hardware traffic generator. > > no.subtable: 10 20 30 > cd-ovs 3895961 3170530 2968555 > orig-ovs 2683455 1646227 1240501 > speedup 1.45x 1.92x 2.39x > > Signed-off-by: Yipeng Wang > Signed-off-by: Charlie Tai > Co-authored-by: Charlie Tai > Signed-off-by: Sameh Gobriel > Co-authored-by: Sameh Gobriel > Signed-off-by: Ren Wang > Co-authored-by: Ren Wang > Signed-off-by: Antonio Fischetti > Co-authored-by: Antonio Fischetti > --- > lib/dpif-netdev.c | 654 > -- > tests/ofproto-dpif.at | 3 +- > 2 files changed, 633 insertions(+), 24 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index a14a2eb..d9a883b 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -79,11 +79,23 @@ > > VLOG_DEFINE_THIS_MODULE(dpif_netdev); > > +/* Length of Subtable table for cuckoo distributor to index subtables. > + * The size of the table is at most 256 entires because the CD's entry > only > + * provides 1 byte for indexing. > + */ > +#define SUBTABLE_TABLE_LENGTH 256 > + > #define FLOW_DUMP_MAX_BATCH 50 > /* Use per thread recirc_depth to prevent recirculation loop. */ > #define MAX_RECIRC_DEPTH 5 > DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0) > > + > +#define CD_DEBUG 0 > +#define debug_print(...) \ > +do { if (CD_DEBUG) fprintf(stderr, __VA_ARGS__); } while (0) > + > + > /* Configuration parameters. */ > enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table. > */ > enum { MAX_METERS = 65536 };/* Maxim
[ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search
From: Yipeng WangThe Datapath Classifier uses tuple space search for flow classification. The rules are arranged into a set of tuples/subtables (each with a distinct mask). Each subtable is implemented as a hash table and lookup is done with flow keys formed by selecting the bits from the packet header based on each subtable's mask. Tuple space search will sequentially search each subtable until a match is found. With a large number of subtables, a sequential search of the subtables could consume a lot of CPU cycles. In a testbench with a uniform traffic pattern equally distributed across 20 subtables, we measured that up to 65% of total execution time is attributed to the megaflow cache lookup. This patch presents the idea of the two-layer hierarchical lookup, where a low overhead first level of indirection is accessed first, we call this level cuckoo distributor (CD). If a flow key has been inserted in the flow table the first level will indicate with high probability that which subtable to look into. A lookup is performed on the second level (the target subtable) to retrieve the result. If the key doesn’t have a match, then we revert back to the sequential search of subtables. This patch can improve the already existing Subtable Ranking when traffic data has high entropy. Subtable Ranking helps minimize the number of traversed subtables when most of the traffic hit the same subtable. However, in the case of high entropy traffic such as traffic coming from a physical port, multiple subtables could be hit with a similar frequency. In this case the average subtable lookups per hit would be much greater than 1. In addition, CD can adaptively turn off when it finds the traffic mostly hit one subtable. Thus, CD will not be an overhead when Subtable Ranking works well. Scheme: --- | CD | --- \ \ - - - |sub ||sub |...|sub | |table||table| |table| - - - Evaluation: We create set of rules with various src IP. We feed traffic containing 1 million flows with various src IP and dst IP. All the flows hit 10/20/30 rules creating 10/20/30 subtables. The table below shows the preliminary continuous testing results (full line speed test) we collected with a uni-directional port-to-port setup. The machine we tested on is a Xeon E5 server running with 2.2GHz cores. OvS runs with 1 PMD. We use Spirent as the hardware traffic generator. no.subtable: 10 20 30 cd-ovs 3895961 3170530 2968555 orig-ovs 2683455 1646227 1240501 speedup 1.45x 1.92x 2.39x Signed-off-by: Yipeng Wang Signed-off-by: Charlie Tai Co-authored-by: Charlie Tai Signed-off-by: Sameh Gobriel Co-authored-by: Sameh Gobriel Signed-off-by: Ren Wang Co-authored-by: Ren Wang Signed-off-by: Antonio Fischetti Co-authored-by: Antonio Fischetti --- lib/dpif-netdev.c | 654 -- tests/ofproto-dpif.at | 3 +- 2 files changed, 633 insertions(+), 24 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index a14a2eb..d9a883b 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -79,11 +79,23 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev); +/* Length of Subtable table for cuckoo distributor to index subtables. + * The size of the table is at most 256 entires because the CD's entry only + * provides 1 byte for indexing. + */ +#define SUBTABLE_TABLE_LENGTH 256 + #define FLOW_DUMP_MAX_BATCH 50 /* Use per thread recirc_depth to prevent recirculation loop. */ #define MAX_RECIRC_DEPTH 5 DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0) + +#define CD_DEBUG 0 +#define debug_print(...) \ +do { if (CD_DEBUG) fprintf(stderr, __VA_ARGS__); } while (0) + + /* Configuration parameters. */ enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table. */ enum { MAX_METERS = 65536 };/* Maximum number of meters. */ @@ -163,6 +175,44 @@ struct emc_cache { int sweep_idx;/* For emc_cache_slow_sweep(). */ }; + +/* Cuckoo distributor (CD) is a 2-hash function hash table. + * For now, the design does not allow desplacing items when bucket is full, + * which is different from the behavior of a cuckoo hash table. + * The advantage is that we do not need to store two sigantures so that + * the struct will be more compact. We use 16 entries per bucket for the + * usage of AVX. + * + * Each classifier has its own cuckoo distributor. It is NOT thread-safe + */ +#define CD_NUM_BUCKETS (1<<16) +#define CD_BUCKET_MASK (CD_NUM_BUCKETS-1) +#define CD_ENTRIES 16 + +/* These two seeds are used for hashing two bucket locations */ +#define CD_PRIM_BUCKET_SEED 10 +#define CD_SEC_BUCKET_SEED 20 + +/* This bit is used to choose which bucket to replace CD's entry in cd_insert*/ +#define CD_CHOOSE_SEC_BUCKT_BIT (1 << CD_ENTRIES) + +typedef uint16_t simple_sig_store_t; + + +/* The bucket struct for