Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search

2017-06-23 Thread Fischetti, Antonio
Hi All,
thanks for your feedback. We published a patchset v1 at

http://patchwork.ozlabs.org/patch/775505/

please feel free to review.

Thanks,
Antonio

> -Original Message-
> From: Wang, Yipeng1
> Sent: Wednesday, May 3, 2017 12:04 AM
> To: Darrell Ball <db...@vmware.com>; d...@openvswitch.org; ja...@ovn.org;
> jan.scheur...@ericsson.com
> Cc: Tai, Charlie <charlie@intel.com>; Wang, Ren <ren.w...@intel.com>;
> Gobriel, Sameh <sameh.gobr...@intel.com>; Fischetti, Antonio
> <antonio.fische...@intel.com>
> Subject: RE: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to
> Accelerate Megaflow Search
> 
> Thank you Darrell for the comment, we collect some data with the scalar
> version,  please see my reply inlined.  Our newest results show good
> speedup for both scalar and AVX version.
> 
> We are still waiting for more feedback before implementing version 2.
> Please feel free to comment on the patch.
> 
> Thank you.
> 
> > -Original Message-
> > From: Darrell Ball [mailto:db...@vmware.com]
> > Sent: Wednesday, April 26, 2017 10:04 PM
> > To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org
> > Cc: Tai, Charlie <charlie....@intel.com>; Wang, Ren
> <ren.w...@intel.com>;
> > Gobriel, Sameh <sameh.gobr...@intel.com>
> > Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor
> to
> > Accelerate Megaflow Search
> >
> >
> >
> > On 4/14/17, 6:10 PM, "Wang, Yipeng1" <yipeng1.w...@intel.com> wrote:
> >
> > Thank you Darrell for the comments. Please take a look at my reply
> inlined.
> >
> >
> >
> > > -----Original Message-----
> >
> > > From: Darrell Ball [mailto:db...@vmware.com]
> >
> > > Sent: Thursday, April 13, 2017 10:36 PM
> >
> > > To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org
> >
> > > Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo
> Distributor
> > to
> >
> > > Accelerate Megaflow Search
> >
> > >
> >
> > >
> >
> > >
> >
> > > On 4/6/17, 2:48 PM, "ovs-dev-boun...@openvswitch.org on behalf of
> >
> > > yipeng1.w...@intel.com" <ovs-dev-boun...@openvswitch.org on
> > behalf of
> >
> > > yipeng1.w...@intel.com> wrote:
> >
> > >
> >
> > > From: Yipeng Wang <yipeng1.w...@intel.com>
> >
> > >
> >
> > > The Datapath Classifier uses tuple space search for flow
> classification.
> >
> > > The rules are arranged into a set of tuples/subtables (each
> with a
> >
> > > distinct mask).  Each subtable is implemented as a hash table
> and
> > lookup
> >
> > > is done with flow keys formed by selecting the bits from the
> packet
> > header
> >
> > > based on each subtable's mask. Tuple space search will
> sequentially
> > search
> >
> > > each subtable until a match is found. With a large number of
> subtables,
> > a
> >
> > > sequential search of the subtables could consume a lot of CPU
> cycles.
> > In
> >
> > > a testbench with a uniform traffic pattern equally distributed
> across 20
> >
> > > subtables, we measured that up to 65% of total execution time
> is
> > attributed
> >
> > > to the megaflow cache lookup.
> >
> > >
> >
> > > This patch presents the idea of the two-layer hierarchical
> lookup,
> > where a
> >
> > > low overhead first level of indirection is accessed first, we
> call this
> >
> > > level cuckoo distributor (CD). If a flow key has been inserted
> in the flow
> >
> > > table the first level will indicate with high probability that
> which
> >
> > > subtable to look into. A lookup is performed on the second
> level (the
> >
> > > target subtable) to retrieve the result. If the key doesn’t
> have a match,
> >
> > > then we revert back to the sequential search of subtables.
> >
> > >
> >
> > > This patch can improve the already existing Subtable Ranking
> when
> > traffic
> >
> > > data has high entropy. Subtable Ranking helps minimize the
> number of
> >

Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search

2017-05-02 Thread Wang, Yipeng1
Thank you Darrell for the comment, we collect some data with the scalar 
version,  please see my reply inlined.  Our newest results show good speedup 
for both scalar and AVX version.

We are still waiting for more feedback before implementing version 2.  Please 
feel free to comment on the patch. 

Thank you.

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Wednesday, April 26, 2017 10:04 PM
> To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org
> Cc: Tai, Charlie <charlie@intel.com>; Wang, Ren <ren.w...@intel.com>;
> Gobriel, Sameh <sameh.gobr...@intel.com>
> Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to
> Accelerate Megaflow Search
> 
> 
> 
> On 4/14/17, 6:10 PM, "Wang, Yipeng1" <yipeng1.w...@intel.com> wrote:
> 
> Thank you Darrell for the comments. Please take a look at my reply 
> inlined.
> 
> 
> 
> > -Original Message-
> 
> > From: Darrell Ball [mailto:db...@vmware.com]
> 
> > Sent: Thursday, April 13, 2017 10:36 PM
> 
> > To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org
> 
> > Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor
> to
> 
> > Accelerate Megaflow Search
> 
> >
> 
> >
> 
> >
> 
> > On 4/6/17, 2:48 PM, "ovs-dev-boun...@openvswitch.org on behalf of
> 
> > yipeng1.w...@intel.com" <ovs-dev-boun...@openvswitch.org on
> behalf of
> 
> > yipeng1.w...@intel.com> wrote:
> 
> >
> 
> > From: Yipeng Wang <yipeng1.w...@intel.com>
> 
> >
> 
> > The Datapath Classifier uses tuple space search for flow 
> classification.
> 
> > The rules are arranged into a set of tuples/subtables (each with a
> 
> > distinct mask).  Each subtable is implemented as a hash table and
> lookup
> 
> > is done with flow keys formed by selecting the bits from the packet
> header
> 
> > based on each subtable's mask. Tuple space search will sequentially
> search
> 
> > each subtable until a match is found. With a large number of 
> subtables,
> a
> 
> > sequential search of the subtables could consume a lot of CPU 
> cycles.
> In
> 
> > a testbench with a uniform traffic pattern equally distributed 
> across 20
> 
> > subtables, we measured that up to 65% of total execution time is
> attributed
> 
> > to the megaflow cache lookup.
> 
> >
> 
> > This patch presents the idea of the two-layer hierarchical lookup,
> where a
> 
> > low overhead first level of indirection is accessed first, we call 
> this
> 
> > level cuckoo distributor (CD). If a flow key has been inserted in 
> the flow
> 
> > table the first level will indicate with high probability that which
> 
> > subtable to look into. A lookup is performed on the second level 
> (the
> 
> > target subtable) to retrieve the result. If the key doesn’t have a 
> match,
> 
> > then we revert back to the sequential search of subtables.
> 
> >
> 
> > This patch can improve the already existing Subtable Ranking when
> traffic
> 
> > data has high entropy. Subtable Ranking helps minimize the number of
> 
> > traversed subtables when most of the traffic hit the same subtable.
> 
> > However, in the case of high entropy traffic such as traffic coming 
> from
> 
> > a physical port, multiple subtables could be hit with a similar 
> frequency.
> 
> > In this case the average subtable lookups per hit would be much
> greater
> 
> > than 1. In addition, CD can adaptively turn off when it finds the 
> traffic
> 
> > mostly hit one subtable. Thus, CD will not be an overhead when
> Subtable
> 
> > Ranking works well.
> 
> >
> 
> > Scheme:
> 
> >
> 
> >  ---
> 
> > |  CD   |
> 
> >  ---
> 
> >\
> 
> > \
> 
> >  -  - -
> 
> > |sub  ||sub  |...|sub  |
> 
> > |table||table|   |table|
> 
> >  -  - -
> 
> >
> 
> > Evaluation:
> 
> >
> 
> > We create set of rules with various src IP. We feed traffic 
> containi

Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search

2017-04-26 Thread Darrell Ball


On 4/14/17, 6:10 PM, "Wang, Yipeng1" <yipeng1.w...@intel.com> wrote:

Thank you Darrell for the comments. Please take a look at my reply inlined.



> -Original Message-

> From: Darrell Ball [mailto:db...@vmware.com]

> Sent: Thursday, April 13, 2017 10:36 PM

> To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org
    
    > Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to

> Accelerate Megaflow Search

> 

> 

> 

> On 4/6/17, 2:48 PM, "ovs-dev-boun...@openvswitch.org on behalf of

> yipeng1.w...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of

> yipeng1.w...@intel.com> wrote:

> 

> From: Yipeng Wang <yipeng1.w...@intel.com>

> 

> The Datapath Classifier uses tuple space search for flow 
classification.

> The rules are arranged into a set of tuples/subtables (each with a

> distinct mask).  Each subtable is implemented as a hash table and 
lookup

> is done with flow keys formed by selecting the bits from the packet 
header

> based on each subtable's mask. Tuple space search will sequentially 
search

> each subtable until a match is found. With a large number of 
subtables, a

> sequential search of the subtables could consume a lot of CPU cycles. 
In

> a testbench with a uniform traffic pattern equally distributed across 
20

> subtables, we measured that up to 65% of total execution time is 
attributed

> to the megaflow cache lookup.

> 

> This patch presents the idea of the two-layer hierarchical lookup, 
where a

> low overhead first level of indirection is accessed first, we call 
this

> level cuckoo distributor (CD). If a flow key has been inserted in the 
flow

> table the first level will indicate with high probability that which

> subtable to look into. A lookup is performed on the second level (the

> target subtable) to retrieve the result. If the key doesn’t have a 
match,

> then we revert back to the sequential search of subtables.

> 

> This patch can improve the already existing Subtable Ranking when 
traffic

> data has high entropy. Subtable Ranking helps minimize the number of

> traversed subtables when most of the traffic hit the same subtable.

> However, in the case of high entropy traffic such as traffic coming 
from

> a physical port, multiple subtables could be hit with a similar 
frequency.

> In this case the average subtable lookups per hit would be much 
greater

> than 1. In addition, CD can adaptively turn off when it finds the 
traffic

> mostly hit one subtable. Thus, CD will not be an overhead when 
Subtable

> Ranking works well.

> 

> Scheme:

> 

>  ---

> |  CD   |

>  ---

>\

> \

>  -  - -

> |sub  ||sub  |...|sub  |

> |table||table|   |table|

>  -  - -

> 

> Evaluation:

> 

> We create set of rules with various src IP. We feed traffic 
containing 1

> million flows with various src IP and dst IP. All the flows hit 
10/20/30

> rules creating 10/20/30 subtables.

> 

> The table below shows the preliminary continuous testing results 
(full line

> speed test) we collected with a uni-directional port-to-port setup. 
The

> machine we tested on is a Xeon E5 server running with 2.2GHz cores. 
OvS

> runs with 1 PMD. We use Spirent as the hardware traffic generator.

> 

> no.subtable: 10  20  30

> cd-ovs   3895961 3170530 2968555

> orig-ovs 2683455 1646227 1240501

> speedup  1.45x   1.92x   2.39x

> 

> 

> I have a few initial comments.

> 1) Can you present the numbers with and without __AVX2__ “enabled”.'

[Wang, Yipeng] We mainly test with AVX2 to find the upper-bound performance 
speedup of the design.  Throughput-wise, we have not optimized for the scalar 
version thus we did not present the results. If people are interested in this 
patch, we will update the implementation to consider the performance for bo

Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search

2017-04-14 Thread Wang, Yipeng1
Thank you Darrell for the comments. Please take a look at my reply inlined.

> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Thursday, April 13, 2017 10:36 PM
> To: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to
> Accelerate Megaflow Search
> 
> 
> 
> On 4/6/17, 2:48 PM, "ovs-dev-boun...@openvswitch.org on behalf of
> yipeng1.w...@intel.com" <ovs-dev-boun...@openvswitch.org on behalf of
> yipeng1.w...@intel.com> wrote:
> 
> From: Yipeng Wang <yipeng1.w...@intel.com>
> 
> The Datapath Classifier uses tuple space search for flow classification.
> The rules are arranged into a set of tuples/subtables (each with a
> distinct mask).  Each subtable is implemented as a hash table and lookup
> is done with flow keys formed by selecting the bits from the packet header
> based on each subtable's mask. Tuple space search will sequentially search
> each subtable until a match is found. With a large number of subtables, a
> sequential search of the subtables could consume a lot of CPU cycles. In
> a testbench with a uniform traffic pattern equally distributed across 20
> subtables, we measured that up to 65% of total execution time is 
> attributed
> to the megaflow cache lookup.
> 
> This patch presents the idea of the two-layer hierarchical lookup, where a
> low overhead first level of indirection is accessed first, we call this
> level cuckoo distributor (CD). If a flow key has been inserted in the flow
> table the first level will indicate with high probability that which
> subtable to look into. A lookup is performed on the second level (the
> target subtable) to retrieve the result. If the key doesn’t have a match,
> then we revert back to the sequential search of subtables.
> 
> This patch can improve the already existing Subtable Ranking when traffic
> data has high entropy. Subtable Ranking helps minimize the number of
> traversed subtables when most of the traffic hit the same subtable.
> However, in the case of high entropy traffic such as traffic coming from
> a physical port, multiple subtables could be hit with a similar frequency.
> In this case the average subtable lookups per hit would be much greater
> than 1. In addition, CD can adaptively turn off when it finds the traffic
> mostly hit one subtable. Thus, CD will not be an overhead when Subtable
> Ranking works well.
> 
> Scheme:
> 
>  ---
> |  CD   |
>  ---
>\
> \
>  -  - -
> |sub  ||sub  |...|sub  |
> |table||table|   |table|
>  -  - -
> 
> Evaluation:
> 
> We create set of rules with various src IP. We feed traffic containing 1
> million flows with various src IP and dst IP. All the flows hit 10/20/30
> rules creating 10/20/30 subtables.
> 
> The table below shows the preliminary continuous testing results (full 
> line
> speed test) we collected with a uni-directional port-to-port setup. The
> machine we tested on is a Xeon E5 server running with 2.2GHz cores. OvS
> runs with 1 PMD. We use Spirent as the hardware traffic generator.
> 
> no.subtable: 10  20  30
> cd-ovs   3895961 3170530 2968555
> orig-ovs 2683455 1646227 1240501
> speedup  1.45x   1.92x   2.39x
> 
> 
> I have a few initial comments.
> 1) Can you present the numbers with and without __AVX2__ “enabled”.'
[Wang, Yipeng] We mainly test with AVX2 to find the upper-bound performance 
speedup of the design.  Throughput-wise, we have not optimized for the scalar 
version thus we did not present the results. If people are interested in this 
patch, we will update the implementation to consider the performance for both 
AVX and scalar in Version 2 and report the results. We may design different 
structure (mainly different entry count per bucket) for scalar and AVX to 
optimize the performance.

> 2) Can you present the numbers with say 2 and say 10 flows for some
> comparison.
[Wang, Yipeng] As long as flows cannot all fit in EMC, CD should benefit.  
Generally, CD benefit more when there are more flows fall out of EMC. We 
collect the new results and report them as following:

2 flows:
  no.subtable: 10  20  30
cd-ovs   4267332 3478251 3126763
orig-ovs 3260883 2174551 1689981
speedup  1.31x   1.60x1.85x

10 flows:
  no.subtable: 10  20  30
cd-ovs   40157833276100   297064

Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search

2017-04-11 Thread Wang, Yipeng1
Thank you Jarno for pointing it out.

I think we have been using default master head for comparison with conditional 
EMC insert for both orig and cd-ovs. To double-check I explicitly set 
emc_insert_inv_prob=100 and test again. The results are similar to the previous 
one.

CD should benefit throughput as long as there are EMC misses. Since our test 
case includes 1M flows, most of flows will miss in EMC no matter what the 
insertion rate is. 

Thanks
Yipeng

-Original Message-
From: Jarno Rajahalme [mailto:ja...@ovn.org] 
Sent: Tuesday, April 11, 2017 2:45 PM
To: Fischetti, Antonio <antonio.fische...@intel.com>
Cc: Wang, Yipeng1 <yipeng1.w...@intel.com>; d...@openvswitch.org; 
jan.scheur...@ericsson.com
Subject: Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to 
Accelerate Megaflow Search

Thanks for your contribution!

I haven’t looked at the patch yet, but based on the description I’d ask you to 
characterize the difference and/or feature interaction with the Exact Match 
Cache (EMC). Particularly, does “orig-ovs” and/or “cd-ovs” have the conditional 
EMC feature (merged to master on Feb 16th, 2017), and what values of 
“emc-insert-inv-prob” were used for comparison?

  Jarno

> On Apr 11, 2017, at 12:21 AM, Fischetti, Antonio 
> <antonio.fische...@intel.com> wrote:
> 
> Any comment on this patch?
> As this is a sort of an addition to Subtable Ranking, maybe Jarno 
> and/or Jan could have some comments?
> 
> Thanks,
> Antonio
> 
>> -Original Message-
>> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- 
>> boun...@openvswitch.org] On Behalf Of yipeng1.w...@intel.com
>> Sent: Thursday, April 6, 2017 10:48 PM
>> To: d...@openvswitch.org
>> Subject: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to 
>> Accelerate Megaflow Search
>> 
>> From: Yipeng Wang <yipeng1.w...@intel.com>
>> 
>> The Datapath Classifier uses tuple space search for flow classification.
>> The rules are arranged into a set of tuples/subtables (each with a 
>> distinct mask).  Each subtable is implemented as a hash table and 
>> lookup is done with flow keys formed by selecting the bits from the 
>> packet header based on each subtable's mask. Tuple space search will 
>> sequentially search each subtable until a match is found. With a 
>> large number of subtables, a sequential search of the subtables could 
>> consume a lot of CPU cycles. In a testbench with a uniform traffic 
>> pattern equally distributed across 20 subtables, we measured that up 
>> to 65% of total execution time is attributed to the megaflow cache 
>> lookup.
>> 
>> This patch presents the idea of the two-layer hierarchical lookup, 
>> where a low overhead first level of indirection is accessed first, we 
>> call this level cuckoo distributor (CD). If a flow key has been 
>> inserted in the flow table the first level will indicate with high 
>> probability that which subtable to look into. A lookup is performed 
>> on the second level (the target subtable) to retrieve the result. If 
>> the key doesn’t have a match, then we revert back to the sequential search 
>> of subtables.
>> 
>> This patch can improve the already existing Subtable Ranking when 
>> traffic data has high entropy. Subtable Ranking helps minimize the 
>> number of traversed subtables when most of the traffic hit the same subtable.
>> However, in the case of high entropy traffic such as traffic coming 
>> from a physical port, multiple subtables could be hit with a similar 
>> frequency.
>> In this case the average subtable lookups per hit would be much 
>> greater than 1. In addition, CD can adaptively turn off when it finds 
>> the traffic mostly hit one subtable. Thus, CD will not be an overhead 
>> when Subtable Ranking works well.
>> 
>> Scheme:
>> 
>> ---
>>|  CD   |
>> ---
>>   \
>>\
>> -  - -
>> |sub  ||sub  |...|sub  |
>> |table||table|   |table|
>> -  - -
>> 
>> Evaluation:
>> 
>> We create set of rules with various src IP. We feed traffic 
>> containing 1 million flows with various src IP and dst IP. All the 
>> flows hit 10/20/30 rules creating 10/20/30 subtables.
>> 
>> The table below shows the preliminary continuous testing results 
>> (full line speed test) we collected with a uni-directional 
>> port-to-port setup. The machine we tested on is a Xeon E5 server 
>> running with 2.2GHz cores. OvS runs with 1 PMD. We use Spirent as the 
>> hardware traffic generator.
>> 
>> no.subtable: 10  20  30
>> c

Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search

2017-04-11 Thread Jarno Rajahalme
Thanks for your contribution!

I haven’t looked at the patch yet, but based on the description I’d ask you to 
characterize the difference and/or feature interaction with the Exact Match 
Cache (EMC). Particularly, does “orig-ovs” and/or “cd-ovs” have the conditional 
EMC feature (merged to master on Feb 16th, 2017), and what values of 
“emc-insert-inv-prob” were used for comparison?

  Jarno

> On Apr 11, 2017, at 12:21 AM, Fischetti, Antonio 
> <antonio.fische...@intel.com> wrote:
> 
> Any comment on this patch?
> As this is a sort of an addition to Subtable Ranking, maybe
> Jarno and/or Jan could have some comments?
> 
> Thanks,
> Antonio
> 
>> -Original Message-
>> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
>> boun...@openvswitch.org] On Behalf Of yipeng1.w...@intel.com
>> Sent: Thursday, April 6, 2017 10:48 PM
>> To: d...@openvswitch.org
>> Subject: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to
>> Accelerate Megaflow Search
>> 
>> From: Yipeng Wang <yipeng1.w...@intel.com>
>> 
>> The Datapath Classifier uses tuple space search for flow classification.
>> The rules are arranged into a set of tuples/subtables (each with a
>> distinct mask).  Each subtable is implemented as a hash table and lookup
>> is done with flow keys formed by selecting the bits from the packet header
>> based on each subtable's mask. Tuple space search will sequentially search
>> each subtable until a match is found. With a large number of subtables, a
>> sequential search of the subtables could consume a lot of CPU cycles. In
>> a testbench with a uniform traffic pattern equally distributed across 20
>> subtables, we measured that up to 65% of total execution time is
>> attributed
>> to the megaflow cache lookup.
>> 
>> This patch presents the idea of the two-layer hierarchical lookup, where a
>> low overhead first level of indirection is accessed first, we call this
>> level cuckoo distributor (CD). If a flow key has been inserted in the flow
>> table the first level will indicate with high probability that which
>> subtable to look into. A lookup is performed on the second level (the
>> target subtable) to retrieve the result. If the key doesn’t have a match,
>> then we revert back to the sequential search of subtables.
>> 
>> This patch can improve the already existing Subtable Ranking when traffic
>> data has high entropy. Subtable Ranking helps minimize the number of
>> traversed subtables when most of the traffic hit the same subtable.
>> However, in the case of high entropy traffic such as traffic coming from
>> a physical port, multiple subtables could be hit with a similar frequency.
>> In this case the average subtable lookups per hit would be much greater
>> than 1. In addition, CD can adaptively turn off when it finds the traffic
>> mostly hit one subtable. Thus, CD will not be an overhead when Subtable
>> Ranking works well.
>> 
>> Scheme:
>> 
>> ---
>>|  CD   |
>> ---
>>   \
>>\
>> -  - -
>> |sub  ||sub  |...|sub  |
>> |table||table|   |table|
>> -  - -
>> 
>> Evaluation:
>> 
>> We create set of rules with various src IP. We feed traffic containing 1
>> million flows with various src IP and dst IP. All the flows hit 10/20/30
>> rules creating 10/20/30 subtables.
>> 
>> The table below shows the preliminary continuous testing results (full
>> line
>> speed test) we collected with a uni-directional port-to-port setup. The
>> machine we tested on is a Xeon E5 server running with 2.2GHz cores. OvS
>> runs with 1 PMD. We use Spirent as the hardware traffic generator.
>> 
>> no.subtable: 10  20  30
>> cd-ovs   3895961 3170530 2968555
>> orig-ovs 2683455 1646227 1240501
>> speedup  1.45x   1.92x   2.39x
>> 
>> Signed-off-by: Yipeng Wang 
>> Signed-off-by: Charlie Tai 
>> Co-authored-by: Charlie Tai 
>> Signed-off-by: Sameh Gobriel 
>> Co-authored-by: Sameh Gobriel 
>> Signed-off-by: Ren Wang 
>> Co-authored-by: Ren Wang 
>> Signed-off-by: Antonio Fischetti 
>> Co-authored-by: Antonio Fischetti 
>> ---
>> lib/dpif-netdev.c | 654
>> --
>> tests/ofproto-dpif.at |   3 +-
>> 2 files changed, 633 insertions(+), 24 deletions(-)
>> 
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index a14a2eb..d9a883b 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netd

Re: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search

2017-04-11 Thread Fischetti, Antonio
Any comment on this patch?
As this is a sort of an addition to Subtable Ranking, maybe
Jarno and/or Jan could have some comments?

Thanks,
Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of yipeng1.w...@intel.com
> Sent: Thursday, April 6, 2017 10:48 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to
> Accelerate Megaflow Search
> 
> From: Yipeng Wang <yipeng1.w...@intel.com>
> 
> The Datapath Classifier uses tuple space search for flow classification.
> The rules are arranged into a set of tuples/subtables (each with a
> distinct mask).  Each subtable is implemented as a hash table and lookup
> is done with flow keys formed by selecting the bits from the packet header
> based on each subtable's mask. Tuple space search will sequentially search
> each subtable until a match is found. With a large number of subtables, a
> sequential search of the subtables could consume a lot of CPU cycles. In
> a testbench with a uniform traffic pattern equally distributed across 20
> subtables, we measured that up to 65% of total execution time is
> attributed
> to the megaflow cache lookup.
> 
> This patch presents the idea of the two-layer hierarchical lookup, where a
> low overhead first level of indirection is accessed first, we call this
> level cuckoo distributor (CD). If a flow key has been inserted in the flow
> table the first level will indicate with high probability that which
> subtable to look into. A lookup is performed on the second level (the
> target subtable) to retrieve the result. If the key doesn’t have a match,
> then we revert back to the sequential search of subtables.
> 
> This patch can improve the already existing Subtable Ranking when traffic
> data has high entropy. Subtable Ranking helps minimize the number of
> traversed subtables when most of the traffic hit the same subtable.
> However, in the case of high entropy traffic such as traffic coming from
> a physical port, multiple subtables could be hit with a similar frequency.
> In this case the average subtable lookups per hit would be much greater
> than 1. In addition, CD can adaptively turn off when it finds the traffic
> mostly hit one subtable. Thus, CD will not be an overhead when Subtable
> Ranking works well.
> 
> Scheme:
> 
>  ---
> |  CD   |
>  ---
>\
> \
>  -  - -
> |sub  ||sub  |...|sub  |
> |table||table|   |table|
>  -  - -
> 
> Evaluation:
> 
> We create set of rules with various src IP. We feed traffic containing 1
> million flows with various src IP and dst IP. All the flows hit 10/20/30
> rules creating 10/20/30 subtables.
> 
> The table below shows the preliminary continuous testing results (full
> line
> speed test) we collected with a uni-directional port-to-port setup. The
> machine we tested on is a Xeon E5 server running with 2.2GHz cores. OvS
> runs with 1 PMD. We use Spirent as the hardware traffic generator.
> 
> no.subtable: 10  20  30
> cd-ovs   3895961 3170530 2968555
> orig-ovs 2683455 1646227 1240501
> speedup  1.45x   1.92x   2.39x
> 
> Signed-off-by: Yipeng Wang 
> Signed-off-by: Charlie Tai 
> Co-authored-by: Charlie Tai 
> Signed-off-by: Sameh Gobriel 
> Co-authored-by: Sameh Gobriel 
> Signed-off-by: Ren Wang 
> Co-authored-by: Ren Wang 
> Signed-off-by: Antonio Fischetti 
> Co-authored-by: Antonio Fischetti 
> ---
>  lib/dpif-netdev.c | 654
> --
>  tests/ofproto-dpif.at |   3 +-
>  2 files changed, 633 insertions(+), 24 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index a14a2eb..d9a883b 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -79,11 +79,23 @@
> 
>  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
> 
> +/* Length of Subtable table for cuckoo distributor to index subtables.
> + * The size of the table is at most 256 entires because the CD's entry
> only
> + * provides 1 byte for indexing.
> + */
> +#define SUBTABLE_TABLE_LENGTH 256
> +
>  #define FLOW_DUMP_MAX_BATCH 50
>  /* Use per thread recirc_depth to prevent recirculation loop. */
>  #define MAX_RECIRC_DEPTH 5
>  DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0)
> 
> +
> +#define CD_DEBUG 0
> +#define debug_print(...) \
> +do { if (CD_DEBUG) fprintf(stderr, __VA_ARGS__); } while (0)
> +
> +
>  /* Configuration parameters. */
>  enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table.
> */
>  enum { MAX_METERS = 65536 };/* Maxim

[ovs-dev] [PATCH RFC] dpif-netdev: Add Cuckoo Distributor to Accelerate Megaflow Search

2017-04-06 Thread yipeng1 . wang
From: Yipeng Wang 

The Datapath Classifier uses tuple space search for flow classification.
The rules are arranged into a set of tuples/subtables (each with a
distinct mask).  Each subtable is implemented as a hash table and lookup
is done with flow keys formed by selecting the bits from the packet header
based on each subtable's mask. Tuple space search will sequentially search
each subtable until a match is found. With a large number of subtables, a
sequential search of the subtables could consume a lot of CPU cycles. In
a testbench with a uniform traffic pattern equally distributed across 20
subtables, we measured that up to 65% of total execution time is attributed
to the megaflow cache lookup.

This patch presents the idea of the two-layer hierarchical lookup, where a
low overhead first level of indirection is accessed first, we call this
level cuckoo distributor (CD). If a flow key has been inserted in the flow
table the first level will indicate with high probability that which
subtable to look into. A lookup is performed on the second level (the
target subtable) to retrieve the result. If the key doesn’t have a match,
then we revert back to the sequential search of subtables.

This patch can improve the already existing Subtable Ranking when traffic
data has high entropy. Subtable Ranking helps minimize the number of
traversed subtables when most of the traffic hit the same subtable.
However, in the case of high entropy traffic such as traffic coming from
a physical port, multiple subtables could be hit with a similar frequency.
In this case the average subtable lookups per hit would be much greater
than 1. In addition, CD can adaptively turn off when it finds the traffic
mostly hit one subtable. Thus, CD will not be an overhead when Subtable
Ranking works well.

Scheme:

 ---
|  CD   |
 ---
   \
\
 -  - -
|sub  ||sub  |...|sub  |
|table||table|   |table|
 -  - -

Evaluation:

We create set of rules with various src IP. We feed traffic containing 1
million flows with various src IP and dst IP. All the flows hit 10/20/30
rules creating 10/20/30 subtables.

The table below shows the preliminary continuous testing results (full line
speed test) we collected with a uni-directional port-to-port setup. The
machine we tested on is a Xeon E5 server running with 2.2GHz cores. OvS
runs with 1 PMD. We use Spirent as the hardware traffic generator.

no.subtable: 10  20  30
cd-ovs   3895961 3170530 2968555
orig-ovs 2683455 1646227 1240501
speedup  1.45x   1.92x   2.39x

Signed-off-by: Yipeng Wang 
Signed-off-by: Charlie Tai 
Co-authored-by: Charlie Tai 
Signed-off-by: Sameh Gobriel 
Co-authored-by: Sameh Gobriel 
Signed-off-by: Ren Wang 
Co-authored-by: Ren Wang 
Signed-off-by: Antonio Fischetti 
Co-authored-by: Antonio Fischetti 
---
 lib/dpif-netdev.c | 654 --
 tests/ofproto-dpif.at |   3 +-
 2 files changed, 633 insertions(+), 24 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a14a2eb..d9a883b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -79,11 +79,23 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
+/* Length of Subtable table for cuckoo distributor to index subtables.
+ * The size of the table is at most 256 entires because the CD's entry only
+ * provides 1 byte for indexing.
+ */
+#define SUBTABLE_TABLE_LENGTH 256
+
 #define FLOW_DUMP_MAX_BATCH 50
 /* Use per thread recirc_depth to prevent recirculation loop. */
 #define MAX_RECIRC_DEPTH 5
 DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0)
 
+
+#define CD_DEBUG 0
+#define debug_print(...) \
+do { if (CD_DEBUG) fprintf(stderr, __VA_ARGS__); } while (0)
+
+
 /* Configuration parameters. */
 enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table. */
 enum { MAX_METERS = 65536 };/* Maximum number of meters. */
@@ -163,6 +175,44 @@ struct emc_cache {
 int sweep_idx;/* For emc_cache_slow_sweep(). */
 };
 
+
+/* Cuckoo distributor (CD) is a 2-hash function hash table.
+ * For now, the design does not allow desplacing items when bucket is full,
+ * which is different from the behavior of a cuckoo hash table.
+ * The advantage is that we do not need to store two sigantures so that
+ * the struct will be more compact. We use 16 entries per bucket for the
+ * usage of AVX.
+ *
+ * Each classifier has its own cuckoo distributor. It is NOT thread-safe
+ */
+#define CD_NUM_BUCKETS (1<<16)
+#define CD_BUCKET_MASK (CD_NUM_BUCKETS-1)
+#define CD_ENTRIES 16
+
+/* These two seeds are used for hashing two bucket locations */
+#define CD_PRIM_BUCKET_SEED 10
+#define CD_SEC_BUCKET_SEED 20
+
+/* This bit is used to choose which bucket to replace CD's entry in cd_insert*/
+#define CD_CHOOSE_SEC_BUCKT_BIT (1 << CD_ENTRIES)
+
+typedef uint16_t simple_sig_store_t;
+
+
+/* The bucket struct for