That's what I mean.

It could go in the 'tests' directory.

On Tue, Nov 07, 2017 at 05:01:04PM +0000, Wang, Yipeng1 wrote:
> Thanks Ben,
> 
> Do you mean to include the TRex script in the repo? Could you suggest
> where would be a suitable place to put this kind of test script?
> 
> Thanks
> 
> > -----Original Message-----
> > From: Ben Pfaff [mailto:[email protected]]
> > Sent: Friday, November 3, 2017 10:59 AM
> > To: Wang, Yipeng1 <[email protected]>
> > Cc: [email protected]
> > Subject: Re: [ovs-dev] [PATCH v2 0/5] dpif-netdev: Cuckoo-Distributor
> > implementation
> > 
> > Is this something that should be included in the repo?
> > 
> > On Fri, Nov 03, 2017 at 04:14:56PM +0000, Wang, Yipeng1 wrote:
> > > To make it easier for code reviewers to build and test the patchset, a
> > > TRex profile is attached. It presents a very simple synthetic test case
> > > of random traffic with 20 different IP src and 50K different IP dst
> > > addresses. It can be used together with the rule set mentioned in the
> > > cover letter to generate a uniform distribution of hits among the 20
> > > subtables. This synthetic traffic pattern represents the worst-case
> > > scenario for the current subtable ranking method. We observe about 2x
> > > speedup vs. the original OvS in this case. Note that the patchset
> > > automatically detects whether there is a benefit to turning CD on or off
> > > to accommodate any traffic pattern, so when subtable ranking works
> > > perfectly, CD will not be enabled and will not harm performance.
> > >
> > > One can change the dstip and srcip_cnt variables to generate different
> > > numbers of flows and subtable count scenarios.
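As a rough check of how those two variables translate into a flow count, here is a minimal sketch. It assumes the destination address is drawn from the inclusive range 0.0.0.0..dstip, as in the script, and that each source address is paired with every destination; the helper name `flow_count` is hypothetical and not part of the script.

```python
import ipaddress


def flow_count(dst_max: str, src_count: int) -> int:
    """Distinct (src, dst) pairs when dst spans 0.0.0.0..dst_max inclusive."""
    # An IPv4 address converts to its 32-bit integer value; +1 makes the
    # range inclusive of 0.0.0.0.
    dst_values = int(ipaddress.IPv4Address(dst_max)) + 1
    return dst_values * src_count


# Values used in the attached profile: dstip = "0.0.195.255", srcip_cnt = 20.
print(flow_count("0.0.195.255", 20))  # 1003520, i.e. about 1M flows
```

Because the script picks destinations at random, this is the number of possible flows, not a guarantee that every one is seen in a short run.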
> > >
> > >  ----
> > > import locale, sys, time
> > > from signal import *
> > >
> > > import stl_path
> > > from trex_stl_lib.api import *
> > >
> > > [tx_port, rx_port] = my_ports = [0, 1]
> > > tx_ports = [tx_port]
> > > rx_ports = [rx_port]
> > >
> > > global c
> > >
> > > # dst IP varies from 0.0.0.0 to 0.0.195.255, which is about 50k values.
> > > # src IP varies from 1.0.0.0 to 20.0.0.0, which is 20 values.
> > > # 50k * 20 is about 1M total flows.
> > > dstip = "0.0.195.255"
> > > srcip_cnt = 20
> > > size = 64
> > >
> > > #create stream blocks. Each stream has one srcIP with various dstIP.
> > > #There are in total of 20 different srcIP.
> > > def make_streams():
> > >     streams = [
> > >         {"base_pkt":Ether()/IP(src="{}.0.0.0".format(i), tos=0x20)/UDP(),
> > >          "vm":[
> > >             STLVmFlowVar(name="ip_dst", min_value="0.0.0.0",
> > >                          max_value=dstip, size=4, op="random"),
> > >             STLVmWrFlowVar(fv_name="ip_dst",pkt_offset="IP.dst"),
> > >             ]
> > >         }
> > >         for i in range(1,srcip_cnt + 1)
> > >     ]
> > >     return streams
> > >
> > > if __name__ == "__main__":
> > >
> > >         c = STLClient(verbose_level = LoggerApi.VERBOSE_QUIET)
> > >         c.connect()
> > >
> > >         c.reset(ports = my_ports)
> > >         new_streams = make_streams()
> > >
> > >         for s in new_streams:
> > >             # 64 - 4 for FCS
> > >             pad = max(0, size - 4 - len(s["base_pkt"])) * 'x'
> > >             s["base_pkt"] = s["base_pkt"]/pad
> > >
> > >         pkts = [STLPktBuilder(pkt = s["base_pkt"], vm = s["vm"])
> > >                 for s in new_streams]
> > >
> > >         #generate continuous traffic. Each stream has equal bandwidth.
> > >         final_streams = [STLStream(packet = pkt,
> > >                                    mode = STLTXCont(percentage = 100.0/len(pkts)))
> > >                          for pkt in pkts]
> > >         c.add_streams(final_streams, ports=[tx_port])
> > >         c.set_port_attr(my_ports, promiscuous = True)
> > >
> > >         #start the traffic
> > >         c.start(ports = tx_ports)
> > >         #wait for 20 seconds
> > >         time.sleep(20)
> > >         #write rx pps to stdout
> > >         sys.stdout.write("RX PPS: " + str(int(c.get_stats(my_ports)[1]["rx_pps"])) + "\n")
> > >         #stop the traffic
> > >         c.stop(ports=my_ports)
> > >         c.disconnect()
> > >         c = None
> > >  ----
> > >
> > >
> > > > -----Original Message-----
> > > > From: Wang, Yipeng1
> > > > Sent: Tuesday, October 31, 2017 4:40 PM
> > > > To: [email protected]
> > > > Cc: Wang, Yipeng1 <[email protected]>; Gobriel, Sameh
> > > > <[email protected]>; Fischetti, Antonio
> > > > <[email protected]>; [email protected];
> > > > [email protected]
> > > > Subject: [PATCH v2 0/5] dpif-netdev: Cuckoo-Distributor implementation
> > > >
> > > > The Datapath Classifier uses tuple space search for flow classification.
> > > > The rules are arranged into a set of tuples/subtables (each with a
> > > > distinct mask).  Each subtable is implemented as a hash table and lookup
> > > > is done with flow keys formed by selecting the bits from the packet
> > > > header based on each subtable's mask. Tuple space search will
> > > > sequentially search each subtable until a match is found. With a large
> > > > number of subtables, a sequential search of the subtables could consume
> > > > a lot of CPU cycles. In a testbench with a uniform traffic pattern
> > > > equally distributed across 20 subtables, we measured that up to 65% of
> > > > total execution time is attributed to the megaflow cache lookup.
> > > >
> > > > This patch presents the idea of a two-layer hierarchical lookup, where
> > > > a low-overhead first level of indirection is accessed first. We call
> > > > this level the cuckoo distributor (CD). If a flow key has been inserted
> > > > in the flow table, the first level will indicate with high probability
> > > > which subtable to look into. A lookup is then performed on the second
> > > > level (the target subtable) to retrieve the result. If the key doesn't
> > > > have a match, we fall back to the sequential search of subtables. The
> > > > patch is partially inspired by earlier concepts proposed in
> > > > "simTable"[1] and "Cuckoo Filter"[2], and DPDK's Cuckoo Hash
> > > > implementation.
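The two-level lookup described above can be illustrated with a small, simplified sketch. This is not the patch's implementation: a plain Python dict stands in for the cuckoo distributor (the real CD is a compact cuckoo-hash structure), keys are bare integers rather than flow keys, and the names `Subtable`, `Classifier` and `probes` are hypothetical.

```python
from typing import Dict, List, Optional


class Subtable:
    """One tuple-space subtable: a hash table keyed by the masked flow key."""

    def __init__(self, mask: int) -> None:
        self.mask = mask
        self.rules: Dict[int, str] = {}

    def insert(self, key: int, action: str) -> None:
        self.rules[key & self.mask] = action

    def lookup(self, key: int) -> Optional[str]:
        return self.rules.get(key & self.mask)


class Classifier:
    """Sequential subtable search with a first-level distributor in front."""

    def __init__(self, masks: List[int]) -> None:
        self.subtables = [Subtable(m) for m in masks]
        # Stand-in for CD: maps a flow key to the index of its subtable.
        self.distributor: Dict[int, int] = {}
        self.probes = 0  # number of subtable lookups performed so far

    def lookup(self, key: int) -> Optional[str]:
        # First level: ask the distributor which subtable to try.
        idx = self.distributor.get(key)
        if idx is not None:
            self.probes += 1
            action = self.subtables[idx].lookup(key)
            if action is not None:
                return action
        # Fallback: sequential search over all subtables.
        for i, subtable in enumerate(self.subtables):
            self.probes += 1
            action = subtable.lookup(key)
            if action is not None:
                self.distributor[key] = i  # remember for the next packet
                return action
        return None


cls = Classifier([0xFF000000, 0xFFFF0000, 0xFFFFFF00])
cls.subtables[2].insert(0x0A0B0C00, "output:1")
first = cls.lookup(0x0A0B0C0D)   # misses the distributor, scans 3 subtables
probes_after_first = cls.probes
second = cls.lookup(0x0A0B0C0D)  # distributor hit, 1 subtable lookup
print(first, probes_after_first, second, cls.probes)
```

In this toy version the distributor is populated on a successful sequential search; when and how the real CD inserts entries is defined by the patch itself.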
> > > >
> > > > This patch can improve on the existing Subtable Ranking when traffic
> > > > data has high entropy. Subtable Ranking helps minimize the number of
> > > > traversed subtables when most of the traffic hits the same subtable.
> > > > However, in the case of high-entropy traffic, such as traffic coming
> > > > from a physical port, multiple subtables could be hit with similar
> > > > frequency. In this case the average subtable lookups per hit would be
> > > > much greater than 1. In addition, CD can adaptively turn off when it
> > > > finds that the traffic mostly hits one subtable. Thus, CD will not add
> > > > overhead when Subtable Ranking works well.
> > > >
> > > > Scheme:
> > > > CD is in front of the subtables. Packets that hit in CD are directed to
> > > > the corresponding subtable instead of searching each subtable
> > > > sequentially.
> > > >  -------
> > > > |  CD   |
> > > >  -------
> > > >        \
> > > >         \
> > > >  -----  -----     -----
> > > > |sub  ||sub  |...|sub  |
> > > > |table||table|   |table|
> > > >  -----  -----     -----
> > > >
> > > >  Evaluation:
> > > >  ----------
> > > > We create a set of rules with various src IP. We feed traffic containing
> > > > various numbers of flows with various src IP and dst IP. All the flows
> > > > hit 10/20/30 rules, creating 10/20/30 subtables. We will explain the
> > > > rule/traffic setup in detail later.
> > > >
> > > > The table below shows the preliminary continuous testing results (full
> > > > line speed test) we collected with a uni-directional phy-to-phy setup.
> > > > OvS runs with 1 PMD. We use Spirent as the hardware traffic generator.
> > > >
> > > >  Before v2 rebase:
> > > >  ----
> > > > AVX2 data:
> > > > 20k flows:
> > > > no.subtable: 10          20          30
> > > > cd-ovs       4267332     3478251     3126763
> > > > orig-ovs     3260883     2174551     1689981
> > > > speedup      1.31x       1.60x       1.85x
> > > >
> > > > 100k flows:
> > > > no.subtable: 10          20          30
> > > > cd-ovs       4015783     3276100     2970645
> > > > orig-ovs     2692882     1711955     1302321
> > > > speedup      1.49x       1.91x       2.28x
> > > >
> > > > 1M flows:
> > > > no.subtable: 10          20          30
> > > > cd-ovs       3895961     3170530     2968555
> > > > orig-ovs     2683455     1646227     1240501
> > > > speedup      1.45x       1.92x       2.39x
> > > >
> > > > Scalar data:
> > > > 1M flows:
> > > > no.subtable: 10          20          30
> > > > cd-ovs       3658328     3028111     2863329
> > > > orig-ovs     2683455     1646227     1240501
> > > > speedup      1.36x       1.84x       2.31x
> > > >
> > > >  After v2 rebase:
> > > >  ----
> > > > After rebase for v1, we tested the 1M flows, 20 subtable case; the
> > > > results still hold.
> > > > 1M flows:
> > > > no.subtable:   20
> > > > cd-ovs         3066483
> > > > orig-ovs       1588049
> > > > speedup        1.93x
> > > >
> > > >
> > > >  Test rules/traffic setup:
> > > >  ----
> > > > To set up a test case with 20 subtables, we use a rule set like the one
> > > > below:
> > > > tcp,nw_src=1.0.0.0/8, actions=output:1
> > > > udp,nw_src=2.0.0.0/9, actions=output:1
> > > > udp,nw_src=3.0.0.0/10,actions=output:1
> > > > udp,nw_src=4.0.0.0/11,actions=output:1
> > > > ...
> > > > udp,nw_src=18.0.0.0/25,actions=output:1
> > > > udp,nw_src=19.0.0.0/26,actions=output:1
> > > > udp,nw_src=20.0.0.0/27,actions=output:1
> > > >
> > > > Then for the traffic generator, we generate corresponding traffic with
> > > > src_ip varying from 1.0.0.0 to 20.0.0.0. For each src_ip, we vary
> > > > dst_ip over 50000 different values. This effectively generates 1M
> > > > different flows hitting the 20 rules we created. Because of the
> > > > different wildcarding bits in nw_src, the 20 rules belong to 20
> > > > subtables. We use 64-byte packets across all tests.
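To see why the 20 rules end up in 20 subtables, the prefixes can be enumerated and their netmasks compared. The sketch below assumes the elided rules follow the visible pattern (rule i matches nw_src=i.0.0.0 with prefix length 7+i); the helper name `rule_prefixes` is hypothetical.

```python
import ipaddress
from typing import List


def rule_prefixes(count: int = 20, first_len: int = 8) -> List[ipaddress.IPv4Network]:
    """nw_src prefixes 1.0.0.0/8, 2.0.0.0/9, ... following the listed pattern."""
    return [
        ipaddress.ip_network(f"{i}.0.0.0/{first_len + i - 1}")
        for i in range(1, count + 1)
    ]


nets = rule_prefixes()
# Each distinct netmask is a distinct wildcard pattern, hence its own subtable.
masks = {str(net.netmask) for net in nets}
print(nets[0], nets[-1], len(masks))

# Each generated source address i.0.0.0 falls inside the i-th prefix.
assert all(
    ipaddress.ip_address(f"{i}.0.0.0") in nets[i - 1] for i in range(1, 21)
)
```

Changing `count` gives the prefix sets for the 10 and 30 subtable cases under the same assumption.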
> > > >
> > > > How to check if CD works or not for your use case:
> > > >  ----
> > > > CD cannot improve throughput for all use cases. It targets use cases
> > > > where multiple subtables exist and the top-ranked subtable is not hit
> > > > by the vast majority of the traffic.
> > > >
> > > > One can use the $OVS_DIR/utilities/ovs-appctl dpif-netdev/pmd-stats-show
> > > > command to check the CD statistics: hit/miss.
> > > > Another statistic shown is "avg. subtable lookups per hit".
> > > > In our test case, the original OvS has an average subtable lookups
> > > > value of about 10, because there are 20 subtables in total and, on
> > > > average, a hit happens after iterating over half of them. Iterating
> > > > over 10 subtables per hit is very expensive.
> > > >
> > > > With CD, this value will be close to 1, which means that on average
> > > > only 1 subtable needs to be searched to hit the rule, which removes a
> > > > lot of overhead.
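The "about 10" and "close to 1" figures can be reproduced with a short calculation. This is a simplified model, not a measurement: it assumes hits are spread uniformly over the subtables, that the sequential search visits them in a fixed order, and that a CD hit costs exactly one subtable lookup while a CD miss costs a full sequential search. Both function names and the 95% hit rate are hypothetical.

```python
def avg_lookups_sequential(n_subtables: int) -> float:
    """Mean subtables searched per hit when hits are uniform over subtables."""
    # A hit in the k-th subtable costs k lookups; average over k = 1..n.
    return sum(range(1, n_subtables + 1)) / n_subtables


def avg_lookups_with_cd(n_subtables: int, cd_hit_rate: float) -> float:
    """Mean lookups when a fraction cd_hit_rate of hits go straight to one subtable."""
    miss_rate = 1.0 - cd_hit_rate
    return cd_hit_rate * 1.0 + miss_rate * avg_lookups_sequential(n_subtables)


print(avg_lookups_sequential(20))     # 10.5
print(avg_lookups_with_cd(20, 0.95))  # roughly 1.5
```

Under these assumptions the sequential average for n subtables is (n + 1) / 2, which is where the roughly-half-of-20 figure comes from.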
> > > >
> > > > Other statistics to note are "megaflow hits" and "emc hits".
> > > > If most packets hit the EMC, CD does not improve throughput much,
> > > > since CD optimizes the megaflow search rather than the EMC lookup. If
> > > > your test case has fewer than 8k flows, all of them may hit the EMC.
> > > >
> > > > Note that CD is adaptively turned on/off according to the number of
> > > > subtables and the pattern in which they are searched. If it finds
> > > > there is not much benefit, CD will turn itself off automatically.
> > > >
> > > >
> > > >  References:
> > > >  ----------
> > > > [1] H. Lee and B. Lee, Approaches for improving tuple space search-based
> > > > table lookup, ICTC '15
> > > > [2] B. Fan, D. G. Andersen, M. Kaminsky, and M. D. Mitzenmacher,
> > > > Cuckoo Filter: Practically Better Than Bloom, CoNEXT '14
> > > >
> > > > The previous RFC on the mailing list is at:
> > > > https://mail.openvswitch.org/pipermail/ovs-dev/2017-April/330570.html
> > > >
> > > > v2: Rebase to master head.
> > > >     Add more testing details in cover letter.
> > > >     Change commit messages.
> > > >     Minor style changes to code.
> > > >     Fix build errors that happen without AVX and the DPDK library.
> > > >
> > > > Yipeng Wang (5):
> > > >   dpif-netdev: Basic CD feature with scalar lookup.
> > > >   dpif-netdev: Add AVX2 implementation for CD lookup.
> > > >   dpif-netdev: Add CD statistics
> > > >   dpif-netdev: Add adaptive CD mechanism
> > > >   unit-test: Add a delay for CD initialization.
> > > >
> > > >  lib/dpif-netdev.c     | 567 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > > >  tests/ofproto-dpif.at |   3 +
> > > >  2 files changed, 560 insertions(+), 10 deletions(-)
> > > >
> > > > --
> > > > 2.7.4
> > >
> > > _______________________________________________
> > > dev mailing list
> > > [email protected]
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev