Hi all,

Before I move on to the next big ticket (multiple-tx queue support), here is the performance I currently get as of git 2aa7f7f.

Quick summary, the IFQ packet staging mechanism gives me:
+80Kpps for 2 bidirectional normal IP forwarding (now 4.20Mpps)
+30Kpps for 2 bidirectional fast forwarding (now 5.07Mpps)

For detailed information, please read the inline comments below.

On Thu, Dec 20, 2012 at 3:03 PM, Sepherosa Ziehau <[email protected]> wrote:
> On Fri, Dec 14, 2012 at 5:47 PM, Sepherosa Ziehau <[email protected]> wrote:
>> Hi all,
>>
>> This email serves as the base performance measurement for further
>> network stack optimization (as of git 107282b).
>
> Since bidirectional fast IP forwarding is already max out the GigE
> limit, I increase the measurement strength a bit.  The new measurement
> is against git 7e1fbcf
>
>>
>>
>> The hardware:
>> mobo ASUS P867H-M
>> 4x4G DDR3 memory
>> CPU i7-2600 (w/ HT and Turbo Boost enabled, 4C/8T)
>> Forwarding NIC Intel 82576EB dual copper
>
> The forwarding NIC is now changed to 82580EB quad copper.
>
>> Packet generator NICs Intel 82571EB dual copper
>>
>>
>> A emx1 <---> igb0 forwarder igb1 <---> emx1 B
>
> The testing topology is changed into following configure:
> +---+                 +-----------+                 +---+
> |   | emx1 <---> igb0 |           | igb1 <---> emx1 |   |
> | A |                 | forwarder |                 | B |
> |   | emx2 <---> igb2 |           | igb3 <---> emx2 |   |
> +---+                 +-----------+                 +---+
>
> Streams:
> A.emx1 <---> B.emx1 (bidirectional)
> A.emx2 <---> B.emx2 (bidirectional)
>
>>
>> A and "forwarder", B and "forwarder" are directly connected using CAT6
>> cables.
>> Polling(4) is enabled on igb1 and igb0 on "forwarder".  Following
>> tunables are in /boot/loader.conf:
>> kern.ipc.nmbclusters="524288"
>> net.ifpoll.user_frac="10"
>> net.ifpoll.status_frac="1000"

net.link.ifq_stage_cntmax="8"

>> Following sysctl is changed before putting igb1 into polling mode:
>> sysctl hw.igb1.npoll_txoff=4
>
> sysctl hw.igb1.npoll_txoff=1
> sysctl hw.igb2.npoll_txoff=2
> sysctl hw.igb3.npoll_txoff=3

sysctl hw.igb0.tx_wreg_nsegs=16
sysctl hw.igb1.tx_wreg_nsegs=16
sysctl hw.igb2.tx_wreg_nsegs=16
sysctl hw.igb3.tx_wreg_nsegs=16

>
>>
>>
>> First for the users that are only interested in the bulk forwarding
>> performance: The 32 netperf TCP_STREAMs running on A could do
>> 941Mbps.
>>
>>
>> Now the tiny packets forwarding performance:
>>
>> A and B generate 18 bytes UDP datagrams using
>> tools/tools/netrate/pktgen.  The destination addresses of the UDP
>> datagrams are selected that the generated UDP datagrams will be evenly
>> distributed the to the 8 RX queues, which should be common in the
>> production environment.
>>
>> Bidirectional normal IP forwarding:
>> 1.42Mpps in each direction, so total 2.84Mpps are forwarded.
>> CPU usage:
>> On CPUs that are doing TX in addition to RX: 85% ~ 90% (max allowed by
>> polling's user_frac)
>> On CPUs that are only doing RX: 40% ~ 50%
>
> Two sets of bidirectional normal IP forwarding:
> 1.03Mpps in each direction, so total 4.12Mpps are forwarded.

1.05+Mpps in each direction, so total 4.20Mpps are forwarded.

> CPU usage:
> On CPUs that are doing TX in addition to RX: 90% (max allowed by
> polling's user_frac)
> On CPUs that are only doing RX: 70% ~ 80%

Not much improvement on CPU usage.

> IPI rate on CPUs that are doing TX in addition to RX: ~10K/s

IPI rate on CPUs that are doing TX in addition to RX: ~4.5K/s

>
>>
>> Bidirectional fast IP forwarding: (net.inet.ip.fastforwarding=1)
>> 1.48Mpps in each direction, so total 2.96Mpps are forwarded.
>> CPU usage:
>> On CPUs that are doing TX in addition to RX: 65% ~ 70%
>> On CPUs that are doing RX: 30% ~ 40%
>
> Two sets of bidirectional fast IP forwarding: (net.inet.ip.fastforwarding=1)
> 1.26Mpps in each direction, so total 5.04Mpps are forwarded.

~1.27Mpps in each direction, so total 5.07Mpps are forwarded.

> CPU usage:
> On CPUs that are doing TX in addition to RX: 90% (max allowed by
> polling's user_frac)
> On CPUs that are only doing RX: 60% ~ 70%

Not much improvement on CPU usage.

> IPI rate on CPUs that are doing TX in addition to RX: ~10K/s

IPI rate on CPUs that are doing TX in addition to RX: ~5K/s

Best Regards,
sephe

-- 
Tomorrow Will Never Die
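
P.S. For anyone who wants to sanity-check the figures in this thread, here is a rough sketch of the arithmetic. It is not part of the measurement setup. The header and framing sizes are the standard Ethernet/IPv4/UDP values (assumed: no IP options, no VLAN tag), and the function names are made up for illustration. It shows why ~1.48Mpps per direction is the GigE ceiling for 18-byte UDP datagrams, and how the per-direction rates map to the totals and gains quoted above (two bidirectional stream sets = 4 directions).

```python
# Standard framing sizes in bytes (assumptions, not taken from the thread).
UDP_PAYLOAD = 18      # datagram size used by pktgen in these tests
UDP_HDR = 8
IPV4_HDR = 20         # assumes no IP options
ETH_HDR = 14          # assumes no VLAN tag
ETH_FCS = 4
MIN_FRAME = 64        # minimum Ethernet frame, header and FCS included
PREAMBLE_SFD = 8
INTERFRAME_GAP = 12
GIGE_BPS = 1_000_000_000


def line_rate_pps(payload=UDP_PAYLOAD, link_bps=GIGE_BPS):
    """Theoretical max packets/s in one direction on one link."""
    frame = max(payload + UDP_HDR + IPV4_HDR + ETH_HDR + ETH_FCS, MIN_FRAME)
    on_wire = frame + PREAMBLE_SFD + INTERFRAME_GAP
    return link_bps / (on_wire * 8)


def total_mpps(per_direction_mpps, directions=4):
    """Aggregate rate for two bidirectional stream sets (4 directions)."""
    return per_direction_mpps * directions


def gain_kpps(new_total_mpps, old_total_mpps):
    """Improvement between two aggregate rates, in Kpps."""
    return round((new_total_mpps - old_total_mpps) * 1000)


if __name__ == "__main__":
    print(f"GigE limit per direction: {line_rate_pps() / 1e6:.3f} Mpps")
    print(f"Normal forwarding total:  {total_mpps(1.05):.2f} Mpps")
    print(f"Normal forwarding gain:   +{gain_kpps(4.20, 4.12)} Kpps")
    print(f"Fast forwarding gain:     +{gain_kpps(5.07, 5.04)} Kpps")
```

Note that the fast forwarding per-direction figure is approximate (~1.27Mpps), so multiplying it by 4 will not reproduce the 5.07Mpps total exactly.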
