On Fri, Mar 14, 2014 at 6:10 PM, Dongsheng Song <[email protected]> wrote:
> On Thu, Mar 13, 2014 at 9:26 PM, Sepherosa Ziehau <[email protected]> wrote:
>>
>> Hi all,
>>
>> Following stats are for folks interested in DragonFly's TCP netperf
>> performance on 10G network (as of 9f1b012):
>>
>> Testing system hardware:
>> Host: i7-3770 w/ hyperthreading enabled, dual channel DDR3-1600 memory (8GB x 2)
>> NIC: Intel 82599ES (connected w/ Intel XDACBL1M direct attach cable)
>>
>> TSO burst size defaults to 12000B for DragonFly's ix.
>>
>> +-------+                 +-------+
>> |       |                 |       |
>> |       | ix0 ---- ix0    |       |
>> |   A   |                 |   B   |
>> |       | ix1 ---- ix1    |       |
>> |       |                 |       |
>> +-------+                 +-------+
>>
>> B runs 'netserver -N'
>>
>> 1) TCP_STREAM (total 18840Mbps, 2 ports, 5 run average):
>>
>> tcp_stream -H B0 -i 64 -l 60 &
>> tcp_stream -H B1 -i 64 -l 60
>>
>> The above commands start 128 netperf TCP_STREAM tests to B0 and B1.
>>
>> The results:
>> ~9424Mbps for each set of tests, i.e. total 18840Mbps (5 run average).
>> Jain's fairness index for each set of tests > 0.85 (1.0 is the best).
>>
>> CPU usage statistics:
>> On TX side (A): ~25% sys, ~2% user, ~7% intr.  Almost no contention.
>> On RX side (B): ~35% sys, ~3% user, ~10% intr.  Mainly contended on rcvtok.
>> Interrupt rate is ~16000 on each CPU (interrupt moderation defaults to
>> 8000Hz for DragonFly's ix).
>>
>> 2) TCP_STREAM + TCP_MAERTS (total 37279Mbps, 2 ports, 5 run average):
>>
>> tcp_stream -H B0 -i 32 -l 60 &
>> tcp_stream -H B1 -i 32 -l 60 &
>> tcp_stream -H B0 -i 32 -l 60 -r &
>> tcp_stream -H B1 -i 32 -l 60 -r
>>
>> The above commands start 64 netperf TCP_STREAM and 64 TCP_MAERTS tests
>> to B0 and B1.
>>
>> The results:
>> ~9220Mbps - ~9400Mbps for each set of tests, i.e. total 37279Mbps (5 run average).
>> Jain's fairness index for each set of tests > 0.80 (1.0 is the best).
>>
>> CPU usage statistics:
>> ~75% sys, ~4% user, ~20% intr.  Mainly contended on rcvtok.  The tests
>> are CPU limited.  System is still responsive during the test.
>> Interrupt rate is ~16000 on each CPU (interrupt moderation defaults to
>> 8000Hz for DragonFly's ix).
>>
>> Best Regards,
>> sephe
>>
>> --
>> Tomorrow Will Never Die
>
> Thanks, could you post TCP_RR data ?
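(A side note on the fairness numbers quoted above: Jain's index over n
per-stream throughputs x_i is (sum x_i)^2 / (n * sum x_i^2), so it is 1.0
when every stream gets an equal share.  A tiny sketch of the computation;
this is not taken from the netperf wrapper scripts, and the sample rates
are made up:

/*
 * Jain's fairness index over n per-stream throughputs x_i:
 *	J = (sum x_i)^2 / (n * sum x_i^2)
 * J is 1.0 when every stream gets an equal share.
 */
#include <stdio.h>

static double
jain_fairness(const double *x, int n)
{
	double sum = 0.0, sumsq = 0.0;
	int i;

	for (i = 0; i < n; ++i) {
		sum += x[i];
		sumsq += x[i] * x[i];
	}
	return (sum * sum) / (n * sumsq);
}

int
main(void)
{
	/* Made-up per-stream rates (Mbps), just to show the computation. */
	double mbps[4] = { 73.0, 74.5, 72.8, 74.2 };

	printf("fairness = %.3f\n", jain_fairness(mbps, 4));
	return (0);
}
)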
I am not sure whether TCP_RR is really useful, since each process works on
only one socket.  However, I do have some statistics for
tools/tools/netrate/accept_connect/kq_connect_client.  It sustains
273Kconns/s (TCP connections; 8 processes, each trying to create 128
connections).  The server side is
tools/tools/netrate/accept_connect/kq_accept_server (run w/ -r, i.e.
SO_REUSEPORT).  MSL is set to 10ms for the testing network, and
net.inet.ip.portrange.last is set to 40000.

At 273Kconns/s the client side consumes 100% CPU (the system is still
responsive though), mainly contended on tcp_port_token (350K contentions/s
on each CPU).  The server side has ~45% idle time on each CPU; contention
there is pretty low, mainly the ip_id spinlock.

The tcp_port_token contention is one of the major reasons we can't push
335Kconns/s with _one_ client.  Another is the computational cost of the
software Toeplitz hash on the client side; on the server side the Toeplitz
hash is calculated by hardware.  I am currently working on reducing the
tcp_port_token contention.
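In case it helps to see the shape of the server side: it is basically N
worker processes, each with its own SO_REUSEPORT listening socket and its
own kqueue, so accepts don't all serialize on a single socket.  A rough
sketch of that pattern (this is NOT the actual kq_accept_server source;
the port number, worker count and error handling are made up for
illustration):

#include <sys/types.h>
#include <sys/event.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#include <err.h>
#include <string.h>
#include <unistd.h>

static void
accept_loop(int port)
{
	struct sockaddr_in sin;
	struct kevent kev;
	int s, kq, on = 1;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s < 0)
		err(1, "socket");
	/* Each worker binds its own listening socket to the same port. */
	if (setsockopt(s, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) < 0)
		err(1, "setsockopt(SO_REUSEPORT)");

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = htons(port);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		err(1, "bind");
	if (listen(s, 1024) < 0)
		err(1, "listen");

	kq = kqueue();
	if (kq < 0)
		err(1, "kqueue");
	EV_SET(&kev, s, EVFILT_READ, EV_ADD, 0, 0, NULL);
	if (kevent(kq, &kev, 1, NULL, 0, NULL) < 0)
		err(1, "kevent add");

	for (;;) {
		int i, fd;

		if (kevent(kq, NULL, 0, &kev, 1, NULL) < 0)
			err(1, "kevent wait");
		/* kev.data is the number of completed connections waiting. */
		for (i = 0; i < (int)kev.data; ++i) {
			fd = accept((int)kev.ident, NULL, NULL);
			if (fd < 0)
				break;
			close(fd);	/* connect-rate test: accept, then drop */
		}
	}
}

int
main(void)
{
	int i;

	/* A few worker processes, each with its own SO_REUSEPORT socket. */
	for (i = 0; i < 3; ++i) {
		if (fork() == 0) {
			accept_loop(9000);
			_exit(0);
		}
	}
	accept_loop(9000);
	return (0);
}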
Best Regards,
sephe

--
Tomorrow Will Never Die