On Tue, Sep 25, 2018 at 10:51:05AM +0200, Matteo Croce wrote:
> When using the kernel datapath, OVS allocates a pool of sockets to handle
> netlink events. The number of sockets is: ports * n-handler-threads, where
> n-handler-threads is user configurable and defaults to 3/4 * number of cores.
>
> This is because vswitchd starts n-handler-threads threads, each one with a
> netlink socket for every port of the switch. Each thread then starts
> listening for events on its set of sockets with epoll().
>
> On setups with many CPUs and ports, the number of sockets easily hits
> the process file descriptor limit, and ovs-vswitchd exits with -EMFILE.
>
> Change the number of allocated sockets to just one per port by moving
> the socket array from a per-handler structure to a per-datapath one,
> and let all the handlers share the same sockets by using the EPOLLEXCLUSIVE
> epoll flag, which avoids duplicate events, on systems that support it.
>
> The patch was tested on a 56 core machine running Linux 4.18 and the latest
> Open vSwitch. A bridge was created with 2000+ ports, some of them being
> veth interfaces with the peer outside the bridge. The latency of the upcall
> is measured by setting a single 'action=controller,local' OpenFlow rule to
> force all the packets to the slow path and then to the local port.
> A tool[1] injects packets into the veth outside the bridge and measures
> the delay until the packet is captured on the local port. The rx timestamp
> is taken from the socket ancillary data in the SO_TIMESTAMPNS attribute, to
> keep the scheduler delay out of the measured time.
>
> The first test measures the average latency for an upcall generated from
> a single port. To measure it, 100k packets, one every msec, are sent to a
> single port and the latencies are measured.
>
> The second test checks latency fairness among ports, namely whether the
> latency is equal between ports or some ports have lower priority.
> The previous test is repeated for every port, and the average of the
> average latencies and the standard deviation between the averages are
> measured.
>
> The third test measures responsiveness under load. Heavy traffic is sent
> through all ports, and latency and packet loss are measured on a single
> idle port.
>
> The fourth test is all about fairness. Heavy traffic is injected into all
> ports but one, and latency and packet loss are measured on the single idle
> port.
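
For readers who have not used the flag before, the mechanism the patch relies
on looks roughly like the sketch below. This is NOT the actual dpif-netlink
code, just a minimal illustration under assumed, hypothetical names
(handler_epoll_setup, port_fds): each handler thread keeps its own epoll
instance, but all of them register the same shared per-port socket fds with
EPOLLEXCLUSIVE, so the kernel wakes at most one waiting thread per event
instead of all of them.

    /* Minimal sketch, not OVS code: per-thread epoll over shared fds. */
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/epoll.h>

    #ifndef EPOLLEXCLUSIVE          /* needs Linux >= 4.5 and recent headers */
    #define EPOLLEXCLUSIVE (1u << 28)
    #endif

    /* Called once per handler thread; 'port_fds' are the per-port netlink
     * sockets owned by the datapath, not by the thread. */
    static int
    handler_epoll_setup(const int *port_fds, size_t n_ports)
    {
        int epfd = epoll_create1(0);
        if (epfd < 0) {
            perror("epoll_create1");
            return -1;
        }
        for (size_t i = 0; i < n_ports; i++) {
            struct epoll_event ev = {
                .events = EPOLLIN | EPOLLEXCLUSIVE,
                .data.u32 = (uint32_t) i,   /* remember which port fired */
            };
            if (epoll_ctl(epfd, EPOLL_CTL_ADD, port_fds[i], &ev) < 0) {
                /* On kernels without EPOLLEXCLUSIVE support this may fail
                 * or behave like plain EPOLLIN; a fallback would retry
                 * without the flag. */
                perror("epoll_ctl(EPOLL_CTL_ADD)");
            }
        }
        return epfd;
    }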
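The rx-timestamp trick used for the measurements is the standard
SO_TIMESTAMPNS/SCM_TIMESTAMPNS recipe; the sketch below is not taken from the
tool linked at [1] (function names are made up), it only shows how the kernel
receive time can be read from the ancillary data so that the time the
receiving process spends waiting to be scheduled does not skew the numbers.

    /* Minimal sketch of the SO_TIMESTAMPNS recipe, not the actual tool. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/uio.h>
    #include <time.h>

    static int
    enable_rx_timestamps(int sock)
    {
        int on = 1;
        /* Ask the kernel for a nanosecond RX timestamp on every packet. */
        return setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof on);
    }

    static ssize_t
    recv_with_rx_timestamp(int sock, void *buf, size_t len,
                           struct timespec *rx_ts)
    {
        union {   /* aligned as required for struct cmsghdr */
            char buf[CMSG_SPACE(sizeof(struct timespec))];
            struct cmsghdr align;
        } ctrl;
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        struct msghdr msg = {
            .msg_iov = &iov,
            .msg_iovlen = 1,
            .msg_control = ctrl.buf,
            .msg_controllen = sizeof ctrl.buf,
        };

        ssize_t n = recvmsg(sock, &msg, 0);
        if (n < 0) {
            perror("recvmsg");
            return n;
        }

        /* Walk the ancillary data looking for the kernel receive time. */
        for (struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); cmsg;
             cmsg = CMSG_NXTHDR(&msg, cmsg)) {
            if (cmsg->cmsg_level == SOL_SOCKET
                && cmsg->cmsg_type == SCM_TIMESTAMPNS) {
                memcpy(rx_ts, CMSG_DATA(cmsg), sizeof *rx_ts);
                break;
            }
        }
        return n;
    }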
>
> This is the test setup:
>
> # nproc
> 56
> # ovs-vsctl show |grep -c Port
> 2223
> # ovs-ofctl dump-flows ovs_upc_br
>  cookie=0x0, duration=4.827s, table=0, n_packets=0, n_bytes=0,
>     actions=CONTROLLER:65535,LOCAL
> # uname -a
> Linux fc28 4.18.7-200.fc28.x86_64 #1 SMP Mon Sep 10 15:44:45 UTC 2018
> x86_64 x86_64 x86_64 GNU/Linux
>
> And these are the results of the tests:
>
>                                    Stock OVS            Patched
>  netlink sockets
>  in use by vswitchd
>  lsof -p $(pidof ovs-vswitchd) \
>  |grep -c GENERIC                  91187                2227
>
>  Test 1
>  one port latency
>  min/avg/max/mdev (us)             2.7/6.6/238.7/1.8    1.6/6.8/160.6/1.7
>
>  Test 2
>  all port
>  avg latency/mdev (us)             6.51/0.97            6.86/0.17
>
>  Test 3
>  single port latency
>  under load
>  avg/mdev (us)                     7.5/5.9              3.8/4.8
>  packet loss                       95 %                 62 %
>
>  Test 4
>  idle port latency
>  under load
>  min/avg/max/mdev (us)             0.8/1.5/210.5/0.9    1.0/2.1/344.5/1.2
>  packet loss                       94 %                 4 %
>
> CPU and RAM usage seem not to be affected: the resource usage of vswitchd
> when idle with 2000+ ports is unchanged:
>
> # ps u $(pidof ovs-vswitchd)
> USER       PID %CPU %MEM     VSZ    RSS TTY   STAT START  TIME COMMAND
> openvsw+  5430 54.3  0.3 4263964 510968 pts/1 RLl+ 16:20  0:50 ovs-vswitchd
>
> Additionally, to check whether vswitchd is thread safe with this patch, the
> following test was run for circa 48 hours: on a 56 core machine, a bridge
> with the kernel datapath was filled with 2200 dummy interfaces and 22 veth
> interfaces, then 22 traffic generators were run in parallel, piping traffic
> into the veth peers outside the bridge.
> To generate as many upcalls as possible, all packets were forced to the
> slow path with an OpenFlow rule like 'action=controller,local' and the
> packet size was set to 64 bytes. Also, to avoid overflowing the FDB early
> and slowing down the upcall processing, the generated MAC addresses were
> restricted to a small interval. vswitchd ran without problems for 48+
> hours, obviously with all the handler threads at almost 99% CPU usage.
>
> [1] https://github.com/teknoraver/network-tools/blob/master/weed.c
>
> Signed-off-by: Matteo Croce <[email protected]>
> ---
Acked-by: Flavio Leitner <[email protected]>
