On Tue, Dec 10, 2019 at 8:13 PM David Ahern <[email protected]> wrote:
>
> Hi Matteo:
>
> On a hypervisor running a 4.14.91 kernel and OVS 2.11 I am seeing a
> thundering herd wake up problem. Every packet punted to userspace wakes
> up every one of the handler threads. On a box with 96 cpus, there are 71
> handler threads which means 71 process wakeups for every packet punted.
>
> This is really easy to see, just watch sched:sched_wakeup tracepoints.
> With a few extra probes:
>
> perf probe sock_def_readable sk=%di
> perf probe ep_poll_callback wait=%di mode=%si sync=%dx key=%cx
> perf probe __wake_up_common wq_head=%di mode=%si nr_exclusive=%dx
> wake_flags=%cx key=%8
>
> you can see there is a single netlink socket and its wait queue contains
> an entry for every handler thread.
>
> This does not happen with the 2.7.3 version. Roaming commits it appears
> that the change in behavior comes from this commit:
>
> commit 69c51582ff786a68fc325c1c50624715482bc460
> Author: Matteo Croce <[email protected]>
> Date:   Tue Sep 25 10:51:05 2018 +0200
>
>     dpif-netlink: don't allocate per thread netlink sockets
>
>
> Is this a known problem?
>
> David
>
Hi David,

before my patch, vswitchd created NxM sockets, where N is the number of
ports and M the number of active cores, because every thread opened a
netlink socket per port.

With my patch, a pool of N sockets is created, one per port, and all the
threads poll the same list with the EPOLLEXCLUSIVE flag. As the name
suggests, EPOLLEXCLUSIVE lets the kernel wake up only one of the waiting
threads.

I'm not aware of this problem, but it goes against the intended
behaviour of EPOLLEXCLUSIVE. The flag has existed since Linux 4.5; can
you check that it's passed correctly to epoll_ctl()?

Bye,
-- 
Matteo Croce
per aspera ad upstream
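
For reference when checking how the flag is passed, here is a minimal sketch of registering one shared descriptor in a per-thread epoll instance with EPOLLEXCLUSIVE. It is not the OVS code: the helper name exclusive_poller() is made up for illustration, and any readable descriptor (for example an eventfd) can stand in for the netlink socket.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Older libc headers may lack the flag; the value matches the kernel UAPI. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/*
 * Hypothetical helper: create a private epoll instance (one per handler
 * thread) and add the shared descriptor to it with EPOLLEXCLUSIVE.
 * Returns the epoll descriptor, or -1 on error with errno set.
 */
int exclusive_poller(int shared_fd)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
    ev.data.fd = shared_fd;

    /* The flag is only accepted together with EPOLL_CTL_ADD. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, shared_fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}
```

Each thread would call exclusive_poller() on the same socket and then block in epoll_wait() on its own epoll descriptor. According to epoll_ctl(2), an event on the shared descriptor is then delivered to at least one of the instances that asked for EPOLLEXCLUSIVE, rather than to all of them. Running strace -f -e trace=epoll_ctl on the process is one way to confirm that the flag shows up in the EPOLL_CTL_ADD calls.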
