On 12/10/19 12:37 PM, Matteo Croce wrote:
> On Tue, Dec 10, 2019 at 8:13 PM David Ahern <[email protected]> wrote:
>>
>> Hi Matteo:
>>
>> On a hypervisor running a 4.14.91 kernel and OVS 2.11 I am seeing a
>> thundering herd wake up problem. Every packet punted to userspace wakes
>> up every one of the handler threads. On a box with 96 cpus, there are 71
>> handler threads which means 71 process wakeups for every packet punted.
>>
>> This is easy to see: just watch the sched:sched_wakeup tracepoint.
>> With a few extra probes:
>>
>> perf probe sock_def_readable sk=%di
>> perf probe ep_poll_callback wait=%di mode=%si sync=%dx key=%cx
>> perf probe __wake_up_common wq_head=%di mode=%si nr_exclusive=%dx wake_flags=%cx key=%r8
>>
>> you can see there is a single netlink socket and its wait queue contains
>> an entry for every handler thread.
>>
>> This does not happen with version 2.7.3. Looking through the commits, it
>> appears that the change in behavior comes from this commit:
>>
>> commit 69c51582ff786a68fc325c1c50624715482bc460
>> Author: Matteo Croce <[email protected]>
>> Date:   Tue Sep 25 10:51:05 2018 +0200
>>
>>     dpif-netlink: don't allocate per thread netlink sockets
>>
>>
>> Is this a known problem?
>>
>> David
>>
> 
> Hi David,
> 
> Before my patch, vswitchd created NxM sockets, where N is the number of
> ports and M the number of active cores, because every thread opened a
> netlink socket per port.
> 
> With my patch, a pool is created with N sockets, one per port, and all
> the threads poll the same list with the EPOLLEXCLUSIVE flag.
> As the name suggests, EPOLLEXCLUSIVE lets the kernel wake up only one
> of the waiting threads.
> 
> I'm not aware of this problem, but what you describe goes against the
> intended behaviour of EPOLLEXCLUSIVE.
> This flag has existed since Linux 4.5; can you check that it's passed
> correctly to epoll_ctl()?
> 

I get the theory, but the reality is that all threads are awakened.
Also, it is not limited to the 4.14 kernel; I see the same behavior with
5.4.
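The pooled design described above can be reduced to a small experiment for checking EPOLLEXCLUSIVE on a given kernel. This is a minimal sketch, not OVS code: `count_wakeups`, `handler_main`, and `MAX_HANDLERS` are hypothetical names, and a pipe stands in for the per-port netlink socket. Each simulated handler thread gets its own epoll instance, the pipe's read end is added to every instance with EPOLLIN | EPOLLEXCLUSIVE, and one byte is written. Per epoll_ctl(2), when several epoll instances are attached to the same file with EPOLLEXCLUSIVE, one or more of them (not necessarily exactly one) receive the event.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

#define MAX_HANDLERS 128   /* hypothetical cap, enough for 71 handlers */

/* One simulated handler thread: its own epoll instance plus a result flag. */
struct handler {
    pthread_t tid;
    int epfd;
    int woke;              /* 1 if epoll_wait() reported an event */
};

/* Block on this handler's epoll instance; the timeout bounds the demo. */
static void *handler_main(void *arg)
{
    struct handler *h = arg;
    struct epoll_event ev;

    h->woke = (epoll_wait(h->epfd, &ev, 1, 1000 /* ms */) == 1);
    return NULL;
}

/* Attach the read end of a pipe (stand-in for one per-port socket) to n
 * epoll instances with EPOLLIN | EPOLLEXCLUSIVE, start one thread per
 * instance, make the pipe readable once, and return how many threads saw
 * the event. Returns -1 if setup fails. */
static int count_wakeups(int n)
{
    struct handler h[MAX_HANDLERS];
    struct timespec settle = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    int p[2], started = 0, woken = -1;

    assert(n > 0 && n <= MAX_HANDLERS);
    if (pipe(p) < 0)
        return -1;
    for (int i = 0; i < n; i++)
        h[i].epfd = -1;

    for (int i = 0; i < n; i++) {
        /* EPOLLEXCLUSIVE is only valid with EPOLL_CTL_ADD (Linux >= 4.5). */
        struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };

        h[i].epfd = epoll_create1(EPOLL_CLOEXEC);
        if (h[i].epfd < 0 ||
            epoll_ctl(h[i].epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
            goto out;
    }
    for (; started < n; started++) {
        h[started].woke = 0;
        if (pthread_create(&h[started].tid, NULL, handler_main, &h[started]))
            goto out;
    }

    nanosleep(&settle, NULL);          /* heuristic: let threads block */
    if (write(p[1], "x", 1) != 1)      /* one "packet punted to userspace" */
        goto out;
    woken = 0;

out:
    for (int i = 0; i < started; i++) {
        pthread_join(h[i].tid, NULL);
        if (woken >= 0)
            woken += h[i].woke;
    }
    for (int i = 0; i < n; i++)
        if (h[i].epfd >= 0)
            close(h[i].epfd);
    close(p[0]);
    close(p[1]);
    return woken;
}
```

Building with -pthread and printing count_wakeups(71) from main() gives a rough comparison point: with exclusive wakeups working, the count is expected to be small (it can exceed 1 if a thread had not yet blocked when the byte was written), while a count near 71 would reproduce the herd. This only exercises the pipe wakeup path, not the netlink path through sock_def_readable traced above, so a clean result here would not by itself rule out a netlink-specific issue.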
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
