On 12/10/19 5:20 PM, David Ahern wrote:
> On 12/10/19 3:09 PM, Jason Baron wrote:
>> Hi David,
>>
>> The idea is that we try and queue new work to 'idle' threads in an
>> attempt to distribute a workload. Thus, once we find an 'idle' thread we
>> stop waking up other threads. While we are searching the wakeup list for
>> idle threads, we do queue an epoll event to the non-idle threads, this
>> doesn't mean they are woken up. It just means that when they go to
>> epoll_wait() to harvest events from the kernel, if the event is still
>> available it will be reported. If the condition for the event is no
>> longer true (because another thread consumed it), they the event
>> wouldn't be visible. So its a way of load balancing a workload while
>> also reducing the number of wakeups. Its 'exclusive' in the sense that
>> it will stop after it finds the first idle thread.
>>
>> We certainly can employ other wakeup strategies - there was interest
>> (and patches) for a strict 'round robin' but that has not been merged
>> upstream.
>>
>> I would like to better understand the current usecase. It sounds like
>> each thread as an epoll file descriptor. And each epoll file descriptor
>> is attached the name netlink socket. But when that netlink socket gets a
>> packet it causes all the threads to wakeup? Are you sure there is just 1
>> netlink socket that all epoll file desciptors are are attached to?
>>
> 
> Thanks for the response.
> 
> This is the code in question:
> 
> https://github.com/openvswitch/ovs/blob/branch-2.11/lib/dpif-netlink.c#L492
> 
> Yes, prior to finding the above code reference I had traced it to a
> single socket with all handler threads (71 threads on this 96 cpu box)
> on the wait queue.
> 
> The ovs kernel module is punting a packet to userspace. It generates a
> netlink message and invokes netlink_unicast. This the stack trace:
> 
>         ffffffffad09cc02 ttwu_do_wakeup+0x92 ([kernel.kallsyms])
>         ffffffffad09d945 try_to_wake_up+0x1d5 ([kernel.kallsyms])
>         ffffffffad257275 pollwake+0x75 ([kernel.kallsyms])
>         ffffffffad0b58a4 __wake_up_common+0x74 ([kernel.kallsyms])
>         ffffffffad0b59cc __wake_up_common_lock+0x7c ([kernel.kallsyms])
>         ffffffffad289ecc ep_poll_wakeup_proc+0x1c ([kernel.kallsyms])
>         ffffffffad28a4bc ep_call_nested.constprop.18+0xbc
> ([kernel.kallsyms])
>         ffffffffad28b0f2 ep_poll_callback+0x172 ([kernel.kallsyms])
>         ffffffffad0b58a4 __wake_up_common+0x74 ([kernel.kallsyms])
>         ffffffffad0b59cc __wake_up_common_lock+0x7c ([kernel.kallsyms])
>         ffffffffad794af9 sock_def_readable+0x39 ([kernel.kallsyms])
>         ffffffffad7e846e __netlink_sendskb+0x3e ([kernel.kallsyms])
>         ffffffffad7eb11a netlink_unicast+0x20a ([kernel.kallsyms])
>         ffffffffc07abd44 queue_userspace_packet+0x2d4 ([kernel.kallsyms])
>         ffffffffc07ac330 ovs_dp_upcall+0x50 ([kernel.kallsyms])
> 
> 
> A probe on sock_def_readable shows it is a single socket that the wait
> queue is processed. Eventually ttwu_do_wakeup is invoked 71 times (once
> for each thread). In some cases I see the awakened threads immediately
> running on the target_cpu, while the queue walk continues to wake up all
> threads. Only the first one is going to handle the packet so the rest of
> the wakeups are just noise.
> 
> On this system in just a 1-second interval I see this sequence play out
> 400+ times.
> 


That stack trace is interesting. I *think* it means you are doing
select() or poll() on the epoll file descriptors? I say that because I
see 'pollwake()' in the stack. Notice that pollwake() is only in
fs/select.c.

Thanks,

-jason
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to