On 1/16/23 14:41, David Marchand wrote:
> On Fri, Jan 13, 2023 at 4:57 PM Ilya Maximets <[email protected]> wrote:
>>>>> @@ -5766,6 +5804,7 @@ static const struct netdev_class dpdk_vhost_class = {
>>>>>  static const struct netdev_class dpdk_vhost_client_class = {
>>>>>      .type = "dpdkvhostuserclient",
>>>>>      NETDEV_DPDK_CLASS_COMMON,
>>>>> +    .run = netdev_dpdk_vhost_run,
>>>>
>>>> This part is also needed for the 'server' type.
>>>>
>>>> However, there is no guarantee that run() will be called in any
>>>> reasonable time, because we don't have a wait() method that would
>>>> schedule a wake-up for the main thread.  The main thread wakes up for
>>>> other reasons, like stats updates, but that happens not that frequently
>>>> and we can't rely on it anyway.  For example, the default stats update
>>>> interval is 5 seconds.
>>>>
>>>> The problem is that we can't really iterate over devices and take
>>>> dev->mutex in the context of a callback.  Meaning we can't really
>>>> implement a good wait() method that would trigger some seqno change
>>>> that would wake up the main thread.  We could trigger a global
>>>> connectivity seqno, I guess.  Not sure how safe that is, since
>>>> sequence numbers use an internal mutex.
>>>>
>>>> OTOH, maybe we can employ the dpdk-watchdog thread to perform vhost
>>>> housekeeping?  Currently it does a very similar thing for physical
>>>> ports, i.e. it updates the link state periodically.  Its sleep
>>>> interval is a bit large, though; maybe we could tweak that, or update
>>>> phy link states with the same interval as now but check the vhost
>>>> event queue a bit more frequently.  Maybe even have something like
>>>> the backoff in dp_netdev_flow_offload_main().
>>>
>>> Having a dedicated thread seems safer, for the same reason Maxime
>>> mentioned for the OVS main thread: vhost events may be triggered by a
>>> malicious guest, and OVS would be blind to some phy link updates.
>>>
>>> Another thought on my patch... is it safe to have an unbounded queue
>>> wrt memory consumption?  Again, a malicious guest could make OVS
>>> consume a lot of memory (in my tests, OVS complained that the heap
>>> increased to 500M).
>
> *Note* the heap was at ~400M before starting the test, so it was an
> increase of around 100M.
>
> The reason was likely that .run() was not called often enough and the
> queue was growing with pending updates.
> Now that I went with the dedicated thread approach, all I see is an
> increase of 1-2M, maximum.
>
> I think we can avoid going with your suggestion.
> WDYT?
OK.  Sounds OK then.
Maybe add a rate-limited warning message if the queue exceeds 1000 entries
or so?

>
>> We're entering gross over-engineering territory, but the solution
>> might be the following:
>>
>> An RCU list with nodes containing the following elements:
>>  - ifname
>>  - qid
>>  - atomic_bool enabled
>>  - atomic_bool seen
>>  - timestamp
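A minimal sketch of what such a node could look like with stock OVS types
(rculist.h, ovs-atomic.h).  The field names follow the list above; the
struct name, member types, and comments are assumptions for illustration,
not code from the patch:

    /* Per-queue state update, chained in an RCU-protected list.
     * Sketch only -- names and types are assumed, not taken from the patch. */
    #include "ovs-atomic.h"
    #include "rculist.h"

    struct vhost_state_update {
        struct rculist node;      /* Linked into the RCU-protected list. */
        char *ifname;             /* vhost-user interface the event refers to. */
        int qid;                  /* Queue id reported by the vhost library. */
        atomic_bool enabled;      /* Last enable/disable state for the queue. */
        atomic_bool seen;         /* Set once the main thread has consumed it. */
        long long int timestamp;  /* Time of the last change, in msec. */
    };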

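And a rough illustration of the rate-limited warning suggested above, using
OVS's stock VLOG_WARN_RL() mechanism.  The function name, the 'n_events'
counter, and the threshold macro are made up for the example; only the vlog
helpers are real:

    #include "openvswitch/vlog.h"

    VLOG_DEFINE_THIS_MODULE(netdev_dpdk);

    #define VHOST_EVENTS_WARN_THRESHOLD 1000

    /* Emit a rate-limited warning when the pending vhost-event queue grows
     * past the threshold; 'n_events' is the current queue depth. */
    static void
    vhost_events_warn_if_backlogged(unsigned int n_events)
    {
        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);

        if (n_events > VHOST_EVENTS_WARN_THRESHOLD) {
            VLOG_WARN_RL(&rl, "%u vhost-user events pending; the housekeeping"
                         " thread may be falling behind.", n_events);
        }
    }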