On 2/28/23 17:14, Ilya Maximets wrote:
> On 2/28/23 17:05, Maxime Coquelin wrote:
>>
>>
>> On 2/28/23 16:37, Ilya Maximets wrote:
>>> On 2/28/23 16:16, Maxime Coquelin wrote:
>>>> Hi,
>>>>
>>>> On 2/28/23 10:33, Wan Junjie wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, Feb 28, 2023 at 4:03 PM David Marchand
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> On Fri, Feb 24, 2023 at 2:55 AM Han Ding <[email protected]> wrote:
>>>>>>>> On 2/23/23 10:35, Han Ding wrote:
>>>>>>>>>
>>>>>>>>> When use ovs-vsctl to delete vhostuserclient port while qemu destroy 
>>>>>>>>> virtio,
>>>>>>>>> there is a deadlock between OVS main thread and the vhost-events 
>>>>>>>>> thread.
>>>>>>>>>
>>>>>>>>> openvswitch is 2.14 and dpdk is 20.11
>>>>>>>>
>>>>>>>> FWIW, 2.14 supposed to be used with 19.11.
>>>>>>
>>>>>> Indeed.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks for reply.
>>>>>>>
>>>>>>> The code flow path of openvswitch 3.0.1 and dpdk 21.11, dpdk 22.11 is 
>>>>>>> same
>>>>>>> for destroying vhostuserclient device. So, I think  there is a same bug 
>>>>>>> in new version.
>>>>>>
>>>>>> Please double check the dpdk versions you are using, then have a look
>>>>>> at my comment below.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Main thread:
>>>>>>>>> ofport_remove
>>>>>>>>>     -> netdev_unref
>>>>>>>>>         -> netdev_dpdk_vhost_destruct
>>>>>>>>>              -> rte_vhost_driver_unregister (If fdentry is busy, 
>>>>>>>>> circle again until the fdentry is not busy)
>>>>>>>>>                  -> fdset_try_del  (fdentry is busy now, return -1. 
>>>>>>>>> Goto again)
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> vhost-nets thread:
>>>>>>>>> fdset_event_dispatch (set fdentry to busy. When destroy_device 
>>>>>>>>> fuction return set the fdentry to no busy)
>>>>>>>>>      -> vhost_user_read_cb
>>>>>>>>>        -> vhost_user_msg_handler
>>>>>>>>>            -> vhost_user_get_vring_base
>>>>>>>>>                ->destroy_device
>>>>>>>>>                   -> ovsrcu_synchronize (Wait for other threads to 
>>>>>>>>> quiesce)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> The  vhost-nets thread wait for main thread to quiesce, but the main 
>>>>>>>>> thread now is waiting for fdentry to no busy
>>>>>>>>> and circle all the time.
>>>>>>>
>>>>>
>>>>>
>>>>> I believe this issue exists in older versions, we have seen it before.
>>>>>
>>>>> When we create and destroy vm concurrently and continuously. This
>>>>> could be reproduced easily.
>>>>>
>>>>> We found a related patch from dpdk which is not accepted.
>>>>>
>>>>> https://patchwork.dpdk.org/project/dpdk/patch/[email protected]/
>>>>
>>>> Thanks for the pointer.
>>>>
>>>> I spent some time thinking about it, and I do not see other alternative
>>>> than doing what the patch you shared do, i.e. returning -EAGAIN and
>>>> expecting the application to handle it properly.
>>>>
>>>> In order not to break existing applications that expect
>>>> rte_vhost_driver_unregister to block, maybe we can introduce a new flag
>>>> passed at registration time specifying the application support non-
>>>> blocking unregistration.
>>>>
>>>> Without the flag passed, the lib would behave the same as currently,
>>>> i.e. block until fdet_try_del() succeeds.
>>>>
>>>> Would that solution work with OVS?
>>>
>>> We can't really fail the netdev_dpdk_vhost_destruct() in OVS as there
>>> is no mechanism to re-try it.
>>
>> Would it be possible to enter into a quiescent state before retrying as
>> the author of the initial patch suggests?
>> "
>> It is better to return -EAGAIN to application, who will decide how to handle
>> (eg OVS can call ovsrcu_quiesce() and then retry).
>> "
> 
> I didn't dig too deep, but that doesn't sound safe.
> We're using ofproto object on this path higher in the call stack
> and ofproto is RCU-protected.  Entering quiescent state here may
> result in ofproto object being freed.

Though I guess we already have this issue, because unregister
may trigger destroy_device() that calls ovsrcu_synchronize()...

> 
>>
>>>> Any other alternative in mind?
>>>
>>> We may postpone the rte_vhost_driver_unregister() call in OVS, but
>>> we'll need to make sure somehow that the same port is not re-created
>>> while it's not yet destroyed.
>>>
>>> Just a thought, I don't really know how to implement that as failing
>>> the netdev_open() might also be a problem.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>>>
>>>> Regards,
>>>> Maxime
>>>>
>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Whether the ovsrcu_synchronize  is necessary in destrory_device
>>>>>>>>
>>>>>>>> Yes, we have to wait for all threads to stop using this device.
>>>>>>>>
>>>>>>>>> or the rte_vhost_driver_unregister is correct in fdset_try_del ?
>>>>>>>>
>>>>>>>> CC: David and Maxime.
>>>>>>
>>>>>> This could be a different issue, but there was a change for this area
>>>>>> of the vhost library in v21.11:
>>>>>> 451dc0fad83d ("vhost: fix crash on port deletion")
>>>>>>
>>>>>> I don't see it backported to 19.11 stable release, but it got to
>>>>>> v20.11.4 (used in 2.15 and 2.16 ovs branches).
>>>>>>
>>>>>> I would suggest backporting it to your 19.11 dpdk and retest with ovs 
>>>>>> 2.14.
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> David Marchand
>>>>>>
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> [email protected]
>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>
>>>>
>>>
>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to