>On 2/28/23 17:14, Ilya Maximets wrote:
>> On 2/28/23 17:05, Maxime Coquelin wrote:
>>>
>>>
>>> On 2/28/23 16:37, Ilya Maximets wrote:
>>>> On 2/28/23 16:16, Maxime Coquelin wrote:
>>>>> Hi,
>>>>>
>>>>> On 2/28/23 10:33, Wan Junjie wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Tue, Feb 28, 2023 at 4:03 PM David Marchand
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> On Fri, Feb 24, 2023 at 2:55 AM Han Ding <[email protected]> 
>>>>>>> wrote:
>>>>>>>>> On 2/23/23 10:35, Han Ding wrote:
>>>>>>>>>>
>>>>>>>>>> When ovs-vsctl is used to delete a vhostuserclient port while QEMU
>>>>>>>>>> destroys the virtio device, there is a deadlock between the OVS main
>>>>>>>>>> thread and the vhost-events thread.
>>>>>>>>>>
>>>>>>>>>> openvswitch is 2.14 and dpdk is 20.11
>>>>>>>>>
>>>>>>>>> FWIW, 2.14 is supposed to be used with DPDK 19.11.
>>>>>>>
>>>>>>> Indeed.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for reply.
>>>>>>>>
>>>>>>>> The code flow for destroying a vhostuserclient device is the same in
>>>>>>>> openvswitch 3.0.1 with dpdk 21.11 and dpdk 22.11. So I think the same
>>>>>>>> bug exists in the new versions.
>>>>>>>
>>>>>>> Please double check the dpdk versions you are using, then have a look
>>>>>>> at my comment below.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Main thread:
>>>>>>>>>> ofport_remove
>>>>>>>>>>     -> netdev_unref
>>>>>>>>>>         -> netdev_dpdk_vhost_destruct
>>>>>>>>>>              -> rte_vhost_driver_unregister (if the fdentry is busy,
>>>>>>>>>> loop again until it is not busy)
>>>>>>>>>>                  -> fdset_try_del  (fdentry is busy now; return -1
>>>>>>>>>> and go to "again")
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> vhost-events thread:
>>>>>>>>>> fdset_event_dispatch (sets the fdentry busy; when the destroy_device
>>>>>>>>>> callback returns, sets the fdentry not busy)
>>>>>>>>>>      -> vhost_user_read_cb
>>>>>>>>>>        -> vhost_user_msg_handler
>>>>>>>>>>            -> vhost_user_get_vring_base
>>>>>>>>>>                ->destroy_device
>>>>>>>>>>                   -> ovsrcu_synchronize (Wait for other threads to 
>>>>>>>>>> quiesce)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> The vhost-events thread waits for the main thread to quiesce, but
>>>>>>>>>> the main thread is waiting for the fdentry to become not busy,
>>>>>>>>>> looping forever.
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> I believe this issue exists in older versions, we have seen it before.
>>>>>>
>>>>>> When we create and destroy VMs concurrently and continuously, this
>>>>>> can be reproduced easily.
>>>>>>
>>>>>> We found a related patch for dpdk which was not accepted.
>>>>>>
>>>>>> https://patchwork.dpdk.org/project/dpdk/patch/[email protected]/
>>>>>
>>>>> Thanks for the pointer.
>>>>>
>>>>> I spent some time thinking about it, and I do not see an alternative
>>>>> other than doing what the patch you shared does, i.e. returning -EAGAIN
>>>>> and expecting the application to handle it properly.
>>>>>
>>>>> In order not to break existing applications that expect
>>>>> rte_vhost_driver_unregister to block, maybe we can introduce a new flag,
>>>>> passed at registration time, specifying that the application supports
>>>>> non-blocking unregistration.
>>>>>
>>>>> Without the flag passed, the lib would behave the same as currently,
>>>>> i.e. block until fdset_try_del() succeeds.
>>>>>
>>>>> Would that solution work with OVS?
>>>>
>>>> We can't really fail the netdev_dpdk_vhost_destruct() in OVS as there
>>>> is no mechanism to retry it.
>>>
>>> Would it be possible to enter into a quiescent state before retrying as
>>> the author of the initial patch suggests?
>>> "
>>> It is better to return -EAGAIN to application, who will decide how to handle
>>> (eg OVS can call ovsrcu_quiesce() and then retry).
>>> "
>> 
>> I didn't dig too deep, but that doesn't sound safe.
>> We're using ofproto object on this path higher in the call stack
>> and ofproto is RCU-protected.  Entering quiescent state here may
>> result in ofproto object being freed.
>
>Though I guess we already have this issue, because unregister
>may trigger destroy_device() that calls ovsrcu_synchronize()...
>

OVS could save the vhost_id to a list when rte_vhost_driver_unregister()
returns -EAGAIN. Then, before the next bridge_run, OVS would try to
unregister the driver for each vhost_id node saved in the list.

Can we fix the bug this way?
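Roughly, the idea above could be sketched as below. Everything here (the pending list, the drain function, and the stubbed unregister simulating the busy fdentry with a countdown) is a hypothetical illustration of the deferred-unregister pattern, not the real OVS/DPDK code:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

struct pending_vhost {
    char vhost_id[128];
    struct pending_vhost *next;
};

static struct pending_vhost *pending_list;

/* Stub standing in for rte_vhost_driver_unregister(): returns -EAGAIN
 * while the fdentry is still busy (simulated here with a countdown). */
static int busy_countdown = 2;

static int
stub_vhost_driver_unregister(const char *vhost_id)
{
    (void)vhost_id;
    if (busy_countdown > 0) {
        busy_countdown--;
        return -EAGAIN;  /* fdentry still busy; caller must retry later. */
    }
    return 0;
}

/* Called from the destruct path: queue the id instead of spinning. */
static void
defer_vhost_unregister(const char *vhost_id)
{
    struct pending_vhost *p = malloc(sizeof *p);
    snprintf(p->vhost_id, sizeof p->vhost_id, "%s", vhost_id);
    p->next = pending_list;
    pending_list = p;
}

/* Called before the next bridge_run(): retry every pending unregister,
 * keeping the entries that still return -EAGAIN for the next iteration. */
static void
drain_pending_unregisters(void)
{
    struct pending_vhost **pp = &pending_list;
    while (*pp) {
        if (stub_vhost_driver_unregister((*pp)->vhost_id) == 0) {
            struct pending_vhost *done = *pp;
            *pp = done->next;
            free(done);
        } else {
            pp = &(*pp)->next;
        }
    }
}
```

Since the destruct path never blocks on fdset_try_del() in this scheme, the vhost-events thread can finish destroy_device() and its ovsrcu_synchronize() while the main thread keeps quiescing normally. The concern Ilya raised (the same port being re-created before its deferred unregister completes) would still need to be handled separately.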


>> 
>>>
>>>>> Any other alternative in mind?
>>>>
>>>> We may postpone the rte_vhost_driver_unregister() call in OVS, but
>>>> we'll need to make sure somehow that the same port is not re-created
>>>> while it's not yet destroyed.
>>>>
>>>> Just a thought, I don't really know how to implement that as failing
>>>> the netdev_open() might also be a problem.
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>>>
>>>>> Regards,
>>>>> Maxime
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Is the ovsrcu_synchronize in destroy_device necessary,
>>>>>>>>>
>>>>>>>>> Yes, we have to wait for all threads to stop using this device.
>>>>>>>>>
>>>>>>>>>> or is the rte_vhost_driver_unregister behavior around fdset_try_del correct?
>>>>>>>>>
>>>>>>>>> CC: David and Maxime.
>>>>>>>
>>>>>>> This could be a different issue, but there was a change for this area
>>>>>>> of the vhost library in v21.11:
>>>>>>> 451dc0fad83d ("vhost: fix crash on port deletion")
>>>>>>>
>>>>>>> I don't see it backported to 19.11 stable release, but it got to
>>>>>>> v20.11.4 (used in 2.15 and 2.16 ovs branches).
>>>>>>>
>>>>>>> I would suggest backporting it to your 19.11 dpdk and retesting with
>>>>>>> ovs 2.14.
>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>> David Marchand
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dev mailing list
>>>>>>> [email protected]
>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>
>>>>>
>>>>
>>>
>> 
>

