On 08/08/2024 10:57, Xinxin Zhao wrote:
> When the ovs control thread del vhost-user port and
> the vhost-event thread process the vhost-user port down concurrently,
> the main thread may fall into a deadlock.
>
> E.g., vhostuser port is created as client.
> The ovs control thread executes the following process:
> rte_vhost_driver_unregister->fdset_try_del.
> At the same time, the vhost-event thread executes the following process:
> fdset_event_dispatch->vhost_user_read_cb->destroy_device.
> At this time, vhost-event will wait for rcu scheduling,
> and the ovs control thread is waiting for pfdentry->busy to be 0.
> The two threads are waiting for each other and fall into a deadlock.
>
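The wait cycle described in the commit message can be modelled in miniature. The block below is a hypothetical C11/pthreads sketch, not OVS or DPDK code. `busy` stands in for pfdentry->busy. `ctl_quiescent` is a one-thread simplification of how an RCU grace-period wait treats the control thread, on the assumption that this wait is what destroy_device() blocks on. The names `model`, `event_thread`, `control_thread` and `run_scenario` are illustrative only. The 500 ms deadline exists only so the demo terminates; the real fdset_try_del() retry loop has none.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of the deadlock; none of these names are OVS or DPDK
 * APIs. 'busy' stands in for pfdentry->busy and 'ctl_quiescent' for the
 * control thread's RCU state as seen by a grace-period waiter. */
struct model {
    atomic_int busy;
    atomic_bool ctl_quiescent;
    bool quiesce_around_wait;  /* Mimic the patch's quiesce_start/end? */
    bool ctl_finished;         /* Did the control thread's wait complete? */
};

static double
now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* vhost-event thread: already inside the read callback (busy == 1), it
 * waits for a grace period, i.e. until the control thread is quiescent,
 * before the callback returns and busy drops back to 0. */
static void *
event_thread(void *arg)
{
    struct model *m = arg;

    while (!atomic_load(&m->ctl_quiescent)) {
        sched_yield();
    }
    atomic_store(&m->busy, 0);
    return NULL;
}

/* Control thread: spins until busy == 0, like the fdset_try_del() retry
 * loop. The deadline is demo-only so that the "deadlock" case ends. */
static void *
control_thread(void *arg)
{
    struct model *m = arg;
    double deadline;

    if (m->quiesce_around_wait) {
        atomic_store(&m->ctl_quiescent, true);   /* ~ ovsrcu_quiesce_start() */
    }
    deadline = now_ms() + 500.0;
    while (atomic_load(&m->busy) != 0 && now_ms() < deadline) {
        sched_yield();
    }
    m->ctl_finished = atomic_load(&m->busy) == 0;
    if (m->quiesce_around_wait) {
        atomic_store(&m->ctl_quiescent, false);  /* ~ ovsrcu_quiesce_end() */
    }
    /* Demo cleanup only: release the event thread so both threads can be
     * joined even when the wait timed out. */
    atomic_store(&m->ctl_quiescent, true);
    return NULL;
}

/* Returns true if the control thread saw busy == 0 before its deadline. */
static bool
run_scenario(bool quiesce_around_wait)
{
    struct model m;
    pthread_t ev, ctl;
    int rc;

    atomic_init(&m.busy, 1);  /* The read callback is already running. */
    atomic_init(&m.ctl_quiescent, false);
    m.quiesce_around_wait = quiesce_around_wait;
    m.ctl_finished = false;

    rc = pthread_create(&ev, NULL, event_thread, &m);
    assert(rc == 0);
    rc = pthread_create(&ctl, NULL, control_thread, &m);
    assert(rc == 0);
    pthread_join(ctl, NULL);
    pthread_join(ev, NULL);
    return m.ctl_finished;
}
```

In this model, run_scenario(false) always returns false after roughly 500 ms: busy can only drop once the control thread is quiescent, and the control thread only becomes quiescent after giving up. run_scenario(true) should return true almost immediately, because marking the control thread quiescent for the duration of its wait breaks the cycle, which is the idea behind wrapping rte_vhost_driver_unregister() in ovsrcu_quiesce_start()/ovsrcu_quiesce_end().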
Hi Xinxin,

Thanks for the patch. I managed to reproduce this with a little bit of
hacking. Indeed, a deadlock can occur with some unlucky timing.

Acked-by: Kevin Traynor <[email protected]>

> Fixes: afee281 ("netdev-dpdk: Fix dpdk_watchdog failure to quiesce.")
>
> Signed-off-by: Xinxin Zhao <[email protected]>
> ---
>   lib/netdev-dpdk.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 02cef6e45..0c02357f5 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1808,7 +1808,16 @@ dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
>       OVS_EXCLUDED(dpdk_mutex)
>       OVS_EXCLUDED(dev->mutex)
>   {
> -    return rte_vhost_driver_unregister(vhost_id);
> +    int ret;
> +    /* Due to the rcu wait of the vhost-event thread,
> +     * rte_vhost_driver_unregister() may loop endlessly.
> +     * So the unregister action needs to be removed from the rcu_list.
> +     */
> +    ovsrcu_quiesce_start();
> +    ret = rte_vhost_driver_unregister(vhost_id);
> +    ovsrcu_quiesce_end();
> +
> +    return ret;
>   }
>
>   static void

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
