On 08/08/2024 10:57, Xinxin Zhao wrote:
> When the ovs control thread deletes a vhost-user port while
> the vhost-event thread concurrently processes that port going down,
> the main thread may fall into a deadlock.
> 
> E.g., vhostuser port is created as client.
> The ovs control thread executes the following process:
> rte_vhost_driver_unregister->fdset_try_del.
> At the same time, the vhost-event thread executes the following process:
> fdset_event_dispatch->vhost_user_read_cb->destroy_device.
> At this time, vhost-event will wait for rcu scheduling,
> and the ovs control thread is waiting for pfdentry->busy to be 0.
> The two threads are waiting for each other and fall into a deadlock.
> 

Hi Xinxin,

Thanks for the patch. I managed to reproduce this with a little bit of
hacking. Indeed, a deadlock can occur with some unlucky timing.
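
For anyone following along, the wait cycle can be modelled in isolation.
The sketch below is a simplified stand-alone model, not the OVS or DPDK
code: every name in it is hypothetical, `busy` only stands in for
pfdentry->busy, and `ctrl_quiescent` only stands in for the control
thread's RCU state. The wait is bounded so the model reports the
deadlock instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Simplified model only. All names here are hypothetical. */
static atomic_bool busy;            /* Stands in for pfdentry->busy. */
static atomic_bool ctrl_quiescent;  /* Control thread's RCU state. */
static atomic_bool gave_up;         /* Set once the bounded wait ends. */

static void
nap(void)
{
    struct timespec ts = { 0, 1000000 };  /* 1 ms. */
    nanosleep(&ts, NULL);
}

/* Event side: a callback is in progress (busy is set) and it waits for
 * the control thread to become quiescent before it can finish. */
static void *
event_thread(void *arg)
{
    (void) arg;
    while (!atomic_load(&ctrl_quiescent) && !atomic_load(&gave_up)) {
        nap();
    }
    atomic_store(&busy, false);
    return NULL;
}

/* Control side: waits for busy to clear. Returns true if the wait
 * completed, false if it timed out (the modelled deadlock). */
static bool
model_unregister(bool quiesce_first)
{
    pthread_t tid;
    bool ok = false;

    atomic_store(&busy, true);
    atomic_store(&ctrl_quiescent, false);
    atomic_store(&gave_up, false);
    if (pthread_create(&tid, NULL, event_thread, NULL)) {
        return false;
    }

    if (quiesce_first) {
        atomic_store(&ctrl_quiescent, true);   /* Like quiesce start. */
    }
    for (int i = 0; i < 200; i++) {            /* Bounded wait. */
        if (!atomic_load(&busy)) {
            ok = true;
            break;
        }
        nap();
    }
    atomic_store(&gave_up, true);              /* Let the event side exit. */
    pthread_join(tid, NULL);
    if (quiesce_first) {
        atomic_store(&ctrl_quiescent, false);  /* Like quiesce end. */
    }
    return ok;
}
```

In this model, model_unregister(false) times out because each side waits
on the other, while model_unregister(true) completes, which mirrors the
effect of wrapping the unregister call in a quiescent period.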

Acked-by: Kevin Traynor <[email protected]>

> Fixes: afee281 ("netdev-dpdk: Fix dpdk_watchdog failure to quiesce.")
> 
> Signed-off-by: Xinxin Zhao <[email protected]>
> ---
>  lib/netdev-dpdk.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 02cef6e45..0c02357f5 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1808,7 +1808,16 @@ dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
>      OVS_EXCLUDED(dpdk_mutex)
>      OVS_EXCLUDED(dev->mutex)
>  {
> -    return rte_vhost_driver_unregister(vhost_id);
> +    int ret;
> +    /* Due to the rcu wait of the vhost-event thread,
> +     * rte_vhost_driver_unregister() may loop endlessly.
> +     * So the unregister action needs to be removed from the rcu_list.
> +     */
> +    ovsrcu_quiesce_start();
> +    ret = rte_vhost_driver_unregister(vhost_id);
> +    ovsrcu_quiesce_end();
> +
> +    return ret;
>  }
>  
>  static void

