[ovs-dev] OVS+DPDK: deadlock and race condtion, which leads OVS deadlock or crash

王志克 Tue, 23 Jan 2018 02:26:47 -0800

Hi Yuanha and all,

Thanks for review. I change the title and let more people involved.


The issues can be reproduced easily with OVS. 
The steps are below:
1) virsh start vm
2) ovs-vsctl add port
3) delete the port via ovs-vsctl del-port.
4) shutdown VM via vrish destroy.

I run a script to repeatly do such job for 2 VM, and it needs about 1 hour to 
reproduce it. 

However, I found there are more issues (I can reproduce it):

1) see below code
static void
vhost_user_read_cb(int connfd, void *dat, int *remove)
{
        struct vhost_user_connection *conn = dat;
        struct vhost_user_socket *vsocket = conn->vsocket;
        int ret;

        ret = vhost_user_msg_handler(conn->vid, connfd);
        if (ret < 0) {
                close(connfd);
                *remove = 1;
                vhost_destroy_device(conn->vid);

                pthread_mutex_lock(&vsocket->conn_mutex);
                TAILQ_REMOVE(&vsocket->conn_list, conn, next);
                pthread_mutex_unlock(&vsocket->conn_mutex);

                free(conn);
                 -------------------------> [Zhike Wang] we can see that the 
vsocket does not exist in conn_list or reconn_list at this short time slot. So 
in this time, if rte_vhost_driver_unregister() is called, vsocket is freed. So 
below vhost_user_create_client() may use raw pointer, and lead to crash.

                if (vsocket->reconnect)
                        vhost_user_create_client(vsocket);
        }
}

2) regarding the above issue, I think we need one mutex to lock conn_list and 
reconn_list, so the vsocket can in either conn_list or reconn_list, not both or 
none. So I use vhost_user.mutex() 
diff --git a/dpdk-16.11.3/lib/librte_vhost/fd_man.c 
b/dpdk-16.11.3/lib/librte_vhost/fd_man.c
@@ -233,6 +234,7 @@
                poll(pfdset->rwfds, numfds, 1000 /* millisecs */);

                need_shrink = 0;
+                pthread_mutex_lock(&vhost_user.mutex);
                for (i = 0; i < numfds; i++) {
                        pthread_mutex_lock(&pfdset->fd_mutex);

@@ -264,7 +266,10 @@
                                rcb(fd, dat, &remove1);
                        if (wcb && pfd->revents & (POLLOUT | FDPOLLERR))
                                wcb(fd, dat, &remove2);
+
+                       pthread_mutex_lock(&pfdset->fd_mutex);
                        pfdentry->busy = 0;
+                       pthread_mutex_unlock(&pfdset->fd_mutex);
                        /*
                         * fdset_del needs to check busy flag.
                         * We don't allow fdset_del to be called in callback
@@ -285,5 +290,6 @@

                if (need_shrink)
                        fdset_shrink(pfdset);
+                pthread_mutex_unlock(&vhost_user.mutex);


@@ -390,6 +395,8 @@ struct vhost_user_reconnect_list {
        struct vhost_user_reconnect *reconn, *next;

        while (1) {
+
+               pthread_mutex_lock(&vhost_user.mutex);
                pthread_mutex_lock(&reconn_list.mutex);

                /*
@@ -417,11 +424,18 @@ struct vhost_user_reconnect_list {
                                "%s: connected\n", reconn->vsocket->path);
                        vhost_user_add_connection(reconn->fd, reconn->vsocket);
 remove_fd:

                pthread_mutex_unlock(&reconn_list.mutex);
+               pthread_mutex_unlock(&vhost_user.mutex);
                sleep(1);
        }

Then I met a deadlock (another thread enters rte_vhost_driver_unregister, and 
waiting for vhost_user.mutex.) 
(gdb) bt
#0  0x00007febfaf8ebcd in poll () from /lib64/libc.so.6
#1  0x00007fec029b6836 in poll (__timeout=<optimized out>, __nfds=2, 
__fds=0x7febe80008c0) at /usr/include/bits/poll2.h:46
#2  time_poll (pollfds=pollfds@entry=0x7febe80008c0, n_pollfds=2, 
handles=handles@entry=0x0, timeout_when=4098990819, 
elapsed=elapsed@entry=0x7febf7450290) at lib/timeval.c:305
#3  0x00007fec0299afea in poll_block () at lib/poll-loop.c:364
#4  0x00007fec0297da25 in ovsrcu_synchronize () at lib/ovs-rcu.c:231
#5  0x00007fec029e9477 in destroy_device (vid=<optimized out>) at 
lib/netdev-dpdk.c:2651
#6  0x00007fec0195db04 in vhost_destroy_device (vid=<optimized out>) at 
/lib/librte_vhost/vhost.c:277
#7  0x00007fec0195cbdc in vhost_user_read_cb (connfd=<optimized out>, 
dat=0x7febc80008c0, remove=0x7febf7451420) at /lib/librte_vhost/socket.c:269
#8  0x00007fec0195c200 in fdset_event_dispatch 
(pfdset=pfdset@entry=0x7fec01b70180 <vhost_user+8192>) at 
/lib/librte_vhost/fd_man.c:266
#9  0x00007fec0195d647 in rte_vhost_driver_session_start () at 
/lib/librte_vhost/socket.c:714
#10 0x00007fec029e661b in start_vhost_loop (dummy=<optimized out>) at 
lib/netdev-dpdk.c:2736
#11 0x00007fec0297f796 in ovsthread_wrapper (aux_=<optimized out>) at 
lib/ovs-thread.c:348
#12 0x00007febfb9b6dc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007febfaf9921d in clone () from /lib64/libc.so.6
(gdb) n


So now I believe it is hard to fix the issues insides the modules, and I would 
like to present identified issues, and want to hear your proposal or fix.

Br,
Zhike Wang



-----Original Message-----
From: Yuanhan Liu [mailto:y...@fridaylinux.org] 
Sent: Thursday, January 18, 2018 10:04 PM
To: 王志克
Cc: d...@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v3] lib/librte_vhost: move fdset_del out of 
conn_mutex

Hi,

Apologize for late review.

On Tue, Jan 02, 2018 at 02:08:36AM -0800, zhike wang wrote:
> From: wang zhike <wangzh...@jd.com>
> 
> v3:
> * Fix duplicate variable name, which leads to unexpected memory write.
> v2:
> * Move fdset_del before conn destroy.
> * Fix coding style.

Note that we prefer to put the change logs after "---" below Signed-off-by,
so that those change logs won't be tracked in the git log history.

> This patch fixes below race condition:
> 1. one thread calls: rte_vhost_driver_unregister->lock conn_mutex
>    ->fdset_del->loop to check fd.busy.
> 2. another thread calls fdset_event_dispatch, and the busy flag is
>    changed AFTER handling on the fd, i.e, rcb(). However, the rcb,
>    such as vhost_user_read_cb() would try to retrieve the conn_mutex.
> 
> So issue is that the 1st thread will loop check the flag while holding
> the mutex, while the 2nd thread would be blocked by mutex and can not
> change the flag. Then dead lock is observed.

I then would change the title to "vhost: fix deadlock".

I'm also keen to know how do you reproduce this issue with real-life
APP (say ovs) and how easy it is for reproduce.

> Signed-off-by: zhike wang <wangzh...@jd.com>

Again, you need fix your git config file about your name.

> ---
>  lib/librte_vhost/socket.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 422da00..ea01327 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list {
>               struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
>  
>               if (!strcmp(vsocket->path, path)) {
> +                     int del_fds[MAX_FDS];
> +                     int num_of_fds = 0, fd_index;
> +

I think the naming could be a bit shorter, like "fds, nr_fds (or nb_fds),
fd_idx".

>                       if (vsocket->is_server) {
>                               fdset_del(&vhost_user.fdset, 
> vsocket->socket_fd);
>                               close(vsocket->socket_fd);
> @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list {
>                               vhost_user_remove_reconnect(vsocket);
>                       }
>  
> +                     /* fdset_del() must be called without conn_mutex. */
> +                     pthread_mutex_lock(&vsocket->conn_mutex);
> +                     for (conn = TAILQ_FIRST(&vsocket->conn_list);
> +                          conn != NULL;
> +                          conn = next) {
> +                             next = TAILQ_NEXT(conn, next);
> +
> +                             del_fds[num_of_fds++] = conn->connfd;
> +                     }
> +                     pthread_mutex_unlock(&vsocket->conn_mutex);
> +
> +                     for (fd_index = 0; fd_index < num_of_fds; fd_index++)
> +                             fdset_del(&vhost_user.fdset, del_fds[fd_index]);
> +
>                       pthread_mutex_lock(&vsocket->conn_mutex);
>                       for (conn = TAILQ_FIRST(&vsocket->conn_list);
>                            conn != NULL;
>                            conn = next) {
>                               next = TAILQ_NEXT(conn, next);
>  
> -                             fdset_del(&vhost_user.fdset, conn->connfd);

If you log the fd here and invoke fdset_del() and close() after the loop,
you then could avoid one extra loop as you did above.

        --yliu
>                               RTE_LOG(INFO, VHOST_CONFIG,
>                                       "free connfd = %d for device '%s'\n",
>                                       conn->connfd, path);
> -- 
> 1.8.3.1
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] OVS+DPDK: deadlock and race condtion, which leads OVS deadlock or crash

Reply via email to