Re: [PATCH v3] virtio_vsock: Fix race condition in virtio_transport_recv_pkt

2020-05-30 Thread David Miller
From: Jia He 
Date: Sat, 30 May 2020 09:38:28 +0800

> When a client on the host tries to connect(SOCK_STREAM, O_NONBLOCK) to a
> server on the guest, the host panics on a ThunderX2 (ARMv8-A server):
 ...
> The race condition is as follows:
> Task1                            Task2
> =====                            =====
> __sock_release                   virtio_transport_recv_pkt
>   __vsock_release                  vsock_find_bound_socket (found sk)
>     lock_sock_nested
>     vsock_remove_sock
>     sock_orphan
>       sk_set_socket(sk, NULL)
>     sk->sk_shutdown = SHUTDOWN_MASK
>     ...
>     release_sock
>                                    lock_sock
>                                      virtio_transport_recv_connecting
>                                        sk->sk_socket->state (panic!)
> 
> The root cause is that vsock_find_bound_socket() does not take the socket
> lock, so there is a small race window between vsock_find_bound_socket()
> and lock_sock(). If __vsock_release() runs in another task during that
> window, it sets sk->sk_socket to NULL before the receive path dereferences it.
> 
> This fixes it by checking sk->sk_shutdown (suggested by Stefano) after
> lock_sock(), since sk->sk_shutdown is set to SHUTDOWN_MASK under the
> protection of lock_sock_nested().
> 
> Signed-off-by: Jia He 
> Reviewed-by: Stefano Garzarella 

Applied and queued up for -stable, thank you.


RE: [PATCH v3] virtio_vsock: Fix race condition in virtio_transport_recv_pkt()

2020-05-30 Thread Justin He
Hi Markus

> -----Original Message-----
> From: Markus Elfring 
> Sent: Saturday, May 30, 2020 6:41 PM
> To: Justin He ; k...@vger.kernel.org;
> net...@vger.kernel.org; virtualizat...@lists.linux-foundation.org
> Cc: kernel-janit...@vger.kernel.org; linux-kernel@vger.kernel.org;
> sta...@vger.kernel.org; David S. Miller ; Jakub
> Kicinski ; Kaly Xin ; Stefan Hajnoczi
> ; Stefano Garzarella 
> Subject: Re: [PATCH v3] virtio_vsock: Fix race condition in
> virtio_transport_recv_pkt()
>
> > This fixes it by checking sk->sk_shutdown (suggested by Stefano) after
> > lock_sock since sk->sk_shutdown is set to SHUTDOWN_MASK under the
> > protection of lock_sock_nested.
>
> What do you think about a wording variant like the following?
>
>   Thus check the data structure member “sk_shutdown” (suggested by Stefano)
>   after a call of the function “lock_sock” since this field is set to
>   “SHUTDOWN_MASK” under the protection of “lock_sock_nested”.
>
Okay, will update the commit msg.

>
> Would you like to add the tag “Fixes” to the commit message?
Sure.

Thanks


--
Cheers,
Justin (Jia He)


>
> Regards,
> Markus


Re: [PATCH v3] virtio_vsock: Fix race condition in virtio_transport_recv_pkt()

2020-05-30 Thread Markus Elfring
> This fixes it by checking sk->sk_shutdown (suggested by Stefano) after
> lock_sock since sk->sk_shutdown is set to SHUTDOWN_MASK under the
> protection of lock_sock_nested.

What do you think about a wording variant like the following?

  Thus check the data structure member “sk_shutdown” (suggested by Stefano)
  after a call of the function “lock_sock” since this field is set to
  “SHUTDOWN_MASK” under the protection of “lock_sock_nested”.


Would you like to add the tag “Fixes” to the commit message?

Regards,
Markus


[PATCH v3] virtio_vsock: Fix race condition in virtio_transport_recv_pkt

2020-05-29 Thread Jia He
When a client on the host tries to connect(SOCK_STREAM, O_NONBLOCK) to a
server on the guest, the host panics on a ThunderX2 (ARMv8-A server):

[  463.718844] Unable to handle kernel NULL pointer dereference at virtual address
[  463.718848] Mem abort info:
[  463.718849]   ESR = 0x9644
[  463.718852]   EC = 0x25: DABT (current EL), IL = 32 bits
[  463.718853]   SET = 0, FnV = 0
[  463.718854]   EA = 0, S1PTW = 0
[  463.718855] Data abort info:
[  463.718856]   ISV = 0, ISS = 0x0044
[  463.718857]   CM = 0, WnR = 1
[  463.718859] user pgtable: 4k pages, 48-bit VAs, pgdp=008f6f6e9000
[  463.718861] [] pgd=
[  463.718866] Internal error: Oops: 9644 [#1] SMP
[...]
[  463.718977] CPU: 213 PID: 5040 Comm: vhost-5032 Tainted: G   O  5.7.0-rc7+ #139
[  463.718980] Hardware name: GIGABYTE R281-T91-00/MT91-FS1-00, BIOS F06 09/25/2018
[  463.718982] pstate: 6049 (nZCv daif +PAN -UAO)
[  463.718995] pc : virtio_transport_recv_pkt+0x4c8/0xd40 [vmw_vsock_virtio_transport_common]
[  463.718999] lr : virtio_transport_recv_pkt+0x1fc/0xd40 [vmw_vsock_virtio_transport_common]
[  463.719000] sp : 80002dbe3c40
[...]
[  463.719025] Call trace:
[  463.719030]  virtio_transport_recv_pkt+0x4c8/0xd40 [vmw_vsock_virtio_transport_common]
[  463.719034]  vhost_vsock_handle_tx_kick+0x360/0x408 [vhost_vsock]
[  463.719041]  vhost_worker+0x100/0x1a0 [vhost]
[  463.719048]  kthread+0x128/0x130
[  463.719052]  ret_from_fork+0x10/0x18

The race condition is as follows:
Task1                            Task2
=====                            =====
__sock_release                   virtio_transport_recv_pkt
  __vsock_release                  vsock_find_bound_socket (found sk)
    lock_sock_nested
    vsock_remove_sock
    sock_orphan
      sk_set_socket(sk, NULL)
    sk->sk_shutdown = SHUTDOWN_MASK
    ...
    release_sock
                                   lock_sock
                                     virtio_transport_recv_connecting
                                       sk->sk_socket->state (panic!)

The root cause is that vsock_find_bound_socket() does not take the socket
lock, so there is a small race window between vsock_find_bound_socket()
and lock_sock(). If __vsock_release() runs in another task during that
window, it sets sk->sk_socket to NULL before the receive path dereferences it.
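To make the window concrete, here is a minimal userspace sketch in C. It is a model, not kernel code: all model_* names are hypothetical, a pthread mutex stands in for lock_sock()/release_sock(), and an atomic counter stands in for the socket refcount. The lookup pins the object with a reference taken under a table lock, as vsock_find_bound_socket() does with sock_hold(), so the memory stays valid; nothing, however, stops the release path from clearing the socket pointer before the receiver takes the per-socket lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_SHUTDOWN_MASK 3          /* RCV_SHUTDOWN | SEND_SHUTDOWN */

/* Hypothetical userspace stand-ins; these are not kernel types. */
struct model_socket {
	int state;                         /* plays sk->sk_socket->state */
};

struct model_sock {
	pthread_mutex_t lock;              /* plays lock_sock()/release_sock() */
	atomic_int refcnt;                 /* plays sk->sk_refcnt */
	int shutdown;                      /* plays sk->sk_shutdown */
	struct model_socket *socket;       /* plays sk->sk_socket */
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_sock *bound_table; /* one-slot "bound socket table" */

/* Publish a socket: one reference for the owner, one for the table. */
void model_bind(struct model_sock *sk, struct model_socket *socket)
{
	pthread_mutex_init(&sk->lock, NULL);
	atomic_init(&sk->refcnt, 2);
	sk->shutdown = 0;
	sk->socket = socket;
	pthread_mutex_lock(&table_lock);
	bound_table = sk;
	pthread_mutex_unlock(&table_lock);
}

/* Like vsock_find_bound_socket(): take a reference under the table
 * lock, but do NOT take the per-socket lock. */
struct model_sock *model_find_bound(void)
{
	struct model_sock *sk;

	pthread_mutex_lock(&table_lock);
	sk = bound_table;
	if (sk)
		atomic_fetch_add(&sk->refcnt, 1);   /* sock_hold() */
	pthread_mutex_unlock(&table_lock);
	return sk;
}

static void model_put(struct model_sock *sk)
{
	int old = atomic_fetch_sub(&sk->refcnt, 1); /* sock_put() */
	assert(old > 0);
	(void)old;
}

/* Like __vsock_release(): unhash, orphan and mark shutdown, all under
 * the socket lock; then drop the owner's reference. */
void model_release(struct model_sock *sk)
{
	pthread_mutex_lock(&sk->lock);             /* lock_sock_nested() */
	pthread_mutex_lock(&table_lock);           /* vsock_remove_sock() */
	if (bound_table == sk) {
		bound_table = NULL;
		model_put(sk);                     /* table's reference */
	}
	pthread_mutex_unlock(&table_lock);
	sk->socket = NULL;                         /* sock_orphan() */
	sk->shutdown = MODEL_SHUTDOWN_MASK;
	pthread_mutex_unlock(&sk->lock);           /* release_sock() */
	model_put(sk);                             /* owner's reference */
}
```

Calling model_bind(), then model_find_bound(), then model_release() on the same object replays the Task2/Task1/Task2 ordering from the diagram sequentially: the pointer returned by the lookup is still valid (its refcount is 1), but its socket field is already NULL, which is what the connecting handler would go on to dereference.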

This fixes it by checking sk->sk_shutdown (suggested by Stefano) after
lock_sock(), since sk->sk_shutdown is set to SHUTDOWN_MASK under the
protection of lock_sock_nested().
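The shape of the check can be sketched with a similar hypothetical userspace model (model_* names, a pthread mutex for the socket lock, an atomic refcount). Returning -ECONNRESET is an assumption of the sketch; the real code sends a reset with virtio_transport_reset_no_sock() and jumps to free_pkt. Because the model's release path sets the shutdown mask and clears the socket pointer inside one critical section, a receiver that sees shutdown != SHUTDOWN_MASK while holding the lock can safely dereference the socket in this model.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_SHUTDOWN_MASK 3          /* RCV_SHUTDOWN | SEND_SHUTDOWN */

/* Hypothetical userspace stand-ins; these are not kernel types. */
struct model_socket {
	int state;
};

struct model_sock {
	pthread_mutex_t lock;              /* lock_sock()/release_sock() */
	atomic_int refcnt;                 /* includes the lookup's reference */
	int shutdown;                      /* sk->sk_shutdown */
	struct model_socket *socket;       /* sk->sk_socket, NULL once orphaned */
};

void model_sock_init(struct model_sock *sk, struct model_socket *socket,
		     int shutdown, int refs)
{
	pthread_mutex_init(&sk->lock, NULL);
	atomic_init(&sk->refcnt, refs);
	sk->shutdown = shutdown;
	sk->socket = socket;
}

static void model_put(struct model_sock *sk)
{
	int old = atomic_fetch_sub(&sk->refcnt, 1); /* sock_put() */
	assert(old > 0);
	(void)old;
}

/* Receive path after the fix: lock, re-validate, then use sk->socket.
 * Returns -ECONNRESET if the socket was released between lookup and
 * lock, otherwise the state read through sk->socket. Either way the
 * reference taken by the lookup is dropped. */
int model_recv_pkt(struct model_sock *sk)
{
	int ret;

	pthread_mutex_lock(&sk->lock);             /* lock_sock(sk) */

	if (sk->shutdown == MODEL_SHUTDOWN_MASK) {
		/* real code: virtio_transport_reset_no_sock(t, pkt) */
		pthread_mutex_unlock(&sk->lock);   /* release_sock(sk) */
		model_put(sk);                     /* sock_put(sk) */
		return -ECONNRESET;                /* goto free_pkt */
	}

	ret = sk->socket->state;                   /* not orphaned in this model */
	pthread_mutex_unlock(&sk->lock);
	model_put(sk);
	return ret;
}
```

With a live socket the call returns its state; with one whose shutdown field is already SHUTDOWN_MASK (and socket pointer NULL) it returns -ECONNRESET without touching the socket, and in both cases drops the reference the lookup took.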

Signed-off-by: Jia He 
Cc: sta...@vger.kernel.org
Reviewed-by: Stefano Garzarella 
---
v3: - describe the fix of the race condition more clearly
    - refine the commit log

 net/vmw_vsock/virtio_transport_common.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 69efc891885f..0edda1edf988 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1132,6 +1132,14 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 
 	lock_sock(sk);
 
+	/* Check if sk has been released before lock_sock */
+	if (sk->sk_shutdown == SHUTDOWN_MASK) {
+		(void)virtio_transport_reset_no_sock(t, pkt);
+		release_sock(sk);
+		sock_put(sk);
+		goto free_pkt;
+	}
+
 	/* Update CID in case it has changed after a transport reset event */
 	vsk->local_addr.svm_cid = dst.svm_cid;
 
-- 
2.17.1