On Thu, Feb 17, 2022 at 8:50 AM Michael S. Tsirkin <[email protected]> wrote: > > On Thu, Feb 17, 2022 at 03:39:48PM +0800, Jason Wang wrote: > > On Thu, Feb 17, 2022 at 3:36 PM Michael S. Tsirkin <[email protected]> wrote: > > > > > > On Thu, Feb 17, 2022 at 03:34:13PM +0800, Jason Wang wrote: > > > > On Thu, Feb 17, 2022 at 10:01 AM syzbot > > > > <[email protected]> wrote: > > > > > > > > > > Hello, > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > HEAD commit: c5d9ae265b10 Merge tag 'for-linus' of > > > > > git://git.kernel.org.. > > > > > git tree: upstream > > > > > console output: > > > > > https://syzkaller.appspot.com/x/log.txt?x=132e687c700000 > > > > > kernel config: > > > > > https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912 > > > > > dashboard link: > > > > > https://syzkaller.appspot.com/bug?extid=1e3ea63db39f2b4440e0 > > > > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU > > > > > Binutils for Debian) 2.35.2 > > > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the > > > > > commit: > > > > > Reported-by: [email protected] > > > > > > > > > > WARNING: CPU: 1 PID: 10828 at drivers/vhost/vhost.c:715 > > > > > vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715 > > > > > Modules linked in: > > > > > CPU: 0 PID: 10828 Comm: syz-executor.0 Not tainted > > > > > 5.17.0-rc4-syzkaller-00051-gc5d9ae265b10 #0 > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, > > > > > BIOS Google 01/01/2011 > > > > > RIP: 0010:vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715 > > > > > > > > Probably a hint that we are missing a flush. > > > > > > > > Looking at vhost_vsock_stop() that is called by > > > > vhost_vsock_dev_release(): > > > > > > > > static int vhost_vsock_stop(struct vhost_vsock *vsock) > > > > { > > > > size_t i; > > > > int ret; > > > > > > > > mutex_lock(&vsock->dev.mutex); > > > > > > > > ret = vhost_dev_check_owner(&vsock->dev); > > > > if (ret) > > > > goto err; > > > > > > > > Where it could fail so the device is not actually stopped. > > > > > > > > I wonder if this is something related. > > > > > > > > Thanks > > > > > > > > > But then if that is not the owner then no work should be running, right? > > > > Could it be a buggy user space that passes the fd to another process > > and changes the owner just before the mutex_lock() above? > > > > Thanks > > Maybe, but can you be a bit more explicit? what is the set of > conditions you see that can lead to this?
I think the issue could be in the vhost_vsock_stop() as Jason mentioned, but not related to fd passing, but related to the do_exit() function. Looking the stack trace, we are in exit_task_work(), that is called after exit_mm(), so the vhost_dev_check_owner() can fail because current->mm should be NULL at that point. It seems the fput work is queued by fput_many() in a worker queue, and in some cases (maybe a lot of files opened?) the work is still queued when we enter in do_exit(). That said, I don't know if we can simply remove that check in vhost_vsock_stop(), or check if current->mm is NULL, to understand if the process is exiting. Stefano _______________________________________________ Virtualization mailing list [email protected] https://lists.linuxfoundation.org/mailman/listinfo/virtualization
