Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On Thu, Jun 01, 2023 at 11:33:09AM -0500, Mike Christie wrote:
>On 6/1/23 2:47 AM, Stefano Garzarella wrote:
>>>  static void vhost_worker_free(struct vhost_dev *dev)
>>>  {
>>>-	struct vhost_worker *worker = dev->worker;
>>>+	struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);
>>>
>>>-	if (!worker)
>>>+	if (!vtsk)
>>> 		return;
>>>
>>>-	dev->worker = NULL;
>>>-	WARN_ON(!llist_empty(&worker->work_list));
>>>-	vhost_task_stop(worker->vtsk);
>>>-	kfree(worker);
>>>+	vhost_task_stop(vtsk);
>>>+	WARN_ON(!llist_empty(&dev->worker.work_list));
>>>+	WRITE_ONCE(dev->worker.vtsk, NULL);
>>
>> The patch LGTM, I just wonder if we should set dev->worker to zero here,
>
> We might want to just set kcov_handle to zero for now.
>
> In 6.3 and older, I think we could do:
>
> 1. vhost_dev_set_owner could successfully set dev->worker.
> 2. vhost_transport_send_pkt runs vhost_work_queue and sees worker is
> set and adds the vhost_work to the work_list.
> 3. vhost_dev_set_owner fails in vhost_attach_cgroups, so we stop the
> worker before the work can be run and set worker to NULL.
> 4. We clear kcov_handle and return. We leave the work on the work_list.
> 5. Userspace can then retry vhost_dev_set_owner. If that works, then
> the work gets executed ok eventually.
>
> OR
>
> Userspace can just close the device. vhost_vsock_dev_release would
> eventually call vhost_dev_cleanup (vhost_dev_flush won't see a worker
> so will just return), and that will hit the WARN_ON but we would
> proceed ok.
>
> If I do a memset of the worker, then if userspace were to retry
> VHOST_SET_OWNER, we would lose the queued work since the work_list
> would get zero'd. I think it's unlikely this ever happens, but you know
> best so let me know if this is a real issue.

I don't think it's a problem, though, you're right, we could hide the
warning and thus future bugs, better as you proposed.

Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
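[The memset concern above can be sketched in plain userspace C. This is
illustrative only: `struct worker`, `queue()`, and `pending()` are
stand-ins for the kernel's vhost_worker, llist_add(), and
!llist_empty(), not the real API. The point is that zeroing the
embedded worker also zeroes the list head, so anything still queued is
orphaned and a later retry of VHOST_SET_OWNER would never see it.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct work { struct work *next; };

/* Stand-in for the embedded vhost_worker: the list head lives inside it. */
struct worker {
	void *vtsk;              /* stand-in for the vhost_task pointer */
	struct work *work_list;  /* stand-in for the llist head */
};

/* llist_add() analogue: push onto the singly-linked head. */
static void queue(struct worker *w, struct work *item)
{
	item->next = w->work_list;
	w->work_list = item;
}

/* !llist_empty() analogue. */
static int pending(const struct worker *w)
{
	return w->work_list != NULL;
}
```

Zeroing the worker (`memset(&w, 0, sizeof(w))`) between `queue()` and a
retry makes `pending()` return false even though the item was never run,
which is exactly the lost-work scenario Mike describes.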
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On 6/1/23 2:47 AM, Stefano Garzarella wrote:
>>  static void vhost_worker_free(struct vhost_dev *dev)
>>  {
>>-	struct vhost_worker *worker = dev->worker;
>>+	struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);
>>
>>-	if (!worker)
>>+	if (!vtsk)
>> 		return;
>>
>>-	dev->worker = NULL;
>>-	WARN_ON(!llist_empty(&worker->work_list));
>>-	vhost_task_stop(worker->vtsk);
>>-	kfree(worker);
>>+	vhost_task_stop(vtsk);
>>+	WARN_ON(!llist_empty(&dev->worker.work_list));
>>+	WRITE_ONCE(dev->worker.vtsk, NULL);
>
> The patch LGTM, I just wonder if we should set dev->worker to zero here,

We might want to just set kcov_handle to zero for now.

In 6.3 and older, I think we could do:

1. vhost_dev_set_owner could successfully set dev->worker.
2. vhost_transport_send_pkt runs vhost_work_queue and sees worker is set
and adds the vhost_work to the work_list.
3. vhost_dev_set_owner fails in vhost_attach_cgroups, so we stop the
worker before the work can be run and set worker to NULL.
4. We clear kcov_handle and return. We leave the work on the work_list.
5. Userspace can then retry vhost_dev_set_owner. If that works, then the
work gets executed ok eventually.

OR

Userspace can just close the device. vhost_vsock_dev_release would
eventually call vhost_dev_cleanup (vhost_dev_flush won't see a worker so
will just return), and that will hit the WARN_ON but we would proceed ok.

If I do a memset of the worker, then if userspace were to retry
VHOST_SET_OWNER, we would lose the queued work since the work_list would
get zero'd. I think it's unlikely this ever happens, but you know best
so let me know if this is a real issue.
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On Wed, May 31, 2023 at 11:27:12AM -0500, Mike Christie wrote:
>On 5/31/23 10:15 AM, Mike Christie wrote:
>>>> rcu would work for your case and for what Jason had requested.
>>> Yeah, so you already have some patches?
>>>
>>> Do you want to send it to solve this problem?
>>
>> Yeah, I'll break them out and send them later today when I can retest
>> rebased patches.
>
> Just one question. Do you core vhost developers consider RCU more
> complex or switching to READ_ONCE/WRITE_ONCE? I am asking because for
> this immediate regression fix we could just switch to the latter like
> below to just fix the crash if we think that is more simple.
>
> I think RCU is just a little more complex/invasive because it will have
> the extra synchronize_rcu calls.

Yes, you may be right, in this case we should just need
READ_ONCE/WRITE_ONCE if dev->worker is no longer a pointer.

> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a92af08e7864..03fd47a22a73 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -235,7 +235,7 @@ void vhost_dev_flush(struct vhost_dev *dev)
>  {
>  	struct vhost_flush_struct flush;
> 
> -	if (dev->worker) {
> +	if (READ_ONCE(dev->worker.vtsk)) {
>  		init_completion(&flush.wait_event);
>  		vhost_work_init(&flush.work, vhost_flush_work);
> 
> @@ -247,7 +247,9 @@ EXPORT_SYMBOL_GPL(vhost_dev_flush);
> 
>  void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
>  {
> -	if (!dev->worker)
> +	struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);
> +
> +	if (!vtsk)
>  		return;
> 
>  	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
> @@ -255,8 +257,8 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
>  		 * sure it was not in the list.
>  		 * test_and_set_bit() implies a memory barrier.
>  		 */
> -		llist_add(&work->node, &dev->worker->work_list);
> -		wake_up_process(dev->worker->vtsk->task);
> +		llist_add(&work->node, &dev->worker.work_list);
> +		wake_up_process(vtsk->task);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(vhost_work_queue);
> @@ -264,7 +266,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue);
>  /* A lockless hint for busy polling code to exit the loop */
>  bool vhost_has_work(struct vhost_dev *dev)
>  {
> -	return dev->worker && !llist_empty(&dev->worker->work_list);
> +	return !llist_empty(&dev->worker.work_list);
>  }
>  EXPORT_SYMBOL_GPL(vhost_has_work);
> 
> @@ -468,7 +470,7 @@ void vhost_dev_init(struct vhost_dev *dev,
>  	dev->umem = NULL;
>  	dev->iotlb = NULL;
>  	dev->mm = NULL;
> -	dev->worker = NULL;
> +	memset(&dev->worker, 0, sizeof(dev->worker));
>  	dev->iov_limit = iov_limit;
>  	dev->weight = weight;
>  	dev->byte_weight = byte_weight;
> @@ -542,46 +544,38 @@ static void vhost_detach_mm(struct vhost_dev *dev)
> 
>  static void vhost_worker_free(struct vhost_dev *dev)
>  {
> -	struct vhost_worker *worker = dev->worker;
> +	struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);
> 
> -	if (!worker)
> +	if (!vtsk)
>  		return;
> 
> -	dev->worker = NULL;
> -	WARN_ON(!llist_empty(&worker->work_list));
> -	vhost_task_stop(worker->vtsk);
> -	kfree(worker);
> +	vhost_task_stop(vtsk);
> +	WARN_ON(!llist_empty(&dev->worker.work_list));
> +	WRITE_ONCE(dev->worker.vtsk, NULL);

The patch LGTM, I just wonder if we should set dev->worker to zero here,
but maybe we don't need to.

Thanks,
Stefano

>  }
> 
>  static int vhost_worker_create(struct vhost_dev *dev)
>  {
> -	struct vhost_worker *worker;
>  	struct vhost_task *vtsk;
>  	char name[TASK_COMM_LEN];
>  	int ret;
> 
> -	worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
> -	if (!worker)
> -		return -ENOMEM;
> -
> -	dev->worker = worker;
> -	worker->kcov_handle = kcov_common_handle();
> -	init_llist_head(&worker->work_list);
> +	dev->worker.kcov_handle = kcov_common_handle();
> +	init_llist_head(&dev->worker.work_list);
>  	snprintf(name, sizeof(name), "vhost-%d", current->pid);
> 
> -	vtsk = vhost_task_create(vhost_worker, worker, name);
> +	vtsk = vhost_task_create(vhost_worker, &dev->worker, name);
>  	if (!vtsk) {
>  		ret = -ENOMEM;
>  		goto free_worker;
>  	}
> 
> -	worker->vtsk = vtsk;
> +	WRITE_ONCE(dev->worker.vtsk, vtsk);
>  	vhost_task_start(vtsk);
>  	return 0;
> 
>  free_worker:
> -	kfree(worker);
> -	dev->worker = NULL;
> +	WRITE_ONCE(dev->worker.vtsk, NULL);
>  	return ret;
>  }
> 
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 0308638cdeee..305ec8593d46 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -154,7 +154,7 @@ struct vhost_dev {
>  	struct vhost_virtqueue **vqs;
>  	int nvqs;
>  	struct eventfd_ctx *log_ctx;
> -	struct vhost_worker *worker;
> +	struct vhost_worker worker;
>  	struct vhost_iotlb *umem;
>  	struct vhost_iotlb *iotlb;
>  	spinlock_t iotlb_lock;
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On 5/31/23 10:15 AM, Mike Christie wrote:
>>> rcu would work for your case and for what Jason had requested.
>> Yeah, so you already have some patches?
>>
>> Do you want to send it to solve this problem?
>
> Yeah, I'll break them out and send them later today when I can retest
> rebased patches.

Just one question. Do you core vhost developers consider RCU more
complex or switching to READ_ONCE/WRITE_ONCE? I am asking because for
this immediate regression fix we could just switch to the latter like
below to just fix the crash if we think that is more simple.

I think RCU is just a little more complex/invasive because it will have
the extra synchronize_rcu calls.

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a92af08e7864..03fd47a22a73 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -235,7 +235,7 @@ void vhost_dev_flush(struct vhost_dev *dev)
 {
 	struct vhost_flush_struct flush;
 
-	if (dev->worker) {
+	if (READ_ONCE(dev->worker.vtsk)) {
 		init_completion(&flush.wait_event);
 		vhost_work_init(&flush.work, vhost_flush_work);
 
@@ -247,7 +247,9 @@ EXPORT_SYMBOL_GPL(vhost_dev_flush);
 
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 {
-	if (!dev->worker)
+	struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);
+
+	if (!vtsk)
 		return;
 
 	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
@@ -255,8 +257,8 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 		 * sure it was not in the list.
 		 * test_and_set_bit() implies a memory barrier.
 		 */
-		llist_add(&work->node, &dev->worker->work_list);
-		wake_up_process(dev->worker->vtsk->task);
+		llist_add(&work->node, &dev->worker.work_list);
+		wake_up_process(vtsk->task);
 	}
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);
@@ -264,7 +266,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue);
 /* A lockless hint for busy polling code to exit the loop */
 bool vhost_has_work(struct vhost_dev *dev)
 {
-	return dev->worker && !llist_empty(&dev->worker->work_list);
+	return !llist_empty(&dev->worker.work_list);
 }
 EXPORT_SYMBOL_GPL(vhost_has_work);
 
@@ -468,7 +470,7 @@ void vhost_dev_init(struct vhost_dev *dev,
 	dev->umem = NULL;
 	dev->iotlb = NULL;
 	dev->mm = NULL;
-	dev->worker = NULL;
+	memset(&dev->worker, 0, sizeof(dev->worker));
 	dev->iov_limit = iov_limit;
 	dev->weight = weight;
 	dev->byte_weight = byte_weight;
@@ -542,46 +544,38 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 
 static void vhost_worker_free(struct vhost_dev *dev)
 {
-	struct vhost_worker *worker = dev->worker;
+	struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);
 
-	if (!worker)
+	if (!vtsk)
 		return;
 
-	dev->worker = NULL;
-	WARN_ON(!llist_empty(&worker->work_list));
-	vhost_task_stop(worker->vtsk);
-	kfree(worker);
+	vhost_task_stop(vtsk);
+	WARN_ON(!llist_empty(&dev->worker.work_list));
+	WRITE_ONCE(dev->worker.vtsk, NULL);
 }
 
 static int vhost_worker_create(struct vhost_dev *dev)
 {
-	struct vhost_worker *worker;
 	struct vhost_task *vtsk;
 	char name[TASK_COMM_LEN];
 	int ret;
 
-	worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
-	if (!worker)
-		return -ENOMEM;
-
-	dev->worker = worker;
-	worker->kcov_handle = kcov_common_handle();
-	init_llist_head(&worker->work_list);
+	dev->worker.kcov_handle = kcov_common_handle();
+	init_llist_head(&dev->worker.work_list);
 	snprintf(name, sizeof(name), "vhost-%d", current->pid);
 
-	vtsk = vhost_task_create(vhost_worker, worker, name);
+	vtsk = vhost_task_create(vhost_worker, &dev->worker, name);
 	if (!vtsk) {
 		ret = -ENOMEM;
 		goto free_worker;
 	}
 
-	worker->vtsk = vtsk;
+	WRITE_ONCE(dev->worker.vtsk, vtsk);
 	vhost_task_start(vtsk);
 	return 0;
 
 free_worker:
-	kfree(worker);
-	dev->worker = NULL;
+	WRITE_ONCE(dev->worker.vtsk, NULL);
 	return ret;
 }
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 0308638cdeee..305ec8593d46 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -154,7 +154,7 @@ struct vhost_dev {
 	struct vhost_virtqueue **vqs;
 	int nvqs;
 	struct eventfd_ctx *log_ctx;
-	struct vhost_worker *worker;
+	struct vhost_worker worker;
 	struct vhost_iotlb *umem;
 	struct vhost_iotlb *iotlb;
 	spinlock_t iotlb_lock;
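[The core idea of the patch above — embed the worker in the device so it
is never freed, and use the vtsk pointer itself as the "worker exists"
flag, read once per call — can be sketched in userspace C11. This is an
illustration, not the kernel code: READ_ONCE/WRITE_ONCE are approximated
with relaxed atomics, and `work_count` stands in for the real work_list.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct vhost_task { int dummy; };

/* Embedded in the device, so its lifetime matches the device's. */
struct vhost_worker {
	_Atomic(struct vhost_task *) vtsk; /* NULL => no worker running */
	int work_count;                    /* stand-in for the work_list */
};

struct vhost_dev {
	struct vhost_worker worker; /* embedded: no kzalloc/kfree, no UAF */
};

/* vhost_work_queue() analogue: load the pointer once, then only use the
 * local copy, so a concurrent worker_free() cannot yank it mid-call. */
static int work_queue(struct vhost_dev *dev)
{
	struct vhost_task *vtsk =
		atomic_load_explicit(&dev->worker.vtsk, memory_order_relaxed);

	if (!vtsk)
		return 0;             /* no worker yet: drop the work */
	dev->worker.work_count++;     /* "queue" onto the embedded list */
	return 1;
}

static struct vhost_task the_task;

static void worker_create(struct vhost_dev *dev)
{
	/* publish the task pointer last, as in the patch */
	atomic_store_explicit(&dev->worker.vtsk, &the_task,
			      memory_order_relaxed);
}

static void worker_free(struct vhost_dev *dev)
{
	/* clear the pointer instead of freeing anything */
	atomic_store_explicit(&dev->worker.vtsk, NULL, memory_order_relaxed);
}
```

Because nothing is freed, a racing `work_queue()` can at worst see a
stale NULL/non-NULL pointer; it can no longer dereference freed memory,
which is what the syzbot GPF was.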
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On 5/31/23 2:27 AM, Stefano Garzarella wrote:
> On Tue, May 30, 2023 at 6:30 PM wrote:
>>
>> On 5/30/23 11:17 AM, Stefano Garzarella wrote:
>>> On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote:
>>>> On 5/30/23 11:00 AM, Stefano Garzarella wrote:
>>>>> I think it is partially related to commit 6e890c5d5021 ("vhost: use
>>>>> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost:
>>>>> move worker thread fields to new struct"). Maybe those commits just
>>>>> highlighted the issue and it was already existing.
>>>>
>>>> See my mail about the crash. Agree with your analysis about
>>>> worker->vtsk not being set yet. It's a bug from my commit where I
>>>> should have not set it so early or I should be checking for
>>>>
>>>> if (dev->worker && worker->vtsk)
>>>>
>>>> instead of
>>>>
>>>> if (dev->worker)
>>>
>>> Yes, though, in my opinion the problem may persist depending on how
>>> the instructions are reordered.
>>
>> Ah ok.
>>
>>> Should we protect dev->worker() with an RCU to be safe?
>>
>> For those multiple worker patchsets Jason had asked me about supporting
>> where we don't have a worker while we are swapping workers around. To
>> do that I had added rcu around the dev->worker. I removed it in later
>> patchsets because I didn't think anyone would use it.
>>
>> rcu would work for your case and for what Jason had requested.
>
> Yeah, so you already have some patches?
>
> Do you want to send it to solve this problem?

Yeah, I'll break them out and send them later today when I can retest
rebased patches.
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On Tue, May 30, 2023 at 6:30 PM wrote:
>
> On 5/30/23 11:17 AM, Stefano Garzarella wrote:
>> On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote:
>>> On 5/30/23 11:00 AM, Stefano Garzarella wrote:
>>>> I think it is partially related to commit 6e890c5d5021 ("vhost: use
>>>> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost:
>>>> move worker thread fields to new struct"). Maybe those commits just
>>>> highlighted the issue and it was already existing.
>>>
>>> See my mail about the crash. Agree with your analysis about
>>> worker->vtsk not being set yet. It's a bug from my commit where I
>>> should have not set it so early or I should be checking for
>>>
>>> if (dev->worker && worker->vtsk)
>>>
>>> instead of
>>>
>>> if (dev->worker)
>>
>> Yes, though, in my opinion the problem may persist depending on how
>> the instructions are reordered.
>
> Ah ok.
>
>> Should we protect dev->worker() with an RCU to be safe?
>
> For those multiple worker patchsets Jason had asked me about supporting
> where we don't have a worker while we are swapping workers around. To
> do that I had added rcu around the dev->worker. I removed it in later
> patchsets because I didn't think anyone would use it.
>
> rcu would work for your case and for what Jason had requested.

Yeah, so you already have some patches?

Do you want to send it to solve this problem?

Thanks,
Stefano
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On 5/30/23 11:17 AM, Stefano Garzarella wrote:
> On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote:
>> On 5/30/23 11:00 AM, Stefano Garzarella wrote:
>>> I think it is partially related to commit 6e890c5d5021 ("vhost: use
>>> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost:
>>> move worker thread fields to new struct"). Maybe those commits just
>>> highlighted the issue and it was already existing.
>>
>> See my mail about the crash. Agree with your analysis about
>> worker->vtsk not being set yet. It's a bug from my commit where I
>> should have not set it so early or I should be checking for
>>
>> if (dev->worker && worker->vtsk)
>>
>> instead of
>>
>> if (dev->worker)
>
> Yes, though, in my opinion the problem may persist depending on how the
> instructions are reordered.

Ah ok.

> Should we protect dev->worker() with an RCU to be safe?

For those multiple worker patchsets Jason had asked me about supporting
where we don't have a worker while we are swapping workers around. To do
that I had added rcu around the dev->worker. I removed it in later
patchsets because I didn't think anyone would use it.

rcu would work for your case and for what Jason had requested.
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote:
>On 5/30/23 11:00 AM, Stefano Garzarella wrote:
>>I think it is partially related to commit 6e890c5d5021 ("vhost: use
>>vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move
>>worker thread fields to new struct"). Maybe those commits just
>>highlighted the issue and it was already existing.
>
>See my mail about the crash. Agree with your analysis about worker->vtsk
>not being set yet. It's a bug from my commit where I should have not set
>it so early or I should be checking for
>
>if (dev->worker && worker->vtsk)
>
>instead of
>
>if (dev->worker)

Yes, though, in my opinion the problem may persist depending on how the
instructions are reordered.

Should we protect dev->worker() with an RCU to be safe?

>One question about the behavior before my commit though and what we want
>in the end going forward. Before that patch we would just drop work if
>vhost_work_queue was called before VHOST_SET_OWNER. Was that
>correct/expected?

I think so, since we ask the guest to call VHOST_SET_OWNER, before any
other command.

>The call to vhost_work_queue in vhost_vsock_start was only seeing the
>works queued after VHOST_SET_OWNER. Did you want works queued before
>that?

Yes, for example if an application in the host has tried to connect and
is waiting for a timeout, we already have work queued up to flush as
soon as we start the device. (See commit 0b841030625c ("vhost: vsock:
kick send_pkt worker once device is started")).

Thanks,
Stefano
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On Tue, May 30, 2023 at 11:01:11AM -0500, Mike Christie wrote:
>On 5/30/23 10:58 AM, Mike Christie wrote:
>>On 5/30/23 8:44 AM, Stefano Garzarella wrote:
>>>From a first glance, it looks like an issue when we call
>>>vhost_work_queue(). @Mike, does that ring any bells since you recently
>>>looked at that code?
>>
>>I see the bug. Needed to have set the dev->worker after setting
>>worker->vtsk

Yes, I came to the same conclusion (see my email sent at the same
time :-).

>>like below:
>>
>>diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>index a92af08e7864..7bd95984a501 100644
>>--- a/drivers/vhost/vhost.c
>>+++ b/drivers/vhost/vhost.c
>>@@ -564,7 +564,6 @@ static int vhost_worker_create(struct vhost_dev *dev)
>> 	if (!worker)
>> 		return -ENOMEM;
>>
>>-	dev->worker = worker;
>> 	worker->kcov_handle = kcov_common_handle();
>> 	init_llist_head(&worker->work_list);
>> 	snprintf(name, sizeof(name), "vhost-%d", current->pid);
>>@@ -576,6 +575,7 @@ static int vhost_worker_create(struct vhost_dev *dev)
>> 	}
>>
>> 	worker->vtsk = vtsk;
>
>Shoot, oh wait, I think I needed a smp_wmb to always make sure
>worker->vtsk is set before dev->worker or vhost_work_queue could still
>end up seeing dev->worker set before worker->vtsk right?

But should we pair smp_wmb() with an smp_rmb() wherever we check
dev->worker?

Thanks,
Stefano
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On 5/30/23 11:00 AM, Stefano Garzarella wrote:
> I think it is partially related to commit 6e890c5d5021 ("vhost: use
> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move
> worker thread fields to new struct"). Maybe those commits just
> highlighted the issue and it was already existing.

See my mail about the crash. Agree with your analysis about worker->vtsk
not being set yet. It's a bug from my commit where I should have not set
it so early or I should be checking for

if (dev->worker && worker->vtsk)

instead of

if (dev->worker)

One question about the behavior before my commit though and what we want
in the end going forward. Before that patch we would just drop work if
vhost_work_queue was called before VHOST_SET_OWNER. Was that
correct/expected?

The call to vhost_work_queue in vhost_vsock_start was only seeing the
works queued after VHOST_SET_OWNER. Did you want works queued before
that?
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On 5/30/23 10:58 AM, Mike Christie wrote:
> On 5/30/23 8:44 AM, Stefano Garzarella wrote:
>>
>> From a first glance, it looks like an issue when we call
>> vhost_work_queue(). @Mike, does that ring any bells since you recently
>> looked at that code?
>
> I see the bug. Needed to have set the dev->worker after setting
> worker->vtsk like below:
>
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a92af08e7864..7bd95984a501 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -564,7 +564,6 @@ static int vhost_worker_create(struct vhost_dev *dev)
>  	if (!worker)
>  		return -ENOMEM;
>
> -	dev->worker = worker;
>  	worker->kcov_handle = kcov_common_handle();
>  	init_llist_head(&worker->work_list);
>  	snprintf(name, sizeof(name), "vhost-%d", current->pid);
> @@ -576,6 +575,7 @@ static int vhost_worker_create(struct vhost_dev *dev)
>  	}
>
>  	worker->vtsk = vtsk;

Shoot, oh wait, I think I needed a smp_wmb to always make sure
worker->vtsk is set before dev->worker or vhost_work_queue could still
end up seeing dev->worker set before worker->vtsk right?

> +	dev->worker = worker;
>  	vhost_task_start(vtsk);
>  	return 0;
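[The ordering question in this message — publish the worker pointer only
after its fields are initialized, and make sure the reader cannot observe
them out of order — can be sketched in userspace C11. This is an
illustration under assumptions, not the vhost code: the release store
plays the role of smp_wmb() before the dev->worker store, and the acquire
load plays the role of READ_ONCE() plus the smp_rmb() pairing Stefano
asks about. All struct and function names here are made up. Note a
single-threaded test can only check the logic, not the race itself.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct task { int started; };

struct worker {
	struct task *vtsk;
};

struct dev {
	_Atomic(struct worker *) worker; /* NULL until published */
};

/* Writer side: fully initialize the worker, then publish the pointer
 * last with release semantics (~ smp_wmb() + WRITE_ONCE()). */
static void publish_worker(struct dev *d, struct worker *w, struct task *t)
{
	w->vtsk = t; /* must be globally visible before d->worker is */
	atomic_store_explicit(&d->worker, w, memory_order_release);
}

/* Reader side: the acquire load (~ READ_ONCE() + smp_rmb()) guarantees
 * that if we see the worker pointer, we also see w->vtsk set, so the
 * "dev->worker set but worker->vtsk still NULL" window cannot happen. */
static struct task *queue_work(struct dev *d)
{
	struct worker *w =
		atomic_load_explicit(&d->worker, memory_order_acquire);

	if (!w)
		return NULL; /* no worker published yet */
	return w->vtsk;      /* never NULL once w is visible */
}
```

With only plain stores (or a release store paired with a plain read and
no acquire/smp_rmb), the reader on a weakly ordered CPU could observe
`d->worker` before `w->vtsk`, which is exactly the reordering concern
raised above.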
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On Tue, May 30, 2023 at 3:44 PM Stefano Garzarella wrote: > > On Tue, May 30, 2023 at 1:24 PM Michael S. Tsirkin wrote: > > > > On Tue, May 30, 2023 at 12:30:06AM -0700, syzbot wrote: > > > Hello, > > > > > > syzbot found the following issue on: > > > > > > HEAD commit:933174ae28ba Merge tag 'spi-fix-v6.4-rc3' of > > > git://git.ker.. > > > git tree: upstream > > > console output: https://syzkaller.appspot.com/x/log.txt?x=138d4ae528 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=f389ffdf4e9ba3f0 > > > dashboard link: > > > https://syzkaller.appspot.com/bug?extid=d0d442c22fa8db45ff0e > > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU > > > Binutils for Debian) 2.35.2 > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > Downloadable assets: > > > disk image: > > > https://storage.googleapis.com/syzbot-assets/21a81b8c2660/disk-933174ae.raw.xz > > > vmlinux: > > > https://storage.googleapis.com/syzbot-assets/b4951d89e238/vmlinux-933174ae.xz > > > kernel image: > > > https://storage.googleapis.com/syzbot-assets/21eb405303cc/bzImage-933174ae.xz > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the > > > commit: > > > Reported-by: syzbot+d0d442c22fa8db45f...@syzkaller.appspotmail.com > > > > > > general protection fault, probably for non-canonical address > > > 0xdc0e: [#1] PREEMPT SMP KASAN > > > KASAN: null-ptr-deref in range [0x0070-0x0077] > > > CPU: 0 PID: 29845 Comm: syz-executor.4 Not tainted > > > 6.4.0-rc3-syzkaller-00032-g933174ae28ba #0 > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > > Google 05/16/2023 > > > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline] > > > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248 > > > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 > > > 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 > > > 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 
5d 41 5c 41 5d e9 > > > RSP: 0018:c9000333faf8 EFLAGS: 00010202 > > > RAX: dc00 RBX: RCX: c9000d84d000 > > > RDX: 000e RSI: 841221d7 RDI: 0070 > > > RBP: 88804b6b95b0 R08: 0001 R09: > > > R10: 0001 R11: R12: 88804b6b00b0 > > > R13: R14: 88804b6b95e0 R15: 88804b6b95c8 > > > FS: 7f3b445ec700() GS:8880b980() > > > knlGS: > > > CS: 0010 DS: ES: CR0: 80050033 > > > CR2: 001b2e423000 CR3: 5d734000 CR4: 003506f0 > > > DR0: DR1: DR2: > > > DR3: 003b DR6: 0ff0 DR7: 0400 > > > Call Trace: > > > > > > vhost_transport_send_pkt+0x268/0x520 drivers/vhost/vsock.c:288 > > > virtio_transport_send_pkt_info+0x54c/0x820 > > > net/vmw_vsock/virtio_transport_common.c:250 > > > virtio_transport_connect+0xb1/0xf0 > > > net/vmw_vsock/virtio_transport_common.c:813 > > > vsock_connect+0x37f/0xcd0 net/vmw_vsock/af_vsock.c:1414 > > > __sys_connect_file+0x153/0x1a0 net/socket.c:2003 > > > __sys_connect+0x165/0x1a0 net/socket.c:2020 > > > __do_sys_connect net/socket.c:2030 [inline] > > > __se_sys_connect net/socket.c:2027 [inline] > > > __x64_sys_connect+0x73/0xb0 net/socket.c:2027 > > > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > > > do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > > RIP: 0033:0x7f3b4388c169 > > > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 > > > f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 > > > ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 > > > RSP: 002b:7f3b445ec168 EFLAGS: 0246 ORIG_RAX: 002a > > > RAX: ffda RBX: 7f3b439ac050 RCX: 7f3b4388c169 > > > RDX: 0010 RSI: 2140 RDI: 0004 > > > RBP: 7f3b438e7ca1 R08: R09: > > > R10: R11: 0246 R12: > > > R13: 7f3b43acfb1f R14: 7f3b445ec300 R15: 00022000 > > > > > > Modules linked in: > > > ---[ end trace ]--- > > > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline] > > > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248 > > > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 
48 b8 00 00 > > > 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 > > > 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9 > > > RSP: 0018:c9000333faf8 EFLAGS: 00010202 > > > RAX: dc00 RBX: RCX: c9000d84d000 > > > RDX: 000e RSI: 841221d7 RDI: 0070 > > > RBP: 88804b6b95b0 R08:
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On 5/30/23 8:44 AM, Stefano Garzarella wrote:
>
> From a first glance, it looks like an issue when we call
> vhost_work_queue(). @Mike, does that ring any bells since you recently
> looked at that code?

I see the bug. Needed to have set the dev->worker after setting
worker->vtsk like below:


diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a92af08e7864..7bd95984a501 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -564,7 +564,6 @@ static int vhost_worker_create(struct vhost_dev *dev)
 	if (!worker)
 		return -ENOMEM;
 
-	dev->worker = worker;
 	worker->kcov_handle = kcov_common_handle();
 	init_llist_head(&worker->work_list);
 	snprintf(name, sizeof(name), "vhost-%d", current->pid);
@@ -576,6 +575,7 @@ static int vhost_worker_create(struct vhost_dev *dev)
 	}
 
 	worker->vtsk = vtsk;
+	dev->worker = worker;
 	vhost_task_start(vtsk);
 	return 0;
Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue
On Tue, May 30, 2023 at 1:24 PM Michael S. Tsirkin wrote:
> > On Tue, May 30, 2023 at 12:30:06AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    933174ae28ba Merge tag 'spi-fix-v6.4-rc3' of git://git.ker..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=138d4ae528
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=f389ffdf4e9ba3f0
> > dashboard link: https://syzkaller.appspot.com/bug?extid=d0d442c22fa8db45ff0e
> > compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/21a81b8c2660/disk-933174ae.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/b4951d89e238/vmlinux-933174ae.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/21eb405303cc/bzImage-933174ae.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+d0d442c22fa8db45f...@syzkaller.appspotmail.com
> >
> > general protection fault, probably for non-canonical address 0xdc0e: [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0070-0x0077]
> > CPU: 0 PID: 29845 Comm: syz-executor.4 Not tainted 6.4.0-rc3-syzkaller-00032-g933174ae28ba #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023
> > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline]
> > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248
> > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9
> > RSP: 0018:c9000333faf8 EFLAGS: 00010202
> > RAX: dc00 RBX:  RCX: c9000d84d000
> > RDX: 000e RSI: 841221d7 RDI: 0070
> > RBP: 88804b6b95b0 R08: 0001 R09: 
> > R10: 0001 R11:  R12: 88804b6b00b0
> > R13:  R14: 88804b6b95e0 R15: 88804b6b95c8
> > FS:  7f3b445ec700() GS:8880b980() knlGS: 
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 001b2e423000 CR3: 5d734000 CR4: 003506f0
> > DR0:  DR1:  DR2: 
> > DR3: 003b DR6: 0ff0 DR7: 0400
> > Call Trace:
> >
> > vhost_transport_send_pkt+0x268/0x520 drivers/vhost/vsock.c:288
> > virtio_transport_send_pkt_info+0x54c/0x820 net/vmw_vsock/virtio_transport_common.c:250
> > virtio_transport_connect+0xb1/0xf0 net/vmw_vsock/virtio_transport_common.c:813
> > vsock_connect+0x37f/0xcd0 net/vmw_vsock/af_vsock.c:1414
> > __sys_connect_file+0x153/0x1a0 net/socket.c:2003
> > __sys_connect+0x165/0x1a0 net/socket.c:2020
> > __do_sys_connect net/socket.c:2030 [inline]
> > __se_sys_connect net/socket.c:2027 [inline]
> > __x64_sys_connect+0x73/0xb0 net/socket.c:2027
> > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
> > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > RIP: 0033:0x7f3b4388c169
> > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:7f3b445ec168 EFLAGS: 0246 ORIG_RAX: 002a
> > RAX: ffda RBX: 7f3b439ac050 RCX: 7f3b4388c169
> > RDX: 0010 RSI: 2140 RDI: 0004
> > RBP: 7f3b438e7ca1 R08:  R09: 
> > R10:  R11: 0246 R12: 
> > R13: 7f3b43acfb1f R14: 7f3b445ec300 R15: 00022000
> >
> > Modules linked in:
> > ---[ end trace ]---
> > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline]
> > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248
> > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9
> > RSP: 0018:c9000333faf8 EFLAGS: 00010202
> > RAX: dc00 RBX:  RCX: c9000d84d000
> > RDX: 000e RSI: 841221d7 RDI: 0070
> > RBP: 88804b6b95b0 R08: 0001 R09: 
> > R10: 0001 R11:  R12: 88804b6b00b0
> > R13:  R14: 88804b6b95e0 R15: 88804b6b95c8
> > FS:  7f3b445ec700() GS:8880b990() knlGS: 
> > CS:  0010