Thanks Jason. I've posted the formal patch for this change. Regards, Gautam
On Mon, 22 Feb 2021 at 14:15, Jason Wang <[email protected]> wrote: > > On 2021/2/18 11:15 下午, Gautam Dawar wrote: > > Hi Jason, > > Thanks for your response. > > On Thu, 18 Feb 2021 at 14:18, Jason Wang <[email protected]> wrote: > >> Hi Gautam: >> On 2021/2/15 9:01 下午, Gautam Dawar wrote: >> >> Hi Jason/Michael, >> >> >> >> I observed a kernel panic while testing vhost-vdpa with Xilinx adapters. >> Here are the details for your review: >> >> >> >> Problem statement: >> >> When qemu with vhost-vdpa netdevice is run for the first time, it works >> well. But after the VM is powered off, next qemu run causes kernel panic >> due to a NULL pointer dereference in irq_bypass_register_producer(). >> >> >> >> Root cause analysis: >> >> When the VM is powered off, vhost_dev_stop() is invoked which in turn >> calls vhost_vdpa_reset_device() causing the irq_bypass producers to be >> unregistered. >> >> >> >> On the next run, when qemu opens the vhost device, the vhost_vdpa_open() >> file operation calls vhost_dev_init(). Here, call_ctx->producer memory is >> cleared in vhost_vring_call_reset(). >> >> >> >> Further, when the virtqueues are initialized by vhost_virtqueue_init(), >> vhost_vdpa_setup_vq_irq() again registers the irq_bypass producer for each >> virtqueue. As the node member of struct irq_bypass_producer is also >> initialized to zero, traversal on the producers list causes crash due to >> NULL pointer dereference. >> >> >> Thanks a lot for reporting this issue. >> >> >> >> >> Fix details: >> >> >> >> I think that this issue can be fixed by invoking >> vhost_vdpa_setup_vq_irq() only when vhost_vdpa_set_status() includes >> VIRTIO_CONFIG_S_DRIVER_OK in the new status value. This way, there won’t be >> any stale nodes in the irqbypass module’s producers list which are reset >> in vhost_vring_call_reset(). >> >> >> >> Patch: >> >> >> >> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index >> 62a9bb0efc55..fdad94e2fbf9 100644 >> >> --- a/drivers/vhost/vdpa.c >> >> +++ b/drivers/vhost/vdpa.c >> >> @@ -409,7 +409,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa >> *v, unsigned int cmd, >> >> cb.private = NULL; >> >> } >> >> ops->set_vq_cb(vdpa, idx, &cb); >> >> - vhost_vdpa_setup_vq_irq(v, idx); >> >> break; >> >> >> >> case VHOST_SET_VRING_NUM: >> >> >> >> We can also track this issue in Bugzilla ticket 21171 ( >> https://bugzilla.kernel.org/show_bug.cgi?id=211711) and the complete >> patch is attached with this email. >> >> >> So vhost supports to remove or switch eventfd through >> vhost_vdpa_vring_ioctl(). So if userspace want to switch to another >> eventfd, we should re-do the register and unregister. >> > GD>> This makes sense. I missed the use case where userspace may want to > switch to a different eventfd. > > > This can happen when interrupt needs to be disabled for some reason (e.g > MSI-X is masked). > > > > I think we need to deal this issue in another way. Can we check whether or >> not the producer is initialized before? >> >> Thanks >> > GD>> Initialization path is fine but the actual problem lies in the > clean-up part. > I think the following check is the cause of this issue: > > static void vhost_vdpa_clean_irq(struct vhost_vdpa *v) > if (vq->call_ctx.producer.irq) > > irq_bypass_unregister_producer(&vq->call_ctx.producer); > > The above if condition will prevent the de-initialization of the producer > nodes corresponding to irq 0 but irq_bypass_unregister_producer() should > be called for all valid irq values including zero. > > Accordingly, following patch is required to fix this issue: > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > index 62a9bb0efc55..d1c3a33c6239 100644 > --- a/drivers/vhost/vdpa.c > +++ b/drivers/vhost/vdpa.c > @@ -849,7 +849,7 @@ static void vhost_vdpa_clean_irq(struct vhost_vdpa *v) > > for (i = 0; i < v->nvqs; i++) { > vq = &v->vqs[i]; > - if (vq->call_ctx.producer.irq) > + if (vq->call_ctx.producer.irq >= 0) > > irq_bypass_unregister_producer(&vq->call_ctx.producer); > } > } > > > It should work, please post a formal patch for this. > > I will give more thought in the meanwhile since I spot some other defects > on codes for irqbyass usage in vdpa. > > Thanks > > > > The revised patch (bug211711_fix.patch) is also attached with this email. > >> >> >> Regards, >> >> Gautam Dawar >> >>
_______________________________________________ Virtualization mailing list [email protected] https://lists.linuxfoundation.org/mailman/listinfo/virtualization
