Hi Jason, Thanks for your response.
On Thu, 18 Feb 2021 at 14:18, Jason Wang <[email protected]> wrote: > Hi Gautam: > On 2021/2/15 9:01 下午, Gautam Dawar wrote: > > Hi Jason/Michael, > > > > I observed a kernel panic while testing vhost-vdpa with Xilinx adapters. > Here are the details for your review: > > > > Problem statement: > > When qemu with vhost-vdpa netdevice is run for the first time, it works > well. But after the VM is powered off, next qemu run causes kernel panic > due to a NULL pointer dereference in irq_bypass_register_producer(). > > > > Root cause analysis: > > When the VM is powered off, vhost_dev_stop() is invoked which in turn > calls vhost_vdpa_reset_device() causing the irq_bypass producers to be > unregistered. > > > > On the next run, when qemu opens the vhost device, the vhost_vdpa_open() > file operation calls vhost_dev_init(). Here, call_ctx->producer memory is > cleared in vhost_vring_call_reset(). > > > > Further, when the virtqueues are initialized by vhost_virtqueue_init(), > vhost_vdpa_setup_vq_irq() again registers the irq_bypass producer for each > virtqueue. As the node member of struct irq_bypass_producer is also > initialized to zero, traversal on the producers list causes crash due to > NULL pointer dereference. > > > Thanks a lot for reporting this issue. > > > > > Fix details: > > > > I think that this issue can be fixed by invoking vhost_vdpa_setup_vq_irq() > only when vhost_vdpa_set_status() includes VIRTIO_CONFIG_S_DRIVER_OK in the > new status value. This way, there won’t be any stale nodes in the > irqbypass module’s producers list which are reset in > vhost_vring_call_reset(). > > > > Patch: > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index > 62a9bb0efc55..fdad94e2fbf9 100644 > > --- a/drivers/vhost/vdpa.c > > +++ b/drivers/vhost/vdpa.c > > @@ -409,7 +409,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa > *v, unsigned int cmd, > > cb.private = NULL; > > } > > ops->set_vq_cb(vdpa, idx, &cb); > > - vhost_vdpa_setup_vq_irq(v, idx); > > break; > > > > case VHOST_SET_VRING_NUM: > > > > We can also track this issue in Bugzilla ticket 21171 ( > https://bugzilla.kernel.org/show_bug.cgi?id=211711) and the complete > patch is attached with this email. > > > So vhost supports to remove or switch eventfd through > vhost_vdpa_vring_ioctl(). So if userspace want to switch to another > eventfd, we should re-do the register and unregister. > GD>> This makes sense. I missed the use case where userspace may want to switch to a different eventfd. I think we need to deal this issue in another way. Can we check whether or > not the producer is initialized before? > > Thanks > GD>> Initialization path is fine but the actual problem lies in the clean-up part. I think the following check is the cause of this issue: static void vhost_vdpa_clean_irq(struct vhost_vdpa *v) if (vq->call_ctx.producer.irq) irq_bypass_unregister_producer(&vq->call_ctx.producer); The above if condition will prevent the de-initialization of the producer nodes corresponding to irq 0 but irq_bypass_unregister_producer() should be called for all valid irq values including zero. Accordingly, following patch is required to fix this issue: diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 62a9bb0efc55..d1c3a33c6239 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -849,7 +849,7 @@ static void vhost_vdpa_clean_irq(struct vhost_vdpa *v) for (i = 0; i < v->nvqs; i++) { vq = &v->vqs[i]; - if (vq->call_ctx.producer.irq) + if (vq->call_ctx.producer.irq >= 0) irq_bypass_unregister_producer(&vq->call_ctx.producer); } } The revised patch (bug211711_fix.patch) is also attached with this email. > > > Regards, > > Gautam Dawar > >
bug211711_fix.patch
Description: Binary data
_______________________________________________ Virtualization mailing list [email protected] https://lists.linuxfoundation.org/mailman/listinfo/virtualization
