Re: Kernel panic with vhost-vdpa

Jason Wang Mon, 22 Feb 2021 00:45:18 -0800


On 2021/2/18 11:15 下午, Gautam Dawar wrote:

Hi Jason,


Thanks for your response.

On Thu, 18 Feb 2021 at 14:18, Jason Wang <[email protected]<mailto:[email protected]>> wrote:


    Hi Gautam:

    On 2021/2/15 9:01 下午, Gautam Dawar wrote:


    Hi Jason/Michael,

    I observed a kernel panic while testing vhost-vdpa with Xilinx
    adapters. Here are the details for your review:

    Problem statement:

    When qemu with vhost-vdpa netdevice is run for the first time, it
    works well. But after the VM is powered off, next qemu run causes
    kernel panic due to a NULL pointer dereference in
    irq_bypass_register_producer().

    Root cause analysis:

    When the VM is powered off, vhost_dev_stop() is invoked which in
    turn calls vhost_vdpa_reset_device() causing the irq_bypass
    producers to be unregistered.

    On the next run, when qemu opens the vhost device, the
    vhost_vdpa_open() file operation calls vhost_dev_init(). Here,
    call_ctx->producer memory is cleared in vhost_vring_call_reset().

    Further, when the virtqueues are initialized by
    vhost_virtqueue_init(), vhost_vdpa_setup_vq_irq() again registers
    the irq_bypass producer for each virtqueue. As the node member of
    struct irq_bypass_producer is also initialized to zero, traversal
    on the producers list causes crash due to NULL pointer dereference.


    Thanks a lot for reporting this issue.

    Fix details:

    I think that this issue can be fixed by invoking
    vhost_vdpa_setup_vq_irq() only when vhost_vdpa_set_status()
    includes VIRTIO_CONFIG_S_DRIVER_OK in the new status value. This
    way, there won’t be any stale nodes in the irqbypass  module’s
    producers list which are reset in vhost_vring_call_reset().

    Patch:

    diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index
    62a9bb0efc55..fdad94e2fbf9 100644

    --- a/drivers/vhost/vdpa.c

    +++ b/drivers/vhost/vdpa.c

    @@ -409,7 +409,6 @@ static long vhost_vdpa_vring_ioctl(struct
    vhost_vdpa *v, unsigned int cmd,

    cb.private = NULL;

    }

    ops->set_vq_cb(vdpa, idx, &cb);

    - vhost_vdpa_setup_vq_irq(v, idx);

    break;

    case VHOST_SET_VRING_NUM:

    We can also track this issue in Bugzilla ticket 21171
    (https://bugzilla.kernel.org/show_bug.cgi?id=211711
    <https://bugzilla.kernel.org/show_bug.cgi?id=211711>) and the
    complete patch is attached with this email.


    So vhost supports to remove or switch eventfd through
    vhost_vdpa_vring_ioctl(). So if userspace want to switch to
    another eventfd, we should re-do the register and unregister.

GD>> This makes sense. I missed the use case where userspace may wantto switch to a different eventfd.

This can happen when interrupt needs to be disabled for some reason (e.gMSI-X is masked).


    I think we need to deal this issue in another way. Can we check
    whether or not the producer is initialized before?

    Thanks

GD>> Initialization path is fine but the actual problem lies in theclean-up part.

I think the following check is the cause of this issue:

static void vhost_vdpa_clean_irq(struct vhost_vdpa *v)
if (vq->call_ctx.producer.irq)
irq_bypass_unregister_producer(&vq->call_ctx.producer);

The above if condition will prevent the de-initialization of theproducer nodes corresponding to irq 0 butirq_bypass_unregister_producer() should be called for all valid irqvalues including zero.


Accordingly, following patch is required to fix this issue:

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 62a9bb0efc55..d1c3a33c6239 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -849,7 +849,7 @@ static void vhost_vdpa_clean_irq(struct vhost_vdpa *v)

        for (i = 0; i < v->nvqs; i++) {
                vq = &v->vqs[i];
-               if (vq->call_ctx.producer.irq)
+               if (vq->call_ctx.producer.irq >= 0)
irq_bypass_unregister_producer(&vq->call_ctx.producer);
        }
 }



It should work, please post a formal patch for this.

I will give more thought in the meanwhile since I spot some otherdefects on codes for irqbyass usage in vdpa.


Thanks


The revised patch (bug211711_fix.patch) is also attached with this email.

    Regards,

    Gautam Dawar

_______________________________________________
Virtualization mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: Kernel panic with vhost-vdpa

Reply via email to