Thanks Jason. I've posted the formal patch for this change.

Regards,
Gautam

On Mon, 22 Feb 2021 at 14:15, Jason Wang <[email protected]> wrote:

>
> On 2021/2/18 11:15 下午, Gautam Dawar wrote:
>
> Hi Jason,
>
> Thanks for your response.
>
> On Thu, 18 Feb 2021 at 14:18, Jason Wang <[email protected]> wrote:
>
>> Hi Gautam:
>> On 2021/2/15 9:01 下午, Gautam Dawar wrote:
>>
>> Hi Jason/Michael,
>>
>>
>>
>> I observed a kernel panic while testing vhost-vdpa with Xilinx adapters.
>> Here are the details for your review:
>>
>>
>>
>> Problem statement:
>>
>> When qemu with vhost-vdpa netdevice is run for the first time, it works
>> well. But after the VM is powered off, next qemu run causes kernel panic
>> due to a NULL pointer dereference in irq_bypass_register_producer().
>>
>>
>>
>> Root cause analysis:
>>
>> When the VM is powered off, vhost_dev_stop() is invoked which in turn
>> calls vhost_vdpa_reset_device() causing the irq_bypass producers to be
>> unregistered.
>>
>>
>>
>> On the next run, when qemu opens the vhost device, the vhost_vdpa_open()
>> file operation calls vhost_dev_init(). Here, call_ctx->producer memory is
>> cleared in vhost_vring_call_reset().
>>
>>
>>
>> Further, when the virtqueues are initialized by vhost_virtqueue_init(),
>> vhost_vdpa_setup_vq_irq() again registers the irq_bypass producer for each
>> virtqueue. As the node member of struct irq_bypass_producer is also
>> initialized to zero, traversal on the producers list causes crash due to
>> NULL pointer dereference.
>>
>>
>> Thanks a lot for reporting this issue.
>>
>>
>>
>>
>> Fix details:
>>
>>
>>
>> I think that this issue can be fixed by invoking
>> vhost_vdpa_setup_vq_irq() only when vhost_vdpa_set_status() includes
>> VIRTIO_CONFIG_S_DRIVER_OK in the new status value. This way, there won’t be
>> any stale nodes in the irqbypass  module’s producers list which are reset
>> in vhost_vring_call_reset().
>>
>>
>>
>> Patch:
>>
>>
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index
>> 62a9bb0efc55..fdad94e2fbf9 100644
>>
>> --- a/drivers/vhost/vdpa.c
>>
>> +++ b/drivers/vhost/vdpa.c
>>
>> @@ -409,7 +409,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa
>> *v, unsigned int cmd,
>>
>>                         cb.private = NULL;
>>
>>                 }
>>
>>                 ops->set_vq_cb(vdpa, idx, &cb);
>>
>> -               vhost_vdpa_setup_vq_irq(v, idx);
>>
>>                 break;
>>
>>
>>
>>         case VHOST_SET_VRING_NUM:
>>
>>
>>
>> We can also track this issue in Bugzilla ticket 21171 (
>> https://bugzilla.kernel.org/show_bug.cgi?id=211711)  and the complete
>> patch is attached with this email.
>>
>>
>> So vhost supports to remove or switch eventfd through
>> vhost_vdpa_vring_ioctl(). So if userspace want to switch to another
>> eventfd, we should re-do the register and unregister.
>>
> GD>>  This makes sense. I missed the use case where userspace may want to
> switch to a different eventfd.
>
>
> This can happen when interrupt needs to be disabled for some reason (e.g
> MSI-X is masked).
>
>
>
> I think we need to deal this issue in another way. Can we check whether or
>> not the producer is initialized before?
>>
>> Thanks
>>
> GD>> Initialization path is fine but the actual problem lies in the
> clean-up part.
> I think the following check is the cause of this issue:
>
> static void vhost_vdpa_clean_irq(struct vhost_vdpa *v)
>                 if (vq->call_ctx.producer.irq)
>
> irq_bypass_unregister_producer(&vq->call_ctx.producer);
>
> The above if condition will prevent the de-initialization of the producer
> nodes corresponding to irq 0 but  irq_bypass_unregister_producer() should
> be called for all valid irq values including zero.
>
> Accordingly, following patch is required to fix this issue:
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 62a9bb0efc55..d1c3a33c6239 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -849,7 +849,7 @@ static void vhost_vdpa_clean_irq(struct vhost_vdpa *v)
>
>         for (i = 0; i < v->nvqs; i++) {
>                 vq = &v->vqs[i];
> -               if (vq->call_ctx.producer.irq)
> +               if (vq->call_ctx.producer.irq >= 0)
>
> irq_bypass_unregister_producer(&vq->call_ctx.producer);
>         }
>  }
>
>
> It should work, please post a formal patch for this.
>
> I will give more thought in the meanwhile since I spot some other defects
> on codes for irqbyass usage in vdpa.
>
> Thanks
>
>
>
> The revised patch (bug211711_fix.patch) is also attached with this email.
>
>>
>>
>> Regards,
>>
>> Gautam Dawar
>>
>>
_______________________________________________
Virtualization mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Reply via email to