Re: [PATCH V2] virtio-mmio: harden interrupt

2021-12-08 Thread Jason Wang
On Thu, Dec 9, 2021 at 2:55 PM Michael S. Tsirkin  wrote:
>
> On Thu, Dec 09, 2021 at 10:06:34AM +0800, Jason Wang wrote:
> > On Thu, Dec 9, 2021 at 4:27 AM Michael S. Tsirkin  wrote:
> > >
> > > On Fri, Nov 26, 2021 at 12:41:02PM +0800, Jason Wang wrote:
> > > > This patch tries to make sure the virtio interrupt handler for MMIO
> > > > won't be called after a reset and before virtio_device_ready(). We
> > > > can't use IRQF_NO_AUTOEN since we're using a shared interrupt
> > > > (IRQF_SHARED). So this patch tracks the interrupt enabling status in a
> > > > new intr_soft_enabled variable and toggles it in
> > > > vm_disable/enable_cbs(). The MMIO interrupt handler will check
> > > > intr_soft_enabled before processing the actual interrupt.
> > > >
> > > > Signed-off-by: Jason Wang 
> > > > ---
> > > > Changes since V1:
> > > > - Silence compiler warnings
> > > >  drivers/virtio/virtio_mmio.c | 37 
> > > >  1 file changed, 37 insertions(+)
> > > >
> > > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > > index 56128b9c46eb..c517afdd2cc5 100644
> > > > --- a/drivers/virtio/virtio_mmio.c
> > > > +++ b/drivers/virtio/virtio_mmio.c
> > > > @@ -90,6 +90,7 @@ struct virtio_mmio_device {
> > > >   /* a list of queues so we can dispatch IRQs */
> > > >   spinlock_t lock;
> > > >   struct list_head virtqueues;
> > > > + bool intr_soft_enabled;
> > > >  };
> > > >
> > > >  struct virtio_mmio_vq_info {
> > > > @@ -100,7 +101,37 @@ struct virtio_mmio_vq_info {
> > > >   struct list_head node;
> > > >  };
> > > >
> > > > +/* disable irq handlers */
> > > > +static void vm_disable_cbs(struct virtio_device *vdev)
> > > > +{
> > > > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> > > > + int irq = platform_get_irq(vm_dev->pdev, 0);
> > > >
> > > > + /*
> > > > +  * The below synchronize_irq() guarantees that any
> > > > +  * interrupt for this line arriving after
> > > > +  * synchronize_irq() has completed is guaranteed to see
> > > > +  * intr_soft_enabled == false.
> > > > +  */
> > > > + WRITE_ONCE(vm_dev->intr_soft_enabled, false);
> > > > + synchronize_irq(irq);
> > > > +}
> > > > +
> > > > +/* enable irq handlers */
> > > > +static void vm_enable_cbs(struct virtio_device *vdev)
> > > > +{
> > > > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> > > > + int irq = platform_get_irq(vm_dev->pdev, 0);
> > > > +
> > > > + disable_irq(irq);
> > > > + /*
> > > > +  * The above disable_irq() provides TSO ordering and
> > > > +  * as such promotes the below store to store-release.
> > > > +  */
> > > > + WRITE_ONCE(vm_dev->intr_soft_enabled, true);
> > > > + enable_irq(irq);
> > > > + return;
> > > > +}
> > > >
> > > >  /* Configuration interface */
> > > >
> > > > @@ -262,6 +293,8 @@ static void vm_reset(struct virtio_device *vdev)
> > > >
> > > >   /* 0 status means a reset. */
> > > >   writel(0, vm_dev->base + VIRTIO_MMIO_STATUS);
> > >
> > > There was a discussion about reading status to make sure it is clear.
> > > The spec says we should, this can't hurt as a further hardening measure.
> > > In fact, let's do it in the core maybe? Spec says it applies to all
> > > devices ...
> >
> > We can do that, but I'm not sure whether it would break some existing devices.
>
> Hmm. Have anything specific in mind?

No, I can send a patch to do that.

>
> > >
> > > > + /* Disable VQ/configuration callbacks. */
> > > > + vm_disable_cbs(vdev);
> > > >  }
> > > >
> > > >
> > > > @@ -288,6 +321,9 @@ static irqreturn_t vm_interrupt(int irq, void *opaque)
> > > >   unsigned long flags;
> > > >   irqreturn_t ret = IRQ_NONE;
> > > >
> > > > + if (!READ_ONCE(vm_dev->intr_soft_enabled))
> > > > + return IRQ_NONE;
> > > > +
> > >
> > > So if the write is seen before reset happened (should not happen, but we
> > > are talking a buggy device) then it won't be acknowledged and device
> > > will keep pulling the interrupt. I think as long as we are hardening
> > > this, let's go the full mile and try to avoid DoS if we can, do the
> > > check before invoking the callback, but do not skip the read.
> > > Whether to still return IRQ_NONE is a good question.
> >
> > Did you mean something like this:
> >
> > /* Read and acknowledge interrupts */
> > status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
> > writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
> >
> > if (status)
> > ret = IRQ_HANDLED;
> >
> >if (!READ_ONCE(vm_dev->intr_soft_enabled))
> >return ret;
> >
> > Thanks
>
> Maybe. Or is
>
> if (!READ_ONCE(vm_dev->intr_soft_enabled))
> return IRQ_NONE;
>
> better here?

Yes, I just pasted part of the code; ret is initialized to IRQ_NONE.

Thanks

>
>
> > >
> > >
> > >
> > >
> > > >   

Re: [PATCH v1 1/7] vdpa: Provide interface to read driver features

2021-12-08 Thread Jason Wang
On Thu, Dec 9, 2021 at 3:07 PM Eli Cohen  wrote:
>
> On Thu, Dec 09, 2021 at 01:33:01PM +0800, Jason Wang wrote:
> > On Thu, Dec 9, 2021 at 4:14 AM Eli Cohen  wrote:
> > >
> > > Provide an interface to read the negotiated features. This is needed
> > > when building the netlink message in vdpa_dev_net_config_fill().
> > >
> > > Also fix the implementation of vdpa_dev_net_config_fill() to use the
> > > negotiated features instead of the device features.
> > >
> > > To make APIs clearer, make the following name changes to struct
> > > vdpa_config_ops so they better describe their operations:
> > >
> > > get_features -> get_device_features
> > > set_features -> set_driver_features

[...]

> > > + * @get_driver_features:   Get virtio features in action
> >
> > Maybe "Get virtio driver features .." is better.
>
> I hope the name does not become too long.
> Which one would you favor?
>
> get_vio_driver_features
> get_virtio_drv_features
> get_virtio_driver_features

The name is fine, I mean the comment might be

"Get the virtio driver features in action"

Thanks

>
> >
> > Thanks
> >
> > > + * @vdev: vdpa device
> > > + * Returns the virtio features accepted
> > >   * @set_config_cb: Set the config interrupt callback
> > >   * @vdev: vdpa device
> > >   * @cb: virtio-vdev interrupt callback structure
> > > @@ -276,8 +279,9 @@ struct vdpa_config_ops {
> > >
> > > /* Device ops */
> > > u32 (*get_vq_align)(struct vdpa_device *vdev);
> > > -   u64 (*get_features)(struct vdpa_device *vdev);
> > > -   int (*set_features)(struct vdpa_device *vdev, u64 features);
> > > +   u64 (*get_device_features)(struct vdpa_device *vdev);
> > > +   int (*set_driver_features)(struct vdpa_device *vdev, u64 features);
> > > +   u64 (*get_driver_features)(struct vdpa_device *vdev);
> > > void (*set_config_cb)(struct vdpa_device *vdev,
> > >   struct vdpa_callback *cb);
> > > u16 (*get_vq_num_max)(struct vdpa_device *vdev);
> > > @@ -395,7 +399,7 @@ static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features)
> > > const struct vdpa_config_ops *ops = vdev->config;
> > >
> > > vdev->features_valid = true;
> > > -   return ops->set_features(vdev, features);
> > > +   return ops->set_driver_features(vdev, features);
> > >  }
> > >
> > >  void vdpa_get_config(struct vdpa_device *vdev, unsigned int offset,
> > > --
> > > 2.33.1
> > >
> >
>
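The device/driver feature split behind the renaming can be modeled with a small sketch. `struct fake_vdpa` and the feature bit values here are illustrative only, not the kernel API: `get_device_features()` reports what the device offers, `set_driver_features()` records what the driver accepted, and the new `get_driver_features()` returns that negotiated subset — the "features in action" that config-fill code must use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define F_MQ      (1ULL << 22)
#define F_CTRL_VQ (1ULL << 17)
#define F_MTU     (1ULL << 3)

struct fake_vdpa {
    uint64_t device_features;   /* offered by the device */
    uint64_t driver_features;   /* negotiated with the driver */
};

static void set_driver_features(struct fake_vdpa *v, uint64_t wanted)
{
    /* negotiation never yields bits the device did not offer */
    v->driver_features = wanted & v->device_features;
}

static uint64_t get_driver_features(const struct fake_vdpa *v)
{
    return v->driver_features;
}
```

This is why vdpa_dev_net_config_fill() must call get_driver_features() rather than get_device_features(): the two masks can differ.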

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1 6/7] vdpa: Add support for querying control virtqueue index

2021-12-08 Thread Jason Wang
On Thu, Dec 9, 2021 at 3:09 PM Eli Cohen  wrote:
>
> On Thu, Dec 09, 2021 at 01:44:56PM +0800, Jason Wang wrote:
> > On Thu, Dec 9, 2021 at 4:15 AM Eli Cohen  wrote:
> > >
> > > Add netlink attribute and callback function to query the control VQ
> > > index of a device.
> > >
> > > Example:
> > >
> > > $ vdpa dev config show vdpa-a
> > >   vdpa-a: mac 00:00:00:00:88:88 link up link_announce false max_vq_pairs 5 \
> > >   mtu 9000 ctrl_vq_idx 10
> >
> >
> > I still wonder about the motivation for this.
> To be able to show the stats for CVQ.

Right.

>
> > And as discussed, the
> > ctrl_vq_idx varies depending on whether MQ is negotiated.
>
> I handle this according to the spec and it depends on MQ.

Yes, but there could be a chance that the cvq index changes after the
vdpa dev config show.

Thanks

>
> >
> > Thanks
> >
> > >
> > > Signed-off-by: Eli Cohen 
> > > ---
> > > v0 -> v1:
> > > 1. Use logic defined in the spec to deduce virtqueue index.
> > >
> > >  drivers/vdpa/vdpa.c   | 25 +
> > >  include/uapi/linux/vdpa.h |  1 +
> > >  2 files changed, 26 insertions(+)
> > >
> > > diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> > > index 3bf016e03512..b4d4b8a7ca4e 100644
> > > --- a/drivers/vdpa/vdpa.c
> > > +++ b/drivers/vdpa/vdpa.c
> > > @@ -712,6 +712,27 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff *msg, struct netlink_callback
> > > return msg->len;
> > >  }
> > >
> > > +static int vdpa_dev_net_ctrl_vq_fill(struct vdpa_device *vdev,
> > > +struct sk_buff *msg,
> > > +struct virtio_net_config *config,
> > > +u64 features)
> > > +{
> > > +   u16 N;
> > > +
> > > +   /* control VQ index, if available, is deduced according to the logic
> > > +* described in the virtio spec in section 5.1.2
> > > +*/
> > > +   if (!(features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
> > > +   return 0;
> > > +
> > > +   if (features & BIT_ULL(VIRTIO_NET_F_MQ))
> > > +   N = le16_to_cpu(config->max_virtqueue_pairs);
> > > +   else
> > > +   N = 1;
> > > +
> > > +   return nla_put_u16(msg, VDPA_ATTR_DEV_CTRL_VQ_IDX, 2 * N);
> > > +}
> > > +
> > >  static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
> > >struct sk_buff *msg, u64 features,
> > >const struct virtio_net_config *config)
> > > @@ -730,6 +751,7 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *msg
> > > struct virtio_net_config config = {};
> > > u64 features;
> > > u16 val_u16;
> > > +   int err;
> > >
> > > vdpa_get_config(vdev, 0, &config, sizeof(config));
> > >
> > > @@ -746,6 +768,9 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *msg
> > > return -EMSGSIZE;
> > >
> > > features = vdev->config->get_driver_features(vdev);
> > > +   err = vdpa_dev_net_ctrl_vq_fill(vdev, msg, &config, features);
> > > +   if (err)
> > > +   return err;
> > >
> > > return vdpa_dev_net_mq_config_fill(vdev, msg, features, &config);
> > >  }
> > > diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
> > > index a252f06f9dfd..2e3a7f89f42d 100644
> > > --- a/include/uapi/linux/vdpa.h
> > > +++ b/include/uapi/linux/vdpa.h
> > > @@ -34,6 +34,7 @@ enum vdpa_attr {
> > > VDPA_ATTR_DEV_MAX_VQS,  /* u32 */
> > > VDPA_ATTR_DEV_MAX_VQ_SIZE,  /* u16 */
> > > VDPA_ATTR_DEV_MIN_VQ_SIZE,  /* u16 */
> > > +   VDPA_ATTR_DEV_CTRL_VQ_IDX,  /* u16 */
> > >
> > > VDPA_ATTR_DEV_NET_CFG_MACADDR,  /* binary */
> > > VDPA_ATTR_DEV_NET_STATUS,   /* u8 */
> > > --
> > > 2.33.1
> > >
> >
>
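The index deduction used by vdpa_dev_net_ctrl_vq_fill() can be expressed as a small standalone helper. `ctrl_vq_idx()` and its boolean parameters are illustrative, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Control VQ index per the logic taken from virtio spec section 5.1.2:
 * with N queue pairs (N = max_virtqueue_pairs when MQ is negotiated,
 * else N = 1), indices 0..2N-1 are the RX/TX queues and the control
 * queue is index 2N. Returns -1 when CTRL_VQ is not negotiated.
 */
static int ctrl_vq_idx(bool has_ctrl_vq, bool has_mq, uint16_t max_vq_pairs)
{
    uint16_t n;

    if (!has_ctrl_vq)
        return -1;
    n = has_mq ? max_vq_pairs : 1;
    return 2 * n;
}
```

For the example above, MQ negotiated with max_vq_pairs 5 gives index 10; without MQ the control queue drops to index 2 — which is why the value reported by `vdpa dev config show` can go stale if negotiation changes afterwards, as Jason notes.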



Re: [PATCH V2] virtio-mmio: harden interrupt

2021-12-08 Thread Michael S. Tsirkin
On Thu, Dec 09, 2021 at 10:06:34AM +0800, Jason Wang wrote:
> On Thu, Dec 9, 2021 at 4:27 AM Michael S. Tsirkin  wrote:
> >
> > On Fri, Nov 26, 2021 at 12:41:02PM +0800, Jason Wang wrote:
> > > This patch tries to make sure the virtio interrupt handler for MMIO
> > > won't be called after a reset and before virtio_device_ready(). We
> > > can't use IRQF_NO_AUTOEN since we're using a shared interrupt
> > > (IRQF_SHARED). So this patch tracks the interrupt enabling status in a
> > > new intr_soft_enabled variable and toggles it in
> > > vm_disable/enable_cbs(). The MMIO interrupt handler will check
> > > intr_soft_enabled before processing the actual interrupt.
> > >
> > > Signed-off-by: Jason Wang 
> > > ---
> > > Changes since V1:
> > > - Silence compiler warnings
> > >  drivers/virtio/virtio_mmio.c | 37 
> > >  1 file changed, 37 insertions(+)
> > >
> > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > index 56128b9c46eb..c517afdd2cc5 100644
> > > --- a/drivers/virtio/virtio_mmio.c
> > > +++ b/drivers/virtio/virtio_mmio.c
> > > @@ -90,6 +90,7 @@ struct virtio_mmio_device {
> > >   /* a list of queues so we can dispatch IRQs */
> > >   spinlock_t lock;
> > >   struct list_head virtqueues;
> > > + bool intr_soft_enabled;
> > >  };
> > >
> > >  struct virtio_mmio_vq_info {
> > > @@ -100,7 +101,37 @@ struct virtio_mmio_vq_info {
> > >   struct list_head node;
> > >  };
> > >
> > > +/* disable irq handlers */
> > > +static void vm_disable_cbs(struct virtio_device *vdev)
> > > +{
> > > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> > > + int irq = platform_get_irq(vm_dev->pdev, 0);
> > >
> > > + /*
> > > +  * The below synchronize_irq() guarantees that any
> > > +  * interrupt for this line arriving after
> > > +  * synchronize_irq() has completed is guaranteed to see
> > > +  * intr_soft_enabled == false.
> > > +  */
> > > + WRITE_ONCE(vm_dev->intr_soft_enabled, false);
> > > + synchronize_irq(irq);
> > > +}
> > > +
> > > +/* enable irq handlers */
> > > +static void vm_enable_cbs(struct virtio_device *vdev)
> > > +{
> > > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> > > + int irq = platform_get_irq(vm_dev->pdev, 0);
> > > +
> > > + disable_irq(irq);
> > > + /*
> > > +  * The above disable_irq() provides TSO ordering and
> > > +  * as such promotes the below store to store-release.
> > > +  */
> > > + WRITE_ONCE(vm_dev->intr_soft_enabled, true);
> > > + enable_irq(irq);
> > > + return;
> > > +}
> > >
> > >  /* Configuration interface */
> > >
> > > @@ -262,6 +293,8 @@ static void vm_reset(struct virtio_device *vdev)
> > >
> > >   /* 0 status means a reset. */
> > >   writel(0, vm_dev->base + VIRTIO_MMIO_STATUS);
> >
> > There was a discussion about reading status to make sure it is clear.
> > The spec says we should, this can't hurt as a further hardening measure.
> > In fact, let's do it in the core maybe? Spec says it applies to all
> > devices ...
> 
> We can do that, but I'm not sure whether it would break some existing devices.

Hmm. Have anything specific in mind?

> >
> > > + /* Disable VQ/configuration callbacks. */
> > > + vm_disable_cbs(vdev);
> > >  }
> > >
> > >
> > > @@ -288,6 +321,9 @@ static irqreturn_t vm_interrupt(int irq, void *opaque)
> > >   unsigned long flags;
> > >   irqreturn_t ret = IRQ_NONE;
> > >
> > > + if (!READ_ONCE(vm_dev->intr_soft_enabled))
> > > + return IRQ_NONE;
> > > +
> >
> > So if the write is seen before reset happened (should not happen, but we
> > are talking a buggy device) then it won't be acknowledged and device
> > will keep pulling the interrupt. I think as long as we are hardening
> > this, let's go the full mile and try to avoid DoS if we can, do the
> > check before invoking the callback, but do not skip the read.
> > Whether to still return IRQ_NONE is a good question.
> 
> Did you mean something like this:
> 
> /* Read and acknowledge interrupts */
> status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
> writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
> 
> if (status)
> ret = IRQ_HANDLED;
> 
>if (!READ_ONCE(vm_dev->intr_soft_enabled))
>return ret;
> 
> Thanks

Maybe. Or is

if (!READ_ONCE(vm_dev->intr_soft_enabled))
return IRQ_NONE;

better here?


> >
> >
> >
> >
> > >   /* Read and acknowledge interrupts */
> > >   status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
> > >   writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
> > > @@ -529,6 +565,7 @@ static bool vm_get_shm_region(struct virtio_device *vdev,
> > >  }
> > >
> > >  static const struct virtio_config_ops virtio_mmio_config_ops = {
> > > + .enable_cbs = vm_enable_cbs,
> > >   .get= 

Re: [RFC PATCH] virtio: make sure legacy pci device gain 32bit-pfn vq

2021-12-08 Thread Michael S. Tsirkin
On Thu, Dec 09, 2021 at 11:00:55AM +0800, 王贇 wrote:
> 
> 
> On 2021/12/9 8:19 AM, Michael S. Tsirkin wrote:
> [snip]
> > > > > > > 
> > > > > Hi, Michael
> > > > > 
> > > > > Thanks for the comment, unfortunately modify device is not an option 
> > > > > for us
> > > > > :-(
> > > > > 
> > > > > Is there any idea on how to solve this issue properly?
> > > > > 
> > > > > Regards,
> > > > > Michael Wang
> > > > 
> > > > By the way, there is a bug in the error message. Want to fix that?
> > > 
> > > Could you please provide more detail about the bug? We'd like to help 
> > > fixing
> > > it :-)
> > > 
> > > Besides, I've checked that patch but it can't address our issue. We actually
> > > have this legacy pci device on an arm platform, and the memory layout is
> > > unfriendly since allocation rarely provides page addresses below 44bit. We
> > > understand the virtio-iommu case should not force dma, but we don't
> > > have that, so it's just working fine.
> > > 
> > > Regards,
> > > Michael Wang
> > 
> > BTW is it just the ring that's at issue?
> 
> Yes, the dma address for the ring, allocated as a page, can't fit the requirement.
> 
> > Figuring out we have this problematic config and then allocating just
> > the ring from coherent memory seems more palatable.
> 
> Agree. I'm also wondering why we can't force allocation of a page with a PFN
> below 44bit to fit the requirement? I mean if there are such pages, we should
> use them first as the dma address for legacy devices, and only fail when there
> are no such pages at all, but I can't find an existing API to allocate a page
> with such a requirement... anyway.
> 
> > 
> > But please note we still need to detect config with a virtual iommu (can
> > be any kind not just virtio-iommu, smmu, vtd are all affected) and
> > disable the hacks. This is what the new DMA API I suggested would do.
> 
> Fair enough, any more details about the design of new API?
> 
> Regards,
> Michael Wang


The idea was that on some systems any DMA address is also a
physical address (as in the case of e.g. bounce buffers) and
then it can be used without VIRTIO_F_ACCESS_PLATFORM.

> 
> > 
> > 
> > > > 
> > > > 
> > > > > > 
> > > > > > > ---
> > > > > > > drivers/virtio/virtio_pci_legacy.c | 10 ++
> > > > > > > drivers/virtio/virtio_ring.c   |  3 +++
> > > > > > > include/linux/virtio.h |  1 +
> > > > > > > 3 files changed, 14 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c
> > > > > > > b/drivers/virtio/virtio_pci_legacy.c
> > > > > > > index d62e983..11f2ebf 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > > > > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > > > > > @@ -263,6 +263,16 @@ int virtio_pci_legacy_probe(struct 
> > > > > > > virtio_pci_device
> > > > > > > *vp_dev)
> > > > > > >   vp_dev->setup_vq = setup_vq;
> > > > > > >   vp_dev->del_vq = del_vq;
> > > > > > > 
> > > > > > > + /*
> > > > > > > +  * The legacy pci device requires a 32bit-pfn vq,
> > > > > > > +  * or setup_vq() will fail.
> > > > > > > +  *
> > > > > > > +  * Thus we make sure vring_use_dma_api() will
> > > > > > > +  * return true during the allocation by marking
> > > > > > > +  * force_dma here.
> > > > > > > +  */
> > > > > > > + vp_dev->vdev.force_dma = true;
> > > > > > > +
> > > > > > >   return 0;
> > > > > > > 
> > > > > > > err_iomap:
> > > > > > > diff --git a/drivers/virtio/virtio_ring.c 
> > > > > > > b/drivers/virtio/virtio_ring.c
> > > > > > > index 3035bb6..6562e01 100644
> > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > @@ -245,6 +245,9 @@ static inline bool 
> > > > > > > virtqueue_use_indirect(struct
> > > > > > > virtqueue *_vq,
> > > > > > > 
> > > > > > > static bool vring_use_dma_api(struct virtio_device *vdev)
> > > > > > > {
> > > > > > > + if (vdev->force_dma)
> > > > > > > + return true;
> > > > > > > +
> > > > > > >   if (!virtio_has_dma_quirk(vdev))
> > > > > > >   return true;
> > > > > > > 
> > > > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > > > > index 41edbc0..a4eb29d 100644
> > > > > > > --- a/include/linux/virtio.h
> > > > > > > +++ b/include/linux/virtio.h
> > > > > > > @@ -109,6 +109,7 @@ struct virtio_device {
> > > > > > >   bool failed;
> > > > > > >   bool config_enabled;
> > > > > > >   bool config_change_pending;
> > > > > > > + bool force_dma;
> > > > > > >   spinlock_t config_lock;
> > > > > > >   spinlock_t vqs_list_lock; /* Protects VQs list access */
> > > > > > >   struct device dev;
> > > > > > > -- 
> > > > > > > 1.8.3.1


Re: [RFC PATCH] virtio: make sure legacy pci device gain 32bit-pfn vq

2021-12-08 Thread Michael S. Tsirkin
On Thu, Dec 09, 2021 at 11:21:36AM +0800, 王贇 wrote:
> 
> 
> On 2021/12/8 7:08 PM, Michael S. Tsirkin wrote:
> [snip]
> > > > > 
> > > > > Hi, Michael
> > > > > 
> > > > > Thanks for the comment, unfortunately modify device is not an option 
> > > > > for us
> > > > > :-(
> > > > > 
> > > > > Is there any idea on how to solve this issue properly?
> > > > > 
> > > > > Regards,
> > > > > Michael Wang
> > > > 
> > > > By the way, there is a bug in the error message. Want to fix that?
> > > 
> > > Could you please provide more detail about the bug? We'd like to help 
> > > fixing
> > > it :-)
> > 
> > virtio-pci :14:00.0: platform bug: legacy virtio-mmio must ...
> > 
> > should be virtio-pci not virtio-mmio
> 
> Patch on the way~
> 
> > 
> > 
> > 
> > > Besides, I've checked that patch but it can't address our issue. We actually
> > > have this legacy pci device on an arm platform, and the memory layout is
> > > unfriendly since allocation rarely provides page addresses below 44bit. We
> > > understand the virtio-iommu case should not force dma, but we don't
> > > have that, so it's just working fine.
> > > 
> > > Regards,
> > > Michael Wang
> > 
> > Hmm wait a sec is it a physical device or a hypervisor?
> > If a physical one then doesn't it need VIRTIO_F_ORDER_PLATFORM
> > on ARM?
> 
> The PCI device is virtual. I can't see how VIRTIO_F_ORDER_PLATFORM helps
> address this issue; the legacy pci config is 32bit but ours is 36bit, so it
> seems it will never fit?
> 
> Regards,
> Michael Wang

Oh, if the device is virtual then I think you should just update it, please.
virtio 0.X is architecturally limited to small VMs; if your hypervisor
supports more, it should emulate virtio 1.0.



> > 
> > 
> > 
> > > > 
> > > > 
> > > > > > 
> > > > > > > ---
> > > > > > > drivers/virtio/virtio_pci_legacy.c | 10 ++
> > > > > > > drivers/virtio/virtio_ring.c   |  3 +++
> > > > > > > include/linux/virtio.h |  1 +
> > > > > > > 3 files changed, 14 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c
> > > > > > > b/drivers/virtio/virtio_pci_legacy.c
> > > > > > > index d62e983..11f2ebf 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > > > > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > > > > > @@ -263,6 +263,16 @@ int virtio_pci_legacy_probe(struct 
> > > > > > > virtio_pci_device
> > > > > > > *vp_dev)
> > > > > > >   vp_dev->setup_vq = setup_vq;
> > > > > > >   vp_dev->del_vq = del_vq;
> > > > > > > 
> > > > > > > + /*
> > > > > > > +  * The legacy pci device requires a 32bit-pfn vq,
> > > > > > > +  * or setup_vq() will fail.
> > > > > > > +  *
> > > > > > > +  * Thus we make sure vring_use_dma_api() will
> > > > > > > +  * return true during the allocation by marking
> > > > > > > +  * force_dma here.
> > > > > > > +  */
> > > > > > > + vp_dev->vdev.force_dma = true;
> > > > > > > +
> > > > > > >   return 0;
> > > > > > > 
> > > > > > > err_iomap:
> > > > > > > diff --git a/drivers/virtio/virtio_ring.c 
> > > > > > > b/drivers/virtio/virtio_ring.c
> > > > > > > index 3035bb6..6562e01 100644
> > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > @@ -245,6 +245,9 @@ static inline bool 
> > > > > > > virtqueue_use_indirect(struct
> > > > > > > virtqueue *_vq,
> > > > > > > 
> > > > > > > static bool vring_use_dma_api(struct virtio_device *vdev)
> > > > > > > {
> > > > > > > + if (vdev->force_dma)
> > > > > > > + return true;
> > > > > > > +
> > > > > > >   if (!virtio_has_dma_quirk(vdev))
> > > > > > >   return true;
> > > > > > > 
> > > > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > > > > index 41edbc0..a4eb29d 100644
> > > > > > > --- a/include/linux/virtio.h
> > > > > > > +++ b/include/linux/virtio.h
> > > > > > > @@ -109,6 +109,7 @@ struct virtio_device {
> > > > > > >   bool failed;
> > > > > > >   bool config_enabled;
> > > > > > >   bool config_change_pending;
> > > > > > > + bool force_dma;
> > > > > > >   spinlock_t config_lock;
> > > > > > >   spinlock_t vqs_list_lock; /* Protects VQs list access */
> > > > > > >   struct device dev;
> > > > > > > -- 
> > > > > > > 1.8.3.1


[PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods

2021-12-08 Thread Christoph Hellwig
These methods indirect the actual DAX read/write path.  In the end pmem
uses magic flush and mc safe variants and fuse and dcssblk use plain ones
while device mapper simply redirects to the underlying device.

Add set_dax_cached() and set_dax_nomcsafe() APIs for fuse to skip these
special variants, then use them everywhere as they fall back to the plain
ones on s390 anyway, and remove an indirect call from the read/write path
as well as a lot of boilerplate code.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c   | 36 ++--
 drivers/md/dm-linear.c| 20 -
 drivers/md/dm-log-writes.c| 80 ---
 drivers/md/dm-stripe.c| 20 -
 drivers/md/dm.c   | 50 --
 drivers/nvdimm/pmem.c | 20 -
 drivers/s390/block/dcssblk.c  | 14 --
 fs/dax.c  |  5 ---
 fs/fuse/virtio_fs.c   | 19 +
 include/linux/dax.h   |  9 ++--
 include/linux/device-mapper.h |  4 --
 11 files changed, 37 insertions(+), 240 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e81d5ee57390f..ff676a07480c8 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -105,6 +105,10 @@ enum dax_device_flags {
DAXDEV_WRITE_CACHE,
/* flag to check if device supports synchronous flush */
DAXDEV_SYNC,
+   /* do not use uncached operations to write data */
+   DAXDEV_CACHED,
+   /* do not use mcsafe operations to read data */
+   DAXDEV_NOMCSAFE,
 };
 
 /**
@@ -146,9 +150,15 @@ size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
if (!dax_alive(dax_dev))
return 0;
 
-   return dax_dev->ops->copy_from_iter(dax_dev, pgoff, addr, bytes, i);
+   /*
+* The userspace address for the memory copy has already been validated
+* via access_ok() in vfs_write, so use the 'no check' version to bypass
+* the HARDENED_USERCOPY overhead.
+*/
+   if (test_bit(DAXDEV_CACHED, &dax_dev->flags))
+   return _copy_from_iter(addr, bytes, i);
+   return _copy_from_iter_flushcache(addr, bytes, i);
 }
-EXPORT_SYMBOL_GPL(dax_copy_from_iter);
 
 size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
size_t bytes, struct iov_iter *i)
@@ -156,9 +166,15 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
if (!dax_alive(dax_dev))
return 0;
 
-   return dax_dev->ops->copy_to_iter(dax_dev, pgoff, addr, bytes, i);
+   /*
+* The userspace address for the memory copy has already been validated
+* via access_ok() in vfs_read, so use the 'no check' version to bypass
+* the HARDENED_USERCOPY overhead.
+*/
+   if (test_bit(DAXDEV_NOMCSAFE, &dax_dev->flags))
+   return _copy_to_iter(addr, bytes, i);
+   return _copy_mc_to_iter(addr, bytes, i);
 }
-EXPORT_SYMBOL_GPL(dax_copy_to_iter);
 
 int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
size_t nr_pages)
@@ -220,6 +236,18 @@ void set_dax_synchronous(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(set_dax_synchronous);
 
+void set_dax_cached(struct dax_device *dax_dev)
+{
+   set_bit(DAXDEV_CACHED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_cached);
+
+void set_dax_nomcsafe(struct dax_device *dax_dev)
+{
+   set_bit(DAXDEV_NOMCSAFE, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_nomcsafe);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(&dax_srcu);
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 90de42f6743ac..1b97a11d71517 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -180,22 +180,6 @@ static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
-static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
-   void *addr, size_t bytes, struct iov_iter *i)
-{
-   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
-
-   return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
-}
-
-static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
-   void *addr, size_t bytes, struct iov_iter *i)
-{
-   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
-
-   return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
-}
-
 static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
@@ -206,8 +190,6 @@ static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
 
 #else
 #define linear_dax_direct_access NULL
-#define linear_dax_copy_from_iter NULL
-#define linear_dax_copy_to_iter NULL
 #define linear_dax_zero_page_range NULL
 #endif
 
@@ -225,8 +207,6 @@ static struct target_type linear_target = {
  

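The dispatch that replaces the indirect ops calls can be sketched as follows. All names here are simplified stand-ins, not the real kernel helpers; the two copy variants just illustrate the cached-vs-flushcache split:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the two iov_iter copy variants (hypothetical). */
static size_t plain_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    return n;
}

static size_t flushcache_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);   /* on pmem this would also flush CPU caches */
    return n;
}

enum { DAXDEV_CACHED = 1 << 0 };

/*
 * The shape of the change: rather than an indirect ->copy_from_iter()
 * call through per-driver dax_operations, dax_copy_from_iter() tests a
 * per-device flag and calls the right variant directly, dropping one
 * indirect call from the hot path and the per-driver boilerplate.
 */
static size_t dax_copy_from_iter_sketch(unsigned long flags, void *dst,
                                        const void *src, size_t n)
{
    if (flags & DAXDEV_CACHED)
        return plain_copy(dst, src, n);       /* fuse/virtiofs, dcssblk */
    return flushcache_copy(dst, src, n);      /* pmem default */
}
```
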
[PATCH 1/5] uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()

2021-12-08 Thread Christoph Hellwig
These two wrappers are never used.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvdimm/pmem.c |  4 ++--
 include/linux/uio.h   | 20 +---
 2 files changed, 3 insertions(+), 21 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4190c8c46ca88..8294f1c701baa 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -302,8 +302,8 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
 }
 
 /*
- * Use the 'no check' versions of copy_from_iter_flushcache() and
- * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
+ * Use the 'no check' versions of _copy_from_iter_flushcache() and
+ * _copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
  * checking, both file offset and device offset, is handled by
  * dax_iomap_actor()
  */
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 6350354f97e90..494d552c1d663 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -196,7 +196,7 @@ bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
 /*
  * Note, users like pmem that depend on the stricter semantics of
- * copy_from_iter_flushcache() than copy_from_iter_nocache() must check for
+ * _copy_from_iter_flushcache() than _copy_from_iter_nocache() must check for
  * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
  * destination is flushed from the cache on return.
  */
@@ -211,24 +211,6 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 #define _copy_mc_to_iter _copy_to_iter
 #endif
 
-static __always_inline __must_check
-size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
-{
-   if (unlikely(!check_copy_size(addr, bytes, false)))
-   return 0;
-   else
-   return _copy_from_iter_flushcache(addr, bytes, i);
-}
-
-static __always_inline __must_check
-size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
-{
-   if (unlikely(!check_copy_size(addr, bytes, true)))
-   return 0;
-   else
-   return _copy_mc_to_iter(addr, bytes, i);
-}
-
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
 unsigned long iov_iter_alignment(const struct iov_iter *i);
 unsigned long iov_iter_gap_alignment(const struct iov_iter *i);
-- 
2.30.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 3/5] dax: remove the DAXDEV_F_SYNC flag

2021-12-08 Thread Christoph Hellwig
Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and
just let the drivers call set_dax_synchronous directly.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/bus.c            | 3 ++-
 drivers/dax/super.c          | 6 +-
 drivers/md/dm.c              | 2 +-
 drivers/nvdimm/pmem.c        | 7 +++
 drivers/s390/block/dcssblk.c | 4 ++--
 fs/fuse/virtio_fs.c          | 2 +-
 include/linux/dax.h          | 8 ++--
 7 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 6683d42c32c56..da2a14d096d29 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1324,11 +1324,12 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 * No dax_operations since there is no access to this device outside of
 * mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
+   dax_dev = alloc_dax(dev_dax, NULL);
if (IS_ERR(dax_dev)) {
rc = PTR_ERR(dax_dev);
goto err_alloc_dax;
}
+   set_dax_synchronous(dax_dev);
 
/* a device_dax instance is dead while the driver is not attached */
kill_dax(dax_dev);
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e18155f43a635..e81d5ee57390f 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -345,8 +345,7 @@ static struct dax_device *dax_dev_get(dev_t devt)
return dax_dev;
 }
 
-struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
-   unsigned long flags)
+struct dax_device *alloc_dax(void *private, const struct dax_operations *ops)
 {
struct dax_device *dax_dev;
dev_t devt;
@@ -366,9 +365,6 @@ struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
 
dax_dev->ops = ops;
dax_dev->private = private;
-   if (flags & DAXDEV_F_SYNC)
-   set_dax_synchronous(dax_dev);
-
return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4e997c02bb0a0..f4b972af10928 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1765,7 +1765,7 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);
 
if (IS_ENABLED(CONFIG_FS_DAX)) {
-   md->dax_dev = alloc_dax(md, &dm_dax_ops, 0);
+   md->dax_dev = alloc_dax(md, &dm_dax_ops);
if (IS_ERR(md->dax_dev)) {
md->dax_dev = NULL;
goto bad;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 8294f1c701baa..85b3339286bd8 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -402,7 +402,6 @@ static int pmem_attach_disk(struct device *dev,
struct gendisk *disk;
void *addr;
int rc;
-   unsigned long flags = 0UL;
 
pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
if (!pmem)
@@ -495,13 +494,13 @@ static int pmem_attach_disk(struct device *dev,
nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range);
disk->bb = &pmem->bb;
 
-   if (is_nvdimm_sync(nd_region))
-   flags = DAXDEV_F_SYNC;
-   dax_dev = alloc_dax(pmem, &pmem_dax_ops, flags);
+   dax_dev = alloc_dax(pmem, &pmem_dax_ops);
if (IS_ERR(dax_dev)) {
rc = PTR_ERR(dax_dev);
goto out;
}
+   if (is_nvdimm_sync(nd_region))
+   set_dax_synchronous(dax_dev);
rc = dax_add_host(dax_dev, disk);
if (rc)
goto out_cleanup_dax;
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index e65e83764d1ce..10823debc09bd 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -686,13 +686,13 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
if (rc)
goto put_dev;
 
-   dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops,
-   DAXDEV_F_SYNC);
+   dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops);
if (IS_ERR(dev_info->dax_dev)) {
rc = PTR_ERR(dev_info->dax_dev);
dev_info->dax_dev = NULL;
goto put_dev;
}
+   set_dax_synchronous(dev_info->dax_dev);
rc = dax_add_host(dev_info->dax_dev, dev_info->gd);
if (rc)
goto out_dax;
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 242cc1c0d7ed7..5c03a0364a9bb 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -850,7 +850,7 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
	dev_dbg(&vdev->dev, "%s: window kaddr 0x%px phys_addr 0x%llx len 0x%llx\n",
		__func__, fs->window_kaddr, cache_reg.addr, cache_reg.len);
 
-   fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops, 0);
+   fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
if (IS_ERR(fs->dax_dev))
return PTR_ERR(fs->dax_dev);
 
diff 

[PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter

2021-12-08 Thread Christoph Hellwig
While using the MC-safe copy routines is rather pointless on a virtual device
like virtiofs, it also isn't harmful at all.  So just use _copy_mc_to_iter
unconditionally to simplify the code.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 10 --
 fs/fuse/virtio_fs.c |  1 -
 include/linux/dax.h |  1 -
 3 files changed, 12 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index ff676a07480c8..fe783234ca669 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -107,8 +107,6 @@ enum dax_device_flags {
DAXDEV_SYNC,
/* do not use uncached operations to write data */
DAXDEV_CACHED,
-   /* do not use mcsafe operations to read data */
-   DAXDEV_NOMCSAFE,
 };
 
 /**
@@ -171,8 +169,6 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
 * via access_ok() in vfs_read, so use the 'no check' version to bypass
 * the HARDENED_USERCOPY overhead.
 */
-   if (test_bit(DAXDEV_NOMCSAFE, &dax_dev->flags))
-   return _copy_to_iter(addr, bytes, i);
return _copy_mc_to_iter(addr, bytes, i);
 }
 
@@ -242,12 +238,6 @@ void set_dax_cached(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(set_dax_cached);
 
-void set_dax_nomcsafe(struct dax_device *dax_dev)
-{
-   set_bit(DAXDEV_NOMCSAFE, &dax_dev->flags);
-}
-EXPORT_SYMBOL_GPL(set_dax_nomcsafe);
-
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(_srcu);
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 754319ce2a29b..d9c20b148ac19 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -838,7 +838,6 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
if (IS_ERR(fs->dax_dev))
return PTR_ERR(fs->dax_dev);
set_dax_cached(fs->dax_dev);
-   set_dax_nomcsafe(fs->dax_dev);
return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax,
fs->dax_dev);
 }
diff --git a/include/linux/dax.h b/include/linux/dax.h
index d22cbf03d37d2..d267331bc37e7 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -90,7 +90,6 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
 #endif
 
 void set_dax_cached(struct dax_device *dax_dev);
-void set_dax_nomcsafe(struct dax_device *dax_dev);
 
 struct writeback_control;
 #if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
-- 
2.30.2



devirtualize kernel access to DAX

2021-12-08 Thread Christoph Hellwig
Hi Dan,

this series cleans up a few loose ends and then removes the
copy_from_iter and copy_to_iter dax_operations methods in favor of
straight calls.

Diffstat:
 drivers/dax/bus.c |3 +
 drivers/dax/super.c   |   40 ++---
 drivers/md/dm-linear.c|   20 --
 drivers/md/dm-log-writes.c|   80 --
 drivers/md/dm-stripe.c|   20 --
 drivers/md/dm.c   |   52 ---
 drivers/nvdimm/pmem.c |   27 +-
 drivers/s390/block/dcssblk.c  |   18 +
 fs/dax.c  |5 --
 fs/fuse/virtio_fs.c   |   20 +-
 include/linux/dax.h   |   28 +++---
 include/linux/device-mapper.h |4 --
 include/linux/uio.h   |   20 --
 13 files changed, 44 insertions(+), 293 deletions(-)


[PATCH 2/5] dax: simplify dax_synchronous and set_dax_synchronous

2021-12-08 Thread Christoph Hellwig
Remove the pointless wrappers.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c |  8 
 include/linux/dax.h | 12 ++--
 2 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e7152a6c4cc40..e18155f43a635 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -208,17 +208,17 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
-bool __dax_synchronous(struct dax_device *dax_dev)
+bool dax_synchronous(struct dax_device *dax_dev)
 {
return test_bit(DAXDEV_SYNC, &dax_dev->flags);
 }
-EXPORT_SYMBOL_GPL(__dax_synchronous);
+EXPORT_SYMBOL_GPL(dax_synchronous);
 
-void __set_dax_synchronous(struct dax_device *dax_dev)
+void set_dax_synchronous(struct dax_device *dax_dev)
 {
set_bit(DAXDEV_SYNC, &dax_dev->flags);
 }
-EXPORT_SYMBOL_GPL(__set_dax_synchronous);
+EXPORT_SYMBOL_GPL(set_dax_synchronous);
 
 bool dax_alive(struct dax_device *dax_dev)
 {
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 87ae4c9b1d65b..3bd1fdb5d5f4b 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -48,16 +48,8 @@ void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
-bool __dax_synchronous(struct dax_device *dax_dev);
-static inline bool dax_synchronous(struct dax_device *dax_dev)
-{
-   return  __dax_synchronous(dax_dev);
-}
-void __set_dax_synchronous(struct dax_device *dax_dev);
-static inline void set_dax_synchronous(struct dax_device *dax_dev)
-{
-   __set_dax_synchronous(dax_dev);
-}
+bool dax_synchronous(struct dax_device *dax_dev);
+void set_dax_synchronous(struct dax_device *dax_dev);
 /*
  * Check if given mapping is supported by the file / underlying device.
  */
-- 
2.30.2



Re: [PATCH v1 6/7] vdpa: Add support for querying control virtqueue index

2021-12-08 Thread Jason Wang
On Thu, Dec 9, 2021 at 4:15 AM Eli Cohen  wrote:
>
> Add netlink attribute and callback function to query the control VQ
> index of a device.
>
> Example:
>
> $ vdpa dev config show vdpa-a
>   vdpa-a: mac 00:00:00:00:88:88 link up link_announce false max_vq_pairs 5 \
>   mtu 9000 ctrl_vq_idx 10


I still wonder about the motivation for this. And as discussed, the
ctrl_vq_idx varies depending on whether MQ is negotiated.

Thanks

>
> Signed-off-by: Eli Cohen 
> ---
> v0 -> v1:
> 1. Use logic defined in the spec to deduce virtqueue index.
>
>  drivers/vdpa/vdpa.c   | 25 +
>  include/uapi/linux/vdpa.h |  1 +
>  2 files changed, 26 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index 3bf016e03512..b4d4b8a7ca4e 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -712,6 +712,27 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff 
> *msg, struct netlink_callba
> return msg->len;
>  }
>
> +static int vdpa_dev_net_ctrl_vq_fill(struct vdpa_device *vdev,
> +struct sk_buff *msg,
> +struct virtio_net_config *config,
> +u64 features)
> +{
> +   u16 N;
> +
> +   /* control VQ index, if available, is deduced according to the logic
> +* described in the virtio spec in section 5.1.2
> +*/
> +   if (!(features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
> +   return 0;
> +
> +   if (features & BIT_ULL(VIRTIO_NET_F_MQ))
> +   N = le16_to_cpu(config->max_virtqueue_pairs);
> +   else
> +   N = 1;
> +
> +   return nla_put_u16(msg, VDPA_ATTR_DEV_CTRL_VQ_IDX, 2 * N);
> +}
> +
>  static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
>struct sk_buff *msg, u64 features,
>const struct virtio_net_config *config)
> @@ -730,6 +751,7 @@ static int vdpa_dev_net_config_fill(struct vdpa_device 
> *vdev, struct sk_buff *ms
> struct virtio_net_config config = {};
> u64 features;
> u16 val_u16;
> +   int err;
>
> vdpa_get_config(vdev, 0, &config, sizeof(config));
>
> @@ -746,6 +768,9 @@ static int vdpa_dev_net_config_fill(struct vdpa_device 
> *vdev, struct sk_buff *ms
> return -EMSGSIZE;
>
> features = vdev->config->get_driver_features(vdev);
> +   err = vdpa_dev_net_ctrl_vq_fill(vdev, msg, &config, features);
> +   if (err)
> +   return err;
>
> return vdpa_dev_net_mq_config_fill(vdev, msg, features, &config);
>  }
> diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
> index a252f06f9dfd..2e3a7f89f42d 100644
> --- a/include/uapi/linux/vdpa.h
> +++ b/include/uapi/linux/vdpa.h
> @@ -34,6 +34,7 @@ enum vdpa_attr {
> VDPA_ATTR_DEV_MAX_VQS,  /* u32 */
> VDPA_ATTR_DEV_MAX_VQ_SIZE,  /* u16 */
> VDPA_ATTR_DEV_MIN_VQ_SIZE,  /* u16 */
> +   VDPA_ATTR_DEV_CTRL_VQ_IDX,  /* u16 */
>
> VDPA_ATTR_DEV_NET_CFG_MACADDR,  /* binary */
> VDPA_ATTR_DEV_NET_STATUS,   /* u8 */
> --
> 2.33.1
>



Re: [PATCH v1 5/7] vdpa/mlx5: Support configuring max data virtqueue pairs

2021-12-08 Thread Jason Wang
On Thu, Dec 9, 2021 at 4:15 AM Eli Cohen  wrote:
>
> Check whether the max number of data virtqueue pairs was provided when a
> adding a new device and verify the new value does not exceed device
> capabilities.
>
> In addition, change the arrays holding virtqueue and callback contexts
> to be dynamically allocated.
>
> Signed-off-by: Eli Cohen 

Acked-by: Jason Wang 

> ---
> v0->v1:
> 1. Remove requirement to configure power of two virtqueue pairs
> 2. Fix cleanup if kcalloc failed
>
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 43 +--
>  1 file changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 1ec29879fc2c..4f0b8bba8b58 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -131,11 +131,6 @@ struct mlx5_vdpa_virtqueue {
> struct mlx5_vq_restore_info ri;
>  };
>
> -/* We will remove this limitation once mlx5_vdpa_alloc_resources()
> - * provides for driver space allocation
> - */
> -#define MLX5_MAX_SUPPORTED_VQS 16
> -
>  static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 idx)
>  {
> if (unlikely(idx > mvdev->max_idx))
> @@ -148,8 +143,8 @@ struct mlx5_vdpa_net {
> struct mlx5_vdpa_dev mvdev;
> struct mlx5_vdpa_net_resources res;
> struct virtio_net_config config;
> -   struct mlx5_vdpa_virtqueue vqs[MLX5_MAX_SUPPORTED_VQS];
> -   struct vdpa_callback event_cbs[MLX5_MAX_SUPPORTED_VQS + 1];
> +   struct mlx5_vdpa_virtqueue *vqs;
> +   struct vdpa_callback *event_cbs;
>
> /* Serialize vq resources creation and destruction. This is required
>  * since memory map might change and we need to destroy and create
> @@ -1218,7 +1213,7 @@ static void suspend_vqs(struct mlx5_vdpa_net *ndev)
>  {
> int i;
>
> -   for (i = 0; i < MLX5_MAX_SUPPORTED_VQS; i++)
> +   for (i = 0; i < ndev->mvdev.max_vqs; i++)
> suspend_vq(ndev, &ndev->vqs[i]);
>  }
>
> @@ -1245,7 +1240,7 @@ static int create_rqt(struct mlx5_vdpa_net *ndev)
> int i, j;
> int err;
>
> -   max_rqt = min_t(int, MLX5_MAX_SUPPORTED_VQS / 2,
> +   max_rqt = min_t(int, roundup_pow_of_two(ndev->mvdev.max_vqs  / 2),
> 1 << MLX5_CAP_GEN(ndev->mvdev.mdev, 
> log_max_rqt_size));
> if (max_rqt < 1)
> return -EOPNOTSUPP;
> @@ -2219,7 +2214,7 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev)
> clear_vqs_ready(ndev);
> mlx5_vdpa_destroy_mr(&ndev->mvdev);
> ndev->mvdev.status = 0;
> -   memset(ndev->event_cbs, 0, sizeof(ndev->event_cbs));
> +   memset(ndev->event_cbs, 0, sizeof(*ndev->event_cbs) * (mvdev->max_vqs 
> + 1));
> ndev->mvdev.actual_features = 0;
> ++mvdev->generation;
> if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
> @@ -2292,6 +2287,8 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
> }
> mlx5_vdpa_free_resources(&ndev->mvdev);
> mutex_destroy(&ndev->reslock);
> +   kfree(ndev->event_cbs);
> +   kfree(ndev->vqs);
>  }
>
>  static struct vdpa_notification_area mlx5_get_vq_notification(struct 
> vdpa_device *vdev, u16 idx)
> @@ -2537,15 +2534,33 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
> *v_mdev, const char *name,
> return -EOPNOTSUPP;
> }
>
> -   /* we save one virtqueue for control virtqueue should we require it */
> max_vqs = MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues);
> -   max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
> +   if (max_vqs < 2) {
> +   dev_warn(mdev->device,
> +"%d virtqueues are supported. At least 2 are 
> required\n",
> +max_vqs);
> +   return -EAGAIN;
> +   }
> +
> +   if (add_config->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) {
> +   if (add_config->net.max_vq_pairs > max_vqs / 2)
> +   return -EINVAL;
> +   max_vqs = min_t(u32, max_vqs, 2 * 
> add_config->net.max_vq_pairs);
> +   } else {
> +   max_vqs = 2;
> +   }
>
> ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, 
> mdev->device, &mlx5_vdpa_ops,
>  name, false);
> if (IS_ERR(ndev))
> return PTR_ERR(ndev);
>
> +   ndev->vqs = kcalloc(max_vqs, sizeof(*ndev->vqs), GFP_KERNEL);
> +   ndev->event_cbs = kcalloc(max_vqs + 1, sizeof(*ndev->event_cbs), 
> GFP_KERNEL);
> +   if (!ndev->vqs || !ndev->event_cbs) {
> +   err = -ENOMEM;
> +   goto err_alloc;
> +   }
> ndev->mvdev.max_vqs = max_vqs;
> mvdev = >mvdev;
> mvdev->mdev = mdev;
> @@ -2626,6 +2641,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
> *v_mdev, const char *name,
> mlx5_mpfs_del_mac(pfmdev, config->mac);
>  err_mtu:
> 

Re: [PATCH v1 3/7] vdpa: Allow to configure max data virtqueues

2021-12-08 Thread Jason Wang
On Thu, Dec 9, 2021 at 8:25 AM Si-Wei Liu  wrote:
>
>
>
> On 12/8/2021 12:14 PM, Eli Cohen wrote:
> > Add netlink support to configure the max virtqueue pairs for a device.
> > At least one pair is required. The maximum is dictated by the device.
> >
> > Example:
> >
> > $ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 5
> Not this patch, but I think there should be a mega attribute defined
> ahead to specify the virtio class/type to create vdpa instance with.
> Like the existing ones e.g. mac_addr and mtu, max_vqp is also vdpa net
> specific attribute.

Probably, but this only works for the case when a single parent is
allowed to create different types of devices. It looks to me the
current model to have a per type parent.

Thanks

>
> -Siwei
>
> >
> > Signed-off-by: Eli Cohen 
> > ---
> > v0 -> v1:
> > 1. fix typo
> > 2. move max_vq_pairs to net specific struct
> >
> >   drivers/vdpa/vdpa.c  | 14 +-
> >   include/linux/vdpa.h |  1 +
> >   2 files changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> > index c37d384c0f33..3bf016e03512 100644
> > --- a/drivers/vdpa/vdpa.c
> > +++ b/drivers/vdpa/vdpa.c
> > @@ -480,7 +480,8 @@ vdpa_nl_cmd_mgmtdev_get_dumpit(struct sk_buff *msg, 
> > struct netlink_callback *cb)
> >   }
> >
> >   #define VDPA_DEV_NET_ATTRS_MASK ((1 << VDPA_ATTR_DEV_NET_CFG_MACADDR) | \
> > -  (1 << VDPA_ATTR_DEV_NET_CFG_MTU))
> > +  (1 << VDPA_ATTR_DEV_NET_CFG_MTU) | \
> > +  (1 << VDPA_ATTR_DEV_NET_CFG_MAX_VQP))
> >
> >   static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct 
> > genl_info *info)
> >   {
> > @@ -506,6 +507,17 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff 
> > *skb, struct genl_info *i
> >   nla_get_u16(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MTU]);
> >   config.mask |= (1 << VDPA_ATTR_DEV_NET_CFG_MTU);
> >   }
> > + if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]) {
> > + config.net.max_vq_pairs =
> > + nla_get_u16(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]);
> > + if (!config.net.max_vq_pairs) {
> > + NL_SET_ERR_MSG_MOD(info->extack,
> > +"At least one pair of VQs is 
> > required");
> > + err = -EINVAL;
> > + goto err;
> > + }
> > + config.mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
> > + }
> >
> >   /* Skip checking capability if user didn't prefer to configure any
> >* device networking attributes. It is likely that user might have 
> > used
> > diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> > index db24317d61b6..b62032573780 100644
> > --- a/include/linux/vdpa.h
> > +++ b/include/linux/vdpa.h
> > @@ -99,6 +99,7 @@ struct vdpa_dev_set_config {
> >   struct {
> >   u8 mac[ETH_ALEN];
> >   u16 mtu;
> > + u16 max_vq_pairs;
> >   } net;
> >   u64 mask;
> >   };
>



Re: [PATCH v1 1/7] vdpa: Provide interface to read driver features

2021-12-08 Thread Jason Wang
On Thu, Dec 9, 2021 at 4:14 AM Eli Cohen  wrote:
>
> Provide an interface to read the negotiated features. This is needed
> when building the netlink message in vdpa_dev_net_config_fill().
>
> Also fix the implementation of vdpa_dev_net_config_fill() to use the
> negotiated features instead of the device features.
>
> To make APIs clearer, make the following name changes to struct
> vdpa_config_ops so they better describe their operations:
>
> get_features -> get_device_features
> set_features -> set_driver_features
>
> Finally, add get_driver_features to return the negotiated features and
> add implementation to all the upstream drivers.
>
> Signed-off-by: Eli Cohen 
> ---
>  drivers/vdpa/alibaba/eni_vdpa.c| 16 
>  drivers/vdpa/ifcvf/ifcvf_main.c| 16 
>  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 16 
>  drivers/vdpa/vdpa.c|  2 +-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c   | 21 +++--
>  drivers/vdpa/vdpa_user/vduse_dev.c | 16 
>  drivers/vdpa/virtio_pci/vp_vdpa.c  | 16 
>  drivers/vhost/vdpa.c   |  2 +-
>  drivers/virtio/virtio_vdpa.c   |  2 +-
>  include/linux/vdpa.h   | 14 +-
>  10 files changed, 87 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/vdpa/alibaba/eni_vdpa.c b/drivers/vdpa/alibaba/eni_vdpa.c
> index 3f788794571a..ac0975660f4d 100644
> --- a/drivers/vdpa/alibaba/eni_vdpa.c
> +++ b/drivers/vdpa/alibaba/eni_vdpa.c
> @@ -58,7 +58,7 @@ static struct virtio_pci_legacy_device *vdpa_to_ldev(struct 
> vdpa_device *vdpa)
> return &eni_vdpa->ldev;
>  }
>
> -static u64 eni_vdpa_get_features(struct vdpa_device *vdpa)
> +static u64 eni_vdpa_get_device_features(struct vdpa_device *vdpa)
>  {
> struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
> u64 features = vp_legacy_get_features(ldev);
> @@ -69,7 +69,7 @@ static u64 eni_vdpa_get_features(struct vdpa_device *vdpa)
> return features;
>  }
>
> -static int eni_vdpa_set_features(struct vdpa_device *vdpa, u64 features)
> +static int eni_vdpa_set_driver_features(struct vdpa_device *vdpa, u64 
> features)
>  {
> struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
>
> @@ -84,6 +84,13 @@ static int eni_vdpa_set_features(struct vdpa_device *vdpa, 
> u64 features)
> return 0;
>  }
>
> +static u64 eni_vdpa_get_driver_features(struct vdpa_device *vdpa)
> +{
> +   struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
> +
> +   return vp_legacy_get_driver_features(ldev);
> +}
> +
>  static u8 eni_vdpa_get_status(struct vdpa_device *vdpa)
>  {
> struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
> @@ -401,8 +408,9 @@ static void eni_vdpa_set_config_cb(struct vdpa_device 
> *vdpa,
>  }
>
>  static const struct vdpa_config_ops eni_vdpa_ops = {
> -   .get_features   = eni_vdpa_get_features,
> -   .set_features   = eni_vdpa_set_features,
> +   .get_device_features = eni_vdpa_get_device_features,
> +   .set_driver_features = eni_vdpa_set_driver_features,
> +   .get_driver_features = eni_vdpa_get_driver_features,
> .get_status = eni_vdpa_get_status,
> .set_status = eni_vdpa_set_status,
> .reset  = eni_vdpa_reset,
> diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
> index 6dc75ca70b37..6a6a0a462392 100644
> --- a/drivers/vdpa/ifcvf/ifcvf_main.c
> +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> @@ -169,7 +169,7 @@ static struct ifcvf_hw *vdpa_to_vf(struct vdpa_device 
> *vdpa_dev)
> return &adapter->vf;
>  }
>
> -static u64 ifcvf_vdpa_get_features(struct vdpa_device *vdpa_dev)
> +static u64 ifcvf_vdpa_get_device_features(struct vdpa_device *vdpa_dev)
>  {
> struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev);
> struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
> @@ -187,7 +187,7 @@ static u64 ifcvf_vdpa_get_features(struct vdpa_device 
> *vdpa_dev)
> return features;
>  }
>
> -static int ifcvf_vdpa_set_features(struct vdpa_device *vdpa_dev, u64 
> features)
> +static int ifcvf_vdpa_set_driver_features(struct vdpa_device *vdpa_dev, u64 
> features)
>  {
> struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
> int ret;
> @@ -201,6 +201,13 @@ static int ifcvf_vdpa_set_features(struct vdpa_device 
> *vdpa_dev, u64 features)
> return 0;
>  }
>
> +static u64 ifcvf_vdpa_get_driver_features(struct vdpa_device *vdpa_dev)
> +{
> +   struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
> +
> +   return vf->req_features;
> +}
> +
>  static u8 ifcvf_vdpa_get_status(struct vdpa_device *vdpa_dev)
>  {
> struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
> @@ -443,8 +450,9 @@ static struct vdpa_notification_area 
> ifcvf_get_vq_notification(struct vdpa_devic
>   * implemented set_map()/dma_map()/dma_unmap()
>   */
>  static const struct vdpa_config_ops ifc_vdpa_ops = {
> -   .get_features   = ifcvf_vdpa_get_features,
> -   

Re: [PATCH 2/2 v4] vdpa: check that offsets are within bounds

2021-12-08 Thread Jason Wang
On Wed, Dec 8, 2021 at 6:34 PM Dan Carpenter  wrote:
>
> In this function "c->off" is a u32 and "size" is a long.  On 64bit systems
> if "c->off" is greater than "size" then "size - c->off" is a negative and
> we always return -E2BIG.  But on 32bit systems the subtraction is type
> promoted to a high positive u32 value and basically any "c->len" is
> accepted.
>
> Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> Reported-by: Xie Yongji 
> Signed-off-by: Dan Carpenter 

Acked-by: Jason Wang 

> ---
> v4: split into a separate patch
>
>  drivers/vhost/vdpa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 29cced1cd277..e3c4f059b21a 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -197,7 +197,7 @@ static int vhost_vdpa_config_validate(struct vhost_vdpa 
> *v,
> struct vdpa_device *vdpa = v->vdpa;
> long size = vdpa->config->get_config_size(vdpa);
>
> -   if (c->len == 0)
> +   if (c->len == 0 || c->off > size)
> return -EINVAL;
>
> if (c->len > size - c->off)
> --
> 2.20.1
>



Re: [PATCH 1/2 v4] vduse: fix memory corruption in vduse_dev_ioctl()

2021-12-08 Thread Jason Wang
On Wed, Dec 8, 2021 at 6:33 PM Dan Carpenter  wrote:
>
> The "config.offset" comes from the user.  There needs to a check to
> prevent it being out of bounds.  The "config.offset" and
> "dev->config_size" variables are both type u32.  So if the offset if
> out of bounds then the "dev->config_size - config.offset" subtraction
> results in a very high u32 value.  The out of bounds offset can result
> in memory corruption.
>
> Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
> Signed-off-by: Dan Carpenter 
> ---
> v2: fix reversed if statement
> v3: fix vhost_vdpa_config_validate() as pointed out by Yongji Xie.
> v4: split the vhost_vdpa_config_validate() change into a separate path

Acked-by: Jason Wang 

>
>  drivers/vdpa/vdpa_user/vduse_dev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> b/drivers/vdpa/vdpa_user/vduse_dev.c
> index c9204c62f339..1a206f95d73a 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -975,7 +975,8 @@ static long vduse_dev_ioctl(struct file *file, unsigned 
> int cmd,
> break;
>
> ret = -EINVAL;
> -   if (config.length == 0 ||
> +   if (config.offset > dev->config_size ||
> +   config.length == 0 ||
> config.length > dev->config_size - config.offset)
> break;
>
> --
> 2.20.1
>



Re: [PATCH V2] virtio-mmio: harden interrupt

2021-12-08 Thread Jason Wang
On Thu, Dec 9, 2021 at 4:27 AM Michael S. Tsirkin  wrote:
>
> On Fri, Nov 26, 2021 at 12:41:02PM +0800, Jason Wang wrote:
> > This patch tries to make sure the virtio interrupt handler for MMIO
> > won't be called after a reset and before virtio_device_ready(). We
> > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > (IRQF_SHARED). So this patch tracks the interrupt enabling status in a
> > new intr_soft_enabled variable and toggle it during in
> > vm_disable/enable_interrupts(). The MMIO interrupt handler will check
> > intr_soft_enabled before processing the actual interrupt.
> >
> > Signed-off-by: Jason Wang 
> > ---
> > Changes since V1:
> > - Silent compling warnings
> >  drivers/virtio/virtio_mmio.c | 37 
> >  1 file changed, 37 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > index 56128b9c46eb..c517afdd2cc5 100644
> > --- a/drivers/virtio/virtio_mmio.c
> > +++ b/drivers/virtio/virtio_mmio.c
> > @@ -90,6 +90,7 @@ struct virtio_mmio_device {
> >   /* a list of queues so we can dispatch IRQs */
> >   spinlock_t lock;
> >   struct list_head virtqueues;
> > + bool intr_soft_enabled;
> >  };
> >
> >  struct virtio_mmio_vq_info {
> > @@ -100,7 +101,37 @@ struct virtio_mmio_vq_info {
> >   struct list_head node;
> >  };
> >
> > +/* disable irq handlers */
> > +static void vm_disable_cbs(struct virtio_device *vdev)
> > +{
> > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> > + int irq = platform_get_irq(vm_dev->pdev, 0);
> >
> > + /*
> > +  * The below synchronize() guarantees that any
> > +  * interrupt for this line arriving after
> > +  * synchronize_irq() has completed is guaranteed to see
> > +  * intx_soft_enabled == false.
> > +  */
> > + WRITE_ONCE(vm_dev->intr_soft_enabled, false);
> > + synchronize_irq(irq);
> > +}
> > +
> > +/* enable irq handlers */
> > +static void vm_enable_cbs(struct virtio_device *vdev)
> > +{
> > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> > + int irq = platform_get_irq(vm_dev->pdev, 0);
> > +
> > + disable_irq(irq);
> > + /*
> > +  * The above disable_irq() provides TSO ordering and
> > +  * as such promotes the below store to store-release.
> > +  */
> > + WRITE_ONCE(vm_dev->intr_soft_enabled, true);
> > + enable_irq(irq);
> > + return;
> > +}
> >
> >  /* Configuration interface */
> >
> > @@ -262,6 +293,8 @@ static void vm_reset(struct virtio_device *vdev)
> >
> >   /* 0 status means a reset. */
> >   writel(0, vm_dev->base + VIRTIO_MMIO_STATUS);
>
> There was a discussion about reading status to make sure it is clear.
> The spec says we should, this can't hurt as a further hardening measure.
> In fact, let's do it in the core maybe? Spec says it applies to all
> devices ...

We can do that, but I'm not sure whether it would break some existing devices.

>
> > + /* Disable VQ/configuration callbacks. */
> > + vm_disable_cbs(vdev);
> >  }
> >
> >
> > @@ -288,6 +321,9 @@ static irqreturn_t vm_interrupt(int irq, void *opaque)
> >   unsigned long flags;
> >   irqreturn_t ret = IRQ_NONE;
> >
> > + if (!READ_ONCE(vm_dev->intr_soft_enabled))
> > + return IRQ_NONE;
> > +
>
> So if the write is seen before reset happened (should not happen, but we
> are talking a buggy device) then it won't be acknowledged and device
> will keep pulling the interrupt. I think as long as we are hardening
> this, let's go the full mile and try to avoid DoS if we can, do the
> check before invoking the callback, but do not skip the read.
> Whether to still return IRQ_NONE is a good question.

Did you mean something like this:

/* Read and acknowledge interrupts */
status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);

if (status)
        ret = IRQ_HANDLED;

if (!READ_ONCE(vm_dev->intr_soft_enabled))
        return ret;

Thanks

>
>
>
>
> >   /* Read and acknowledge interrupts */
> >   status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
> >   writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
> > @@ -529,6 +565,7 @@ static bool vm_get_shm_region(struct virtio_device 
> > *vdev,
> >  }
> >
> >  static const struct virtio_config_ops virtio_mmio_config_ops = {
> > + .enable_cbs = vm_enable_cbs,
> >   .get= vm_get,
> >   .set= vm_set,
> >   .generation = vm_generation,
> > --
> > 2.25.1
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 5/7] vhost_vsock: simplify vhost_vsock_flush()

2021-12-08 Thread Jason Wang
On Thu, Dec 9, 2021 at 1:35 AM  wrote:
>
> On 12/7/21 9:53 PM, Jason Wang wrote:
> > On Tue, Dec 7, 2021 at 10:45 AM Mike Christie
> >  wrote:
> >>
> >> From: Andrey Ryabinin 
> >>
> >> vhost_vsock_flush() calls vhost_work_dev_flush(vsock->vqs[i].poll.dev)
> >> before vhost_work_dev_flush(&vsock->dev). This seems pointless
> >> as vsock->vqs[i].poll.dev is the same as &vsock->dev, and several flushes
> >> in a row don't do anything useful; one is just enough.
> >>
> >> Signed-off-by: Andrey Ryabinin 
> >> Reviewed-by: Stefano Garzarella 
> >> Signed-off-by: Mike Christie 
> >> ---
> >>  drivers/vhost/vsock.c | 5 -
> >>  1 file changed, 5 deletions(-)
> >>
> >> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> >> index 2339587bcd31..1f38160b249d 100644
> >> --- a/drivers/vhost/vsock.c
> >> +++ b/drivers/vhost/vsock.c
> >> @@ -703,11 +703,6 @@ static int vhost_vsock_dev_open(struct inode *inode, 
> >> struct file *file)
> >>
> >>  static void vhost_vsock_flush(struct vhost_vsock *vsock)
> >>  {
> >> -   int i;
> >> -
> >> -   for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++)
> >> -   if (vsock->vqs[i].handle_kick)
> >> -   vhost_work_dev_flush(vsock->vqs[i].poll.dev);
> >> vhost_work_dev_flush(&vsock->dev);
> >>  }
> >
> > Would it be better to be consistent with vhost-net so that we can simply
> > remove the vhost_vsock_flush() wrapper here?
> >
>
> I didn't understand that comment.
>
> Did you mean consistent as they both have vhost_vsock/net_flush functions
> or as in they prefer to not have one line wrappers?
>
> Before and after this patchset, both net and vsock have a 
> vhost_vsock/net_flush
> function, so maybe you didn't mean that.
>
> I think the wrapper is not very useful and could be removed. However,
> I thought vsock preferred wrappers because we have vhost_vsock_free
> which is just a wrapper around kfree. I also noticed test.c is a
> fan of one line wrappers, but I see net and scsi do not do that.

Ok, then I'm fine with this.

Thanks

>



Re: [PATCH 6/7] vhost-scsi: drop flush after vhost_dev_cleanup

2021-12-08 Thread Jason Wang
On Thu, Dec 9, 2021 at 12:45 AM Mike Christie
 wrote:
>
> On 12/7/21 10:07 PM, Jason Wang wrote:
> > On Tue, Dec 7, 2021 at 10:45 AM Mike Christie
> >  wrote:
> >>
> >> The flush after vhost_dev_cleanup is not needed because:
> >>
> >> 1. It doesn't do anything. vhost_dev_cleanup will stop the worker thread,
> >> so the flush call will just return since the worker has no device.
> >>
> >> 2. It's not needed for the re-queue case. vhost_scsi_evt_handle_kick grabs
> >> the mutex and if the backend is NULL will return without queueing a work.
> >> vhost_scsi_clear_endpoint will set the backend to NULL under the vq->mutex
> >> then drops the mutex and does a flush. So we know when
> >> vhost_scsi_clear_endpoint has dropped the mutex after clearing the backend
> >> no evt related work will be able to requeue. The flush would then make sure
> >> any queued evts are run and return.
> >
> > What happens if a kick after vhost_scsi_clear_endpoint() but before
> > vhost_dev_stop()?
>
> vhost_dev_stop also does a flush, so:
>
> 1. The kick handler would see the backend is null and return immediately.
> 2. The flush in vhost_dev_stop would wait for those kicks in #1 to complete.

You are right.

So

Acked-by: Jason Wang 

>
>
> >
> > Is this safe, or does the kthread_stop() in vhost_dev_cleanup() make us safe?
> >
> > Thanks
> >
> >>
> >> Signed-off-by: Mike Christie 
> >> ---
> >>  drivers/vhost/scsi.c | 2 --
> >>  1 file changed, 2 deletions(-)
> >>
> >> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> >> index 532e204f2b1b..94535c813ef7 100644
> >> --- a/drivers/vhost/scsi.c
> >> +++ b/drivers/vhost/scsi.c
> >> @@ -1827,8 +1827,6 @@ static int vhost_scsi_release(struct inode *inode, 
> >> struct file *f)
> >> vhost_scsi_clear_endpoint(vs, );
> >> vhost_dev_stop(&vs->dev);
> >> vhost_dev_cleanup(&vs->dev);
> >> -   /* Jobs can re-queue themselves in evt kick handler. Do extra 
> >> flush. */
> >> -   vhost_scsi_flush(vs);
> >> kfree(vs->dev.vqs);
> >> kvfree(vs);
> >> return 0;
> >> --
> >> 2.25.1
> >>
> >
>



Re: [PATCH 1/7] vhost: get rid of vhost_poll_flush() wrapper

2021-12-08 Thread Jason Wang
On Thu, Dec 9, 2021 at 12:41 AM Mike Christie
 wrote:
>
> On 12/7/21 9:49 PM, Jason Wang wrote:
> > On Tue, Dec 7, 2021 at 10:45 AM Mike Christie
> >  wrote:
> >>
> >> vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush().
> >> It gives wrong impression that we are doing some work over vhost_poll,
> >> while in fact it flushes vhost_poll->dev.
> >
> > This "problem" is a byproduct of 7235acdb1144 ("vhost: simplify work 
> > flushing").
> >
> > Before that we indeed had a per-poll flush.
> >
> >> It only complicate understanding of the code and leads to mistakes
> >> like flushing the same vhost_dev several times in a row.
> >>
> >> Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly.
> >
> > Not a native speaker, but since we don't have a per-work flush, is it
> > better to rename this simply as vhost_flush()?
> >
>
> What about vhost_dev_flush?
>
> For the existing naming when we have a function exported we tend to have
> "vhost_" then the object/struct it works on then the action.
>
> For work we have:
>
> vhost_work_queue/init
>
> (we also have vhost_has_work which doesn't follow that pattern but
> would sound strange as vhost_work_has so ignore that one).
>
> For dev operations we have:
>
> vhost_dev_reset_owner/set_owner/has_owner/cleanup/init
>
> For the flush operation I wanted it to reflect it flushed all work
> on the device, so I mashed up the work and dev naming above and
> I agree it's a little strange.

It looks fine to me.

Thanks

>



Re: [PATCH v1 6/7] vdpa: Add support for querying control virtqueue index

2021-12-08 Thread Si-Wei Liu



On 12/8/2021 12:14 PM, Eli Cohen wrote:

Add netlink attribute and callback function to query the control VQ
index of a device.

Example:

$ vdpa dev config show vdpa-a
   vdpa-a: mac 00:00:00:00:88:88 link up link_announce false max_vq_pairs 5 \
   mtu 9000 ctrl_vq_idx 10
First, I am not sure there's value, or a case showing, that exposing the 
guest's ctrl_vq_idx value (a running state) to admin users turns out to be 
useful. Previously I thought you wanted to expose it to QEMU, but this seems 
a bit redundant, since it can be deduced from the max_vqp passed up to QEMU. 
Second, I don't feel running states such as link_announce and ctrl_vq_idx 
qualify as configuration fields that can be displayed in 'vdpa dev config 
show'. Could you please clarify the scope of what kind of info this command 
should cover?
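(For reference, the deduction the patch implements, per virtio spec section
5.1.2, can be sketched in a few lines. This is an illustrative sketch, not
kernel code; the function name and parameters are hypothetical, with
`max_vq_pairs` standing in for the device's max_virtqueue_pairs config field.)

```python
def ctrl_vq_index(has_ctrl_vq, has_mq, max_vq_pairs):
    """Deduce the control VQ index per virtio spec section 5.1.2.

    Data VQs occupy indices 0..2N-1 (receiveq1, transmitq1, ...);
    the control VQ, when negotiated, is queue index 2N.
    """
    if not has_ctrl_vq:
        return None            # no control VQ without VIRTIO_NET_F_CTRL_VQ
    n = max_vq_pairs if has_mq else 1   # N = 1 without VIRTIO_NET_F_MQ
    return 2 * n

# matches the example above: max_vq_pairs 5 -> ctrl_vq_idx 10
```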


-Siwei



Signed-off-by: Eli Cohen 
---
v0 -> v1:
1. Use logic defined in the spec to deduce virtqueue index.

  drivers/vdpa/vdpa.c   | 25 +
  include/uapi/linux/vdpa.h |  1 +
  2 files changed, 26 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 3bf016e03512..b4d4b8a7ca4e 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -712,6 +712,27 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff *msg, 
struct netlink_callba
return msg->len;
  }
  
+static int vdpa_dev_net_ctrl_vq_fill(struct vdpa_device *vdev,

+struct sk_buff *msg,
+struct virtio_net_config *config,
+u64 features)
+{
+   u16 N;
+
+   /* control VQ index, if available, is deduced according to the logic
+* described in the virtio spec in section 5.1.2
+*/
+   if (!(features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
+   return 0;
+
+   if (features & BIT_ULL(VIRTIO_NET_F_MQ))
+   N = le16_to_cpu(config->max_virtqueue_pairs);
+   else
+   N = 1;
+
+   return nla_put_u16(msg, VDPA_ATTR_DEV_CTRL_VQ_IDX, 2 * N);
+}
+
  static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
   struct sk_buff *msg, u64 features,
   const struct virtio_net_config *config)
@@ -730,6 +751,7 @@ static int vdpa_dev_net_config_fill(struct vdpa_device 
*vdev, struct sk_buff *ms
struct virtio_net_config config = {};
u64 features;
u16 val_u16;
+   int err;
  
  	vdpa_get_config(vdev, 0, &config, sizeof(config));
  
@@ -746,6 +768,9 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *ms

return -EMSGSIZE;
  
  	features = vdev->config->get_driver_features(vdev);

+   err = vdpa_dev_net_ctrl_vq_fill(vdev, msg, &config, features);
+   if (err)
+   return err;
  
	return vdpa_dev_net_mq_config_fill(vdev, msg, features, &config);

  }
diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
index a252f06f9dfd..2e3a7f89f42d 100644
--- a/include/uapi/linux/vdpa.h
+++ b/include/uapi/linux/vdpa.h
@@ -34,6 +34,7 @@ enum vdpa_attr {
VDPA_ATTR_DEV_MAX_VQS,  /* u32 */
VDPA_ATTR_DEV_MAX_VQ_SIZE,  /* u16 */
VDPA_ATTR_DEV_MIN_VQ_SIZE,  /* u16 */
+   VDPA_ATTR_DEV_CTRL_VQ_IDX,  /* u16 */
  
  	VDPA_ATTR_DEV_NET_CFG_MACADDR,		/* binary */

VDPA_ATTR_DEV_NET_STATUS,   /* u8 */




Re: [PATCH v1 3/7] vdpa: Allow to configure max data virtqueues

2021-12-08 Thread Si-Wei Liu




On 12/8/2021 12:14 PM, Eli Cohen wrote:

Add netlink support to configure the max virtqueue pairs for a device.
At least one pair is required. The maximum is dictated by the device.

Example:

$ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 5
Not this patch, but I think there should be a mega attribute defined ahead 
of time to specify the virtio class/type to create the vdpa instance with. 
Like the existing ones, e.g. mac_addr and mtu, max_vqp is also a 
vdpa-net-specific attribute.


-Siwei



Signed-off-by: Eli Cohen 
---
v0 -> v1:
1. fix typo
2. move max_vq_pairs to net specific struct

  drivers/vdpa/vdpa.c  | 14 +-
  include/linux/vdpa.h |  1 +
  2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index c37d384c0f33..3bf016e03512 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -480,7 +480,8 @@ vdpa_nl_cmd_mgmtdev_get_dumpit(struct sk_buff *msg, struct 
netlink_callback *cb)
  }
  
  #define VDPA_DEV_NET_ATTRS_MASK ((1 << VDPA_ATTR_DEV_NET_CFG_MACADDR) | \

-(1 << VDPA_ATTR_DEV_NET_CFG_MTU))
+(1 << VDPA_ATTR_DEV_NET_CFG_MTU) | \
+(1 << VDPA_ATTR_DEV_NET_CFG_MAX_VQP))
  
  static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *info)

  {
@@ -506,6 +507,17 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff 
*skb, struct genl_info *i
nla_get_u16(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MTU]);
config.mask |= (1 << VDPA_ATTR_DEV_NET_CFG_MTU);
}
+   if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]) {
+   config.net.max_vq_pairs =
+   nla_get_u16(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]);
+   if (!config.net.max_vq_pairs) {
+   NL_SET_ERR_MSG_MOD(info->extack,
+  "At least one pair of VQs is required");
+   err = -EINVAL;
+   goto err;
+   }
+   config.mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
+   }
  
  	/* Skip checking capability if user didn't prefer to configure any

 * device networking attributes. It is likely that user might have used
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index db24317d61b6..b62032573780 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -99,6 +99,7 @@ struct vdpa_dev_set_config {
struct {
u8 mac[ETH_ALEN];
u16 mtu;
+   u16 max_vq_pairs;
} net;
u64 mask;
  };




Re: [RFC PATCH] virtio: make sure legacy pci device gain 32bit-pfn vq

2021-12-08 Thread Michael S. Tsirkin
On Wed, Dec 08, 2021 at 04:04:19PM +0800, 王贇 wrote:
> 
> 
> 在 2021/12/8 下午3:23, Michael S. Tsirkin 写道:
> > On Tue, Dec 07, 2021 at 05:09:56PM +0800, 王贇 wrote:
> > > 
> > > 
> > > 在 2021/12/7 下午4:13, Michael S. Tsirkin 写道:
> > > > On Tue, Dec 07, 2021 at 03:51:45PM +0800, 王贇 wrote:
> > > > > We observed issues like:
> > > > > virtio-pci :14:00.0: platform bug: legacy virtio-mmio must
> > > > > not be used with RAM above 0x4000GB
> > > > > 
> > > > > when we have a legacy pci device which requires a 32bit-pfn vq
> > > > > but gets a 64bit-pfn instead, leading to probe failure.
> > > > > 
> > > > > vring_use_dma_api() plays the key role here, helping the
> > > > > allocation process understand which kind of vq it should alloc.
> > > > > However, it fails to take care of the legacy pci device, which
> > > > > only has the 32bit feature flag and can never have
> > > > > VIRTIO_F_ACCESS_PLATFORM set.
> > > > > 
> > > > > This patch introduces a force_dma flag to help vring_use_dma_api()
> > > > > understand the requirement better, to avoid the failure.
> > > > > 
> > > > > Signed-off-by: Michael Wang 
> > > > 
> > > > This will break configs where the device appears behind
> > > > a virtual iommu, so this won't fly.
> > > > Just make your device support 1.0, eh?
> > > 
> > > Hi, Michael
> > > 
> > > Thanks for the comment, unfortunately modify device is not an option for 
> > > us
> > > :-(
> > > 
> > > Is there any idea on how to solve this issue properly?
> > > 
> > > Regards,
> > > Michael Wang
> > 
> > By the way, there is a bug in the error message. Want to fix that?
> 
> Could you please provide more detail about the bug? We'd like to help fix
> it :-)
> 
> Besides, I've checked that patch but it can't address our issue. We actually
> have this legacy pci device on an arm platform, and the memory layout is
> unfriendly since allocations rarely provide page addresses below 44 bits. We
> understand the virtio-iommu case should not do force dma, but we don't
> have that, so it's just working fine.
> 
> Regards,
> Michael Wang

BTW is it just the ring that's at issue?
Figuring out we have this problematic config and then allocating just
the ring from coherent memory seems more palatable.

But please note we still need to detect configs with a virtual iommu (it can
be any kind, not just virtio-iommu; smmu and vtd are all affected) and
disable the hacks. This is what the new DMA API I suggested would do.


> > 
> > 
> > > > 
> > > > > ---
> > > > >drivers/virtio/virtio_pci_legacy.c | 10 ++
> > > > >drivers/virtio/virtio_ring.c   |  3 +++
> > > > >include/linux/virtio.h |  1 +
> > > > >3 files changed, 14 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c
> > > > > b/drivers/virtio/virtio_pci_legacy.c
> > > > > index d62e983..11f2ebf 100644
> > > > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > > > @@ -263,6 +263,16 @@ int virtio_pci_legacy_probe(struct 
> > > > > virtio_pci_device
> > > > > *vp_dev)
> > > > >   vp_dev->setup_vq = setup_vq;
> > > > >   vp_dev->del_vq = del_vq;
> > > > > 
> > > > > + /*
> > > > > +  * The legacy pci device requires a 32bit-pfn vq,
> > > > > +  * or setup_vq() will fail.
> > > > > +  *
> > > > > +  * Thus we make sure vring_use_dma_api() will
> > > > > +  * return true during the allocation by marking
> > > > > +  * force_dma here.
> > > > > +  */
> > > > > + vp_dev->vdev.force_dma = true;
> > > > > +
> > > > >   return 0;
> > > > > 
> > > > >err_iomap:
> > > > > diff --git a/drivers/virtio/virtio_ring.c 
> > > > > b/drivers/virtio/virtio_ring.c
> > > > > index 3035bb6..6562e01 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -245,6 +245,9 @@ static inline bool virtqueue_use_indirect(struct
> > > > > virtqueue *_vq,
> > > > > 
> > > > >static bool vring_use_dma_api(struct virtio_device *vdev)
> > > > >{
> > > > > + if (vdev->force_dma)
> > > > > + return true;
> > > > > +
> > > > >   if (!virtio_has_dma_quirk(vdev))
> > > > >   return true;
> > > > > 
> > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > > index 41edbc0..a4eb29d 100644
> > > > > --- a/include/linux/virtio.h
> > > > > +++ b/include/linux/virtio.h
> > > > > @@ -109,6 +109,7 @@ struct virtio_device {
> > > > >   bool failed;
> > > > >   bool config_enabled;
> > > > >   bool config_change_pending;
> > > > > + bool force_dma;
> > > > >   spinlock_t config_lock;
> > > > >   spinlock_t vqs_list_lock; /* Protects VQs list access */
> > > > >   struct device dev;
> > > > > -- 
> > > > > 1.8.3.1


Re: [PATCH v1 2/7] vdpa/mlx5: Distribute RX virtqueues in RQT object

2021-12-08 Thread Si-Wei Liu




On 12/8/2021 12:14 PM, Eli Cohen wrote:

Distribute the available rx virtqueues amongst the available RQT
entries.

RQTs are required to have a power-of-two number of entries. When creating
or modifying the RQT, use the lowest power of two that is not less
than the number of rx virtqueues. Distribute the virtqueues over the
available entries such that some virtqueues may be referenced twice.

This allows configuring any number of virtqueue pairs when multiqueue
is used.
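(A minimal sketch of the distribution described above, assuming `num_rx_vqs`
RX virtqueues and ignoring the device's log_max_rqt_size cap; the function
name is hypothetical and this is not the driver's code.)

```python
def distribute_rx_vqs(num_rx_vqs):
    """Fill an RQT with a power-of-two number of entries, round-robin
    over the available RX virtqueues."""
    # smallest power of two not less than the number of RX virtqueues
    rqt_size = 1 << (num_rx_vqs - 1).bit_length()
    # wrap around, so some virtqueues end up referenced twice
    return [i % num_rx_vqs for i in range(rqt_size)]

# e.g. 5 RX VQs -> an 8-entry RQT: [0, 1, 2, 3, 4, 0, 1, 2]
```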

Signed-off-by: Eli Cohen 
---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 30 +++---
  1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index ce2e13135dd8..e1a8a790f213 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1261,17 +1261,10 @@ static int create_rqt(struct mlx5_vdpa_net *ndev)
MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
MLX5_SET(rqtc, rqtc, rqt_max_size, max_rqt);
list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]);
-   for (i = 0, j = 0; j < max_rqt; j++) {
-   if (!ndev->vqs[j].initialized)
-   continue;

Why is the !initialized check dropped from the new code?


-
-   if (!vq_is_tx(ndev->vqs[j].index)) {
-   list[i] = cpu_to_be32(ndev->vqs[j].virtq_id);
-   i++;
-   }
-   }
-   MLX5_SET(rqtc, rqtc, rqt_actual_size, i);
+   for (i = 0, j = 0; i < max_rqt; i++, j += 2)
+   list[i] = cpu_to_be32(ndev->vqs[j % ndev->mvdev.max_vqs].virtq_id);
  
+	MLX5_SET(rqtc, rqtc, rqt_actual_size, max_rqt);

	err = mlx5_vdpa_create_rqt(&ndev->mvdev, in, inlen, &ndev->res.rqtn);
kfree(in);
if (err)
@@ -1292,7 +1285,7 @@ static int modify_rqt(struct mlx5_vdpa_net *ndev, int num)
int i, j;
int err;
  
-	max_rqt = min_t(int, ndev->cur_num_vqs / 2,

+   max_rqt = min_t(int, roundup_pow_of_two(ndev->cur_num_vqs / 2),
1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
if (max_rqt < 1)
return -EOPNOTSUPP;
@@ -1308,16 +1301,10 @@ static int modify_rqt(struct mlx5_vdpa_net *ndev, int 
num)
MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
  
  	list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]);

-   for (i = 0, j = 0; j < num; j++) {
-   if (!ndev->vqs[j].initialized)
-   continue;

Ditto.

-Siwei


+   for (i = 0, j = 0; i < max_rqt; i++, j += 2)
+   list[i] = cpu_to_be32(ndev->vqs[j % num].virtq_id);
  
-		if (!vq_is_tx(ndev->vqs[j].index)) {

-   list[i] = cpu_to_be32(ndev->vqs[j].virtq_id);
-   i++;
-   }
-   }
-   MLX5_SET(rqtc, rqtc, rqt_actual_size, i);
+   MLX5_SET(rqtc, rqtc, rqt_actual_size, max_rqt);
	err = mlx5_vdpa_modify_rqt(&ndev->mvdev, in, inlen, ndev->res.rqtn);
kfree(in);
if (err)
@@ -1581,9 +1568,6 @@ static virtio_net_ctrl_ack handle_ctrl_mq(struct 
mlx5_vdpa_dev *mvdev, u8 cmd)
break;
}
  
-		if (newqps & (newqps - 1))

-   break;
-
if (!change_num_qps(mvdev, newqps))
status = VIRTIO_NET_OK;
  




Re: [PATCH v1 1/7] vdpa: Provide interface to read driver features

2021-12-08 Thread Si-Wei Liu




On 12/8/2021 12:14 PM, Eli Cohen wrote:

Provide an interface to read the negotiated features. This is needed
when building the netlink message in vdpa_dev_net_config_fill().

Also fix the implementation of vdpa_dev_net_config_fill() to use the
negotiated features instead of the device features.
Are we sure the use of device features is a bug rather than expected
behavior? How do users determine the configured number of queue pairs at
any point in the case of a guest/driver that doesn't support multiqueue,
or when the device is being reset (where actual_features == 0)? Maybe we
should differentiate the static device config from the driver/device
running state here. I thought the change to vdpa_dev_net_config_fill() in
this patch would break vdpa tool users' expectation that the max_vqp
config is a fixed value since vdpa creation time.


Perhaps worth adding another attribute to represent the running state 
(cur_max_qps) based on the negotiated features.


-Siwei



To make APIs clearer, make the following name changes to struct
vdpa_config_ops so they better describe their operations:

get_features -> get_device_features
set_features -> set_driver_features

Finally, add get_driver_features to return the negotiated features and
add implementation to all the upstream drivers.

Signed-off-by: Eli Cohen 
---
  drivers/vdpa/alibaba/eni_vdpa.c| 16 
  drivers/vdpa/ifcvf/ifcvf_main.c| 16 
  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 16 
  drivers/vdpa/vdpa.c|  2 +-
  drivers/vdpa/vdpa_sim/vdpa_sim.c   | 21 +++--
  drivers/vdpa/vdpa_user/vduse_dev.c | 16 
  drivers/vdpa/virtio_pci/vp_vdpa.c  | 16 
  drivers/vhost/vdpa.c   |  2 +-
  drivers/virtio/virtio_vdpa.c   |  2 +-
  include/linux/vdpa.h   | 14 +-
  10 files changed, 87 insertions(+), 34 deletions(-)

diff --git a/drivers/vdpa/alibaba/eni_vdpa.c b/drivers/vdpa/alibaba/eni_vdpa.c
index 3f788794571a..ac0975660f4d 100644
--- a/drivers/vdpa/alibaba/eni_vdpa.c
+++ b/drivers/vdpa/alibaba/eni_vdpa.c
@@ -58,7 +58,7 @@ static struct virtio_pci_legacy_device *vdpa_to_ldev(struct 
vdpa_device *vdpa)
	return &eni_vdpa->ldev;
  }
  
-static u64 eni_vdpa_get_features(struct vdpa_device *vdpa)

+static u64 eni_vdpa_get_device_features(struct vdpa_device *vdpa)
  {
struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
u64 features = vp_legacy_get_features(ldev);
@@ -69,7 +69,7 @@ static u64 eni_vdpa_get_features(struct vdpa_device *vdpa)
return features;
  }
  
-static int eni_vdpa_set_features(struct vdpa_device *vdpa, u64 features)

+static int eni_vdpa_set_driver_features(struct vdpa_device *vdpa, u64 features)
  {
struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
  
@@ -84,6 +84,13 @@ static int eni_vdpa_set_features(struct vdpa_device *vdpa, u64 features)

return 0;
  }
  
+static u64 eni_vdpa_get_driver_features(struct vdpa_device *vdpa)

+{
+   struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
+
+   return vp_legacy_get_driver_features(ldev);
+}
+
  static u8 eni_vdpa_get_status(struct vdpa_device *vdpa)
  {
struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
@@ -401,8 +408,9 @@ static void eni_vdpa_set_config_cb(struct vdpa_device *vdpa,
  }
  
  static const struct vdpa_config_ops eni_vdpa_ops = {

-   .get_features   = eni_vdpa_get_features,
-   .set_features   = eni_vdpa_set_features,
+   .get_device_features = eni_vdpa_get_device_features,
+   .set_driver_features = eni_vdpa_set_driver_features,
+   .get_driver_features = eni_vdpa_get_driver_features,
.get_status = eni_vdpa_get_status,
.set_status = eni_vdpa_set_status,
.reset  = eni_vdpa_reset,
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 6dc75ca70b37..6a6a0a462392 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -169,7 +169,7 @@ static struct ifcvf_hw *vdpa_to_vf(struct vdpa_device 
*vdpa_dev)
	return &adapter->vf;
  }
  
-static u64 ifcvf_vdpa_get_features(struct vdpa_device *vdpa_dev)

+static u64 ifcvf_vdpa_get_device_features(struct vdpa_device *vdpa_dev)
  {
struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev);
struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
@@ -187,7 +187,7 @@ static u64 ifcvf_vdpa_get_features(struct vdpa_device 
*vdpa_dev)
return features;
  }
  
-static int ifcvf_vdpa_set_features(struct vdpa_device *vdpa_dev, u64 features)

+static int ifcvf_vdpa_set_driver_features(struct vdpa_device *vdpa_dev, u64 
features)
  {
struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
int ret;
@@ -201,6 +201,13 @@ static int ifcvf_vdpa_set_features(struct vdpa_device 
*vdpa_dev, u64 features)
return 0;
  }
  
+static u64 ifcvf_vdpa_get_driver_features(struct vdpa_device 

Re: [PATCH] eni_vdpa: alibaba: select VIRTIO_PCI_LIB

2021-12-08 Thread Michael S. Tsirkin
On Wed, Dec 08, 2021 at 10:31:03PM +0100, Arnd Bergmann wrote:
> On Wed, Dec 8, 2021 at 8:56 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Dec 03, 2021 at 07:55:14PM +0100, Arnd Bergmann wrote:
> > > From: Arnd Bergmann 
> > >
> > > When VIRTIO_PCI_LIB is not built-in but the alibaba driver is, the
> > > kernel runs into a link error:
> > >
> > > x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function 
> > > `eni_vdpa_set_features':
> > > eni_vdpa.c:(.text+0x23f): undefined reference to `vp_legacy_set_features'
> > > x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function 
> > > `eni_vdpa_set_vq_state':
> > > eni_vdpa.c:(.text+0x2fe): undefined reference to 
> > > `vp_legacy_get_queue_enable'
> > > x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function 
> > > `eni_vdpa_set_vq_address':
> > > eni_vdpa.c:(.text+0x376): undefined reference to 
> > > `vp_legacy_set_queue_address'
> > > x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function 
> > > `eni_vdpa_set_vq_ready':
> > > eni_vdpa.c:(.text+0x3b4): undefined reference to 
> > > `vp_legacy_set_queue_address'
> > > x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function 
> > > `eni_vdpa_free_irq':
> > > eni_vdpa.c:(.text+0x460): undefined reference to `vp_legacy_queue_vector'
> > > x86_64-linux-ld: eni_vdpa.c:(.text+0x4b7): undefined reference to 
> > > `vp_legacy_config_vector'
> > > x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function 
> > > `eni_vdpa_reset':
> > >
> > > Selecting VIRTIO_PCI_LIB_LEGACY is not sufficient here since that is
> > > only part of the VIRTIO_PCI_LIB support.
> > >
> > > Fixes: e85087beedca ("eni_vdpa: add vDPA driver for Alibaba ENI")
> > > Signed-off-by: Arnd Bergmann 
> >
> >
> > Confused. These are all part of
> > drivers/virtio/virtio_pci_legacy_dev.c
> >
> > and
> > obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o
> >
> > what gives?
> 
> The patch was wrong, see
> 
> https://lore.kernel.org/lkml/20211206085034.2836099-1-a...@kernel.org/
> 
> for the correct fix.
> 
> Arnd

Oh, ok then, I picked that one up, that explains why the issue does not
reproduce for me.

-- 
MST



Re: [PATCH V6 01/10] Use copy_process in vhost layer

2021-12-08 Thread michael . christie
On 12/8/21 2:34 PM, Michael S. Tsirkin wrote:
> On Mon, Nov 29, 2021 at 01:46:57PM -0600, Mike Christie wrote:
>> The following patches, made over Linus's tree, allow the vhost layer to do
>> a copy_process on the thread that does the VHOST_SET_OWNER ioctl like how
>> io_uring does a copy_process against its userspace app. This allows the
>> vhost layer's worker threads to inherit cgroups, namespaces, address
>> space, etc and this worker thread will also be accounted for against that
>> owner/parent process's RLIMIT_NPROC limit.
>>
>> If you are not familiar with qemu and vhost here is more detailed
>> problem description:
>>
>> Qemu will create vhost devices in the kernel which perform network, SCSI,
>> etc IO and management operations from worker threads created by the
>> kthread API. Because the kthread API does a copy_process on the kthreadd
>> thread, the vhost layer has to use kthread_use_mm to access the Qemu
>> thread's memory and cgroup_attach_task_all to add itself to the Qemu
>> thread's cgroups.
>>
>> The problem with this approach is that we then have to add new functions/
>> args/functionality for every thing we want to inherit. I started doing
>> that here:
>>
>> https://lkml.org/lkml/2021/6/23/1233
>>
>> for the RLIMIT_NPROC check, but it seems it might be easier to just
>> inherit everything from the beginning, because I'd need to do something
>> like that patch several times.
> 
> 
> So who's merging this? Me? Did all patches get acked by appropriate
> maintainers?
> 

Not yet.

Jens, the last open review comment was from you, for the name change
and additional patch description info.

In this posting, I changed the name from:

kernel_worker/kernel_worker_start

to

user_worker_create/user_worker_start

I didn't do the start/create_user_worker format originally discussed
because a while back Christoph had given me a review comment about trying
to tie everything together into an API. Everything having the user_worker
prefix felt nicer in that it was easy to tell the functions and flags went
together, and so I thought it would handle his comment too.

And patch:

[PATCH V6 07/10] io_uring: switch to user_worker

should better explain the reason for the switch.


Re: [PATCH V6 01/10] Use copy_process in vhost layer

2021-12-08 Thread Michael S. Tsirkin
On Mon, Nov 29, 2021 at 01:46:57PM -0600, Mike Christie wrote:
> The following patches, made over Linus's tree, allow the vhost layer to do
> a copy_process on the thread that does the VHOST_SET_OWNER ioctl like how
> io_uring does a copy_process against its userspace app. This allows the
> vhost layer's worker threads to inherit cgroups, namespaces, address
> space, etc and this worker thread will also be accounted for against that
> owner/parent process's RLIMIT_NPROC limit.
> 
> If you are not familiar with qemu and vhost here is more detailed
> problem description:
> 
> Qemu will create vhost devices in the kernel which perform network, SCSI,
> etc IO and management operations from worker threads created by the
> kthread API. Because the kthread API does a copy_process on the kthreadd
> thread, the vhost layer has to use kthread_use_mm to access the Qemu
> thread's memory and cgroup_attach_task_all to add itself to the Qemu
> thread's cgroups.
> 
> The problem with this approach is that we then have to add new functions/
> args/functionality for every thing we want to inherit. I started doing
> that here:
> 
> https://lkml.org/lkml/2021/6/23/1233
> 
> for the RLIMIT_NPROC check, but it seems it might be easier to just
> inherit everything from the beginning, because I'd need to do something
> like that patch several times.


So who's merging this? Me? Did all patches get acked by appropriate
maintainers?

> V6:
> - Rename kernel_worker to user_worker and fix prefixes.
> - Add better patch descriptions.
> V5:
> - Handle kbuild errors by building patchset against current kernel that
>   has all deps merged. Also add patch to remove create_io_thread code as
>   it's not used anymore.
> - Rebase patchset against current kernel and handle a new vm PF_IO_WORKER
>   case added in 5.16-rc1.
> - Add PF_USER_WORKER flag so we can check it later after the initial
>   thread creation for the wake up, vm and signal cases.
> - Added patch to auto reap the worker thread.
> V4:
> - Drop NO_SIG patch and replaced with Christian's SIG_IGN patch.
> - Merged Christian's kernel_worker_flags_valid helpers into patch 5 that
>   added the new kernel worker functions.
> - Fixed extra "i" issue.
> - Added PF_USER_WORKER flag and added check that kernel_worker_start users
>   had that flag set. Also dropped patches that passed worker flags to
>   copy_thread and replaced with PF_USER_WORKER check.
> V3:
> - Add parentheses in p->flag and work_flags check in copy_thread.
> - Fix check in arm/arm64 which was doing the reverse of other archs
>   where it did likely(!flags) instead of unlikely(flags).
> V2:
> - Rename kernel_copy_process to kernel_worker.
> - Instead of exporting functions, make kernel_worker() a proper
>   function/API that does common work for the caller.
> - Instead of adding new fields to kernel_clone_args for each option
>   make it flag based similar to CLONE_*.
> - Drop unused completion struct in vhost.
> - Fix compile warnings by merging vhost cgroup cleanup patch and
>   vhost conversion patch.
> 



Re: [PATCH v1 0/2] Support reading vendor statistics through iproute2/vdpa tool

2021-12-08 Thread Michael S. Tsirkin
On Wed, Nov 24, 2021 at 06:55:29PM +0200, Eli Cohen wrote:
> The following patches add support for reading vdpa statistics
> information from a vdpa instance using netlink messages.
> 
> mlx5_vdpa is also implemented to provide that information.
> 
> The supported statistics data are number of descriptors received by the
> virtqueue and the number of descriptors completed by it.

So I assume a v2 of this will be forthcoming, right?

> v0 -> v1:
> Emphasize that we're dealing with vendor specific counters.
> Add mutex to synchronize query stats with changing of number of queues
> 
> Eli Cohen (2):
>   vdpa: Add support for querying vendor statistics
>   vdpa/mlx5: Add support for reading descriptor statistics
> 
>  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 143 
>  drivers/vdpa/vdpa.c| 144 +
>  include/linux/mlx5/mlx5_ifc.h  |   1 +
>  include/linux/mlx5/mlx5_ifc_vdpa.h |  39 
>  include/linux/vdpa.h   |  10 ++
>  include/uapi/linux/vdpa.h  |   9 ++
>  6 files changed, 346 insertions(+)
> 
> -- 
> 2.33.1



Re: [PATCH V2] virtio-mmio: harden interrupt

2021-12-08 Thread Michael S. Tsirkin
On Fri, Nov 26, 2021 at 12:41:02PM +0800, Jason Wang wrote:
> This patch tries to make sure the virtio interrupt handler for MMIO
> won't be called after a reset and before virtio_device_ready(). We
> can't use IRQF_NO_AUTOEN since we're using shared interrupt
> (IRQF_SHARED). So this patch tracks the interrupt enabling status in a
> new intr_soft_enabled variable and toggles it in
> vm_disable/enable_interrupts(). The MMIO interrupt handler will check
> intr_soft_enabled before processing the actual interrupt.
> 
> Signed-off-by: Jason Wang 
> ---
> Changes since V1:
> - Silence compiler warnings
>  drivers/virtio/virtio_mmio.c | 37 
>  1 file changed, 37 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 56128b9c46eb..c517afdd2cc5 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -90,6 +90,7 @@ struct virtio_mmio_device {
>   /* a list of queues so we can dispatch IRQs */
>   spinlock_t lock;
>   struct list_head virtqueues;
> + bool intr_soft_enabled;
>  };
>  
>  struct virtio_mmio_vq_info {
> @@ -100,7 +101,37 @@ struct virtio_mmio_vq_info {
>   struct list_head node;
>  };
>  
> +/* disable irq handlers */
> +static void vm_disable_cbs(struct virtio_device *vdev)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + int irq = platform_get_irq(vm_dev->pdev, 0);
>  
> + /*
> +  * The below synchronize() guarantees that any
> +  * interrupt for this line arriving after
> +  * synchronize_irq() has completed is guaranteed to see
> > +  * intr_soft_enabled == false.
> +  */
> + WRITE_ONCE(vm_dev->intr_soft_enabled, false);
> + synchronize_irq(irq);
> +}
> +
> +/* enable irq handlers */
> +static void vm_enable_cbs(struct virtio_device *vdev)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
> + int irq = platform_get_irq(vm_dev->pdev, 0);
> +
> + disable_irq(irq);
> + /*
> +  * The above disable_irq() provides TSO ordering and
> +  * as such promotes the below store to store-release.
> +  */
> + WRITE_ONCE(vm_dev->intr_soft_enabled, true);
> + enable_irq(irq);
> + return;
> +}
>  
>  /* Configuration interface */
>  
> @@ -262,6 +293,8 @@ static void vm_reset(struct virtio_device *vdev)
>  
>   /* 0 status means a reset. */
>   writel(0, vm_dev->base + VIRTIO_MMIO_STATUS);

There was a discussion about reading status to make sure it is clear.
The spec says we should, this can't hurt as a further hardening measure.
In fact, let's do it in the core maybe? Spec says it applies to all
devices ...

> + /* Disable VQ/configuration callbacks. */
> + vm_disable_cbs(vdev);
>  }
>  
>  
> @@ -288,6 +321,9 @@ static irqreturn_t vm_interrupt(int irq, void *opaque)
>   unsigned long flags;
>   irqreturn_t ret = IRQ_NONE;
>  
> + if (!READ_ONCE(vm_dev->intr_soft_enabled))
> + return IRQ_NONE;
> +

So if the write is seen before reset happened (should not happen, but we
are talking a buggy device) then it won't be acknowledged and device
will keep pulling the interrupt. I think as long as we are hardening
this, let's go the full mile and try to avoid DoS if we can, do the
check before invoking the callback, but do not skip the read.
Whether to still return IRQ_NONE is a good question.




>   /* Read and acknowledge interrupts */
>   status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
>   writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
> @@ -529,6 +565,7 @@ static bool vm_get_shm_region(struct virtio_device *vdev,
>  }
>  
>  static const struct virtio_config_ops virtio_mmio_config_ops = {
> + .enable_cbs = vm_enable_cbs,
>   .get= vm_get,
>   .set= vm_set,
>   .generation = vm_generation,
> -- 
> 2.25.1



Re: [PATCH v3] i2c: virtio: fix completion handling

2021-12-08 Thread Michael S. Tsirkin
On Thu, Dec 02, 2021 at 04:32:14PM +0100, Vincent Whitchurch wrote:
> The driver currently assumes that the notify callback is only received
> when the device is done with all the queued buffers.
> 
> However, this is not true, since the notify callback could be called
> without any of the queued buffers being completed (for example, with
> virtio-pci and shared interrupts) or with only some of the buffers being
> completed (since the driver makes them available to the device in
> multiple separate virtqueue_add_sgs() calls).
> 
> This can lead to incorrect data on the I2C bus or memory corruption in
> the guest if the device operates on buffers which have been freed by
> the driver.  (The WARN_ON in the driver is also triggered.)
> 
>  BUG kmalloc-128 (Tainted: GW): Poison overwritten
>  First byte 0x0 instead of 0x6b
>  Allocated in i2cdev_ioctl_rdwr+0x9d/0x1de age=243 cpu=0 pid=28
>   memdup_user+0x2e/0xbd
>   i2cdev_ioctl_rdwr+0x9d/0x1de
>   i2cdev_ioctl+0x247/0x2ed
>   vfs_ioctl+0x21/0x30
>   sys_ioctl+0xb18/0xb41
>  Freed in i2cdev_ioctl_rdwr+0x1bb/0x1de age=68 cpu=0 pid=28
>   kfree+0x1bd/0x1cc
>   i2cdev_ioctl_rdwr+0x1bb/0x1de
>   i2cdev_ioctl+0x247/0x2ed
>   vfs_ioctl+0x21/0x30
>   sys_ioctl+0xb18/0xb41
> 
> Fix this by calling virtio_get_buf() from the notify handler like other
> virtio drivers and by actually waiting for all the buffers to be
> completed.
> 
> Fixes: 3cfc88380413d20f ("i2c: virtio: add a virtio i2c frontend driver")
> Acked-by: Viresh Kumar 
> Signed-off-by: Vincent Whitchurch 

Acked-by: Michael S. Tsirkin 

who's queueing this?

> ---
> 
> Notes:
> v3: Wait for all completions instead of only the last one.
> v2: Add Acked-by and Fixes tags.
> 
>  drivers/i2c/busses/i2c-virtio.c | 32 
>  1 file changed, 12 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
> index 95378780da6d..41eb0dcc3204 100644
> --- a/drivers/i2c/busses/i2c-virtio.c
> +++ b/drivers/i2c/busses/i2c-virtio.c
> @@ -22,24 +22,24 @@
>  /**
>   * struct virtio_i2c - virtio I2C data
>   * @vdev: virtio device for this controller
> - * @completion: completion of virtio I2C message
>   * @adap: I2C adapter for this controller
>   * @vq: the virtio virtqueue for communication
>   */
>  struct virtio_i2c {
>   struct virtio_device *vdev;
> - struct completion completion;
>   struct i2c_adapter adap;
>   struct virtqueue *vq;
>  };
>  
>  /**
>   * struct virtio_i2c_req - the virtio I2C request structure
> + * @completion: completion of virtio I2C message
>   * @out_hdr: the OUT header of the virtio I2C message
>   * @buf: the buffer into which data is read, or from which it's written
>   * @in_hdr: the IN header of the virtio I2C message
>   */
>  struct virtio_i2c_req {
> + struct completion completion;
>   struct virtio_i2c_out_hdr out_hdr ____cacheline_aligned;
>   uint8_t *buf ____cacheline_aligned;
>   struct virtio_i2c_in_hdr in_hdr ____cacheline_aligned;
> @@ -47,9 +47,11 @@ struct virtio_i2c_req {
>  
>  static void virtio_i2c_msg_done(struct virtqueue *vq)
>  {
> - struct virtio_i2c *vi = vq->vdev->priv;
> + struct virtio_i2c_req *req;
> + unsigned int len;
>  
> - complete(&vi->completion);
> + while ((req = virtqueue_get_buf(vq, &len)))
> + complete(&req->completion);
>  }
>  
>  static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
> @@ -62,6 +64,8 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
>   for (i = 0; i < num; i++) {
>   int outcnt = 0, incnt = 0;
>  
> + init_completion(&reqs[i].completion);
> +
>   /*
>* Only 7-bit mode supported for this moment. For the address
>* format, Please check the Virtio I2C Specification.
> @@ -106,21 +110,15 @@ static int virtio_i2c_complete_reqs(struct virtqueue *vq,
>   struct virtio_i2c_req *reqs,
>   struct i2c_msg *msgs, int num)
>  {
> - struct virtio_i2c_req *req;
>   bool failed = false;
> - unsigned int len;
>   int i, j = 0;
>  
>   for (i = 0; i < num; i++) {
> - /* Detach the ith request from the vq */
> - req = virtqueue_get_buf(vq, &len);
> + struct virtio_i2c_req *req = &reqs[i];
>  
> - /*
> -  * Condition req == &reqs[i] should always meet since we have
> -  * total num requests in the vq. reqs[i] can never be NULL here.
> -  */
> - if (!failed && (WARN_ON(req != &reqs[i]) ||
> - req->in_hdr.status != VIRTIO_I2C_MSG_OK))
> + wait_for_completion(&req->completion);
> +
> + if (!failed && req->in_hdr.status != VIRTIO_I2C_MSG_OK)
>   failed = true;
>  
>   

Re: [PATCH] eni_vdpa: alibaba: select VIRTIO_PCI_LIB

2021-12-08 Thread Michael S. Tsirkin
On Fri, Dec 03, 2021 at 07:55:14PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> When VIRTIO_PCI_LIB is not built-in but the alibaba driver is, the
> kernel runs into a link error:
> 
> x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function `eni_vdpa_set_features':
> eni_vdpa.c:(.text+0x23f): undefined reference to `vp_legacy_set_features'
> x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function `eni_vdpa_set_vq_state':
> eni_vdpa.c:(.text+0x2fe): undefined reference to `vp_legacy_get_queue_enable'
> x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function `eni_vdpa_set_vq_address':
> eni_vdpa.c:(.text+0x376): undefined reference to `vp_legacy_set_queue_address'
> x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function `eni_vdpa_set_vq_ready':
> eni_vdpa.c:(.text+0x3b4): undefined reference to `vp_legacy_set_queue_address'
> x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function `eni_vdpa_free_irq':
> eni_vdpa.c:(.text+0x460): undefined reference to `vp_legacy_queue_vector'
> x86_64-linux-ld: eni_vdpa.c:(.text+0x4b7): undefined reference to `vp_legacy_config_vector'
> x86_64-linux-ld: drivers/vdpa/alibaba/eni_vdpa.o: in function `eni_vdpa_reset':
> 
> Selecting VIRTIO_PCI_LIB_LEGACY is not sufficient here since that is
> only part of the VIRTIO_PCI_LIB support.
> 
> Fixes: e85087beedca ("eni_vdpa: add vDPA driver for Alibaba ENI")
> Signed-off-by: Arnd Bergmann 


Confused. These are all part of
drivers/virtio/virtio_pci_legacy_dev.c

and
obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o

what gives?


> ---
>  drivers/vdpa/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
> index 50f45d037611..04466603021f 100644
> --- a/drivers/vdpa/Kconfig
> +++ b/drivers/vdpa/Kconfig
> @@ -80,6 +80,7 @@ config VP_VDPA
>  
>  config ALIBABA_ENI_VDPA
>   tristate "vDPA driver for Alibaba ENI"
> + select VIRTIO_PCI_LIB
>   select VIRTIO_PCI_LIB_LEGACY
>   depends on PCI_MSI && X86
>   help
> -- 
> 2.29.2



Re: [PATCH 5/7] vhost_vsock: simplify vhost_vsock_flush()

2021-12-08 Thread michael . christie
On 12/7/21 9:53 PM, Jason Wang wrote:
> On Tue, Dec 7, 2021 at 10:45 AM Mike Christie
>  wrote:
>>
>> From: Andrey Ryabinin 
>>
>> vhost_vsock_flush() calls vhost_work_dev_flush(vsock->vqs[i].poll.dev)
>> before vhost_work_dev_flush(&vsock->dev). This seems pointless
>> as vsock->vqs[i].poll.dev is the same as &vsock->dev and several flushes
>> in a row don't do anything useful; one is enough.
>>
>> Signed-off-by: Andrey Ryabinin 
>> Reviewed-by: Stefano Garzarella 
>> Signed-off-by: Mike Christie 
>> ---
>>  drivers/vhost/vsock.c | 5 -
>>  1 file changed, 5 deletions(-)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index 2339587bcd31..1f38160b249d 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -703,11 +703,6 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>>
>>  static void vhost_vsock_flush(struct vhost_vsock *vsock)
>>  {
>> -   int i;
>> -
>> -   for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++)
>> -   if (vsock->vqs[i].handle_kick)
>> -   vhost_work_dev_flush(vsock->vqs[i].poll.dev);
>> vhost_work_dev_flush(&vsock->dev);
>>  }
> 
> Is this better to be consistent with vhost-net so that we can simply
> remove the vhost_vsock_flush() wrapper here.
> 

I didn't understand that comment.

Did you mean consistent as they both have vhost_vsock/net_flush functions
or as in they prefer to not have one line wrappers?

Before and after this patchset, both net and vsock have a vhost_vsock/net_flush
function, so maybe you didn't mean that.

I think the wrapper is not very useful and could be removed. However,
I thought vsock preferred wrappers because we have vhost_vsock_free
which is just a wrapper around kfree. I also noticed test.c is a
fan of one line wrappers, but I see net and scsi do not do that.


Re: [PATCH 6/7] vhost-scsi: drop flush after vhost_dev_cleanup

2021-12-08 Thread Mike Christie
On 12/7/21 10:07 PM, Jason Wang wrote:
> On Tue, Dec 7, 2021 at 10:45 AM Mike Christie
>  wrote:
>>
>> The flush after vhost_dev_cleanup is not needed because:
>>
>> 1. It doesn't do anything. vhost_dev_cleanup will stop the worker thread
>> so the flush call will just return since the worker is no longer running.
>>
>> 2. It's not needed for the re-queue case. vhost_scsi_evt_handle_kick grabs
>> the mutex and if the backend is NULL will return without queueing a work.
>> vhost_scsi_clear_endpoint will set the backend to NULL under the vq->mutex
>> then drops the mutex and does a flush. So we know when
>> vhost_scsi_clear_endpoint has dropped the mutex after clearing the backend
>> no evt related work will be able to requeue. The flush would then make sure
>> any queued evts are run and return.
> 
> What happens if a kick after vhost_scsi_clear_endpoint() but before
> vhost_dev_stop()?

vhost_dev_stop also does a flush, so:

1. The kick handler would see the backend is null and return immediately.
2. The flush in vhost_dev_stop would wait for those kicks in #1 to complete.


> 
> Is this safe or the kthread_stop() in vhost_dev_cleanup() makes us safe?
> 
> Thanks
> 
>>
>> Signed-off-by: Mike Christie 
>> ---
>>  drivers/vhost/scsi.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
>> index 532e204f2b1b..94535c813ef7 100644
>> --- a/drivers/vhost/scsi.c
>> +++ b/drivers/vhost/scsi.c
>> @@ -1827,8 +1827,6 @@ static int vhost_scsi_release(struct inode *inode, struct file *f)
>> vhost_scsi_clear_endpoint(vs, &t);
>> vhost_dev_stop(&vs->dev);
>> vhost_dev_cleanup(&vs->dev);
>> -   /* Jobs can re-queue themselves in evt kick handler. Do extra flush. */
>> -   vhost_scsi_flush(vs);
>> kfree(vs->dev.vqs);
>> kvfree(vs);
>> return 0;
>> --
>> 2.25.1
>>
> 



Re: [PATCH 1/7] vhost: get rid of vhost_poll_flush() wrapper

2021-12-08 Thread Mike Christie
On 12/7/21 9:49 PM, Jason Wang wrote:
> On Tue, Dec 7, 2021 at 10:45 AM Mike Christie
>  wrote:
>>
>> vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush().
>> It gives wrong impression that we are doing some work over vhost_poll,
>> while in fact it flushes vhost_poll->dev.
> 
> This "problem" is a byproduct of 7235acdb1144 ("vhost: simplify work 
> flushing").
> 
> Before that we indeed had a per-poll flush.
> 
>> It only complicate understanding of the code and leads to mistakes
>> like flushing the same vhost_dev several times in a row.
>>
>> Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly.
> 
> Not a native speaker but since we don't have an per work flush, is it
> better to rename this simply as vhost_flush()?
> 

What about vhost_dev_flush?

For the existing naming when we have a function exported we tend to have
"vhost_" then the object/struct it works on then the action.

For work we have:

vhost_work_queue/init

(we also have vhost_has_work which doesn't follow that pattern but
would sound strange as vhost_work_has so ignore that one).

For dev operations we have:

vhost_dev_reset_owner/set_owner/has_owner/cleanup/init

For the flush operation I wanted it to reflect it flushed all work
on the device, so I mashed up the work and dev naming above and
I agree it's a little strange.


[PATCH] vduse: check that offset is within bounds in get_config()

2021-12-08 Thread Dan Carpenter
This condition checks "len" but it does not check "offset" and that
could result in an out of bounds read if "offset > dev->config_size".
The problem is that since both variables are unsigned the
"dev->config_size - offset" subtraction would result in a very high
unsigned value.

I think these checks might not be necessary because "len" and "offset"
are supposed to already have been validated using the
vhost_vdpa_config_validate() function.  But I do not know the code
perfectly, and I like to be safe.

Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Signed-off-by: Dan Carpenter 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index c9204c62f339..6693a2c9e4a6 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -655,7 +655,8 @@ static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int offset,
 {
struct vduse_dev *dev = vdpa_to_vduse(vdpa);
 
-   if (len > dev->config_size - offset)
+   if (offset > dev->config_size ||
+   len > dev->config_size - offset)
return;
 
memcpy(buf, dev->config + offset, len);
-- 
2.20.1



Re: [RFC PATCH] virtio: make sure legacy pci device gain 32bit-pfn vq

2021-12-08 Thread Michael S. Tsirkin
On Wed, Dec 08, 2021 at 04:04:19PM +0800, 王贇 wrote:
> 
> 
> > On 2021/12/8 3:23 PM, Michael S. Tsirkin wrote:
> > On Tue, Dec 07, 2021 at 05:09:56PM +0800, 王贇 wrote:
> > > 
> > > 
> > > > On 2021/12/7 4:13 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Dec 07, 2021 at 03:51:45PM +0800, 王贇 wrote:
> > > > > We observed issues like:
> > > > > virtio-pci :14:00.0: platform bug: legacy virtio-mmio must
> > > > > not be used with RAM above 0x4000GB
> > > > > 
> > > > > when we have a legacy pci device which desires a 32bit-pfn vq
> > > > > but gains a 64bit pfn instead, leading to a probe failure.
> > > > > 
> > > > > vring_use_dma_api() is playing the key role here: it helps the
> > > > > allocation process understand which kind of vq it should allocate.
> > > > > However, it fails to take care of the legacy pci device, which only
> > > > > has a 32bit feature flag and can never have VIRTIO_F_ACCESS_PLATFORM
> > > > > set.
> > > > > 
> > > > > This patch introduces a force_dma flag to help vring_use_dma_api()
> > > > > understand the requirement better and avoid the failure.
> > > > > 
> > > > > Signed-off-by: Michael Wang 
> > > > 
> > > > This will break configs where the device appears behind
> > > > a virtual iommu, so this won't fly.
> > > > Just make your device support 1.0, eh?
> > > 
> > > Hi, Michael
> > > 
> > > Thanks for the comment, unfortunately modify device is not an option for 
> > > us
> > > :-(
> > > 
> > > Is there any idea on how to solve this issue properly?
> > > 
> > > Regards,
> > > Michael Wang
> > 
> > By the way, there is a bug in the error message. Want to fix that?
> 
> Could you please provide more detail about the bug? We'd like to help fixing
> it :-)

virtio-pci :14:00.0: platform bug: legacy virtio-mmio must ...

should be virtio-pci not virtio-mmio



> Besides, I've checked that patch but it can't address our issue. We actually
> have this legacy pci device on an arm platform, and the memory layout is
> unfriendly since the allocation rarely provides page addresses below 44 bits.
> We understand the virtio-iommu case should not do force dma, but we don't
> have that, so it's just working fine.
> 
> Regards,
> Michael Wang

Hmm wait a sec is it a physical device or a hypervisor?
If a physical one then doesn't it need VIRTIO_F_ORDER_PLATFORM
on ARM?



> > 
> > 
> > > > 
> > > > > ---
> > > > >drivers/virtio/virtio_pci_legacy.c | 10 ++
> > > > >drivers/virtio/virtio_ring.c   |  3 +++
> > > > >include/linux/virtio.h |  1 +
> > > > >3 files changed, 14 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > > > > index d62e983..11f2ebf 100644
> > > > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > > > @@ -263,6 +263,16 @@ int virtio_pci_legacy_probe(struct virtio_pci_device *vp_dev)
> > > > >   vp_dev->setup_vq = setup_vq;
> > > > >   vp_dev->del_vq = del_vq;
> > > > > 
> > > > > + /*
> > > > > +  * The legacy pci device requires a 32bit-pfn vq,
> > > > > +  * or setup_vq() will fail.
> > > > > +  *
> > > > > +  * Thus we make sure vring_use_dma_api() will
> > > > > +  * return true during the allocation by marking
> > > > > +  * force_dma here.
> > > > > +  */
> > > > > + vp_dev->vdev.force_dma = true;
> > > > > +
> > > > >   return 0;
> > > > > 
> > > > >err_iomap:
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 3035bb6..6562e01 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -245,6 +245,9 @@ static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
> > > > > 
> > > > >static bool vring_use_dma_api(struct virtio_device *vdev)
> > > > >{
> > > > > + if (vdev->force_dma)
> > > > > + return true;
> > > > > +
> > > > >   if (!virtio_has_dma_quirk(vdev))
> > > > >   return true;
> > > > > 
> > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > > index 41edbc0..a4eb29d 100644
> > > > > --- a/include/linux/virtio.h
> > > > > +++ b/include/linux/virtio.h
> > > > > @@ -109,6 +109,7 @@ struct virtio_device {
> > > > >   bool failed;
> > > > >   bool config_enabled;
> > > > >   bool config_change_pending;
> > > > > + bool force_dma;
> > > > >   spinlock_t config_lock;
> > > > >   spinlock_t vqs_list_lock; /* Protects VQs list access */
> > > > >   struct device dev;
> > > > > -- 
> > > > > 1.8.3.1


[PATCH 2/2 v4] vdpa: check that offsets are within bounds

2021-12-08 Thread Dan Carpenter
In this function "c->off" is a u32 and "size" is a long.  On 64bit systems
if "c->off" is greater than "size" then "size - c->off" is negative and
we always return -E2BIG.  But on 32bit systems the subtraction is type
promoted to a high positive u32 value and basically any "c->len" is
accepted.

Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Reported-by: Xie Yongji 
Signed-off-by: Dan Carpenter 
---
v4: split into a separate patch

 drivers/vhost/vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 29cced1cd277..e3c4f059b21a 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -197,7 +197,7 @@ static int vhost_vdpa_config_validate(struct vhost_vdpa *v,
struct vdpa_device *vdpa = v->vdpa;
long size = vdpa->config->get_config_size(vdpa);
 
-   if (c->len == 0)
+   if (c->len == 0 || c->off > size)
return -EINVAL;
 
if (c->len > size - c->off)
-- 
2.20.1



[PATCH 1/2 v4] vduse: fix memory corruption in vduse_dev_ioctl()

2021-12-08 Thread Dan Carpenter
The "config.offset" comes from the user.  There needs to be a check to
prevent it from being out of bounds.  The "config.offset" and
"dev->config_size" variables are both type u32.  So if the offset is
out of bounds then the "dev->config_size - config.offset" subtraction
results in a very high u32 value.  The out of bounds offset can result
in memory corruption.

Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Signed-off-by: Dan Carpenter 
---
v2: fix reversed if statement
v3: fix vhost_vdpa_config_validate() as pointed out by Yongji Xie.
v4: split the vhost_vdpa_config_validate() change into a separate path

 drivers/vdpa/vdpa_user/vduse_dev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index c9204c62f339..1a206f95d73a 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -975,7 +975,8 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
break;
 
ret = -EINVAL;
-   if (config.length == 0 ||
+   if (config.offset > dev->config_size ||
+   config.length == 0 ||
config.length > dev->config_size - config.offset)
break;
 
-- 
2.20.1



Re: [PATCH] drm: Return error codes from struct drm_driver.gem_create_object

2021-12-08 Thread Thomas Zimmermann

Hi

Am 07.12.21 um 10:24 schrieb Thomas Zimmermann:

Hi

Am 07.12.21 um 09:55 schrieb Dan Carpenter:

I apologize.  This thread has been really frustrating.  I got mixed up
because I recently sent patches for ingenic and vc4.  Also we are
working against different trees so maybe that is part of the problem?

I'm looking at today's linux-next.  Your patch has been applied.

Yes.  You updated all the drivers.  But somehow the vc4 chunk from your
patch was dropped.  It was *NOT* dropped by Stephen Rothwell.  It got
dropped earlier.  I am including the `git format-patch -1 ` output
from the commit.


My vc4 change is in drm-misc-next, [1] but not in drm-tip. [2] Your vc4 
patch went through drm-misc-fixes.


drm-tip is build automatically by our DRM tools from the various trees. 
The tools took my patch from drm-misc-next and your patch from 
drm-misc-fixes and the result was broken.


Thanks for bringing this up. I'll see what I can do about it.


FYI I fixed drm-tip to return a pointer-encoded error in vc4. [1]

Best regards
Thomas

[1] 
https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/vc4/vc4_bo.c?id=cc3b9eb186e3c8dfb0bcc7d54fa4bfbe52c0b58b#n394





Best regards
Thomas

[1] 
https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/vc4/vc4_bo.c#n394 

[2] 
https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/vc4/vc4_bo.c





--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

