date:20211013

Re: [PATCH V2 03/12] virtio-console: switch to use .validate()

2021-10-13 Thread Michael S. Tsirkin

On Thu, Oct 14, 2021 at 10:28:06AM +0800, Jason Wang wrote:
> On Wed, Oct 13, 2021 at 5:51 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Oct 12, 2021 at 02:52:18PM +0800, Jason Wang wrote:
> > > This patch switches to use validate() to filter out the features that
> > > is not supported by the rproc.
> >
> > are not supported
> >
> > >
> > > Cc: Amit Shah 
> > > Signed-off-by: Jason Wang 
> >
> >
> > Does this have anything to do with hardening?
> >
> > It seems cleaner to not negotiate features we do not use,
> > but given we did this for many years ... should we bother
> > at this point?
> 
> It looks cleaner to do all the validation in one places:
> 
> 1) check unsupported feature for rproc
> 2) validate the max_nr_ports (as has been done in patch 4)
> 
> >
> >
> > > ---
> > >  drivers/char/virtio_console.c | 41 ++-
> > >  1 file changed, 26 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > > index 7eaf303a7a86..daeed31df622 100644
> > > --- a/drivers/char/virtio_console.c
> > > +++ b/drivers/char/virtio_console.c
> > > @@ -1172,9 +1172,7 @@ static void resize_console(struct port *port)
> > >
> > >   vdev = port->portdev->vdev;
> > >
> > > - /* Don't test F_SIZE at all if we're rproc: not a valid feature! */
> > > - if (!is_rproc_serial(vdev) &&
> > > - virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE))
> > > + if (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE))
> > >   hvc_resize(port->cons.hvc, port->cons.ws);
> > >  }
> > >
> > > @@ -1981,6 +1979,29 @@ static void virtcons_remove(struct virtio_device 
> > > *vdev)
> > >   kfree(portdev);
> > >  }
> > >
> > > +static int virtcons_validate(struct virtio_device *vdev)
> > > +{
> > > + if (is_rproc_serial(vdev)) {
> > > + /* Don't test F_SIZE at all if we're rproc: not a
> > > +  * valid feature! */
> >
> >
> > This comment needs to be fixed now. And the format's wrong
> > since you made it a multi-line comment.
> > Should be
> > /*
> >  * like
> >  * this
> >  */
> 
> Ok.
> 
> >
> > > + __virtio_clear_bit(vdev, VIRTIO_CONSOLE_F_SIZE);
> > > + /* Don't test MULTIPORT at all if we're rproc: not a
> > > +  * valid feature! */
> > > + __virtio_clear_bit(vdev, VIRTIO_CONSOLE_F_MULTIPORT);
> > > + }
> > > +
> > > + /* We only need a config space if features are offered */
> > > + if (!vdev->config->get &&
> > > + (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE)
> > > +  || virtio_has_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT))) {
> > > + dev_err(>dev, "%s failure: config access disabled\n",
> > > + __func__);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > >  /*
> > >   * Once we're further in boot, we get probed like any other virtio
> > >   * device.
> >
> > This switches the order of tests around, so if an rproc device
> > offers VIRTIO_CONSOLE_F_SIZE or VIRTIO_CONSOLE_F_MULTIPORT
> > without get it will now try to work instead of failing.
> 
> Ok, so we can fail the validation in this case.

We can. But if we are going to, then it's easier to just fail probe.
Or if you want to try and work around this case,
that's also reasonable but pls document in the commit log.


> Thanks
> 
> >
> > Which is maybe a worthy goal, but given rproc does not support
> > virtio 1.0 it also risks trying to drive something completely
> > unreasonable.
> >
> > Overall does not feel like hardening which is supposed to make
> > things more strict, not less.
> >
> >
> > > @@ -1996,15 +2017,6 @@ static int virtcons_probe(struct virtio_device 
> > > *vdev)
> > >   bool multiport;
> > >   bool early = early_put_chars != NULL;
> > >
> > > - /* We only need a config space if features are offered */
> > > - if (!vdev->config->get &&
> > > - (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE)
> > > -  || virtio_has_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT))) {
> > > - dev_err(>dev, "%s failure: config access disabled\n",
> > > - __func__);
> > > - return -EINVAL;
> > > - }
> > > -
> > >   /* Ensure to read early_put_chars now */
> > >   barrier();
> > >
> > > @@ -2031,9 +2043,7 @@ static int virtcons_probe(struct virtio_device 
> > > *vdev)
> > >   multiport = false;
> > >   portdev->max_nr_ports = 1;
> > >
> > > - /* Don't test MULTIPORT at all if we're rproc: not a valid feature! 
> > > */
> > > - if (!is_rproc_serial(vdev) &&
> > > - virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
> > > + if (virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
> > >struct virtio_console_config, max_nr_ports,
> > >>max_nr_ports) == 0) {
> > >

Re: [PATCH] Revert "virtio-blk: Add validation for block size in config space"

2021-10-13 Thread Michael S. Tsirkin

On Thu, Oct 14, 2021 at 10:45:25AM +0800, Jason Wang wrote:
> On Wed, Oct 13, 2021 at 8:46 PM Michael S. Tsirkin  wrote:
> >
> > It turns out that access to config space before completing the feature
> > negotiation is broken for big endian guests at least with QEMU hosts up
> > to 6.1 inclusive.  This affects any device that accesses config space in
> > the validate callback: at the moment that is virtio-net with
> > VIRTIO_NET_F_MTU but since 82e89ea077b9 ("virtio-blk: Add validation for
> > block size in config space") that also started affecting virtio-blk with
> > VIRTIO_BLK_F_BLK_SIZE.
> 
> Ok, so it looks like I need to do something similar in my series to
> validate max_nr_ports in probe() instead of validate()? (multi port is
> not enabled by default).

I think what we should do is fix upstream QEMU first. With that, plus
Halil's work-around writing out the version 1 feature, we should be
good reading config in validate.

> 
> I wonder if we need to document this and rename the validate to
> validate_features() (since we can do very little thing if we can't
> read config in .validate()).

That's not a bad idea anyway - any validation that does not
need to clear features can be done in probe in any case.


> > Further, unlike VIRTIO_NET_F_MTU which is off by
> > default on QEMU, VIRTIO_BLK_F_BLK_SIZE is on by default, which resulted
> > in lots of people not being able to boot VMs on BE.
> >
> > The spec is very clear that what we are doing is legal so QEMU needs to
> > be fixed, but given it's been broken for so many years and no one
> > noticed, we need to give QEMU a bit more time before applying this.
> >
> > Further, this patch is incomplete (does not check blk size is a power
> > of two) and it duplicates the logic from nbd.
> >
> > Revert for now, and we'll reapply a cleaner logic in the next release.
> >
> > Cc: sta...@vger.kernel.org
> > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config 
> > space")
> > Cc: Xie Yongji 
> > Signed-off-by: Michael S. Tsirkin 
> 
> Acked-by: jason Wang 
> 
> > ---
> >  drivers/block/virtio_blk.c | 37 ++---
> >  1 file changed, 6 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 9b3bd083b411..303caf2d17d0 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -689,28 +689,6 @@ static const struct blk_mq_ops virtio_mq_ops = {
> >  static unsigned int virtblk_queue_depth;
> >  module_param_named(queue_depth, virtblk_queue_depth, uint, 0444);
> >
> > -static int virtblk_validate(struct virtio_device *vdev)
> > -{
> > -   u32 blk_size;
> > -
> > -   if (!vdev->config->get) {
> > -   dev_err(>dev, "%s failure: config access disabled\n",
> > -   __func__);
> > -   return -EINVAL;
> > -   }
> > -
> > -   if (!virtio_has_feature(vdev, VIRTIO_BLK_F_BLK_SIZE))
> > -   return 0;
> > -
> > -   blk_size = virtio_cread32(vdev,
> > -   offsetof(struct virtio_blk_config, blk_size));
> > -
> > -   if (blk_size < SECTOR_SIZE || blk_size > PAGE_SIZE)
> > -   __virtio_clear_bit(vdev, VIRTIO_BLK_F_BLK_SIZE);
> > -
> > -   return 0;
> > -}
> > -
> >  static int virtblk_probe(struct virtio_device *vdev)
> >  {
> > struct virtio_blk *vblk;
> > @@ -722,6 +700,12 @@ static int virtblk_probe(struct virtio_device *vdev)
> > u8 physical_block_exp, alignment_offset;
> > unsigned int queue_depth;
> >
> > +   if (!vdev->config->get) {
> > +   dev_err(>dev, "%s failure: config access disabled\n",
> > +   __func__);
> > +   return -EINVAL;
> > +   }
> > +
> > err = ida_simple_get(_index_ida, 0, minor_to_index(1 << 
> > MINORBITS),
> >  GFP_KERNEL);
> > if (err < 0)
> > @@ -836,14 +820,6 @@ static int virtblk_probe(struct virtio_device *vdev)
> > else
> > blk_size = queue_logical_block_size(q);
> >
> > -   if (blk_size < SECTOR_SIZE || blk_size > PAGE_SIZE) {
> > -   dev_err(>dev,
> > -   "block size is changed unexpectedly, now is %u\n",
> > -   blk_size);
> > -   err = -EINVAL;
> > -   goto out_cleanup_disk;
> > -   }
> > -
> > /* Use topology information if available */
> > err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
> >struct virtio_blk_config, 
> > physical_block_exp,
> > @@ -1009,7 +985,6 @@ static struct virtio_driver virtio_blk = {
> > .driver.name= KBUILD_MODNAME,
> > .driver.owner   = THIS_MODULE,
> > .id_table   = id_table,
> > -   .validate   = virtblk_validate,
> > .probe  = virtblk_probe,
> >

Re: [PATCH V2 07/12] virtio-pci: harden INTX interrupts

2021-10-13 Thread Michael S. Tsirkin

On Thu, Oct 14, 2021 at 10:35:48AM +0800, Jason Wang wrote:
> On Wed, Oct 13, 2021 at 5:42 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Oct 12, 2021 at 02:52:22PM +0800, Jason Wang wrote:
> > > This patch tries to make sure the virtio interrupt handler for INTX
> > > won't be called after a reset and before virtio_device_ready(). We
> > > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > > (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> > > intx_soft_enabled variable and toggle it during in
> > > vp_disable/enable_vectors(). The INTX interrupt handler will check
> > > intx_soft_enabled before processing the actual interrupt.
> > >
> > > Cc: Boqun Feng 
> > > Cc: Thomas Gleixner 
> > > Cc: Peter Zijlstra 
> > > Cc: Paul E. McKenney 
> > > Signed-off-by: Jason Wang 
> > > ---
> > >  drivers/virtio/virtio_pci_common.c | 24 ++--
> > >  drivers/virtio/virtio_pci_common.h |  1 +
> > >  2 files changed, 23 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_pci_common.c 
> > > b/drivers/virtio/virtio_pci_common.c
> > > index 0b9523e6dd39..5ae6a2a4eb77 100644
> > > --- a/drivers/virtio/virtio_pci_common.c
> > > +++ b/drivers/virtio/virtio_pci_common.c
> > > @@ -30,8 +30,16 @@ void vp_disable_vectors(struct virtio_device *vdev)
> > >   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > >   int i;
> > >
> > > - if (vp_dev->intx_enabled)
> > > + if (vp_dev->intx_enabled) {
> > > + /*
> > > +  * The below synchronize() guarantees that any
> > > +  * interrupt for this line arriving after
> > > +  * synchronize_irq() has completed is guaranteed to see
> > > +  * intx_soft_enabled == false.
> > > +  */
> > > + WRITE_ONCE(vp_dev->intx_soft_enabled, false);
> > >   synchronize_irq(vp_dev->pci_dev->irq);
> > > + }
> > >
> > >   for (i = 0; i < vp_dev->msix_vectors; ++i)
> > >   disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > @@ -43,8 +51,16 @@ void vp_enable_vectors(struct virtio_device *vdev)
> > >   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > >   int i;
> > >
> > > - if (vp_dev->intx_enabled)
> > > + if (vp_dev->intx_enabled) {
> > > + disable_irq(vp_dev->pci_dev->irq);
> > > + /*
> > > +  * The above disable_irq() provides TSO ordering and
> > > +  * as such promotes the below store to store-release.
> > > +  */
> > > + WRITE_ONCE(vp_dev->intx_soft_enabled, true);
> > > + enable_irq(vp_dev->pci_dev->irq);
> > >   return;
> > > + }
> > >
> > >   for (i = 0; i < vp_dev->msix_vectors; ++i)
> > >   enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > @@ -97,6 +113,10 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
> > >   struct virtio_pci_device *vp_dev = opaque;
> > >   u8 isr;
> > >
> > > + /* read intx_soft_enabled before read others */
> > > + if (!smp_load_acquire(_dev->intx_soft_enabled))
> > > + return IRQ_NONE;
> > > +
> > >   /* reading the ISR has the effect of also clearing it so it's very
> > >* important to save off the value. */
> > >   isr = ioread8(vp_dev->isr);
> >
> > I don't see why we need this ordering guarantee here.
> >
> > synchronize_irq above makes sure no interrupt handler
> > is in progress.
> 
> Yes.
> 
> > the handler itself thus does not need
> > any specific order, it is ok if intx_soft_enabled is read
> > after, not before the rest of it.
> 
> But the interrupt could be raised after synchronize_irq() which may
> see a false of the intx_soft_enabled.

You mean a "true" value right? false is what we are writing there.

Are you sure it can happen? I think that synchronize_irq makes the value
visible on all CPUs running the irq.

> In this case we still need the
> make sure intx_soft_enbled to be read first instead of allowing other
> operations to be done first, otherwise the intx_soft_enabled is
> meaningless.
> 
> Thanks

If intx_soft_enbled were not visible after synchronize_irq then
it does not matter in which order we read it wrt other values,
it still wouldn't work right.

> >
> > Just READ_ONCE should be enough, and we can drop the comment.
> >
> >
> > > diff --git a/drivers/virtio/virtio_pci_common.h 
> > > b/drivers/virtio/virtio_pci_common.h
> > > index a235ce9ff6a5..3c06e0f92ee4 100644
> > > --- a/drivers/virtio/virtio_pci_common.h
> > > +++ b/drivers/virtio/virtio_pci_common.h
> > > @@ -64,6 +64,7 @@ struct virtio_pci_device {
> > >   /* MSI-X support */
> > >   int msix_enabled;
> > >   int intx_enabled;
> > > + bool intx_soft_enabled;
> > >   cpumask_var_t *msix_affinity_masks;
> > >   /* Name strings for interrupts. This size should be enough,
> > >* and I'm too lazy to allocate each name separately. */
> > > --
> > >

Re: [PATCH V2 01/12] virtio-blk: validate num_queues during probe

2021-10-13 Thread Michael S. Tsirkin

On Thu, Oct 14, 2021 at 10:32:32AM +0800, Jason Wang wrote:
> On Wed, Oct 13, 2021 at 6:04 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Oct 12, 2021 at 02:52:16PM +0800, Jason Wang wrote:
> > > If an untrusted device neogitates BLK_F_MQ but advertises a zero
> > > num_queues, the driver may end up trying to allocating zero size
> > > buffers where ZERO_SIZE_PTR is returned which may pass the checking
> > > against the NULL. This will lead unexpected results.
> > >
> > > Fixing this by using single queue if num_queues is zero.
> > >
> > > Cc: Paolo Bonzini 
> > > Cc: Stefan Hajnoczi 
> > > Cc: Stefano Garzarella 
> > > Reviewed-by: Stefan Hajnoczi 
> > > Signed-off-by: Jason Wang 
> >
> > I'd rather fail probe so we don't need to support that.
> 
> I think we should be consistent among all virtio drivers.

Well we started being permissive. We can't change that
since that might break on some hosts. But given focus on
security being restrictive sounds better now.

> E.g without this patch, we stick to 1 if virtio_create_feature() fail.
> Do we need to fix that?

We can't easily, some hosts might be broken.

> And we do something similar at least for the virtio-net and a lot of
> other places.
> 
> /* We need at least 2 queue's */
> if (err || max_queue_pairs < VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN ||
> max_queue_pairs > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX ||
> !virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> max_queue_pairs = 1;
> 
> Thanks
> 
> >
> > > ---
> > >  drivers/block/virtio_blk.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > index 9b3bd083b411..9deff01a38cb 100644
> > > --- a/drivers/block/virtio_blk.c
> > > +++ b/drivers/block/virtio_blk.c
> > > @@ -495,7 +495,8 @@ static int init_vq(struct virtio_blk *vblk)
> > >   err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
> > >  struct virtio_blk_config, num_queues,
> > >  _vqs);
> > > - if (err)
> > > + /* We need at least one virtqueue */
> > > + if (err || !num_vqs)
> > >   num_vqs = 1;
> > >
> > >   num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
> > > --
> > > 2.25.1
> >

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 3/3] vdpa: Check for iova range at mappings changes

2021-10-13 Thread Jason Wang

On Tue, Oct 12, 2021 at 10:07 PM Eugenio Pérez  wrote:
>
> Check vdpa device range before updating memory regions so we don't add
> any outside of it, and report the invalid change if any.
>
> Signed-off-by: Eugenio Pérez 
> ---
>  include/hw/virtio/vhost-vdpa.h |  2 ++
>  hw/virtio/vhost-vdpa.c | 62 +-
>  hw/virtio/trace-events |  1 +
>  3 files changed, 49 insertions(+), 16 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index a8963da2d9..c288cf7ecb 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -13,6 +13,7 @@
>  #define HW_VIRTIO_VHOST_VDPA_H
>
>  #include "hw/virtio/virtio.h"
> +#include "standard-headers/linux/vhost_types.h"
>
>  typedef struct VhostVDPAHostNotifier {
>  MemoryRegion mr;
> @@ -24,6 +25,7 @@ typedef struct vhost_vdpa {
>  uint32_t msg_type;
>  bool iotlb_batch_begin_sent;
>  MemoryListener listener;
> +struct vhost_vdpa_iova_range iova_range;
>  struct vhost_dev *dev;
>  VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>  } VhostVDPA;
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index be7c63b4ba..dbf773d032 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -37,20 +37,34 @@ static Int128 vhost_vdpa_section_end(const 
> MemoryRegionSection *section)
>  return llend;
>  }
>
> -static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
> -{
> -return (!memory_region_is_ram(section->mr) &&
> -!memory_region_is_iommu(section->mr)) ||
> -memory_region_is_protected(section->mr) ||
> -   /* vhost-vDPA doesn't allow MMIO to be mapped  */
> -memory_region_is_ram_device(section->mr) ||
> -   /*
> -* Sizing an enabled 64-bit BAR can cause spurious mappings to
> -* addresses in the upper part of the 64-bit address space.  These
> -* are never accessed by the CPU and beyond the address width of
> -* some IOMMU hardware.  TODO: VDPA should tell us the IOMMU 
> width.
> -*/
> -   section->offset_within_address_space & (1ULL << 63);
> +static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
> +uint64_t iova_min,
> +uint64_t iova_max)
> +{
> +Int128 llend;
> +
> +if ((!memory_region_is_ram(section->mr) &&
> + !memory_region_is_iommu(section->mr)) ||
> +memory_region_is_protected(section->mr) ||
> +/* vhost-vDPA doesn't allow MMIO to be mapped  */
> +memory_region_is_ram_device(section->mr)) {
> +return true;
> +}
> +
> +if (section->offset_within_address_space < iova_min) {
> +error_report("RAM section out of device range (min=%lu, addr=%lu)",
> + iova_min, section->offset_within_address_space);
> +return true;
> +}
> +
> +llend = vhost_vdpa_section_end(section);
> +if (int128_gt(llend, int128_make64(iova_max))) {
> +error_report("RAM section out of device range (max=%lu, end 
> addr=%lu)",
> + iova_max, int128_get64(llend));
> +return true;
> +}
> +
> +return false;
>  }
>
>  static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> @@ -162,7 +176,8 @@ static void vhost_vdpa_listener_region_add(MemoryListener 
> *listener,
>  void *vaddr;
>  int ret;
>
> -if (vhost_vdpa_listener_skipped_section(section)) {
> +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first,
> +v->iova_range.last)) {
>  return;
>  }
>
> @@ -220,7 +235,8 @@ static void vhost_vdpa_listener_region_del(MemoryListener 
> *listener,
>  Int128 llend, llsize;
>  int ret;
>
> -if (vhost_vdpa_listener_skipped_section(section)) {
> +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first,
> +v->iova_range.last)) {
>  return;
>  }
>
> @@ -288,6 +304,19 @@ static void vhost_vdpa_add_status(struct vhost_dev *dev, 
> uint8_t status)
>  vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, );
>  }
>
> +static void vhost_vdpa_get_iova_range(struct vhost_vdpa *v)
> +{
> +int ret = vhost_vdpa_call(v->dev, VHOST_VDPA_GET_IOVA_RANGE,
> +  >iova_range);
> +if (ret != 0) {
> +v->iova_range.first = 0;
> +v->iova_range.last = MAKE_64BIT_MASK(0, 63);

Nit:

ULLONG_MAX?

Others look good.

Thanks

> +}
> +
> +trace_vhost_vdpa_get_iova_range(v->dev, v->iova_range.first,
> +v->iova_range.last);
> +}
> +
>  static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>  {
>  struct vhost_vdpa *v;
> @@ -300,6 +329,7 @@ static int

RE: [PATCH 2/5] iommu/virtio: Support bypass domains

2021-10-13 Thread Tian, Kevin

> From: Tian, Kevin
> Sent: Thursday, October 14, 2021 11:25 AM
> 
> > From: Jean-Philippe Brucker 
> > Sent: Wednesday, October 13, 2021 8:11 PM
> >
> > The VIRTIO_IOMMU_F_BYPASS_CONFIG feature adds a new flag to the
> > ATTACH
> > request, that creates a bypass domain. Use it to enable identity
> > domains.
> >
> > When VIRTIO_IOMMU_F_BYPASS_CONFIG is not supported by the device,
> > we
> > currently fail attaching to an identity domain. Future patches will
> > instead create identity mappings in this case.
> >
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >  drivers/iommu/virtio-iommu.c | 20 +++-
> >  1 file changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > index 80930ce04a16..ee8a7afd667b 100644
> > --- a/drivers/iommu/virtio-iommu.c
> > +++ b/drivers/iommu/virtio-iommu.c
> > @@ -71,6 +71,7 @@ struct viommu_domain {
> > struct rb_root_cached   mappings;
> >
> > unsigned long   nr_endpoints;
> > +   boolbypass;
> >  };
> >
> >  struct viommu_endpoint {
> > @@ -587,7 +588,9 @@ static struct iommu_domain
> > *viommu_domain_alloc(unsigned type)
> >  {
> > struct viommu_domain *vdomain;
> >
> > -   if (type != IOMMU_DOMAIN_UNMANAGED && type !=
> > IOMMU_DOMAIN_DMA)
> > +   if (type != IOMMU_DOMAIN_UNMANAGED &&
> > +   type != IOMMU_DOMAIN_DMA &&
> > +   type != IOMMU_DOMAIN_IDENTITY)
> > return NULL;
> >
> > vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
> > @@ -630,6 +633,17 @@ static int viommu_domain_finalise(struct
> > viommu_endpoint *vdev,
> > vdomain->map_flags  = viommu->map_flags;
> > vdomain->viommu = viommu;
> >
> > +   if (domain->type == IOMMU_DOMAIN_IDENTITY) {
> > +   if (!virtio_has_feature(viommu->vdev,
> > +   VIRTIO_IOMMU_F_BYPASS_CONFIG))
> > {
> > +   ida_free(>domain_ids, vdomain->id);
> > +   vdomain->viommu = 0;
> > +   return -EOPNOTSUPP;
> > +   }
> > +
> > +   vdomain->bypass = true;
> > +   }
> > +
> 
> move to the start of the function, then no need for above cleanup.
> 

forgot it as I see the reason now when looking at patch05
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

RE: [PATCH 2/5] iommu/virtio: Support bypass domains

2021-10-13 Thread Tian, Kevin

> From: Jean-Philippe Brucker 
> Sent: Wednesday, October 13, 2021 8:11 PM
> 
> The VIRTIO_IOMMU_F_BYPASS_CONFIG feature adds a new flag to the
> ATTACH
> request, that creates a bypass domain. Use it to enable identity
> domains.
> 
> When VIRTIO_IOMMU_F_BYPASS_CONFIG is not supported by the device,
> we
> currently fail attaching to an identity domain. Future patches will
> instead create identity mappings in this case.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/virtio-iommu.c | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 80930ce04a16..ee8a7afd667b 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -71,6 +71,7 @@ struct viommu_domain {
>   struct rb_root_cached   mappings;
> 
>   unsigned long   nr_endpoints;
> + boolbypass;
>  };
> 
>  struct viommu_endpoint {
> @@ -587,7 +588,9 @@ static struct iommu_domain
> *viommu_domain_alloc(unsigned type)
>  {
>   struct viommu_domain *vdomain;
> 
> - if (type != IOMMU_DOMAIN_UNMANAGED && type !=
> IOMMU_DOMAIN_DMA)
> + if (type != IOMMU_DOMAIN_UNMANAGED &&
> + type != IOMMU_DOMAIN_DMA &&
> + type != IOMMU_DOMAIN_IDENTITY)
>   return NULL;
> 
>   vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
> @@ -630,6 +633,17 @@ static int viommu_domain_finalise(struct
> viommu_endpoint *vdev,
>   vdomain->map_flags  = viommu->map_flags;
>   vdomain->viommu = viommu;
> 
> + if (domain->type == IOMMU_DOMAIN_IDENTITY) {
> + if (!virtio_has_feature(viommu->vdev,
> + VIRTIO_IOMMU_F_BYPASS_CONFIG))
> {
> + ida_free(>domain_ids, vdomain->id);
> + vdomain->viommu = 0;
> + return -EOPNOTSUPP;
> + }
> +
> + vdomain->bypass = true;
> + }
> +

move to the start of the function, then no need for above cleanup.

>   return 0;
>  }
> 
> @@ -691,6 +705,9 @@ static int viommu_attach_dev(struct iommu_domain
> *domain, struct device *dev)
>   .domain = cpu_to_le32(vdomain->id),
>   };
> 
> + if (vdomain->bypass)
> + req.flags |=
> cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS);
> +
>   for (i = 0; i < fwspec->num_ids; i++) {
>   req.endpoint = cpu_to_le32(fwspec->ids[i]);
> 
> @@ -1132,6 +1149,7 @@ static unsigned int features[] = {
>   VIRTIO_IOMMU_F_DOMAIN_RANGE,
>   VIRTIO_IOMMU_F_PROBE,
>   VIRTIO_IOMMU_F_MMIO,
> + VIRTIO_IOMMU_F_BYPASS_CONFIG,
>  };
> 
>  static struct virtio_device_id id_table[] = {
> --
> 2.33.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

RE: [PATCH 0/5] iommu/virtio: Add identity domains

2021-10-13 Thread Tian, Kevin

> From: Jean-Philippe Brucker 
> Sent: Wednesday, October 13, 2021 8:11 PM
> 
> Support identity domains, allowing to only enable IOMMU protection for a
> subset of endpoints (those assigned to userspace, for example). Users
> may enable identity domains at compile time
> (CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time
> (iommu.passthrough=1) or
> runtime (/sys/kernel/iommu_groups/*/type = identity).

Do we want to use consistent terms between spec (bypass domain) 
and code (identity domain)? 

> 
> Patches 1-2 support identity domains using the optional
> VIRTIO_IOMMU_F_BYPASS_CONFIG feature. The feature bit is not yet in the
> spec, see [1] for the latest proposal.
> 
> Patches 3-5 add a fallback to identity mappings, when the feature is not
> supported.
> 
> Note that this series doesn't touch the global bypass bit added by
> VIRTIO_IOMMU_F_BYPASS_CONFIG. All endpoints managed by the IOMMU
> should
> be attached to a domain, so global bypass isn't in use after endpoints

I saw a concept of deferred attach in iommu core. See iommu_is_
attach_deferred(). Currently this is vendor specific and I haven't
looked into the exact reason why some vendor sets it now. Just
be curious whether the same reason might be applied to virtio-iommu.

> are probed. Before that, the global bypass policy is decided by the
> hypervisor and firmware. So I don't think Linux needs to touch the

This reminds me one thing. The spec says that the global bypass
bit is sticky and not affected by reset. This implies that in the case
of rebooting the VM into a different OS, the previous OS actually
has the right to override this setting for the next OS. Is it a right
design? Even the firmware itself is unable to identify the original
setting enforced by the hypervisor after reboot. I feel the hypervisor
setting should be recovered after reset since it reflects the 
security measure enforced by the virtual platform?

> global bypass bit, but there are some patches available on my
> virtio-iommu/bypass branch [2] to test it.
> 
> QEMU patches are on my virtio-iommu/bypass branch [3] (and the list)
> 
> [1] https://www.mail-archive.com/virtio-dev@lists.oasis-
> open.org/msg07898.html
> [2] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/bypass
> [3] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/bypass
> 
> Jean-Philippe Brucker (5):
>   iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG
>   iommu/virtio: Support bypass domains
>   iommu/virtio: Sort reserved regions
>   iommu/virtio: Pass end address to viommu_add_mapping()
>   iommu/virtio: Support identity-mapped domains
> 
>  include/uapi/linux/virtio_iommu.h |   8 ++-
>  drivers/iommu/virtio-iommu.c  | 113 +-
>  2 files changed, 101 insertions(+), 20 deletions(-)
> 
> --
> 2.33.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] Revert "virtio-blk: Add validation for block size in config space"

2021-10-13 Thread Jason Wang

On Wed, Oct 13, 2021 at 8:46 PM Michael S. Tsirkin  wrote:
>
> It turns out that access to config space before completing the feature
> negotiation is broken for big endian guests at least with QEMU hosts up
> to 6.1 inclusive.  This affects any device that accesses config space in
> the validate callback: at the moment that is virtio-net with
> VIRTIO_NET_F_MTU but since 82e89ea077b9 ("virtio-blk: Add validation for
> block size in config space") that also started affecting virtio-blk with
> VIRTIO_BLK_F_BLK_SIZE.

Ok, so it looks like I need to do something similar in my series to
validate max_nr_ports in probe() instead of validate()? (multi port is
not enabled by default).

I wonder if we need to document this and rename the validate to
validate_features() (since we can do very little thing if we can't
read config in .validate()).

> Further, unlike VIRTIO_NET_F_MTU which is off by
> default on QEMU, VIRTIO_BLK_F_BLK_SIZE is on by default, which resulted
> in lots of people not being able to boot VMs on BE.
>
> The spec is very clear that what we are doing is legal so QEMU needs to
> be fixed, but given it's been broken for so many years and no one
> noticed, we need to give QEMU a bit more time before applying this.
>
> Further, this patch is incomplete (does not check blk size is a power
> of two) and it duplicates the logic from nbd.
>
> Revert for now, and we'll reapply a cleaner logic in the next release.
>
> Cc: sta...@vger.kernel.org
> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config 
> space")
> Cc: Xie Yongji 
> Signed-off-by: Michael S. Tsirkin 

Acked-by: jason Wang 

> ---
>  drivers/block/virtio_blk.c | 37 ++---
>  1 file changed, 6 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 9b3bd083b411..303caf2d17d0 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -689,28 +689,6 @@ static const struct blk_mq_ops virtio_mq_ops = {
>  static unsigned int virtblk_queue_depth;
>  module_param_named(queue_depth, virtblk_queue_depth, uint, 0444);
>
> -static int virtblk_validate(struct virtio_device *vdev)
> -{
> -   u32 blk_size;
> -
> -   if (!vdev->config->get) {
> -   dev_err(>dev, "%s failure: config access disabled\n",
> -   __func__);
> -   return -EINVAL;
> -   }
> -
> -   if (!virtio_has_feature(vdev, VIRTIO_BLK_F_BLK_SIZE))
> -   return 0;
> -
> -   blk_size = virtio_cread32(vdev,
> -   offsetof(struct virtio_blk_config, blk_size));
> -
> -   if (blk_size < SECTOR_SIZE || blk_size > PAGE_SIZE)
> -   __virtio_clear_bit(vdev, VIRTIO_BLK_F_BLK_SIZE);
> -
> -   return 0;
> -}
> -
>  static int virtblk_probe(struct virtio_device *vdev)
>  {
> struct virtio_blk *vblk;
> @@ -722,6 +700,12 @@ static int virtblk_probe(struct virtio_device *vdev)
> u8 physical_block_exp, alignment_offset;
> unsigned int queue_depth;
>
> +   if (!vdev->config->get) {
> +   dev_err(>dev, "%s failure: config access disabled\n",
> +   __func__);
> +   return -EINVAL;
> +   }
> +
> err = ida_simple_get(_index_ida, 0, minor_to_index(1 << MINORBITS),
>  GFP_KERNEL);
> if (err < 0)
> @@ -836,14 +820,6 @@ static int virtblk_probe(struct virtio_device *vdev)
> else
> blk_size = queue_logical_block_size(q);
>
> -   if (blk_size < SECTOR_SIZE || blk_size > PAGE_SIZE) {
> -   dev_err(>dev,
> -   "block size is changed unexpectedly, now is %u\n",
> -   blk_size);
> -   err = -EINVAL;
> -   goto out_cleanup_disk;
> -   }
> -
> /* Use topology information if available */
> err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
>struct virtio_blk_config, 
> physical_block_exp,
> @@ -1009,7 +985,6 @@ static struct virtio_driver virtio_blk = {
> .driver.name= KBUILD_MODNAME,
> .driver.owner   = THIS_MODULE,
> .id_table   = id_table,
> -   .validate   = virtblk_validate,
> .probe  = virtblk_probe,
> .remove = virtblk_remove,
> .config_changed = virtblk_config_changed,
> --
> MST
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 07/12] virtio-pci: harden INTX interrupts

2021-10-13 Thread Jason Wang

On Wed, Oct 13, 2021 at 5:42 PM Michael S. Tsirkin  wrote:
>
> On Tue, Oct 12, 2021 at 02:52:22PM +0800, Jason Wang wrote:
> > This patch tries to make sure the virtio interrupt handler for INTX
> > won't be called after a reset and before virtio_device_ready(). We
> > can't use IRQF_NO_AUTOEN since we're using shared interrupt
> > (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> > intx_soft_enabled variable and toggle it during in
> > vp_disable/enable_vectors(). The INTX interrupt handler will check
> > intx_soft_enabled before processing the actual interrupt.
> >
> > Cc: Boqun Feng 
> > Cc: Thomas Gleixner 
> > Cc: Peter Zijlstra 
> > Cc: Paul E. McKenney 
> > Signed-off-by: Jason Wang 
> > ---
> >  drivers/virtio/virtio_pci_common.c | 24 ++--
> >  drivers/virtio/virtio_pci_common.h |  1 +
> >  2 files changed, 23 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_pci_common.c 
> > b/drivers/virtio/virtio_pci_common.c
> > index 0b9523e6dd39..5ae6a2a4eb77 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -30,8 +30,16 @@ void vp_disable_vectors(struct virtio_device *vdev)
> >   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >   int i;
> >
> > - if (vp_dev->intx_enabled)
> > + if (vp_dev->intx_enabled) {
> > + /*
> > +  * The below synchronize() guarantees that any
> > +  * interrupt for this line arriving after
> > +  * synchronize_irq() has completed is guaranteed to see
> > +  * intx_soft_enabled == false.
> > +  */
> > + WRITE_ONCE(vp_dev->intx_soft_enabled, false);
> >   synchronize_irq(vp_dev->pci_dev->irq);
> > + }
> >
> >   for (i = 0; i < vp_dev->msix_vectors; ++i)
> >   disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > @@ -43,8 +51,16 @@ void vp_enable_vectors(struct virtio_device *vdev)
> >   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >   int i;
> >
> > - if (vp_dev->intx_enabled)
> > + if (vp_dev->intx_enabled) {
> > + disable_irq(vp_dev->pci_dev->irq);
> > + /*
> > +  * The above disable_irq() provides TSO ordering and
> > +  * as such promotes the below store to store-release.
> > +  */
> > + WRITE_ONCE(vp_dev->intx_soft_enabled, true);
> > + enable_irq(vp_dev->pci_dev->irq);
> >   return;
> > + }
> >
> >   for (i = 0; i < vp_dev->msix_vectors; ++i)
> >   enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > @@ -97,6 +113,10 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
> >   struct virtio_pci_device *vp_dev = opaque;
> >   u8 isr;
> >
> > + /* read intx_soft_enabled before read others */
> > + if (!smp_load_acquire(_dev->intx_soft_enabled))
> > + return IRQ_NONE;
> > +
> >   /* reading the ISR has the effect of also clearing it so it's very
> >* important to save off the value. */
> >   isr = ioread8(vp_dev->isr);
>
> I don't see why we need this ordering guarantee here.
>
> synchronize_irq above makes sure no interrupt handler
> is in progress.

Yes.

> the handler itself thus does not need
> any specific order, it is ok if intx_soft_enabled is read
> after, not before the rest of it.

But the interrupt could be raised after synchronize_irq() which may
see a false of the intx_soft_enabled. In this case we still need the
make sure intx_soft_enbled to be read first instead of allowing other
operations to be done first, otherwise the intx_soft_enabled is
meaningless.

Thanks

>
> Just READ_ONCE should be enough, and we can drop the comment.
>
>
> > diff --git a/drivers/virtio/virtio_pci_common.h 
> > b/drivers/virtio/virtio_pci_common.h
> > index a235ce9ff6a5..3c06e0f92ee4 100644
> > --- a/drivers/virtio/virtio_pci_common.h
> > +++ b/drivers/virtio/virtio_pci_common.h
> > @@ -64,6 +64,7 @@ struct virtio_pci_device {
> >   /* MSI-X support */
> >   int msix_enabled;
> >   int intx_enabled;
> > + bool intx_soft_enabled;
> >   cpumask_var_t *msix_affinity_masks;
> >   /* Name strings for interrupts. This size should be enough,
> >* and I'm too lazy to allocate each name separately. */
> > --
> > 2.25.1
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 02/12] virtio: add doc for validate() method

2021-10-13 Thread Jason Wang

On Wed, Oct 13, 2021 at 6:09 PM Michael S. Tsirkin  wrote:
>
> On Tue, Oct 12, 2021 at 02:52:17PM +0800, Jason Wang wrote:
> > This patch adds doc for validate() method.
> >
> > Signed-off-by: Jason Wang 
>
> And maybe document __virtio_clear_bit : it says
> "for use by transports" and now it's also legal in the
> validate callback.

Ok, let me do that in v3.

Thanks

>
> > ---
> >  include/linux/virtio.h | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index 41edbc01ffa4..0cd8685aeba4 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -152,6 +152,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev);
> >   * @feature_table_size: number of entries in the feature table array.
> >   * @feature_table_legacy: same as feature_table but when working in legacy 
> > mode.
> >   * @feature_table_size_legacy: number of entries in feature table legacy 
> > array.
> > + * @validate: optional function to do early checks
> >   * @probe: the function to call when a device is found.  Returns 0 or 
> > -errno.
> >   * @scan: optional function to call after successful probe; intended
> >   *for virtio-scsi to invoke a scan.
> > --
> > 2.25.1
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 01/12] virtio-blk: validate num_queues during probe

2021-10-13 Thread Jason Wang

On Wed, Oct 13, 2021 at 6:04 PM Michael S. Tsirkin  wrote:
>
> On Tue, Oct 12, 2021 at 02:52:16PM +0800, Jason Wang wrote:
> > If an untrusted device neogitates BLK_F_MQ but advertises a zero
> > num_queues, the driver may end up trying to allocating zero size
> > buffers where ZERO_SIZE_PTR is returned which may pass the checking
> > against the NULL. This will lead unexpected results.
> >
> > Fixing this by using single queue if num_queues is zero.
> >
> > Cc: Paolo Bonzini 
> > Cc: Stefan Hajnoczi 
> > Cc: Stefano Garzarella 
> > Reviewed-by: Stefan Hajnoczi 
> > Signed-off-by: Jason Wang 
>
> I'd rather fail probe so we don't need to support that.

I think we should be consistent among all virtio drivers.

E.g without this patch, we stick to 1 if virtio_create_feature() fail.
Do we need to fix that?

And we do something similar at least for the virtio-net and a lot of
other places.

/* We need at least 2 queue's */
if (err || max_queue_pairs < VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN ||
max_queue_pairs > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX ||
!virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
max_queue_pairs = 1;

Thanks

>
> > ---
> >  drivers/block/virtio_blk.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 9b3bd083b411..9deff01a38cb 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -495,7 +495,8 @@ static int init_vq(struct virtio_blk *vblk)
> >   err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
> >  struct virtio_blk_config, num_queues,
> >  _vqs);
> > - if (err)
> > + /* We need at least one virtqueue */
> > + if (err || !num_vqs)
> >   num_vqs = 1;
> >
> >   num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
> > --
> > 2.25.1
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 09/12] virtio_ring: validate used buffer length

2021-10-13 Thread Jason Wang

On Wed, Oct 13, 2021 at 6:02 PM Michael S. Tsirkin  wrote:
>
> On Tue, Oct 12, 2021 at 02:52:24PM +0800, Jason Wang wrote:
> > This patch validate the used buffer length provided by the device
> > before trying to use it. This is done by record the in buffer length
> > in a new field in desc_state structure during virtqueue_add(), then we
> > can fail the virtqueue_get_buf() when we find the device is trying to
> > give us a used buffer length which is greater than the in buffer
> > length.
> >
> > Since some drivers have already done the validation by itself, this
>
> by themselves
>
> > patch tries to makes the core validation optional. For the driver that
> > doesn't want the validation, it can set the validate_used to be
> > true (which could be overridden by force_used_validation). To be more
> > efficient, a dedicate array is used for storing the validate used
> > length, this helps to eliminate the cache stress if validation is done
> > by the driver.
> >
> > Signed-off-by: Jason Wang 
> > ---
> >  drivers/virtio/virtio_ring.c | 56 
> >  include/linux/virtio.h   |  2 ++
> >  2 files changed, 58 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index d2ca0a7365f8..dee962bd8b04 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -14,6 +14,9 @@
> >  #include 
> >  #include 
> >
> > +static bool force_used_validation = false;
> > +module_param(force_used_validation, bool, 0444);
> > +
> >  #ifdef DEBUG
> >  /* For development, we want to crash whenever the ring is screwed. */
> >  #define BAD_RING(_vq, fmt, args...)  \
> > @@ -182,6 +185,9 @@ struct vring_virtqueue {
> >   } packed;
> >   };
> >
> > + /* Per-descriptor in buffer length */
> > + u32 *buflen;
> > +
> >   /* How to notify other side. FIXME: commonalize hcalls! */
> >   bool (*notify)(struct virtqueue *vq);
> >
> > @@ -490,6 +496,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> > *_vq,
> >   unsigned int i, n, avail, descs_used, prev, err_idx;
> >   int head;
> >   bool indirect;
> > + u32 buflen = 0;
> >
> >   START_USE(vq);
> >
> > @@ -571,6 +578,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> > *_vq,
> >VRING_DESC_F_NEXT |
> >VRING_DESC_F_WRITE,
> >indirect);
> > + buflen += sg->length;
> >   }
> >   }
> >   /* Last one doesn't continue. */
> > @@ -610,6 +618,10 @@ static inline int virtqueue_add_split(struct virtqueue 
> > *_vq,
> >   else
> >   vq->split.desc_state[head].indir_desc = ctx;
> >
> > + /* Store in buffer length if necessary */
> > + if (vq->buflen)
> > + vq->buflen[head] = buflen;
> > +
> >   /* Put entry in available array (but don't update avail->idx until 
> > they
> >* do sync). */
> >   avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
> > @@ -784,6 +796,11 @@ static void *virtqueue_get_buf_ctx_split(struct 
> > virtqueue *_vq,
> >   BAD_RING(vq, "id %u is not a head!\n", i);
> >   return NULL;
> >   }
> > + if (vq->buflen && unlikely(*len > vq->buflen[i])) {
> > + BAD_RING(vq, "used len %d is larger than in buflen %u\n",
> > + *len, vq->buflen[i]);
> > + return NULL;
> > + }
> >
> >   /* detach_buf_split clears data, so grab it now. */
> >   ret = vq->split.desc_state[i].data;
> > @@ -1062,6 +1079,7 @@ static int virtqueue_add_indirect_packed(struct 
> > vring_virtqueue *vq,
> >   unsigned int i, n, err_idx;
> >   u16 head, id;
> >   dma_addr_t addr;
> > + u32 buflen = 0;
> >
> >   head = vq->packed.next_avail_idx;
> >   desc = alloc_indirect_packed(total_sg, gfp);
> > @@ -1089,6 +1107,8 @@ static int virtqueue_add_indirect_packed(struct 
> > vring_virtqueue *vq,
> >   desc[i].addr = cpu_to_le64(addr);
> >   desc[i].len = cpu_to_le32(sg->length);
> >   i++;
> > + if (n >= out_sgs)
> > + buflen += sg->length;
> >   }
> >   }
> >
> > @@ -1142,6 +1162,10 @@ static int virtqueue_add_indirect_packed(struct 
> > vring_virtqueue *vq,
> >   vq->packed.desc_state[id].indir_desc = desc;
> >   vq->packed.desc_state[id].last = id;
> >
> > + /* Store in buffer length if necessary */
> > + if (vq->buflen)
> > + vq->buflen[id] = buflen;
> > +
> >   vq->num_added += 1;
> >
> >   pr_debug("Added buffer head %i to %p\n", head, vq);
> > @@ -1176,6 +1200,7 @@ static inline int virtqueue_add_packed(struct 
> > virtqueue *_vq,
> >   unsigned int i, n, c, descs_used, err_idx;
> >

Re: [PATCH V2 06/12] virtio_pci: harden MSI-X interrupts

2021-10-13 Thread Jason Wang

On Wed, Oct 13, 2021 at 5:59 PM Michael S. Tsirkin  wrote:
>
> On Tue, Oct 12, 2021 at 02:52:21PM +0800, Jason Wang wrote:
> > We used to synchronize pending MSI-X irq handlers via
> > synchronize_irq(), this may not work for the untrusted device which
> > may keep sending interrupts after reset which may lead unexpected
> > results. Similarly, we should not enable MSI-X interrupt until the
> > device is ready.
>
> You mean until the driver is ready.

RIght, I will fix it.

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 03/12] virtio-console: switch to use .validate()

2021-10-13 Thread Jason Wang

On Wed, Oct 13, 2021 at 5:51 PM Michael S. Tsirkin  wrote:
>
> On Tue, Oct 12, 2021 at 02:52:18PM +0800, Jason Wang wrote:
> > This patch switches to use validate() to filter out the features that
> > is not supported by the rproc.
>
> are not supported
>
> >
> > Cc: Amit Shah 
> > Signed-off-by: Jason Wang 
>
>
> Does this have anything to do with hardening?
>
> It seems cleaner to not negotiate features we do not use,
> but given we did this for many years ... should we bother
> at this point?

It looks cleaner to do all the validation in one places:

1) check unsupported feature for rproc
2) validate the max_nr_ports (as has been done in patch 4)

>
>
> > ---
> >  drivers/char/virtio_console.c | 41 ++-
> >  1 file changed, 26 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > index 7eaf303a7a86..daeed31df622 100644
> > --- a/drivers/char/virtio_console.c
> > +++ b/drivers/char/virtio_console.c
> > @@ -1172,9 +1172,7 @@ static void resize_console(struct port *port)
> >
> >   vdev = port->portdev->vdev;
> >
> > - /* Don't test F_SIZE at all if we're rproc: not a valid feature! */
> > - if (!is_rproc_serial(vdev) &&
> > - virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE))
> > + if (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE))
> >   hvc_resize(port->cons.hvc, port->cons.ws);
> >  }
> >
> > @@ -1981,6 +1979,29 @@ static void virtcons_remove(struct virtio_device 
> > *vdev)
> >   kfree(portdev);
> >  }
> >
> > +static int virtcons_validate(struct virtio_device *vdev)
> > +{
> > + if (is_rproc_serial(vdev)) {
> > + /* Don't test F_SIZE at all if we're rproc: not a
> > +  * valid feature! */
>
>
> This comment needs to be fixed now. And the format's wrong
> since you made it a multi-line comment.
> Should be
> /*
>  * like
>  * this
>  */

Ok.

>
> > + __virtio_clear_bit(vdev, VIRTIO_CONSOLE_F_SIZE);
> > + /* Don't test MULTIPORT at all if we're rproc: not a
> > +  * valid feature! */
> > + __virtio_clear_bit(vdev, VIRTIO_CONSOLE_F_MULTIPORT);
> > + }
> > +
> > + /* We only need a config space if features are offered */
> > + if (!vdev->config->get &&
> > + (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE)
> > +  || virtio_has_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT))) {
> > + dev_err(>dev, "%s failure: config access disabled\n",
> > + __func__);
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
> >  /*
> >   * Once we're further in boot, we get probed like any other virtio
> >   * device.
>
> This switches the order of tests around, so if an rproc device
> offers VIRTIO_CONSOLE_F_SIZE or VIRTIO_CONSOLE_F_MULTIPORT
> without get it will now try to work instead of failing.

Ok, so we can fail the validation in this case.

Thanks

>
> Which is maybe a worthy goal, but given rproc does not support
> virtio 1.0 it also risks trying to drive something completely
> unreasonable.
>
> Overall does not feel like hardening which is supposed to make
> things more strict, not less.
>
>
> > @@ -1996,15 +2017,6 @@ static int virtcons_probe(struct virtio_device *vdev)
> >   bool multiport;
> >   bool early = early_put_chars != NULL;
> >
> > - /* We only need a config space if features are offered */
> > - if (!vdev->config->get &&
> > - (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE)
> > -  || virtio_has_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT))) {
> > - dev_err(>dev, "%s failure: config access disabled\n",
> > - __func__);
> > - return -EINVAL;
> > - }
> > -
> >   /* Ensure to read early_put_chars now */
> >   barrier();
> >
> > @@ -2031,9 +2043,7 @@ static int virtcons_probe(struct virtio_device *vdev)
> >   multiport = false;
> >   portdev->max_nr_ports = 1;
> >
> > - /* Don't test MULTIPORT at all if we're rproc: not a valid feature! */
> > - if (!is_rproc_serial(vdev) &&
> > - virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
> > + if (virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
> >struct virtio_console_config, max_nr_ports,
> >>max_nr_ports) == 0) {
> >   multiport = true;
> > @@ -2210,6 +2220,7 @@ static struct virtio_driver virtio_console = {
> >   .driver.name =  KBUILD_MODNAME,
> >   .driver.owner = THIS_MODULE,
> >   .id_table = id_table,
> > + .validate = virtcons_validate,
> >   .probe =virtcons_probe,
> >   .remove =   virtcons_remove,
> >   .config_changed = config_intr,
> > --
> > 2.25.1
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org

[mst-vhost:vhost 14/16] drivers/virtio/virtio.c:207:13: error: 'virtio_reset_device' defined but not used

2021-10-13 Thread kernel test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost
head:   825f5e35666901a3cf6963c8ea3aea0898846a8e
commit: e958cd4aadfac0d2968c71208d39069e7968164c [14/16] virtio: wrap 
config->reset calls
config: microblaze-buildonly-randconfig-r001-20211013 (attached as .config)
compiler: microblaze-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?id=e958cd4aadfac0d2968c71208d39069e7968164c
git remote add mst-vhost 
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
git fetch --no-tags mst-vhost vhost
git checkout e958cd4aadfac0d2968c71208d39069e7968164c
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross 
ARCH=microblaze 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   drivers/virtio/virtio.c:207:13: error: static declaration of 
'virtio_reset_device' follows non-static declaration
 207 | static void virtio_reset_device(struct virtio_device *dev)
 | ^~~
   In file included from drivers/virtio/virtio.c:2:
   include/linux/virtio.h:141:6: note: previous declaration of 
'virtio_reset_device' with type 'void(struct virtio_device *)'
 141 | void virtio_reset_device(struct virtio_device *dev);
 |  ^~~
>> drivers/virtio/virtio.c:207:13: error: 'virtio_reset_device' defined but not 
>> used [-Werror=unused-function]
 207 | static void virtio_reset_device(struct virtio_device *dev)
 | ^~~
   cc1: all warnings being treated as errors


vim +/virtio_reset_device +207 drivers/virtio/virtio.c

   206  
 > 207  static void virtio_reset_device(struct virtio_device *dev)
   208  {
   209  dev->config->reset(dev);
   210  }
   211  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC] virtio: wrap config->reset calls

2021-10-13 Thread David Hildenbrand

On 13.10.21 14:17, Michael S. Tsirkin wrote:
> On Wed, Oct 13, 2021 at 01:03:46PM +0200, David Hildenbrand wrote:
>> On 13.10.21 12:55, Michael S. Tsirkin wrote:
>>> This will enable cleanups down the road.
>>> The idea is to disable cbs, then add "flush_queued_cbs" callback
>>> as a parameter, this way drivers can flush any work
>>> queued after callbacks have been disabled.
>>>
>>> Signed-off-by: Michael S. Tsirkin 
>>> ---
>>>   arch/um/drivers/virt-pci.c | 2 +-
>>>   drivers/block/virtio_blk.c | 4 ++--
>>>   drivers/bluetooth/virtio_bt.c  | 2 +-
>>>   drivers/char/hw_random/virtio-rng.c| 2 +-
>>>   drivers/char/virtio_console.c  | 4 ++--
>>>   drivers/crypto/virtio/virtio_crypto_core.c | 8 
>>>   drivers/firmware/arm_scmi/virtio.c | 2 +-
>>>   drivers/gpio/gpio-virtio.c | 2 +-
>>>   drivers/gpu/drm/virtio/virtgpu_kms.c   | 2 +-
>>>   drivers/i2c/busses/i2c-virtio.c| 2 +-
>>>   drivers/iommu/virtio-iommu.c   | 2 +-
>>>   drivers/net/caif/caif_virtio.c | 2 +-
>>>   drivers/net/virtio_net.c   | 4 ++--
>>>   drivers/net/wireless/mac80211_hwsim.c  | 2 +-
>>>   drivers/nvdimm/virtio_pmem.c   | 2 +-
>>>   drivers/rpmsg/virtio_rpmsg_bus.c   | 2 +-
>>>   drivers/scsi/virtio_scsi.c | 2 +-
>>>   drivers/virtio/virtio.c| 5 +
>>>   drivers/virtio/virtio_balloon.c| 2 +-
>>>   drivers/virtio/virtio_input.c  | 2 +-
>>>   drivers/virtio/virtio_mem.c| 2 +-
>>>   fs/fuse/virtio_fs.c| 4 ++--
>>>   include/linux/virtio.h | 1 +
>>>   net/9p/trans_virtio.c  | 2 +-
>>>   net/vmw_vsock/virtio_transport.c   | 4 ++--
>>>   sound/virtio/virtio_card.c | 4 ++--
>>>   26 files changed, 39 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/arch/um/drivers/virt-pci.c b/arch/um/drivers/virt-pci.c
>>> index c08066633023..22c4d87c9c15 100644
>>> --- a/arch/um/drivers/virt-pci.c
>>> +++ b/arch/um/drivers/virt-pci.c
>>> @@ -616,7 +616,7 @@ static void um_pci_virtio_remove(struct virtio_device 
>>> *vdev)
>>> int i;
>>>   /* Stop all virtqueues */
>>> -vdev->config->reset(vdev);
>>> +virtio_reset_device(vdev);
>>>   vdev->config->del_vqs(vdev);
>>
>> Nit: virtio_device_reset()?
>>
>> Because I see:
>>
>> int virtio_device_freeze(struct virtio_device *dev);
>> int virtio_device_restore(struct virtio_device *dev);
>> void virtio_device_ready(struct virtio_device *dev)
>>
>> But well, there is:
>> void virtio_break_device(struct virtio_device *dev);
> 
> Exactly. I don't know what's best, so I opted for plain english :)

Fair enough, LGTM

Reviewed-by: David Hildenbrand 


-- 
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC] virtio: wrap config->reset calls

2021-10-13 Thread Vivek Goyal

On Wed, Oct 13, 2021 at 06:55:31AM -0400, Michael S. Tsirkin wrote:
> This will enable cleanups down the road.
> The idea is to disable cbs, then add "flush_queued_cbs" callback
> as a parameter, this way drivers can flush any work
> queued after callbacks have been disabled.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  arch/um/drivers/virt-pci.c | 2 +-
>  drivers/block/virtio_blk.c | 4 ++--
>  drivers/bluetooth/virtio_bt.c  | 2 +-
>  drivers/char/hw_random/virtio-rng.c| 2 +-
>  drivers/char/virtio_console.c  | 4 ++--
>  drivers/crypto/virtio/virtio_crypto_core.c | 8 
>  drivers/firmware/arm_scmi/virtio.c | 2 +-
>  drivers/gpio/gpio-virtio.c | 2 +-
>  drivers/gpu/drm/virtio/virtgpu_kms.c   | 2 +-
>  drivers/i2c/busses/i2c-virtio.c| 2 +-
>  drivers/iommu/virtio-iommu.c   | 2 +-
>  drivers/net/caif/caif_virtio.c | 2 +-
>  drivers/net/virtio_net.c   | 4 ++--
>  drivers/net/wireless/mac80211_hwsim.c  | 2 +-
>  drivers/nvdimm/virtio_pmem.c   | 2 +-
>  drivers/rpmsg/virtio_rpmsg_bus.c   | 2 +-
>  drivers/scsi/virtio_scsi.c | 2 +-
>  drivers/virtio/virtio.c| 5 +
>  drivers/virtio/virtio_balloon.c| 2 +-
>  drivers/virtio/virtio_input.c  | 2 +-
>  drivers/virtio/virtio_mem.c| 2 +-
>  fs/fuse/virtio_fs.c| 4 ++--

fs/fuse/virtio_fs.c changes look good to me.

Reviewed-by: Vivek Goyal 

Vivek

[..]
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 0ad89c6629d7..27c3b74070a2 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -895,7 +895,7 @@ static int virtio_fs_probe(struct virtio_device *vdev)
>   return 0;
>  
>  out_vqs:
> - vdev->config->reset(vdev);
> + virtio_reset_device(vdev);
>   virtio_fs_cleanup_vqs(vdev, fs);
>   kfree(fs->vqs);
>  
> @@ -927,7 +927,7 @@ static void virtio_fs_remove(struct virtio_device *vdev)
>   list_del_init(>list);
>   virtio_fs_stop_all_queues(fs);
>   virtio_fs_drain_all_queues_locked(fs);
> - vdev->config->reset(vdev);
> + virtio_reset_device(vdev);
>   virtio_fs_cleanup_vqs(vdev, fs);
>  
>   vdev->priv = NULL;


Thanks
Vivek

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 1/1] virtio: write back F_VERSION_1 before validate

2021-10-13 Thread Michael S. Tsirkin

On Wed, Oct 13, 2021 at 02:52:38PM +0200, Cornelia Huck wrote:
> On Wed, Oct 13 2021, "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Oct 13, 2021 at 01:23:50PM +0200, Christian Borntraeger wrote:
> >> Can we get this kernel patch queued for 5.15 and stable without waiting 
> >> for the QEMU patch
> >> as we have a regression with 4.14?
> >
> > Probably. Still trying to decide between this and plain revert for 5.15
> > and back. Maybe both?
> 
> Probably better queue this one, in case we have some undiscovered
> problems with the config space access in virtio-net?

So both then. I think you are right. Pushed out to -next. Will do a pull
towards end of the week.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 1/1] virtio: write back F_VERSION_1 before validate

2021-10-13 Thread Cornelia Huck

On Wed, Oct 13 2021, "Michael S. Tsirkin"  wrote:

> On Wed, Oct 13, 2021 at 01:23:50PM +0200, Christian Borntraeger wrote:
>> Can we get this kernel patch queued for 5.15 and stable without waiting for 
>> the QEMU patch
>> as we have a regression with 4.14?
>
> Probably. Still trying to decide between this and plain revert for 5.15
> and back. Maybe both?

Probably better queue this one, in case we have some undiscovered
problems with the config space access in virtio-net?

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v5] virtio-blk: Add validation for block size in config space

2021-10-13 Thread Michael S. Tsirkin

On Wed, Oct 13, 2021 at 08:34:22PM +0800, Yongji Xie wrote:
> On Wed, Oct 13, 2021 at 8:21 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Oct 11, 2021 at 01:40:41PM +0200, Christoph Hellwig wrote:
> > > On Tue, Oct 05, 2021 at 06:42:43AM -0400, Michael S. Tsirkin wrote:
> > > > Stefan also pointed out this duplicates the logic from
> > > >
> > > > if (blksize < 512 || blksize > PAGE_SIZE || 
> > > > !is_power_of_2(blksize))
> > > > return -EINVAL;
> > > >
> > > >
> > > > and a bunch of other places.
> > > >
> > > >
> > > > Would it be acceptable for blk layer to validate the input
> > > > instead of having each driver do it's own thing?
> > > > Maybe inside blk_queue_logical_block_size?
> > >
> > > I'm pretty sure we want down that before.  Let's just add a helper
> > > just for that check for now as part of this series.  Actually validating
> > > in in blk_queue_logical_block_size seems like a good idea, but returning
> > > errors from that has a long tail.
> >
> > Xie Yongji, I think I will revert this patch for now - can you
> > please work out adding that helper and using it in virtio?
> >
> 
> Fine, I will do it.
> 
> Thanks,
> Yongji

Great, thanks! And while at it, pls research a bit more and mention
in the commit log what is the result of an illegal blk size?
Is it memory corruption? A catastrophic failure?
If it's one of these cases, then it's ok to just fail probe.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 1/1] virtio: write back F_VERSION_1 before validate

2021-10-13 Thread Michael S. Tsirkin

On Wed, Oct 13, 2021 at 02:44:08PM +0200, Halil Pasic wrote:
> On Wed, 13 Oct 2021 08:24:53 -0400
> "Michael S. Tsirkin"  wrote:
> 
> > > > OK this looks good! How about a QEMU patch to make it spec compliant on
> > > > BE?  
> > > 
> > > Who is going to do that? Halil? you? Conny?  
> > 
> > Halil said he'll do it... Right, Halil?
> 
> I can do it but not right away. Maybe in a couple of weeks. I have some
> other bugs to hunt down, before proceeding to this. If somebody else
> wants to do it, I'm fine with that as well.
> 
> Regards,
> Halil

Couple of weeks is ok I think.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH] Revert "virtio-blk: Add validation for block size in config space"

2021-10-13 Thread Michael S. Tsirkin

It turns out that access to config space before completing the feature
negotiation is broken for big endian guests at least with QEMU hosts up
to 6.1 inclusive.  This affects any device that accesses config space in
the validate callback: at the moment that is virtio-net with
VIRTIO_NET_F_MTU but since 82e89ea077b9 ("virtio-blk: Add validation for
block size in config space") that also started affecting virtio-blk with
VIRTIO_BLK_F_BLK_SIZE. Further, unlike VIRTIO_NET_F_MTU which is off by
default on QEMU, VIRTIO_BLK_F_BLK_SIZE is on by default, which resulted
in lots of people not being able to boot VMs on BE.

The spec is very clear that what we are doing is legal so QEMU needs to
be fixed, but given it's been broken for so many years and no one
noticed, we need to give QEMU a bit more time before applying this.

Further, this patch is incomplete (does not check blk size is a power
of two) and it duplicates the logic from nbd.

Revert for now, and we'll reapply a cleaner logic in the next release.

Cc: sta...@vger.kernel.org
Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config 
space")
Cc: Xie Yongji 
Signed-off-by: Michael S. Tsirkin 
---
 drivers/block/virtio_blk.c | 37 ++---
 1 file changed, 6 insertions(+), 31 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 9b3bd083b411..303caf2d17d0 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -689,28 +689,6 @@ static const struct blk_mq_ops virtio_mq_ops = {
 static unsigned int virtblk_queue_depth;
 module_param_named(queue_depth, virtblk_queue_depth, uint, 0444);
 
-static int virtblk_validate(struct virtio_device *vdev)
-{
-   u32 blk_size;
-
-   if (!vdev->config->get) {
-   dev_err(>dev, "%s failure: config access disabled\n",
-   __func__);
-   return -EINVAL;
-   }
-
-   if (!virtio_has_feature(vdev, VIRTIO_BLK_F_BLK_SIZE))
-   return 0;
-
-   blk_size = virtio_cread32(vdev,
-   offsetof(struct virtio_blk_config, blk_size));
-
-   if (blk_size < SECTOR_SIZE || blk_size > PAGE_SIZE)
-   __virtio_clear_bit(vdev, VIRTIO_BLK_F_BLK_SIZE);
-
-   return 0;
-}
-
 static int virtblk_probe(struct virtio_device *vdev)
 {
struct virtio_blk *vblk;
@@ -722,6 +700,12 @@ static int virtblk_probe(struct virtio_device *vdev)
u8 physical_block_exp, alignment_offset;
unsigned int queue_depth;
 
+   if (!vdev->config->get) {
+   dev_err(>dev, "%s failure: config access disabled\n",
+   __func__);
+   return -EINVAL;
+   }
+
err = ida_simple_get(_index_ida, 0, minor_to_index(1 << MINORBITS),
 GFP_KERNEL);
if (err < 0)
@@ -836,14 +820,6 @@ static int virtblk_probe(struct virtio_device *vdev)
else
blk_size = queue_logical_block_size(q);
 
-   if (blk_size < SECTOR_SIZE || blk_size > PAGE_SIZE) {
-   dev_err(>dev,
-   "block size is changed unexpectedly, now is %u\n",
-   blk_size);
-   err = -EINVAL;
-   goto out_cleanup_disk;
-   }
-
/* Use topology information if available */
err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
   struct virtio_blk_config, physical_block_exp,
@@ -1009,7 +985,6 @@ static struct virtio_driver virtio_blk = {
.driver.name= KBUILD_MODNAME,
.driver.owner   = THIS_MODULE,
.id_table   = id_table,
-   .validate   = virtblk_validate,
.probe  = virtblk_probe,
.remove = virtblk_remove,
.config_changed = virtblk_config_changed,
-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 1/1] virtio: write back F_VERSION_1 before validate

2021-10-13 Thread Halil Pasic

On Wed, 13 Oct 2021 08:24:53 -0400
"Michael S. Tsirkin"  wrote:

> > > OK this looks good! How about a QEMU patch to make it spec compliant on
> > > BE?  
> > 
> > Who is going to do that? Halil? you? Conny?  
> 
> Halil said he'll do it... Right, Halil?

I can do it but not right away. Maybe in a couple of weeks. I have some
other bugs to hunt down, before proceeding to this. If somebody else
wants to do it, I'm fine with that as well.

Regards,
Halil
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2] vduse: Fix race condition between resetting and irq injecting

2021-10-13 Thread Michael S. Tsirkin

On Wed, Oct 13, 2021 at 08:30:40PM +0800, Yongji Xie wrote:
> On Wed, Oct 13, 2021 at 7:10 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Sep 29, 2021 at 04:30:50PM +0800, Xie Yongji wrote:
> > > The interrupt might be triggered after a reset since there is
> > > no synchronization between resetting and irq injecting.
> >
> > In fact, irq_lock is already used to synchronize with
> > irqs. Why isn't taking and releasing it enough?
> >
> 
> For example:
> 
> CPU 0
>   CPU1
> -
>   
> vduse_dev_ioctl()
>   check DRIVER_OK
> 
> vduse_dev_reset()
> 
>   flush_work(>inject);
> queue_work(vduse_irq_wq, >inject);
> 
> virtio_vdpa_probe()
> 
>   virtio_vdpa_find_vqs()
>vduse_vq_irq_inject()
>  vq->cb.callback(vq->cb.private);
> 
> set DRIVER_OK
> 
> In the above case, the irq callback is still triggered before DRIVER_OK is 
> set.
> 
> But now I found it seems to be better to just check DRIVER_OK again in
> vduse_vq_irq_inject().

And then pesumably make sure each time we set status
it's done under the irq lock?

> > > And it
> > > might break something if the interrupt is delayed until a new
> > > round of device initialization.
> > >
> > > Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
> > > Signed-off-by: Xie Yongji 
> > > ---
> > >  drivers/vdpa/vdpa_user/vduse_dev.c | 37 
> > > +
> > >  1 file changed, 25 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> > > b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > index cefb301b2ee4..841667a896dd 100644
> > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > @@ -80,6 +80,7 @@ struct vduse_dev {
> > >   struct vdpa_callback config_cb;
> > >   struct work_struct inject;
> > >   spinlock_t irq_lock;
> > > + struct rw_semaphore rwsem;
> > >   int minor;
> > >   bool broken;
> > >   bool connected;
> >
> > What does this lock protect? Use a more descriptive name pls,
> > and maybe add a comment.
> >
> 
> This lock is used to ensure there is no more inflight irq kwork after reset.
> 
> >
> > > @@ -410,6 +411,8 @@ static void vduse_dev_reset(struct vduse_dev *dev)
> > >   if (domain->bounce_map)
> > >   vduse_domain_reset_bounce_map(domain);
> > >
> > > + down_write(>rwsem);
> > > +
> > >   dev->status = 0;
> > >   dev->driver_features = 0;
> > >   dev->generation++;
> > > @@ -443,6 +446,8 @@ static void vduse_dev_reset(struct vduse_dev *dev)
> > >   flush_work(>inject);
> > >   flush_work(>kick);
> > >   }
> > > +
> > > + up_write(>rwsem);
> > >  }
> > >
> > >  static int vduse_vdpa_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > @@ -885,6 +890,23 @@ static void vduse_vq_irq_inject(struct work_struct 
> > > *work)
> > >   spin_unlock_irq(>irq_lock);
> > >  }
> > >
> > > +static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> > > + struct work_struct *irq_work)
> > > +{
> > > + int ret = -EINVAL;
> > > +
> > > + down_read(>rwsem);
> > > + if (!(dev->status & VIRTIO_CONFIG_S_DRIVER_OK))
> > > + goto unlock;
> > > +
> > > + ret = 0;
> > > + queue_work(vduse_irq_wq, irq_work);
> > > +unlock:
> > > + up_read(>rwsem);
> > > +
> > > + return ret;
> > > +}
> > > +
> > >  static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > >   unsigned long arg)
> > >  {
> >
> >
> > so that's a lot of overhead for an irq.
> > Normally the way to address races like this is to add
> > flushing to the reset path, not locking to irq path.
> >
> 
> Yes, we already call flush_work() in the reset path.
> 
> Thanks,
> Yongji

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 1/1] virtio: write back F_VERSION_1 before validate

2021-10-13 Thread Michael S. Tsirkin

On Wed, Oct 13, 2021 at 01:23:50PM +0200, Christian Borntraeger wrote:
> 
> 
> Am 13.10.21 um 12:10 schrieb Michael S. Tsirkin:
> > On Mon, Oct 11, 2021 at 07:39:21AM +0200, Halil Pasic wrote:
> > > The virtio specification virtio-v1.1-cs01 states: "Transitional devices
> > > MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not
> > > been acknowledged by the driver."  This is exactly what QEMU as of 6.1
> > > has done relying solely on VIRTIO_F_VERSION_1 for detecting that.
> > > 
> > > However, the specification also says: "... the driver MAY read (but MUST
> > > NOT write) the device-specific configuration fields to check that it can
> > > support the device ..." before setting FEATURES_OK.
> > > 
> > > In that case, any transitional device relying solely on
> > > VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in
> > > legacy format.  In particular, this implies that it is in big endian
> > > format for big endian guests. This naturally confuses the driver which
> > > expects little endian in the modern mode.
> > > 
> > > It is probably a good idea to amend the spec to clarify that
> > > VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation
> > > is complete. Before validate callback existed, config space was only
> > > read after FEATURES_OK. However, we already have two regressions, so
> > > let's address this here as well.
> > > 
> > > The regressions affect the VIRTIO_NET_F_MTU feature of virtio-net and
> > > the VIRTIO_BLK_F_BLK_SIZE feature of virtio-blk for BE guests when
> > > virtio 1.0 is used on both sides. The latter renders virtio-blk unusable
> > > with DASD backing, because things simply don't work with the default.
> > > See Fixes tags for relevant commits.
> > > 
> > > For QEMU, we can work around the issue by writing out the feature bits
> > > with VIRTIO_F_VERSION_1 bit set.  We (ab)use the finalize_features
> > > config op for this. This isn't enough to address all vhost devices since
> > > these do not get the features until FEATURES_OK, however it looks like
> > > the affected devices actually never handled the endianness for legacy
> > > mode correctly, so at least that's not a regression.
> > > 
> > > No devices except virtio net and virtio blk seem to be affected.
> > > 
> > > Long term the right thing to do is to fix the hypervisors.
> > > 
> > > Cc:  #v4.11
> > > Signed-off-by: Halil Pasic 
> > > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config 
> > > space")
> > > Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range")
> > > Reported-by: mark...@us.ibm.com
> > > Reviewed-by: Cornelia Huck 
> > 
> > OK this looks good! How about a QEMU patch to make it spec compliant on
> > BE?
> 
> Who is going to do that? Halil? you? Conny?

Halil said he'll do it... Right, Halil?

> Can we get this kernel patch queued for 5.15 and stable without waiting for 
> the QEMU patch
> as we have a regression with 4.14?

Probably. Still trying to decide between this and plain revert for 5.15
and back. Maybe both?

> > 
> > > ---
> > > 
> > > @Connie: I made some more commit message changes to accommodate Michael's
> > > requests. I just assumed these will work or you as well and kept your
> > > r-b. Please shout at me if it needs to be dropped :)
> > > ---
> > >   drivers/virtio/virtio.c | 11 +++
> > >   1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > index 0a5b54034d4b..236081afe9a2 100644
> > > --- a/drivers/virtio/virtio.c
> > > +++ b/drivers/virtio/virtio.c
> > > @@ -239,6 +239,17 @@ static int virtio_dev_probe(struct device *_d)
> > >   driver_features_legacy = driver_features;
> > >   }
> > > + /*
> > > +  * Some devices detect legacy solely via F_VERSION_1. Write
> > > +  * F_VERSION_1 to force LE config space accesses before FEATURES_OK for
> > > +  * these when needed.
> > > +  */
> > > + if (drv->validate && !virtio_legacy_is_little_endian()
> > > +   && device_features & BIT_ULL(VIRTIO_F_VERSION_1)) {
> > > + dev->features = BIT_ULL(VIRTIO_F_VERSION_1);
> > > + dev->config->finalize_features(dev);
> > > + }
> > > +
> > >   if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> > >   dev->features = driver_features & device_features;
> > >   else
> > > 
> > > base-commit: 60a9483534ed0d99090a2ee1d4bb0b8179195f51
> > > -- 
> > > 2.25.1
> > 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v5] virtio-blk: Add validation for block size in config space

2021-10-13 Thread Michael S. Tsirkin

On Mon, Oct 11, 2021 at 01:40:41PM +0200, Christoph Hellwig wrote:
> On Tue, Oct 05, 2021 at 06:42:43AM -0400, Michael S. Tsirkin wrote:
> > Stefan also pointed out this duplicates the logic from 
> > 
> > if (blksize < 512 || blksize > PAGE_SIZE || !is_power_of_2(blksize))
> > return -EINVAL;
> > 
> > 
> > and a bunch of other places.
> > 
> > 
> > Would it be acceptable for blk layer to validate the input
> > instead of having each driver do it's own thing?
> > Maybe inside blk_queue_logical_block_size?
> 
> I'm pretty sure we want down that before.  Let's just add a helper
> just for that check for now as part of this series.  Actually validating
> in in blk_queue_logical_block_size seems like a good idea, but returning
> errors from that has a long tail.

Xie Yongji, I think I will revert this patch for now - can you
please work out adding that helper and using it in virtio?

Thanks,

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH 2/5] iommu/virtio: Support bypass domains

2021-10-13 Thread Jean-Philippe Brucker

The VIRTIO_IOMMU_F_BYPASS_CONFIG feature adds a new flag to the ATTACH
request, that creates a bypass domain. Use it to enable identity
domains.

When VIRTIO_IOMMU_F_BYPASS_CONFIG is not supported by the device, we
currently fail attaching to an identity domain. Future patches will
instead create identity mappings in this case.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 80930ce04a16..ee8a7afd667b 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -71,6 +71,7 @@ struct viommu_domain {
struct rb_root_cached   mappings;
 
unsigned long   nr_endpoints;
+   boolbypass;
 };
 
 struct viommu_endpoint {
@@ -587,7 +588,9 @@ static struct iommu_domain *viommu_domain_alloc(unsigned 
type)
 {
struct viommu_domain *vdomain;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+   if (type != IOMMU_DOMAIN_UNMANAGED &&
+   type != IOMMU_DOMAIN_DMA &&
+   type != IOMMU_DOMAIN_IDENTITY)
return NULL;
 
vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
@@ -630,6 +633,17 @@ static int viommu_domain_finalise(struct viommu_endpoint 
*vdev,
vdomain->map_flags  = viommu->map_flags;
vdomain->viommu = viommu;
 
+   if (domain->type == IOMMU_DOMAIN_IDENTITY) {
+   if (!virtio_has_feature(viommu->vdev,
+   VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+   ida_free(>domain_ids, vdomain->id);
+   vdomain->viommu = 0;
+   return -EOPNOTSUPP;
+   }
+
+   vdomain->bypass = true;
+   }
+
return 0;
 }
 
@@ -691,6 +705,9 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
.domain = cpu_to_le32(vdomain->id),
};
 
+   if (vdomain->bypass)
+   req.flags |= cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS);
+
for (i = 0; i < fwspec->num_ids; i++) {
req.endpoint = cpu_to_le32(fwspec->ids[i]);
 
@@ -1132,6 +1149,7 @@ static unsigned int features[] = {
VIRTIO_IOMMU_F_DOMAIN_RANGE,
VIRTIO_IOMMU_F_PROBE,
VIRTIO_IOMMU_F_MMIO,
+   VIRTIO_IOMMU_F_BYPASS_CONFIG,
 };
 
 static struct virtio_device_id id_table[] = {
-- 
2.33.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH 3/5] iommu/virtio: Sort reserved regions

2021-10-13 Thread Jean-Philippe Brucker

To ease identity mapping support, keep the list of reserved regions
sorted.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index ee8a7afd667b..d63ec4d11b00 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -423,7 +423,7 @@ static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
size_t size;
u64 start64, end64;
phys_addr_t start, end;
-   struct iommu_resv_region *region = NULL;
+   struct iommu_resv_region *region = NULL, *next;
unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
start = start64 = le64_to_cpu(mem->start);
@@ -454,7 +454,12 @@ static int viommu_add_resv_mem(struct viommu_endpoint 
*vdev,
if (!region)
return -ENOMEM;
 
-   list_add(>list, >resv_regions);
+   /* Keep the list sorted */
+   list_for_each_entry(next, >resv_regions, list) {
+   if (next->start > region->start)
+   break;
+   }
+   list_add_tail(>list, >list);
return 0;
 }
 
-- 
2.33.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH 0/5] iommu/virtio: Add identity domains

2021-10-13 Thread Jean-Philippe Brucker

Support identity domains, allowing to only enable IOMMU protection for a
subset of endpoints (those assigned to userspace, for example). Users
may enable identity domains at compile time
(CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time (iommu.passthrough=1) or
runtime (/sys/kernel/iommu_groups/*/type = identity).

Patches 1-2 support identity domains using the optional
VIRTIO_IOMMU_F_BYPASS_CONFIG feature. The feature bit is not yet in the
spec, see [1] for the latest proposal.

Patches 3-5 add a fallback to identity mappings, when the feature is not
supported.

Note that this series doesn't touch the global bypass bit added by
VIRTIO_IOMMU_F_BYPASS_CONFIG. All endpoints managed by the IOMMU should
be attached to a domain, so global bypass isn't in use after endpoints
are probed. Before that, the global bypass policy is decided by the
hypervisor and firmware. So I don't think Linux needs to touch the
global bypass bit, but there are some patches available on my
virtio-iommu/bypass branch [2] to test it.

QEMU patches are on my virtio-iommu/bypass branch [3] (and the list)

[1] https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07898.html
[2] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/bypass
[3] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/bypass

Jean-Philippe Brucker (5):
  iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG
  iommu/virtio: Support bypass domains
  iommu/virtio: Sort reserved regions
  iommu/virtio: Pass end address to viommu_add_mapping()
  iommu/virtio: Support identity-mapped domains

 include/uapi/linux/virtio_iommu.h |   8 ++-
 drivers/iommu/virtio-iommu.c  | 113 +-
 2 files changed, 101 insertions(+), 20 deletions(-)

-- 
2.33.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH 5/5] iommu/virtio: Support identity-mapped domains

2021-10-13 Thread Jean-Philippe Brucker

Support identity domains for devices that do not offer the
VIRTIO_IOMMU_F_BYPASS_CONFIG feature, by creating 1:1 mappings between
the virtual and physical address space. Identity domains created this
way still perform noticeably better than DMA domains, because they don't
have the overhead of setting up and tearing down mappings at runtime.
The performance difference between this and bypass is minimal in
comparison.

It does not matter that the physical addresses in the identity mappings
do not all correspond to memory. By enabling passthrough we are trusting
the device driver and the device itself to only perform DMA to suitable
locations. In some cases it may even be desirable to perform DMA to MMIO
regions.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 61 +---
 1 file changed, 57 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index eceb9281c8c1..c9e8367d2962 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -375,6 +375,55 @@ static size_t viommu_del_mappings(struct viommu_domain 
*vdomain,
return unmapped;
 }
 
+/*
+ * Fill the domain with identity mappings, skipping the device's reserved
+ * regions.
+ */
+static int viommu_domain_map_identity(struct viommu_endpoint *vdev,
+ struct viommu_domain *vdomain)
+{
+   int ret;
+   struct iommu_resv_region *resv;
+   u64 iova = vdomain->domain.geometry.aperture_start;
+   u64 limit = vdomain->domain.geometry.aperture_end;
+   u32 flags = VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE;
+   unsigned long granule = 1UL << __ffs(vdomain->domain.pgsize_bitmap);
+
+   iova = ALIGN(iova, granule);
+   limit = ALIGN_DOWN(limit + 1, granule) - 1;
+
+   list_for_each_entry(resv, >resv_regions, list) {
+   u64 resv_start = ALIGN_DOWN(resv->start, granule);
+   u64 resv_end = ALIGN(resv->start + resv->length, granule) - 1;
+
+   if (resv_end < iova || resv_start > limit)
+   /* No overlap */
+   continue;
+
+   if (resv_start > iova) {
+   ret = viommu_add_mapping(vdomain, iova, resv_start - 1,
+(phys_addr_t)iova, flags);
+   if (ret)
+   goto err_unmap;
+   }
+
+   if (resv_end >= limit)
+   return 0;
+
+   iova = resv_end + 1;
+   }
+
+   ret = viommu_add_mapping(vdomain, iova, limit, (phys_addr_t)iova,
+flags);
+   if (ret)
+   goto err_unmap;
+   return 0;
+
+err_unmap:
+   viommu_del_mappings(vdomain, 0, iova);
+   return ret;
+}
+
 /*
  * viommu_replay_mappings - re-send MAP requests
  *
@@ -637,14 +686,18 @@ static int viommu_domain_finalise(struct viommu_endpoint 
*vdev,
vdomain->viommu = viommu;
 
if (domain->type == IOMMU_DOMAIN_IDENTITY) {
-   if (!virtio_has_feature(viommu->vdev,
-   VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+   if (virtio_has_feature(viommu->vdev,
+  VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+   vdomain->bypass = true;
+   return 0;
+   }
+
+   ret = viommu_domain_map_identity(vdev, vdomain);
+   if (ret) {
ida_free(>domain_ids, vdomain->id);
vdomain->viommu = 0;
return -EOPNOTSUPP;
}
-
-   vdomain->bypass = true;
}
 
return 0;
-- 
2.33.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH 4/5] iommu/virtio: Pass end address to viommu_add_mapping()

2021-10-13 Thread Jean-Philippe Brucker

To support identity mappings, the virtio-iommu driver must be able to
represent full 64-bit ranges internally. Pass (start, end) instead of
(start, size) to viommu_add/del_mapping().

Clean comments. The one about the returned size was never true: when
sweeping the whole address space the returned size will most certainly
be smaller than 2^64.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index d63ec4d11b00..eceb9281c8c1 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -311,8 +311,8 @@ static int viommu_send_req_sync(struct viommu_dev *viommu, 
void *buf,
  *
  * On success, return the new mapping. Otherwise return NULL.
  */
-static int viommu_add_mapping(struct viommu_domain *vdomain, unsigned long 
iova,
- phys_addr_t paddr, size_t size, u32 flags)
+static int viommu_add_mapping(struct viommu_domain *vdomain, u64 iova, u64 end,
+ phys_addr_t paddr, u32 flags)
 {
unsigned long irqflags;
struct viommu_mapping *mapping;
@@ -323,7 +323,7 @@ static int viommu_add_mapping(struct viommu_domain 
*vdomain, unsigned long iova,
 
mapping->paddr  = paddr;
mapping->iova.start = iova;
-   mapping->iova.last  = iova + size - 1;
+   mapping->iova.last  = end;
mapping->flags  = flags;
 
spin_lock_irqsave(>mappings_lock, irqflags);
@@ -338,26 +338,24 @@ static int viommu_add_mapping(struct viommu_domain 
*vdomain, unsigned long iova,
  *
  * @vdomain: the domain
  * @iova: start of the range
- * @size: size of the range. A size of 0 corresponds to the entire address
- * space.
+ * @end: end of the range
  *
- * On success, returns the number of unmapped bytes (>= size)
+ * On success, returns the number of unmapped bytes
  */
 static size_t viommu_del_mappings(struct viommu_domain *vdomain,
- unsigned long iova, size_t size)
+ u64 iova, u64 end)
 {
size_t unmapped = 0;
unsigned long flags;
-   unsigned long last = iova + size - 1;
struct viommu_mapping *mapping = NULL;
struct interval_tree_node *node, *next;
 
spin_lock_irqsave(>mappings_lock, flags);
-   next = interval_tree_iter_first(>mappings, iova, last);
+   next = interval_tree_iter_first(>mappings, iova, end);
while (next) {
node = next;
mapping = container_of(node, struct viommu_mapping, iova);
-   next = interval_tree_iter_next(node, iova, last);
+   next = interval_tree_iter_next(node, iova, end);
 
/* Trying to split a mapping? */
if (mapping->iova.start < iova)
@@ -656,8 +654,8 @@ static void viommu_domain_free(struct iommu_domain *domain)
 {
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
-   /* Free all remaining mappings (size 2^64) */
-   viommu_del_mappings(vdomain, 0, 0);
+   /* Free all remaining mappings */
+   viommu_del_mappings(vdomain, 0, ULLONG_MAX);
 
if (vdomain->viommu)
ida_free(>viommu->domain_ids, vdomain->id);
@@ -742,6 +740,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
 {
int ret;
u32 flags;
+   u64 end = iova + size - 1;
struct virtio_iommu_req_map map;
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
@@ -752,7 +751,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
if (flags & ~vdomain->map_flags)
return -EINVAL;
 
-   ret = viommu_add_mapping(vdomain, iova, paddr, size, flags);
+   ret = viommu_add_mapping(vdomain, iova, end, paddr, flags);
if (ret)
return ret;
 
@@ -761,7 +760,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
.domain = cpu_to_le32(vdomain->id),
.virt_start = cpu_to_le64(iova),
.phys_start = cpu_to_le64(paddr),
-   .virt_end   = cpu_to_le64(iova + size - 1),
+   .virt_end   = cpu_to_le64(end),
.flags  = cpu_to_le32(flags),
};
 
@@ -770,7 +769,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
 
ret = viommu_send_req_sync(vdomain->viommu, , sizeof(map));
if (ret)
-   viommu_del_mappings(vdomain, iova, size);
+   viommu_del_mappings(vdomain, iova, end);
 
return ret;
 }
@@ -783,7 +782,7 @@ static size_t viommu_unmap(struct iommu_domain *domain, 
unsigned long iova,
struct virtio_iommu_req_unmap unmap;
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
-   unmapped =

[PATCH 1/5] iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG

2021-10-13 Thread Jean-Philippe Brucker

Add definitions for the VIRTIO_IOMMU_F_BYPASS_CONFIG, which supersedes
VIRTIO_IOMMU_F_BYPASS.

Signed-off-by: Jean-Philippe Brucker 
---
 include/uapi/linux/virtio_iommu.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_iommu.h 
b/include/uapi/linux/virtio_iommu.h
index 237e36a280cb..cafd8cf7febf 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -16,6 +16,7 @@
 #define VIRTIO_IOMMU_F_BYPASS  3
 #define VIRTIO_IOMMU_F_PROBE   4
 #define VIRTIO_IOMMU_F_MMIO5
+#define VIRTIO_IOMMU_F_BYPASS_CONFIG   6
 
 struct virtio_iommu_range_64 {
__le64  start;
@@ -36,6 +37,8 @@ struct virtio_iommu_config {
struct virtio_iommu_range_32domain_range;
/* Probe buffer size */
__le32  probe_size;
+   __u8bypass;
+   __u8reserved[7];
 };
 
 /* Request types */
@@ -66,11 +69,14 @@ struct virtio_iommu_req_tail {
__u8reserved[3];
 };
 
+#define VIRTIO_IOMMU_ATTACH_F_BYPASS   (1 << 0)
+
 struct virtio_iommu_req_attach {
struct virtio_iommu_req_headhead;
__le32  domain;
__le32  endpoint;
-   __u8reserved[8];
+   __le32  flags;
+   __u8reserved[4];
struct virtio_iommu_req_tailtail;
 };
 
-- 
2.33.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC] virtio: wrap config->reset calls

2021-10-13 Thread Michael S. Tsirkin

On Wed, Oct 13, 2021 at 01:03:46PM +0200, David Hildenbrand wrote:
> On 13.10.21 12:55, Michael S. Tsirkin wrote:
> > This will enable cleanups down the road.
> > The idea is to disable cbs, then add "flush_queued_cbs" callback
> > as a parameter, this way drivers can flush any work
> > queued after callbacks have been disabled.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >   arch/um/drivers/virt-pci.c | 2 +-
> >   drivers/block/virtio_blk.c | 4 ++--
> >   drivers/bluetooth/virtio_bt.c  | 2 +-
> >   drivers/char/hw_random/virtio-rng.c| 2 +-
> >   drivers/char/virtio_console.c  | 4 ++--
> >   drivers/crypto/virtio/virtio_crypto_core.c | 8 
> >   drivers/firmware/arm_scmi/virtio.c | 2 +-
> >   drivers/gpio/gpio-virtio.c | 2 +-
> >   drivers/gpu/drm/virtio/virtgpu_kms.c   | 2 +-
> >   drivers/i2c/busses/i2c-virtio.c| 2 +-
> >   drivers/iommu/virtio-iommu.c   | 2 +-
> >   drivers/net/caif/caif_virtio.c | 2 +-
> >   drivers/net/virtio_net.c   | 4 ++--
> >   drivers/net/wireless/mac80211_hwsim.c  | 2 +-
> >   drivers/nvdimm/virtio_pmem.c   | 2 +-
> >   drivers/rpmsg/virtio_rpmsg_bus.c   | 2 +-
> >   drivers/scsi/virtio_scsi.c | 2 +-
> >   drivers/virtio/virtio.c| 5 +
> >   drivers/virtio/virtio_balloon.c| 2 +-
> >   drivers/virtio/virtio_input.c  | 2 +-
> >   drivers/virtio/virtio_mem.c| 2 +-
> >   fs/fuse/virtio_fs.c| 4 ++--
> >   include/linux/virtio.h | 1 +
> >   net/9p/trans_virtio.c  | 2 +-
> >   net/vmw_vsock/virtio_transport.c   | 4 ++--
> >   sound/virtio/virtio_card.c | 4 ++--
> >   26 files changed, 39 insertions(+), 33 deletions(-)
> > 
> > diff --git a/arch/um/drivers/virt-pci.c b/arch/um/drivers/virt-pci.c
> > index c08066633023..22c4d87c9c15 100644
> > --- a/arch/um/drivers/virt-pci.c
> > +++ b/arch/um/drivers/virt-pci.c
> > @@ -616,7 +616,7 @@ static void um_pci_virtio_remove(struct virtio_device 
> > *vdev)
> > int i;
> >   /* Stop all virtqueues */
> > -vdev->config->reset(vdev);
> > +virtio_reset_device(vdev);
> >   vdev->config->del_vqs(vdev);
> 
> Nit: virtio_device_reset()?
> 
> Because I see:
> 
> int virtio_device_freeze(struct virtio_device *dev);
> int virtio_device_restore(struct virtio_device *dev);
> void virtio_device_ready(struct virtio_device *dev)
> 
> But well, there is:
> void virtio_break_device(struct virtio_device *dev);

Exactly. I don't know what's best, so I opted for plain english :)


> -- 
> Thanks,
> 
> David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] vduse: Disallow injecting interrupt before DRIVER_OK is set

2021-10-13 Thread Michael S. Tsirkin

On Thu, Sep 23, 2021 at 03:57:22PM +0800, Xie Yongji wrote:
> The interrupt callback should not be triggered before DRIVER_OK
> is set. Otherwise, it might break the virtio device driver.
> So let's add a check to avoid the unexpected behavior.
> 
> Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
> Signed-off-by: Xie Yongji 

OK I guess, except - what about locking/ordering in all this?

> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 29a38ecba19e..972c13a7e5da 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -968,6 +968,10 @@ static long vduse_dev_ioctl(struct file *file, unsigned 
> int cmd,
>   break;
>   }
>   case VDUSE_DEV_INJECT_CONFIG_IRQ:
> + ret = -EINVAL;
> + if (!(dev->status & VIRTIO_CONFIG_S_DRIVER_OK))
> + break;
> +
>   ret = 0;
>   queue_work(vduse_irq_wq, >inject);
>   break;
> @@ -1047,6 +1051,10 @@ static long vduse_dev_ioctl(struct file *file, 
> unsigned int cmd,
>   case VDUSE_VQ_INJECT_IRQ: {
>   u32 index;
>  
> + ret = -EINVAL;
> + if (!(dev->status & VIRTIO_CONFIG_S_DRIVER_OK))
> + break;
> +
>   ret = -EFAULT;
>   if (get_user(index, (u32 __user *)argp))
>   break;
> -- 
> 2.11.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 1/1] virtio: write back F_VERSION_1 before validate

2021-10-13 Thread Christian Borntraeger





Am 13.10.21 um 12:10 schrieb Michael S. Tsirkin:

On Mon, Oct 11, 2021 at 07:39:21AM +0200, Halil Pasic wrote:

The virtio specification virtio-v1.1-cs01 states: "Transitional devices
MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not
been acknowledged by the driver."  This is exactly what QEMU as of 6.1
has done relying solely on VIRTIO_F_VERSION_1 for detecting that.

However, the specification also says: "... the driver MAY read (but MUST
NOT write) the device-specific configuration fields to check that it can
support the device ..." before setting FEATURES_OK.

In that case, any transitional device relying solely on
VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in
legacy format.  In particular, this implies that it is in big endian
format for big endian guests. This naturally confuses the driver which
expects little endian in the modern mode.

It is probably a good idea to amend the spec to clarify that
VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation
is complete. Before validate callback existed, config space was only
read after FEATURES_OK. However, we already have two regressions, so
let's address this here as well.

The regressions affect the VIRTIO_NET_F_MTU feature of virtio-net and
the VIRTIO_BLK_F_BLK_SIZE feature of virtio-blk for BE guests when
virtio 1.0 is used on both sides. The latter renders virtio-blk unusable
with DASD backing, because things simply don't work with the default.
See Fixes tags for relevant commits.

For QEMU, we can work around the issue by writing out the feature bits
with VIRTIO_F_VERSION_1 bit set.  We (ab)use the finalize_features
config op for this. This isn't enough to address all vhost devices since
these do not get the features until FEATURES_OK, however it looks like
the affected devices actually never handled the endianness for legacy
mode correctly, so at least that's not a regression.

No devices except virtio net and virtio blk seem to be affected.

Long term the right thing to do is to fix the hypervisors.

Cc:  #v4.11
Signed-off-by: Halil Pasic 
Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config 
space")
Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range")
Reported-by: mark...@us.ibm.com
Reviewed-by: Cornelia Huck 


OK this looks good! How about a QEMU patch to make it spec compliant on
BE?


Who is going to do that? Halil? you? Conny?

Can we get this kernel patch queued for 5.15 and stable without waiting for the 
QEMU patch
as we have a regression with 4.14?



---

@Connie: I made some more commit message changes to accommodate Michael's
requests. I just assumed these will work or you as well and kept your
r-b. Please shout at me if it needs to be dropped :)
---
  drivers/virtio/virtio.c | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 0a5b54034d4b..236081afe9a2 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -239,6 +239,17 @@ static int virtio_dev_probe(struct device *_d)
driver_features_legacy = driver_features;
}
  
+	/*

+* Some devices detect legacy solely via F_VERSION_1. Write
+* F_VERSION_1 to force LE config space accesses before FEATURES_OK for
+* these when needed.
+*/
+   if (drv->validate && !virtio_legacy_is_little_endian()
+ && device_features & BIT_ULL(VIRTIO_F_VERSION_1)) {
+   dev->features = BIT_ULL(VIRTIO_F_VERSION_1);
+   dev->config->finalize_features(dev);
+   }
+
if (device_features & (1ULL << VIRTIO_F_VERSION_1))
dev->features = driver_features & device_features;
else

base-commit: 60a9483534ed0d99090a2ee1d4bb0b8179195f51
--
2.25.1



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2] vduse: Fix race condition between resetting and irq injecting

2021-10-13 Thread Michael S. Tsirkin

On Wed, Sep 29, 2021 at 04:30:50PM +0800, Xie Yongji wrote:
> The interrupt might be triggered after a reset since there is
> no synchronization between resetting and irq injecting.

In fact, irq_lock is already used to synchronize with
irqs. Why isn't taking and releasing it enough?

> And it
> might break something if the interrupt is delayed until a new
> round of device initialization.
> 
> Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
> Signed-off-by: Xie Yongji 
> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 37 +
>  1 file changed, 25 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> b/drivers/vdpa/vdpa_user/vduse_dev.c
> index cefb301b2ee4..841667a896dd 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -80,6 +80,7 @@ struct vduse_dev {
>   struct vdpa_callback config_cb;
>   struct work_struct inject;
>   spinlock_t irq_lock;
> + struct rw_semaphore rwsem;
>   int minor;
>   bool broken;
>   bool connected;

What does this lock protect? Use a more descriptive name pls,
and maybe add a comment.


> @@ -410,6 +411,8 @@ static void vduse_dev_reset(struct vduse_dev *dev)
>   if (domain->bounce_map)
>   vduse_domain_reset_bounce_map(domain);
>  
> + down_write(>rwsem);
> +
>   dev->status = 0;
>   dev->driver_features = 0;
>   dev->generation++;
> @@ -443,6 +446,8 @@ static void vduse_dev_reset(struct vduse_dev *dev)
>   flush_work(>inject);
>   flush_work(>kick);
>   }
> +
> + up_write(>rwsem);
>  }
>  
>  static int vduse_vdpa_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> @@ -885,6 +890,23 @@ static void vduse_vq_irq_inject(struct work_struct *work)
>   spin_unlock_irq(>irq_lock);
>  }
>  
> +static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> + struct work_struct *irq_work)
> +{
> + int ret = -EINVAL;
> +
> + down_read(>rwsem);
> + if (!(dev->status & VIRTIO_CONFIG_S_DRIVER_OK))
> + goto unlock;
> +
> + ret = 0;
> + queue_work(vduse_irq_wq, irq_work);
> +unlock:
> + up_read(>rwsem);
> +
> + return ret;
> +}
> +
>  static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
>   unsigned long arg)
>  {


so that's a lot of overhead for an irq.
Normally the way to address races like this is to add
flushing to the reset path, not locking to irq path.


> @@ -966,12 +988,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned 
> int cmd,
>   break;
>   }
>   case VDUSE_DEV_INJECT_CONFIG_IRQ:
> - ret = -EINVAL;
> - if (!(dev->status & VIRTIO_CONFIG_S_DRIVER_OK))
> - break;
> -
> - ret = 0;
> - queue_work(vduse_irq_wq, >inject);
> + ret = vduse_dev_queue_irq_work(dev, >inject);
>   break;
>   case VDUSE_VQ_SETUP: {
>   struct vduse_vq_config config;
> @@ -1049,10 +1066,6 @@ static long vduse_dev_ioctl(struct file *file, 
> unsigned int cmd,
>   case VDUSE_VQ_INJECT_IRQ: {
>   u32 index;
>  
> - ret = -EINVAL;
> - if (!(dev->status & VIRTIO_CONFIG_S_DRIVER_OK))
> - break;
> -
>   ret = -EFAULT;
>   if (get_user(index, (u32 __user *)argp))
>   break;
> @@ -1061,9 +1074,8 @@ static long vduse_dev_ioctl(struct file *file, unsigned 
> int cmd,
>   if (index >= dev->vq_num)
>   break;
>  
> - ret = 0;
>   index = array_index_nospec(index, dev->vq_num);
> - queue_work(vduse_irq_wq, >vqs[index].inject);
> + ret = vduse_dev_queue_irq_work(dev, >vqs[index].inject);
>   break;
>   }
>   default:
> @@ -1144,6 +1156,7 @@ static struct vduse_dev *vduse_dev_create(void)
>   INIT_LIST_HEAD(>send_list);
>   INIT_LIST_HEAD(>recv_list);
>   spin_lock_init(>irq_lock);
> + init_rwsem(>rwsem);
>  
>   INIT_WORK(>inject, vduse_dev_irq_inject);
>   init_waitqueue_head(>waitq);
> -- 
> 2.11.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2] vduse: Fix race condition between resetting and irq injecting

2021-10-13 Thread Michael S. Tsirkin

On Wed, Sep 29, 2021 at 04:50:24PM +0800, Yongji Xie wrote:
> On Wed, Sep 29, 2021 at 4:41 PM Jason Wang  wrote:
> >
> > On Wed, Sep 29, 2021 at 4:32 PM Xie Yongji  wrote:
> > >
> > > The interrupt might be triggered after a reset since there is
> > > no synchronization between resetting and irq injecting. And it
> > > might break something if the interrupt is delayed until a new
> > > round of device initialization.
> > >
> > > Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
> > > Signed-off-by: Xie Yongji 
> > > ---
> > >  drivers/vdpa/vdpa_user/vduse_dev.c | 37 
> > > +
> > >  1 file changed, 25 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> > > b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > index cefb301b2ee4..841667a896dd 100644
> > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > @@ -80,6 +80,7 @@ struct vduse_dev {
> > > struct vdpa_callback config_cb;
> > > struct work_struct inject;
> > > spinlock_t irq_lock;
> > > +   struct rw_semaphore rwsem;
> > > int minor;
> > > bool broken;
> > > bool connected;
> > > @@ -410,6 +411,8 @@ static void vduse_dev_reset(struct vduse_dev *dev)
> > > if (domain->bounce_map)
> > > vduse_domain_reset_bounce_map(domain);
> > >
> > > +   down_write(>rwsem);
> > > +
> > > dev->status = 0;
> > > dev->driver_features = 0;
> > > dev->generation++;
> > > @@ -443,6 +446,8 @@ static void vduse_dev_reset(struct vduse_dev *dev)
> > > flush_work(>inject);
> > > flush_work(>kick);
> > > }
> > > +
> > > +   up_write(>rwsem);
> >
> > Rethink about this, do we need to sync set_status() as well?
> >
> 
> I'm not sure. But I don't see any case that needs synchronization.
> 
> Thanks,
> Yongji

Well you are testing under a lock but it can change at any time...

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC] virtio: wrap config->reset calls

2021-10-13 Thread David Hildenbrand


On 13.10.21 12:55, Michael S. Tsirkin wrote:

This will enable cleanups down the road.
The idea is to disable cbs, then add "flush_queued_cbs" callback
as a parameter, this way drivers can flush any work
queued after callbacks have been disabled.

Signed-off-by: Michael S. Tsirkin 
---
  arch/um/drivers/virt-pci.c | 2 +-
  drivers/block/virtio_blk.c | 4 ++--
  drivers/bluetooth/virtio_bt.c  | 2 +-
  drivers/char/hw_random/virtio-rng.c| 2 +-
  drivers/char/virtio_console.c  | 4 ++--
  drivers/crypto/virtio/virtio_crypto_core.c | 8 
  drivers/firmware/arm_scmi/virtio.c | 2 +-
  drivers/gpio/gpio-virtio.c | 2 +-
  drivers/gpu/drm/virtio/virtgpu_kms.c   | 2 +-
  drivers/i2c/busses/i2c-virtio.c| 2 +-
  drivers/iommu/virtio-iommu.c   | 2 +-
  drivers/net/caif/caif_virtio.c | 2 +-
  drivers/net/virtio_net.c   | 4 ++--
  drivers/net/wireless/mac80211_hwsim.c  | 2 +-
  drivers/nvdimm/virtio_pmem.c   | 2 +-
  drivers/rpmsg/virtio_rpmsg_bus.c   | 2 +-
  drivers/scsi/virtio_scsi.c | 2 +-
  drivers/virtio/virtio.c| 5 +
  drivers/virtio/virtio_balloon.c| 2 +-
  drivers/virtio/virtio_input.c  | 2 +-
  drivers/virtio/virtio_mem.c| 2 +-
  fs/fuse/virtio_fs.c| 4 ++--
  include/linux/virtio.h | 1 +
  net/9p/trans_virtio.c  | 2 +-
  net/vmw_vsock/virtio_transport.c   | 4 ++--
  sound/virtio/virtio_card.c | 4 ++--
  26 files changed, 39 insertions(+), 33 deletions(-)

diff --git a/arch/um/drivers/virt-pci.c b/arch/um/drivers/virt-pci.c
index c08066633023..22c4d87c9c15 100644
--- a/arch/um/drivers/virt-pci.c
+++ b/arch/um/drivers/virt-pci.c
@@ -616,7 +616,7 @@ static void um_pci_virtio_remove(struct virtio_device *vdev)
int i;
  
  /* Stop all virtqueues */

-vdev->config->reset(vdev);
+virtio_reset_device(vdev);
  vdev->config->del_vqs(vdev);


Nit: virtio_device_reset()?

Because I see:

int virtio_device_freeze(struct virtio_device *dev);
int virtio_device_restore(struct virtio_device *dev);
void virtio_device_ready(struct virtio_device *dev)

But well, there is:
void virtio_break_device(struct virtio_device *dev);

--
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC] virtio: wrap config->reset calls

2021-10-13 Thread Viresh Kumar

On 13-10-21, 06:55, Michael S. Tsirkin wrote:
> This will enable cleanups down the road.
> The idea is to disable cbs, then add "flush_queued_cbs" callback
> as a parameter, this way drivers can flush any work
> queued after callbacks have been disabled.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/gpio/gpio-virtio.c | 2 +-
>  drivers/i2c/busses/i2c-virtio.c| 2 +-

Reviewed-by: Viresh Kumar 

-- 
viresh
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH RFC] virtio: wrap config->reset calls

2021-10-13 Thread Michael S. Tsirkin

This will enable cleanups down the road.
The idea is to disable cbs, then add "flush_queued_cbs" callback
as a parameter, this way drivers can flush any work
queued after callbacks have been disabled.

Signed-off-by: Michael S. Tsirkin 
---
 arch/um/drivers/virt-pci.c | 2 +-
 drivers/block/virtio_blk.c | 4 ++--
 drivers/bluetooth/virtio_bt.c  | 2 +-
 drivers/char/hw_random/virtio-rng.c| 2 +-
 drivers/char/virtio_console.c  | 4 ++--
 drivers/crypto/virtio/virtio_crypto_core.c | 8 
 drivers/firmware/arm_scmi/virtio.c | 2 +-
 drivers/gpio/gpio-virtio.c | 2 +-
 drivers/gpu/drm/virtio/virtgpu_kms.c   | 2 +-
 drivers/i2c/busses/i2c-virtio.c| 2 +-
 drivers/iommu/virtio-iommu.c   | 2 +-
 drivers/net/caif/caif_virtio.c | 2 +-
 drivers/net/virtio_net.c   | 4 ++--
 drivers/net/wireless/mac80211_hwsim.c  | 2 +-
 drivers/nvdimm/virtio_pmem.c   | 2 +-
 drivers/rpmsg/virtio_rpmsg_bus.c   | 2 +-
 drivers/scsi/virtio_scsi.c | 2 +-
 drivers/virtio/virtio.c| 5 +
 drivers/virtio/virtio_balloon.c| 2 +-
 drivers/virtio/virtio_input.c  | 2 +-
 drivers/virtio/virtio_mem.c| 2 +-
 fs/fuse/virtio_fs.c| 4 ++--
 include/linux/virtio.h | 1 +
 net/9p/trans_virtio.c  | 2 +-
 net/vmw_vsock/virtio_transport.c   | 4 ++--
 sound/virtio/virtio_card.c | 4 ++--
 26 files changed, 39 insertions(+), 33 deletions(-)

diff --git a/arch/um/drivers/virt-pci.c b/arch/um/drivers/virt-pci.c
index c08066633023..22c4d87c9c15 100644
--- a/arch/um/drivers/virt-pci.c
+++ b/arch/um/drivers/virt-pci.c
@@ -616,7 +616,7 @@ static void um_pci_virtio_remove(struct virtio_device *vdev)
int i;
 
 /* Stop all virtqueues */
-vdev->config->reset(vdev);
+virtio_reset_device(vdev);
 vdev->config->del_vqs(vdev);
 
device_set_wakeup_enable(>dev, false);
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 303caf2d17d0..83d0af3fbf30 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -910,7 +910,7 @@ static void virtblk_remove(struct virtio_device *vdev)
mutex_lock(>vdev_mutex);
 
/* Stop all the virtqueues. */
-   vdev->config->reset(vdev);
+   virtio_reset_device(vdev);
 
/* Virtqueues are stopped, nothing can use vblk->vdev anymore. */
vblk->vdev = NULL;
@@ -929,7 +929,7 @@ static int virtblk_freeze(struct virtio_device *vdev)
struct virtio_blk *vblk = vdev->priv;
 
/* Ensure we don't receive any more interrupts */
-   vdev->config->reset(vdev);
+   virtio_reset_device(vdev);
 
/* Make sure no work handler is accessing the device. */
flush_work(>config_work);
diff --git a/drivers/bluetooth/virtio_bt.c b/drivers/bluetooth/virtio_bt.c
index 57908ce4fae8..24a9258962fa 100644
--- a/drivers/bluetooth/virtio_bt.c
+++ b/drivers/bluetooth/virtio_bt.c
@@ -364,7 +364,7 @@ static void virtbt_remove(struct virtio_device *vdev)
struct hci_dev *hdev = vbt->hdev;
 
hci_unregister_dev(hdev);
-   vdev->config->reset(vdev);
+   virtio_reset_device(vdev);
 
hci_free_dev(hdev);
vbt->hdev = NULL;
diff --git a/drivers/char/hw_random/virtio-rng.c 
b/drivers/char/hw_random/virtio-rng.c
index a90001e02bf7..95980489514b 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -134,7 +134,7 @@ static void remove_common(struct virtio_device *vdev)
vi->hwrng_removed = true;
vi->data_avail = 0;
complete(>have_data);
-   vdev->config->reset(vdev);
+   virtio_reset_device(vdev);
vi->busy = false;
if (vi->hwrng_register_done)
hwrng_unregister(>hwrng);
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7eaf303a7a86..08bbd693436f 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1957,7 +1957,7 @@ static void virtcons_remove(struct virtio_device *vdev)
spin_unlock_irq(_lock);
 
/* Disable interrupts for vqs */
-   vdev->config->reset(vdev);
+   virtio_reset_device(vdev);
/* Finish up work that's lined up */
if (use_multiport(portdev))
cancel_work_sync(>control_work);
@@ -2139,7 +2139,7 @@ static int virtcons_freeze(struct virtio_device *vdev)
 
portdev = vdev->priv;
 
-   vdev->config->reset(vdev);
+   virtio_reset_device(vdev);
 
if (use_multiport(portdev))
virtqueue_disable_cb(portdev->c_ivq);
diff --git a/drivers/crypto/virtio/virtio_crypto_core.c 
b/drivers/crypto/virtio/virtio_crypto_core.c
index e2375d992308..8e977b7627cb 100644
--- a/drivers/crypto/virtio/virtio_crypto_core.c
+++

Re: [PATCH v3 1/1] virtio: write back F_VERSION_1 before validate

2021-10-13 Thread Michael S. Tsirkin

On Mon, Oct 11, 2021 at 07:39:21AM +0200, Halil Pasic wrote:
> The virtio specification virtio-v1.1-cs01 states: "Transitional devices
> MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not
> been acknowledged by the driver."  This is exactly what QEMU as of 6.1
> has done relying solely on VIRTIO_F_VERSION_1 for detecting that.
> 
> However, the specification also says: "... the driver MAY read (but MUST
> NOT write) the device-specific configuration fields to check that it can
> support the device ..." before setting FEATURES_OK.
> 
> In that case, any transitional device relying solely on
> VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in
> legacy format.  In particular, this implies that it is in big endian
> format for big endian guests. This naturally confuses the driver which
> expects little endian in the modern mode.
> 
> It is probably a good idea to amend the spec to clarify that
> VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation
> is complete. Before validate callback existed, config space was only
> read after FEATURES_OK. However, we already have two regressions, so
> let's address this here as well.
> 
> The regressions affect the VIRTIO_NET_F_MTU feature of virtio-net and
> the VIRTIO_BLK_F_BLK_SIZE feature of virtio-blk for BE guests when
> virtio 1.0 is used on both sides. The latter renders virtio-blk unusable
> with DASD backing, because things simply don't work with the default.
> See Fixes tags for relevant commits.
> 
> For QEMU, we can work around the issue by writing out the feature bits
> with VIRTIO_F_VERSION_1 bit set.  We (ab)use the finalize_features
> config op for this. This isn't enough to address all vhost devices since
> these do not get the features until FEATURES_OK, however it looks like
> the affected devices actually never handled the endianness for legacy
> mode correctly, so at least that's not a regression.
> 
> No devices except virtio net and virtio blk seem to be affected.
> 
> Long term the right thing to do is to fix the hypervisors.
> 
> Cc:  #v4.11
> Signed-off-by: Halil Pasic 
> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config 
> space")
> Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range")
> Reported-by: mark...@us.ibm.com
> Reviewed-by: Cornelia Huck 

OK this looks good! How about a QEMU patch to make it spec compliant on
BE?

> ---
> 
> @Connie: I made some more commit message changes to accommodate Michael's
> requests. I just assumed these will work or you as well and kept your
> r-b. Please shout at me if it needs to be dropped :)
> ---
>  drivers/virtio/virtio.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..236081afe9a2 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -239,6 +239,17 @@ static int virtio_dev_probe(struct device *_d)
>   driver_features_legacy = driver_features;
>   }
>  
> + /*
> +  * Some devices detect legacy solely via F_VERSION_1. Write
> +  * F_VERSION_1 to force LE config space accesses before FEATURES_OK for
> +  * these when needed.
> +  */
> + if (drv->validate && !virtio_legacy_is_little_endian()
> +   && device_features & BIT_ULL(VIRTIO_F_VERSION_1)) {
> + dev->features = BIT_ULL(VIRTIO_F_VERSION_1);
> + dev->config->finalize_features(dev);
> + }
> +
>   if (device_features & (1ULL << VIRTIO_F_VERSION_1))
>   dev->features = driver_features & device_features;
>   else
> 
> base-commit: 60a9483534ed0d99090a2ee1d4bb0b8179195f51
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 02/12] virtio: add doc for validate() method

2021-10-13 Thread Michael S. Tsirkin

On Tue, Oct 12, 2021 at 02:52:17PM +0800, Jason Wang wrote:
> This patch adds doc for validate() method.
> 
> Signed-off-by: Jason Wang 

And maybe document __virtio_clear_bit : it says
"for use by transports" and now it's also legal in the
validate callback.

> ---
>  include/linux/virtio.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 41edbc01ffa4..0cd8685aeba4 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -152,6 +152,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev);
>   * @feature_table_size: number of entries in the feature table array.
>   * @feature_table_legacy: same as feature_table but when working in legacy 
> mode.
>   * @feature_table_size_legacy: number of entries in feature table legacy 
> array.
> + * @validate: optional function to do early checks
>   * @probe: the function to call when a device is found.  Returns 0 or -errno.
>   * @scan: optional function to call after successful probe; intended
>   *for virtio-scsi to invoke a scan.
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 01/12] virtio-blk: validate num_queues during probe

2021-10-13 Thread Michael S. Tsirkin

On Tue, Oct 12, 2021 at 02:52:16PM +0800, Jason Wang wrote:
> If an untrusted device neogitates BLK_F_MQ but advertises a zero
> num_queues, the driver may end up trying to allocating zero size
> buffers where ZERO_SIZE_PTR is returned which may pass the checking
> against the NULL. This will lead unexpected results.
> 
> Fixing this by using single queue if num_queues is zero.
> 
> Cc: Paolo Bonzini 
> Cc: Stefan Hajnoczi 
> Cc: Stefano Garzarella 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Jason Wang 

I'd rather fail probe so we don't need to support that.

> ---
>  drivers/block/virtio_blk.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 9b3bd083b411..9deff01a38cb 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -495,7 +495,8 @@ static int init_vq(struct virtio_blk *vblk)
>   err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
>  struct virtio_blk_config, num_queues,
>  _vqs);
> - if (err)
> + /* We need at least one virtqueue */
> + if (err || !num_vqs)
>   num_vqs = 1;
>  
>   num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 09/12] virtio_ring: validate used buffer length

2021-10-13 Thread Michael S. Tsirkin

On Tue, Oct 12, 2021 at 02:52:24PM +0800, Jason Wang wrote:
> This patch validate the used buffer length provided by the device
> before trying to use it. This is done by record the in buffer length
> in a new field in desc_state structure during virtqueue_add(), then we
> can fail the virtqueue_get_buf() when we find the device is trying to
> give us a used buffer length which is greater than the in buffer
> length.
> 
> Since some drivers have already done the validation by itself, this

by themselves

> patch tries to makes the core validation optional. For the driver that
> doesn't want the validation, it can set the validate_used to be
> true (which could be overridden by force_used_validation). To be more
> efficient, a dedicate array is used for storing the validate used
> length, this helps to eliminate the cache stress if validation is done
> by the driver.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/virtio/virtio_ring.c | 56 
>  include/linux/virtio.h   |  2 ++
>  2 files changed, 58 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index d2ca0a7365f8..dee962bd8b04 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -14,6 +14,9 @@
>  #include 
>  #include 
>  
> +static bool force_used_validation = false;
> +module_param(force_used_validation, bool, 0444);
> +
>  #ifdef DEBUG
>  /* For development, we want to crash whenever the ring is screwed. */
>  #define BAD_RING(_vq, fmt, args...)  \
> @@ -182,6 +185,9 @@ struct vring_virtqueue {
>   } packed;
>   };
>  
> + /* Per-descriptor in buffer length */
> + u32 *buflen;
> +
>   /* How to notify other side. FIXME: commonalize hcalls! */
>   bool (*notify)(struct virtqueue *vq);
>  
> @@ -490,6 +496,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
>   unsigned int i, n, avail, descs_used, prev, err_idx;
>   int head;
>   bool indirect;
> + u32 buflen = 0;
>  
>   START_USE(vq);
>  
> @@ -571,6 +578,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
>VRING_DESC_F_NEXT |
>VRING_DESC_F_WRITE,
>indirect);
> + buflen += sg->length;
>   }
>   }
>   /* Last one doesn't continue. */
> @@ -610,6 +618,10 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
>   else
>   vq->split.desc_state[head].indir_desc = ctx;
>  
> + /* Store in buffer length if necessary */
> + if (vq->buflen)
> + vq->buflen[head] = buflen;
> +
>   /* Put entry in available array (but don't update avail->idx until they
>* do sync). */
>   avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
> @@ -784,6 +796,11 @@ static void *virtqueue_get_buf_ctx_split(struct 
> virtqueue *_vq,
>   BAD_RING(vq, "id %u is not a head!\n", i);
>   return NULL;
>   }
> + if (vq->buflen && unlikely(*len > vq->buflen[i])) {
> + BAD_RING(vq, "used len %d is larger than in buflen %u\n",
> + *len, vq->buflen[i]);
> + return NULL;
> + }
>  
>   /* detach_buf_split clears data, so grab it now. */
>   ret = vq->split.desc_state[i].data;
> @@ -1062,6 +1079,7 @@ static int virtqueue_add_indirect_packed(struct 
> vring_virtqueue *vq,
>   unsigned int i, n, err_idx;
>   u16 head, id;
>   dma_addr_t addr;
> + u32 buflen = 0;
>  
>   head = vq->packed.next_avail_idx;
>   desc = alloc_indirect_packed(total_sg, gfp);
> @@ -1089,6 +1107,8 @@ static int virtqueue_add_indirect_packed(struct 
> vring_virtqueue *vq,
>   desc[i].addr = cpu_to_le64(addr);
>   desc[i].len = cpu_to_le32(sg->length);
>   i++;
> + if (n >= out_sgs)
> + buflen += sg->length;
>   }
>   }
>  
> @@ -1142,6 +1162,10 @@ static int virtqueue_add_indirect_packed(struct 
> vring_virtqueue *vq,
>   vq->packed.desc_state[id].indir_desc = desc;
>   vq->packed.desc_state[id].last = id;
>  
> + /* Store in buffer length if necessary */
> + if (vq->buflen)
> + vq->buflen[id] = buflen;
> +
>   vq->num_added += 1;
>  
>   pr_debug("Added buffer head %i to %p\n", head, vq);
> @@ -1176,6 +1200,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
> *_vq,
>   unsigned int i, n, c, descs_used, err_idx;
>   __le16 head_flags, flags;
>   u16 head, id, prev, curr, avail_used_flags;
> + u32 buflen = 0;
>  
>   START_USE(vq);
>  
> @@ -1250,6 +1275,8 @@ static inline int virtqueue_add_packed(struct virtqueue 
> *_vq,
>   1 <<

Re: [PATCH V2 06/12] virtio_pci: harden MSI-X interrupts

2021-10-13 Thread Michael S. Tsirkin

On Tue, Oct 12, 2021 at 02:52:21PM +0800, Jason Wang wrote:
> We used to synchronize pending MSI-X irq handlers via
> synchronize_irq(), this may not work for the untrusted device which
> may keep sending interrupts after reset which may lead unexpected
> results. Similarly, we should not enable MSI-X interrupt until the
> device is ready.

You mean until the driver is ready.

> So this patch fixes those two issues by:
> 
> 1) switching to use disable_irq() to prevent the virtio interrupt
>handlers to be called after the device is reset.
> 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> 
> This can make sure the virtio interrupt handler won't be called before
> virtio_device_ready() and after reset.
> 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: Paul E. McKenney 
> Signed-off-by: Jason Wang 
> ---
>  drivers/virtio/virtio_pci_common.c | 27 +--
>  drivers/virtio/virtio_pci_common.h |  6 --
>  drivers/virtio/virtio_pci_legacy.c |  5 +++--
>  drivers/virtio/virtio_pci_modern.c |  6 --
>  4 files changed, 32 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index b35bb2d57f62..0b9523e6dd39 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
>"Force legacy mode for transitional virtio 1 devices");
>  #endif
>  
> -/* wait for pending irq handlers */
> -void vp_synchronize_vectors(struct virtio_device *vdev)
> +/* disable irq handlers */
> +void vp_disable_vectors(struct virtio_device *vdev)
>  {
>   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>   int i;
> @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
>   synchronize_irq(vp_dev->pci_dev->irq);
>  
>   for (i = 0; i < vp_dev->msix_vectors; ++i)
> - synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> + disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> +}
> +
> +/* enable irq handlers */
> +void vp_enable_vectors(struct virtio_device *vdev)
> +{
> + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> + int i;
> +
> + if (vp_dev->intx_enabled)
> + return;
> +
> + for (i = 0; i < vp_dev->msix_vectors; ++i)
> + enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
>  }
>  
>  /* the notify function used when creating a virt queue */
> @@ -141,7 +154,8 @@ static int vp_request_msix_vectors(struct virtio_device 
> *vdev, int nvectors,
>   snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
>"%s-config", name);
>   err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> -   vp_config_changed, 0, vp_dev->msix_names[v],
> +   vp_config_changed, IRQF_NO_AUTOEN,
> +   vp_dev->msix_names[v],
> vp_dev);
>   if (err)
>   goto error;
> @@ -160,7 +174,8 @@ static int vp_request_msix_vectors(struct virtio_device 
> *vdev, int nvectors,
>   snprintf(vp_dev->msix_names[v], sizeof *vp_dev->msix_names,
>"%s-virtqueues", name);
>   err = request_irq(pci_irq_vector(vp_dev->pci_dev, v),
> -   vp_vring_interrupt, 0, vp_dev->msix_names[v],
> +   vp_vring_interrupt, IRQF_NO_AUTOEN,
> +   vp_dev->msix_names[v],
> vp_dev);
>   if (err)
>   goto error;
> @@ -337,7 +352,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>"%s-%s",
>dev_name(_dev->vdev.dev), names[i]);
>   err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
> -   vring_interrupt, 0,
> +   vring_interrupt, IRQF_NO_AUTOEN,
> vp_dev->msix_names[msix_vec],
> vqs[i]);
>   if (err)
> diff --git a/drivers/virtio/virtio_pci_common.h 
> b/drivers/virtio/virtio_pci_common.h
> index beec047a8f8d..a235ce9ff6a5 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -102,8 +102,10 @@ static struct virtio_pci_device *to_vp_device(struct 
> virtio_device *vdev)
>   return container_of(vdev, struct virtio_pci_device, vdev);
>  }
>  
> -/* wait for pending irq handlers */
> -void vp_synchronize_vectors(struct virtio_device *vdev);
> +/* disable irq handlers */
> +void vp_disable_vectors(struct virtio_device *vdev);
> +/* enable irq handlers */
> +void vp_enable_vectors(struct virtio_device *vdev);
>  /* the notify function used when creating a virt queue */
>  bool vp_notify(struct virtqueue *vq);
>  /* the config->del_vqs() implementation */
> diff --git

Re: [PATCH V2 05/12] virtio_config: introduce a new ready method

2021-10-13 Thread Michael S. Tsirkin

On Tue, Oct 12, 2021 at 02:52:20PM +0800, Jason Wang wrote:
> Signed-off-by: Jason Wang 
> ---
>  include/linux/virtio_config.h | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 8519b3ae5d52..f2891c6221a1 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -23,6 +23,8 @@ struct virtio_shm_region {
>   *   any of @get/@set, @get_status/@set_status, or @get_features/
>   *   @finalize_features are NOT safe to be called from an atomic
>   *   context.
> + * @ready: make the device ready
> + *  vdev: the virtio_device
>   * @get: read the value of a configuration field
>   *   vdev: the virtio_device
>   *   offset: the offset of the configuration field

I think it's too vague. device_ready makes device ready to receive
kicks. What does this one do? I don't like calling things by where
they are called from, I prefer calling them by what they do.
It actually enables callbacks, right?
So enable_cbs?


> @@ -75,6 +77,7 @@ struct virtio_shm_region {
>   */
>  typedef void vq_callback_t(struct virtqueue *);
>  struct virtio_config_ops {
> + void (*ready)(struct virtio_device *vdev);
>   void (*get)(struct virtio_device *vdev, unsigned offset,
>   void *buf, unsigned len);
>   void (*set)(struct virtio_device *vdev, unsigned offset,
> @@ -229,6 +232,9 @@ void virtio_device_ready(struct virtio_device *dev)
>  {
>   unsigned status = dev->config->get_status(dev);
>  
> + if (dev->config->ready)
> +  dev->config->ready(dev);
> +
>   BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
>   dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
>  }
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 03/12] virtio-console: switch to use .validate()

2021-10-13 Thread Michael S. Tsirkin

On Tue, Oct 12, 2021 at 02:52:18PM +0800, Jason Wang wrote:
> This patch switches to use validate() to filter out the features that
> is not supported by the rproc.

are not supported

> 
> Cc: Amit Shah 
> Signed-off-by: Jason Wang 


Does this have anything to do with hardening?

It seems cleaner to not negotiate features we do not use,
but given we did this for many years ... should we bother
at this point?


> ---
>  drivers/char/virtio_console.c | 41 ++-
>  1 file changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> index 7eaf303a7a86..daeed31df622 100644
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -1172,9 +1172,7 @@ static void resize_console(struct port *port)
>  
>   vdev = port->portdev->vdev;
>  
> - /* Don't test F_SIZE at all if we're rproc: not a valid feature! */
> - if (!is_rproc_serial(vdev) &&
> - virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE))
> + if (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE))
>   hvc_resize(port->cons.hvc, port->cons.ws);
>  }
>  
> @@ -1981,6 +1979,29 @@ static void virtcons_remove(struct virtio_device *vdev)
>   kfree(portdev);
>  }
>  
> +static int virtcons_validate(struct virtio_device *vdev)
> +{
> + if (is_rproc_serial(vdev)) {
> + /* Don't test F_SIZE at all if we're rproc: not a
> +  * valid feature! */


This comment needs to be fixed now. And the format's wrong
since you made it a multi-line comment.
Should be
/*
 * like
 * this
 */

> + __virtio_clear_bit(vdev, VIRTIO_CONSOLE_F_SIZE);
> + /* Don't test MULTIPORT at all if we're rproc: not a
> +  * valid feature! */
> + __virtio_clear_bit(vdev, VIRTIO_CONSOLE_F_MULTIPORT);
> + }
> +
> + /* We only need a config space if features are offered */
> + if (!vdev->config->get &&
> + (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE)
> +  || virtio_has_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT))) {
> + dev_err(>dev, "%s failure: config access disabled\n",
> + __func__);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
>  /*
>   * Once we're further in boot, we get probed like any other virtio
>   * device.

This switches the order of tests around, so if an rproc device
offers VIRTIO_CONSOLE_F_SIZE or VIRTIO_CONSOLE_F_MULTIPORT
without get it will now try to work instead of failing.

Which is maybe a worthy goal, but given rproc does not support
virtio 1.0 it also risks trying to drive something completely
unreasonable.

Overall does not feel like hardening which is supposed to make
things more strict, not less.


> @@ -1996,15 +2017,6 @@ static int virtcons_probe(struct virtio_device *vdev)
>   bool multiport;
>   bool early = early_put_chars != NULL;
>  
> - /* We only need a config space if features are offered */
> - if (!vdev->config->get &&
> - (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE)
> -  || virtio_has_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT))) {
> - dev_err(>dev, "%s failure: config access disabled\n",
> - __func__);
> - return -EINVAL;
> - }
> -
>   /* Ensure to read early_put_chars now */
>   barrier();
>  
> @@ -2031,9 +2043,7 @@ static int virtcons_probe(struct virtio_device *vdev)
>   multiport = false;
>   portdev->max_nr_ports = 1;
>  
> - /* Don't test MULTIPORT at all if we're rproc: not a valid feature! */
> - if (!is_rproc_serial(vdev) &&
> - virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
> + if (virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
>struct virtio_console_config, max_nr_ports,
>>max_nr_ports) == 0) {
>   multiport = true;
> @@ -2210,6 +2220,7 @@ static struct virtio_driver virtio_console = {
>   .driver.name =  KBUILD_MODNAME,
>   .driver.owner = THIS_MODULE,
>   .id_table = id_table,
> + .validate = virtcons_validate,
>   .probe =virtcons_probe,
>   .remove =   virtcons_remove,
>   .config_changed = config_intr,
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V2 07/12] virtio-pci: harden INTX interrupts

2021-10-13 Thread Michael S. Tsirkin

On Tue, Oct 12, 2021 at 02:52:22PM +0800, Jason Wang wrote:
> This patch tries to make sure the virtio interrupt handler for INTX
> won't be called after a reset and before virtio_device_ready(). We
> can't use IRQF_NO_AUTOEN since we're using shared interrupt
> (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
> intx_soft_enabled variable and toggle it during in
> vp_disable/enable_vectors(). The INTX interrupt handler will check
> intx_soft_enabled before processing the actual interrupt.
> 
> Cc: Boqun Feng 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: Paul E. McKenney 
> Signed-off-by: Jason Wang 
> ---
>  drivers/virtio/virtio_pci_common.c | 24 ++--
>  drivers/virtio/virtio_pci_common.h |  1 +
>  2 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index 0b9523e6dd39..5ae6a2a4eb77 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -30,8 +30,16 @@ void vp_disable_vectors(struct virtio_device *vdev)
>   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>   int i;
>  
> - if (vp_dev->intx_enabled)
> + if (vp_dev->intx_enabled) {
> + /*
> +  * The below synchronize() guarantees that any
> +  * interrupt for this line arriving after
> +  * synchronize_irq() has completed is guaranteed to see
> +  * intx_soft_enabled == false.
> +  */
> + WRITE_ONCE(vp_dev->intx_soft_enabled, false);
>   synchronize_irq(vp_dev->pci_dev->irq);
> + }
>  
>   for (i = 0; i < vp_dev->msix_vectors; ++i)
>   disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> @@ -43,8 +51,16 @@ void vp_enable_vectors(struct virtio_device *vdev)
>   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>   int i;
>  
> - if (vp_dev->intx_enabled)
> + if (vp_dev->intx_enabled) {
> + disable_irq(vp_dev->pci_dev->irq);
> + /*
> +  * The above disable_irq() provides TSO ordering and
> +  * as such promotes the below store to store-release.
> +  */
> + WRITE_ONCE(vp_dev->intx_soft_enabled, true);
> + enable_irq(vp_dev->pci_dev->irq);
>   return;
> + }
>  
>   for (i = 0; i < vp_dev->msix_vectors; ++i)
>   enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> @@ -97,6 +113,10 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
>   struct virtio_pci_device *vp_dev = opaque;
>   u8 isr;
>  
> + /* read intx_soft_enabled before read others */
> + if (!smp_load_acquire(_dev->intx_soft_enabled))
> + return IRQ_NONE;
> +
>   /* reading the ISR has the effect of also clearing it so it's very
>* important to save off the value. */
>   isr = ioread8(vp_dev->isr);

I don't see why we need this ordering guarantee here.

synchronize_irq above makes sure no interrupt handler
is in progress. the handler itself thus does not need
any specific order, it is ok if intx_soft_enabled is read
after, not before the rest of it.

Just READ_ONCE should be enough, and we can drop the comment.


> diff --git a/drivers/virtio/virtio_pci_common.h 
> b/drivers/virtio/virtio_pci_common.h
> index a235ce9ff6a5..3c06e0f92ee4 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -64,6 +64,7 @@ struct virtio_pci_device {
>   /* MSI-X support */
>   int msix_enabled;
>   int intx_enabled;
> + bool intx_soft_enabled;
>   cpumask_var_t *msix_affinity_masks;
>   /* Name strings for interrupts. This size should be enough,
>* and I'm too lazy to allocate each name separately. */
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

49 matches

Mail list logo