Re: virtio-gpu dedicated heap

2022-03-03 Thread Michael S. Tsirkin
$ ./scripts/get_maintainer.pl -f ./drivers/gpu/drm/virtio/

David Airlie  (maintainer:VIRTIO GPU DRIVER)
Gerd Hoffmann  (maintainer:VIRTIO GPU DRIVER)
Daniel Vetter  (maintainer:DRM DRIVERS)
dri-de...@lists.freedesktop.org (open list:VIRTIO GPU DRIVER)
virtualizat...@lists.linux-foundation.org (open list:VIRTIO GPU DRIVER)
linux-ker...@vger.kernel.org (open list)

You might want to CC these people.

On Thu, Mar 03, 2022 at 08:07:03PM -0800, Gurchetan Singh wrote:
> +iommu@lists.linux-foundation.org not iommu-request
> 
> On Thu, Mar 3, 2022 at 8:05 PM Gurchetan Singh 
> wrote:
> 
> Hi everyone,
> 
> With the current virtio setup, all of guest memory is shared with host
> devices.  There has been interest in changing this, to improve isolation 
> of
> guest memory and increase confidentiality.  
> 
> The recently introduced restricted DMA mechanism makes excellent progress
> in this area:
> 
> https://patchwork.kernel.org/project/xen-devel/cover/
> 20210624155526.2775863-1-tien...@chromium.org/  
> 
> Devices without an IOMMU (traditional virtio devices for example) would
> allocate from a specially designated region.  Swiotlb bouncing is done for
> all DMA transfers.  This is controlled by the VIRTIO_F_ACCESS_PLATFORM
> feature bit.
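
For context, a restricted DMA pool is declared through a devicetree reserved-memory node with the "restricted-dma-pool" compatible, which the device then references via memory-region. A minimal sketch follows; node names, addresses, and sizes here are invented for illustration, not taken from this thread:

```dts
/* Illustrative only: labels, unit addresses and sizes are made up. */
reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        restricted_dma: restricted-dma-pool@50000000 {
                compatible = "restricted-dma-pool";
                reg = <0x0 0x50000000 0x0 0x4000000>; /* 64 MiB bounce pool */
        };
};

virtio_dev: virtio@10000 {
        /* DMA for this device is confined to, and bounced through, the pool */
        memory-region = <&restricted_dma>;
};
```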
> 
> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/
> 3064198
> 
> This mechanism works great for the devices it was designed for, such as
> virtio-net.  However, when trying to adapt it to other devices, there
> are some limitations.  
> 
> It would be great to have a dedicated heap for virtio-gpu rather than
> allocating from guest memory.  
> 
> We would like to use dma_alloc_noncontiguous() on the restricted DMA pool,
> ideally with page-level granularity somehow.  Contiguous buffers are
> definitely going out of fashion.
> 
> There are two considerations when using it with the restricted DMA
> approach:
> 
> 1) No bouncing (aka memcpy)
> 
> Bouncing is expensive for graphics buffers, since guest user space would
> designate graphics buffers to be shared with the host.  We plan to use
> DMA_ATTR_SKIP_CPU_SYNC when doing any DMA transactions with GPU buffers.
> 
> Bounce buffering will still be used for virtio commands, like the other
> virtio devices that use the restricted DMA mechanism.
> 
> 2) IO_TLB_SEGSIZE is too small for graphics buffers
> 
> This issue was hit before here too:
> 
> https://www.spinics.net/lists/kernel/msg4154086.html
> 
> The suggestion was to use shared-dma-pool rather than restricted DMA.  But
> we're not sure a single device can have restricted DMA (for
> VIRTIO_F_ACCESS_PLATFORM) and shared-dma-pool (for larger buffers) at the
> same time.  Does anyone know? 
> 
> If not, it sounds like "splitting the allocation into
> dma_max_mapping_size() chunks" for restricted-dma is also possible.
> What is the preferred method?
> 
> More generally, we would love more feedback on the proposed design or
> consider alternatives!
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] iommu/iova: Separate out rcache init

2022-02-03 Thread Michael S. Tsirkin
On Thu, Feb 03, 2022 at 05:59:20PM +0800, John Garry wrote:
> Currently the rcache structures are allocated for all IOVA domains, even if
> they do not use the "fast" alloc+free interface. This is wasteful of memory.
> 
> In addition, failures in init_iova_rcaches() are not handled safely, which
> is less than ideal.
> 
> Make "fast" users call a separate rcache init explicitly, which includes
> error checking.
> 
> Signed-off-by: John Garry 

virtio things:

Acked-by: Michael S. Tsirkin 

> ---
> Differences to v1:
> - Drop stubs for iova_domain_init_rcaches() and iova_domain_free_rcaches()
> - Use put_iova_domain() in vdpa code
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index d85d54f2b549..b22034975301 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -525,6 +525,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
> *domain, dma_addr_t base,
>   struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   unsigned long order, base_pfn;
>   struct iova_domain *iovad;
> + int ret;
>  
>   if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
>   return -EINVAL;
> @@ -559,6 +560,9 @@ static int iommu_dma_init_domain(struct iommu_domain 
> *domain, dma_addr_t base,
>   }
>  
>   init_iova_domain(iovad, 1UL << order, base_pfn);
> + ret = iova_domain_init_rcaches(iovad);
> + if (ret)
> + return ret;
>  
>   /* If the FQ fails we can simply fall back to strict mode */
>   if (domain->type == IOMMU_DOMAIN_DMA_FQ && iommu_dma_init_fq(domain))
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index b28c9435b898..7e9c3a97c040 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -15,13 +15,14 @@
>  /* The anchor node sits above the top of the usable address space */
>  #define IOVA_ANCHOR  ~0UL
>  
> +#define IOVA_RANGE_CACHE_MAX_SIZE 6  /* log of max cached IOVA range size 
> (in pages) */
> +
>  static bool iova_rcache_insert(struct iova_domain *iovad,
>  unsigned long pfn,
>  unsigned long size);
>  static unsigned long iova_rcache_get(struct iova_domain *iovad,
>unsigned long size,
>unsigned long limit_pfn);
> -static void init_iova_rcaches(struct iova_domain *iovad);
>  static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain 
> *iovad);
>  static void free_iova_rcaches(struct iova_domain *iovad);
>  
> @@ -64,8 +65,6 @@ init_iova_domain(struct iova_domain *iovad, unsigned long 
> granule,
>   iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
>   rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
>   rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
> - cpuhp_state_add_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD, 
> &iovad->cpuhp_dead);
> - init_iova_rcaches(iovad);
>  }
>  EXPORT_SYMBOL_GPL(init_iova_domain);
>  
> @@ -488,6 +487,13 @@ free_iova_fast(struct iova_domain *iovad, unsigned long 
> pfn, unsigned long size)
>  }
>  EXPORT_SYMBOL_GPL(free_iova_fast);
>  
> +static void iova_domain_free_rcaches(struct iova_domain *iovad)
> +{
> + cpuhp_state_remove_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD,
> + &iovad->cpuhp_dead);
> + free_iova_rcaches(iovad);
> +}
> +
>  /**
>   * put_iova_domain - destroys the iova domain
>   * @iovad: - iova domain in question.
> @@ -497,9 +503,9 @@ void put_iova_domain(struct iova_domain *iovad)
>  {
>   struct iova *iova, *tmp;
>  
> - cpuhp_state_remove_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD,
> - &iovad->cpuhp_dead);
> - free_iova_rcaches(iovad);
> + if (iovad->rcaches)
> + iova_domain_free_rcaches(iovad);
> +
>   rbtree_postorder_for_each_entry_safe(iova, tmp, &iovad->rbroot, node)
>   free_iova_mem(iova);
>  }
> @@ -608,6 +614,7 @@ EXPORT_SYMBOL_GPL(reserve_iova);
>   */
>  
>  #define IOVA_MAG_SIZE 128
> +#define MAX_GLOBAL_MAGS 32   /* magazines per bin */
>  
>  struct iova_magazine {
>   unsigned long size;
> @@ -620,6 +627,13 @@ struct iova_cpu_rcache {
>   struct iova_magazine *prev;
>  };
>  
> +struct iova_rcache {
> + spinlock_t lock;
> + unsigned long depot_size;
> + struct iova_magazine *depot[MAX_GLOBAL_MAGS];
> + struct iova_cpu_rcache __percpu *cpu_rcaches;
> +};
> +
>  static struct iova_magazine *iova_magazine_alloc(gfp_t flags)
>  {
>   return kzalloc(sizeof(struct iova_magazine), flags);

Re: [PATCH v12 00/13] Introduce VDUSE - vDPA Device in Userspace

2022-01-11 Thread Michael S. Tsirkin
On Tue, Jan 11, 2022 at 08:57:49PM +0800, Yongji Xie wrote:
> On Tue, Jan 11, 2022 at 7:54 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jan 11, 2022 at 11:31:37AM +0800, Yongji Xie wrote:
> > > On Mon, Jan 10, 2022 at 11:44 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Jan 10, 2022 at 11:24:40PM +0800, Yongji Xie wrote:
> > > > > On Mon, Jan 10, 2022 at 11:10 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Jan 10, 2022 at 09:54:08PM +0800, Yongji Xie wrote:
> > > > > > > On Mon, Jan 10, 2022 at 8:57 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Mon, Aug 30, 2021 at 10:17:24PM +0800, Xie Yongji wrote:
> > > > > > > > > This series introduces a framework that makes it possible to 
> > > > > > > > > implement
> > > > > > > > > software-emulated vDPA devices in userspace. And to make the 
> > > > > > > > > device
> > > > > > > > > emulation more secure, the emulated vDPA device's control 
> > > > > > > > > path is handled
> > > > > > > > > in the kernel and only the data path is implemented in the 
> > > > > > > > > userspace.
> > > > > > > > >
> > > > > > > > > Since the emulated vDPA device's control path is handled in
> > > > > > > > > the kernel, a message mechanism is introduced to make
> > > > > > > > > userspace aware of data path related changes. Userspace can
> > > > > > > > > use read()/write() to receive/reply to the control messages.
> > > > > > > > >
> > > > > > > > > In the data path, the core is mapping dma buffer into VDUSE 
> > > > > > > > > daemon's
> > > > > > > > > address space, which can be implemented in different ways 
> > > > > > > > > depending on
> > > > > > > > > the vdpa bus to which the vDPA device is attached.
> > > > > > > > >
> > > > > > > > > In the virtio-vdpa case, we implement an MMU-based software
> > > > > > > > > IOTLB with a bounce-buffering mechanism to achieve that. In
> > > > > > > > > the vhost-vdpa case, the dma buffer resides in a userspace
> > > > > > > > > memory region which can be shared with the VDUSE userspace
> > > > > > > > > process by transferring the shmfd.
> > > > > > > > >
> > > > > > > > > The details and our use case are shown below:
> > > > > > > > >
> > > > > > > > > [ASCII architecture diagram, line-wrapped beyond recovery
> > > > > > > > > in the archive: a container (virtio-blk driver) and a QEMU
> > > > > > > > > VM (/dev/vhost-vdpa-x) connect through the vdpa bus to the
> > > > > > > > > VDUSE daemon's emulated vDPA block device, which reaches
> > > > > > > > > remote storages over TCP/IP.]

Re: [PATCH v12 00/13] Introduce VDUSE - vDPA Device in Userspace

2022-01-11 Thread Michael S. Tsirkin
On Tue, Jan 11, 2022 at 11:31:37AM +0800, Yongji Xie wrote:
> On Mon, Jan 10, 2022 at 11:44 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Jan 10, 2022 at 11:24:40PM +0800, Yongji Xie wrote:
> > > On Mon, Jan 10, 2022 at 11:10 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Jan 10, 2022 at 09:54:08PM +0800, Yongji Xie wrote:
> > > > > On Mon, Jan 10, 2022 at 8:57 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Aug 30, 2021 at 10:17:24PM +0800, Xie Yongji wrote:
> > > > > > > This series introduces a framework that makes it possible to 
> > > > > > > implement
> > > > > > > software-emulated vDPA devices in userspace. And to make the 
> > > > > > > device
> > > > > > > emulation more secure, the emulated vDPA device's control path is 
> > > > > > > handled
> > > > > > > in the kernel and only the data path is implemented in the 
> > > > > > > userspace.
> > > > > > >
> > > > > > > Since the emulated vDPA device's control path is handled in
> > > > > > > the kernel, a message mechanism is introduced to make userspace
> > > > > > > aware of data path related changes. Userspace can use
> > > > > > > read()/write() to receive/reply to the control messages.
> > > > > > >
> > > > > > > In the data path, the core is mapping dma buffer into VDUSE 
> > > > > > > daemon's
> > > > > > > address space, which can be implemented in different ways 
> > > > > > > depending on
> > > > > > > the vdpa bus to which the vDPA device is attached.
> > > > > > >
> > > > > > > In the virtio-vdpa case, we implement an MMU-based software
> > > > > > > IOTLB with a bounce-buffering mechanism to achieve that. In the
> > > > > > > vhost-vdpa case, the dma buffer resides in a userspace memory
> > > > > > > region which can be shared with the VDUSE userspace process by
> > > > > > > transferring the shmfd.
> > > > > > >
> > > > > > > The details and our use case are shown below:
> > > > > > >
> > > > > > > [ASCII architecture diagram, line-wrapped beyond recovery in
> > > > > > > the archive: a container (virtio-blk driver) and a QEMU VM
> > > > > > > (/dev/vhost-vdpa-x) connect through the vdpa bus to the VDUSE
> > > > > > > daemon's emulated vDPA block device, which reaches remote
> > > > > > > storages over TCP/IP.]

Re: [PATCH v12 00/13] Introduce VDUSE - vDPA Device in Userspace

2022-01-10 Thread Michael S. Tsirkin
On Mon, Jan 10, 2022 at 11:24:40PM +0800, Yongji Xie wrote:
> On Mon, Jan 10, 2022 at 11:10 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Jan 10, 2022 at 09:54:08PM +0800, Yongji Xie wrote:
> > > On Mon, Jan 10, 2022 at 8:57 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Aug 30, 2021 at 10:17:24PM +0800, Xie Yongji wrote:
> > > > > This series introduces a framework that makes it possible to implement
> > > > > software-emulated vDPA devices in userspace. And to make the device
> > > > > emulation more secure, the emulated vDPA device's control path is 
> > > > > handled
> > > > > in the kernel and only the data path is implemented in the userspace.
> > > > >
> > > > > Since the emulated vDPA device's control path is handled in the
> > > > > kernel, a message mechanism is introduced to make userspace aware
> > > > > of data path related changes. Userspace can use read()/write() to
> > > > > receive/reply to the control messages.
> > > > >
> > > > > In the data path, the core is mapping dma buffer into VDUSE daemon's
> > > > > address space, which can be implemented in different ways depending on
> > > > > the vdpa bus to which the vDPA device is attached.
> > > > >
> > > > > In the virtio-vdpa case, we implement an MMU-based software IOTLB
> > > > > with a bounce-buffering mechanism to achieve that. In the
> > > > > vhost-vdpa case, the dma buffer resides in a userspace memory
> > > > > region which can be shared with the VDUSE userspace process by
> > > > > transferring the shmfd.
> > > > >
> > > > > The details and our use case are shown below:
> > > > >
> > > > > [ASCII architecture diagram, line-wrapped beyond recovery in the
> > > > > archive: a container (virtio-blk driver) and a QEMU VM
> > > > > (/dev/vhost-vdpa-x) connect through the vdpa bus to the VDUSE
> > > > > daemon's emulated vDPA block device, which reaches remote storages
> > > > > over TCP/IP.]

Re: [PATCH v12 00/13] Introduce VDUSE - vDPA Device in Userspace

2022-01-10 Thread Michael S. Tsirkin
On Mon, Jan 10, 2022 at 09:54:08PM +0800, Yongji Xie wrote:
> On Mon, Jan 10, 2022 at 8:57 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Aug 30, 2021 at 10:17:24PM +0800, Xie Yongji wrote:
> > > This series introduces a framework that makes it possible to implement
> > > software-emulated vDPA devices in userspace. And to make the device
> > > emulation more secure, the emulated vDPA device's control path is handled
> > > in the kernel and only the data path is implemented in the userspace.
> > >
> > > Since the emulated vDPA device's control path is handled in the kernel,
> > > a message mechanism is introduced to make userspace aware of the data
> > > path related changes. Userspace can use read()/write() to receive/reply
> > > to the control messages.
> > >
> > > In the data path, the core is mapping dma buffer into VDUSE daemon's
> > > address space, which can be implemented in different ways depending on
> > > the vdpa bus to which the vDPA device is attached.
> > >
> > > In the virtio-vdpa case, we implement an MMU-based software IOTLB with
> > > a bounce-buffering mechanism to achieve that. In the vhost-vdpa case,
> > > the dma buffer resides in a userspace memory region which can be shared
> > > with the VDUSE userspace process by transferring the shmfd.
> > >
> > > The details and our use case are shown below:
> > >
> > > [ASCII architecture diagram, line-wrapped beyond recovery in the
> > > archive: a container (virtio-blk driver) and a QEMU VM
> > > (/dev/vhost-vdpa-x) connect through the vdpa bus to the VDUSE daemon's
> > > emulated vDPA block device, which reaches remote storages over TCP/IP.]

Re: [PATCH v12 00/13] Introduce VDUSE - vDPA Device in Userspace

2022-01-10 Thread Michael S. Tsirkin
On Mon, Aug 30, 2021 at 10:17:24PM +0800, Xie Yongji wrote:
> This series introduces a framework that makes it possible to implement
> software-emulated vDPA devices in userspace. And to make the device
> emulation more secure, the emulated vDPA device's control path is handled
> in the kernel and only the data path is implemented in the userspace.
> 
> Since the emulated vDPA device's control path is handled in the kernel,
> a message mechanism is introduced to make userspace aware of the data
> path related changes. Userspace can use read()/write() to receive/reply
> to the control messages.
> 
> In the data path, the core is mapping dma buffer into VDUSE daemon's
> address space, which can be implemented in different ways depending on
> the vdpa bus to which the vDPA device is attached.
> 
> In the virtio-vdpa case, we implement an MMU-based software IOTLB with a
> bounce-buffering mechanism to achieve that. In the vhost-vdpa case, the dma
> buffer resides in a userspace memory region which can be shared with the
> VDUSE userspace process by transferring the shmfd.
> 
> The details and our use case are shown below:
> 
> [ASCII architecture diagram, line-wrapped beyond recovery in the archive:
> a container (virtio-blk driver on the virtio bus, backed by the
> virtio-vdpa driver) and a QEMU VM (/dev/vhost-vdpa-x via the vhost-vdpa
> driver) both sit on the vdpa bus; the vduse driver exposes the vDPA
> device emulated by the userspace VDUSE daemon, whose block driver reaches
> the remote storages over TCP/IP (NIC).]
> 
> We make use of it to implement a block device connecting to
> our distributed storage, which can be used both in containers and
> VMs. Thus, we can have a unified technology stack in these two cases.
> 
> To test it with null-blk:
> 
>   $ qemu-storage-daemon \
>   --chardev 

Re: [PATCH v2 4/5] iommu/virtio: Pass end address to viommu_add_mapping()

2021-11-27 Thread Michael S. Tsirkin
On Sat, Nov 27, 2021 at 06:09:40PM +0100, Eric Auger wrote:
> 
> 
> On 11/23/21 4:53 PM, Jean-Philippe Brucker wrote:
> > To support identity mappings, the virtio-iommu driver must be able to
> > represent full 64-bit ranges internally. Pass (start, end) instead of
> > (start, size) to viommu_add/del_mapping().
> >
> > Clean comments. The one about the returned size was never true: when
> > sweeping the whole address space the returned size will most certainly
> > be smaller than 2^64.
> >
> > Reviewed-by: Kevin Tian 
> > Signed-off-by: Jean-Philippe Brucker 
> Reviewed-by: Eric Auger 
> 
> Eric
> 
> > ---
> >  drivers/iommu/virtio-iommu.c | 31 +++
> >  1 file changed, 15 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > index d63ec4d11b00..eceb9281c8c1 100644
> > --- a/drivers/iommu/virtio-iommu.c
> > +++ b/drivers/iommu/virtio-iommu.c
> > @@ -311,8 +311,8 @@ static int viommu_send_req_sync(struct viommu_dev 
> > *viommu, void *buf,
> >   *
> >   * On success, return the new mapping. Otherwise return NULL.
> >   */
> > -static int viommu_add_mapping(struct viommu_domain *vdomain, unsigned long 
> > iova,
> > - phys_addr_t paddr, size_t size, u32 flags)
> > +static int viommu_add_mapping(struct viommu_domain *vdomain, u64 iova, u64 
> > end,
> > + phys_addr_t paddr, u32 flags)
> >  {
> > unsigned long irqflags;
> > struct viommu_mapping *mapping;

I am worried that API changes like this will cause subtle bugs, since the
types of the arguments change but not their number. If we forget to update
some caller, things will silently be messed up.

How about passing struct Range instead?

> > @@ -323,7 +323,7 @@ static int viommu_add_mapping(struct viommu_domain 
> > *vdomain, unsigned long iova,
> >  
> > mapping->paddr  = paddr;
> > mapping->iova.start = iova;
> > -   mapping->iova.last  = iova + size - 1;
> > +   mapping->iova.last  = end;
> > mapping->flags  = flags;
> >  
> > spin_lock_irqsave(&vdomain->mappings_lock, irqflags);
> > @@ -338,26 +338,24 @@ static int viommu_add_mapping(struct viommu_domain 
> > *vdomain, unsigned long iova,
> >   *
> >   * @vdomain: the domain
> >   * @iova: start of the range
> > - * @size: size of the range. A size of 0 corresponds to the entire address
> > - * space.
> > + * @end: end of the range
> >   *
> > - * On success, returns the number of unmapped bytes (>= size)
> > + * On success, returns the number of unmapped bytes
> >   */
> >  static size_t viommu_del_mappings(struct viommu_domain *vdomain,
> > - unsigned long iova, size_t size)
> > + u64 iova, u64 end)
> >  {
> > size_t unmapped = 0;
> > unsigned long flags;
> > -   unsigned long last = iova + size - 1;
> > struct viommu_mapping *mapping = NULL;
> > struct interval_tree_node *node, *next;
> >  
> > spin_lock_irqsave(&vdomain->mappings_lock, flags);
> > -   next = interval_tree_iter_first(&vdomain->mappings, iova, last);
> > +   next = interval_tree_iter_first(&vdomain->mappings, iova, end);
> > while (next) {
> > node = next;
> > mapping = container_of(node, struct viommu_mapping, iova);
> > -   next = interval_tree_iter_next(node, iova, last);
> > +   next = interval_tree_iter_next(node, iova, end);
> >  
> > /* Trying to split a mapping? */
> > if (mapping->iova.start < iova)
> > @@ -656,8 +654,8 @@ static void viommu_domain_free(struct iommu_domain 
> > *domain)
> >  {
> > struct viommu_domain *vdomain = to_viommu_domain(domain);
> >  
> > -   /* Free all remaining mappings (size 2^64) */
> > -   viommu_del_mappings(vdomain, 0, 0);
> > +   /* Free all remaining mappings */
> > +   viommu_del_mappings(vdomain, 0, ULLONG_MAX);
> >  
> > if (vdomain->viommu)
> > ida_free(&vdomain->viommu->domain_ids, vdomain->id);
> > @@ -742,6 +740,7 @@ static int viommu_map(struct iommu_domain *domain, 
> > unsigned long iova,
> >  {
> > int ret;
> > u32 flags;
> > +   u64 end = iova + size - 1;
> > struct virtio_iommu_req_map map;
> > struct viommu_domain *vdomain = to_viommu_domain(domain);
> >  
> > @@ -752,7 +751,7 @@ static int viommu_map(struct iommu_domain *domain, 
> > unsigned long iova,
> > if (flags & ~vdomain->map_flags)
> > return -EINVAL;
> >  
> > -   ret = viommu_add_mapping(vdomain, iova, paddr, size, flags);
> > +   ret = viommu_add_mapping(vdomain, iova, end, paddr, flags);
> > if (ret)
> > return ret;
> >  
> > @@ -761,7 +760,7 @@ static int viommu_map(struct iommu_domain *domain, 
> > unsigned long iova,
> > .domain = cpu_to_le32(vdomain->id),
> > .virt_start = cpu_to_le64(iova),
> > .phys_start = cpu_to_le64(paddr),
> > -   .virt_end   = cpu_to_le64(iova + size - 1),
> > +   .virt_end   = 

Re: [PATCH 0/5] iommu/virtio: Add identity domains

2021-10-22 Thread Michael S. Tsirkin
On Wed, Oct 13, 2021 at 01:10:48PM +0100, Jean-Philippe Brucker wrote:
> Support identity domains, allowing to only enable IOMMU protection for a
> subset of endpoints (those assigned to userspace, for example). Users
> may enable identity domains at compile time
> (CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time (iommu.passthrough=1) or
> runtime (/sys/kernel/iommu_groups/*/type = identity).


I put this in my branch so it can get testing under linux-next,
but please note: if the ballot does not conclude in time
for the merge window, I won't send it to Linus.

> Patches 1-2 support identity domains using the optional
> VIRTIO_IOMMU_F_BYPASS_CONFIG feature. The feature bit is not yet in the
> spec, see [1] for the latest proposal.
> 
> Patches 3-5 add a fallback to identity mappings, when the feature is not
> supported.
> 
> Note that this series doesn't touch the global bypass bit added by
> VIRTIO_IOMMU_F_BYPASS_CONFIG. All endpoints managed by the IOMMU should
> be attached to a domain, so global bypass isn't in use after endpoints
> are probed. Before that, the global bypass policy is decided by the
> hypervisor and firmware. So I don't think Linux needs to touch the
> global bypass bit, but there are some patches available on my
> virtio-iommu/bypass branch [2] to test it.
> 
> QEMU patches are on my virtio-iommu/bypass branch [3] (and the list)
> 
> [1] https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07898.html
> [2] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/bypass
> [3] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/bypass
> 
> Jean-Philippe Brucker (5):
>   iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG
>   iommu/virtio: Support bypass domains
>   iommu/virtio: Sort reserved regions
>   iommu/virtio: Pass end address to viommu_add_mapping()
>   iommu/virtio: Support identity-mapped domains
> 
>  include/uapi/linux/virtio_iommu.h |   8 ++-
>  drivers/iommu/virtio-iommu.c  | 113 +-
>  2 files changed, 101 insertions(+), 20 deletions(-)
> 
> -- 
> 2.33.0



Re: [PATCH 1/5] iova: Move fast alloc size roundup into alloc_iova_fast()

2021-10-18 Thread Michael S. Tsirkin
On Fri, Sep 24, 2021 at 06:01:53PM +0800, John Garry wrote:
> It really is a property of the IOVA rcache code that we need to alloc a
> power-of-2 size, so relocate the functionality to resize into
> alloc_iova_fast(), rather than the callsites.
> 
> Signed-off-by: John Garry 

for vdpa code:

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/iommu/dma-iommu.c| 8 
>  drivers/iommu/iova.c | 9 +
>  drivers/vdpa/vdpa_user/iova_domain.c | 8 
>  3 files changed, 9 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 896bea04c347..a99b3445fef8 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -444,14 +444,6 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
> iommu_domain *domain,
>  
>   shift = iova_shift(iovad);
>   iova_len = size >> shift;
> - /*
> -  * Freeing non-power-of-two-sized allocations back into the IOVA caches
> -  * will come back to bite us badly, so we have to waste a bit of space
> -  * rounding up anything cacheable to make sure that can't happen. The
> -  * order of the unadjusted size will still match upon freeing.
> -  */
> - if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
> - iova_len = roundup_pow_of_two(iova_len);
>  
>   dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
>  
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 9e8bc802ac05..ff567cbc42f7 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -497,6 +497,15 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long 
> size,
>   unsigned long iova_pfn;
>   struct iova *new_iova;
>  
> + /*
> +  * Freeing non-power-of-two-sized allocations back into the IOVA caches
> +  * will come back to bite us badly, so we have to waste a bit of space
> +  * rounding up anything cacheable to make sure that can't happen. The
> +  * order of the unadjusted size will still match upon freeing.
> +  */
> + if (size < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
> + size = roundup_pow_of_two(size);
> +
>   iova_pfn = iova_rcache_get(iovad, size, limit_pfn + 1);
>   if (iova_pfn)
>   return iova_pfn;
> diff --git a/drivers/vdpa/vdpa_user/iova_domain.c 
> b/drivers/vdpa/vdpa_user/iova_domain.c
> index 1daae2608860..2b1143f11d8f 100644
> --- a/drivers/vdpa/vdpa_user/iova_domain.c
> +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> @@ -292,14 +292,6 @@ vduse_domain_alloc_iova(struct iova_domain *iovad,
>   unsigned long iova_len = iova_align(iovad, size) >> shift;
>   unsigned long iova_pfn;
>  
> - /*
> -  * Freeing non-power-of-two-sized allocations back into the IOVA caches
> -  * will come back to bite us badly, so we have to waste a bit of space
> -  * rounding up anything cacheable to make sure that can't happen. The
> -  * order of the unadjusted size will still match upon freeing.
> -  */
> - if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
> - iova_len = roundup_pow_of_two(iova_len);
>   iova_pfn = alloc_iova_fast(iovad, iova_len, limit >> shift, true);
>  
>   return iova_pfn << shift;
> -- 
> 2.26.2



Re: [PATCH 0/5] iommu/virtio: Add identity domains

2021-10-18 Thread Michael S. Tsirkin
On Mon, Oct 18, 2021 at 04:23:41PM +0100, Jean-Philippe Brucker wrote:
> On Thu, Oct 14, 2021 at 03:00:38AM +, Tian, Kevin wrote:
> > > From: Jean-Philippe Brucker 
> > > Sent: Wednesday, October 13, 2021 8:11 PM
> > > 
> > > Support identity domains, allowing to only enable IOMMU protection for a
> > > subset of endpoints (those assigned to userspace, for example). Users
> > > may enable identity domains at compile time
> > > (CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time
> > > (iommu.passthrough=1) or
> > > runtime (/sys/kernel/iommu_groups/*/type = identity).
> > 
> > Do we want to use consistent terms between spec (bypass domain) 
> > and code (identity domain)? 
> 
> I don't think we have to. Linux uses "identity" domains and "passthrough"
> IOMMU. The old virtio-iommu feature was "bypass" so we should keep that
> for the new one, to be consistent. And then I've used "bypass" for domains
> as well, in the spec, because it would look strange to use a different
> term for the same concept. I find that it sort of falls into place: Linux'
> identity domains can be implemented either with bypass or identity-mapped
> virtio-iommu domains.
> 
> > > 
> > > Patches 1-2 support identity domains using the optional
> > > VIRTIO_IOMMU_F_BYPASS_CONFIG feature. The feature bit is not yet in the
> > > spec, see [1] for the latest proposal.
> > > 
> > > Patches 3-5 add a fallback to identity mappings, when the feature is not
> > > supported.
> > > 
> > > Note that this series doesn't touch the global bypass bit added by
> > > VIRTIO_IOMMU_F_BYPASS_CONFIG. All endpoints managed by the IOMMU
> > > should
> > > be attached to a domain, so global bypass isn't in use after endpoints
> > 
> > I saw a concept of deferred attach in the iommu core. See
> > iommu_is_attach_deferred(). Currently this is vendor specific and I
> > haven't looked into the exact reason why some vendors set it now. Just
> > curious whether the same reason might apply to virtio-iommu.
> > 
> > > are probed. Before that, the global bypass policy is decided by the
> > > hypervisor and firmware. So I don't think Linux needs to touch the
> > 
> > This reminds me one thing. The spec says that the global bypass
> > bit is sticky and not affected by reset.
> 
> The spec talks about *device* reset, triggered by software writing 0 to
> the status register, but it doesn't mention system reset. Would be good to
> clarify that. Something like:
> 
> If the device offers the VIRTIO_IOMMU_F_BYPASS_CONFIG feature, it MAY
> initialize the \field{bypass} field to 1. Field \field{bypass} SHOULD
> NOT change on device reset, but SHOULD be restored to its initial
> value on system reset.
> 
> > This implies that in the case
> > of rebooting the VM into a different OS, the previous OS actually
> > has the right to override this setting for the next OS. Is that the
> > right design? Even the firmware itself is unable to identify the original
> > setting enforced by the hypervisor after reboot. I feel the hypervisor
> > setting should be recovered after reset since it reflects the 
> > security measure enforced by the virtual platform?
> 
> So I think clarifying system reset should address your questions.
> I believe we should leave bypass sticky across device reset, so a FW->OS
> transition, where the OS resets the device, does not open a vulnerability
> (if bypass was enabled at boot and then left disabled by FW.)
> 
> It's still a good idea for the OS to restore on shutdown the bypass value
> it was booted with. So it can kexec into an OS that doesn't support
> virtio-iommu, for example.
> 
> Thanks,
> Jean

Is this stickiness really important? Can't this be addressed just by
the hypervisor disabling bypass at boot?

-- 
MST



Re: [PATCH RFC] virtio: wrap config->reset calls

2021-10-13 Thread Michael S. Tsirkin
On Wed, Oct 13, 2021 at 01:03:46PM +0200, David Hildenbrand wrote:
> On 13.10.21 12:55, Michael S. Tsirkin wrote:
> > This will enable cleanups down the road.
> > The idea is to disable cbs, then add "flush_queued_cbs" callback
> > as a parameter, this way drivers can flush any work
> > queued after callbacks have been disabled.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >   arch/um/drivers/virt-pci.c | 2 +-
> >   drivers/block/virtio_blk.c | 4 ++--
> >   drivers/bluetooth/virtio_bt.c  | 2 +-
> >   drivers/char/hw_random/virtio-rng.c| 2 +-
> >   drivers/char/virtio_console.c  | 4 ++--
> >   drivers/crypto/virtio/virtio_crypto_core.c | 8 
> >   drivers/firmware/arm_scmi/virtio.c | 2 +-
> >   drivers/gpio/gpio-virtio.c | 2 +-
> >   drivers/gpu/drm/virtio/virtgpu_kms.c   | 2 +-
> >   drivers/i2c/busses/i2c-virtio.c| 2 +-
> >   drivers/iommu/virtio-iommu.c   | 2 +-
> >   drivers/net/caif/caif_virtio.c | 2 +-
> >   drivers/net/virtio_net.c   | 4 ++--
> >   drivers/net/wireless/mac80211_hwsim.c  | 2 +-
> >   drivers/nvdimm/virtio_pmem.c   | 2 +-
> >   drivers/rpmsg/virtio_rpmsg_bus.c   | 2 +-
> >   drivers/scsi/virtio_scsi.c | 2 +-
> >   drivers/virtio/virtio.c| 5 +
> >   drivers/virtio/virtio_balloon.c| 2 +-
> >   drivers/virtio/virtio_input.c  | 2 +-
> >   drivers/virtio/virtio_mem.c| 2 +-
> >   fs/fuse/virtio_fs.c| 4 ++--
> >   include/linux/virtio.h | 1 +
> >   net/9p/trans_virtio.c  | 2 +-
> >   net/vmw_vsock/virtio_transport.c   | 4 ++--
> >   sound/virtio/virtio_card.c | 4 ++--
> >   26 files changed, 39 insertions(+), 33 deletions(-)
> > 
> > diff --git a/arch/um/drivers/virt-pci.c b/arch/um/drivers/virt-pci.c
> > index c08066633023..22c4d87c9c15 100644
> > --- a/arch/um/drivers/virt-pci.c
> > +++ b/arch/um/drivers/virt-pci.c
> > @@ -616,7 +616,7 @@ static void um_pci_virtio_remove(struct virtio_device 
> > *vdev)
> > int i;
> >   /* Stop all virtqueues */
> > -vdev->config->reset(vdev);
> > +virtio_reset_device(vdev);
> >   vdev->config->del_vqs(vdev);
> 
> Nit: virtio_device_reset()?
> 
> Because I see:
> 
> int virtio_device_freeze(struct virtio_device *dev);
> int virtio_device_restore(struct virtio_device *dev);
> void virtio_device_ready(struct virtio_device *dev)
> 
> But well, there is:
> void virtio_break_device(struct virtio_device *dev);

Exactly. I don't know what's best, so I opted for plain English :)


> -- 
> Thanks,
> 
> David / dhildenb



[PATCH RFC] virtio: wrap config->reset calls

2021-10-13 Thread Michael S. Tsirkin
This will enable cleanups down the road.
The idea is to disable cbs, then add "flush_queued_cbs" callback
as a parameter, this way drivers can flush any work
queued after callbacks have been disabled.

Signed-off-by: Michael S. Tsirkin 
---
 arch/um/drivers/virt-pci.c | 2 +-
 drivers/block/virtio_blk.c | 4 ++--
 drivers/bluetooth/virtio_bt.c  | 2 +-
 drivers/char/hw_random/virtio-rng.c| 2 +-
 drivers/char/virtio_console.c  | 4 ++--
 drivers/crypto/virtio/virtio_crypto_core.c | 8 
 drivers/firmware/arm_scmi/virtio.c | 2 +-
 drivers/gpio/gpio-virtio.c | 2 +-
 drivers/gpu/drm/virtio/virtgpu_kms.c   | 2 +-
 drivers/i2c/busses/i2c-virtio.c| 2 +-
 drivers/iommu/virtio-iommu.c   | 2 +-
 drivers/net/caif/caif_virtio.c | 2 +-
 drivers/net/virtio_net.c   | 4 ++--
 drivers/net/wireless/mac80211_hwsim.c  | 2 +-
 drivers/nvdimm/virtio_pmem.c   | 2 +-
 drivers/rpmsg/virtio_rpmsg_bus.c   | 2 +-
 drivers/scsi/virtio_scsi.c | 2 +-
 drivers/virtio/virtio.c| 5 +
 drivers/virtio/virtio_balloon.c| 2 +-
 drivers/virtio/virtio_input.c  | 2 +-
 drivers/virtio/virtio_mem.c| 2 +-
 fs/fuse/virtio_fs.c| 4 ++--
 include/linux/virtio.h | 1 +
 net/9p/trans_virtio.c  | 2 +-
 net/vmw_vsock/virtio_transport.c   | 4 ++--
 sound/virtio/virtio_card.c | 4 ++--
 26 files changed, 39 insertions(+), 33 deletions(-)

diff --git a/arch/um/drivers/virt-pci.c b/arch/um/drivers/virt-pci.c
index c08066633023..22c4d87c9c15 100644
--- a/arch/um/drivers/virt-pci.c
+++ b/arch/um/drivers/virt-pci.c
@@ -616,7 +616,7 @@ static void um_pci_virtio_remove(struct virtio_device *vdev)
int i;
 
 /* Stop all virtqueues */
-vdev->config->reset(vdev);
+virtio_reset_device(vdev);
 vdev->config->del_vqs(vdev);
 
device_set_wakeup_enable(&vdev->dev, false);
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 303caf2d17d0..83d0af3fbf30 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -910,7 +910,7 @@ static void virtblk_remove(struct virtio_device *vdev)
mutex_lock(&vblk->vdev_mutex);
 
/* Stop all the virtqueues. */
-   vdev->config->reset(vdev);
+   virtio_reset_device(vdev);
 
/* Virtqueues are stopped, nothing can use vblk->vdev anymore. */
vblk->vdev = NULL;
@@ -929,7 +929,7 @@ static int virtblk_freeze(struct virtio_device *vdev)
struct virtio_blk *vblk = vdev->priv;
 
/* Ensure we don't receive any more interrupts */
-   vdev->config->reset(vdev);
+   virtio_reset_device(vdev);
 
/* Make sure no work handler is accessing the device. */
flush_work(&vblk->config_work);
diff --git a/drivers/bluetooth/virtio_bt.c b/drivers/bluetooth/virtio_bt.c
index 57908ce4fae8..24a9258962fa 100644
--- a/drivers/bluetooth/virtio_bt.c
+++ b/drivers/bluetooth/virtio_bt.c
@@ -364,7 +364,7 @@ static void virtbt_remove(struct virtio_device *vdev)
struct hci_dev *hdev = vbt->hdev;
 
hci_unregister_dev(hdev);
-   vdev->config->reset(vdev);
+   virtio_reset_device(vdev);
 
hci_free_dev(hdev);
vbt->hdev = NULL;
diff --git a/drivers/char/hw_random/virtio-rng.c 
b/drivers/char/hw_random/virtio-rng.c
index a90001e02bf7..95980489514b 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -134,7 +134,7 @@ static void remove_common(struct virtio_device *vdev)
vi->hwrng_removed = true;
vi->data_avail = 0;
complete(&vi->have_data);
-   vdev->config->reset(vdev);
+   virtio_reset_device(vdev);
vi->busy = false;
if (vi->hwrng_register_done)
hwrng_unregister(&vi->hwrng);
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7eaf303a7a86..08bbd693436f 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1957,7 +1957,7 @@ static void virtcons_remove(struct virtio_device *vdev)
spin_unlock_irq(&pdrvdata_lock);
 
/* Disable interrupts for vqs */
-   vdev->config->reset(vdev);
+   virtio_reset_device(vdev);
/* Finish up work that's lined up */
if (use_multiport(portdev))
cancel_work_sync(&portdev->control_work);
@@ -2139,7 +2139,7 @@ static int virtcons_freeze(struct virtio_device *vdev)
 
portdev = vdev->priv;
 
-   vdev->config->reset(vdev);
+   virtio_reset_device(vdev);
 
if (use_multiport(portdev))
virtqueue_disable_cb(portdev->c_ivq);
diff --git a/drivers/crypto/virtio/virtio_crypto_core.c 
b/drivers/crypto/virtio/virtio_crypto_core.c

Re: [PATCH v13 05/13] vdpa: Add reset callback in vdpa_config_ops

2021-09-06 Thread Michael S. Tsirkin
On Mon, Sep 06, 2021 at 04:45:55PM +0800, Yongji Xie wrote:
> On Mon, Sep 6, 2021 at 4:01 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Sep 06, 2021 at 03:06:44PM +0800, Yongji Xie wrote:
> > > On Mon, Sep 6, 2021 at 2:37 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Mon, Sep 06, 2021 at 02:09:25PM +0800, Yongji Xie wrote:
> > > > > On Mon, Sep 6, 2021 at 1:56 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Aug 31, 2021 at 06:36:26PM +0800, Xie Yongji wrote:
> > > > > > > This adds a new callback to support device specific reset
> > > > > > > behavior. The vdpa bus driver will call the reset function
> > > > > > > instead of setting status to zero during resetting.
> > > > > > >
> > > > > > > Signed-off-by: Xie Yongji 
> > > > > >
> > > > > >
> > > > > > This does gloss over a significant change though:
> > > > > >
> > > > > >
> > > > > > > ---
> > > > > > > @@ -348,12 +352,12 @@ static inline struct device 
> > > > > > > *vdpa_get_dma_dev(struct vdpa_device *vdev)
> > > > > > >   return vdev->dma_dev;
> > > > > > >  }
> > > > > > >
> > > > > > > -static inline void vdpa_reset(struct vdpa_device *vdev)
> > > > > > > +static inline int vdpa_reset(struct vdpa_device *vdev)
> > > > > > >  {
> > > > > > >   const struct vdpa_config_ops *ops = vdev->config;
> > > > > > >
> > > > > > >   vdev->features_valid = false;
> > > > > > > - ops->set_status(vdev, 0);
> > > > > > > + return ops->reset(vdev);
> > > > > > >  }
> > > > > > >
> > > > > > >  static inline int vdpa_set_features(struct vdpa_device *vdev, 
> > > > > > > u64 features)
> > > > > >
> > > > > >
> > > > > > Unfortunately this breaks virtio_vdpa:
> > > > > >
> > > > > >
> > > > > > static void virtio_vdpa_reset(struct virtio_device *vdev)
> > > > > > {
> > > > > > struct vdpa_device *vdpa = vd_get_vdpa(vdev);
> > > > > >
> > > > > > vdpa_reset(vdpa);
> > > > > > }
> > > > > >
> > > > > >
> > > > > > and there's no easy way to fix this, kernel can't recover
> > > > > > from a reset failure e.g. during driver unbind.
> > > > > >
> > > > >
> > > > > Yes, but it should be safe with the protection of software IOTLB even
> > > > > if the reset() fails during driver unbind.
> > > > >
> > > > > Thanks,
> > > > > Yongji
> > > >
> > > > Hmm. I don't see it.
> > > > What exactly will happen? What prevents device from poking at
> > > > memory after reset? Note that dma unmap in e.g. del_vqs happens
> > > > too late.
> > >
> > > But I didn't see any problems with touching the memory for virtqueues.
> >
> > Drivers make the assumption that after reset returns no new
> > buffers will be consumed. For example a bunch of drivers
> > call virtqueue_detach_unused_buf.
> 
> I'm not sure if I get your point. But it looks like
> virtqueue_detach_unused_buf() will check the driver's metadata first
> rather than read the memory from the virtqueue.
> 
> > I can't say whether block makes this assumption anywhere.
> > Needs careful auditing.
> >
> > > The memory should not be freed after dma unmap?
> >
> > But unmap does not happen until after the reset.
> >
> 
> I mean the memory is totally allocated and controlled by the VDUSE
> driver. The VDUSE driver will not return them to the buddy system
> unless userspace unmaps it.

Right. But what stops VDUSE from poking at memory after
reset failed?



> >
> > > And the memory for the bounce buffer should also be safe to be
> > > accessed by userspace in this case.
> > >
> > > > And what about e.g. interrupts?
> > > > E.g. we have this:
> > > >
> > > > /* Virtqueues are stopped, nothing can use vblk->vdev anymore. 
> &g

Re: [PATCH v13 05/13] vdpa: Add reset callback in vdpa_config_ops

2021-09-06 Thread Michael S. Tsirkin
On Mon, Sep 06, 2021 at 03:06:44PM +0800, Yongji Xie wrote:
> On Mon, Sep 6, 2021 at 2:37 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Sep 06, 2021 at 02:09:25PM +0800, Yongji Xie wrote:
> > > On Mon, Sep 6, 2021 at 1:56 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Tue, Aug 31, 2021 at 06:36:26PM +0800, Xie Yongji wrote:
> > > > > This adds a new callback to support device specific reset
> > > > > behavior. The vdpa bus driver will call the reset function
> > > > > instead of setting status to zero during resetting.
> > > > >
> > > > > Signed-off-by: Xie Yongji 
> > > >
> > > >
> > > > This does gloss over a significant change though:
> > > >
> > > >
> > > > > ---
> > > > > @@ -348,12 +352,12 @@ static inline struct device 
> > > > > *vdpa_get_dma_dev(struct vdpa_device *vdev)
> > > > >   return vdev->dma_dev;
> > > > >  }
> > > > >
> > > > > -static inline void vdpa_reset(struct vdpa_device *vdev)
> > > > > +static inline int vdpa_reset(struct vdpa_device *vdev)
> > > > >  {
> > > > >   const struct vdpa_config_ops *ops = vdev->config;
> > > > >
> > > > >   vdev->features_valid = false;
> > > > > - ops->set_status(vdev, 0);
> > > > > + return ops->reset(vdev);
> > > > >  }
> > > > >
> > > > >  static inline int vdpa_set_features(struct vdpa_device *vdev, u64 
> > > > > features)
> > > >
> > > >
> > > > Unfortunately this breaks virtio_vdpa:
> > > >
> > > >
> > > > static void virtio_vdpa_reset(struct virtio_device *vdev)
> > > > {
> > > > struct vdpa_device *vdpa = vd_get_vdpa(vdev);
> > > >
> > > > vdpa_reset(vdpa);
> > > > }
> > > >
> > > >
> > > > and there's no easy way to fix this, kernel can't recover
> > > > from a reset failure e.g. during driver unbind.
> > > >
> > >
> > > Yes, but it should be safe with the protection of software IOTLB even
> > > if the reset() fails during driver unbind.
> > >
> > > Thanks,
> > > Yongji
> >
> > Hmm. I don't see it.
> > What exactly will happen? What prevents device from poking at
> > memory after reset? Note that dma unmap in e.g. del_vqs happens
> > too late.
> 
> But I didn't see any problems with touching the memory for virtqueues.

Drivers make the assumption that after reset returns no new
buffers will be consumed. For example a bunch of drivers
call virtqueue_detach_unused_buf.
I can't say whether block makes this assumption anywhere.
Needs careful auditing.

> The memory should not be freed after dma unmap?

But unmap does not happen until after the reset.


> And the memory for the bounce buffer should also be safe to be
> accessed by userspace in this case.
> 
> > And what about e.g. interrupts?
> > E.g. we have this:
> >
> > /* Virtqueues are stopped, nothing can use vblk->vdev anymore. */
> > vblk->vdev = NULL;
> >
> > and this is no longer true at this point.
> >
> 
> You're right. But I didn't see where the interrupt handler will use
> the vblk->vdev.

static void virtblk_done(struct virtqueue *vq)
{
struct virtio_blk *vblk = vq->vdev->priv;

vq->vdev is the same as vblk->vdev.


> So it seems it is not too late to fix it:
> 
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c
> b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 5c25ff6483ad..ea41a7389a26 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -665,13 +665,13 @@ static void vduse_vdpa_set_config(struct
> vdpa_device *vdpa, unsigned int offset,
>  static int vduse_vdpa_reset(struct vdpa_device *vdpa)
>  {
> struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> +   int ret;
> 
> -   if (vduse_dev_set_status(dev, 0))
> -   return -EIO;
> +   ret = vduse_dev_set_status(dev, 0);
> 
> vduse_dev_reset(dev);
> 
> -   return 0;
> +   return ret;
>  }
> 
>  static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa)
> 
> Thanks,
> Yongji

Needs some comments to explain why it's done like this.

BTW, the device is generally wedged at this point, right?
E.g. if reset during initialization fails, userspace
will still get the reset at some later point and be
confused ...

-- 
MST



Re: [PATCH v13 05/13] vdpa: Add reset callback in vdpa_config_ops

2021-09-06 Thread Michael S. Tsirkin
On Mon, Sep 06, 2021 at 02:09:25PM +0800, Yongji Xie wrote:
> On Mon, Sep 6, 2021 at 1:56 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Aug 31, 2021 at 06:36:26PM +0800, Xie Yongji wrote:
> > > This adds a new callback to support device specific reset
> > > behavior. The vdpa bus driver will call the reset function
> > > instead of setting status to zero during resetting.
> > >
> > > Signed-off-by: Xie Yongji 
> >
> >
> > This does gloss over a significant change though:
> >
> >
> > > ---
> > > @@ -348,12 +352,12 @@ static inline struct device 
> > > *vdpa_get_dma_dev(struct vdpa_device *vdev)
> > >   return vdev->dma_dev;
> > >  }
> > >
> > > -static inline void vdpa_reset(struct vdpa_device *vdev)
> > > +static inline int vdpa_reset(struct vdpa_device *vdev)
> > >  {
> > >   const struct vdpa_config_ops *ops = vdev->config;
> > >
> > >   vdev->features_valid = false;
> > > - ops->set_status(vdev, 0);
> > > + return ops->reset(vdev);
> > >  }
> > >
> > >  static inline int vdpa_set_features(struct vdpa_device *vdev, u64 
> > > features)
> >
> >
> > Unfortunately this breaks virtio_vdpa:
> >
> >
> > static void virtio_vdpa_reset(struct virtio_device *vdev)
> > {
> > struct vdpa_device *vdpa = vd_get_vdpa(vdev);
> >
> > vdpa_reset(vdpa);
> > }
> >
> >
> > and there's no easy way to fix this, kernel can't recover
> > from a reset failure e.g. during driver unbind.
> >
> 
> Yes, but it should be safe with the protection of software IOTLB even
> if the reset() fails during driver unbind.
> 
> Thanks,
> Yongji

Hmm. I don't see it.
What exactly will happen? What prevents device from poking at
memory after reset? Note that dma unmap in e.g. del_vqs happens
too late.  And what about e.g. interrupts?
E.g. we have this:

/* Virtqueues are stopped, nothing can use vblk->vdev anymore. */
vblk->vdev = NULL;

and this is no longer true at this point.


-- 
MST



Re: [PATCH v13 05/13] vdpa: Add reset callback in vdpa_config_ops

2021-09-05 Thread Michael S. Tsirkin
On Tue, Aug 31, 2021 at 06:36:26PM +0800, Xie Yongji wrote:
> This adds a new callback to support device specific reset
> behavior. The vdpa bus driver will call the reset function
> instead of setting status to zero during resetting.
> 
> Signed-off-by: Xie Yongji 


This does gloss over a significant change though:


> ---
> @@ -348,12 +352,12 @@ static inline struct device *vdpa_get_dma_dev(struct 
> vdpa_device *vdev)
>   return vdev->dma_dev;
>  }
>  
> -static inline void vdpa_reset(struct vdpa_device *vdev)
> +static inline int vdpa_reset(struct vdpa_device *vdev)
>  {
>   const struct vdpa_config_ops *ops = vdev->config;
>  
>   vdev->features_valid = false;
> - ops->set_status(vdev, 0);
> + return ops->reset(vdev);
>  }
>  
>  static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features)


Unfortunately this breaks virtio_vdpa:


static void virtio_vdpa_reset(struct virtio_device *vdev)
{
struct vdpa_device *vdpa = vd_get_vdpa(vdev);

vdpa_reset(vdpa);
}


and there's no easy way to fix this, kernel can't recover
from a reset failure e.g. during driver unbind.

Find a way to disable virtio_vdpa for now?


> -- 
> 2.11.0



Re: [PATCH v12 05/13] vdpa: Add reset callback in vdpa_config_ops

2021-09-05 Thread Michael S. Tsirkin
On Mon, Aug 30, 2021 at 10:17:29PM +0800, Xie Yongji wrote:
> This adds a new callback to support device specific reset
> behavior. The vdpa bus driver will call the reset function
> instead of setting status to zero during resetting.
> 
> Signed-off-by: Xie Yongji 

This does gloss over a significant change though:

> @@ -348,12 +352,12 @@ static inline struct device *vdpa_get_dma_dev(struct 
> vdpa_device *vdev)
>   return vdev->dma_dev;
>  }
>  
> -static inline void vdpa_reset(struct vdpa_device *vdev)
> +static inline int vdpa_reset(struct vdpa_device *vdev)
>  {
>   const struct vdpa_config_ops *ops = vdev->config;
>  
>   vdev->features_valid = false;
> - ops->set_status(vdev, 0);
> + return ops->reset(vdev);
>  }
>  
>  static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features)

Unfortunately this breaks virtio_vdpa:


static void virtio_vdpa_reset(struct virtio_device *vdev)
{
struct vdpa_device *vdpa = vd_get_vdpa(vdev);

vdpa_reset(vdpa);
}


and there's no easy way to fix this, kernel can't recover
from a reset failure e.g. during driver unbind.

Find a way to disable virtio_vdpa for now?

> -- 
> 2.11.0



Re: [PATCH v13 03/13] file: Export receive_fd() to modules

2021-09-05 Thread Michael S. Tsirkin
On Tue, Aug 31, 2021 at 06:36:24PM +0800, Xie Yongji wrote:
> Export receive_fd() so that some modules can use
> it to pass file descriptors between processes without
> missing any security checks.
> 
> Signed-off-by: Xie Yongji 
> Acked-by: Jason Wang 

This needs some acks from fs devels.
Viro?


> ---
>  fs/file.c| 6 ++
>  include/linux/file.h | 7 +++
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/file.c b/fs/file.c
> index 86dc9956af32..210e540672aa 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -1134,6 +1134,12 @@ int receive_fd_replace(int new_fd, struct file *file, 
> unsigned int o_flags)
>   return new_fd;
>  }
>  
> +int receive_fd(struct file *file, unsigned int o_flags)
> +{
> + return __receive_fd(file, NULL, o_flags);
> +}
> +EXPORT_SYMBOL_GPL(receive_fd);
> +
>  static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
>  {
>   int err = -EBADF;
> diff --git a/include/linux/file.h b/include/linux/file.h
> index 2de2e4613d7b..51e830b4fe3a 100644
> --- a/include/linux/file.h
> +++ b/include/linux/file.h
> @@ -94,6 +94,9 @@ extern void fd_install(unsigned int fd, struct file *file);
>  
>  extern int __receive_fd(struct file *file, int __user *ufd,
>   unsigned int o_flags);
> +
> +extern int receive_fd(struct file *file, unsigned int o_flags);
> +
>  static inline int receive_fd_user(struct file *file, int __user *ufd,
> unsigned int o_flags)
>  {
> @@ -101,10 +104,6 @@ static inline int receive_fd_user(struct file *file, int 
> __user *ufd,
>   return -EFAULT;
>   return __receive_fd(file, ufd, o_flags);
>  }
> -static inline int receive_fd(struct file *file, unsigned int o_flags)
> -{
> - return __receive_fd(file, NULL, o_flags);
> -}
>  int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags);
>  
>  extern void flush_delayed_fput(void);
> -- 
> 2.11.0



Re: [PATCH v11 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-08-24 Thread Michael S. Tsirkin
On Wed, Aug 18, 2021 at 08:06:41PM +0800, Xie Yongji wrote:
> This VDUSE driver enables implementing software-emulated vDPA
> devices in userspace. The vDPA device is created by
> ioctl(VDUSE_CREATE_DEV) on /dev/vduse/control. Then a char device
> interface (/dev/vduse/$NAME) is exported to userspace for device
> emulation.
> 
> In order to make the device emulation more secure, the device's
> control path is handled in the kernel. A message mechanism is introduced
> to forward some dataplane related control messages to userspace.
> 
> And in the data path, the DMA buffer will be mapped into userspace
> address space through different ways depending on the vDPA bus to
> which the vDPA device is attached. In the virtio-vdpa case, the MMU-based
> software IOTLB is used to achieve that. And in the vhost-vdpa case, the
> DMA buffer resides in a userspace memory region which can be shared
> with the VDUSE userspace process via transferring the shmfd.
> 
> For more details on VDUSE design and usage, please see the follow-on
> Documentation commit.
> 
> Signed-off-by: Xie Yongji 

Build bot seems unhappy with this patch.

> ---
>  Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
>  drivers/vdpa/Kconfig   |   10 +
>  drivers/vdpa/Makefile  |1 +
>  drivers/vdpa/vdpa_user/Makefile|5 +
>  drivers/vdpa/vdpa_user/vduse_dev.c | 1611 
> 
>  include/uapi/linux/vduse.h |  304 
>  6 files changed, 1932 insertions(+)
>  create mode 100644 drivers/vdpa/vdpa_user/Makefile
>  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
>  create mode 100644 include/uapi/linux/vduse.h
> 
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
> b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 1409e40e6345..293ca3aef358 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -300,6 +300,7 @@ Code  Seq#Include File
>Comments
>  'z'   10-4F  drivers/s390/crypto/zcrypt_api.h
> conflict!
>  '|'   00-7F  linux/media.h
>  0x80  00-1F  linux/fb.h
> +0x81  00-1F  linux/vduse.h
>  0x89  00-06  arch/x86/include/asm/sockios.h
>  0x89  0B-DF  linux/sockios.h
>  0x89  E0-EF  linux/sockios.h 
> SIOCPROTOPRIVATE range
> diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
> index a503c1b2bfd9..6e23bce6433a 100644
> --- a/drivers/vdpa/Kconfig
> +++ b/drivers/vdpa/Kconfig
> @@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
> vDPA block device simulator which terminates IO request in a
> memory buffer.
>  
> +config VDPA_USER
> + tristate "VDUSE (vDPA Device in Userspace) support"
> + depends on EVENTFD && MMU && HAS_DMA
> + select DMA_OPS
> + select VHOST_IOTLB
> + select IOMMU_IOVA
> + help
> +   With VDUSE it is possible to emulate a vDPA Device
> +   in a userspace program.
> +
>  config IFCVF
>   tristate "Intel IFC VF vDPA driver"
>   depends on PCI_MSI
> diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
> index 67fe7f3d6943..f02ebed33f19 100644
> --- a/drivers/vdpa/Makefile
> +++ b/drivers/vdpa/Makefile
> @@ -1,6 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_VDPA) += vdpa.o
>  obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
> +obj-$(CONFIG_VDPA_USER) += vdpa_user/
>  obj-$(CONFIG_IFCVF)+= ifcvf/
>  obj-$(CONFIG_MLX5_VDPA) += mlx5/
>  obj-$(CONFIG_VP_VDPA)+= virtio_pci/
> diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
> new file mode 100644
> index ..260e0b26af99
> --- /dev/null
> +++ b/drivers/vdpa/vdpa_user/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +vduse-y := vduse_dev.o iova_domain.o
> +
> +obj-$(CONFIG_VDPA_USER) += vduse.o
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> b/drivers/vdpa/vdpa_user/vduse_dev.c
> new file mode 100644
> index ..ce081b7895d5
> --- /dev/null
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -0,0 +1,1611 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * VDUSE: vDPA Device in Userspace
> + *
> + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights 
> reserved.
> + *
> + * Author: Xie Yongji 
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "iova_domain.h"
> +
> +#define DRV_AUTHOR   "Yongji Xie "
> +#define DRV_DESC "vDPA Device in Userspace"
> +#define DRV_LICENSE  "GPL v2"
> +
> +#define VDUSE_DEV_MAX (1U << MINORBITS)
> +#define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
> +#define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
> +#define VDUSE_MSG_DEFAULT_TIMEOUT 

Re: [PATCH v11 01/12] iova: Export alloc_iova_fast() and free_iova_fast()

2021-08-24 Thread Michael S. Tsirkin
On Wed, Aug 18, 2021 at 08:06:31PM +0800, Xie Yongji wrote:
> Export alloc_iova_fast() and free_iova_fast() so that
> some modules can make use of the per-CPU cache to get
> rid of the rbtree spinlock in alloc_iova() and free_iova()
> during IOVA allocation.
> 
> Signed-off-by: Xie Yongji 


This needs ack from iommu maintainers. Guys?

> ---
>  drivers/iommu/iova.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index b6cf5f16123b..3941ed6bb99b 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -521,6 +521,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long 
> size,
>  
>   return new_iova->pfn_lo;
>  }
> +EXPORT_SYMBOL_GPL(alloc_iova_fast);
>  
>  /**
>   * free_iova_fast - free iova pfn range into rcache
> @@ -538,6 +539,7 @@ free_iova_fast(struct iova_domain *iovad, unsigned long 
> pfn, unsigned long size)
>  
>   free_iova(iovad, pfn);
>  }
> +EXPORT_SYMBOL_GPL(free_iova_fast);
>  
>  #define fq_ring_for_each(i, fq) \
>   for ((i) = (fq)->head; (i) != (fq)->tail; (i) = ((i) + 1) % 
> IOVA_FQ_SIZE)
> -- 
> 2.11.0



Re: [PATCH v9 16/17] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-13 Thread Michael S. Tsirkin
On Wed, Jul 14, 2021 at 01:45:39PM +0800, Jason Wang wrote:
> > +static int vduse_dev_msg_sync(struct vduse_dev *dev,
> > + struct vduse_dev_msg *msg)
> > +{
> > +   int ret;
> > +
> > +   init_waitqueue_head(&msg->waitq);
> > +   spin_lock(&dev->msg_lock);
> > +   msg->req.request_id = dev->msg_unique++;
> > +   vduse_enqueue_msg(&dev->send_list, msg);
> > +   wake_up(&dev->waitq);
> > +   spin_unlock(&dev->msg_lock);
> > +
> > +   wait_event_killable_timeout(msg->waitq, msg->completed,
> > +   VDUSE_REQUEST_TIMEOUT * HZ);
> > +   spin_lock(&dev->msg_lock);
> > +   if (!msg->completed) {
> > +   list_del(&msg->list);
> > +   msg->resp.result = VDUSE_REQ_RESULT_FAILED;
> > +   }
> > +   ret = (msg->resp.result == VDUSE_REQ_RESULT_OK) ? 0 : -EIO;
> 
> 
> I think we should mark the device as malfunctioning when there is a timeout
> and forbid any userspace operations except destroy afterwards, for safety.

This looks like the behaviour would change completely if one tried to run
gdb on the program, because the kernel wants it to respond within a
specific time. Looks like a recipe for heisenbugs.

Let's not build interfaces with arbitrary timeouts like that.
Interruptible wait exists for this very reason. Let userspace have its
own code to set and use timers. This does mean that userspace will
likely have to change a bit to support this driver, such is life.
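The design point above can be illustrated from the userspace side: block
interruptibly and let the *caller* own the timeout, instead of baking an
arbitrary one into the kernel interface. A minimal sketch using poll();
the wait_reply() helper and its parameters are mine, not part of any
proposed interface:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical userspace analogue: the waiter blocks until the fd is
 * readable or the caller-chosen timeout expires. Returns 1 if data is
 * ready, 0 on timeout (timeout_ms < 0 means wait forever). */
int wait_reply(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int n = poll(&pfd, 1, timeout_ms);

	return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Here the timeout policy (and any retry or abort decision) lives entirely in
the caller, which is the point being argued for.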

-- 
MST



Re: [PATCH] eventfd: Enlarge recursion limit to allow vhost to work

2021-07-03 Thread Michael S. Tsirkin
On Fri, Jun 18, 2021 at 04:44:12PM +0800, He Zhe wrote:
> commit b5e683d5cab8 ("eventfd: track eventfd_signal() recursion depth")
> introduces a percpu counter that tracks the percpu recursion depth and
> warns if it is greater than zero, to avoid potential deadlock and stack
> overflow.
> 
> However sometimes different eventfds may be used in parallel. Specifically,
> when heavy network load goes through kvm and vhost, working as below, it
> would trigger the following call trace.
> 
> -  100.00%
>- 66.51%
> ret_from_fork
> kthread
>   - vhost_worker
>  - 33.47% handle_tx_kick
>   handle_tx
>   handle_tx_copy
>   vhost_tx_batch.isra.0
>   vhost_add_used_and_signal_n
>   eventfd_signal
>  - 33.05% handle_rx_net
>   handle_rx
>   vhost_add_used_and_signal_n
>   eventfd_signal
>- 33.49%
> ioctl
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> __x64_sys_ioctl
> ksys_ioctl
> do_vfs_ioctl
> kvm_vcpu_ioctl
> kvm_arch_vcpu_ioctl_run
> vmx_handle_exit
> handle_ept_misconfig
> kvm_io_bus_write
> __kvm_io_bus_write
> eventfd_signal
> 
> 001: WARNING: CPU: 1 PID: 1503 at fs/eventfd.c:73 eventfd_signal+0x85/0xa0
>  snip 
> 001: Call Trace:
> 001:  vhost_signal+0x15e/0x1b0 [vhost]
> 001:  vhost_add_used_and_signal_n+0x2b/0x40 [vhost]
> 001:  handle_rx+0xb9/0x900 [vhost_net]
> 001:  handle_rx_net+0x15/0x20 [vhost_net]
> 001:  vhost_worker+0xbe/0x120 [vhost]
> 001:  kthread+0x106/0x140
> 001:  ? log_used.part.0+0x20/0x20 [vhost]
> 001:  ? kthread_park+0x90/0x90
> 001:  ret_from_fork+0x35/0x40
> 001: ---[ end trace 0003 ]---
> 
> This patch enlarges the limit to 1, which is the maximum recursion depth we
> have found so far.
> 
> The credit of modification for eventfd_signal_count goes to
> Xie Yongji 
> 

And maybe:

Fixes: b5e683d5cab8 ("eventfd: track eventfd_signal() recursion depth")

who's merging this?

> Signed-off-by: He Zhe 
> ---
>  fs/eventfd.c| 3 ++-
>  include/linux/eventfd.h | 5 -
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index e265b6dd4f34..add6af91cacf 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -71,7 +71,8 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
>* it returns true, the eventfd_signal() call should be deferred to a
>* safe context.
>*/
> - if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)))
> + if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count) >
> + EFD_WAKE_COUNT_MAX))
>   return 0;
>  
> >   spin_lock_irqsave(&ctx->wqh.lock, flags);
> diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
> index fa0a524baed0..74be152ebe87 100644
> --- a/include/linux/eventfd.h
> +++ b/include/linux/eventfd.h
> @@ -29,6 +29,9 @@
>  #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
>  #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
>  
> +/* This is the maximum recursion depth we find so far */
> +#define EFD_WAKE_COUNT_MAX 1
> +
>  struct eventfd_ctx;
>  struct file;
>  
> @@ -47,7 +50,7 @@ DECLARE_PER_CPU(int, eventfd_wake_count);
>  
>  static inline bool eventfd_signal_count(void)
>  {
> - return this_cpu_read(eventfd_wake_count);
> + return this_cpu_read(eventfd_wake_count) > EFD_WAKE_COUNT_MAX;
>  }
>  
>  #else /* CONFIG_EVENTFD */
> -- 
> 2.17.1
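The threshold logic of the patch can be modelled in a few lines of plain C.
This is a single-threaded toy stand-in for the per-CPU counter (the helper
names are hypothetical, not from the patch): nesting is permitted while the
counter does not exceed EFD_WAKE_COUNT_MAX, mirroring the WARN_ON_ONCE
bail-out path that returns 0.

```c
#include <assert.h>

#define EFD_WAKE_COUNT_MAX 1

static int wake_count;

/* Returns 1 if entering this signal level is permitted, 0 if the
 * recursion depth limit would be exceeded (the patch's bail-out case). */
int efd_signal_enter(void)
{
	if (wake_count > EFD_WAKE_COUNT_MAX)
		return 0;
	wake_count++;
	return 1;
}

void efd_signal_exit(void)
{
	wake_count--;
}
```

With the limit at 1, one level of nesting (as in the kvm-into-vhost trace
above) is allowed, while deeper recursion is still refused.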



Re: [PATCH v7 00/12] Introduce VDUSE - vDPA Device in Userspace

2021-05-25 Thread Michael S. Tsirkin
On Tue, May 25, 2021 at 02:40:57PM +0800, Jason Wang wrote:
> 
> 在 2021/5/20 下午5:06, Yongji Xie 写道:
> > On Thu, May 20, 2021 at 2:06 PM Michael S. Tsirkin  wrote:
> > > On Mon, May 17, 2021 at 05:55:01PM +0800, Xie Yongji wrote:
> > > > This series introduces a framework that can be used to implement
> > > > vDPA devices in a userspace program. The work consists of two parts:
> > > > control path forwarding and data path offloading.
> > > > 
> > > > In the control path, the VDUSE driver makes use of a message
> > > > mechanism to forward config operations from the vdpa bus driver
> > > > to userspace. Userspace can use read()/write() to receive/reply to
> > > > those control messages.
> > > > 
> > > > In the data path, the core maps the dma buffer into the VDUSE
> > > > daemon's address space, which can be implemented in different ways
> > > > depending on the vdpa bus to which the vDPA device is attached.
> > > > 
> > > > In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver
> > > > with a bounce-buffering mechanism to achieve that. In the vhost-vdpa
> > > > case, the dma buffer resides in a userspace memory region which can be
> > > > shared with the VDUSE userspace process by transferring the shmfd.
> > > > 
> > > > The details and our use case are shown below:
> > > > 
> > > > [ASCII architecture diagram: a container (virtio-blk driver on the
> > > > virtio bus, on top of the virtio-vdpa driver) and a QEMU VM
> > > > (vhost-vdpa driver via /dev/vhost-vdpa-x) both attach through the
> > > > vdpa bus to a vdpa device backed by the vduse driver; the VDUSE
> > > > daemon in userspace emulates the vDPA block device and reaches
> > > > remote storages over TCP/IP.]

Re: [PATCH v7 00/12] Introduce VDUSE - vDPA Device in Userspace

2021-05-20 Thread Michael S. Tsirkin
On Mon, May 17, 2021 at 05:55:01PM +0800, Xie Yongji wrote:
> This series introduces a framework that can be used to implement
> vDPA devices in a userspace program. The work consists of two parts:
> control path forwarding and data path offloading.
> 
> In the control path, the VDUSE driver makes use of a message
> mechanism to forward config operations from the vdpa bus driver
> to userspace. Userspace can use read()/write() to receive/reply to
> those control messages.
> 
> In the data path, the core maps the dma buffer into the VDUSE
> daemon's address space, which can be implemented in different ways
> depending on the vdpa bus to which the vDPA device is attached.
> 
> In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver
> with a bounce-buffering mechanism to achieve that. In the vhost-vdpa case,
> the dma buffer resides in a userspace memory region which can be shared
> with the VDUSE userspace process by transferring the shmfd.
> 
> The details and our use case are shown below:
> 
> [ASCII architecture diagram: a container (virtio-blk driver on the
> virtio bus, on top of the virtio-vdpa driver) and a QEMU VM (vhost-vdpa
> driver via /dev/vhost-vdpa-x) both attach through the vdpa bus to a
> vdpa device backed by the vduse driver; the VDUSE daemon in userspace
> emulates the vDPA block device and reaches remote storages over TCP/IP
> via a NIC.]
> 
> We make use of it to implement a block device connecting to
> our distributed storage, which can be used both in containers and
> VMs. Thus, we can have a unified technology stack in these two cases.
> 
> To test it with null-blk:
> 
>   $ qemu-storage-daemon \
>   --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
>   --monitor chardev=charmonitor \
>   --blockdev 
> 

Re: Re: [PATCH v7 04/12] virtio-blk: Add validation for block size in config space

2021-05-19 Thread Michael S. Tsirkin
On Thu, May 20, 2021 at 01:25:16PM +0800, Yongji Xie wrote:
> On Wed, May 19, 2021 at 10:42 PM Dan Carpenter  
> wrote:
> >
> > On Wed, May 19, 2021 at 09:39:20PM +0800, Yongji Xie wrote:
> > > On Mon, May 17, 2021 at 5:56 PM Xie Yongji  
> > > wrote:
> > > >
> > > > This ensures that we will not use an invalid block size
> > > > in config space (might come from an untrusted device).
> >
> > I looked at whether I should add this as an untrusted function so that
> > Smatch could find these sorts of bugs, but this is reading data from the
> > host, so there has to be some level of trust...
> >
> 
> It would be great if Smatch could detect this case if possible. The
> data might be trusted in traditional VM cases. But now the data can be
> read from a userspace daemon when VDUSE is enabled.
> 
> > I should add some more untrusted data kvm functions to Smatch.  Right
> > now I only have kvm_register_read() and I've added kvm_read_guest_virt()
> > just now.
> >
> > > >
> > > > Signed-off-by: Xie Yongji 
> > > > ---
> > > >  drivers/block/virtio_blk.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > index ebb4d3fe803f..c848aa36d49b 100644
> > > > --- a/drivers/block/virtio_blk.c
> > > > +++ b/drivers/block/virtio_blk.c
> > > > @@ -826,7 +826,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > > > err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
> > > >struct virtio_blk_config, blk_size,
> > > >&blk_size);
> > > > -   if (!err)
> > > > +   if (!err && blk_size > 0 && blk_size <= max_size)
> > >
> > > The check here is incorrect. I will use PAGE_SIZE as the maximum
> > > boundary in the new version.
> >
> > What does this bug look like to the user?
> 
> The kernel will panic if the block size is larger than PAGE_SIZE.

Kernel panic at this point is par for the course IMHO.
Let's focus on eliminating data corruption for starters.

> > A minimum block size of 1 seems pretty crazy.  Surely the minimum should be
> > higher?
> >
> 
> Yes, 512 is better here.
> 
> Thanks,
> Yongji
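The validation being converged on above can be sketched in plain C: accept a
block size read from (possibly untrusted) config space only if it is sane,
and fall back to 512 otherwise. The function name, the power-of-two
requirement, and the fallback policy are illustrative assumptions here, not
the driver's final code; PAGE_SIZE is stood in by a 4 KiB constant.

```c
#include <assert.h>

#define SECTOR_SIZE  512
#define MY_PAGE_SIZE 4096	/* stand-in for the guest's PAGE_SIZE */

/* Return blk_size if it is a power of two in [512, PAGE_SIZE], else a
 * safe default. This guards against an untrusted device (e.g. a VDUSE
 * daemon) handing the guest a value that would crash the block layer. */
unsigned int sanitize_blk_size(unsigned int blk_size)
{
	int pow2 = blk_size && !(blk_size & (blk_size - 1));

	if (pow2 && blk_size >= SECTOR_SIZE && blk_size <= MY_PAGE_SIZE)
		return blk_size;
	return SECTOR_SIZE;
}
```

The key point is that the check happens before the value is ever used to
size anything, so a malicious config space degrades to a safe default
rather than a panic.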



Re: [PATCH v2 0/6] Add support for ACPI VIOT

2021-05-14 Thread Michael S. Tsirkin
On Fri, Apr 23, 2021 at 01:38:31PM +0200, Jean-Philippe Brucker wrote:
> Add a driver for the ACPI VIOT table, which provides topology
> information for para-virtual platforms. Enable virtio-iommu on
> non-devicetree platforms, including x86.

Acked-by: Michael S. Tsirkin 

Mostly ACPI stuff so I assume it's best to merge through that tree.

> Since v1 [1]:
> * The VIOT header definitions have been picked for v5.13
> * Share more code with IORT. Patches 1 and 2 extract the common code
>   from IORT.
> * Simplify the VIOT driver. Use the existing fwnode infrastructure
>   instead of adding hooks to the IOMMU driver.
> 
> You can find a QEMU implementation at [2], with extra support for
> testing all VIOT nodes including MMIO-based endpoints and IOMMU. This
> series, based on linux-next, is at [3]
> 
> [1] 
> https://lore.kernel.org/linux-iommu/20210316191652.3401335-1-jean-phili...@linaro.org/
> [2] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/acpi
> [3] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/acpi
> 
> Jean-Philippe Brucker (6):
>   ACPI: arm64: Move DMA setup operations out of IORT
>   ACPI: Move IOMMU setup code out of IORT
>   ACPI: Add driver for the VIOT table
>   iommu/dma: Pass address limit rather than size to
> iommu_setup_dma_ops()
>   iommu/dma: Simplify calls to iommu_setup_dma_ops()
>   iommu/virtio: Enable x86 support
> 
>  drivers/acpi/Kconfig |   3 +
>  drivers/iommu/Kconfig|   4 +-
>  drivers/acpi/Makefile|   2 +
>  drivers/acpi/arm64/Makefile  |   1 +
>  include/acpi/acpi_bus.h  |   3 +
>  include/linux/acpi.h |   3 +
>  include/linux/acpi_iort.h|  14 +-
>  include/linux/acpi_viot.h|  19 ++
>  include/linux/dma-iommu.h|   4 +-
>  arch/arm64/mm/dma-mapping.c  |   2 +-
>  drivers/acpi/arm64/dma.c |  50 +
>  drivers/acpi/arm64/iort.c| 129 ++---
>  drivers/acpi/bus.c   |   2 +
>  drivers/acpi/scan.c  |  60 +-
>  drivers/acpi/viot.c  | 350 +++
>  drivers/iommu/amd/iommu.c|   9 +-
>  drivers/iommu/dma-iommu.c|  17 +-
>  drivers/iommu/intel/iommu.c  |  10 +-
>  drivers/iommu/virtio-iommu.c |   8 +
>  MAINTAINERS  |   8 +
>  20 files changed, 548 insertions(+), 150 deletions(-)
>  create mode 100644 include/linux/acpi_viot.h
>  create mode 100644 drivers/acpi/arm64/dma.c
>  create mode 100644 drivers/acpi/viot.c
> 
> -- 
> 2.31.1



Re: [PATCH v2 6/6] iommu/virtio: Enable x86 support

2021-05-14 Thread Michael S. Tsirkin
On Fri, Apr 23, 2021 at 01:38:37PM +0200, Jean-Philippe Brucker wrote:
> With the VIOT support in place, x86 platforms can now use the
> virtio-iommu.
> 
> Because the other x86 IOMMU drivers aren't yet ready to use the
> acpi_dma_setup() path, x86 doesn't implement arch_setup_dma_ops() at the
> moment. Similarly to VT-d and AMD IOMMU, call iommu_setup_dma_ops() from
> probe_finalize().
> 
> Acked-by: Joerg Roedel 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/iommu/Kconfig| 3 ++-
>  drivers/iommu/dma-iommu.c| 1 +
>  drivers/iommu/virtio-iommu.c | 8 
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index aff8a4830dd1..07b7c25cbed8 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -400,8 +400,9 @@ config HYPERV_IOMMU
>  config VIRTIO_IOMMU
>   tristate "Virtio IOMMU driver"
>   depends on VIRTIO
> - depends on ARM64
> + depends on (ARM64 || X86)
>   select IOMMU_API
> + select IOMMU_DMA
>   select INTERVAL_TREE
>   select ACPI_VIOT if ACPI
>   help
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 175f8eaeb5b3..46ed43c400cf 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1332,6 +1332,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 
> dma_base, u64 dma_limit)
>pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA 
> ops\n",
>dev_name(dev));
>  }
> +EXPORT_SYMBOL_GPL(iommu_setup_dma_ops);
>  
>  static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>   phys_addr_t msi_addr, struct iommu_domain *domain)
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 29a397c2d12f..8be546a338e7 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -1027,6 +1027,13 @@ static struct iommu_device *viommu_probe_device(struct 
> device *dev)
>   return ERR_PTR(ret);
>  }
>  
> +static void viommu_probe_finalize(struct device *dev)
> +{
> +#ifndef CONFIG_ARCH_HAS_SETUP_DMA_OPS
> + iommu_setup_dma_ops(dev, 0, U64_MAX);
> +#endif
> +}
> +
>  static void viommu_release_device(struct device *dev)
>  {
>   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> @@ -1063,6 +1070,7 @@ static struct iommu_ops viommu_ops = {
>   .iova_to_phys   = viommu_iova_to_phys,
>   .iotlb_sync = viommu_iotlb_sync,
>   .probe_device   = viommu_probe_device,
> + .probe_finalize = viommu_probe_finalize,
>   .release_device = viommu_release_device,
>   .device_group   = viommu_device_group,
>   .get_resv_regions   = viommu_get_resv_regions,
> -- 
> 2.31.1



Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2021-03-18 Thread Michael S. Tsirkin
On Tue, Mar 16, 2021 at 08:16:54PM +0100, Jean-Philippe Brucker wrote:
> With the VIOT support in place, x86 platforms can now use the
> virtio-iommu.
> 
> The arm64 Kconfig selects IOMMU_DMA, while x86 IOMMU drivers select it
> themselves.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/iommu/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 2819b5c8ec30..ccca83ef2f06 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -400,8 +400,9 @@ config HYPERV_IOMMU
>  config VIRTIO_IOMMU
>   tristate "Virtio IOMMU driver"
>   depends on VIRTIO
> - depends on ARM64
> + depends on (ARM64 || X86)
>   select IOMMU_API
> + select IOMMU_DMA if X86

Would it hurt to just select unconditionally? Seems a bit cleaner
...

>   select INTERVAL_TREE
>   select ACPI_VIOT if ACPI
>   help
> -- 
> 2.30.2



Re: swiotlb/virtio: unchecked device dma address and length

2020-12-16 Thread Michael S. Tsirkin
On Tue, Dec 15, 2020 at 11:20:48AM +0800, Jason Wang wrote:
> 
> On 2020/12/15 上午5:49, Konrad Rzeszutek Wilk wrote:
> > On Fri, Dec 11, 2020 at 06:31:21PM +0100, Felicitas Hetzelt wrote:
> > > Hello,
> > Hi! Please see below my responses.
> > 
> > > we have been analyzing the Hypervisor-OS interface of Linux
> > > and discovered bugs in the swiotlb/virtio implementation that can be
> > > triggered from a malicious Hypervisor / virtual device.
> > > With SEV, the SWIOTLB implementation is forcefully enabled and would
> > > always be used. Thus, all virtio devices and others would use it under
> > > the hood.
> > > 
> > > The reason for analyzing this interface is that, technologies such as
> > > Intel's Trusted Domain Extensions [1] and AMD's Secure Nested Paging [2]
> > > change the threat model assumed by various Linux kernel subsystems.
> > > These technologies take the presence of a fully malicious hypervisor
> > > into account and aim to provide protection for virtual machines in such
> > > an environment. Therefore, all input received from the hypervisor or an
> > > external device should be carefully validated. Note that these issues
> > > are of little (or no) relevance in a "normal" virtualization setup,
> > > nevertheless we believe that it is required to fix them if TDX or SNP is
> > > used.
> > > 
> > > We are happy to provide more information if needed!
> > > 
> > > [1]
> > > https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
> > > 
> > > [2]https://www.amd.com/en/processors/amd-secure-encrypted-virtualization
> > > 
> > > Bug:
> > > OOB memory write.
> > > dma_unmap_single -> swiotlb_tbl_unmap_single is invoked with dma_addr
> > > and length parameters that are under control of the device.
> > > This happens e.g. in virtio_ring:
> > > https://elixir.bootlin.com/linux/v5.10-rc7/source/drivers/virtio/virtio_ring.c#L378
> > Heya!
> > 
> > Thank you for pointing this out! I've a couple of questions and hope you can
> > help me out with them.
> > 
> > Also CC-ing AMD / TDX folks.
> > > This raises two issues:
> > > 1) swiotlb_tbl_unmap_single fails to check whether the index generated
> > > from the dma_addr is in range of the io_tlb_orig_addr array.
> > That is fairly simple to implement I would think. That is it can check
> > that the dma_addr is from the PA in the io_tlb pool when SWIOTLB=force
> > is used.
> 
> 
> I'm not sure this can fix all the cases. It looks to me like we should map
> the descriptors coherent but read-only (which is not supported by the
> current DMA API).

Neither is this supported by encrypted memory technologies.

> Otherwise, the device can modify desc[i].addr/desc[i].len at any time to
> fake a valid mapping.
> 
> Thanks
> 
> 
> > 
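The missing range check described in the report can be sketched as a toy
model (the names and pool layout here are invented; the real swiotlb
bookkeeping is more involved): before trusting a device-controlled dma_addr,
verify that it lies inside the bounce pool, so the derived slot index is
guaranteed to be in range of the bookkeeping array.

```c
#include <assert.h>
#include <stdint.h>

#define IO_TLB_SHIFT 11		/* 2 KiB slots, as in swiotlb */
#define POOL_NSLOTS  64

/* Return the slot index for dma_addr within the bounce pool, or -1 if
 * the address falls outside the pool and must be rejected instead of
 * being used to index io_tlb_orig_addr-style arrays. */
int tlb_slot_index(uint64_t pool_start, uint64_t dma_addr)
{
	uint64_t pool_end = pool_start + ((uint64_t)POOL_NSLOTS << IO_TLB_SHIFT);

	if (dma_addr < pool_start || dma_addr >= pool_end)
		return -1;	/* device-supplied address outside io_tlb pool */
	return (int)((dma_addr - pool_start) >> IO_TLB_SHIFT);
}
```

As the thread notes, such a check stops out-of-bounds writes through the
unmap path, but it cannot stop a device from rewriting in-pool descriptors;
that needs the read-only mapping discussed above.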


Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-10-28 Thread Michael S. Tsirkin
On Wed, Oct 28, 2020 at 03:24:03PM +0100, Alexander Graf wrote:
> On 24.02.20 18:16, Christoph Hellwig wrote:
> > On Sat, Feb 22, 2020 at 02:07:58PM -0500, Michael S. Tsirkin wrote:
> > > On Fri, Feb 21, 2020 at 03:33:40PM +0100, Halil Pasic wrote:
> > > > AFAIU you have a positive attitude towards the idea, that
> > > > !F_VIRTIO_PLATFORM implies 'no DMA API is used by virtio'
> > > > should be scrapped.
> > > > 
> > > > I would like to accomplish that without adverse effects to virtio-ccw
> > > > (because caring for virtio-ccw is a part of job description).
> > > > 
> > > > Regards,
> > > > Halil
> > > 
> > > It is possible, in theory. IIRC the main challenge is that DMA API
> > > has overhead of indirect function calls even when all it
> > > does it return back the PA without changes.
> > 
> > That overhead is gone now, the DMA direct calls are direct calls these
> > days.
> 
> Michael, would that mitigate your concerns to just always use the DMA API?
> If not, wouldn't it make sense to benchmark and pinpoint Christoph to paths
> that do slow things down, so we can finally move virtio into a world where
> it doesn't bypass the kernel DMA infrastructure.
> 
> 
> Alex


There's no specific concern really. One can in theory move the code
handling !VIRTIO_F_ACCESS_PLATFORM such that instead of having a branch
in virtio code, you instead override platform DMA API and then use DMA
API unconditionally.

The gain from that will probably be marginal, but maybe I'm wrong.
Let's see the patches.

We still need to do the override when !VIRTIO_F_ACCESS_PLATFORM,
according to spec. Encrypted hypervisors still need to set
VIRTIO_F_ACCESS_PLATFORM.


Is VIRTIO_F_ACCESS_PLATFORM a good default? Need to support
legacy guests does not allow that at the qemu level, we
need to be conservative there. But yes, if you have a very
modern guest then it might be a good idea to just always
enable VIRTIO_F_ACCESS_PLATFORM. I say might since unfortunately
most people only use VIRTIO_F_ACCESS_PLATFORM with a
vIOMMU; using it without a vIOMMU is supported
but is a path less well trodden. Benchmarking that,
and fixing any issues that surface, would be imho effort
well spent, and would be a prerequisite to making it
a default in a production system.


> 
> 
> 
> 
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
> 
> 



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-25 Thread Michael S. Tsirkin
On Fri, Sep 25, 2020 at 01:26:29PM +0200, Jean-Philippe Brucker wrote:
> On Fri, Sep 25, 2020 at 06:22:57AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Sep 25, 2020 at 10:48:06AM +0200, Jean-Philippe Brucker wrote:
> > > On Fri, Aug 21, 2020 at 03:15:34PM +0200, Jean-Philippe Brucker wrote:
> > > > Add a topology description to the virtio-iommu driver and enable x86
> > > > platforms.
> > > > 
> > > > Since [v2] we have made some progress on adding ACPI support for
> > > > virtio-iommu, which is the preferred boot method on x86. It will be a
> > > > new vendor-agnostic table describing para-virtual topologies in a
> > > > minimal format. However some platforms don't use either ACPI or DT for
> > > > booting (for example microvm), and will need the alternative topology
> > > > description method proposed here. In addition, since the process to get
> > > > a new ACPI table will take a long time, this provides a boot method even
> > > > to ACPI-based platforms, if only temporarily for testing and
> > > > development.
> > > > 
> > > > v3:
> > > > * Add patch 1 that moves virtio-iommu to a subfolder.
> > > > * Split the rest:
> > > >   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> > > > support.
> > > >   * Patch 4 adds definitions.
> > > >   * Patch 5 adds parser in topology.c.
> > > > * Address other comments.
> > > > 
> > > > Linux and QEMU patches available at:
> > > > https://jpbrucker.net/git/linux virtio-iommu/devel
> > > > https://jpbrucker.net/git/qemu virtio-iommu/devel
> > > 
> > > I'm parking this work again, until we make progress on the ACPI table, or
> > > until a platform without ACPI and DT needs it. Until then, I've pushed v4
> > > to my virtio-iommu/topo branch and will keep it rebased on master.
> > > 
> > > Thanks,
> > > Jean
> > 
> > I think you guys need to work on virtio spec too, not too much left to
> > do there ...
> 
> I know it's ready and I'd really like to move on with this, but I'd rather
> not commit it to the spec until we know it's going to be used at all. As
> Gerd pointed out the one example we had, microvm, now supports ACPI. Since
> we've kicked off the ACPI work anyway it isn't clear that the built-in
> topology will be useful.
> 
> Thanks,
> Jean

Many Power platforms are OF-based, thus without ACPI or DT support.

-- 
MST



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-25 Thread Michael S. Tsirkin
On Fri, Sep 25, 2020 at 10:48:06AM +0200, Jean-Philippe Brucker wrote:
> On Fri, Aug 21, 2020 at 03:15:34PM +0200, Jean-Philippe Brucker wrote:
> > Add a topology description to the virtio-iommu driver and enable x86
> > platforms.
> > 
> > Since [v2] we have made some progress on adding ACPI support for
> > virtio-iommu, which is the preferred boot method on x86. It will be a
> > new vendor-agnostic table describing para-virtual topologies in a
> > minimal format. However some platforms don't use either ACPI or DT for
> > booting (for example microvm), and will need the alternative topology
> > description method proposed here. In addition, since the process to get
> > a new ACPI table will take a long time, this provides a boot method even
> > to ACPI-based platforms, if only temporarily for testing and
> > development.
> > 
> > v3:
> > * Add patch 1 that moves virtio-iommu to a subfolder.
> > * Split the rest:
> >   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> > support.
> >   * Patch 4 adds definitions.
> >   * Patch 5 adds parser in topology.c.
> > * Address other comments.
> > 
> > Linux and QEMU patches available at:
> > https://jpbrucker.net/git/linux virtio-iommu/devel
> > https://jpbrucker.net/git/qemu virtio-iommu/devel
> 
> I'm parking this work again, until we make progress on the ACPI table, or
> until a platform without ACPI and DT needs it. Until then, I've pushed v4
> to my virtio-iommu/topo branch and will keep it rebased on master.
> 
> Thanks,
> Jean

I think you guys need to work on virtio spec too, not too much left to
do there ...

-- 
MST



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-25 Thread Michael S. Tsirkin
On Thu, Sep 24, 2020 at 02:50:46PM +0200, Joerg Roedel wrote:
> On Thu, Sep 24, 2020 at 08:41:21AM -0400, Michael S. Tsirkin wrote:
> > But this has nothing to do with Linux.  There is also no guarantee that
> > the two committees will decide to use exactly the same format. Once one
> > of them sets the format in stone, we can add support for that format to
> > linux. If another one is playing nice and uses the same format, we can
> > use the same parsers. If it doesn't linux will have to follow suit.
> 
> Or Linux decides to support only one of the formats, which would then be
> ACPI.
> 
> Regards,
> 
>   Joerg

It's really up to hypervisors, not guests. Linux as a guest can for sure
refuse to work somewhere, but that's normally not very attractive.

-- 
MST



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Michael S. Tsirkin
On Thu, Sep 24, 2020 at 12:02:55PM +0200, Joerg Roedel wrote:
> On Thu, Sep 24, 2020 at 05:38:13AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
> > > On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
> > > > OK so this looks good. Can you pls repost with the minor tweak
> > > > suggested and all acks included, and I will queue this?
> > > 
> > > My NACK still stands, as long as a few questions are open:
> > > 
> > >   1) The format used here will be the same as in the ACPI table? I
> > >  think the answer to this questions must be Yes, so this leads
> > >  to the real question:
> > 
> > I am not sure it's a must.
> 
> It is, having only one parser for the ACPI and MMIO descriptions was one
> of the selling points for MMIO in past discussions and I think it makes
> sense to keep them in sync.
> 
> > We can always tweak the parser if there are slight differences
> > between ACPI and virtio formats.
> 
> There is no guarantee that only "tweaks" will be needed until the
> ACPI table format is stabilized.
> 
> Regards,
> 
>   Joerg

But this has nothing to do with Linux.  There is also no guarantee that
the two committees will decide to use exactly the same format. Once one
of them sets the format in stone, we can add support for that format to
linux. If another one is playing nice and uses the same format, we can
use the same parsers. If it doesn't linux will have to follow suit.

-- 
MST



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Michael S. Tsirkin
On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
> On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
> > OK so this looks good. Can you pls repost with the minor tweak
> > suggested and all acks included, and I will queue this?
> 
> My NACK still stands, as long as a few questions are open:
> 
>   1) The format used here will be the same as in the ACPI table? I
>  think the answer to this questions must be Yes, so this leads
>  to the real question:

I am not sure it's a must.
We can always tweak the parser if there are slight differences
between ACPI and virtio formats.

But we do want the virtio format used here to be approved by the virtio
TC, so it won't change.

Eric, Jean-Philippe, does one of you intend to create a github issue
and request a ballot for the TC? It's been posted end of August with no
changes ...

>   2) Has the ACPI table format stabilized already? If and only if
>  the answer is Yes I will Ack these patches. We don't need to
>  wait until the ACPI table format is published in a
>  specification update, but at least some certainty that it
>  will not change in incompatible ways anymore is needed.
> 

Not that I know, but I don't see why it's a must.

> So what progress has been made with the ACPI table specification, is it
> just a matter of time to get it approved or are there concerns?
> 
> Regards,
> 
>   Joerg



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Michael S. Tsirkin
On Fri, Sep 04, 2020 at 06:24:12PM +0200, Auger Eric wrote:
> Hi,
> 
> On 8/21/20 3:15 PM, Jean-Philippe Brucker wrote:
> > Add a topology description to the virtio-iommu driver and enable x86
> > platforms.
> > 
> > Since [v2] we have made some progress on adding ACPI support for
> > virtio-iommu, which is the preferred boot method on x86. It will be a
> > new vendor-agnostic table describing para-virtual topologies in a
> > minimal format. However some platforms don't use either ACPI or DT for
> > booting (for example microvm), and will need the alternative topology
> > description method proposed here. In addition, since the process to get
> > a new ACPI table will take a long time, this provides a boot method even
> > to ACPI-based platforms, if only temporarily for testing and
> > development.
> > 
> > v3:
> > * Add patch 1 that moves virtio-iommu to a subfolder.
> > * Split the rest:
> >   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> > support.
> >   * Patch 4 adds definitions.
> >   * Patch 5 adds parser in topology.c.
> > * Address other comments.
> > 
> > Linux and QEMU patches available at:
> > https://jpbrucker.net/git/linux virtio-iommu/devel
> > https://jpbrucker.net/git/qemu virtio-iommu/devel
> I have tested this series with the above QEMU branch on ARM with virtio-net
> and virtio-blk translated devices in non-DT mode.
> 
> It works for me:
> Tested-by: Eric Auger 
> 
> Thanks
> 
> Eric

OK so this looks good. Can you pls repost with the minor tweak
suggested and all acks included, and I will queue this?

Thanks!

> > 
> > [spec] https://lists.oasis-open.org/archives/virtio-dev/202008/msg00067.html
> > [v2] 
> > https://lore.kernel.org/linux-iommu/20200228172537.377327-1-jean-phili...@linaro.org/
> > [v1] 
> > https://lore.kernel.org/linux-iommu/20200214160413.1475396-1-jean-phili...@linaro.org/
> > [rfc] 
> > https://lore.kernel.org/linux-iommu/20191122105000.800410-1-jean-phili...@linaro.org/
> > 
> > Jean-Philippe Brucker (6):
> >   iommu/virtio: Move to drivers/iommu/virtio/
> >   iommu/virtio: Add topology helpers
> >   PCI: Add DMA configuration for virtual platforms
> >   iommu/virtio: Add topology definitions
> >   iommu/virtio: Support topology description in config space
> >   iommu/virtio: Enable x86 support
> > 
> >  drivers/iommu/Kconfig |  18 +-
> >  drivers/iommu/Makefile|   3 +-
> >  drivers/iommu/virtio/Makefile |   4 +
> >  drivers/iommu/virtio/topology-helpers.h   |  50 +
> >  include/linux/virt_iommu.h|  15 ++
> >  include/uapi/linux/virtio_iommu.h |  44 
> >  drivers/iommu/virtio/topology-helpers.c   | 196 
> >  drivers/iommu/virtio/topology.c   | 259 ++
> >  drivers/iommu/{ => virtio}/virtio-iommu.c |   4 +
> >  drivers/pci/pci-driver.c  |   5 +
> >  MAINTAINERS   |   3 +-
> >  11 files changed, 597 insertions(+), 4 deletions(-)
> >  create mode 100644 drivers/iommu/virtio/Makefile
> >  create mode 100644 drivers/iommu/virtio/topology-helpers.h
> >  create mode 100644 include/linux/virt_iommu.h
> >  create mode 100644 drivers/iommu/virtio/topology-helpers.c
> >  create mode 100644 drivers/iommu/virtio/topology.c
> >  rename drivers/iommu/{ => virtio}/virtio-iommu.c (99%)
> > 



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-08-26 Thread Michael S. Tsirkin
On Fri, Aug 21, 2020 at 03:15:34PM +0200, Jean-Philippe Brucker wrote:
> Add a topology description to the virtio-iommu driver and enable x86
> platforms.
> 
> Since [v2] we have made some progress on adding ACPI support for
> virtio-iommu, which is the preferred boot method on x86. It will be a
> new vendor-agnostic table describing para-virtual topologies in a
> minimal format. However some platforms don't use either ACPI or DT for
> booting (for example microvm), and will need the alternative topology
> description method proposed here. In addition, since the process to get
> a new ACPI table will take a long time, this provides a boot method even
> to ACPI-based platforms, if only temporarily for testing and
> development.

OK, should I park this in linux-next now? Seems appropriate ...

> v3:
> * Add patch 1 that moves virtio-iommu to a subfolder.
> * Split the rest:
>   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> support.
>   * Patch 4 adds definitions.
>   * Patch 5 adds parser in topology.c.
> * Address other comments.
> 
> Linux and QEMU patches available at:
> https://jpbrucker.net/git/linux virtio-iommu/devel
> https://jpbrucker.net/git/qemu virtio-iommu/devel
> 
> [spec] https://lists.oasis-open.org/archives/virtio-dev/202008/msg00067.html
> [v2] 
> https://lore.kernel.org/linux-iommu/20200228172537.377327-1-jean-phili...@linaro.org/
> [v1] 
> https://lore.kernel.org/linux-iommu/20200214160413.1475396-1-jean-phili...@linaro.org/
> [rfc] 
> https://lore.kernel.org/linux-iommu/20191122105000.800410-1-jean-phili...@linaro.org/
> 
> Jean-Philippe Brucker (6):
>   iommu/virtio: Move to drivers/iommu/virtio/
>   iommu/virtio: Add topology helpers
>   PCI: Add DMA configuration for virtual platforms
>   iommu/virtio: Add topology definitions
>   iommu/virtio: Support topology description in config space
>   iommu/virtio: Enable x86 support
> 
>  drivers/iommu/Kconfig |  18 +-
>  drivers/iommu/Makefile|   3 +-
>  drivers/iommu/virtio/Makefile |   4 +
>  drivers/iommu/virtio/topology-helpers.h   |  50 +
>  include/linux/virt_iommu.h|  15 ++
>  include/uapi/linux/virtio_iommu.h |  44 
>  drivers/iommu/virtio/topology-helpers.c   | 196 
>  drivers/iommu/virtio/topology.c   | 259 ++
>  drivers/iommu/{ => virtio}/virtio-iommu.c |   4 +
>  drivers/pci/pci-driver.c  |   5 +
>  MAINTAINERS   |   3 +-
>  11 files changed, 597 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/iommu/virtio/Makefile
>  create mode 100644 drivers/iommu/virtio/topology-helpers.h
>  create mode 100644 include/linux/virt_iommu.h
>  create mode 100644 drivers/iommu/virtio/topology-helpers.c
>  create mode 100644 drivers/iommu/virtio/topology.c
>  rename drivers/iommu/{ => virtio}/virtio-iommu.c (99%)
> 
> -- 
> 2.28.0
> 



[PATCH v3 36/38] virtio-iommu: convert to LE accessors

2020-08-05 Thread Michael S. Tsirkin
Virtio iommu is modern-only. Use LE accessors for config space.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/iommu/virtio-iommu.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index f6f07489a9aa..b4da396cce60 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1010,8 +1010,8 @@ static int viommu_probe(struct virtio_device *vdev)
if (ret)
return ret;
 
-	virtio_cread(vdev, struct virtio_iommu_config, page_size_mask,
-		     &viommu->pgsize_bitmap);
+	virtio_cread_le(vdev, struct virtio_iommu_config, page_size_mask,
+			&viommu->pgsize_bitmap);
 
 	if (!viommu->pgsize_bitmap) {
 		ret = -EINVAL;
@@ -1022,25 +1022,25 @@ static int viommu_probe(struct virtio_device *vdev)
 	viommu->last_domain = ~0U;
 
 	/* Optional features */
-	virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
-			     struct virtio_iommu_config, input_range.start,
-			     &input_start);
+	virtio_cread_le_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
+				struct virtio_iommu_config, input_range.start,
+				&input_start);
 
-	virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
-			     struct virtio_iommu_config, input_range.end,
-			     &input_end);
+	virtio_cread_le_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
+				struct virtio_iommu_config, input_range.end,
+				&input_end);
 
-	virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
-			     struct virtio_iommu_config, domain_range.start,
-			     &viommu->first_domain);
+	virtio_cread_le_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
+				struct virtio_iommu_config, domain_range.start,
+				&viommu->first_domain);
 
-	virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
-			     struct virtio_iommu_config, domain_range.end,
-			     &viommu->last_domain);
+	virtio_cread_le_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
+				struct virtio_iommu_config, domain_range.end,
+				&viommu->last_domain);
 
-	virtio_cread_feature(vdev, VIRTIO_IOMMU_F_PROBE,
-			     struct virtio_iommu_config, probe_size,
-			     &viommu->probe_size);
+	virtio_cread_le_feature(vdev, VIRTIO_IOMMU_F_PROBE,
+				struct virtio_iommu_config, probe_size,
+				&viommu->probe_size);
 
viommu->geometry = (struct iommu_domain_geometry) {
.aperture_start = input_start,
-- 
MST



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-07-11 Thread Michael S. Tsirkin
On Fri, Jul 10, 2020 at 10:23:22AM -0700, Stefano Stabellini wrote:
> Sorry for the late reply -- a couple of conferences kept me busy.
> 
> 
> On Wed, 1 Jul 2020, Michael S. Tsirkin wrote:
> > On Wed, Jul 01, 2020 at 10:34:53AM -0700, Stefano Stabellini wrote:
> > > Would you be in favor of a more flexible check along the lines of the
> > > one proposed in the patch that started this thread:
> > > 
> > > if (xen_vring_use_dma())
> > > return true;
> > > 
> > > 
> > > xen_vring_use_dma would be implemented so that it returns true when
> > > xen_swiotlb is required and false otherwise.
> > 
> > Just to stress - with a patch like this virtio can *still* use DMA API
> > if PLATFORM_ACCESS is set. So if DMA API is broken on some platforms
> > as you seem to be saying, you guys should fix it before doing something
> > like this..
> 
> Yes, DMA API is broken with some interfaces (specifically: rpmesg and
> trusty), but for them PLATFORM_ACCESS is never set. That is why the
> errors weren't reported before. Xen special case aside, there is no
> problem under normal circumstances.

So why not fix DMA API? Then this patch is not needed.


> 
> If you are OK with this patch (after a little bit of clean-up), Peng,
> are you OK with sending an update or do you want me to?



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-07-01 Thread Michael S. Tsirkin
On Wed, Jul 01, 2020 at 10:34:53AM -0700, Stefano Stabellini wrote:
> Would you be in favor of a more flexible check along the lines of the
> one proposed in the patch that started this thread:
> 
> if (xen_vring_use_dma())
> return true;
> 
> 
> xen_vring_use_dma would be implemented so that it returns true when
> xen_swiotlb is required and false otherwise.

Just to stress - with a patch like this virtio can *still* use DMA API
if PLATFORM_ACCESS is set. So if DMA API is broken on some platforms
as you seem to be saying, you guys should fix it before doing something
like this..

-- 
MST



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-07-01 Thread Michael S. Tsirkin
On Wed, Jul 01, 2020 at 10:34:53AM -0700, Stefano Stabellini wrote:
> On Wed, 1 Jul 2020, Christoph Hellwig wrote:
> > On Mon, Jun 29, 2020 at 04:46:09PM -0700, Stefano Stabellini wrote:
> > > > I could imagine some future Xen hosts setting a flag somewhere in the
> > > > platform capability saying "no xen specific flag, rely on
> > > > "VIRTIO_F_ACCESS_PLATFORM". Then you set that accordingly in QEMU.
> > > > How about that?
> > > 
> > > Yes, that would be fine and there is no problem implementing something
> > > like that when we get virtio support in Xen. Today there are still no
> > > virtio interfaces provided by Xen to ARM guests (no virtio-block/net,
> > > etc.)
> > > 
> > > In fact, in both cases we are discussing virtio is *not* provided by
> > > Xen; it is a firmware interface to something entirely different:
> > > 
> > > 1) virtio is used to talk to a remote AMP processor (RPMesg)
> > > 2) virtio is used to talk to a secure-world firmware/OS (Trusty)
> > >
> > > VIRTIO_F_ACCESS_PLATFORM is not set by Xen in these cases but by RPMesg
> > > and by Trusty respectively. I don't know if Trusty should or should not
> > > set VIRTIO_F_ACCESS_PLATFORM, but I think Linux should still work
> > > without issues.
> > > 
> > 
> > Any virtio implementation that is not in control of the memory map
> > (aka not the hypervisor) absolutely must set VIRTIO_F_ACCESS_PLATFORM,
> > else it is completely broken.
> 
> Lots of broken virtio implementations out there it would seem :-(

Not really: most virtio implementations are in full control of
memory, being part of the hypervisor.

> 
> > > The xen_domain() check in Linux makes it so that vring_use_dma_api
> > > returns the opposite value on native Linux compared to Linux as Xen/ARM
> > > DomU by "accident". By "accident" because there is no architectural
> > > reason why Linux Xen/ARM DomU should behave differently compared to
> > > native Linux in this regard.
> > > 
> > > I hope that now it is clearer why I think the if (xen_domain()) check
> > > needs to be improved anyway, even if we fix generic dma_ops with virtio
> > > interfaces missing VIRTIO_F_ACCESS_PLATFORM.
> > 
> > IMHO that Xen quirk should never have been added in this form..
> 
> Would you be in favor of a more flexible check along the lines of the
> one proposed in the patch that started this thread:
> 
> if (xen_vring_use_dma())
> return true;
> 
> 
> xen_vring_use_dma would be implemented so that it returns true when
> xen_swiotlb is required and false otherwise.

I'll need to think about it. Sounds reasonable on the surface ...

-- 
MST



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-29 Thread Michael S. Tsirkin
On Mon, Jun 29, 2020 at 06:25:41AM +, Peng Fan wrote:
> > > > > Anyway, re-reading the last messages of the original thread [1],
> > > > > it looks like Peng had a clear idea on how to fix the general issue.
> > > > > Peng, what happened with that?
> > >
> > > We shrinked the rpmsg reserved area to workaround the issue.
> > > So still use the dma apis in rpmsg.
> > >
> > > But here I am going to address domu android trusty issue using virtio.
> > 
> > My suggestion is to first of all fix DMA API so it works properly.
> 
> Could you please elaborate more details?
> 
> You mean the DMA API usage of rpmsg? Or xen domu dma_ops?
> 
> Thanks,
> Peng. 

Not 100% sure but I think xen dma ops.

-- 
MST



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-29 Thread Michael S. Tsirkin
On Mon, Jun 29, 2020 at 03:05:19AM +, Peng Fan wrote:
> > Subject: Re: [PATCH] xen: introduce xen_vring_use_dma
> > 
> > On Thu, Jun 25, 2020 at 10:31:27AM -0700, Stefano Stabellini wrote:
> > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote:
> > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> > > > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > > > > > > > Export xen_swiotlb for all platforms using xen swiotlb
> > > > > > > > >
> > > > > > > > > Use xen_swiotlb to determine when vring should use dma
> > > > > > > > > APIs to map the
> > > > > > > > > ring: when xen_swiotlb is enabled the dma API is required.
> > > > > > > > > When it is disabled, it is not required.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Peng Fan 
> > > > > > > >
> > > > > > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for
> > > > > > > > this?
> > > > > > > > Xen was there first, but everyone else is using that now.
> > > > > > >
> > > > > > > Unfortunately it is complicated and it is not related to
> > > > > > > VIRTIO_F_IOMMU_PLATFORM :-(
> > > > > > >
> > > > > > >
> > > > > > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to
> > > > > > > translate foreign mappings (memory coming from other VMs) to
> > > > > > > physical addresses.
> > > > > > > On x86, it also uses dma_ops to translate Linux's idea of a
> > > > > > > physical address into a real physical address (this is
> > > > > > > unneeded on ARM.)
> > > > > > >
> > > > > > >
> > > > > > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be
> > > > > > > used on Xen/x86 always and on Xen/ARM if Linux is Dom0
> > > > > > > (because it has foreign
> > > > > > > mappings.) That is why we have the if (xen_domain) return
> > > > > > > true; in vring_use_dma_api.
> > > > > >
> > > > > > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.
> > > > > >
> > > > > > Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also* forces
> > > > > > DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.
> > > > > >
> > > > > > Unfortunately as a result Xen never got around to properly
> > > > > > setting VIRTIO_F_IOMMU_PLATFORM.
> > > > >
> > > > > I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for this
> > > > > because the usage of swiotlb_xen is not a property of virtio,
> > > >
> > > >
> > > > Basically any device without VIRTIO_F_ACCESS_PLATFORM (that is its
> > > > name in the latest virtio spec, VIRTIO_F_IOMMU_PLATFORM is what Linux
> > > > calls it) is declared as "special, don't follow normal rules for
> > > > access".
> > > >
> > > > So yes swiotlb_xen is not a property of virtio, but what *is* a
> > > > property of virtio is that it's not special, just a regular device
> > > > from DMA POV.
> > >
> > > I am trying to understand what you meant but I think I am missing
> > > something.
> > >
> > > Are you saying that modern virtio should always have
> > > VIRTIO_F_ACCESS_PLATFORM, hence use normal dma_ops as any other
> > > devices?
> > 
> > I am saying it's a safe default. Clear VIRTIO_F_ACCESS_PLATFORM if you have
> > some special needs e.g. you are very sure it's ok to bypass DMA ops, or you
> > need to support a legacy guest (produced in the window between virtio 1
> > support in 2014 and support for VIRTIO_F_ACCESS_PLATFORM in 2016).
> > 
> > 
> > > If that is the case, how is it possible that virtio breaks on ARM
> > > using the default dma_ops? The breakage is not Xen related (except
> > > that Xen turns dma_ops on). The original message from Peng was:
> > >
> > >   vring_map_one

Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-26 Thread Michael S. Tsirkin
On Thu, Jun 25, 2020 at 10:31:27AM -0700, Stefano Stabellini wrote:
> On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote:
> > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > > > > > Export xen_swiotlb for all platforms using xen swiotlb
> > > > > > > 
> > > > > > > Use xen_swiotlb to determine when vring should use dma APIs to 
> > > > > > > map the
> > > > > > > ring: when xen_swiotlb is enabled the dma API is required. When 
> > > > > > > it is
> > > > > > > disabled, it is not required.
> > > > > > > 
> > > > > > > Signed-off-by: Peng Fan 
> > > > > > 
> > > > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
> > > > > > Xen was there first, but everyone else is using that now.
> > > > > 
> > > > > Unfortunately it is complicated and it is not related to
> > > > > VIRTIO_F_IOMMU_PLATFORM :-(
> > > > > 
> > > > > 
> > > > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to translate
> > > > > foreign mappings (memory coming from other VMs) to physical addresses.
> > > > > On x86, it also uses dma_ops to translate Linux's idea of a physical
> > > > > address into a real physical address (this is unneeded on ARM.)
> > > > > 
> > > > > 
> > > > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be used on 
> > > > > Xen/x86
> > > > > always and on Xen/ARM if Linux is Dom0 (because it has foreign
> > > > > mappings.) That is why we have the if (xen_domain) return true; in
> > > > > vring_use_dma_api.
> > > > 
> > > > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.
> > > > 
> > > > Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also*
> > > > forces DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.
> > > >
> > > > Unfortunately as a result Xen never got around to
> > > > properly setting VIRTIO_F_IOMMU_PLATFORM.
> > > 
> > > I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for this because
> > > the usage of swiotlb_xen is not a property of virtio,
> > 
> > 
> > Basically any device without VIRTIO_F_ACCESS_PLATFORM
> > (that is its name in the latest virtio spec, VIRTIO_F_IOMMU_PLATFORM is
> > what Linux calls it) is declared as "special, don't follow normal rules
> > for access".
> > 
> > So yes swiotlb_xen is not a property of virtio, but what *is* a property
> > of virtio is that it's not special, just a regular device from DMA POV.
> 
> I am trying to understand what you meant but I think I am missing
> something.
> 
> Are you saying that modern virtio should always have
> VIRTIO_F_ACCESS_PLATFORM, hence use normal dma_ops as any other devices?

I am saying it's a safe default. Clear VIRTIO_F_ACCESS_PLATFORM if you
have some special needs e.g. you are very sure it's ok to bypass DMA
ops, or you need to support a legacy guest (produced in the window
between virtio 1 support in 2014 and support for
VIRTIO_F_ACCESS_PLATFORM in 2016).


> If that is the case, how is it possible that virtio breaks on ARM using
> the default dma_ops? The breakage is not Xen related (except that Xen
> turns dma_ops on). The original message from Peng was:
> 
>   vring_map_one_sg -> vring_use_dma_api
>-> dma_map_page
>  -> __swiotlb_map_page
>   ->swiotlb_map_page
>   ->__dma_map_area(phys_to_virt(dma_to_phys(dev, 
> dev_addr)), size, dir);
>   However we are using per device dma area for rpmsg, phys_to_virt
>   could not return a correct virtual address for virtual address in
>   vmalloc area. Then kernel panic.
> 
> I must be missing something. Maybe it is because it has to do with RPMesg?

I think it's an RPMesg bug, yes.

> 
> > > > > You might have noticed that I missed one possible case above: Xen/ARM
> > > > > DomU :-)
> > > > > 
> > > > > Xen/ARM domUs don't need swiotlb_xen, it is not even initialized. So 
> > > > > if
> > > 

Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-24 Thread Michael S. Tsirkin
On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote:
> On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > > > Export xen_swiotlb for all platforms using xen swiotlb
> > > > > 
> > > > > Use xen_swiotlb to determine when vring should use dma APIs to map the
> > > > > ring: when xen_swiotlb is enabled the dma API is required. When it is
> > > > > disabled, it is not required.
> > > > > 
> > > > > Signed-off-by: Peng Fan 
> > > > 
> > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
> > > > Xen was there first, but everyone else is using that now.
> > > 
> > > Unfortunately it is complicated and it is not related to
> > > VIRTIO_F_IOMMU_PLATFORM :-(
> > > 
> > > 
> > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to translate
> > > foreign mappings (memory coming from other VMs) to physical addresses.
> > > On x86, it also uses dma_ops to translate Linux's idea of a physical
> > > address into a real physical address (this is unneeded on ARM.)
> > > 
> > > 
> > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be used on 
> > > Xen/x86
> > > always and on Xen/ARM if Linux is Dom0 (because it has foreign
> > > mappings.) That is why we have the if (xen_domain) return true; in
> > > vring_use_dma_api.
> > 
> > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.
> > 
> > Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also*
> > forces DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.
> >
> > Unfortunately as a result Xen never got around to
> > properly setting VIRTIO_F_IOMMU_PLATFORM.
> 
> I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for this because
> the usage of swiotlb_xen is not a property of virtio,


Basically any device without VIRTIO_F_ACCESS_PLATFORM
(that is its name in the latest virtio spec, VIRTIO_F_IOMMU_PLATFORM is
what Linux calls it) is declared as "special, don't follow normal rules
for access".

So yes swiotlb_xen is not a property of virtio, but what *is* a property
of virtio is that it's not special, just a regular device from DMA POV.


> it is a detail of
> the way Linux does Xen address translations. swiotlb-xen is used to do
> these translations and it is hooked into the dma_ops framework.
> 
> It would be possible to have a device in hardware that is
> virtio-compatible and doesn't set VIRTIO_F_IOMMU_PLATFORM.

That device would be basically broken, since hardware
can't know whether it can access all memory or not.

> The device
> could be directly assigned (passthrough) to a DomU. We would still
> have to use swiotlb_xen if Xen is running.
> 
> You should think of swiotlb-xen as only internal to Linux and not
> related to whether the (virtual or non-virtual) hardware comes with an
> IOMMU or not.

IOMMU is a misnomer here.  Virtio spec now calls this bit
VIRTIO_F_ACCESS_PLATFORM. We should have done the same a while ago -
I'll send a patch.

> 
> > > You might have noticed that I missed one possible case above: Xen/ARM
> > > DomU :-)
> > > 
> > > Xen/ARM domUs don't need swiotlb_xen, it is not even initialized. So if
> > > (xen_domain) return true; would give the wrong answer in that case.
> > > Linux would end up calling the "normal" dma_ops, not swiotlb-xen, and
> > > the "normal" dma_ops fail.
> > > 
> > > 
> > > The solution I suggested was to make the check in vring_use_dma_api more
> > > flexible by returning true if the swiotlb_xen is supposed to be used,
> > > not in general for all Xen domains, because that is what the check was
> > > really meant to do.
> > 
> > Why not fix DMA ops so they DTRT (nop) on Xen/ARM DomU? What is wrong with 
> > that?
> 
> swiotlb-xen is not used on Xen/ARM DomU, the default dma_ops are the
> ones that are used. So you are saying, why don't we fix the default
> dma_ops to work with virtio?
> 
> It is bad that the default dma_ops crash with virtio, so yes I think it
> would be good to fix that. However, even if we fixed that, the if
> (xen_domain()) check in vring_use_dma_api is still a problem.

Why is it a problem? It just makes virtio use DMA API.
If that in turn works, problem solved.



> 
> Alternatively we could try to work-around it from swiotlb-xen. We could
> enable swiotlb-xen for Xen/ARM DomUs with a different implementation so
> that we could leave the vring_use_dma_api check unmodified.
> 
> It would be ugly because we would have to figure out from the new
> swiotlb-xen functions if the device is a normal device, so we have to
> call the regular dma_ops functions, or if the device is a virtio device,
> in which case there is nothing to do. I think it is undesirable but
> could probably be made to work.



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-24 Thread Michael S. Tsirkin
On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > Export xen_swiotlb for all platforms using xen swiotlb
> > > 
> > > Use xen_swiotlb to determine when vring should use dma APIs to map the
> > > ring: when xen_swiotlb is enabled the dma API is required. When it is
> > > disabled, it is not required.
> > > 
> > > Signed-off-by: Peng Fan 
> > 
> > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
> > Xen was there first, but everyone else is using that now.
> 
> Unfortunately it is complicated and it is not related to
> VIRTIO_F_IOMMU_PLATFORM :-(
> 
> 
> The Xen subsystem in Linux uses dma_ops via swiotlb_xen to translate
> foreign mappings (memory coming from other VMs) to physical addresses.
> On x86, it also uses dma_ops to translate Linux's idea of a physical
> address into a real physical address (this is unneeded on ARM.)
> 
> 
> So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be used on Xen/x86
> always and on Xen/ARM if Linux is Dom0 (because it has foreign
> mappings.) That is why we have the if (xen_domain) return true; in
> vring_use_dma_api.

VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.

Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also*
forces DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.

Unfortunately as a result Xen never got around to
properly setting VIRTIO_F_IOMMU_PLATFORM.

I would like to make Xen do what everyone else is doing
which is just set VIRTIO_F_IOMMU_PLATFORM and then put
platform specific hacks inside DMA API.
Then we can think about deprecating the Xen hack in a
distance future, or hiding it behind a static branch, or something
like this.


> You might have noticed that I missed one possible case above: Xen/ARM
> DomU :-)
> 
> Xen/ARM domUs don't need swiotlb_xen, it is not even initialized. So if
> (xen_domain) return true; would give the wrong answer in that case.
> Linux would end up calling the "normal" dma_ops, not swiotlb-xen, and
> the "normal" dma_ops fail.
> 
> 
> The solution I suggested was to make the check in vring_use_dma_api more
> flexible by returning true if the swiotlb_xen is supposed to be used,
> not in general for all Xen domains, because that is what the check was
> really meant to do.

Why not fix DMA ops so they DTRT (nop) on Xen/ARM DomU? What is wrong with that?


> 
> In this regards I have two more comments:
> 
> - the comment on top of the check in vring_use_dma_api is still valid
> - the patch looks broken on x86: it should always return true, but it
>   returns false instead
> 
>  
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index a2de775801af..768afd79f67a 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -252,7 +252,7 @@ static bool vring_use_dma_api(struct virtio_device 
> > > *vdev)
> > >* the DMA API if we're a Xen guest, which at least allows
> > >* all of the sensible Xen configurations to work correctly.
> > >*/
> > > - if (xen_domain())
> > > + if (xen_vring_use_dma())
> > >   return true;
> > >  
> > >   return false;
> > 
> > 
> > The comment above this should probably be fixed.
> 
> > 



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-24 Thread Michael S. Tsirkin
On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> Export xen_swiotlb for all platforms using xen swiotlb
> 
> Use xen_swiotlb to determine when vring should use dma APIs to map the
> ring: when xen_swiotlb is enabled the dma API is required. When it is
> disabled, it is not required.
> 
> Signed-off-by: Peng Fan 

Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
Xen was there first, but everyone else is using that now.


> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index a2de775801af..768afd79f67a 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -252,7 +252,7 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
>* the DMA API if we're a Xen guest, which at least allows
>* all of the sensible Xen configurations to work correctly.
>*/
> - if (xen_domain())
> + if (xen_vring_use_dma())
>   return true;
>  
>   return false;


The comment above this should probably be fixed.

-- 
MST




Re: [PATCH v6] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-14 Thread Michael S. Tsirkin
On Thu, May 14, 2020 at 12:50:16PM +0200, Jean-Philippe Brucker wrote:
> On Thu, May 14, 2020 at 05:31:00AM -0400, Michael S. Tsirkin wrote:
> > On Thu, May 14, 2020 at 01:22:37PM +0530, Bharat Bhushan wrote:
> > > Different endpoints can support different page sizes; probe the
> > > endpoint to see whether it supports a specific page size, otherwise
> > > use the global page sizes.
> > > 
> > > A device attached to a domain should support at least the
> > > domain-supported page sizes. If the device supports more than
> > > the domain-supported page sizes, it is limited to using the
> > > domain-supported page sizes only.
> > 
> > OK so I am just trying to figure it out.
> > Before the patch, we always use the domain supported page sizes
> > right?
> > 
> > With the patch, we still do, but we also probe and
> > validate that device supports all domain page sizes,
> > if it does not then we fail to attach the device.
> 
> Generally there is one endpoint per domain. Linux creates the domains and
> decides which endpoint goes in which domain. It puts multiple endpoints in
> a domain in two cases:
> 
> * If endpoints cannot be isolated from each others by the IOMMU, for
>   example if ACS isolation isn't enabled in PCIe. In that case endpoints
>   are in the same IOMMU group, and therefore contained in the same domain.
>   This is more of a quirk for broken hardware, and isn't much of a concern
>   for virtualization because it's easy for the hypervisor to present
>   endpoints isolated from each others.

Unless they aren't isolated on real hardware :)

> * If userspace wants to put endpoints in the same VFIO container, then
>   VFIO first attempts to put them in the same IOMMU domain, and falls back
>   to multiple domains if that fails. That case is just a luxury and we
>   shouldn't over-complicate the driver to cater for this.
> 
> So the attach checks don't need to be that complicated. Checking that the
> page masks are exactly the same should be enough.
> 
> > This seems like a lot of effort for little benefit, can't
> > hypervisor simply make sure endpoints support the
> > iommu page sizes for us?
> 
> I tend to agree, it's not very likely that we'll have a configuration with
> different page sizes between physical and virtual endpoints.
> 
> If there is a way for QEMU to simply reject VFIO devices that don't use
> the same page mask as what's configured globally, let's do that instead of
> introducing the page_size_mask property.

Or we can even do the subset thing in QEMU. Can be transparent to
guests.


So I guess this patch isn't really needed then.
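For illustration, the subset idea amounts to intersecting the two masks
on the host side and rejecting an empty result (a sketch of the idea,
not actual QEMU code; the helper name is made up):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative host-side helper: intersect the device's page-size mask
 * with the globally configured one.  An empty result means the endpoint
 * must be rejected; a non-empty subset can be exposed to the guest
 * transparently, without a per-endpoint page_size_mask property. */
static uint64_t effective_page_mask(uint64_t global_mask, uint64_t dev_mask)
{
	return global_mask & dev_mask;	/* caller rejects the device if 0 */
}
```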

> 
> > > @@ -615,7 +636,7 @@ static int viommu_domain_finalise(struct 
> > > viommu_endpoint *vdev,
> > >   struct viommu_dev *viommu = vdev->viommu;
> > >   struct viommu_domain *vdomain = to_viommu_domain(domain);
> > >  
> > > - viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
> > > + viommu_page_size = 1UL << __ffs(vdev->pgsize_bitmap);
> > >   if (viommu_page_size > PAGE_SIZE) {
> > >   dev_err(vdev->dev,
> > >   "granule 0x%lx larger than system page size 0x%lx\n",
> > 
> > 
> > Looks like this is messed up on 32 bit: e.g. 0x1 will try to do
> > 1UL << -1, which is undefined behaviour. Which is btw already messed up
> > wrt viommu->pgsize_bitmap, but that's not a reason to propagate
> > the error.
> 
> Realistically we're not going to have a page granule larger than 4G, it's
> going to be 4k or 64k. But we can add a check that truncates the
> page_size_mask to 32-bit and makes sure that it's non-null.

... on 32 bit

> 
> > > +struct virtio_iommu_probe_pgsize_mask {
> > > + struct virtio_iommu_probe_property  head;
> > > + __u8reserved[4];
> > > + /* Same format as virtio_iommu_config::page_size_mask */
> > 
> > It's actually slightly different in that
> > this must be a superset of domain page size mask, right?
> 
> No it overrides the global mask

OK so I'd copy the comment and tweak it rather than
refer to virtio_iommu_config::page_size_mask
(besides, virtio_iommu_config::page_size_mask isn't legal C,
I know C++ so I figured out what's meant but it's
better to just say "page_size_mask in struct virtio_iommu_config")


> 
> > > + __le64  pgsize_bitmap;
> 
> Bharat, please rename this to page_size_mask for consistency
> 
> Thanks,
> Jean
> 
> > > +};
> > > +
> > >  #define VIRTIO_IOMMU_RESV_MEM_T_RESERVED 0
> > >  #define VIRTIO_IOMMU_RESV_MEM_T_MSI  1
> > >  
> > > -- 
> > > 2.17.1
> > 



Re: [PATCH v6] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-14 Thread Michael S. Tsirkin
On Thu, May 14, 2020 at 01:22:37PM +0530, Bharat Bhushan wrote:
> Different endpoints can support different page sizes; probe the
> endpoint to see whether it supports a specific page size, otherwise
> use the global page sizes.
> 
> A device attached to a domain should support at least the
> domain-supported page sizes. If the device supports more than
> the domain-supported page sizes, it is limited to using the
> domain-supported page sizes only.

OK so I am just trying to figure it out.
Before the patch, we always use the domain supported page sizes
right? With the patch, we still do, but we also probe and
validate that device supports all domain page sizes,
if it does not then we fail to attach the device.

This seems like a lot of effort for little benefit, can't
hypervisor simply make sure endpoints support the
iommu page sizes for us?



> 
> Signed-off-by: Bharat Bhushan 
> ---
> v5->v6
>  - property length before dereference
>  - Error out on no supported page sizes (page-size-mask is zero)
>  - Allow device to attach to domain even it supports
>minimum of domain supported page sizes. In that case device
>will use domain page sizes only.
>  - added format of pgsize_bitmap
> 
> v4->v5:
>  - Rebase to Linux v5.7-rc4
> 
> v3->v4:
>  - Fix whitespace error
> 
> v2->v3:
>  - Fixed error return for incompatible endpoint
>  - __u64 changed to __le64 in header file
> 
>  drivers/iommu/virtio-iommu.c  | 63 ---
>  include/uapi/linux/virtio_iommu.h | 14 ++-
>  2 files changed, 71 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 4e1d11af23c8..cbac3047a781 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -78,6 +78,7 @@ struct viommu_endpoint {
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
>   struct list_headresv_regions;
> + u64 pgsize_bitmap;
>  };
>  
>  struct viommu_request {
> @@ -415,6 +416,23 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> + struct virtio_iommu_probe_pgsize_mask *mask,
> + size_t len)
> +{
> + u64 pgsize_bitmap;
> +
> + if (len < sizeof(*mask))
> + return -EINVAL;
> +
> + pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
> + if (!pgsize_bitmap)
> + return -EINVAL;
> +
> + vdev->pgsize_bitmap = pgsize_bitmap;
> + return 0;
> +}
> +
>  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
>  struct virtio_iommu_probe_resv_mem *mem,
>  size_t len)
> @@ -499,6 +517,9 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
>   ret = viommu_add_resv_mem(vdev, (void *)prop, len);
>   break;
> + case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
> + ret = viommu_set_pgsize_bitmap(vdev, (void *)prop, len);
> + break;
>   default:
>   dev_err(dev, "unknown viommu prop 0x%x\n", type);
>   }
> @@ -615,7 +636,7 @@ static int viommu_domain_finalise(struct viommu_endpoint 
> *vdev,
>   struct viommu_dev *viommu = vdev->viommu;
>   struct viommu_domain *vdomain = to_viommu_domain(domain);
>  
> - viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
> + viommu_page_size = 1UL << __ffs(vdev->pgsize_bitmap);
>   if (viommu_page_size > PAGE_SIZE) {
>   dev_err(vdev->dev,
>   "granule 0x%lx larger than system page size 0x%lx\n",


Looks like this is messed up on 32 bit: e.g. 0x1 will try to do
1UL << -1, which is undefined behaviour. Which is btw already messed up
wrt viommu->pgsize_bitmap, but that's not a reason to propagate
the error.
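A sketch of the truncate-and-check fix Jean mentions (an illustrative
helper, not the posted patch): keep only the bits of the advertised
64-bit mask that fit in unsigned long and reject an empty result, so
__ffs() is never applied to 0.

```c
#include <assert.h>
#include <stdint.h>

/* Truncate the advertised mask to unsigned long (which drops the high
 * word on 32-bit builds) and reject an empty result, so that
 * 1UL << __ffs(mask) is never evaluated with mask == 0, which is
 * undefined behaviour. */
static int smallest_granule(uint64_t page_size_mask, unsigned long *granule)
{
	unsigned long mask = (unsigned long)page_size_mask;

	if (!mask)
		return -1;	/* no usable page size on this build */

	*granule = mask & -mask;	/* lowest set bit, i.e. 1UL << __ffs(mask) */
	return 0;
}
```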


> @@ -630,7 +651,7 @@ static int viommu_domain_finalise(struct viommu_endpoint 
> *vdev,
>  
>   vdomain->id = (unsigned int)ret;
>  
> - domain->pgsize_bitmap   = viommu->pgsize_bitmap;
> + domain->pgsize_bitmap   = vdev->pgsize_bitmap;
>   domain->geometry= viommu->geometry;
>  
>   vdomain->map_flags  = viommu->map_flags;
> @@ -654,6 +675,38 @@ static void viommu_domain_free(struct iommu_domain 
> *domain)
>   kfree(vdomain);
>  }
>  
> +/*
> + * Check whether the endpoint's capabilities are compatible with other
> + * endpoints in the domain. Report any inconsistency.

This actually has side effects, so _is_ isn't a good name for it.
viommu_endpoint_compatible?

> + */
> +static bool viommu_endpoint_is_compatible(struct viommu_endpoint *vdev,
> +   struct viommu_domain *vdomain)
> +{
> + struct device *dev 

Re: [PATCH v5] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-12 Thread Michael S. Tsirkin
On Tue, May 05, 2020 at 03:00:04PM +0530, Bharat Bhushan wrote:
> Different endpoints can support different page sizes; probe the
> endpoint to see whether it supports a specific page size, otherwise
> use the global page sizes.
> 
> Signed-off-by: Bharat Bhushan 
> ---
> v4->v5:
>  - Rebase to Linux v5.7-rc4
> 
> v3->v4:
>  - Fix whitespace error
> 
> v2->v3:
>  - Fixed error return for incompatible endpoint
>  - __u64 changed to __le64 in header file
> 
>  drivers/iommu/virtio-iommu.c  | 48 ---
>  include/uapi/linux/virtio_iommu.h |  7 +
>  2 files changed, 51 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index d5cac4f46ca5..9513d2ab819e 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -78,6 +78,7 @@ struct viommu_endpoint {
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
>   struct list_headresv_regions;
> + u64 pgsize_bitmap;
>  };
>  
>  struct viommu_request {
> @@ -415,6 +416,19 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> + struct virtio_iommu_probe_pgsize_mask *mask,
> + size_t len)
> +{
> + u64 pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
> +
> + if (len < sizeof(*mask))
> + return -EINVAL;
> +
> + vdev->pgsize_bitmap = pgsize_bitmap;
> + return 0;
> +}
> +
>  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
>  struct virtio_iommu_probe_resv_mem *mem,
>  size_t len)
> @@ -499,6 +513,9 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
>   ret = viommu_add_resv_mem(vdev, (void *)prop, len);
>   break;
> + case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
> + ret = viommu_set_pgsize_bitmap(vdev, (void *)prop, len);
> + break;
>   default:
>   dev_err(dev, "unknown viommu prop 0x%x\n", type);
>   }

So given this is necessary early in boot, how about we
add this in the config space? And maybe ACPI too ...


> @@ -630,7 +647,7 @@ static int viommu_domain_finalise(struct viommu_endpoint 
> *vdev,
>  
>   vdomain->id = (unsigned int)ret;
>  
> - domain->pgsize_bitmap   = viommu->pgsize_bitmap;
> + domain->pgsize_bitmap   = vdev->pgsize_bitmap;
>   domain->geometry= viommu->geometry;
>  
>   vdomain->map_flags  = viommu->map_flags;
> @@ -654,6 +671,29 @@ static void viommu_domain_free(struct iommu_domain 
> *domain)
>   kfree(vdomain);
>  }
>  
> +/*
> + * Check whether the endpoint's capabilities are compatible with other
> + * endpoints in the domain. Report any inconsistency.
> + */
> +static bool viommu_endpoint_is_compatible(struct viommu_endpoint *vdev,
> +   struct viommu_domain *vdomain)
> +{
> + struct device *dev = vdev->dev;
> +
> + if (vdomain->viommu != vdev->viommu) {
> + dev_err(dev, "cannot attach to foreign vIOMMU\n");
> + return false;
> + }
> +
> + if (vdomain->domain.pgsize_bitmap != vdev->pgsize_bitmap) {
> + dev_err(dev, "incompatible domain bitmap 0x%lx != 0x%llx\n",
> + vdomain->domain.pgsize_bitmap, vdev->pgsize_bitmap);
> + return false;
> + }
> +
> + return true;
> +}
> +
>  static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  {
>   int i;
> @@ -670,9 +710,8 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
> struct device *dev)
>* owns it.
>*/
>   ret = viommu_domain_finalise(vdev, domain);
> - } else if (vdomain->viommu != vdev->viommu) {
> - dev_err(dev, "cannot attach to foreign vIOMMU\n");
> - ret = -EXDEV;
> + } else if (!viommu_endpoint_is_compatible(vdev, vdomain)) {
> + ret = -EINVAL;
>   }
>   mutex_unlock(&vdomain->mutex);
>  
> @@ -886,6 +925,7 @@ static int viommu_add_device(struct device *dev)
>  
>   vdev->dev = dev;
>   vdev->viommu = viommu;
> + vdev->pgsize_bitmap = viommu->pgsize_bitmap;
>   INIT_LIST_HEAD(&vdev->resv_regions);
>   dev_iommu_priv_set(dev, vdev);
>  
> diff --git a/include/uapi/linux/virtio_iommu.h 
> b/include/uapi/linux/virtio_iommu.h
> index 48e3c29223b5..2cced7accc99 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -111,6 +111,7 @@ struct virtio_iommu_req_unmap {
>  
>  #define VIRTIO_IOMMU_PROBE_T_NONE0
>  #define VIRTIO_IOMMU_PROBE_T_RESV_MEM   

Re: [EXT] Re: [PATCH v5] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-07 Thread Michael S. Tsirkin
On Thu, May 07, 2020 at 02:51:32PM +0200, Auger Eric wrote:
> Hi,
> 
> On 5/7/20 1:32 PM, Michael S. Tsirkin wrote:
> > On Thu, May 07, 2020 at 11:24:29AM +, Bharat Bhushan wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: Michael S. Tsirkin 
> >>> Sent: Wednesday, May 6, 2020 5:53 AM
> >>> To: Bharat Bhushan 
> >>> Cc: jean-phili...@linaro.org; j...@8bytes.org; jasow...@redhat.com;
> >>> virtualizat...@lists.linux-foundation.org; 
> >>> iommu@lists.linux-foundation.org;
> >>> linux-ker...@vger.kernel.org; eric.auger@gmail.com; 
> >>> eric.au...@redhat.com
> >>> Subject: [EXT] Re: [PATCH v5] iommu/virtio: Use page size bitmap 
> >>> supported by
> >>> endpoint
> >>>
> >>> External Email
> >>>
> >>> --
> >>> On Tue, May 05, 2020 at 03:00:04PM +0530, Bharat Bhushan wrote:
> >>>> Different endpoints can support different page sizes; probe the endpoint
> >>>> to see whether it supports a specific page size, otherwise use the global page sizes.
> >>>>
> >>>> Signed-off-by: Bharat Bhushan 
> >>>> ---
> >>>> v4->v5:
> >>>>  - Rebase to Linux v5.7-rc4
> >>>>
> >>>> v3->v4:
> >>>>  - Fix whitespace error
> >>>>
> >>>> v2->v3:
> >>>>  - Fixed error return for incompatible endpoint
> >>>>  - __u64 changed to __le64 in header file
> >>>>
> >>>>  drivers/iommu/virtio-iommu.c  | 48 ---
> >>>>  include/uapi/linux/virtio_iommu.h |  7 +
> >>>>  2 files changed, 51 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/iommu/virtio-iommu.c
> >>>> b/drivers/iommu/virtio-iommu.c index d5cac4f46ca5..9513d2ab819e 100644
> >>>> --- a/drivers/iommu/virtio-iommu.c
> >>>> +++ b/drivers/iommu/virtio-iommu.c
> >>>> @@ -78,6 +78,7 @@ struct viommu_endpoint {
> >>>>  struct viommu_dev   *viommu;
> >>>>  struct viommu_domain*vdomain;
> >>>>  struct list_headresv_regions;
> >>>> +u64 pgsize_bitmap;
> >>>>  };
> >>>>
> >>>>  struct viommu_request {
> >>>> @@ -415,6 +416,19 @@ static int viommu_replay_mappings(struct
> >>> viommu_domain *vdomain)
> >>>>  return ret;
> >>>>  }
> >>>>
> >>>> +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> >>>> +struct 
> >>>> virtio_iommu_probe_pgsize_mask *mask,
> >>>> +size_t len)
> >>>> +{
> >>>> +u64 pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
> >>>> +
> >>>> +if (len < sizeof(*mask))
> >>>
> >>> This is too late to validate length, you have dereferenced it already.
> >>> do it before the read pls.
> >>
> >> Yes, Will change here and other places as well
> >>
> >>>
> >>>> +return -EINVAL;
> >>>
> >>> OK but note that guest will then just proceed to ignore the property. Is 
> >>> that really
> >>> OK? Wouldn't host want to know?
> >>
> >>
> >> The guest needs to be in sync with the device, so yes, it seems the guest
> >> needs to tell the device which page-size mask it is using.
> >>
> >> Corresponding spec change patch 
> >> (https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg06214.html)
> >>
> >> Would like Jean/Eric to comment here as well.
> >>
> >>>
> >>>
> >>>> +
> >>>> +vdev->pgsize_bitmap = pgsize_bitmap;
> >>>
> >>> what if bitmap is 0? Is that a valid size? I see a bunch of BUG_ON with 
> >>> that value ...
> >>
> >> As per the proposed spec, the device is supposed to set at least one bit.
> >> Will add a BUG_ON here.
> > 
> > Or better fail probe ...
> Yes I agree I would rather fail the probe.
> > 
> >> Should we add bug_on or switch to global config page-size mas

Re: [EXT] Re: [PATCH v5] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-07 Thread Michael S. Tsirkin
On Thu, May 07, 2020 at 11:24:29AM +, Bharat Bhushan wrote:
> 
> 
> > -----Original Message-----
> > From: Michael S. Tsirkin 
> > Sent: Wednesday, May 6, 2020 5:53 AM
> > To: Bharat Bhushan 
> > Cc: jean-phili...@linaro.org; j...@8bytes.org; jasow...@redhat.com;
> > virtualizat...@lists.linux-foundation.org; iommu@lists.linux-foundation.org;
> > linux-ker...@vger.kernel.org; eric.auger@gmail.com; 
> > eric.au...@redhat.com
> > Subject: [EXT] Re: [PATCH v5] iommu/virtio: Use page size bitmap supported 
> > by
> > endpoint
> > 
> > External Email
> > 
> > --
> > On Tue, May 05, 2020 at 03:00:04PM +0530, Bharat Bhushan wrote:
> > > Different endpoints can support different page sizes; probe the endpoint
> > > to see whether it supports a specific page size, otherwise use the global page sizes.
> > >
> > > Signed-off-by: Bharat Bhushan 
> > > ---
> > > v4->v5:
> > >  - Rebase to Linux v5.7-rc4
> > >
> > > v3->v4:
> > >  - Fix whitespace error
> > >
> > > v2->v3:
> > >  - Fixed error return for incompatible endpoint
> > >  - __u64 changed to __le64 in header file
> > >
> > >  drivers/iommu/virtio-iommu.c  | 48 ---
> > >  include/uapi/linux/virtio_iommu.h |  7 +
> > >  2 files changed, 51 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/iommu/virtio-iommu.c
> > > b/drivers/iommu/virtio-iommu.c index d5cac4f46ca5..9513d2ab819e 100644
> > > --- a/drivers/iommu/virtio-iommu.c
> > > +++ b/drivers/iommu/virtio-iommu.c
> > > @@ -78,6 +78,7 @@ struct viommu_endpoint {
> > >   struct viommu_dev   *viommu;
> > >   struct viommu_domain*vdomain;
> > >   struct list_headresv_regions;
> > > + u64 pgsize_bitmap;
> > >  };
> > >
> > >  struct viommu_request {
> > > @@ -415,6 +416,19 @@ static int viommu_replay_mappings(struct
> > viommu_domain *vdomain)
> > >   return ret;
> > >  }
> > >
> > > +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> > > + struct virtio_iommu_probe_pgsize_mask *mask,
> > > + size_t len)
> > > +{
> > > + u64 pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
> > > +
> > > + if (len < sizeof(*mask))
> > 
> > This is too late to validate length, you have dereferenced it already.
> > do it before the read pls.
> 
> Yes, Will change here and other places as well
> 
> > 
> > > + return -EINVAL;
> > 
> > OK but note that guest will then just proceed to ignore the property. Is 
> > that really
> > OK? Wouldn't host want to know?
> 
> 
> The guest needs to be in sync with the device, so yes, it seems the guest
> needs to tell the device which page-size mask it is using.
> 
> Corresponding spec change patch 
> (https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg06214.html)
> 
> Would like Jean/Eric to comment here as well.
> 
> > 
> > 
> > > +
> > > + vdev->pgsize_bitmap = pgsize_bitmap;
> > 
> > what if bitmap is 0? Is that a valid size? I see a bunch of BUG_ON with 
> > that value ...
> 
> As per the proposed spec, the device is supposed to set at least one bit.
> Will add a BUG_ON here.

Or better fail probe ...

> Should we add bug_on or switch to global config page-size mask if this is 
> zero (notify device which page-size-mask it is using).

It's a spec violation, I wouldn't try to use the device.

> > 
> > I also see a bunch of code like e.g. this:
> > 
> > pg_size = 1UL << __ffs(pgsize_bitmap);
> > 
> > which probably won't DTRT on a 32 bit guest if the bitmap has bits set in 
> > the high
> > word.
> > 
> 
> My thought is that in that case viommu_domain_finalise() will fail, do not 
> proceed.

That's undefined behaviour in C. You need to make sure this condition
is never reached. And spec does not make this illegal at all
so it looks like we actually need to handle this gracefully.


> > 
> > 
> > > + return 0;
> > > +}
> > > +
> > >  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> > >  struct virtio_iommu_probe_resv_mem *mem,
> > >  size_t len)
> > > @@ -499,6 +513,9 @@ static int viommu_pr

Re: [PATCH v5] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-05 Thread Michael S. Tsirkin
On Tue, May 05, 2020 at 03:00:04PM +0530, Bharat Bhushan wrote:
> Different endpoints can support different page sizes; probe the
> endpoint to see whether it supports a specific page size, otherwise
> use the global page sizes.
> 
> Signed-off-by: Bharat Bhushan 
> ---
> v4->v5:
>  - Rebase to Linux v5.7-rc4
> 
> v3->v4:
>  - Fix whitespace error
> 
> v2->v3:
>  - Fixed error return for incompatible endpoint
>  - __u64 changed to __le64 in header file
> 
>  drivers/iommu/virtio-iommu.c  | 48 ---
>  include/uapi/linux/virtio_iommu.h |  7 +
>  2 files changed, 51 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index d5cac4f46ca5..9513d2ab819e 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -78,6 +78,7 @@ struct viommu_endpoint {
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
>   struct list_headresv_regions;
> + u64 pgsize_bitmap;
>  };
>  
>  struct viommu_request {
> @@ -415,6 +416,19 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> + struct virtio_iommu_probe_pgsize_mask *mask,
> + size_t len)
> +{
> + u64 pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
> +
> + if (len < sizeof(*mask))

This is too late to validate length, you have dereferenced it already.
do it before the read pls.

> + return -EINVAL;

OK but note that guest will then just proceed to ignore the
property. Is that really OK? Wouldn't host want to know?


> +
> + vdev->pgsize_bitmap = pgsize_bitmap;

what if bitmap is 0? Is that a valid size? I see a bunch of
BUG_ON with that value ...

I also see a bunch of code like e.g. this:

pg_size = 1UL << __ffs(pgsize_bitmap);

which probably won't DTRT on a 32 bit guest if the bitmap has bits
set in the high word.



> + return 0;
> +}
> +
>  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
>  struct virtio_iommu_probe_resv_mem *mem,
>  size_t len)
> @@ -499,6 +513,9 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
>   ret = viommu_add_resv_mem(vdev, (void *)prop, len);
>   break;
> + case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
> + ret = viommu_set_pgsize_bitmap(vdev, (void *)prop, len);
> + break;
>   default:
>   dev_err(dev, "unknown viommu prop 0x%x\n", type);
>   }
> @@ -630,7 +647,7 @@ static int viommu_domain_finalise(struct viommu_endpoint 
> *vdev,
>  
>   vdomain->id = (unsigned int)ret;
>  
> - domain->pgsize_bitmap   = viommu->pgsize_bitmap;
> + domain->pgsize_bitmap   = vdev->pgsize_bitmap;
>   domain->geometry= viommu->geometry;
>  
>   vdomain->map_flags  = viommu->map_flags;
> @@ -654,6 +671,29 @@ static void viommu_domain_free(struct iommu_domain 
> *domain)
>   kfree(vdomain);
>  }
>  
> +/*
> + * Check whether the endpoint's capabilities are compatible with other
> + * endpoints in the domain. Report any inconsistency.
> + */
> +static bool viommu_endpoint_is_compatible(struct viommu_endpoint *vdev,
> +   struct viommu_domain *vdomain)
> +{
> + struct device *dev = vdev->dev;
> +
> + if (vdomain->viommu != vdev->viommu) {
> + dev_err(dev, "cannot attach to foreign vIOMMU\n");
> + return false;
> + }
> +
> + if (vdomain->domain.pgsize_bitmap != vdev->pgsize_bitmap) {
> + dev_err(dev, "incompatible domain bitmap 0x%lx != 0x%llx\n",
> + vdomain->domain.pgsize_bitmap, vdev->pgsize_bitmap);
> + return false;
> + }

I'm confused by this. So let's assume host supports pages sizes
of 4k, 2M, 1G. It signals this in the properties. Nice.
Now domain supports 4k, 2M and that's all. Why is that a problem?
Just don't use 1G ...
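In code, that relaxed rule is just a superset test on the endpoint mask
(an illustrative sketch, not the posted patch):

```c
#include <assert.h>
#include <stdint.h>

/* The endpoint is compatible as long as it supports every page size the
 * domain already uses; extra endpoint sizes (such as 1G) are simply
 * never used for that domain. */
static int endpoint_mask_compatible(uint64_t domain_mask, uint64_t endpoint_mask)
{
	return (domain_mask & ~endpoint_mask) == 0;
}
```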


> +
> + return true;
> +}
> +
>  static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  {
>   int i;
> @@ -670,9 +710,8 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
> struct device *dev)
>* owns it.
>*/
>   ret = viommu_domain_finalise(vdev, domain);
> - } else if (vdomain->viommu != vdev->viommu) {
> - dev_err(dev, "cannot attach to foreign vIOMMU\n");
> - ret = -EXDEV;
> + } else if (!viommu_endpoint_is_compatible(vdev, vdomain)) {
> + ret = -EINVAL;
>   }
>   mutex_unlock(&vdomain->mutex);
>  
> @@ -886,6 +925,7 @@ static int 

Re: [RFC/PATCH 0/1] virtio_mmio: hypervisor specific interfaces for MMIO

2020-04-30 Thread Michael S. Tsirkin
On Thu, Apr 30, 2020 at 03:32:55PM +0530, Srivatsa Vaddagiri wrote:
> The Type-1 hypervisor we are dealing with does not allow for MMIO transport. 

How about PCI then?

-- 
MST



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 12:26:43PM +0200, Jan Kiszka wrote:
> On 29.04.20 12:20, Michael S. Tsirkin wrote:
> > On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
> > > That would still not work I think where swiotlb is used for pass-thr 
> > > devices
> > > (when private memory is fine) as well as virtio devices (when shared 
> > > memory is
> > > required).
> > 
> > So that is a separate question. When there are multiple untrusted
> > devices, at the moment it looks like a single bounce buffer is used.
> > 
> > Which to me seems like a security problem, I think we should protect
> > untrusted devices from each other.
> > 
> 
> Definitely. That's the model we have for ivshmem-virtio as well.
> 
> Jan

Want to try implementing that?

> -- 
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
> Corporate Competence Center Embedded Linux



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
> That would still not work I think where swiotlb is used for pass-thr devices
> (when private memory is fine) as well as virtio devices (when shared memory is
> required).

So that is a separate question. When there are multiple untrusted
devices, at the moment it looks like a single bounce buffer is used.

Which to me seems like a security problem, I think we should protect
untrusted devices from each other.





> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 03:14:10PM +0530, Srivatsa Vaddagiri wrote:
> * Michael S. Tsirkin  [2020-04-29 02:50:41]:
> 
> > So it seems that with modern Linux, all one needs
> > to do on x86 is mark the device as untrusted.
> > It's already possible to do this with ACPI and with OF - would that be
> > sufficient for achieving what this patchset is trying to do?
> 
> In my case, its not sufficient to just mark virtio device untrusted and thus
> activate the use of swiotlb. All of the secondary VM memory, including those
> allocate by swiotlb driver, is private to it.

So why not make the bounce buffer memory shared then?

> An additional piece of memory is
> available to secondary VM which is shared between VMs and which is where I 
> need
> swiotlb driver to do its work.
> 
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 01:42:13PM +0800, Lu Baolu wrote:
> On 2020/4/29 12:57, Michael S. Tsirkin wrote:
> > On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
> > > On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > > > > * Michael S. Tsirkin  [2020-04-28 12:17:57]:
> > > > > 
> > > > > > Okay, but how is all this virtio specific?  For example, why not 
> > > > > > allow
> > > > > > separate swiotlbs for any type of device?
> > > > > > For example, this might make sense if a given device is from a
> > > > > > different, less trusted vendor.
> > > > > Is swiotlb commonly used for multiple devices that may be on 
> > > > > different trust
> > > > > boundaries (and not behind a hardware iommu)?
> > > > Even a hardware iommu does not imply a 100% security from malicious
> > > > hardware. First lots of people use iommu=pt for performance reasons.
> > > > Second even without pt, unmaps are often batched, and sub-page buffers
> > > > might be used for DMA, so we are not 100% protected at all times.
> > > > 
> > > 
> > > For untrusted devices, IOMMU is forced on even iommu=pt is used;
> > 
> > I think you are talking about untrusted *drivers* like with VFIO.
> 
> No. I am talking about untrusted devices like thunderbolt peripherals.
> We always trust drivers hosted in kernel and the DMA APIs are designed
> for them, right?
> 
> Please refer to this series.
> 
> https://lkml.org/lkml/2019/9/6/39
> 
> Best regards,
> baolu

Oh, thanks for that! I didn't realize Linux is doing this.

So it seems that with modern Linux, all one needs
to do on x86 is mark the device as untrusted.
It's already possible to do this with ACPI and with OF - would that be
sufficient for achieving what this patchset is trying to do?

Adding more ways to mark a device as untrusted, and adding
support for more platforms to use bounce buffers
sounds like a reasonable thing to do.
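As a sketch, the per-device policy being described boils down to
something like this (struct fake_dev and pick_dma_policy are names
invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* An untrusted device (e.g. marked so via ACPI or OF firmware
 * properties) always gets strict IOMMU mappings plus swiotlb bouncing
 * for sub-page buffers, even when the system default is iommu=pt. */
enum dma_policy { DMA_PASSTHROUGH, DMA_STRICT_BOUNCE };

struct fake_dev {
	bool untrusted;
};

static enum dma_policy pick_dma_policy(const struct fake_dev *dev,
				       bool system_passthrough)
{
	if (dev->untrusted)
		return DMA_STRICT_BOUNCE;

	return system_passthrough ? DMA_PASSTHROUGH : DMA_STRICT_BOUNCE;
}
```

This matches the wifi-vs-graphics example above: the exposed, slow
device gets the bounce path while the trusted one keeps passthrough.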

> > 
> > On the other hand, I am talking about things like thunderbolt
> > peripherals being less trusted than on-board ones.
> 
> 
> 
> > 
> > Or possibly even using swiotlb for specific use-cases where
> > speed is less of an issue.
> > 
> > E.g. my wifi is pretty slow anyway, and that card is exposed to
> > malicious actors all the time, put just that behind swiotlb
> > for security, and leave my graphics card with pt since
> > I'm trusting it with secrets anyway.
> > 
> > 
> > > and
> > > iotlb flush is in strict mode (no batched flushes); ATS is also not
> > > allowed. Swiotlb is used to protect sub-page buffers since IOMMU can
> > > only apply page granularity protection. Swiotlb is now used for devices
> > > from different trust zone.
> > > 
> > > Best regards,
> > > baolu
> > 



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
> On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> > On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > > * Michael S. Tsirkin  [2020-04-28 12:17:57]:
> > > 
> > > > Okay, but how is all this virtio specific?  For example, why not allow
> > > > separate swiotlbs for any type of device?
> > > > For example, this might make sense if a given device is from a
> > > > different, less trusted vendor.
> > > Is swiotlb commonly used for multiple devices that may be on different
> > > trust boundaries (and not behind a hardware iommu)?
> > Even a hardware iommu does not imply a 100% security from malicious
> > hardware. First lots of people use iommu=pt for performance reasons.
> > Second even without pt, unmaps are often batched, and sub-page buffers
> > might be used for DMA, so we are not 100% protected at all times.
> > 
> 
> For untrusted devices, IOMMU is forced on even when iommu=pt is used;

I think you are talking about untrusted *drivers* like with VFIO.

On the other hand, I am talking about things like thunderbolt
peripherals being less trusted than on-board ones.

Or possibly even using swiotlb for specific use-cases where
speed is less of an issue.

E.g. my wifi is pretty slow anyway, and that card is exposed to
malicious actors all the time, put just that behind swiotlb
for security, and leave my graphics card with pt since
I'm trusting it with secrets anyway.


> and
> iotlb flush is in strict mode (no batched flushes); ATS is also not
> allowed. Swiotlb is used to protect sub-page buffers since IOMMU can
> only apply page granularity protection. Swiotlb is now used for devices
> from different trust zone.
> 
> Best regards,
> baolu



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Michael S. Tsirkin
On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> * Michael S. Tsirkin  [2020-04-28 12:17:57]:
> 
> > Okay, but how is all this virtio specific?  For example, why not allow
> > separate swiotlbs for any type of device?
> > For example, this might make sense if a given device is from a
> > different, less trusted vendor.
> 
> Is swiotlb commonly used for multiple devices that may be on different trust
> boundaries (and not behind a hardware iommu)?

Even a hardware iommu does not imply a 100% security from malicious
hardware. First lots of people use iommu=pt for performance reasons.
Second even without pt, unmaps are often batched, and sub-page buffers
might be used for DMA, so we are not 100% protected at all times.


> If so, then yes it sounds like a
> good application of multiple swiotlb pools.
> 
> > All this can then maybe be hidden behind the DMA API.
> 
> Won't we still need some changes to virtio to make use of its own pool (to
> bounce buffers)? Something similar to its own DMA ops proposed in this patch?

If you are doing this for all devices, you need to either find a way
to do this without changing DMA ops, or make some automatic change
to all drivers.


> > > +void virtio_bounce_set_dma_ops(struct virtio_device *vdev)
> > > +{
> > > + if (!bounce_buf_paddr)
> > > + return;
> > > +
> > > + set_dma_ops(vdev->dev.parent, &virtio_dma_ops);
> > 
> > 
> > I don't think DMA API maintainers will be happy with new users
> > of set_dma_ops.
> 
> Is there an alternate API that is preferred?

All of this is supposed to be part of the DMA API itself; new drivers
aren't supposed to have custom DMA ops.

> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Michael S. Tsirkin
On Tue, Apr 28, 2020 at 05:09:18PM +0530, Srivatsa Vaddagiri wrote:
> For better security, it's desirable that a guest VM's memory is
> not accessible to any entity that executes outside the context of
> guest VM. In case of virtio, backend drivers execute outside the
> context of guest VM and in general will need access to complete
> guest VM memory.  One option to restrict the access provided to
> backend driver is to make use of a bounce buffer. The bounce
> buffer is accessible to both backend and frontend drivers. All IO
> buffers that are in private space of guest VM are bounced to be
> accessible to backend.
> 
> This patch proposes a new memory pool to be used for this bounce
> purpose, rather than the default swiotlb memory pool. That will
> avoid any conflicts that may arise in situations where a VM needs
> to use swiotlb pool for driving any pass-through devices (in
> which case swiotlb memory need not be shared with another VM) as
> well as virtio devices (which will require swiotlb memory to be
> shared with backend VM). As a possible extension to this patch,
> we can provide an option for virtio to make use of default
> swiotlb memory pool itself, where no such conflicts may exist in
> a given deployment.
> 
> Signed-off-by: Srivatsa Vaddagiri 


Okay, but how is all this virtio specific?  For example, why not allow
separate swiotlbs for any type of device?
For example, this might make sense if a given device is from a
different, less trusted vendor.
All this can then maybe be hidden behind the DMA API.



> ---
>  drivers/virtio/Makefile|   2 +-
>  drivers/virtio/virtio.c|   2 +
>  drivers/virtio/virtio_bounce.c | 150 
> +
>  include/linux/virtio.h |   4 ++
>  4 files changed, 157 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/virtio/virtio_bounce.c
> 
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 29a1386e..3fd3515 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
> +obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o virtio_bounce.o
>  obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>  virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index a977e32..bc2f779 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -329,6 +329,7 @@ int register_virtio_device(struct virtio_device *dev)
>  
>   dev->index = err;
>   dev_set_name(&dev->dev, "virtio%u", dev->index);
> + virtio_bounce_set_dma_ops(dev);
>  
>   spin_lock_init(&dev->config_lock);
>   dev->config_enabled = false;
> @@ -431,6 +432,7 @@ EXPORT_SYMBOL_GPL(virtio_device_restore);
>  
>  static int virtio_init(void)
>  {
> + virtio_map_bounce_buffer();
>   if (bus_register(&virtio_bus) != 0)
>   panic("virtio bus registration failed");
>   return 0;
> diff --git a/drivers/virtio/virtio_bounce.c b/drivers/virtio/virtio_bounce.c
> new file mode 100644
> index 000..3de8e0e
> --- /dev/null
> +++ b/drivers/virtio/virtio_bounce.c
> @@ -0,0 +1,150 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Virtio DMA ops to bounce buffers
> + *
> + *  Copyright (c) 2020, The Linux Foundation. All rights reserved.
> + *
> + * This module allows bouncing of IO buffers to a region which will be
> + * accessible to backend drivers.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static phys_addr_t bounce_buf_paddr;
> +static void *bounce_buf_vaddr;
> +static size_t bounce_buf_size;
> +struct swiotlb_pool *virtio_pool;
> +
> +#define VIRTIO_MAX_BOUNCE_SIZE   (16*4096)
> +
> +static void *virtio_alloc_coherent(struct device *dev, size_t size,
> + dma_addr_t *dma_handle, gfp_t gfp_flags, unsigned long attrs)
> +{
> + phys_addr_t addr;
> +
> + if (!virtio_pool)
> + return NULL;
> +
> + addr = swiotlb_alloc(virtio_pool, size, bounce_buf_paddr, ULONG_MAX);
> + if (addr == DMA_MAPPING_ERROR)
> + return NULL;
> +
> + *dma_handle = (addr - bounce_buf_paddr);
> +
> + return bounce_buf_vaddr + (addr - bounce_buf_paddr);
> +}
> +
> +static void virtio_free_coherent(struct device *dev, size_t size, void 
> *vaddr,
> + dma_addr_t dma_handle, unsigned long attrs)
> +{
> + phys_addr_t addr = (dma_handle + bounce_buf_paddr);
> +
> + swiotlb_free(virtio_pool, addr, size);
> +}
> +
> +static dma_addr_t virtio_map_page(struct device *dev, struct page *page,
> + unsigned long offset, size_t size,
> + enum dma_data_direction dir, unsigned long attrs)
> +{
> + void *ptr = page_address(page) + offset;
> + phys_addr_t paddr = virt_to_phys(ptr);
> + dma_addr_t handle;
> +
> + if (!virtio_pool)
> + 

Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-04-13 Thread Michael S. Tsirkin
On Fri, Feb 28, 2020 at 06:25:36PM +0100, Jean-Philippe Brucker wrote:
> diff --git a/include/uapi/linux/virtio_iommu.h 
> b/include/uapi/linux/virtio_iommu.h
> index 237e36a280cb..ec57d215086a 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -16,6 +16,7 @@
>  #define VIRTIO_IOMMU_F_BYPASS           3
>  #define VIRTIO_IOMMU_F_PROBE 4
>  #define VIRTIO_IOMMU_F_MMIO  5
> +#define VIRTIO_IOMMU_F_TOPOLOGY  6
>  
>  struct virtio_iommu_range_64 {
>   __le64  start;
> @@ -27,6 +28,12 @@ struct virtio_iommu_range_32 {
>   __le32  end;
>  };
>  
> +struct virtio_iommu_topo_config {
> + __le32  offset;

Any restrictions on offset? E.g. alignment?

> + __le32  num_items;
> + __le32  item_length;
> +};
> +
>  struct virtio_iommu_config {
>   /* Supported page sizes */
>   __le64  page_size_mask;
> @@ -36,6 +43,25 @@ struct virtio_iommu_config {
>   struct virtio_iommu_range_32    domain_range;
>   /* Probe buffer size */
>   __le32  probe_size;
> + struct virtio_iommu_topo_config topo_config;
> +};
> +
> +#define VIRTIO_IOMMU_TOPO_PCI_RANGE  0x1
> +#define VIRTIO_IOMMU_TOPO_ENDPOINT   0x2
> +
> +struct virtio_iommu_topo_pci_range {
> + __le16  type;
> + __le16  hierarchy;
> + __le16  requester_start;
> + __le16  requester_end;
> + __le32  endpoint_start;
> +};
> +
> +struct virtio_iommu_topo_endpoint {
> + __le16  type;
> + __le16  reserved;
> + __le32  endpoint;
> + __le64  address;
>  };
>  
>  /* Request types */

As any UAPI change, this needs to be copied to virtio TC.

I believe an old version of QEMU patches was published there
but I don't think it was the latest one you tested against.

Description should preferably be added to spec too.

In particular, please add comments (in this header, too)
documenting the new fields, values and structures.

> -- 
> 2.25.0



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-11 Thread Michael S. Tsirkin
On Wed, Mar 11, 2020 at 06:48:22PM +0100, Jean-Philippe Brucker wrote:
> Yes "not elegant" in part because of the PCI fixup. Fixups are used to
> work around bugs

Not really - they are for anything unusual that common PCI code cannot
handle on its own.

-- 
MST



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-05 Thread Michael S. Tsirkin
On Wed, Mar 04, 2020 at 10:54:49PM +0100, Joerg Roedel wrote:
> On Wed, Mar 04, 2020 at 01:37:44PM -0800, Jacob Pan (Jun) wrote:
> > + Mike and Erik who work closely on UEFI and ACPICA.
> > 
> > Copy paste Erik's initial response below on how to get a new table,
> > seems to confirm with the process you stated above.
> > 
> > "Fairly easy. You reserve a 4-letter symbol by sending a message
> > requesting to reserve the signature to Mike or the ASWG mailing list or
> > i...@acpi.info
> 
> Great! I think this is going to be the path forward here.
> 
> Regards,
> 
>   Joerg

I don't, I don't see why we should bother with ACPI. This is a PV device
anyway, we can just make it self-describing. The need for extra firmware
on some platforms is a bug not a feature. No point in emulating that.

> > 
> > There is also another option. You can have ASWG own this new table so
> > that not one entity or company owns the new table."



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-05 Thread Michael S. Tsirkin
On Wed, Mar 04, 2020 at 10:50:02PM +0100, Joerg Roedel wrote:
> On Wed, Mar 04, 2020 at 02:34:33PM -0500, Michael S. Tsirkin wrote:
> > All these extra levels of indirection are one of the reasons
> > hypervisors such as kata try to avoid ACPI.
> 
> Platforms that don't use ACPI need another hardware detection mechanism,
> which can also be supported. But the first step here is to enable the
> general case, and for x86 platforms this means ACPI.
> 
> Regards,
> 
>   Joerg

Frankly since a portable way to do it is needed anyway I don't see why
we also need ACPI.

-- 
MST



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-04 Thread Michael S. Tsirkin
On Wed, Mar 04, 2020 at 02:37:08PM +0100, Joerg Roedel wrote:
> Hi Michael,
> 
> On Tue, Mar 03, 2020 at 11:09:41AM -0500, Michael S. Tsirkin wrote:
> > No. It's coded into the hardware. Which might even be practical
> > for bare-metal (e.g. on-board flash), but is very practical
> > when the device is part of a hypervisor.
> 
> If its that way on PPC, than fine for them. But since this is enablement
> for x86, it should follow the x86 platform best practices, and that
> means describing hardware through ACPI.

For hardware, sure.  Hypervisors aren't hardware
though and a bunch of hypervisors don't use ACPI.


> > This "hardware" is actually part of hypervisor so there's no
> > reason it can't be completely self-descriptive. It's specified
> > by the same entity as the "firmware".
> 
> That is just an implementation detail. Yes, QEMU emulates the hardware
> and builds the ACPI tables. But it could also be implemented in a way
> where the ACPI tables are build by guest firmware.

All these extra levels of indirection are one of the reasons
hypervisors such as kata try to avoid ACPI.

> > I don't see why it would be much faster. The interface isn't that
> > different from command queues of VTD.
> 
> VirtIO IOMMU doesn't need to build page-tables that the hypervisor then
> has to shadow, which makes things much faster. If you emulate one of the
> other IOMMUs (VT-d or AMD-Vi) the code has to shadow the full page-table
> at once when device passthrough is used. VirtIO-IOMMU doesn't need that,
> and that makes it much faster and efficient.


IIUC VT-d at least supports range invalidations.

> 
> > Making ACPI meet the goals of embedded projects such as kata containers
> > would be a gigantic task with huge stability implications.  By
> > comparison this 400-line parser is well contained and does the job.  I
> > didn't yet see compelling reasons not to merge this, but I'll be
> > interested to see some more specific concerns.
> 
> An ACPI table parse wouldn't need more lines of code.

It relies on the ACPI OSPM itself to handle ACPI bytecode, which is huge.


> For embedded
> systems there is still the DT way of describing things.

For some embedded systems.

> Regards,
> 
>   Joerg



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-03 Thread Michael S. Tsirkin
On Tue, Mar 03, 2020 at 04:53:19PM +0100, Joerg Roedel wrote:
> On Tue, Mar 03, 2020 at 09:00:05AM -0500, Michael S. Tsirkin wrote:
> > Not necessarily. E.g. some power systems have neither.
> > There are also systems looking to bypass ACPI e.g. for boot speed.
> 
> If there is no firmware layer between the hardware and the OS the
> necessary information the OS needs to run on the hardware is probably
> hard-coded into the kernel?

No. It's coded into the hardware. Which might even be practical
for bare-metal (e.g. on-board flash), but is very practical
when the device is part of a hypervisor.

> In that case the same can be done with
> virtio-iommu tolopology.
> 
> > That sentence doesn't really answer the question, does it?
> 
> To be more elaborate, putting this information into config space is a
> layering violation. Hardware is never completely self-descriptive


This "hardware" is actually part of hypervisor so there's no
reason it can't be completely self-descriptive. It's specified
by the same entity as the "firmware".


> and
> that is why there is the firmware which provides the information about
> the hardware to the OS in a generic way.
>
> > Frankly with platform specific interfaces like ACPI, virtio-iommu is
> > much less compelling.  Describing topology as part of the device in a
> > way that is first, portable, and second, is a good fit for hypervisors,
> > is to me one of the main reasons virtio-iommu makes sense at all.
> 
> Virtio-IOMMU makes sense in the first place because it is much faster
> than emulating one of the hardware IOMMUs.

I don't see why it would be much faster. The interface isn't that
different from command queues of VTD.

> And an ACPI table is also
> portable to all ACPI platforms, same with device-tree.
> 
> Regards,
> 
>   Joerg

Making ACPI meet the goals of embedded projects such as kata containers
would be a gigantic task with huge stability implications.  By
comparison this 400-line parser is well contained and does the job.  I
didn't yet see compelling reasons not to merge this, but I'll be
interested to see some more specific concerns.


-- 
MST



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-03 Thread Michael S. Tsirkin
On Mon, Mar 02, 2020 at 05:16:12PM +0100, Joerg Roedel wrote:
> On Fri, Feb 28, 2020 at 06:25:36PM +0100, Jean-Philippe Brucker wrote:
> > This solution isn't elegant nor foolproof, but is the best we can do at
> > the moment and works with existing virtio-iommu implementations. It also
> > enables an IOMMU for lightweight hypervisors that do not rely on
> > firmware methods for booting.
> 
> I appreciate the enablement on x86, but putting the configuration into
> mmio-space isn't really something I want to see upstream.

It's in virtio config space, not in mmio-space. With a PCI virtio-IOMMU
device this will be in memory.

> What is the
> problem with defining an ACPI table instead? This would also make things
> work on AARCH64 UEFI machines.
> 
> Regards,
> 
>   Joerg



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-03 Thread Michael S. Tsirkin
On Tue, Mar 03, 2020 at 02:01:56PM +0100, Joerg Roedel wrote:
> Hi Eric,
> 
> On Tue, Mar 03, 2020 at 11:19:20AM +0100, Auger Eric wrote:
> > Michael has pushed this solution (putting the "configuration in the PCI
> > config space"), I think for those main reasons:
> > - ACPI may not be supported on some archs/hyps
> 
> But on those there is device-tree, right?

Not necessarily. E.g. some power systems have neither.
There are also systems looking to bypass ACPI e.g. for boot speed.


> > - the virtio-iommu is a PCIe device so binding should not need ACPI
> > description
> 
> The other x86 IOMMUs are PCI devices too and they definitely need an ACPI
> table to be configured.
> 
> > Another issue with ACPI integration is we have different flavours of
> > tables that exist: IORT, DMAR, IVRS
> 
> An integration with IORT might be the best, but if it is not possible
> there can be a new table-type for Virtio-iommu. That would still follow
> platform best practices.
> 
> > x86 ACPI integration was suggested with IORT. But it seems ARM is
> > reluctant to extend IORT to support para-virtualized IOMMU. So the VIOT
> > was proposed as a different atternative in "[RFC 00/13] virtio-iommu on
> > non-devicetree platforms"
> > (https://patchwork.kernel.org/cover/11257727/). Proposing a table that
> > may be used by different vendors seems to be a challenging issue here.
> 
> Yeah, if I am reading that right this proposes a one-fits-all solution.
> That is not needed as the other x86 IOMMUs already have their tables
> defined and implemented. There is no need to change anything there.
> 
> > So even if the above solution does not look perfect, it looked a
> > sensible compromise given the above arguments. Please could you explain
> > what are the most compelling arguments against it?
> 
> It is a hack and should be avoided if at all possible.

That sentence doesn't really answer the question, does it?

> And defining its
> own ACPI table type seems very much possible.

Frankly with platform specific interfaces like ACPI, virtio-iommu is
much less compelling.  Describing topology as part of the device in a
way that is first, portable, and second, is a good fit for hypervisors,
is to me one of the main reasons virtio-iommu makes sense at all.

> 
> Regards,
> 
>   Joerg



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-01 Thread Michael S. Tsirkin
On Fri, Feb 28, 2020 at 06:25:36PM +0100, Jean-Philippe Brucker wrote:
> Platforms without device-tree do not currently have a method for
> describing the vIOMMU topology. Provide a topology description embedded
> into the virtio device.
> 
> Use PCI FIXUP to probe the config space early, because we need to
> discover the topology before any DMA configuration takes place, and the
> virtio driver may be loaded much later. Since we discover the topology
> description when probing the PCI hierarchy, the virtual IOMMU cannot
> manage other platform devices discovered earlier.
> 
> This solution isn't elegant nor foolproof, but is the best we can do at
> the moment and works with existing virtio-iommu implementations. It also
> enables an IOMMU for lightweight hypervisors that do not rely on
> firmware methods for booting.
> 
> Signed-off-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  MAINTAINERS   |   2 +
>  drivers/iommu/Kconfig |  10 +
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/virtio-iommu-topology.c | 343 ++
>  drivers/iommu/virtio-iommu.c  |   3 +
>  include/linux/virt_iommu.h|  19 ++
>  include/uapi/linux/virtio_iommu.h |  26 ++
>  7 files changed, 404 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu-topology.c
>  create mode 100644 include/linux/virt_iommu.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index fcd79fc38928..65a03ce53096 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -17781,6 +17781,8 @@ M:Jean-Philippe Brucker 
>  L:   virtualizat...@lists.linux-foundation.org
>  S:   Maintained
>  F:   drivers/iommu/virtio-iommu.c
> +F:   drivers/iommu/virtio-iommu-topology.c
> +F:   include/linux/virt_iommu.h
>  F:   include/uapi/linux/virtio_iommu.h
>  
>  VIRTUAL BOX GUEST DEVICE DRIVER
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index c5df570ef84a..f8cb45d84bb0 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -516,4 +516,14 @@ config VIRTIO_IOMMU
>  
> Say Y here if you intend to run this kernel as a guest.
>  
> +config VIRTIO_IOMMU_TOPOLOGY
> + bool "Topology properties for the virtio-iommu"
> + depends on VIRTIO_IOMMU
> + default y
> + help
> +   Enable early probing of the virtio-iommu device, to detect the
> +   built-in topology description.
> +
> +   Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 9f33fdb3bb05..5da24280d08c 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -37,3 +37,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o
> diff --git a/drivers/iommu/virtio-iommu-topology.c 
> b/drivers/iommu/virtio-iommu-topology.c
> new file mode 100644
> index ..2188624ef216
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu-topology.c
> @@ -0,0 +1,343 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct viommu_cap_config {
> + u8 bar;
> + u32 length; /* structure size */
> + u32 offset; /* structure offset within the bar */
> +};
> +
> +union viommu_topo_cfg {
> + __le16  type;
> + struct virtio_iommu_topo_pci_range  pci;
> + struct virtio_iommu_topo_endpoint   ep;
> +};
> +
> +struct viommu_spec {
> + struct device   *dev; /* transport device */
> + struct fwnode_handle*fwnode;
> + struct iommu_ops*ops;
> + struct list_headlist;
> + size_t  num_items;
> + /* The config array of length num_items follows */
> + union viommu_topo_cfg   cfg[];
> +};
> +
> +static LIST_HEAD(viommus);
> +static DEFINE_MUTEX(viommus_lock);
> +
> +#define VPCI_FIELD(field) offsetof(struct virtio_pci_cap, field)
> +
> +static inline int viommu_pci_find_capability(struct pci_dev *dev, u8 
> cfg_type,
> +  struct viommu_cap_config *cap)
> +{
> + int pos;
> + u8 bar;
> +
> + for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> +  pos > 0;
> +  pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> + u8 type;
> +
> + pci_read_config_byte(dev, pos + VPCI_FIELD(cfg_type), &type);
> + if (type != cfg_type)
> + continue;
> +
> + pci_read_config_byte(dev, pos + VPCI_FIELD(bar), &bar);
> +
> + /* Ignore structures with reserved BAR values */
> +   

Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-24 Thread Michael S. Tsirkin
On Mon, Feb 24, 2020 at 05:26:20PM +0800, Jason Wang wrote:
> 
> On 2020/2/24 下午3:48, Michael S. Tsirkin wrote:
> > On Mon, Feb 24, 2020 at 02:45:03PM +0800, Jason Wang wrote:
> > > On 2020/2/24 下午2:06, Michael S. Tsirkin wrote:
> > > > On Mon, Feb 24, 2020 at 12:01:57PM +0800, Jason Wang wrote:
> > > > > On 2020/2/21 下午10:56, Halil Pasic wrote:
> > > > > > On Fri, 21 Feb 2020 14:22:26 +0800
> > > > > > Jason Wang  wrote:
> > > > > > 
> > > > > > > On 2020/2/21 上午12:06, Halil Pasic wrote:
> > > > > > > > Currently if one intends to run a memory protection enabled VM 
> > > > > > > > with
> > > > > > > > virtio devices and linux as the guest OS, one needs to specify 
> > > > > > > > the
> > > > > > > > VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the 
> > > > > > > > guest
> > > > > > > > linux use the DMA API, which in turn handles the memory
> > > > > > > > encryption/protection stuff if the guest decides to turn itself 
> > > > > > > > into
> > > > > > > > a protected one. This however makes no sense due to multiple 
> > > > > > > > reasons:
> > > > > > > > * The device is not changed by the fact that the guest RAM is
> > > > > > > > protected. The so called IOMMU bypass quirk is not affected.
> > > > > > > > * This usage is not congruent with  standardised semantics of
> > > > > > > > VIRTIO_F_IOMMU_PLATFORM. Guest memory protected is an 
> > > > > > > > orthogonal reason
> > > > > > > > for using DMA API in virtio (orthogonal with respect to what is
> > > > > > > > expressed by VIRTIO_F_IOMMU_PLATFORM).
> > > > > > > > 
> > > > > > > > This series aims to decouple 'have to use DMA API because my 
> > > > > > > > (guest) RAM
> > > > > > > > is protected' and 'have to use DMA API because the device told 
> > > > > > > > me
> > > > > > > > VIRTIO_F_IOMMU_PLATFORM'.
> > > > > > > > 
> > > > > > > > Please find more detailed explanations about the conceptual 
> > > > > > > > aspects in
> > > > > > > > the individual patches. There is however also a very practical 
> > > > > > > > problem
> > > > > > > > that is addressed by this series.
> > > > > > > > 
> > > > > > > > For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the 
> > > > > > > > following side
> > > > > > > > effect: the vhost code assumes that the addresses on the virtio 
> > > > > > > > descriptor
> > > > > > > > ring are not guest physical addresses but iova's, and insists 
> > > > > > > > on doing a
> > > > > > > > translation of these regardless of what transport is used (e.g. 
> > > > > > > > whether
> > > > > > > > we emulate a PCI or a CCW device). (For details see commit 
> > > > > > > > 6b1e6cc7855b
> > > > > > > > "vhost: new device IOTLB API".) On s390 this results in severe
> > > > > > > > performance degradation (c.a. factor 10).
> > > > > > > Do you see a consistent degradation in performance, or does it only
> > > > > > > happen at the beginning of the test?
> > > > > > > 
> > > > > > AFAIK the degradation is consistent.
> > > > > > 
> > > > > > > > BTW with ccw I/O there is
> > > > > > > > (architecturally) no IOMMU, so the whole address translation 
> > > > > > > > makes no
> > > > > > > > sense in the context of virtio-ccw.
> > > > > > > I suspect we can do optimization in qemu side.
> > > > > > > 
> > > > > > > E.g send memtable entry via IOTLB API when vIOMMU is not enabled.
> > > > > > > 
> > > > > > > If this makes sense, I can draft patch to see if there's any 
> > > > > > > difference.
> > > > > > Frankly I would

Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-23 Thread Michael S. Tsirkin
On Mon, Feb 24, 2020 at 02:45:03PM +0800, Jason Wang wrote:
> 
> On 2020/2/24 下午2:06, Michael S. Tsirkin wrote:
> > On Mon, Feb 24, 2020 at 12:01:57PM +0800, Jason Wang wrote:
> > > On 2020/2/21 下午10:56, Halil Pasic wrote:
> > > > On Fri, 21 Feb 2020 14:22:26 +0800
> > > > Jason Wang  wrote:
> > > > 
> > > > > On 2020/2/21 上午12:06, Halil Pasic wrote:
> > > > > > Currently if one intends to run a memory protection enabled VM with
> > > > > > virtio devices and linux as the guest OS, one needs to specify the
> > > > > > VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the 
> > > > > > guest
> > > > > > linux use the DMA API, which in turn handles the memory
> > > > > > encryption/protection stuff if the guest decides to turn itself into
> > > > > > a protected one. This however makes no sense due to multiple 
> > > > > > reasons:
> > > > > > * The device is not changed by the fact that the guest RAM is
> > > > > > protected. The so called IOMMU bypass quirk is not affected.
> > > > > > * This usage is not congruent with  standardised semantics of
> > > > > > VIRTIO_F_IOMMU_PLATFORM. Guest memory protected is an orthogonal 
> > > > > > reason
> > > > > > for using DMA API in virtio (orthogonal with respect to what is
> > > > > > expressed by VIRTIO_F_IOMMU_PLATFORM).
> > > > > > 
> > > > > > This series aims to decouple 'have to use DMA API because my 
> > > > > > (guest) RAM
> > > > > > is protected' and 'have to use DMA API because the device told me
> > > > > > VIRTIO_F_IOMMU_PLATFORM'.
> > > > > > 
> > > > > > Please find more detailed explanations about the conceptual aspects 
> > > > > > in
> > > > > > the individual patches. There is however also a very practical 
> > > > > > problem
> > > > > > that is addressed by this series.
> > > > > > 
> > > > > > For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following 
> > > > > > side
> > > > > > effect: the vhost code assumes that the addresses on the virtio 
> > > > > > descriptor
> > > > > > ring are not guest physical addresses but iova's, and insists on 
> > > > > > doing a
> > > > > > translation of these regardless of what transport is used (e.g. 
> > > > > > whether
> > > > > > we emulate a PCI or a CCW device). (For details see commit 
> > > > > > 6b1e6cc7855b
> > > > > > "vhost: new device IOTLB API".) On s390 this results in severe
> > > > > > performance degradation (c.a. factor 10).
> > > > > Do you see a consistent degradation in performance, or does it only
> > > > > happen during the beginning of the test?
> > > > > 
> > > > AFAIK the degradation is consistent.
> > > > 
> > > > > > BTW with ccw I/O there is
> > > > > > (architecturally) no IOMMU, so the whole address translation makes 
> > > > > > no
> > > > > > sense in the context of virtio-ccw.
> > > > > I suspect we can do optimization in qemu side.
> > > > > 
> > > > > E.g send memtable entry via IOTLB API when vIOMMU is not enabled.
> > > > > 
> > > > > If this makes sense, I can draft a patch to see if there's any
> > > > > difference.
> > > > Frankly I would prefer to avoid IOVAs on the descriptor ring (and the
> > > > then necessary translation) for virtio-ccw altogether. But Michael
> > > > voiced his opinion that we should mandate F_IOMMU_PLATFORM for devices
> > > > that could be used with guests running in protected mode. I don't share
> > > > his opinion, but that's an ongoing discussion.
> > > > 
> > > > Should we end up having to do translation from IOVA in vhost, we are
> > > > very interested in that translation being fast and efficient.
> > > > 
> > > > In that sense we would be very happy to test any optimization that aims
> > > > in that direction.
> > > > 
> > > > Thank you very much for your input!
> > > 
> > > Using the IOTLB API on platforms without IOMMU support is not intended.

Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-23 Thread Michael S. Tsirkin
On Mon, Feb 24, 2020 at 12:01:57PM +0800, Jason Wang wrote:
> 
> > On 2020/2/21 10:56 PM, Halil Pasic wrote:
> > On Fri, 21 Feb 2020 14:22:26 +0800
> > Jason Wang  wrote:
> > 
> > > On 2020/2/21 12:06 AM, Halil Pasic wrote:
> > > > Currently if one intends to run a memory protection enabled VM with
> > > > virtio devices and linux as the guest OS, one needs to specify the
> > > > VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the guest
> > > > linux use the DMA API, which in turn handles the memory
> > > > encryption/protection stuff if the guest decides to turn itself into
> > > > a protected one. This however makes no sense, for multiple reasons:
> > > > * The device is not changed by the fact that the guest RAM is
> > > > protected. The so-called IOMMU bypass quirk is not affected.
> > > > * This usage is not congruent with the standardised semantics of
> > > > VIRTIO_F_IOMMU_PLATFORM. Guest memory being protected is an orthogonal
> > > > reason for using the DMA API in virtio (orthogonal with respect to what
> > > > is expressed by VIRTIO_F_IOMMU_PLATFORM).
> > > > 
> > > > This series aims to decouple 'have to use DMA API because my (guest) RAM
> > > > is protected' and 'have to use DMA API because the device told me
> > > > VIRTIO_F_IOMMU_PLATFORM'.
> > > > 
> > > > Please find more detailed explanations about the conceptual aspects in
> > > > the individual patches. There is however also a very practical problem
> > > > that is addressed by this series.
> > > > 
> > > > For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side
> > > > effect: the vhost code assumes that the addresses on the virtio descriptor
> > > > ring are not guest physical addresses but IOVAs, and insists on doing a
> > > > translation of these regardless of what transport is used (e.g. whether
> > > > we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b
> > > > "vhost: new device IOTLB API".) On s390 this results in severe
> > > > performance degradation (a factor of ca. 10).
> > > 
> > > Do you see a consistent degradation in performance, or does it only
> > > happen during the beginning of the test?
> > > 
> > AFAIK the degradation is consistent.
> > 
> > > > BTW with ccw I/O there is
> > > > (architecturally) no IOMMU, so the whole address translation makes no
> > > > sense in the context of virtio-ccw.
> > > 
> > > I suspect we can do optimization in qemu side.
> > > 
> > > E.g send memtable entry via IOTLB API when vIOMMU is not enabled.
> > > 
> > > If this makes sense, I can draft a patch to see if there's any difference.
> > Frankly I would prefer to avoid IOVAs on the descriptor ring (and the
> > then necessary translation) for virtio-ccw altogether. But Michael
> > voiced his opinion that we should mandate F_IOMMU_PLATFORM for devices
> > that could be used with guests running in protected mode. I don't share
> > his opinion, but that's an ongoing discussion.
> > 
> > Should we end up having to do translation from IOVA in vhost, we are
> > very interested in that translation being fast and efficient.
> > 
> > In that sense we would be very happy to test any optimization that aims
> > in that direction.
> > 
> > Thank you very much for your input!
> 
> 
> Using the IOTLB API on platforms without IOMMU support is not intended. Please
> try the attached patch to see if it helps.
> 
> Thanks
> 
> 
> > 
> > Regards,
> > Halil
> > 
> > > Thanks
> > > 
> > > 
> > > > Halil Pasic (2):
> > > > mm: move force_dma_unencrypted() to mem_encrypt.h
> > > > virtio: let virtio use DMA API when guest RAM is protected
> > > > 
> > > >drivers/virtio/virtio_ring.c |  3 +++
> > > >include/linux/dma-direct.h   |  9 -
> > > >include/linux/mem_encrypt.h  | 10 ++
> > > >3 files changed, 13 insertions(+), 9 deletions(-)
> > > > 
> > > > 
> > > > base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2

> From 66fa730460875ac99e81d7db2334cd16bb1d2b27 Mon Sep 17 00:00:00 2001
> From: Jason Wang 
> Date: Mon, 24 Feb 2020 12:00:10 +0800
> Subject: [PATCH] virtio: turn on IOMMU_PLATFORM properly
> 
> When the transport does not support an IOMMU, we should clear IOMMU_PLATFORM
> even if the device and vhost claim to support it. This helps to
> avoid the performance overhead caused by unnecessary IOTLB miss/update
> transactions on such platforms.
> 
> Signed-off-by: Jason Wang 
> ---
>  hw/virtio/virtio-bus.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> index d6332d45c3..2741b9fdd2 100644
> --- a/hw/virtio/virtio-bus.c
> +++ b/hw/virtio/virtio-bus.c
> @@ -47,7 +47,6 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
>      VirtioBusState *bus = VIRTIO_BUS(qbus);
>      VirtioBusClass *klass = VIRTIO_BUS_GET_CLASS(bus);
>      VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
> -    bool has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
>      Error *local_err = 
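The diff above is cut off by the archive. Going by the commit message alone, the intended decision can be sketched as a standalone C function; the helper name and feature-mask handling below are illustrative assumptions, not the QEMU code:

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_IOMMU_PLATFORM 33

/* Sketch of the (truncated) patch's intent at device-plug time: keep
 * IOMMU_PLATFORM in the host features only when the device offers it AND
 * the transport can actually provide a DMA address space; otherwise clear
 * it, so the guest never negotiates the bit and no IOTLB miss/update
 * traffic is generated. Hypothetical helper, not QEMU's API. */
static uint64_t plug_features(uint64_t host_features, bool transport_has_dma_as)
{
    bool has_iommu = host_features & (1ULL << VIRTIO_F_IOMMU_PLATFORM);

    if (!(has_iommu && transport_has_dma_as))
        host_features &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
    return host_features;
}
```

The point of clearing the bit on the host side, rather than in the guest, is that the guest then never sees the feature at all and the vhost IOTLB path is never armed.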

Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-22 Thread Michael S. Tsirkin
On Fri, Feb 21, 2020 at 03:33:40PM +0100, Halil Pasic wrote:
> AFAIU you have a positive attitude towards the idea that the rule
> '!F_VIRTIO_PLATFORM implies no DMA API is used by virtio'
> should be scrapped.
> 
> I would like to accomplish that without adverse effects on virtio-ccw
> (because caring for virtio-ccw is part of my job description).
> 
> Regards,
> Halil

It is possible, in theory. IIRC the main challenge is that the DMA API
has the overhead of indirect function calls even when all it
does is return the PA unchanged. So that will lead to
a measurable performance degradation. That might be fixable,
possibly using some kind of logic along the lines of

	if (iova is pa)
		return pa
	else
		indirect call

and for unmapping, we might need an API that says "unmap
is a nop, safe to skip" so we don't have to keep the
DMA address around until unmap time.
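The fast path sketched in that pseudocode can be written out concretely. The types and flag below are made up for illustration (this is not the kernel's DMA API); the point is only that a cached boolean check replaces an indirect call on the identity-mapped path:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;
typedef uint64_t phys_addr_t;

/* Hypothetical device descriptor -- not a kernel type. 'identity_mapped'
 * stands in for "iova is pa": the DMA layer is known to be a direct
 * mapping, so the indirect ops call can be skipped. */
struct dma_dev {
    bool identity_mapped;
    dma_addr_t (*map)(phys_addr_t pa); /* the indirect call being avoided */
};

/* Example non-trivial mapper, standing in for real IOMMU ops. */
static dma_addr_t bounce_map(phys_addr_t pa)
{
    return pa + 0x100000; /* pretend the IOVA is offset from the PA */
}

/* The fast path from the mail: branch on a cached flag instead of
 * always paying for an indirect function call. */
static dma_addr_t map_fast(const struct dma_dev *dev, phys_addr_t pa)
{
    if (dev->identity_mapped)
        return (dma_addr_t)pa;  /* iova is pa */
    return dev->map(pa);        /* indirect call */
}
```

A "unmap is a nop" hint would work the same way: a second cached flag that lets callers skip storing the DMA address entirely.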


-- 
MST

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-21 Thread Michael S. Tsirkin
On Fri, Feb 21, 2020 at 02:12:30PM +0100, Halil Pasic wrote:
> On Thu, 20 Feb 2020 15:55:14 -0500
> "Michael S. Tsirkin"  wrote:
> 
> > On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote:
> > > Currently the advanced guest memory protection technologies (AMD SEV,
> > > powerpc secure guest technology and s390 Protected VMs) abuse the
> > > VIRTIO_F_IOMMU_PLATFORM flag to make the virtio core use the DMA API,
> > > which is in turn necessary to make I/O work with guest memory protection.
> > > 
> > > But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a
> > > different beast: with virtio devices whose implementation runs on an SMP
> > > CPU we are still fine with doing all the usual optimizations, it is just
> > > that we need to make sure that the memory protection mechanism does not
> > > get in the way. VIRTIO_F_ACCESS_PLATFORM mandates more work on the
> > > side of the guest (and possibly the host side as well) than we actually
> > > need.
> > > 
> > > An additional benefit of teaching the guest to make the right decision
> > > (and use the DMA API) on its own is that it removes the need to mandate
> > > a special VM configuration for guests that may run with protection. This
> > > is especially interesting for s390, as VIRTIO_F_IOMMU_PLATFORM pushes all
> > > the virtio control structures into the first 2G of guest memory:
> > > something we don't necessarily want to do by default.
> > > 
> > > Signed-off-by: Halil Pasic 
> > > Tested-by: Ram Pai 
> > > Tested-by: Michael Mueller 
> > 
> > This might work for you but it's fragile, since without
> > VIRTIO_F_ACCESS_PLATFORM the hypervisor assumes it gets
> > GPAs, not DMA addresses.
> > 
> 
> Thanks for your constructive approach. I do want the hypervisor to
> assume it gets GPAs. My train of thought was that the guys that need
> to use IOVAs that are not GPAs when force_dma_unencrypted() is true will
> have to specify VIRTIO_F_ACCESS_PLATFORM (at the device) anyway, because
> otherwise it won't work. But I see your point: in case of a
> mis-configuration, and provided the DMA API returns IOVAs, one could end
> up trying to touch wrong memory locations. But this should be similar to
> what would happen if DMA ops are not used, and memory is not made accessible.
> 
> > 
> > 
> > IOW this looks like another iteration of:
> > 
> > virtio: Support encrypted memory on powerpc secure guests
> > 
> > which I was under the impression was abandoned as unnecessary.
> 
> Unnecessary for powerpc because they do normal PCI. In the context of
> CCW there are only guest physical addresses (CCW I/O has no concept of
> IOMMU or IOVAs).
> 
> > 
> > 
> > To summarize, the necessary conditions for a hack along these lines
> > (using DMA API without VIRTIO_F_ACCESS_PLATFORM) are that we detect that:
> > 
> >   - secure guest mode is enabled - so we know that, since we don't share
> >     most memory, regular virtio code won't work, even though the buggy
> >     hypervisor didn't set VIRTIO_F_ACCESS_PLATFORM
> 
> force_dma_unencrypted(&vdev->dev) is IMHO exactly about this.
> 
> >   - DMA API is giving us addresses that are actually also physical
> > addresses
> 
> In case of s390 this is given.
> I talked with the power people before
> posting this, and they assured me they are willing to deal with
> this. I was hoping to talk about this with the AMD SEV people here (hence
> the cc).

We'd need a part of DMA API that promises this though. Platform
maintainers aren't going to go out of their way to do the
right thing just for virtio, and I can't track all arches
to make sure they don't violate virtio requirements.

> 
> >   - Hypervisor is buggy and didn't enable VIRTIO_F_ACCESS_PLATFORM
> > 
> 
> I don't get this point. The argument that the hypervisor is buggy is a
> bit hard to follow for me. If the hypervisor is buggy we have already lost
> most of the time anyway, no?

If VIRTIO_F_ACCESS_PLATFORM is set then things just work.  If
VIRTIO_F_ACCESS_PLATFORM is clear device is supposed to have access to
all of memory.  You can argue in various ways but it's easier to just
declare a behaviour that violates this a bug. Which might still be worth
working around, for various reasons.


> > I don't see how this patch does this.
> 
> I do get your point. I don't know of a good way to check that DMA API
> is giving us addresses that are actually physical addresses, and the
> situation you describe definitely has some risk to it.

One way would be to extend the DMA API with such an A
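The suggestion above is truncated in the archive, so the exact API is unknown. One hedged guess at the shape of such a DMA API extension — a predicate virtio could query before relying on the hack, matching the conditions Michael listed earlier — purely illustrative, with invented names:

```c
#include <stdbool.h>

/* Hypothetical device sketch -- not a kernel type. 'direct_dma' stands
 * for "identity mapping in effect": no IOMMU, no offset, no swiotlb. */
struct dev_sketch {
    bool direct_dma;
};

/* Hypothetical DMA-API-level promise: for this device, DMA addresses
 * are plain physical addresses, so virtio descriptors can safely
 * carry GPAs even while going through the DMA API. */
static bool dma_addr_is_phys_addr(const struct dev_sketch *dev)
{
    return dev->direct_dma;
}

/* Per the conditions listed in the thread: using the DMA API without
 * VIRTIO_F_ACCESS_PLATFORM is only safe when the guest is in secure
 * mode AND the DMA API demonstrably hands back physical addresses. */
static bool dma_api_hack_is_safe(const struct dev_sketch *dev, bool secure_guest)
{
    return secure_guest && dma_addr_is_phys_addr(dev);
}
```

The value of making this an explicit API is exactly the point raised above: it turns an unstated per-architecture assumption into a promise that platform maintainers own.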

Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

2020-02-21 Thread Michael S. Tsirkin
On Fri, Feb 21, 2020 at 02:06:39PM +0100, Halil Pasic wrote:
> On Fri, 21 Feb 2020 14:27:27 +1100
> David Gibson  wrote:
> 
> > On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> > > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > > > From a user's perspective it makes absolutely perfect sense to use the
> > > > bounce buffers when they are NEEDED.
> > > > Forcing the user to specify iommu_platform just because you need bounce
> > > > buffers really feels wrong. And obviously we have a severe performance
> > > > issue because of the indirections.
> > > 
> > > The point is that the user should not have to specify iommu_platform.
> > > We need to make sure any new hypervisor (especially one that might require
> > > bounce buffering) always sets it,
> > 
> > So, I have draft qemu patches which enable iommu_platform by default.
> > But that's really because of other problems with !iommu_platform, not
> > anything to do with bounce buffering or secure VMs.
> > 
> > The thing is that the hypervisor *doesn't* require bounce buffering.
> > In the POWER (and maybe s390 as well) models for Secure VMs, it's the
> > *guest*'s choice to enter secure mode, so the hypervisor has no reason
> > to know whether the guest needs bounce buffering.  As far as the
> > hypervisor and qemu are concerned that's a guest internal detail, it
> > just expects to get addresses it can access whether those are GPAs
> > (iommu_platform=off) or IOVAs (iommu_platform=on).
> 
> I very much agree!
> 
> > 
> > > as was a rather bogus legacy hack
> > 
> > It was certainly a bad idea, but it was a bad idea that went into a
> > public spec and has been widely deployed for many years.  We can't
> > just pretend it didn't happen and move on.
> > 
> > Turning iommu_platform=on by default breaks old guests, some of which
> > we still care about.  We can't (automatically) do it only for guests
> > that need bounce buffering, because the hypervisor doesn't know that
> > ahead of time.
> 
> Turning iommu_platform=on for virtio-ccw makes no sense whatsoever,
> because for CCW I/O there is no such thing as IOMMU and the addresses
> are always physical addresses.

Fix the name then. The spec calls it ACCESS_PLATFORM now, which
makes much more sense.

> > 
> > > that isn't extensible for cases that for example require bounce buffering.
> > 
> > In fact bounce buffering isn't really the issue from the hypervisor
> > (or spec's) point of view.  It's the fact that not all of guest memory
> > is accessible to the hypervisor.  Bounce buffering is just one way the
> > guest might deal with that.
> > 
> 
> Agreed.
> 
> Regards,
> Halil
> 
> 
> 




Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-20 Thread Michael S. Tsirkin
On Thu, Feb 20, 2020 at 05:06:04PM +0100, Halil Pasic wrote:
> For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side
> effect: the vhost code assumes that the addresses on the virtio descriptor
> ring are not guest physical addresses but IOVAs, and insists on doing a
> translation of these regardless of what transport is used (e.g. whether
> we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b
> "vhost: new device IOTLB API".) On s390 this results in severe
> performance degradation (a factor of ca. 10). BTW with ccw I/O there is
> (architecturally) no IOMMU, so the whole address translation makes no
> sense in the context of virtio-ccw.

So it sounds like a host issue: the emulation of s390 is unnecessarily complicated.
Working around it in the guest looks wrong ...

> Halil Pasic (2):
>   mm: move force_dma_unencrypted() to mem_encrypt.h
>   virtio: let virtio use DMA API when guest RAM is protected
> 
>  drivers/virtio/virtio_ring.c |  3 +++
>  include/linux/dma-direct.h   |  9 -
>  include/linux/mem_encrypt.h  | 10 ++
>  3 files changed, 13 insertions(+), 9 deletions(-)
> 
> 
> base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2
> -- 
> 2.17.1



Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-20 Thread Michael S. Tsirkin
On Thu, Feb 20, 2020 at 05:06:04PM +0100, Halil Pasic wrote:
> * This usage is not congruent with the standardised semantics of
> VIRTIO_F_IOMMU_PLATFORM. Guest memory being protected is an orthogonal
> reason for using the DMA API in virtio (orthogonal with respect to what
> is expressed by VIRTIO_F_IOMMU_PLATFORM).

Quoting the spec:

  \item[VIRTIO_F_ACCESS_PLATFORM(33)] This feature indicates that
  the device can be used on a platform where device access to data
  in memory is limited and/or translated. E.g. this is the case if the device 
can be located
  behind an IOMMU that translates bus addresses from the device into physical
  addresses in memory, if the device can be limited to only access
  certain memory addresses or if special commands such as
  a cache flush can be needed to synchronise data in memory with
  the device. Whether accesses are actually limited or translated
  is described by platform-specific means.
  If this feature bit is set to 0, then the device
  has same access to memory addresses supplied to it as the
  driver has.
  In particular, the device will always use physical addresses
  matching addresses used by the driver (typically meaning
  physical addresses used by the CPU)
  and not translated further, and can access any address supplied to it by
  the driver. When clear, this overrides any platform-specific description of
  whether device access is limited or translated in any way, e.g.
  whether an IOMMU may be present.

Since the device can't access encrypted memory,
this seems to match your case reasonably well.
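Taken together with the spec text quoted above, the guest-side decision the thread argues about reduces to a small predicate: use the DMA API either because ACCESS_PLATFORM was negotiated or because guest RAM is protected. A minimal illustrative sketch (names are not the kernel's):

```c
#include <stdbool.h>

/* Guest-side decision table implied by the spec text and this thread.
 * Illustrative only: use the DMA API when the device negotiated
 * ACCESS_PLATFORM (addresses may be translated/limited), or when guest
 * RAM is protected (the device can't read it directly, so bouncing
 * through shared/unencrypted buffers is required either way). */
static bool virtio_use_dma_api(bool access_platform, bool ram_protected)
{
    if (access_platform)
        return true;   /* device sees IOVAs or limited access */
    if (ram_protected)
        return true;   /* "device can't access encrypted memory" */
    return false;      /* device gets plain GPAs; bypass is fine */
}
```

Halil's series effectively adds the second branch; the disagreement is over whether that branch belongs in the guest or should be implied by the feature bit.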

-- 
MST



Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-20 Thread Michael S. Tsirkin
On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote:
> Currently the advanced guest memory protection technologies (AMD SEV,
> powerpc secure guest technology and s390 Protected VMs) abuse the
> VIRTIO_F_IOMMU_PLATFORM flag to make the virtio core use the DMA API,
> which is in turn necessary to make I/O work with guest memory protection.
> 
> But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a
> different beast: with virtio devices whose implementation runs on an SMP
> CPU we are still fine with doing all the usual optimizations, it is just
> that we need to make sure that the memory protection mechanism does not
> get in the way. VIRTIO_F_ACCESS_PLATFORM mandates more work on the
> side of the guest (and possibly the host side as well) than we actually
> need.
> 
> An additional benefit of teaching the guest to make the right decision
> (and use the DMA API) on its own is that it removes the need to mandate
> a special VM configuration for guests that may run with protection. This
> is especially interesting for s390, as VIRTIO_F_IOMMU_PLATFORM pushes all
> the virtio control structures into the first 2G of guest memory:
> something we don't necessarily want to do by default.
> 
> Signed-off-by: Halil Pasic 
> Tested-by: Ram Pai 
> Tested-by: Michael Mueller 

This might work for you but it's fragile, since without
VIRTIO_F_ACCESS_PLATFORM the hypervisor assumes it gets
GPAs, not DMA addresses.



IOW this looks like another iteration of:

virtio: Support encrypted memory on powerpc secure guests

which I was under the impression was abandoned as unnecessary.


To summarize, the necessary conditions for a hack along these lines
(using DMA API without VIRTIO_F_ACCESS_PLATFORM) are that we detect that:

  - secure guest mode is enabled - so we know that, since we don't share
    most memory, regular virtio code won't work, even though the buggy
    hypervisor didn't set VIRTIO_F_ACCESS_PLATFORM
  - the DMA API is giving us addresses that are actually also physical
    addresses
  - the hypervisor is buggy and didn't enable VIRTIO_F_ACCESS_PLATFORM
 
I don't see how this patch does this.


> ---
>  drivers/virtio/virtio_ring.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 867c7ebd3f10..fafc8f924955 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
>  	if (!virtio_has_iommu_quirk(vdev))
>  		return true;
>  
> +	if (force_dma_unencrypted(&vdev->dev))
> +		return true;
> +
>  	/* Otherwise, we are left to guess. */
>  	/*
>  	 * In theory, it's possible to have a buggy QEMU-supposed
> -- 
> 2.17.1



Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-20 Thread Michael S. Tsirkin
On Thu, Feb 20, 2020 at 05:06:04PM +0100, Halil Pasic wrote:
> Currently if one intends to run a memory protection enabled VM with
> virtio devices and linux as the guest OS, one needs to specify the
> VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the guest
> linux use the DMA API, which in turn handles the memory
> encryption/protection stuff if the guest decides to turn itself into
> a protected one. This however makes no sense, for multiple reasons:
> * The device is not changed by the fact that the guest RAM is
> protected. The so-called IOMMU bypass quirk is not affected.
> * This usage is not congruent with the standardised semantics of
> VIRTIO_F_IOMMU_PLATFORM. Guest memory being protected is an orthogonal
> reason for using the DMA API in virtio (orthogonal with respect to what
> is expressed by VIRTIO_F_IOMMU_PLATFORM).
> 
> This series aims to decouple 'have to use DMA API because my (guest) RAM
> is protected' and 'have to use DMA API because the device told me
> VIRTIO_F_IOMMU_PLATFORM'.
> 
> Please find more detailed explanations about the conceptual aspects in
> the individual patches. There is however also a very practical problem
> that is addressed by this series. 
> 
> For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side
> effect: the vhost code assumes that the addresses on the virtio descriptor
> ring are not guest physical addresses but IOVAs, and insists on doing a
> translation of these regardless of what transport is used (e.g. whether
> we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b
> "vhost: new device IOTLB API".) On s390 this results in severe
> performance degradation (a factor of ca. 10). BTW with ccw I/O there is
> (architecturally) no IOMMU, so the whole address translation makes no
> sense in the context of virtio-ccw.

That's just a QEMU thing. It uses the same flag for VIRTIO_F_ACCESS_PLATFORM
and vhost IOTLB. QEMU can separate them; no need to change Linux.


> Halil Pasic (2):
>   mm: move force_dma_unencrypted() to mem_encrypt.h
>   virtio: let virtio use DMA API when guest RAM is protected
> 
>  drivers/virtio/virtio_ring.c |  3 +++
>  include/linux/dma-direct.h   |  9 -
>  include/linux/mem_encrypt.h  | 10 ++
>  3 files changed, 13 insertions(+), 9 deletions(-)
> 
> 
> base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2
> -- 
> 2.17.1



Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-17 Thread Michael S. Tsirkin
On Mon, Feb 17, 2020 at 01:22:44PM +, Robin Murphy wrote:
> On 17/02/2020 1:01 pm, Michael S. Tsirkin wrote:
> > On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote:
> > > On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote:
> > > > On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:
> > > > > On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:
> > > > > > With the built-in topology description in place, x86 platforms can 
> > > > > > now
> > > > > > use the virtio-iommu.
> > > > > > 
> > > > > > Signed-off-by: Jean-Philippe Brucker 
> > > > > > ---
> > > > > >drivers/iommu/Kconfig | 3 ++-
> > > > > >1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > > > > > index 068d4e0e3541..adcbda44d473 100644
> > > > > > --- a/drivers/iommu/Kconfig
> > > > > > +++ b/drivers/iommu/Kconfig
> > > > > > @@ -508,8 +508,9 @@ config HYPERV_IOMMU
> > > > > >config VIRTIO_IOMMU
> > > > > > bool "Virtio IOMMU driver"
> > > > > > depends on VIRTIO=y
> > > > > > -   depends on ARM64
> > > > > > +   depends on (ARM64 || X86)
> > > > > > select IOMMU_API
> > > > > > +   select IOMMU_DMA
> > > > > 
> > > > > Can that have an "if X86" for clarity? AIUI it's not necessary for
> > > > > virtio-iommu itself (and really shouldn't be), but is merely to 
> > > > > satisfy the
> > > > > x86 arch code's expectation that IOMMU drivers bring their own DMA 
> > > > > ops,
> > > > > right?
> > > > > 
> > > > > Robin.
> > > > 
> > > > In fact does not this work on any platform now?
> > > 
> > > There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU
> > > has been converted recently [1] but VT-d still implements its own DMA ops
> > > (conversion patches are on the list [2]). On Arm the arch Kconfig selects
> > > IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is
> > > complete. Until then I can add a "if X86" here for clarity.
> > > 
> > > Thanks,
> > > Jean
> > > 
> > > [1] 
> > > https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/
> > > [2] 
> > > https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/
> > 
> > What about others? E.g. PPC?
> 
> That was the point I was getting at - while iommu-dma should build just fine
> for the likes of PPC, s390, 32-bit Arm, etc., they have no architecture code
> to correctly wire up iommu_dma_ops to devices. Thus there's currently no
> point pulling it in and pretending it's anything more than a waste of space
> for architectures other than arm64 and x86. It's merely a historical
> artefact of the x86 DMA API implementation that when the IOMMU drivers were
> split out to form drivers/iommu they took some of their relevant arch code
> with them.
> 
> Robin.


Rather than white-listing architectures, how about making the
architectures in question set some kind of symbol, and depend on it?

-- 
MST



Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-17 Thread Michael S. Tsirkin
On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote:
> On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote:
> > On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:
> > > On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:
> > > > With the built-in topology description in place, x86 platforms can now
> > > > use the virtio-iommu.
> > > > 
> > > > Signed-off-by: Jean-Philippe Brucker 
> > > > ---
> > > >   drivers/iommu/Kconfig | 3 ++-
> > > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > > > index 068d4e0e3541..adcbda44d473 100644
> > > > --- a/drivers/iommu/Kconfig
> > > > +++ b/drivers/iommu/Kconfig
> > > > @@ -508,8 +508,9 @@ config HYPERV_IOMMU
> > > >   config VIRTIO_IOMMU
> > > > bool "Virtio IOMMU driver"
> > > > depends on VIRTIO=y
> > > > -   depends on ARM64
> > > > +   depends on (ARM64 || X86)
> > > > select IOMMU_API
> > > > +   select IOMMU_DMA
> > > 
> > > Can that have an "if X86" for clarity? AIUI it's not necessary for
> > > virtio-iommu itself (and really shouldn't be), but is merely to satisfy 
> > > the
> > > x86 arch code's expectation that IOMMU drivers bring their own DMA ops,
> > > right?
> > > 
> > > Robin.
> > 
> > In fact does not this work on any platform now?
> 
> There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU
> has been converted recently [1] but VT-d still implements its own DMA ops
> (conversion patches are on the list [2]). On Arm the arch Kconfig selects
> IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is
> complete. Until then I can add a "if X86" here for clarity.
> 
> Thanks,
> Jean
> 
> [1] https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/
> [2] 
> https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/

What about others? E.g. PPC?

-- 
MST



Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-16 Thread Michael S. Tsirkin
On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:
> On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:
> > With the built-in topology description in place, x86 platforms can now
> > use the virtio-iommu.
> > 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >   drivers/iommu/Kconfig | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 068d4e0e3541..adcbda44d473 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -508,8 +508,9 @@ config HYPERV_IOMMU
> >   config VIRTIO_IOMMU
> > bool "Virtio IOMMU driver"
> > depends on VIRTIO=y
> > -   depends on ARM64
> > +   depends on (ARM64 || X86)
> > select IOMMU_API
> > +   select IOMMU_DMA
> 
> Can that have an "if X86" for clarity? AIUI it's not necessary for
> virtio-iommu itself (and really shouldn't be), but is merely to satisfy the
> x86 arch code's expectation that IOMMU drivers bring their own DMA ops,
> right?
> 
> Robin.

In fact does not this work on any platform now?

> > select INTERVAL_TREE
> > help
> >   Para-virtualised IOMMU driver with virtio.
> > 



Re: [RFC 00/13] virtio-iommu on non-devicetree platforms

2019-11-22 Thread Michael S. Tsirkin
On Fri, Nov 22, 2019 at 11:49:47AM +0100, Jean-Philippe Brucker wrote:
> I'm seeking feedback on multi-platform support for virtio-iommu. At the
> moment only devicetree (DT) is supported and we don't have a pleasant
> solution for other platforms. Once we figure out the topology
> description, x86 support is trivial.
> 
> Since the IOMMU manages memory accesses from other devices, the guest
> kernel needs to initialize the IOMMU before endpoints start issuing DMA.
> It's a solved problem: firmware or hypervisor describes through DT or
> ACPI tables the device dependencies, and probe of endpoints is deferred
> until the IOMMU is probed. But:
> 
> (1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and IORT
> for Arm). From my point of view IORT is easier to extend, since we
> just need to introduce a new node type. There are no dependencies to
> Arm in the Linux IORT driver, so it works well with CONFIG_X86.
> 
> However, there are concerns about other OS vendors feeling obligated
> to implement this new node, so Arm proposed introducing another ACPI
> table, that can wrap any of DMAR, IVRS and IORT to extend it with
> new virtual nodes. A draft of this VIOT table specification is
> available at http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf
> 
> I'm afraid this could increase fragmentation as guests would need to
> implement or modify their support for all of DMAR, IVRS and IORT. If
> we end up doing VIOT, I suggest limiting it to IORT.
> 
> (2) In addition, there are some concerns about having virtio depend on
> ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool x86
> [1])

power?

> don't currently implement those methods.
> 
> It was suggested to embed the topology description into the device.
> It can work, as demonstrated at the end of this RFC, with the
> following limitations:
> 
> - The topology description must be read before any endpoint managed
>   by the IOMMU is probed, and even before the virtio module is
>   loaded. This RFC uses a PCI quirk to manually parse the virtio
>   configuration. It assumes that all endpoints managed by the IOMMU
>   are under this same PCI host.
> 
> - I don't have a solution for the virtio-mmio transport at the
>   moment, because I haven't had time to modify a host to test it. I
>   think it could either use a notifier on the platform bus, or
>   better, a new 'iommu' command-line argument to the virtio-mmio
>   driver.

A notifier seems easier for users. What are the disadvantages of
that?

>   So the current prototype doesn't work for firecracker and
>   microvm, which rely on virtio-mmio.
> 
> - For Arm, if the platform has an ITS, the hypervisor needs IORT or
>   DT to describe it anyway. More generally, not using either ACPI or
>   DT might prevent supporting other features as well. I suspect
>   the above users will have to implement a standard method sooner or
>   later.
> 
> - Even when reusing as much existing code as possible, guest support
>   is still going to be around a few hundred lines since we can't
>   rely on the normal virtio infrastructure to be loaded at that
>   point. As you can see below, the diffstat for the incomplete
>   topology implementation is already bigger than the exhaustive IORT
>   support, even when jumping through the VIOT hoop.
> 
> So it's a lightweight solution for very specific use-cases, and we
> should still support ACPI for the general case. Multi-platform
> guests such as Linux will then need to support three topology
> descriptions instead of two.
> 
> In this RFC I present both solutions, but I'd rather not keep all of it.
> Please see the individual patches for details:
> 
> (1) Patches 1, 3-10 add support for virtio-iommu to the Linux IORT
> driver and patches 2, 11 add the VIOT glue.
> 
> (2) Patch 12 adds the built-in topology description to the virtio-iommu
> specification. Patch 13 is a partial implementation for the Linux
> virtio-iommu driver. It only supports PCI, not platform devices.
> 
> You can find Linux and QEMU code on my virtio-iommu/devel branches at
> http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu
> 
> 
> I split the diffstat since there are two independent features. The first
> one is for patches 1-11, and the second one for patch 13.
> 
> Jean-Philippe Brucker (11):
>   ACPI/IORT: Move IORT to the ACPI folder
>   ACPI: Add VIOT definitions
>   ACPI/IORT: Allow registration of external tables
>   ACPI/IORT: Add node categories
>   ACPI/IORT: Support VIOT virtio-mmio node
>   ACPI/IORT: Support VIOT virtio-pci node
>   ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
>   ACPI/IORT: Add callback to update a device's fwnode
>   iommu/virtio: Create fwnode if necessary
>   iommu/virtio: Update IORT fwnode
>   ACPI: Add VIOT 

Re: [RFC 13/13] iommu/virtio: Add topology description to

2019-11-22 Thread Michael S. Tsirkin
On Fri, Nov 22, 2019 at 11:50:00AM +0100, Jean-Philippe Brucker wrote:
> Some hypervisors don't implement either device-tree or ACPI, but still
> need a method to describe the IOMMU topology. Read the virtio-iommu
> config early and parse the topology description. Hook into the
> dma_setup() callbacks to initialize the IOMMU before probing endpoints.
> 
> If the virtio-iommu uses the virtio-pci transport, this will only work
> if the PCI root complex is the first device probed. We don't currently
> support virtio-mmio.
> 
> Initially I tried to generate a fake IORT table and feed it to the IORT
> driver, in order to avoid rewriting the whole DMA code, but it wouldn't
> work with platform endpoints, which are references to items in the ACPI
> table on IORT.
> 
> Signed-off-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 

Overall this looks good to me. The only point is that
I think the way the interface is designed makes writing
the driver a bit too difficult. Idea: if instead we just
have a length field and then an array of records
(preferably unions so we don't need to work hard),
we can shadow that into memory, then iterate over
the unions.

Maybe add a uniform record length + number of records field.
Then just skip types you do not know how to handle.
This will also help make sure it's within bounds.

What do you think?
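The suggested layout — a uniform record size plus a record count, shadowed into memory and iterated with unknown types skipped — could be sketched in userspace C as follows. All struct and field names here are hypothetical stand-ins, not taken from the virtio-iommu specification:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record layout following the suggestion above: a uniform
 * record size plus a record count, so a driver can shadow the config
 * space into memory and iterate, skipping types it does not understand. */
struct topo_header {
    uint16_t record_size;   /* size of each record, in bytes */
    uint16_t record_count;  /* number of records that follow */
};

/* Walk the shadowed buffer; return how many records of known_type were
 * found, or -1 if the described layout exceeds buf_len (bounds check). */
static int count_known_records(const uint8_t *buf, size_t buf_len,
                               uint16_t known_type)
{
    struct topo_header hdr;

    if (buf_len < sizeof(hdr))
        return -1;
    memcpy(&hdr, buf, sizeof(hdr));

    size_t total = sizeof(hdr) + (size_t)hdr.record_size * hdr.record_count;
    if (hdr.record_size < sizeof(uint16_t) || total > buf_len)
        return -1;  /* stay within the shadowed config area */

    int found = 0;
    for (uint16_t i = 0; i < hdr.record_count; i++) {
        uint16_t type;
        /* each record begins with its type field */
        memcpy(&type, buf + sizeof(hdr) + (size_t)i * hdr.record_size,
               sizeof(type));
        if (type == known_type)
            found++;
        /* records of unknown type are simply skipped */
    }
    return found;
}
```

Because every record has the same size, skipping an unknown type is a fixed stride, and the single `total > buf_len` check bounds the whole walk up front.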


You will need to do something to address the TODO I think.

> ---
> Note that we only call virt_dma_configure() if the host didn't provide
> either DT or ACPI method. If you want to test this with QEMU, you'll
> need to manually disable the acpi_dma_configure() part in pci-driver.c
> ---
>  drivers/base/platform.c   |   3 +
>  drivers/iommu/Kconfig |   9 +
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/virtio-iommu-topology.c | 410 ++
>  drivers/iommu/virtio-iommu.c  |   3 +
>  drivers/pci/pci-driver.c  |   3 +
>  include/linux/virtio_iommu.h  |  18 ++
>  include/uapi/linux/virtio_iommu.h |  26 ++
>  8 files changed, 473 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu-topology.c
>  create mode 100644 include/linux/virtio_iommu.h
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index b230beb6ccb4..70b12c8ef2fb 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "base.h"
>  #include "power/power.h"
> @@ -1257,6 +1258,8 @@ int platform_dma_configure(struct device *dev)
>   } else if (has_acpi_companion(dev)) {
>   attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
>   ret = acpi_dma_configure(dev, attr);
> + } else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) {
> + ret = virt_dma_configure(dev);
>   }
>  
>   return ret;
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index e6eb4f238d1a..d02c0d36019d 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -486,4 +486,13 @@ config VIRTIO_IOMMU
>  
> Say Y here if you intend to run this kernel as a guest.
>  
> +config VIRTIO_IOMMU_TOPOLOGY
> + bool "Topology properties for the virtio-iommu"
> + depends on VIRTIO_IOMMU
> + help
> +   Enable early probing of the virtio-iommu device, to detect the
> +   topology description.
> +
> +   Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 4f405f926e73..6b51c4186ebc 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -35,3 +35,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o
> diff --git a/drivers/iommu/virtio-iommu-topology.c 
> b/drivers/iommu/virtio-iommu-topology.c
> new file mode 100644
> index ..ec22510ace3d
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu-topology.c
> @@ -0,0 +1,410 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct viommu_cap_config {
> + u8 pos; /* PCI capability position */
> + u8 bar;
> + u32 length; /* structure size */
> + u32 offset; /* structure offset within the bar */
> +};
> +
> +struct viommu_spec {
> + struct device   *dev; /* transport device */
> + struct fwnode_handle*fwnode;
> + struct iommu_ops*ops;
> + struct list_headtopology;
> + struct list_headlist;
> +};
> +
> +struct viommu_topology {
> + union {
> + struct virtio_iommu_topo_head head;
> + struct virtio_iommu_topo_pci_range pci;
> + 

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-09-05 Thread Michael S. Tsirkin
On Mon, Aug 12, 2019 at 02:15:32PM +0200, Christoph Hellwig wrote:
> On Sun, Aug 11, 2019 at 04:55:27AM -0400, Michael S. Tsirkin wrote:
> > On Sun, Aug 11, 2019 at 07:56:07AM +0200, Christoph Hellwig wrote:
> > > So we need a flag on the virtio device, exposed by the
> > > hypervisor (or hardware for hw virtio devices) that says:  hey, I'm real,
> > > don't take a shortcut.
> > 
> > The point here is that it's actually still not real. So we would still
> > use a physical address. However Linux decides that it wants extra
> > security by moving all data through the bounce buffer.  The distinction
> > made is that one can actually give device a physical address of the
> > bounce buffer.
> 
> Sure.  The problem is just that you keep piling hacks on top of hacks.
> We need the per-device flag anyway to properly support hardware virtio
> device in all circumstances.  Instead of coming up with another ad-hoc
> hack to force DMA uses implement that one proper bit and reuse it here.

The flag that you mention literally means "I am a real device" so for
example, you can use VFIO with it. And this device isn't a real one,
and you can't use VFIO with it, even though it's part of a power
system which always has an IOMMU.



Or here's another way to put it: we have a broken device that can only
access physical addresses, not DMA addresses. But to enable SEV, Linux
requires the DMA API.  So we can still make it work if the DMA address happens
to be a physical address (not necessarily of the same page).


This is where dma_addr_is_a_phys_addr() is coming from: it tells us this
weird configuration can still work.  What are we going to do for SEV if
dma_addr_is_a_phys_addr does not apply? Fail probe I guess.


So the proposal is really to make things safe and to this end,
to add this in probe:

if (sev_active() &&
    !dma_addr_is_a_phys_addr(dev) &&
    !virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
	return -EINVAL;


the point being to prevent loading driver where it would
corrupt guest memory. Put this way, any objections to adding
dma_addr_is_a_phys_addr to the DMA API?
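The probe-time guard above can be modelled in a few lines of userspace C. Everything here is a stand-in: `dma_addr_is_a_phys_addr()` is a *proposed* DMA API helper, not an existing kernel function, and the three predicates are reduced to plain booleans on a fake device struct:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the conditions discussed in this thread. */
struct fake_vdev {
    bool sev_active;        /* guest memory is encrypted (SEV) */
    bool dma_is_phys_addr;  /* DMA addresses are physical addresses */
    bool access_platform;   /* VIRTIO_F_ACCESS_PLATFORM negotiated */
};

/* Refuse to bind where the device would bypass the bounce buffer and
 * corrupt (or fail to read) encrypted guest memory: SEV is on, DMA
 * addresses are not physical addresses, and the device cannot honour
 * translated addresses. */
static int virtio_probe_guard(const struct fake_vdev *vdev)
{
    if (vdev->sev_active &&
        !vdev->dma_is_phys_addr &&
        !vdev->access_platform)
        return -EINVAL;
    return 0;
}
```

The point of the guard is exactly the one made above: failing probe is preferable to loading a driver that would silently corrupt guest memory.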





-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-11 Thread Michael S. Tsirkin
On Sun, Aug 11, 2019 at 07:56:07AM +0200, Christoph Hellwig wrote:
> So we need a flag on the virtio device, exposed by the
> hypervisor (or hardware for hw virtio devices) that says:  hey, I'm real,
> don't take a shortcut.

The point here is that it's actually still not real. So we would still
use a physical address. However Linux decides that it wants extra
security by moving all data through the bounce buffer.  The distinction
made is that one can actually give device a physical address of the
bounce buffer.

-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-11 Thread Michael S. Tsirkin
On Sat, Aug 10, 2019 at 11:46:21PM -0700, Ram Pai wrote:
> On Sun, Aug 11, 2019 at 07:56:07AM +0200, Christoph Hellwig wrote:
> > sev_active() is gone now in linux-next, at least as a global API.
> > 
> > And once again this is entirely going in the wrong direction.  The only
> > way using the DMA API is going to work at all is if the device is ready
> > for it.  So we need a flag on the virtio device, exposed by the
> > hypervisor (or hardware for hw virtio devices) that says:  hey, I'm real,
> > don't take a shortcut.
> > 
> > And that means on power and s390 qemu will always have to set those if
> > you want to be ready for the ultravisor and co games.  It's not like we
> > haven't been through this a few times before, have we?
> 
> 
> We have been through this so many times, but I don't think we ever
> understood each other.   I have a fundamental question, the answer to
> which was never clear. Here it is...
> 
> If the hypervisor (hardware for hw virtio devices) does not mandate a
> DMA API, why is it illegal for the driver to request, special handling
> of its i/o buffers? Why are we associating this special handling to
> always mean, some DMA address translation? Can't there be 
> any other kind of special handling needs, that has nothing to do with
> DMA address translation?

I think the answer to that is, extend the DMA API to cover that special
need then. And that's exactly what dma_addr_is_phys_addr is trying to
do.

> 
> -- 
> Ram Pai


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-11 Thread Michael S. Tsirkin
On Sun, Aug 11, 2019 at 07:56:07AM +0200, Christoph Hellwig wrote:
> And once again this is entirely going in the wrong direction.  The only
> way using the DMA API is going to work at all is if the device is ready
> for it.

So the point made is that if DMA addresses are also physical addresses
(not necessarily the same physical addresses that driver supplied), then
DMA API actually works even though device itself uses CPU page tables.


To put it in other terms: it would be possible to make all or part of
memory unencrypted and then have virtio access all of it.  SEV guests
at the moment make a decision to instead use a bounce buffer, forcing an
extra copy but gaining security.

-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-11 Thread Michael S. Tsirkin
On Sat, Aug 10, 2019 at 03:07:02PM -0700, Ram Pai wrote:
> On Sat, Aug 10, 2019 at 02:57:17PM -0400, Michael S. Tsirkin wrote:
> > On Tue, Jan 29, 2019 at 03:08:12PM -0200, Thiago Jung Bauermann wrote:
> > > 
> > > Hello,
> > > 
> > > With Christoph's rework of the DMA API that recently landed, the patch
> > > below is the only change needed in virtio to make it work in a POWER
> > > secure guest under the ultravisor.
> > > 
> > > The other change we need (making sure the device's dma_map_ops is NULL
> > > so that the dma-direct/swiotlb code is used) can be made in
> > > powerpc-specific code.
> > > 
> > > Of course, I also have patches (soon to be posted as RFC) which hook up
> > >  to the powerpc secure guest support code.
> > > 
> > > What do you think?
> > > 
> > > From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
> > > From: Thiago Jung Bauermann 
> > > Date: Thu, 24 Jan 2019 22:08:02 -0200
> > > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
> > > 
> > > The host can't access the guest memory when it's encrypted, so using
> > > regular memory pages for the ring isn't an option. Go through the DMA API.
> > > 
> > > Signed-off-by: Thiago Jung Bauermann 
> > > ---
> > >  drivers/virtio/virtio_ring.c | 5 -
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index cd7e755484e3..321a27075380 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -259,8 +259,11 @@ static bool vring_use_dma_api(struct virtio_device 
> > > *vdev)
> > >* not work without an even larger kludge.  Instead, enable
> > >* the DMA API if we're a Xen guest, which at least allows
> > >* all of the sensible Xen configurations to work correctly.
> > > +  *
> > > +  * Also, if guest memory is encrypted the host can't access
> > > +  * it directly. In this case, we'll need to use the DMA API.
> > >*/
> > > - if (xen_domain())
> > > + if (xen_domain() || sev_active())
> > >   return true;
> > > 
> > >   return false;
> > 
> > So I gave this lots of thought, and I'm coming round to
> > basically accepting something very similar to this patch.
> > 
> > But not exactly like this :).
> > 
> > Let's see what are the requirements.
> > 
> > If
> > 
> > 1. We do not trust the device (so we want to use a bounce buffer with it)
> > 2. DMA address is also a physical address of a buffer
> > 
> > then we should use DMA API with virtio.
> > 
> > 
> > sev_active() above is one way to put (1).  I can't say I love it but
> > it's tolerable.
> > 
> > 
> > But we also want promise from DMA API about 2.
> > 
> > 
> > Without promise 2 we simply can't use DMA API with a legacy device.
> > 
> > 
> > Otherwise, on a SEV system with an IOMMU which isn't 1:1
> > and with a virtio device without ACCESS_PLATFORM, we are trying
> > to pass a virtual address, and devices without ACCESS_PLATFORM
> > can only access CPU physical addresses.
> > 
> > So something like:
> > 
> > dma_addr_is_phys_addr?
> 
> 
> On our secure pseries platform, the DMA address is a physical address, and
> this proposal will help us use the DMA API.
> 
> On our normal pseries platform, the DMA address is a physical address too,
> but we do not necessarily need to use the DMA API.  We can use the DMA
> API, but our handlers will do the same thing the generic virtio handlers
> would do. If there is an opt-out option, then even when the DMA address is
> the same as the physical address, there will be less code duplication.
> 
> Would something like this be better?
> 
> (dma_addr_is_phys_addr  && arch_want_to_use_dma_api()) ?
> 
> 
> RP

I think sev_active() is an OK replacement for arch_want_to_use_dma_api.
So just the addition of dma_addr_is_phys_addr would be enough.

> 
> > -- 
> > MST
> 
> -- 
> Ram Pai


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-10 Thread Michael S. Tsirkin
On Tue, Jan 29, 2019 at 03:08:12PM -0200, Thiago Jung Bauermann wrote:
> 
> Hello,
> 
> With Christoph's rework of the DMA API that recently landed, the patch
> below is the only change needed in virtio to make it work in a POWER
> secure guest under the ultravisor.
> 
> The other change we need (making sure the device's dma_map_ops is NULL
> so that the dma-direct/swiotlb code is used) can be made in
> powerpc-specific code.
> 
> Of course, I also have patches (soon to be posted as RFC) which hook up
>  to the powerpc secure guest support code.
> 
> What do you think?
> 
> From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
> From: Thiago Jung Bauermann 
> Date: Thu, 24 Jan 2019 22:08:02 -0200
> Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
> 
> The host can't access the guest memory when it's encrypted, so using
> regular memory pages for the ring isn't an option. Go through the DMA API.
> 
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  drivers/virtio/virtio_ring.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index cd7e755484e3..321a27075380 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -259,8 +259,11 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
>* not work without an even larger kludge.  Instead, enable
>* the DMA API if we're a Xen guest, which at least allows
>* all of the sensible Xen configurations to work correctly.
> +  *
> +  * Also, if guest memory is encrypted the host can't access
> +  * it directly. In this case, we'll need to use the DMA API.
>*/
> - if (xen_domain())
> + if (xen_domain() || sev_active())
>   return true;
> 
>   return false;

So I gave this lots of thought, and I'm coming round to
basically accepting something very similar to this patch.

But not exactly like this :).

Let's see what are the requirements.

If

1. We do not trust the device (so we want to use a bounce buffer with it)
2. DMA address is also a physical address of a buffer

then we should use DMA API with virtio.


sev_active() above is one way to put (1).  I can't say I love it but
it's tolerable.


But we also want promise from DMA API about 2.


Without promise 2 we simply can't use DMA API with a legacy device.


Otherwise, on a SEV system with an IOMMU which isn't 1:1
and with a virtio device without ACCESS_PLATFORM, we are trying
to pass a virtual address, and devices without ACCESS_PLATFORM
can only access CPU physical addresses.

So something like:

dma_addr_is_phys_addr?



-- 
MST


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-24 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 05:38:51PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 22, 2019 at 04:36:09PM +0100, Robin Murphy wrote:
> >> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >> index c8be1c4f5b55..37c143971211 100644
> >> --- a/drivers/virtio/virtio_ring.c
> >> +++ b/drivers/virtio/virtio_ring.c
> >> @@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
> >>   {
> >>size_t max_segment_size = SIZE_MAX;
> >>   -if (vring_use_dma_api(vdev))
> >> +  if (vring_use_dma_api(vdev) && vdev->dev.dma_mask)
> >
> > Hmm, might it make sense to roll that check up into vring_use_dma_api() 
> > itself? After all, if the device has no mask then it's likely that other 
> > DMA API ops wouldn't really work as expected either.
> 
> Makes sense to me.

Christoph - would a documented API wrapping dma_mask make sense?
With the documentation explaining how users must
desist from using DMA APIs if that returns false ...


-- 
MST


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-23 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 05:38:30PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 22, 2019 at 11:33:35AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 22, 2019 at 04:55:09PM +0200, Eric Auger wrote:
> > > Do not call dma_max_mapping_size for devices that have no DMA
> > > mask set, otherwise we can hit a NULL pointer dereference.
> > > 
> > > This occurs when a virtio-blk-pci device is protected with
> > > a virtual IOMMU.
> > > 
> > > Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()")
> > > Signed-off-by: Eric Auger 
> > > Suggested-by: Christoph Hellwig 
> > 
> > Christoph, I wonder why you suggested this?
> > The connection between dma_mask and dma_max_mapping_size
> > is far from obvious.  The documentation doesn't exist.
> > Do we really have to teach all users about this hack?
> > Why not just make dma_max_mapping_size DTRT?
> 
> Because we should not call into dma API functions for devices that
> are not DMA capable.

I'd rather call is_device_dma_capable then, better than poking
at DMA internals.

-- 
MST


Re: [PATCH v2] dma-mapping: Use dma_get_mask in dma_addressing_limited

2019-07-22 Thread Michael S. Tsirkin
On Mon, Jul 22, 2019 at 06:56:49PM +0200, Auger Eric wrote:
> Hi Christoph,
> 
> On 7/22/19 6:51 PM, Eric Auger wrote:
> > We currently have cases where the dma_addressing_limited() gets
> > called with dma_mask unset. This causes a NULL pointer dereference.
> > 
> > Use dma_get_mask() accessor to prevent the crash.
> > 
> > Fixes: b866455423e0 ("dma-mapping: add a dma_addressing_limited helper")
> > Signed-off-by: Eric Auger 
> 
> As a follow-up of my last email, here is a patch featuring
> dma_get_mask(). But you don't have the WARN_ON_ONCE anymore, pointing
> out suspect users.

OTOH these users then simply become okay so no need for WARN_ON_ONCE
then :)

> Feel free to pick up your preferred approach
> 
> Thanks
> 
> Eric
> > 
> > ---
> > 
> > v1 -> v2:
> > - was [PATCH 1/2] dma-mapping: Protect dma_addressing_limited
> >   against NULL dma_mask
> > - Use dma_get_mask
> > ---
> >  include/linux/dma-mapping.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> > index e11b115dd0e4..f7d1eea32c78 100644
> > --- a/include/linux/dma-mapping.h
> > +++ b/include/linux/dma-mapping.h
> > @@ -689,8 +689,8 @@ static inline int dma_coerce_mask_and_coherent(struct 
> > device *dev, u64 mask)
> >   */
> >  static inline bool dma_addressing_limited(struct device *dev)
> >  {
> > -   return min_not_zero(*dev->dma_mask, dev->bus_dma_mask) <
> > -   dma_get_required_mask(dev);
> > +   return min_not_zero(dma_get_mask(dev), dev->bus_dma_mask) <
> > +   dma_get_required_mask(dev);
> >  }
> >  
> >  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
> > 
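Eric's fix works because dma_get_mask() never dereferences a NULL dma_mask pointer; it substitutes a default 32-bit mask instead. A userspace sketch of that behaviour, with the kernel helpers modelled under `fake_` names and a heavily simplified device struct (my reading of the semantics, not kernel code):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Simplified stand-in for struct device: dma_mask may be NULL, which is
 * exactly the case that crashed dma_addressing_limited(). */
struct fake_device {
    uint64_t *dma_mask;
    uint64_t bus_dma_mask;
};

static uint64_t min_not_zero(uint64_t x, uint64_t y)
{
    if (x == 0)
        return y;
    if (y == 0)
        return x;
    return x < y ? x : y;
}

/* Models the kernel's dma_get_mask(): fall back to a 32-bit mask rather
 * than dereferencing a NULL dma_mask pointer. */
static uint64_t fake_dma_get_mask(const struct fake_device *dev)
{
    if (dev->dma_mask && *dev->dma_mask)
        return *dev->dma_mask;
    return DMA_BIT_MASK(32);
}

/* The fixed helper: safe even when dma_mask was never set. */
static int fake_dma_addressing_limited(const struct fake_device *dev,
                                       uint64_t required_mask)
{
    return min_not_zero(fake_dma_get_mask(dev), dev->bus_dma_mask) <
           required_mask;
}
```

A device with an unset mask now just reports itself as limited to 32-bit addressing instead of crashing, which is also why callers no longer need the `dma_mask` presence check.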


Re: [PATCH v2] dma-mapping: Use dma_get_mask in dma_addressing_limited

2019-07-22 Thread Michael S. Tsirkin
On Mon, Jul 22, 2019 at 06:51:49PM +0200, Eric Auger wrote:
> We currently have cases where the dma_addressing_limited() gets
> called with dma_mask unset. This causes a NULL pointer dereference.
> 
> Use dma_get_mask() accessor to prevent the crash.
> 
> Fixes: b866455423e0 ("dma-mapping: add a dma_addressing_limited helper")
> Signed-off-by: Eric Auger 

Acked-by: Michael S. Tsirkin 

If possible I really prefer this approach: avoids changing all callers
and uses documented interfaces.

> ---
> 
> v1 -> v2:
> - was [PATCH 1/2] dma-mapping: Protect dma_addressing_limited
>   against NULL dma_mask
> - Use dma_get_mask
> ---
>  include/linux/dma-mapping.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index e11b115dd0e4..f7d1eea32c78 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -689,8 +689,8 @@ static inline int dma_coerce_mask_and_coherent(struct 
> device *dev, u64 mask)
>   */
>  static inline bool dma_addressing_limited(struct device *dev)
>  {
> - return min_not_zero(*dev->dma_mask, dev->bus_dma_mask) <
> - dma_get_required_mask(dev);
> + return min_not_zero(dma_get_mask(dev), dev->bus_dma_mask) <
> + dma_get_required_mask(dev);
>  }
>  
>  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
> -- 
> 2.20.1


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-22 Thread Michael S. Tsirkin
On Mon, Jul 22, 2019 at 04:36:09PM +0100, Robin Murphy wrote:
> On 22/07/2019 15:55, Eric Auger wrote:
> > Do not call dma_max_mapping_size for devices that have no DMA
> > mask set, otherwise we can hit a NULL pointer dereference.
> > 
> > This occurs when a virtio-blk-pci device is protected with
> > a virtual IOMMU.
> > 
> > Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()")
> > Signed-off-by: Eric Auger 
> > Suggested-by: Christoph Hellwig 
> > ---
> >   drivers/virtio/virtio_ring.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index c8be1c4f5b55..37c143971211 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
> >   {
> > size_t max_segment_size = SIZE_MAX;
> > -   if (vring_use_dma_api(vdev))
> > +   if (vring_use_dma_api(vdev) && vdev->dev.dma_mask)
> 
> Hmm, might it make sense to roll that check up into vring_use_dma_api()
> itself? After all, if the device has no mask then it's likely that other DMA
> API ops wouldn't really work as expected either.
> 
> Robin.

Nope, Eric pointed out it's just dma_addressing_limited that is broken.

Other APIs call dma_get_mask which handles the NULL mask case fine.


> > max_segment_size = dma_max_mapping_size(&vdev->dev);
> > return max_segment_size;
> > 


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-22 Thread Michael S. Tsirkin
On Mon, Jul 22, 2019 at 05:27:10PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 22, 2019 at 04:55:09PM +0200, Eric Auger wrote:
> > Do not call dma_max_mapping_size for devices that have no DMA
> > mask set, otherwise we can hit a NULL pointer dereference.
> > 
> > This occurs when a virtio-blk-pci device is protected with
> > a virtual IOMMU.
> > 
> > Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()")
> > Signed-off-by: Eric Auger 
> > Suggested-by: Christoph Hellwig 
> 
> Looks good.  virtio maintainers, let me know if you want to queue
> it up or if I should pick the patch up through the dma-mapping tree.

Personally I dislike this API because I feel the presence of a dma mask does
not strictly have to reflect the max size. And generally the requirement to
check the presence of the mask feels like an undocumented hack to me.  Even
reading code will not help you avoid the warning, everyone will get it
wrong and get the warning splat in their logs.  So I would prefer just
v1 of the patch that makes dma API do the right thing for us.

However, if that's not going to be the case, let's fix it up in virtio.
In any case, for both v1 and v2 of the patches, you can merge them
through your tree:

Acked-by: Michael S. Tsirkin 

-- 
MST


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-22 Thread Michael S. Tsirkin
On Mon, Jul 22, 2019 at 04:55:09PM +0200, Eric Auger wrote:
> Do not call dma_max_mapping_size for devices that have no DMA
> mask set, otherwise we can hit a NULL pointer dereference.
> 
> This occurs when a virtio-blk-pci device is protected with
> a virtual IOMMU.
> 
> Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()")
> Signed-off-by: Eric Auger 
> Suggested-by: Christoph Hellwig 

Christoph, I wonder why you suggested this?
The connection between dma_mask and dma_max_mapping_size
is far from obvious.  The documentation doesn't exist.
Do we really have to teach all users about this hack?
Why not just make dma_max_mapping_size DTRT?

> ---
>  drivers/virtio/virtio_ring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c8be1c4f5b55..37c143971211 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
>  {
>   size_t max_segment_size = SIZE_MAX;
>  
> - if (vring_use_dma_api(vdev))
> + if (vring_use_dma_api(vdev) && vdev->dev.dma_mask)
> + max_segment_size = dma_max_mapping_size(&vdev->dev);
>  
>   return max_segment_size;
> -- 
> 2.20.1


Re: [PATCH] iommu/virtio: Update to most recent specification

2019-07-22 Thread Michael S. Tsirkin
On Mon, Jul 22, 2019 at 03:40:07PM +0100, Jean-Philippe Brucker wrote:
> Following specification review a few things were changed in v8 of the
> virtio-iommu series [1], but have been omitted when merging the base
> driver. Add them now:
> 
> * Remove the EXEC flag.
> * Add feature bit for the MMIO flag.
> * Change domain_bits to domain_range.
> * Add NOMEM status flag.
> 
> [1] 
> https://lore.kernel.org/linux-iommu/20190530170929.19366-1-jean-philippe.bruc...@arm.com/
> 
> Fixes: edcd69ab9a32 ("iommu: Add virtio-iommu driver")
> Reported-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 

I'm merging this for this release, unless someone objects.

> ---
>  drivers/iommu/virtio-iommu.c  | 40 ++-
>  include/uapi/linux/virtio_iommu.h | 32 ++---
>  2 files changed, 47 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 433f4d2ee956..80a740df0737 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -2,7 +2,7 @@
>  /*
>   * Virtio driver for the paravirtualized IOMMU
>   *
> - * Copyright (C) 2018 Arm Limited
> + * Copyright (C) 2019 Arm Limited
>   */
>  
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> @@ -47,7 +47,10 @@ struct viommu_dev {
>   /* Device configuration */
>   struct iommu_domain_geometrygeometry;
>   u64 pgsize_bitmap;
> - u8  domain_bits;
> + u32 first_domain;
> + u32 last_domain;
> + /* Supported MAP flags */
> + u32 map_flags;
>   u32 probe_size;
>  };
>  
> @@ -62,6 +65,7 @@ struct viommu_domain {
>   struct viommu_dev   *viommu;
>   struct mutexmutex; /* protects viommu pointer */
>   unsigned intid;
> + u32 map_flags;
>  
>   spinlock_t  mappings_lock;
>   struct rb_root_cached   mappings;
> @@ -113,6 +117,8 @@ static int viommu_get_req_errno(void *buf, size_t len)
>   return -ENOENT;
>   case VIRTIO_IOMMU_S_FAULT:
>   return -EFAULT;
> + case VIRTIO_IOMMU_S_NOMEM:
> + return -ENOMEM;
>   case VIRTIO_IOMMU_S_IOERR:
>   case VIRTIO_IOMMU_S_DEVERR:
>   default:
> @@ -607,15 +613,15 @@ static int viommu_domain_finalise(struct viommu_dev *viommu,
>  {
>   int ret;
>   struct viommu_domain *vdomain = to_viommu_domain(domain);
> - unsigned int max_domain = viommu->domain_bits > 31 ? ~0 :
> -   (1U << viommu->domain_bits) - 1;
>  
>   vdomain->viommu = viommu;
> + vdomain->map_flags  = viommu->map_flags;
>  
>   domain->pgsize_bitmap   = viommu->pgsize_bitmap;
>   domain->geometry= viommu->geometry;
>  
> - ret = ida_alloc_max(&viommu->domain_ids, max_domain, GFP_KERNEL);
> + ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
> +   viommu->last_domain, GFP_KERNEL);
>   if (ret >= 0)
>   vdomain->id = (unsigned int)ret;
>  
> @@ -710,7 +716,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned long iova,
> phys_addr_t paddr, size_t size, int prot)
>  {
>   int ret;
> - int flags;
> + u32 flags;
>   struct virtio_iommu_req_map map;
>   struct viommu_domain *vdomain = to_viommu_domain(domain);
>  
> @@ -718,6 +724,9 @@ static int viommu_map(struct iommu_domain *domain, unsigned long iova,
>   (prot & IOMMU_WRITE ? VIRTIO_IOMMU_MAP_F_WRITE : 0) |
>   (prot & IOMMU_MMIO ? VIRTIO_IOMMU_MAP_F_MMIO : 0);
>  
> + if (flags & ~vdomain->map_flags)
> + return -EINVAL;
> +
>   ret = viommu_add_mapping(vdomain, iova, paddr, size, flags);
>   if (ret)
>   return ret;
> @@ -1027,7 +1036,8 @@ static int viommu_probe(struct virtio_device *vdev)
>   goto err_free_vqs;
>   }
>  
> - viommu->domain_bits = 32;
> + viommu->map_flags = VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE;
> + viommu->last_domain = ~0U;
>  
>   /* Optional features */
>   virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
> @@ -1038,9 +1048,13 @@ static int viommu_probe(struct virtio_device *vdev)
>struct virtio_iommu_config, input_range.end,
>   &input_end);
>  
> - virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_BITS,
> -  struct virtio_iommu_config, domain_bits,
> -  &viommu->domain_bits);
> + virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
> +  struct virtio_iommu_config, domain_range.start,
> +  &viommu->first_domain);
> +
> + virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
> +  struct virtio_iommu_config, domain_range.end,
> +  &viommu->last_domain);
