Re: [PATCH v7 00/12] Introduce VDUSE - vDPA Device in Userspace

2021-05-25 Thread Michael S. Tsirkin
On Tue, May 25, 2021 at 02:40:57PM +0800, Jason Wang wrote:
> 
> On 2021/5/20 5:06 PM, Yongji Xie wrote:
> > On Thu, May 20, 2021 at 2:06 PM Michael S. Tsirkin  wrote:
> > > On Mon, May 17, 2021 at 05:55:01PM +0800, Xie Yongji wrote:
> > > > This series introduces a framework that can be used to implement
> > > > vDPA Devices in a userspace program. The work consists of two parts:
> > > > control path forwarding and data path offloading.
> > > > 
> > > > In the control path, the VDUSE driver will make use of a message
> > > > mechanism to forward the config operations from the vdpa bus driver
> > > > to userspace. Userspace can use read()/write() to receive and reply
> > > > to those control messages.
> > > > 
> > > > In the data path, the core maps the DMA buffer into the VDUSE
> > > > daemon's address space; this can be implemented in different ways
> > > > depending on the vdpa bus to which the vDPA device is attached.
> > > > 
> > > > In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver
> > > > with a bounce-buffering mechanism to achieve that. In the vhost-vdpa case,
> > > > the DMA buffer resides in a userspace memory region which can be shared
> > > > with the VDUSE userspace process by transferring the shmfd.
> > > > 
> > > > The details and our use case are shown below:
> > > > 
> > > > [ASCII topology diagram, garbled by line wrapping and truncated in this
> > > > copy; see the summarized copy quoted in the next message.]

Re: [PATCH v7 00/12] Introduce VDUSE - vDPA Device in Userspace

2021-05-20 Thread Michael S. Tsirkin
On Mon, May 17, 2021 at 05:55:01PM +0800, Xie Yongji wrote:
> This series introduces a framework that can be used to implement
> vDPA Devices in a userspace program. The work consists of two parts:
> control path forwarding and data path offloading.
> 
> In the control path, the VDUSE driver will make use of a message
> mechanism to forward the config operations from the vdpa bus driver
> to userspace. Userspace can use read()/write() to receive and reply
> to those control messages (a sketch follows this quoted message).
> 
> In the data path, the core maps the DMA buffer into the VDUSE
> daemon's address space; this can be implemented in different ways
> depending on the vdpa bus to which the vDPA device is attached.
> 
> In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver
> with a bounce-buffering mechanism to achieve that. In the vhost-vdpa case,
> the DMA buffer resides in a userspace memory region which can be shared
> with the VDUSE userspace process by transferring the shmfd.
> 
> The details and our use case are shown below:
> 
> [ASCII topology diagram, garbled by line wrapping in the archive. It showed:
>  a Container using /dev/vdx through the virtio-blk driver on the virtio bus,
>  and a QEMU VM using /dev/vhost-vdpa-x through the vhost-vdpa driver, both
>  backed by a vdpa device that the vduse driver exposes on the vdpa bus; the
>  VDUSE daemon implements the vDPA device emulation and a block driver that
>  reaches the Remote Storages over TCP/IP through the NIC.]
> 
> We make use of it to implement a block device connecting to
> our distributed storage, which can be used both in containers and
> VMs. Thus, we can have a unified technology stack in these two cases.
> 
> To test it with null-blk:
> 
>   $ qemu-storage-daemon \
>   --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
>   --monitor chardev=charmonitor \
>   --blockdev 
> 
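To make the control-path flow described in the quoted cover letter concrete, a
VDUSE daemon's message loop might look roughly like the sketch below. This is
only an illustration of the read()/write() scheme described above; the message
structs (vduse_dev_request/vduse_dev_response) and their fields are
hypothetical here, and the UAPI defined by the patches themselves is
authoritative.

	#include <stdint.h>
	#include <unistd.h>

	/* Hypothetical message layout, for illustration only. */
	struct vduse_dev_request {
		uint32_t type;          /* which config operation the vdpa bus driver issued */
		uint32_t request_id;
		uint8_t  payload[256];
	};

	struct vduse_dev_response {
		uint32_t request_id;
		uint32_t result;
		uint8_t  payload[256];
	};

	static void vduse_serve(int dev_fd)
	{
		struct vduse_dev_request req;
		struct vduse_dev_response resp = { 0 };

		/* read() blocks until the VDUSE driver forwards a control message */
		while (read(dev_fd, &req, sizeof(req)) == (ssize_t)sizeof(req)) {
			resp.request_id = req.request_id;
			resp.result = 0;        /* handle req.type, fill resp.payload */
			write(dev_fd, &resp, sizeof(resp));     /* reply to the kernel */
		}
	}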

Re: Re: [PATCH v7 04/12] virtio-blk: Add validation for block size in config space

2021-05-19 Thread Michael S. Tsirkin
On Thu, May 20, 2021 at 01:25:16PM +0800, Yongji Xie wrote:
> On Wed, May 19, 2021 at 10:42 PM Dan Carpenter  
> wrote:
> >
> > On Wed, May 19, 2021 at 09:39:20PM +0800, Yongji Xie wrote:
> > > On Mon, May 17, 2021 at 5:56 PM Xie Yongji  
> > > wrote:
> > > >
> > > > This ensures that we will not use an invalid block size
> > > > in config space (might come from an untrusted device).
> >
> > I looked at if I should add this as an untrusted function so that Smatch
> > could find these sorts of bugs but this is reading data from the host so
> > there has to be some level of trust...
> >
> 
> It would be great if Smatch could detect this case if possible. The
> data might be trusted in traditional VM cases. But now the data can be
> read from a userspace daemon when VDUSE is enabled.
> 
> > I should add some more untrusted data kvm functions to Smatch.  Right
> > now I only have kvm_register_read() and I've added kvm_read_guest_virt()
> > just now.
> >
> > > >
> > > > Signed-off-by: Xie Yongji 
> > > > ---
> > > >  drivers/block/virtio_blk.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > index ebb4d3fe803f..c848aa36d49b 100644
> > > > --- a/drivers/block/virtio_blk.c
> > > > +++ b/drivers/block/virtio_blk.c
> > > > @@ -826,7 +826,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > > > err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
> > > >struct virtio_blk_config, blk_size,
> > > > &blk_size);
> > > > -   if (!err)
> > > > +   if (!err && blk_size > 0 && blk_size <= max_size)
> > >
> > > The check here is incorrect. I will use PAGE_SIZE as the maximum
> > > boundary in the new version.
> >
> > What does this bug look like to the user?
> 
> The kernel will panic if the block size is larger than PAGE_SIZE.

Kernel panic at this point is par for the course IMHO.
Let's focus on eliminating data corruption for starters.

> > A minimum block size of 1 seems pretty crazy.  Surely the minimum should be
> > higher?
> >
> 
> Yes, 512 is better here.
> 
> Thanks,
> Yongji
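For reference, putting the two corrections discussed above together (PAGE_SIZE
as the upper bound, a 512-byte lower bound) would make the quoted hunk look
roughly like the sketch below. Variable names follow the quoted virtblk_probe()
code; this is only an illustration of the discussion, not the final patch.

	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
				   struct virtio_blk_config, blk_size,
				   &blk_size);
	/* Reject absurd values from a potentially untrusted (e.g. VDUSE) device:
	 * anything below a 512-byte sector or above PAGE_SIZE. */
	if (!err && blk_size >= 512 && blk_size <= PAGE_SIZE)
		blk_queue_logical_block_size(q, blk_size);
	else
		blk_size = queue_logical_block_size(q);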



Re: [PATCH v2 0/6] Add support for ACPI VIOT

2021-05-14 Thread Michael S. Tsirkin
On Fri, Apr 23, 2021 at 01:38:31PM +0200, Jean-Philippe Brucker wrote:
> Add a driver for the ACPI VIOT table, which provides topology
> information for para-virtual platforms. Enable virtio-iommu on
> non-devicetree platforms, including x86.

Acked-by: Michael S. Tsirkin 

Mostly ACPI stuff so I assume it's best to merge through that tree.

> Since v1 [1]:
> * The VIOT header definitions have been picked for v5.13
> * Share more code with IORT. Patches 1 and 2 extract the common code
>   from IORT.
> * Simplify the VIOT driver. Use the existing fwnode infrastructure
>   instead of adding hooks to the IOMMU driver.
> 
> You can find a QEMU implementation at [2], with extra support for
> testing all VIOT nodes including MMIO-based endpoints and IOMMU. This
> series, based on linux-next, is at [3]
> 
> [1] 
> https://lore.kernel.org/linux-iommu/20210316191652.3401335-1-jean-phili...@linaro.org/
> [2] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/acpi
> [3] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/acpi
> 
> Jean-Philippe Brucker (6):
>   ACPI: arm64: Move DMA setup operations out of IORT
>   ACPI: Move IOMMU setup code out of IORT
>   ACPI: Add driver for the VIOT table
>   iommu/dma: Pass address limit rather than size to
> iommu_setup_dma_ops()
>   iommu/dma: Simplify calls to iommu_setup_dma_ops()
>   iommu/virtio: Enable x86 support
> 
>  drivers/acpi/Kconfig |   3 +
>  drivers/iommu/Kconfig|   4 +-
>  drivers/acpi/Makefile|   2 +
>  drivers/acpi/arm64/Makefile  |   1 +
>  include/acpi/acpi_bus.h  |   3 +
>  include/linux/acpi.h |   3 +
>  include/linux/acpi_iort.h|  14 +-
>  include/linux/acpi_viot.h|  19 ++
>  include/linux/dma-iommu.h|   4 +-
>  arch/arm64/mm/dma-mapping.c  |   2 +-
>  drivers/acpi/arm64/dma.c |  50 +
>  drivers/acpi/arm64/iort.c| 129 ++---
>  drivers/acpi/bus.c   |   2 +
>  drivers/acpi/scan.c  |  60 +-
>  drivers/acpi/viot.c  | 350 +++
>  drivers/iommu/amd/iommu.c|   9 +-
>  drivers/iommu/dma-iommu.c|  17 +-
>  drivers/iommu/intel/iommu.c  |  10 +-
>  drivers/iommu/virtio-iommu.c |   8 +
>  MAINTAINERS  |   8 +
>  20 files changed, 548 insertions(+), 150 deletions(-)
>  create mode 100644 include/linux/acpi_viot.h
>  create mode 100644 drivers/acpi/arm64/dma.c
>  create mode 100644 drivers/acpi/viot.c
> 
> -- 
> 2.31.1



Re: [PATCH v2 6/6] iommu/virtio: Enable x86 support

2021-05-14 Thread Michael S. Tsirkin
On Fri, Apr 23, 2021 at 01:38:37PM +0200, Jean-Philippe Brucker wrote:
> With the VIOT support in place, x86 platforms can now use the
> virtio-iommu.
> 
> Because the other x86 IOMMU drivers aren't yet ready to use the
> acpi_dma_setup() path, x86 doesn't implement arch_setup_dma_ops() at the
> moment. Similarly to Vt-d and AMD IOMMU, call iommu_setup_dma_ops() from
> probe_finalize().
> 
> Acked-by: Joerg Roedel 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/iommu/Kconfig| 3 ++-
>  drivers/iommu/dma-iommu.c| 1 +
>  drivers/iommu/virtio-iommu.c | 8 
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index aff8a4830dd1..07b7c25cbed8 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -400,8 +400,9 @@ config HYPERV_IOMMU
>  config VIRTIO_IOMMU
>   tristate "Virtio IOMMU driver"
>   depends on VIRTIO
> - depends on ARM64
> + depends on (ARM64 || X86)
>   select IOMMU_API
> + select IOMMU_DMA
>   select INTERVAL_TREE
>   select ACPI_VIOT if ACPI
>   help
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 175f8eaeb5b3..46ed43c400cf 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1332,6 +1332,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit)
>  		pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA ops\n",
>  			dev_name(dev));
>  }
> +EXPORT_SYMBOL_GPL(iommu_setup_dma_ops);
>  
>  static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>   phys_addr_t msi_addr, struct iommu_domain *domain)
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 29a397c2d12f..8be546a338e7 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -1027,6 +1027,13 @@ static struct iommu_device *viommu_probe_device(struct device *dev)
>   return ERR_PTR(ret);
>  }
>  
> +static void viommu_probe_finalize(struct device *dev)
> +{
> +#ifndef CONFIG_ARCH_HAS_SETUP_DMA_OPS
> + iommu_setup_dma_ops(dev, 0, U64_MAX);
> +#endif
> +}
> +
>  static void viommu_release_device(struct device *dev)
>  {
>   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> @@ -1063,6 +1070,7 @@ static struct iommu_ops viommu_ops = {
>   .iova_to_phys   = viommu_iova_to_phys,
>   .iotlb_sync = viommu_iotlb_sync,
>   .probe_device   = viommu_probe_device,
> + .probe_finalize = viommu_probe_finalize,
>   .release_device = viommu_release_device,
>   .device_group   = viommu_device_group,
>   .get_resv_regions   = viommu_get_resv_regions,
> -- 
> 2.31.1



Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2021-03-18 Thread Michael S. Tsirkin
On Tue, Mar 16, 2021 at 08:16:54PM +0100, Jean-Philippe Brucker wrote:
> With the VIOT support in place, x86 platforms can now use the
> virtio-iommu.
> 
> The arm64 Kconfig selects IOMMU_DMA, while x86 IOMMU drivers select it
> themselves.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/iommu/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 2819b5c8ec30..ccca83ef2f06 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -400,8 +400,9 @@ config HYPERV_IOMMU
>  config VIRTIO_IOMMU
>   tristate "Virtio IOMMU driver"
>   depends on VIRTIO
> - depends on ARM64
> + depends on (ARM64 || X86)
>   select IOMMU_API
> + select IOMMU_DMA if X86

Would it hurt to just select unconditionally? Seems a bit cleaner
...

>   select INTERVAL_TREE
>   select ACPI_VIOT if ACPI
>   help
> -- 
> 2.30.2
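For reference, the unconditional select Michael asks about above is what the
v2 posting (quoted earlier in this digest) ends up doing; the resulting
Kconfig entry would read:

	config VIRTIO_IOMMU
		tristate "Virtio IOMMU driver"
		depends on VIRTIO
		depends on (ARM64 || X86)
		select IOMMU_API
		select IOMMU_DMA
		select INTERVAL_TREE
		select ACPI_VIOT if ACPI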



Re: swiotlb/virtio: unchecked device dma address and length

2020-12-16 Thread Michael S. Tsirkin
On Tue, Dec 15, 2020 at 11:20:48AM +0800, Jason Wang wrote:
> 
> On 2020/12/15 5:49 AM, Konrad Rzeszutek Wilk wrote:
> > On Fri, Dec 11, 2020 at 06:31:21PM +0100, Felicitas Hetzelt wrote:
> > > Hello,
> > Hi! Please see below my responses.
> > 
> > > we have been analyzing the Hypervisor-OS interface of Linux
> > > and discovered bugs in the swiotlb/virtio implementation that can be
> > > triggered from a malicious Hypervisor / virtual device.
> > > With SEV, the SWIOTLB implementation is forcefully enabled and would
> > > always be used. Thus, all virtio devices and others would use it under
> > > the hood.
> > > 
> > > The reason for analyzing this interface is that, technologies such as
> > > Intel's Trusted Domain Extensions [1] and AMD's Secure Nested Paging [2]
> > > change the threat model assumed by various Linux kernel subsystems.
> > > These technologies take the presence of a fully malicious hypervisor
> > > into account and aim to provide protection for virtual machines in such
> > > an environment. Therefore, all input received from the hypervisor or an
> > > external device should be carefully validated. Note that these issues
> > > are of little (or no) relevance in a "normal" virtualization setup,
> > > nevertheless we believe that it is required to fix them if TDX or SNP is
> > > used.
> > > 
> > > We are happy to provide more information if needed!
> > > 
> > > [1]
> > > https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
> > > 
> > > [2]https://www.amd.com/en/processors/amd-secure-encrypted-virtualization
> > > 
> > > Bug:
> > > OOB memory write.
> > > dma_unmap_single -> swiotlb_tbl_unmap_single is invoked with dma_addr
> > > and length parameters that are under control of the device.
> > > This happens e.g. in virtio_ring:
> > > https://elixir.bootlin.com/linux/v5.10-rc7/source/drivers/virtio/virtio_ring.c#L378
> > Heya!
> > 
> > Thank you for pointing this out! I've a couple of questions and hope you can
> > help me out with them.
> > 
> > Also CC-ing AMD / TDX folks.
> > > This raises two issues:
> > > 1) swiotlb_tlb_unmap_single fails to check whether the index generated
> > > from the dma_addr is in range of the io_tlb_orig_addr array.
> > That is fairly simple to implement I would think. That is it can check
> > that the dma_addr is from the PA in the io_tlb pool when SWIOTLB=force
> > is used.
> 
> 
> I'm not sure this can fix all the cases. It looks to me we should map
> the descriptors as coherent but read-only (which is not supported by the
> current DMA API).

Neither is this supported by encrypted memory technologies.

> Otherwise, device can modify the desc[i].addr/desc[i].len at any time to
> pretend a valid mapping.
> 
> Thanks
> 
> 
> > 
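A minimal sketch of the range check Konrad describes above (names follow the
5.10-era swiotlb code: io_tlb_start, io_tlb_end, IO_TLB_SHIFT; the hardening
that eventually went upstream is more involved, and, as Jason notes, it does
not address the device rewriting descriptors after validation):

	static int swiotlb_index_from_tlb_addr(phys_addr_t tlb_addr, size_t size)
	{
		/* Reject device-controlled addresses outside the io_tlb pool
		 * before deriving an index into io_tlb_orig_addr[]. */
		if (tlb_addr < io_tlb_start || tlb_addr + size > io_tlb_end)
			return -EINVAL;
		return (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
	}

Callers such as swiotlb_tbl_unmap_single() would then warn and bail out on
-EINVAL instead of indexing io_tlb_orig_addr[] out of bounds.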


Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-10-28 Thread Michael S. Tsirkin
On Wed, Oct 28, 2020 at 03:24:03PM +0100, Alexander Graf wrote:
> On 24.02.20 18:16, Christoph Hellwig wrote:
> > On Sat, Feb 22, 2020 at 02:07:58PM -0500, Michael S. Tsirkin wrote:
> > > On Fri, Feb 21, 2020 at 03:33:40PM +0100, Halil Pasic wrote:
> > > > AFAIU you have a positive attitude towards the idea, that
> > > > !F_VIRTIO_PLATFORM implies 'no DMA API is used by virtio'
> > > > should be scrapped.
> > > > 
> > > > I would like to accomplish that without adverse effects to virtio-ccw
> > > > (because caring for virtio-ccw is a part of job description).
> > > > 
> > > > Regards,
> > > > Halil
> > > 
> > > It is possible, in theory. IIRC the main challenge is that DMA API
> > > has overhead of indirect function calls even when all it
> > > does it return back the PA without changes.
> > 
> > That overhead is gone now, the DMA direct calls are direct calls these
> > days.
> 
> Michael, would that mitigate your concerns to just always use the DMA API?
> If not, wouldn't it make sense to benchmark and pinpoint Christoph to paths
> that do slow things down, so we can finally move virtio into a world where
> it doesn't bypass the kernel DMA infrastructure.
> 
> 
> Alex


There's no specific concern really. One can in theory move the code
handling !VIRTIO_F_ACCESS_PLATFORM so that, instead of a branch in the
virtio code, it overrides the platform DMA API and virtio then uses the
DMA API unconditionally.

The gain from that will probably be marginal, but maybe I'm wrong.
Let's see the patches.

We still need to do the override when !VIRTIO_F_ACCESS_PLATFORM,
according to spec. Encrypted hypervisors still need to set
VIRTIO_F_ACCESS_PLATFORM.


Is VIRTIO_F_ACCESS_PLATFORM a good default? The need to support
legacy guests does not allow that at the QEMU level, so we
need to be conservative there. But yes, if you have a very
modern guest then it might be a good idea to just always
enable VIRTIO_F_ACCESS_PLATFORM. I say might since unfortunately
most people only use VIRTIO_F_ACCESS_PLATFORM together with a
vIOMMU; using it without a vIOMMU is supported
but is a path less well trodden. Benchmarking that and
fixing any issues that surface would be IMHO effort
well spent, and would be a prerequisite to making it
a default in a production system.





Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-25 Thread Michael S. Tsirkin
On Fri, Sep 25, 2020 at 01:26:29PM +0200, Jean-Philippe Brucker wrote:
> On Fri, Sep 25, 2020 at 06:22:57AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Sep 25, 2020 at 10:48:06AM +0200, Jean-Philippe Brucker wrote:
> > > On Fri, Aug 21, 2020 at 03:15:34PM +0200, Jean-Philippe Brucker wrote:
> > > > Add a topology description to the virtio-iommu driver and enable x86
> > > > platforms.
> > > > 
> > > > Since [v2] we have made some progress on adding ACPI support for
> > > > virtio-iommu, which is the preferred boot method on x86. It will be a
> > > > new vendor-agnostic table describing para-virtual topologies in a
> > > > minimal format. However some platforms don't use either ACPI or DT for
> > > > booting (for example microvm), and will need the alternative topology
> > > > description method proposed here. In addition, since the process to get
> > > > a new ACPI table will take a long time, this provides a boot method even
> > > > to ACPI-based platforms, if only temporarily for testing and
> > > > development.
> > > > 
> > > > v3:
> > > > * Add patch 1 that moves virtio-iommu to a subfolder.
> > > > * Split the rest:
> > > >   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> > > > support.
> > > >   * Patch 4 adds definitions.
> > > >   * Patch 5 adds parser in topology.c.
> > > > * Address other comments.
> > > > 
> > > > Linux and QEMU patches available at:
> > > > https://jpbrucker.net/git/linux virtio-iommu/devel
> > > > https://jpbrucker.net/git/qemu virtio-iommu/devel
> > > 
> > > I'm parking this work again, until we make progress on the ACPI table, or
> > > until a platform without ACPI and DT needs it. Until then, I've pushed v4
> > > to my virtio-iommu/topo branch and will keep it rebased on master.
> > > 
> > > Thanks,
> > > Jean
> > 
> > I think you guys need to work on virtio spec too, not too much left to
> > do there ...
> 
> I know it's ready and I'd really like to move on with this, but I'd rather
> not commit it to the spec until we know it's going to be used at all. As
> Gerd pointed out the one example we had, microvm, now supports ACPI. Since
> we've kicked off the ACPI work anyway it isn't clear that the built-in
> topology will be useful.
> 
> Thanks,
> Jean

Many power platforms are OF based, thus without ACPI or DT support.

-- 
MST



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-25 Thread Michael S. Tsirkin
On Fri, Sep 25, 2020 at 10:48:06AM +0200, Jean-Philippe Brucker wrote:
> On Fri, Aug 21, 2020 at 03:15:34PM +0200, Jean-Philippe Brucker wrote:
> > Add a topology description to the virtio-iommu driver and enable x86
> > platforms.
> > 
> > Since [v2] we have made some progress on adding ACPI support for
> > virtio-iommu, which is the preferred boot method on x86. It will be a
> > new vendor-agnostic table describing para-virtual topologies in a
> > minimal format. However some platforms don't use either ACPI or DT for
> > booting (for example microvm), and will need the alternative topology
> > description method proposed here. In addition, since the process to get
> > a new ACPI table will take a long time, this provides a boot method even
> > to ACPI-based platforms, if only temporarily for testing and
> > development.
> > 
> > v3:
> > * Add patch 1 that moves virtio-iommu to a subfolder.
> > * Split the rest:
> >   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> > support.
> >   * Patch 4 adds definitions.
> >   * Patch 5 adds parser in topology.c.
> > * Address other comments.
> > 
> > Linux and QEMU patches available at:
> > https://jpbrucker.net/git/linux virtio-iommu/devel
> > https://jpbrucker.net/git/qemu virtio-iommu/devel
> 
> I'm parking this work again, until we make progress on the ACPI table, or
> until a platform without ACPI and DT needs it. Until then, I've pushed v4
> to my virtio-iommu/topo branch and will keep it rebased on master.
> 
> Thanks,
> Jean

I think you guys need to work on virtio spec too, not too much left to
do there ...

-- 
MST



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-25 Thread Michael S. Tsirkin
On Thu, Sep 24, 2020 at 02:50:46PM +0200, Joerg Roedel wrote:
> On Thu, Sep 24, 2020 at 08:41:21AM -0400, Michael S. Tsirkin wrote:
> > But this has nothing to do with Linux.  There is also no guarantee that
> > the two committees will decide to use exactly the same format. Once one
> > of them sets the format in stone, we can add support for that format to
> > linux. If another one is playing nice and uses the same format, we can
> > use the same parsers. If it doesn't linux will have to follow suit.
> 
> Or Linux decides to support only one of the formats, which would then be
> ACPI.
> 
> Regards,
> 
>   Joerg

It's really up to hypervisors, not guests. Linux as a guest can for sure
refuse to work somewhere, but that's normally not very attractive.

-- 
MST



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Michael S. Tsirkin
On Thu, Sep 24, 2020 at 12:02:55PM +0200, Joerg Roedel wrote:
> On Thu, Sep 24, 2020 at 05:38:13AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
> > > On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
> > > > OK so this looks good. Can you pls repost with the minor tweak
> > > > suggested and all acks included, and I will queue this?
> > > 
> > > My NACK still stands, as long as a few questions are open:
> > > 
> > >   1) The format used here will be the same as in the ACPI table? I
> > >  think the answer to this questions must be Yes, so this leads
> > >  to the real question:
> > 
> > I am not sure it's a must.
> 
> It is, having only one parser for the ACPI and MMIO descriptions was one
> of the selling points for MMIO in past discussions and I think it makes
> sense to keep them in sync.
> 
> > We can always tweak the parser if there are slight differences
> > between ACPI and virtio formats.
> 
> There is no guarantee that only "tweaks" will be needed until the
> ACPI table format is stabilized.
> 
> Regards,
> 
>   Joerg

But this has nothing to do with Linux.  There is also no guarantee that
the two committees will decide to use exactly the same format. Once one
of them sets the format in stone, we can add support for that format to
linux. If another one is playing nice and uses the same format, we can
use the same parsers. If it doesn't linux will have to follow suit.

-- 
MST



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Michael S. Tsirkin
On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
> On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
> > OK so this looks good. Can you pls repost with the minor tweak
> > suggested and all acks included, and I will queue this?
> 
> My NACK still stands, as long as a few questions are open:
> 
>   1) The format used here will be the same as in the ACPI table? I
>  think the answer to this questions must be Yes, so this leads
>  to the real question:

I am not sure it's a must.
We can always tweak the parser if there are slight differences
between ACPI and virtio formats.

But we do want the virtio format used here to be approved by the virtio
TC, so it won't change.

Eric, Jean-Philippe, does one of you intend to create a github issue
and request a ballot for the TC? It's been posted end of August with no
changes ...

>   2) Has the ACPI table format stabilized already? If and only if
>  the answer is Yes I will Ack these patches. We don't need to
>  wait until the ACPI table format is published in a
>  specification update, but at least some certainty that it
>  will not change in incompatible ways anymore is needed.
> 

Not that I know, but I don't see why it's a must.

> So what progress has been made with the ACPI table specification, is it
> just a matter of time to get it approved or are there concerns?
> 
> Regards,
> 
>   Joerg



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Michael S. Tsirkin
On Fri, Sep 04, 2020 at 06:24:12PM +0200, Auger Eric wrote:
> Hi,
> 
> On 8/21/20 3:15 PM, Jean-Philippe Brucker wrote:
> > Add a topology description to the virtio-iommu driver and enable x86
> > platforms.
> > 
> > Since [v2] we have made some progress on adding ACPI support for
> > virtio-iommu, which is the preferred boot method on x86. It will be a
> > new vendor-agnostic table describing para-virtual topologies in a
> > minimal format. However some platforms don't use either ACPI or DT for
> > booting (for example microvm), and will need the alternative topology
> > description method proposed here. In addition, since the process to get
> > a new ACPI table will take a long time, this provides a boot method even
> > to ACPI-based platforms, if only temporarily for testing and
> > development.
> > 
> > v3:
> > * Add patch 1 that moves virtio-iommu to a subfolder.
> > * Split the rest:
> >   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> > support.
> >   * Patch 4 adds definitions.
> >   * Patch 5 adds parser in topology.c.
> > * Address other comments.
> > 
> > Linux and QEMU patches available at:
> > https://jpbrucker.net/git/linux virtio-iommu/devel
> > https://jpbrucker.net/git/qemu virtio-iommu/devel
> I have tested that series with above QEMU branch on ARM with virtio-net
> and virtio-blk translated devices in non DT mode.
> 
> It works for me:
> Tested-by: Eric Auger 
> 
> Thanks
> 
> Eric

OK so this looks good. Can you pls repost with the minor tweak
suggested and all acks included, and I will queue this?

Thanks!

> > 
> > [spec] https://lists.oasis-open.org/archives/virtio-dev/202008/msg00067.html
> > [v2] 
> > https://lore.kernel.org/linux-iommu/20200228172537.377327-1-jean-phili...@linaro.org/
> > [v1] 
> > https://lore.kernel.org/linux-iommu/20200214160413.1475396-1-jean-phili...@linaro.org/
> > [rfc] 
> > https://lore.kernel.org/linux-iommu/20191122105000.800410-1-jean-phili...@linaro.org/
> > 
> > Jean-Philippe Brucker (6):
> >   iommu/virtio: Move to drivers/iommu/virtio/
> >   iommu/virtio: Add topology helpers
> >   PCI: Add DMA configuration for virtual platforms
> >   iommu/virtio: Add topology definitions
> >   iommu/virtio: Support topology description in config space
> >   iommu/virtio: Enable x86 support
> > 
> >  drivers/iommu/Kconfig |  18 +-
> >  drivers/iommu/Makefile|   3 +-
> >  drivers/iommu/virtio/Makefile |   4 +
> >  drivers/iommu/virtio/topology-helpers.h   |  50 +
> >  include/linux/virt_iommu.h|  15 ++
> >  include/uapi/linux/virtio_iommu.h |  44 
> >  drivers/iommu/virtio/topology-helpers.c   | 196 
> >  drivers/iommu/virtio/topology.c   | 259 ++
> >  drivers/iommu/{ => virtio}/virtio-iommu.c |   4 +
> >  drivers/pci/pci-driver.c  |   5 +
> >  MAINTAINERS   |   3 +-
> >  11 files changed, 597 insertions(+), 4 deletions(-)
> >  create mode 100644 drivers/iommu/virtio/Makefile
> >  create mode 100644 drivers/iommu/virtio/topology-helpers.h
> >  create mode 100644 include/linux/virt_iommu.h
> >  create mode 100644 drivers/iommu/virtio/topology-helpers.c
> >  create mode 100644 drivers/iommu/virtio/topology.c
> >  rename drivers/iommu/{ => virtio}/virtio-iommu.c (99%)
> > 



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-08-26 Thread Michael S. Tsirkin
On Fri, Aug 21, 2020 at 03:15:34PM +0200, Jean-Philippe Brucker wrote:
> Add a topology description to the virtio-iommu driver and enable x86
> platforms.
> 
> Since [v2] we have made some progress on adding ACPI support for
> virtio-iommu, which is the preferred boot method on x86. It will be a
> new vendor-agnostic table describing para-virtual topologies in a
> minimal format. However some platforms don't use either ACPI or DT for
> booting (for example microvm), and will need the alternative topology
> description method proposed here. In addition, since the process to get
> a new ACPI table will take a long time, this provides a boot method even
> to ACPI-based platforms, if only temporarily for testing and
> development.

OK should I park this in next now? Seems appropriate ...

> v3:
> * Add patch 1 that moves virtio-iommu to a subfolder.
> * Split the rest:
>   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> support.
>   * Patch 4 adds definitions.
>   * Patch 5 adds parser in topology.c.
> * Address other comments.
> 
> Linux and QEMU patches available at:
> https://jpbrucker.net/git/linux virtio-iommu/devel
> https://jpbrucker.net/git/qemu virtio-iommu/devel
> 
> [spec] https://lists.oasis-open.org/archives/virtio-dev/202008/msg00067.html
> [v2] 
> https://lore.kernel.org/linux-iommu/20200228172537.377327-1-jean-phili...@linaro.org/
> [v1] 
> https://lore.kernel.org/linux-iommu/20200214160413.1475396-1-jean-phili...@linaro.org/
> [rfc] 
> https://lore.kernel.org/linux-iommu/20191122105000.800410-1-jean-phili...@linaro.org/
> 
> Jean-Philippe Brucker (6):
>   iommu/virtio: Move to drivers/iommu/virtio/
>   iommu/virtio: Add topology helpers
>   PCI: Add DMA configuration for virtual platforms
>   iommu/virtio: Add topology definitions
>   iommu/virtio: Support topology description in config space
>   iommu/virtio: Enable x86 support
> 
>  drivers/iommu/Kconfig |  18 +-
>  drivers/iommu/Makefile|   3 +-
>  drivers/iommu/virtio/Makefile |   4 +
>  drivers/iommu/virtio/topology-helpers.h   |  50 +
>  include/linux/virt_iommu.h|  15 ++
>  include/uapi/linux/virtio_iommu.h |  44 
>  drivers/iommu/virtio/topology-helpers.c   | 196 
>  drivers/iommu/virtio/topology.c   | 259 ++
>  drivers/iommu/{ => virtio}/virtio-iommu.c |   4 +
>  drivers/pci/pci-driver.c  |   5 +
>  MAINTAINERS   |   3 +-
>  11 files changed, 597 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/iommu/virtio/Makefile
>  create mode 100644 drivers/iommu/virtio/topology-helpers.h
>  create mode 100644 include/linux/virt_iommu.h
>  create mode 100644 drivers/iommu/virtio/topology-helpers.c
>  create mode 100644 drivers/iommu/virtio/topology.c
>  rename drivers/iommu/{ => virtio}/virtio-iommu.c (99%)
> 
> -- 
> 2.28.0
> 



[PATCH v3 36/38] virtio-iommu: convert to LE accessors

2020-08-05 Thread Michael S. Tsirkin
Virtio iommu is modern-only. Use LE accessors for config space.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/iommu/virtio-iommu.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index f6f07489a9aa..b4da396cce60 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1010,8 +1010,8 @@ static int viommu_probe(struct virtio_device *vdev)
 	if (ret)
 		return ret;
 
-	virtio_cread(vdev, struct virtio_iommu_config, page_size_mask,
-		     &viommu->pgsize_bitmap);
+	virtio_cread_le(vdev, struct virtio_iommu_config, page_size_mask,
+			&viommu->pgsize_bitmap);
 
 	if (!viommu->pgsize_bitmap) {
 		ret = -EINVAL;
@@ -1022,25 +1022,25 @@ static int viommu_probe(struct virtio_device *vdev)
 	viommu->last_domain = ~0U;
 
 	/* Optional features */
-	virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
-			     struct virtio_iommu_config, input_range.start,
-			     &input_start);
+	virtio_cread_le_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
+				struct virtio_iommu_config, input_range.start,
+				&input_start);
 
-	virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
-			     struct virtio_iommu_config, input_range.end,
-			     &input_end);
+	virtio_cread_le_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
+				struct virtio_iommu_config, input_range.end,
+				&input_end);
 
-	virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
-			     struct virtio_iommu_config, domain_range.start,
-			     &viommu->first_domain);
+	virtio_cread_le_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
+				struct virtio_iommu_config, domain_range.start,
+				&viommu->first_domain);
 
-	virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
-			     struct virtio_iommu_config, domain_range.end,
-			     &viommu->last_domain);
+	virtio_cread_le_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
+				struct virtio_iommu_config, domain_range.end,
+				&viommu->last_domain);
 
-	virtio_cread_feature(vdev, VIRTIO_IOMMU_F_PROBE,
-			     struct virtio_iommu_config, probe_size,
-			     &viommu->probe_size);
+	virtio_cread_le_feature(vdev, VIRTIO_IOMMU_F_PROBE,
+				struct virtio_iommu_config, probe_size,
+				&viommu->probe_size);
 
 	viommu->geometry = (struct iommu_domain_geometry) {
 		.aperture_start	= input_start,
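In short, the _le accessors treat the config fields as fixed little-endian
(__le32/__le64) rather than as legacy-transitional __virtio types, which is
the correct model for a modern-only device such as virtio-iommu. Typical use,
as in the hunk above:

	u64 bitmap;

	/* Config fields of a VIRTIO_F_VERSION_1-only device are always LE. */
	virtio_cread_le(vdev, struct virtio_iommu_config, page_size_mask, &bitmap);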
-- 
MST



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-07-11 Thread Michael S. Tsirkin
On Fri, Jul 10, 2020 at 10:23:22AM -0700, Stefano Stabellini wrote:
> Sorry for the late reply -- a couple of conferences kept me busy.
> 
> 
> On Wed, 1 Jul 2020, Michael S. Tsirkin wrote:
> > On Wed, Jul 01, 2020 at 10:34:53AM -0700, Stefano Stabellini wrote:
> > > Would you be in favor of a more flexible check along the lines of the
> > > one proposed in the patch that started this thread:
> > > 
> > > if (xen_vring_use_dma())
> > > return true;
> > > 
> > > 
> > > xen_vring_use_dma would be implemented so that it returns true when
> > > xen_swiotlb is required and false otherwise.
> > 
> > Just to stress - with a patch like this virtio can *still* use DMA API
> > if PLATFORM_ACCESS is set. So if DMA API is broken on some platforms
> > as you seem to be saying, you guys should fix it before doing something
> > like this..
> 
> Yes, DMA API is broken with some interfaces (specifically: rpmesg and
> trusty), but for them PLATFORM_ACCESS is never set. That is why the
> errors weren't reported before. Xen special case aside, there is no
> problem under normal circumstances.

So why not fix DMA API? Then this patch is not needed.


> 
> If you are OK with this patch (after a little bit of clean-up), Peng,
> are you OK with sending an update or do you want me to?



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-07-01 Thread Michael S. Tsirkin
On Wed, Jul 01, 2020 at 10:34:53AM -0700, Stefano Stabellini wrote:
> Would you be in favor of a more flexible check along the lines of the
> one proposed in the patch that started this thread:
> 
> if (xen_vring_use_dma())
> return true;
> 
> 
> xen_vring_use_dma would be implemented so that it returns true when
> xen_swiotlb is required and false otherwise.

Just to stress - with a patch like this virtio can *still* use DMA API
if PLATFORM_ACCESS is set. So if DMA API is broken on some platforms
as you seem to be saying, you guys should fix it before doing something
like this..

-- 
MST



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-07-01 Thread Michael S. Tsirkin
On Wed, Jul 01, 2020 at 10:34:53AM -0700, Stefano Stabellini wrote:
> On Wed, 1 Jul 2020, Christoph Hellwig wrote:
> > On Mon, Jun 29, 2020 at 04:46:09PM -0700, Stefano Stabellini wrote:
> > > > I could imagine some future Xen hosts setting a flag somewhere in the
> > > > platform capability saying "no xen specific flag, rely on
> > > > "VIRTIO_F_ACCESS_PLATFORM". Then you set that accordingly in QEMU.
> > > > How about that?
> > > 
> > > Yes, that would be fine and there is no problem implementing something
> > > like that when we get virtio support in Xen. Today there are still no
> > > virtio interfaces provided by Xen to ARM guests (no virtio-block/net,
> > > etc.)
> > > 
> > > In fact, in both cases we are discussing virtio is *not* provided by
> > > Xen; it is a firmware interface to something entirely different:
> > > 
> > > 1) virtio is used to talk to a remote AMP processor (RPMesg)
> > > 2) virtio is used to talk to a secure-world firmware/OS (Trusty)
> > >
> > > VIRTIO_F_ACCESS_PLATFORM is not set by Xen in these cases but by RPMesg
> > > and by Trusty respectively. I don't know if Trusty should or should not
> > > set VIRTIO_F_ACCESS_PLATFORM, but I think Linux should still work
> > > without issues.
> > > 
> > 
> > Any virtio implementation that is not in control of the memory map
> > (aka not the hypervisor) absolutely must set VIRTIO_F_ACCESS_PLATFORM,
> > else it is completely broken.
> 
> Lots of broken virtio implementations out there it would seem :-(

Not really, most of virtio implementations are in full control of
memory, being part of the hypervisor.

> 
> > > The xen_domain() check in Linux makes it so that vring_use_dma_api
> > > returns the opposite value on native Linux compared to Linux as Xen/ARM
> > > DomU by "accident". By "accident" because there is no architectural
> > > reason why Linux Xen/ARM DomU should behave differently compared to
> > > native Linux in this regard.
> > > 
> > > I hope that now it is clearer why I think the if (xen_domain()) check
> > > needs to be improved anyway, even if we fix generic dma_ops with virtio
> > > interfaces missing VIRTIO_F_ACCESS_PLATFORM.
> > 
> > IMHO that Xen quirk should never have been added in this form..
> 
> Would you be in favor of a more flexible check along the lines of the
> one proposed in the patch that started this thread:
> 
> if (xen_vring_use_dma())
> return true;
> 
> 
> xen_vring_use_dma would be implemented so that it returns true when
> xen_swiotlb is required and false otherwise.

I'll need to think about it. Sounds reasonable on the surface ...

-- 
MST



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-29 Thread Michael S. Tsirkin
On Mon, Jun 29, 2020 at 06:25:41AM +, Peng Fan wrote:
> > > > > Anyway, re-reading the last messages of the original thread [1],
> > > > > it looks like Peng had a clear idea on how to fix the general issue.
> > > > > Peng, what happened with that?
> > >
> > > We shrinked the rpmsg reserved area to workaround the issue.
> > > So still use the dma apis in rpmsg.
> > >
> > > But here I am going to address domu android trusty issue using virtio.
> > 
> > My suggestion is to first of all fix DMA API so it works properly.
> 
> Could you please elaborate more details?
> 
> You mean the DMA API usage of rpmsg? Or xen domu dma_ops?
> 
> Thanks,
> Peng. 

Not 100% sure but I think xen dma ops.

-- 
MST



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-29 Thread Michael S. Tsirkin
On Mon, Jun 29, 2020 at 03:05:19AM +, Peng Fan wrote:
> > Subject: Re: [PATCH] xen: introduce xen_vring_use_dma
> > 
> > On Thu, Jun 25, 2020 at 10:31:27AM -0700, Stefano Stabellini wrote:
> > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote:
> > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> > > > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > > > > > > > Export xen_swiotlb for all platforms using xen swiotlb
> > > > > > > > >
> > > > > > > > > Use xen_swiotlb to determine when vring should use dma
> > > > > > > > > APIs to map the
> > > > > > > > > ring: when xen_swiotlb is enabled the dma API is required.
> > > > > > > > > When it is disabled, it is not required.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Peng Fan 
> > > > > > > >
> > > > > > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for
> > this?
> > > > > > > > Xen was there first, but everyone else is using that now.
> > > > > > >
> > > > > > > Unfortunately it is complicated and it is not related to
> > > > > > > VIRTIO_F_IOMMU_PLATFORM :-(
> > > > > > >
> > > > > > >
> > > > > > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to
> > > > > > > translate foreign mappings (memory coming from other VMs) to
> > physical addresses.
> > > > > > > On x86, it also uses dma_ops to translate Linux's idea of a
> > > > > > > physical address into a real physical address (this is
> > > > > > > unneeded on ARM.)
> > > > > > >
> > > > > > >
> > > > > > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be
> > > > > > > used on Xen/x86 always and on Xen/ARM if Linux is Dom0
> > > > > > > (because it has foreign
> > > > > > > mappings.) That is why we have the if (xen_domain) return
> > > > > > > true; in vring_use_dma_api.
> > > > > >
> > > > > > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.
> > > > > >
> > > > > > Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also* forces
> > > > > > DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.
> > > > > >
> > > > > > Unfortunately as a result Xen never got around to properly
> > > > > > setting VIRTIO_F_IOMMU_PLATFORM.
> > > > >
> > > > > I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for this
> > > > > because the usage of swiotlb_xen is not a property of virtio,
> > > >
> > > >
> > > > Basically any device without VIRTIO_F_ACCESS_PLATFORM (that is it's
> > > > name in latest virtio spec, VIRTIO_F_IOMMU_PLATFORM is what linux
> > > > calls it) is declared as "special, don't follow normal rules for
> > > > access".
> > > >
> > > > So yes swiotlb_xen is not a property of virtio, but what *is* a
> > > > property of virtio is that it's not special, just a regular device from 
> > > > DMA
> > POV.
> > >
> > > I am trying to understand what you meant but I think I am missing
> > > something.
> > >
> > > Are you saying that modern virtio should always have
> > > VIRTIO_F_ACCESS_PLATFORM, hence use normal dma_ops as any other
> > devices?
> > 
> > I am saying it's a safe default. Clear VIRTIO_F_ACCESS_PLATFORM if you have
> > some special needs e.g. you are very sure it's ok to bypass DMA ops, or you
> > need to support a legacy guest (produced in the window between virtio 1
> > support in 2014 and support for VIRTIO_F_ACCESS_PLATFORM in 2016).
> > 
> > 
> > > If that is the case, how is it possible that virtio breaks on ARM
> > > using the default dma_ops? The breakage is not Xen related (except
> > > that Xen turns dma_ops on). The original message from Peng was:
> > >
> > >   vring_map_one

Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-26 Thread Michael S. Tsirkin
On Thu, Jun 25, 2020 at 10:31:27AM -0700, Stefano Stabellini wrote:
> On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote:
> > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > > > > > Export xen_swiotlb for all platforms using xen swiotlb
> > > > > > > 
> > > > > > > Use xen_swiotlb to determine when vring should use dma APIs to 
> > > > > > > map the
> > > > > > > ring: when xen_swiotlb is enabled the dma API is required. When 
> > > > > > > it is
> > > > > > > disabled, it is not required.
> > > > > > > 
> > > > > > > Signed-off-by: Peng Fan 
> > > > > > 
> > > > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
> > > > > > Xen was there first, but everyone else is using that now.
> > > > > 
> > > > > Unfortunately it is complicated and it is not related to
> > > > > VIRTIO_F_IOMMU_PLATFORM :-(
> > > > > 
> > > > > 
> > > > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to translate
> > > > > foreign mappings (memory coming from other VMs) to physical addresses.
> > > > > On x86, it also uses dma_ops to translate Linux's idea of a physical
> > > > > address into a real physical address (this is unneeded on ARM.)
> > > > > 
> > > > > 
> > > > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be used on 
> > > > > Xen/x86
> > > > > always and on Xen/ARM if Linux is Dom0 (because it has foreign
> > > > > mappings.) That is why we have the if (xen_domain) return true; in
> > > > > vring_use_dma_api.
> > > > 
> > > > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.
> > > > 
> > > > Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also*
> > > > forces DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.
> > > >
> > > > Unfortunately as a result Xen never got around to
> > > > properly setting VIRTIO_F_IOMMU_PLATFORM.
> > > 
> > > I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for this because
> > > the usage of swiotlb_xen is not a property of virtio,
> > 
> > 
> > Basically any device without VIRTIO_F_ACCESS_PLATFORM
> > (that is it's name in latest virtio spec, VIRTIO_F_IOMMU_PLATFORM is
> > what linux calls it) is declared as "special, don't follow normal rules
> > for access".
> > 
> > So yes swiotlb_xen is not a property of virtio, but what *is* a property
> > of virtio is that it's not special, just a regular device from DMA POV.
> 
> I am trying to understand what you meant but I think I am missing
> something.
> 
> Are you saying that modern virtio should always have
> VIRTIO_F_ACCESS_PLATFORM, hence use normal dma_ops as any other devices?

I am saying it's a safe default. Clear VIRTIO_F_ACCESS_PLATFORM if you
have some special needs e.g. you are very sure it's ok to bypass DMA
ops, or you need to support a legacy guest (produced in the window
between virtio 1 support in 2014 and support for
VIRTIO_F_ACCESS_PLATFORM in 2016).


> If that is the case, how is it possible that virtio breaks on ARM using
> the default dma_ops? The breakage is not Xen related (except that Xen
> turns dma_ops on). The original message from Peng was:
> 
>   vring_map_one_sg -> vring_use_dma_api
>     -> dma_map_page
>       -> __swiotlb_map_page
>         -> swiotlb_map_page
>           -> __dma_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
>   However we are using per device dma area for rpmsg, phys_to_virt
>   could not return a correct virtual address for virtual address in
>   vmalloc area. Then kernel panic.
> 
> I must be missing something. Maybe it is because it has to do with RPMesg?

I think it's an RPMesg bug, yes.

> 
> > > > > You might have noticed that I missed one possible case above: Xen/ARM
> > > > > DomU :-)
> > > > > 
> > > > > Xen/ARM domUs don't need swiotlb_xen, it is not even initialized. So 
> > > > > if
> > > 

Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-24 Thread Michael S. Tsirkin
On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote:
> On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > > > Export xen_swiotlb for all platforms using xen swiotlb
> > > > > 
> > > > > Use xen_swiotlb to determine when vring should use dma APIs to map the
> > > > > ring: when xen_swiotlb is enabled the dma API is required. When it is
> > > > > disabled, it is not required.
> > > > > 
> > > > > Signed-off-by: Peng Fan 
> > > > 
> > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
> > > > Xen was there first, but everyone else is using that now.
> > > 
> > > Unfortunately it is complicated and it is not related to
> > > VIRTIO_F_IOMMU_PLATFORM :-(
> > > 
> > > 
> > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to translate
> > > foreign mappings (memory coming from other VMs) to physical addresses.
> > > On x86, it also uses dma_ops to translate Linux's idea of a physical
> > > address into a real physical address (this is unneeded on ARM.)
> > > 
> > > 
> > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be used on 
> > > Xen/x86
> > > always and on Xen/ARM if Linux is Dom0 (because it has foreign
> > > mappings.) That is why we have the if (xen_domain) return true; in
> > > vring_use_dma_api.
> > 
> > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.
> > 
> > Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also*
> > forces DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.
> >
> > Unfortunately as a result Xen never got around to
> > properly setting VIRTIO_F_IOMMU_PLATFORM.
> 
> I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for this because
> the usage of swiotlb_xen is not a property of virtio,


Basically any device without VIRTIO_F_ACCESS_PLATFORM
(that is it's name in latest virtio spec, VIRTIO_F_IOMMU_PLATFORM is
what linux calls it) is declared as "special, don't follow normal rules
for access".

So yes swiotlb_xen is not a property of virtio, but what *is* a property
of virtio is that it's not special, just a regular device from DMA POV.


> it is a detail of
> the way Linux does Xen address translations. swiotlb-xen is used to do
> these translations and it is hooked into the dma_ops framework.
> 
> It would be possible to have a device in hardware that is
> virtio-compatible and doesn't set VIRTIO_F_IOMMU_PLATFORM.

That device would be basically broken, since hardware
can't know whether it can access all memory or not.

> The device
> could be directly assigned (passthrough) to a DomU. We would still
> have to use swiotlb_xen if Xen is running.
> 
> You should think of swiotlb-xen as only internal to Linux and not
> related to whether the (virtual or non-virtual) hardware comes with an
> IOMMU or not.

IOMMU is a misnomer here.  Virtio spec now calls this bit
VIRTIO_F_ACCESS_PLATFORM. We should have done the same a while ago -
I'll send a patch.
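For reference, the rename would amount to something like this in the uapi
header (a sketch; bit 33 is the existing VIRTIO_F_IOMMU_PLATFORM value, and
the old name would stay as an alias so existing users keep building):

	#define VIRTIO_F_ACCESS_PLATFORM	33
	/* Legacy name, kept for compatibility. */
	#define VIRTIO_F_IOMMU_PLATFORM		VIRTIO_F_ACCESS_PLATFORM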

> 
> > > You might have noticed that I missed one possible case above: Xen/ARM
> > > DomU :-)
> > > 
> > > Xen/ARM domUs don't need swiotlb_xen, it is not even initialized. So if
> > > (xen_domain) return true; would give the wrong answer in that case.
> > > Linux would end up calling the "normal" dma_ops, not swiotlb-xen, and
> > > the "normal" dma_ops fail.
> > > 
> > > 
> > > The solution I suggested was to make the check in vring_use_dma_api more
> > > flexible by returning true if the swiotlb_xen is supposed to be used,
> > > not in general for all Xen domains, because that is what the check was
> > > really meant to do.
> > 
> > Why not fix DMA ops so they DTRT (nop) on Xen/ARM DomU? What is wrong with 
> > that?
> 
> swiotlb-xen is not used on Xen/ARM DomU, the default dma_ops are the
> ones that are used. So you are saying, why don't we fix the default
> dma_ops to work with virtio?
> 
> It is bad that the default dma_ops crash with virtio, so yes I think it
> would be good to fix that. However, even if we fixed that, the if
> (xen_domain()) check in vring_use_dma_api is still a problem.

Why is it a problem? It just makes virtio use DMA API.
If that in turn works, problem solved.



> 
> Alternatively we could try to work-around it from swiotlb-xen. We could
> enable swiotlb-xen for Xen/ARM DomUs with a different implementation so
> that we could leave the vring_use_dma_api check unmodified.
> 
> It would be ugly because we would have to figure out from the new
> swiotlb-xen functions if the device is a normal device, so we have to
> call the regular dma_ops functions, or if the device is a virtio device,
> in which case there is nothing to do. I think it is undesirable but
> could probably be made to work.



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-24 Thread Michael S. Tsirkin
On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > Export xen_swiotlb for all platforms using xen swiotlb
> > > 
> > > Use xen_swiotlb to determine when vring should use dma APIs to map the
> > > ring: when xen_swiotlb is enabled the dma API is required. When it is
> > > disabled, it is not required.
> > > 
> > > Signed-off-by: Peng Fan 
> > 
> > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
> > Xen was there first, but everyone else is using that now.
> 
> Unfortunately it is complicated and it is not related to
> VIRTIO_F_IOMMU_PLATFORM :-(
> 
> 
> The Xen subsystem in Linux uses dma_ops via swiotlb_xen to translate
> foreign mappings (memory coming from other VMs) to physical addresses.
> On x86, it also uses dma_ops to translate Linux's idea of a physical
> address into a real physical address (this is unneeded on ARM.)
> 
> 
> So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be used on Xen/x86
> always and on Xen/ARM if Linux is Dom0 (because it has foreign
> mappings.) That is why we have the if (xen_domain) return true; in
> vring_use_dma_api.

VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.

Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also*
forces DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.

Unfortunately as a result Xen never got around to
properly setting VIRTIO_F_IOMMU_PLATFORM.

I would like to make Xen do what everyone else is doing
which is just set VIRTIO_F_IOMMU_PLATFORM and then put
platform specific hacks inside DMA API.
Then we can think about deprecating the Xen hack in a
distant future, or hiding it behind a static branch, or something
like this.
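
For reference, the decision being discussed reduces to something like the
sketch below (a simplified rendering of vring_use_dma_api(), not the exact
upstream code):

#include <linux/virtio.h>
#include <linux/virtio_config.h>
#include <xen/xen.h>

static bool vring_use_dma_api_sketch(struct virtio_device *vdev)
{
	/*
	 * The device negotiated VIRTIO_F_ACCESS_PLATFORM (formerly
	 * VIRTIO_F_IOMMU_PLATFORM): it cannot assume direct access to
	 * all of guest memory, so the DMA API must be used.
	 */
	if (virtio_has_feature(vdev, VIRTIO_F_ACCESS_PLATFORM))
		return true;

	/*
	 * The legacy Xen hack discussed above: force the DMA API for
	 * every Xen domain even when the feature bit is clear.
	 */
	if (xen_domain())
		return true;

	return false;
}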


> You might have noticed that I missed one possible case above: Xen/ARM
> DomU :-)
> 
> Xen/ARM domUs don't need swiotlb_xen, it is not even initialized. So if
> (xen_domain) return true; would give the wrong answer in that case.
> Linux would end up calling the "normal" dma_ops, not swiotlb-xen, and
> the "normal" dma_ops fail.
> 
> 
> The solution I suggested was to make the check in vring_use_dma_api more
> flexible by returning true if the swiotlb_xen is supposed to be used,
> not in general for all Xen domains, because that is what the check was
> really meant to do.

Why not fix DMA ops so they DTRT (nop) on Xen/ARM DomU? What is wrong with that?


> 
> In this regards I have two more comments:
> 
> - the comment on top of the check in vring_use_dma_api is still valid
> - the patch looks broken on x86: it should always return true, but it
>   returns false instead
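
For illustration, a helper that would not be broken on x86 (always true for
a Xen domain there, and on Arm only when swiotlb-xen is actually in use)
might look like the sketch below; xen_swiotlb_detected() is a placeholder
name, not an existing kernel symbol.

#include <xen/xen.h>

static inline bool xen_vring_use_dma(void)
{
	if (!xen_domain())
		return false;

	/* On x86 every Xen domain needs the DMA API (foreign mappings,
	 * pseudo-physical address translation). */
	if (IS_ENABLED(CONFIG_X86))
		return true;

	/* On Arm, only domains that actually set up swiotlb-xen do. */
	return xen_swiotlb_detected();	/* placeholder, see note above */
}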
> 
>  
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index a2de775801af..768afd79f67a 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -252,7 +252,7 @@ static bool vring_use_dma_api(struct virtio_device 
> > > *vdev)
> > >* the DMA API if we're a Xen guest, which at least allows
> > >* all of the sensible Xen configurations to work correctly.
> > >*/
> > > - if (xen_domain())
> > > + if (xen_vring_use_dma())
> > >   return true;
> > >  
> > >   return false;
> > 
> > 
> > The comment above this should probably be fixed.
> 
> > 



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-24 Thread Michael S. Tsirkin
On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> Export xen_swiotlb for all platforms using xen swiotlb
> 
> Use xen_swiotlb to determine when vring should use dma APIs to map the
> ring: when xen_swiotlb is enabled the dma API is required. When it is
> disabled, it is not required.
> 
> Signed-off-by: Peng Fan 

Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
Xen was there first, but everyone else is using that now.


> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index a2de775801af..768afd79f67a 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -252,7 +252,7 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
>* the DMA API if we're a Xen guest, which at least allows
>* all of the sensible Xen configurations to work correctly.
>*/
> - if (xen_domain())
> + if (xen_vring_use_dma())
>   return true;
>  
>   return false;


The comment above this should probably be fixed.

-- 
MST



Re: [PATCH v6] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-14 Thread Michael S. Tsirkin
On Thu, May 14, 2020 at 12:50:16PM +0200, Jean-Philippe Brucker wrote:
> On Thu, May 14, 2020 at 05:31:00AM -0400, Michael S. Tsirkin wrote:
> > On Thu, May 14, 2020 at 01:22:37PM +0530, Bharat Bhushan wrote:
> > > Different endpoint can support different page size, probe
> > > endpoint if it supports specific page size otherwise use
> > > global page sizes.
> > > 
> > > Device attached to domain should support a minimum of
> > > domain supported page sizes. If device supports more
> > > than domain supported page sizes then device is limited
> > > to use domain supported page sizes only.
> > 
> > OK so I am just trying to figure it out.
> > Before the patch, we always use the domain supported page sizes
> > right?
> > 
> > With the patch, we still do, but we also probe and
> > validate that device supports all domain page sizes,
> > if it does not then we fail to attach the device.
> 
> Generally there is one endpoint per domain. Linux creates the domains and
> decides which endpoint goes in which domain. It puts multiple endpoints in
> a domain in two cases:
> 
> * If endpoints cannot be isolated from each others by the IOMMU, for
>   example if ACS isolation isn't enabled in PCIe. In that case endpoints
>   are in the same IOMMU group, and therefore contained in the same domain.
>   This is more of a quirk for broken hardware, and isn't much of a concern
>   for virtualization because it's easy for the hypervisor to present
>   endpoints isolated from each others.

Unless they aren't isolated on real hardware :)

> * If userspace wants to put endpoints in the same VFIO container, then
>   VFIO first attempts to put them in the same IOMMU domain, and falls back
>   to multiple domains if that fails. That case is just a luxury and we
>   shouldn't over-complicate the driver to cater for this.
> 
> So the attach checks don't need to be that complicated. Checking that the
> page masks are exactly the same should be enough.
> 
> > This seems like a lot of effort for little benefit, can't
> > hypervisor simply make sure endpoints support the
> > iommu page sizes for us?
> 
> I tend to agree, it's not very likely that we'll have a configuration with
> different page sizes between physical and virtual endpoints.
> 
> If there is a way for QEMU to simply reject VFIO devices that don't use
> the same page mask as what's configured globally, let's do that instead of
> introducing the page_size_mask property.

Or we can even do the subset thing in QEMU. Can be transparent to
guests.


So I guess this patch isn't really needed then.
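
The "subset" computation itself is trivial to sketch: QEMU would advertise
only the page sizes common to the default mask and every assigned host
device. The helper below is plain C for illustration, not an actual QEMU
function.

#include <stdint.h>
#include <stddef.h>

static uint64_t advertised_page_size_mask(uint64_t default_mask,
                                          const uint64_t *host_masks,
                                          size_t count)
{
        uint64_t mask = default_mask;
        size_t i;

        /* Keep only the page sizes every assigned device supports. */
        for (i = 0; i < count; i++)
                mask &= host_masks[i];

        return mask;
}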

> 
> > > @@ -615,7 +636,7 @@ static int viommu_domain_finalise(struct 
> > > viommu_endpoint *vdev,
> > >   struct viommu_dev *viommu = vdev->viommu;
> > >   struct viommu_domain *vdomain = to_viommu_domain(domain);
> > >  
> > > - viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
> > > + viommu_page_size = 1UL << __ffs(vdev->pgsize_bitmap);
> > >   if (viommu_page_size > PAGE_SIZE) {
> > >   dev_err(vdev->dev,
> > >   "granule 0x%lx larger than system page size 0x%lx\n",
> > 
> > 
> > Looks like this is messed up on 32 bit: e.g. 0x1 will try to do
> > 1UL << -1, which is undefined behaviour. Which is btw already messed up
> > wrt viommu->pgsize_bitmap, but that's not a reason to propagate
> > the error.
> 
> Realistically we're not going to have a page granule larger than 4G, it's
> going to be 4k or 64k. But we can add a check that truncates the
> page_size_mask to 32-bit and makes sure that it's non-null.

... on 32 bit
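
That check could look roughly like the sketch below (done only on 32-bit
kernels, as noted; an illustration, not the code that was eventually
merged):

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/types.h>

static int viommu_check_pgsize_bitmap(u64 *pgsize_bitmap)
{
	/* On a 32-bit kernel, drop page sizes that do not fit in an
	 * unsigned long, so a later 1UL << __ffs(...) stays defined. */
	if (BITS_PER_LONG == 32)
		*pgsize_bitmap = lower_32_bits(*pgsize_bitmap);

	/* Refuse the property if nothing usable remains. */
	if (!*pgsize_bitmap)
		return -EINVAL;

	return 0;
}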

> 
> > > +struct virtio_iommu_probe_pgsize_mask {
> > > + struct virtio_iommu_probe_property  head;
> > > + __u8reserved[4];
> > > + /* Same format as virtio_iommu_config::page_size_mask */
> > 
> > It's actually slightly different in that
> > this must be a superset of domain page size mask, right?
> 
> No it overrides the global mask

OK so I'd copy the comment and tweak it rather than
refer to virtio_iommu_config::page_size_mask
(besides, virtio_iommu_config::page_size_mask isn't legal C,
I know C++ so I figured out what's meant but it's
better to just say "page_size_mask in struct virtio_iommu_config" )


> 
> > > + __le64  pgsize_bitmap;
> 
> Bharat, please rename this to page_size_mask for consistency
> 
> Thanks,
> Jean
> 
> > > +};
> > > +
> > >  #define VIRTIO_IOMMU_RESV_MEM_T_RESERVED 0
> > >  #define VIRTIO_IOMMU_RESV_MEM_T_MSI  1
> > >  
> > > -- 
> > > 2.17.1
> > 



Re: [PATCH v6] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-14 Thread Michael S. Tsirkin
On Thu, May 14, 2020 at 01:22:37PM +0530, Bharat Bhushan wrote:
> Different endpoint can support different page size, probe
> endpoint if it supports specific page size otherwise use
> global page sizes.
> 
> Device attached to domain should support a minimum of
> domain supported page sizes. If device supports more
> than domain supported page sizes then device is limited
> to use domain supported page sizes only.

OK so I am just trying to figure it out.
Before the patch, we always use the domain supported page sizes
right? With the patch, we still do, but we also probe and
validate that device supports all domain page sizes,
if it does not then we fail to attach the device.

This seems like a lot of effort for little benefit, can't
hypervisor simply make sure endpoints support the
iommu page sizes for us?



> 
> Signed-off-by: Bharat Bhushan 
> ---
> v5->v6
>  - property length before dereference
>  - Error out on no supported page sizes (page-size-mask is zero)
>  - Allow device to attach to domain even it supports
>minimum of domain supported page sizes. In that case device
>will use domain page sizes only.
>  - added format of pgsize_bitmap
> 
> v4->v5:
>  - Rebase to Linux v5.7-rc4
> 
> v3->v4:
>  - Fix whitespace error
> 
> v2->v3:
>  - Fixed error return for incompatible endpoint
>  - __u64 changed to __le64 in header file
> 
>  drivers/iommu/virtio-iommu.c  | 63 ---
>  include/uapi/linux/virtio_iommu.h | 14 ++-
>  2 files changed, 71 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 4e1d11af23c8..cbac3047a781 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -78,6 +78,7 @@ struct viommu_endpoint {
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
>   struct list_headresv_regions;
> + u64 pgsize_bitmap;
>  };
>  
>  struct viommu_request {
> @@ -415,6 +416,23 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> + struct virtio_iommu_probe_pgsize_mask *mask,
> + size_t len)
> +{
> + u64 pgsize_bitmap;
> +
> + if (len < sizeof(*mask))
> + return -EINVAL;
> +
> + pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
> + if (!pgsize_bitmap)
> + return -EINVAL;
> +
> + vdev->pgsize_bitmap = pgsize_bitmap;
> + return 0;
> +}
> +
>  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
>  struct virtio_iommu_probe_resv_mem *mem,
>  size_t len)
> @@ -499,6 +517,9 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
>   ret = viommu_add_resv_mem(vdev, (void *)prop, len);
>   break;
> + case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
> + ret = viommu_set_pgsize_bitmap(vdev, (void *)prop, len);
> + break;
>   default:
>   dev_err(dev, "unknown viommu prop 0x%x\n", type);
>   }
> @@ -615,7 +636,7 @@ static int viommu_domain_finalise(struct viommu_endpoint 
> *vdev,
>   struct viommu_dev *viommu = vdev->viommu;
>   struct viommu_domain *vdomain = to_viommu_domain(domain);
>  
> - viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
> + viommu_page_size = 1UL << __ffs(vdev->pgsize_bitmap);
>   if (viommu_page_size > PAGE_SIZE) {
>   dev_err(vdev->dev,
>   "granule 0x%lx larger than system page size 0x%lx\n",


Looks like this is messed up on 32 bit: e.g. 0x1 will try to do
1UL << -1, which is undefined behaviour. Which is btw already messed up
wrt viommu->pgsize_bitmap, but that's not a reason to propagate
the error.


> @@ -630,7 +651,7 @@ static int viommu_domain_finalise(struct viommu_endpoint 
> *vdev,
>  
>   vdomain->id = (unsigned int)ret;
>  
> - domain->pgsize_bitmap   = viommu->pgsize_bitmap;
> + domain->pgsize_bitmap   = vdev->pgsize_bitmap;
>   domain->geometry= viommu->geometry;
>  
>   vdomain->map_flags  = viommu->map_flags;
> @@ -654,6 +675,38 @@ static void viommu_domain_free(struct iommu_domain 
> *domain)
>   kfree(vdomain);
>  }
>  
> +/*
> + * Check whether the endpoint's capabilities are compatible with other
> + * endpoints in the domain. Report any inconsistency.

This actually has side effects, so _is_ isn't a good name for it.
viommu_endpoint_compatible?

> + */
> +static bool viommu_endpoint_is_compatible(struct viommu_endpoint *vdev,
> +   struct viommu_domain *vdomain)
> +{
> + struct device *dev 

Re: [PATCH v5] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-12 Thread Michael S. Tsirkin
On Tue, May 05, 2020 at 03:00:04PM +0530, Bharat Bhushan wrote:
> Different endpoint can support different page size, probe
> endpoint if it supports specific page size otherwise use
> global page sizes.
> 
> Signed-off-by: Bharat Bhushan 
> ---
> v4->v5:
>  - Rebase to Linux v5.7-rc4
> 
> v3->v4:
>  - Fix whitespace error
> 
> v2->v3:
>  - Fixed error return for incompatible endpoint
>  - __u64 changed to __le64 in header file
> 
>  drivers/iommu/virtio-iommu.c  | 48 ---
>  include/uapi/linux/virtio_iommu.h |  7 +
>  2 files changed, 51 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index d5cac4f46ca5..9513d2ab819e 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -78,6 +78,7 @@ struct viommu_endpoint {
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
>   struct list_headresv_regions;
> + u64 pgsize_bitmap;
>  };
>  
>  struct viommu_request {
> @@ -415,6 +416,19 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> + struct virtio_iommu_probe_pgsize_mask *mask,
> + size_t len)
> +{
> + u64 pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
> +
> + if (len < sizeof(*mask))
> + return -EINVAL;
> +
> + vdev->pgsize_bitmap = pgsize_bitmap;
> + return 0;
> +}
> +
>  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
>  struct virtio_iommu_probe_resv_mem *mem,
>  size_t len)
> @@ -499,6 +513,9 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
>   ret = viommu_add_resv_mem(vdev, (void *)prop, len);
>   break;
> + case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
> + ret = viommu_set_pgsize_bitmap(vdev, (void *)prop, len);
> + break;
>   default:
>   dev_err(dev, "unknown viommu prop 0x%x\n", type);
>   }

So given this is necessary early in boot, how about we
add this in the config space? And maybe ACPI too ...


> @@ -630,7 +647,7 @@ static int viommu_domain_finalise(struct viommu_endpoint 
> *vdev,
>  
>   vdomain->id = (unsigned int)ret;
>  
> - domain->pgsize_bitmap   = viommu->pgsize_bitmap;
> + domain->pgsize_bitmap   = vdev->pgsize_bitmap;
>   domain->geometry= viommu->geometry;
>  
>   vdomain->map_flags  = viommu->map_flags;
> @@ -654,6 +671,29 @@ static void viommu_domain_free(struct iommu_domain 
> *domain)
>   kfree(vdomain);
>  }
>  
> +/*
> + * Check whether the endpoint's capabilities are compatible with other
> + * endpoints in the domain. Report any inconsistency.
> + */
> +static bool viommu_endpoint_is_compatible(struct viommu_endpoint *vdev,
> +   struct viommu_domain *vdomain)
> +{
> + struct device *dev = vdev->dev;
> +
> + if (vdomain->viommu != vdev->viommu) {
> + dev_err(dev, "cannot attach to foreign vIOMMU\n");
> + return false;
> + }
> +
> + if (vdomain->domain.pgsize_bitmap != vdev->pgsize_bitmap) {
> + dev_err(dev, "incompatible domain bitmap 0x%lx != 0x%llx\n",
> + vdomain->domain.pgsize_bitmap, vdev->pgsize_bitmap);
> + return false;
> + }
> +
> + return true;
> +}
> +
>  static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  {
>   int i;
> @@ -670,9 +710,8 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
> struct device *dev)
>* owns it.
>*/
>   ret = viommu_domain_finalise(vdev, domain);
> - } else if (vdomain->viommu != vdev->viommu) {
> - dev_err(dev, "cannot attach to foreign vIOMMU\n");
> - ret = -EXDEV;
> + } else if (!viommu_endpoint_is_compatible(vdev, vdomain)) {
> + ret = -EINVAL;
>   }
>   mutex_unlock(&vdomain->mutex);
>  
> @@ -886,6 +925,7 @@ static int viommu_add_device(struct device *dev)
>  
>   vdev->dev = dev;
>   vdev->viommu = viommu;
> + vdev->pgsize_bitmap = viommu->pgsize_bitmap;
>   INIT_LIST_HEAD(&vdev->resv_regions);
>   dev_iommu_priv_set(dev, vdev);
>  
> diff --git a/include/uapi/linux/virtio_iommu.h 
> b/include/uapi/linux/virtio_iommu.h
> index 48e3c29223b5..2cced7accc99 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -111,6 +111,7 @@ struct virtio_iommu_req_unmap {
>  
>  #define VIRTIO_IOMMU_PROBE_T_NONE0
>  #define VIRTIO_IOMMU_PROBE_T_RESV_MEM   

Re: [EXT] Re: [PATCH v5] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-07 Thread Michael S. Tsirkin
On Thu, May 07, 2020 at 02:51:32PM +0200, Auger Eric wrote:
> Hi,
> 
> On 5/7/20 1:32 PM, Michael S. Tsirkin wrote:
> > On Thu, May 07, 2020 at 11:24:29AM +, Bharat Bhushan wrote:
> >>
> >>
> >>> -----Original Message-
> >>> From: Michael S. Tsirkin 
> >>> Sent: Wednesday, May 6, 2020 5:53 AM
> >>> To: Bharat Bhushan 
> >>> Cc: jean-phili...@linaro.org; j...@8bytes.org; jasow...@redhat.com;
> >>> virtualizat...@lists.linux-foundation.org; 
> >>> iommu@lists.linux-foundation.org;
> >>> linux-ker...@vger.kernel.org; eric.auger@gmail.com; 
> >>> eric.au...@redhat.com
> >>> Subject: [EXT] Re: [PATCH v5] iommu/virtio: Use page size bitmap 
> >>> supported by
> >>> endpoint
> >>>
> >>> External Email
> >>>
> >>> --
> >>> On Tue, May 05, 2020 at 03:00:04PM +0530, Bharat Bhushan wrote:
> >>>> Different endpoint can support different page size, probe endpoint if
> >>>> it supports specific page size otherwise use global page sizes.
> >>>>
> >>>> Signed-off-by: Bharat Bhushan 
> >>>> ---
> >>>> v4->v5:
> >>>>  - Rebase to Linux v5.7-rc4
> >>>>
> >>>> v3->v4:
> >>>>  - Fix whitespace error
> >>>>
> >>>> v2->v3:
> >>>>  - Fixed error return for incompatible endpoint
> >>>>  - __u64 changed to __le64 in header file
> >>>>
> >>>>  drivers/iommu/virtio-iommu.c  | 48 ---
> >>>>  include/uapi/linux/virtio_iommu.h |  7 +
> >>>>  2 files changed, 51 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/iommu/virtio-iommu.c
> >>>> b/drivers/iommu/virtio-iommu.c index d5cac4f46ca5..9513d2ab819e 100644
> >>>> --- a/drivers/iommu/virtio-iommu.c
> >>>> +++ b/drivers/iommu/virtio-iommu.c
> >>>> @@ -78,6 +78,7 @@ struct viommu_endpoint {
> >>>>  struct viommu_dev   *viommu;
> >>>>  struct viommu_domain*vdomain;
> >>>>  struct list_headresv_regions;
> >>>> +u64 pgsize_bitmap;
> >>>>  };
> >>>>
> >>>>  struct viommu_request {
> >>>> @@ -415,6 +416,19 @@ static int viommu_replay_mappings(struct
> >>> viommu_domain *vdomain)
> >>>>  return ret;
> >>>>  }
> >>>>
> >>>> +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> >>>> +struct 
> >>>> virtio_iommu_probe_pgsize_mask *mask,
> >>>> +size_t len)
> >>>> +{
> >>>> +u64 pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
> >>>> +
> >>>> +if (len < sizeof(*mask))
> >>>
> >>> This is too late to validate length, you have dereferenced it already.
> >>> do it before the read pls.
> >>
> >> Yes, Will change here and other places as well
> >>
> >>>
> >>>> +return -EINVAL;
> >>>
> >>> OK but note that guest will then just proceed to ignore the property. Is 
> >>> that really
> >>> OK? Wouldn't host want to know?
> >>
> >>
> >> Guest need to be in sync with device, so yes seems like guest need to tell 
> >> device which page-size-mask it is using.
> >>
> >> Corresponding spec change patch 
> >> (https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg06214.html)
> >>
> >> Would like Jean/Eric to comment here as well.
> >>
> >>>
> >>>
> >>>> +
> >>>> +vdev->pgsize_bitmap = pgsize_bitmap;
> >>>
> >>> what if bitmap is 0? Is that a valid size? I see a bunch of BUG_ON with 
> >>> that value ...
> >>
> >> As per spec proposed device is supposed to set at least one bit.
> >> Will add a bug_on here.
> > 
> > Or better fail probe ...
> Yes I agree I would rather fail the probe.
> > 
> >> Should we add bug_on or switch to global config page-size mask if this is
> >> zero (notify device which page-size-mask it is using).

Re: [EXT] Re: [PATCH v5] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-07 Thread Michael S. Tsirkin
On Thu, May 07, 2020 at 11:24:29AM +, Bharat Bhushan wrote:
> 
> 
> > -Original Message-
> > From: Michael S. Tsirkin 
> > Sent: Wednesday, May 6, 2020 5:53 AM
> > To: Bharat Bhushan 
> > Cc: jean-phili...@linaro.org; j...@8bytes.org; jasow...@redhat.com;
> > virtualizat...@lists.linux-foundation.org; iommu@lists.linux-foundation.org;
> > linux-ker...@vger.kernel.org; eric.auger@gmail.com; 
> > eric.au...@redhat.com
> > Subject: [EXT] Re: [PATCH v5] iommu/virtio: Use page size bitmap supported 
> > by
> > endpoint
> > 
> > External Email
> > 
> > --
> > On Tue, May 05, 2020 at 03:00:04PM +0530, Bharat Bhushan wrote:
> > > Different endpoint can support different page size, probe endpoint if
> > > it supports specific page size otherwise use global page sizes.
> > >
> > > Signed-off-by: Bharat Bhushan 
> > > ---
> > > v4->v5:
> > >  - Rebase to Linux v5.7-rc4
> > >
> > > v3->v4:
> > >  - Fix whitespace error
> > >
> > > v2->v3:
> > >  - Fixed error return for incompatible endpoint
> > >  - __u64 changed to __le64 in header file
> > >
> > >  drivers/iommu/virtio-iommu.c  | 48 ---
> > >  include/uapi/linux/virtio_iommu.h |  7 +
> > >  2 files changed, 51 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/iommu/virtio-iommu.c
> > > b/drivers/iommu/virtio-iommu.c index d5cac4f46ca5..9513d2ab819e 100644
> > > --- a/drivers/iommu/virtio-iommu.c
> > > +++ b/drivers/iommu/virtio-iommu.c
> > > @@ -78,6 +78,7 @@ struct viommu_endpoint {
> > >   struct viommu_dev   *viommu;
> > >   struct viommu_domain*vdomain;
> > >   struct list_headresv_regions;
> > > + u64 pgsize_bitmap;
> > >  };
> > >
> > >  struct viommu_request {
> > > @@ -415,6 +416,19 @@ static int viommu_replay_mappings(struct
> > viommu_domain *vdomain)
> > >   return ret;
> > >  }
> > >
> > > +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> > > + struct virtio_iommu_probe_pgsize_mask *mask,
> > > + size_t len)
> > > +{
> > > + u64 pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
> > > +
> > > + if (len < sizeof(*mask))
> > 
> > This is too late to validate length, you have dereferenced it already.
> > do it before the read pls.
> 
> Yes, Will change here and other places as well
> 
> > 
> > > + return -EINVAL;
> > 
> > OK but note that guest will then just proceed to ignore the property. Is 
> > that really
> > OK? Wouldn't host want to know?
> 
> 
> Guest need to be in sync with device, so yes seems like guest need to tell 
> device which page-size-mask it is using.
> 
> Corresponding spec change patch 
> (https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg06214.html)
> 
> Would like Jean/Eric to comment here as well.
> 
> > 
> > 
> > > +
> > > + vdev->pgsize_bitmap = pgsize_bitmap;
> > 
> > what if bitmap is 0? Is that a valid size? I see a bunch of BUG_ON with 
> > that value ...
> 
> As per spec proposed device is supposed to set at least one bit.
> Will add a bug_on here.

Or better fail probe ...

> Should we add bug_on or switch to global config page-size mask if this is 
> zero (notify device which page-size-mask it is using).

It's a spec violation, I wouldn't try to use the device.

> > 
> > I also see a bunch of code like e.g. this:
> > 
> > pg_size = 1UL << __ffs(pgsize_bitmap);
> > 
> > which probably won't DTRT on a 32 bit guest if the bitmap has bits set in 
> > the high
> > word.
> > 
> 
> My thought is that in that case viommu_domain_finalise() will fail, do not 
> proceed.

That's undefined behaviour in C. You need to make sure this condition
is never reached. And spec does not make this illegal at all
so it looks like we actually need to handle this gracefully.


> > 
> > 
> > > + return 0;
> > > +}
> > > +
> > >  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> > >  struct virtio_iommu_probe_resv_mem *mem,
> > >  size_t len)
> > > @@ -499,6 +513,9 @@ static int viommu_pr

Re: [PATCH v5] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-05 Thread Michael S. Tsirkin
On Tue, May 05, 2020 at 03:00:04PM +0530, Bharat Bhushan wrote:
> Different endpoint can support different page size, probe
> endpoint if it supports specific page size otherwise use
> global page sizes.
> 
> Signed-off-by: Bharat Bhushan 
> ---
> v4->v5:
>  - Rebase to Linux v5.7-rc4
> 
> v3->v4:
>  - Fix whitespace error
> 
> v2->v3:
>  - Fixed error return for incompatible endpoint
>  - __u64 changed to __le64 in header file
> 
>  drivers/iommu/virtio-iommu.c  | 48 ---
>  include/uapi/linux/virtio_iommu.h |  7 +
>  2 files changed, 51 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index d5cac4f46ca5..9513d2ab819e 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -78,6 +78,7 @@ struct viommu_endpoint {
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
>   struct list_headresv_regions;
> + u64 pgsize_bitmap;
>  };
>  
>  struct viommu_request {
> @@ -415,6 +416,19 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> + struct virtio_iommu_probe_pgsize_mask *mask,
> + size_t len)
> +{
> + u64 pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
> +
> + if (len < sizeof(*mask))

This is too late to validate length, you have dereferenced it already.
do it before the read pls.
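
The fix being asked for is just a reordering, i.e. check len before touching
the structure, which is essentially what the v6 posting elsewhere in this
thread ends up doing:

static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
				    struct virtio_iommu_probe_pgsize_mask *mask,
				    size_t len)
{
	u64 pgsize_bitmap;

	/* Validate the property length before reading from it. */
	if (len < sizeof(*mask))
		return -EINVAL;

	pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
	if (!pgsize_bitmap)
		return -EINVAL;

	vdev->pgsize_bitmap = pgsize_bitmap;
	return 0;
}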

> + return -EINVAL;

OK but note that guest will then just proceed to ignore the
property. Is that really OK? Wouldn't host want to know?


> +
> + vdev->pgsize_bitmap = pgsize_bitmap;

what if bitmap is 0? Is that a valid size? I see a bunch of
BUG_ON with that value ...

I also see a bunch of code like e.g. this:

pg_size = 1UL << __ffs(pgsize_bitmap);

which probably won't DTRT on a 32 bit guest if the bitmap has bits
set in the high word.



> + return 0;
> +}
> +
>  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
>  struct virtio_iommu_probe_resv_mem *mem,
>  size_t len)
> @@ -499,6 +513,9 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
>   ret = viommu_add_resv_mem(vdev, (void *)prop, len);
>   break;
> + case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
> + ret = viommu_set_pgsize_bitmap(vdev, (void *)prop, len);
> + break;
>   default:
>   dev_err(dev, "unknown viommu prop 0x%x\n", type);
>   }
> @@ -630,7 +647,7 @@ static int viommu_domain_finalise(struct viommu_endpoint 
> *vdev,
>  
>   vdomain->id = (unsigned int)ret;
>  
> - domain->pgsize_bitmap   = viommu->pgsize_bitmap;
> + domain->pgsize_bitmap   = vdev->pgsize_bitmap;
>   domain->geometry= viommu->geometry;
>  
>   vdomain->map_flags  = viommu->map_flags;
> @@ -654,6 +671,29 @@ static void viommu_domain_free(struct iommu_domain 
> *domain)
>   kfree(vdomain);
>  }
>  
> +/*
> + * Check whether the endpoint's capabilities are compatible with other
> + * endpoints in the domain. Report any inconsistency.
> + */
> +static bool viommu_endpoint_is_compatible(struct viommu_endpoint *vdev,
> +   struct viommu_domain *vdomain)
> +{
> + struct device *dev = vdev->dev;
> +
> + if (vdomain->viommu != vdev->viommu) {
> + dev_err(dev, "cannot attach to foreign vIOMMU\n");
> + return false;
> + }
> +
> + if (vdomain->domain.pgsize_bitmap != vdev->pgsize_bitmap) {
> + dev_err(dev, "incompatible domain bitmap 0x%lx != 0x%llx\n",
> + vdomain->domain.pgsize_bitmap, vdev->pgsize_bitmap);
> + return false;
> + }

I'm confused by this. So let's assume host supports pages sizes
of 4k, 2M, 1G. It signals this in the properties. Nice.
Now domain supports 4k, 2M and that's all. Why is that a problem?
Just don't use 1G ...
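
What "just don't use 1G" would mean in code, roughly: let the attach succeed
as long as the two masks intersect, and have the domain keep only the common
sizes. A sketch of that alternative, not the posted patch:

#include <linux/iommu.h>

static int viommu_merge_pgsize_bitmap(struct iommu_domain *domain,
				      u64 endpoint_mask)
{
	unsigned long common = domain->pgsize_bitmap & endpoint_mask;

	if (!common)
		return -EINVAL;		/* no page size usable by both sides */

	domain->pgsize_bitmap = common;
	return 0;
}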


> +
> + return true;
> +}
> +
>  static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  {
>   int i;
> @@ -670,9 +710,8 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
> struct device *dev)
>* owns it.
>*/
>   ret = viommu_domain_finalise(vdev, domain);
> - } else if (vdomain->viommu != vdev->viommu) {
> - dev_err(dev, "cannot attach to foreign vIOMMU\n");
> - ret = -EXDEV;
> + } else if (!viommu_endpoint_is_compatible(vdev, vdomain)) {
> + ret = -EINVAL;
>   }
>   mutex_unlock(&vdomain->mutex);
>  
> @@ -886,6 +925,7 @@ static int 

Re: [RFC/PATCH 0/1] virtio_mmio: hypervisor specific interfaces for MMIO

2020-04-30 Thread Michael S. Tsirkin
On Thu, Apr 30, 2020 at 03:32:55PM +0530, Srivatsa Vaddagiri wrote:
> The Type-1 hypervisor we are dealing with does not allow for MMIO transport. 

How about PCI then?

-- 
MST



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 12:26:43PM +0200, Jan Kiszka wrote:
> On 29.04.20 12:20, Michael S. Tsirkin wrote:
> > On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
> > > That would still not work I think where swiotlb is used for pass-thr 
> > > devices
> > > (when private memory is fine) as well as virtio devices (when shared 
> > > memory is
> > > required).
> > 
> > So that is a separate question. When there are multiple untrusted
> > devices, at the moment it looks like a single bounce buffer is used.
> > 
> > Which to me seems like a security problem, I think we should protect
> > untrusted devices from each other.
> > 
> 
> Definitely. That's the model we have for ivshmem-virtio as well.
> 
> Jan

Want to try implementing that?

> -- 
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
> Corporate Competence Center Embedded Linux



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
> That would still not work I think where swiotlb is used for pass-thr devices
> (when private memory is fine) as well as virtio devices (when shared memory is
> required).

So that is a separate question. When there are multiple untrusted
devices, at the moment it looks like a single bounce buffer is used.

Which to me seems like a security problem, I think we should protect
untrusted devices from each other.
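
On top of the swiotlb_pool abstraction used elsewhere in this series,
"protect untrusted devices from each other" could mean one private pool per
such device instead of a single shared one. A sketch only:
swiotlb_pool_create() and set_bounce_pool() are hypothetical names, not
existing kernel APIs.

static int isolate_untrusted_device(struct device *dev, size_t pool_bytes)
{
	struct swiotlb_pool *pool;

	/* One private bounce pool per untrusted device, so devices
	 * cannot observe each other's bounced data. */
	pool = swiotlb_pool_create(dev, pool_bytes);	/* hypothetical */
	if (IS_ERR(pool))
		return PTR_ERR(pool);

	set_bounce_pool(dev, pool);			/* hypothetical */
	return 0;
}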





> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 03:14:10PM +0530, Srivatsa Vaddagiri wrote:
> * Michael S. Tsirkin  [2020-04-29 02:50:41]:
> 
> > So it seems that with modern Linux, all one needs
> > to do on x86 is mark the device as untrusted.
> > It's already possible to do this with ACPI and with OF - would that be
> > sufficient for achieving what this patchset is trying to do?
> 
> In my case, it's not sufficient to just mark virtio device untrusted and thus
> activate the use of swiotlb. All of the secondary VM memory, including those
> allocate by swiotlb driver, is private to it.

So why not make the bounce buffer memory shared then?

> An additional piece of memory is
> available to secondary VM which is shared between VMs and which is where I 
> need
> swiotlb driver to do its work.
> 
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 01:42:13PM +0800, Lu Baolu wrote:
> On 2020/4/29 12:57, Michael S. Tsirkin wrote:
> > On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
> > > On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > > > > * Michael S. Tsirkin  [2020-04-28 12:17:57]:
> > > > > 
> > > > > > Okay, but how is all this virtio specific?  For example, why not 
> > > > > > allow
> > > > > > separate swiotlbs for any type of device?
> > > > > > For example, this might make sense if a given device is from a
> > > > > > different, less trusted vendor.
> > > > > Is swiotlb commonly used for multiple devices that may be on 
> > > > > different trust
> > > > > boundaries (and not behind a hardware iommu)?
> > > > Even a hardware iommu does not imply a 100% security from malicious
> > > > hardware. First lots of people use iommu=pt for performance reasons.
> > > > Second even without pt, unmaps are often batched, and sub-page buffers
> > > > might be used for DMA, so we are not 100% protected at all times.
> > > > 
> > > 
> > > For untrusted devices, IOMMU is forced on even iommu=pt is used;
> > 
> > I think you are talking about untrusted *drivers* like with VFIO.
> 
> No. I am talking about untrusted devices like thunderbolt peripherals.
> We always trust drivers hosted in kernel and the DMA APIs are designed
> for them, right?
> 
> Please refer to this series.
> 
> https://lkml.org/lkml/2019/9/6/39
> 
> Best regards,
> baolu

Oh, thanks for that! I didn't realize Linux is doing this.

So it seems that with modern Linux, all one needs
to do on x86 is mark the device as untrusted.
It's already possible to do this with ACPI and with OF - would that be
sufficient for achieving what this patchset is trying to do?

Adding more ways to mark a device as untrusted, and adding
support for more platforms to use bounce buffers
sounds like a reasonable thing to do.
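
In simplified form, the x86 mechanism referred to here: firmware marks a
port as external-facing (ACPI "ExternalFacingPort", DT "external-facing"),
the PCI core records devices behind it as untrusted, and the DMA layer then
keeps the IOMMU strict and bounces sub-page buffers for them. A rough
sketch, not the exact upstream code:

#include <linux/pci.h>

static bool dev_is_untrusted_sketch(struct device *dev)
{
	/* pci_dev->untrusted is set by the PCI core based on the
	 * firmware description of the upstream port. */
	return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
}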

> > 
> > On the other hand, I am talking about things like thunderbolt
> > peripherals being less trusted than on-board ones.
> 
> 
> 
> > 
> > Or possibly even using swiotlb for specific use-cases where
> > speed is less of an issue.
> > 
> > E.g. my wifi is pretty slow anyway, and that card is exposed to
> > malicious actors all the time, put just that behind swiotlb
> > for security, and leave my graphics card with pt since
> > I'm trusting it with secrets anyway.
> > 
> > 
> > > and
> > > iotlb flush is in strict mode (no batched flushes); ATS is also not
> > > allowed. Swiotlb is used to protect sub-page buffers since IOMMU can
> > > only apply page granularity protection. Swiotlb is now used for devices
> > > from different trust zone.
> > > 
> > > Best regards,
> > > baolu
> > 



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
> On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> > On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > > * Michael S. Tsirkin  [2020-04-28 12:17:57]:
> > > 
> > > > Okay, but how is all this virtio specific?  For example, why not allow
> > > > separate swiotlbs for any type of device?
> > > > For example, this might make sense if a given device is from a
> > > > different, less trusted vendor.
> > > Is swiotlb commonly used for multiple devices that may be on different 
> > > trust
> > > boundaries (and not behind a hardware iommu)?
> > Even a hardware iommu does not imply a 100% security from malicious
> > hardware. First lots of people use iommu=pt for performance reasons.
> > Second even without pt, unmaps are often batched, and sub-page buffers
> > might be used for DMA, so we are not 100% protected at all times.
> > 
> 
> For untrusted devices, IOMMU is forced on even iommu=pt is used;

I think you are talking about untrusted *drivers* like with VFIO.

On the other hand, I am talking about things like thunderbolt
peripherals being less trusted than on-board ones.

Or possibly even using swiotlb for specific use-cases where
speed is less of an issue.

E.g. my wifi is pretty slow anyway, and that card is exposed to
malicious actors all the time, put just that behind swiotlb
for security, and leave my graphics card with pt since
I'm trusting it with secrets anyway.


> and
> iotlb flush is in strict mode (no batched flushes); ATS is also not
> allowed. Swiotlb is used to protect sub-page buffers since IOMMU can
> only apply page granularity protection. Swiotlb is now used for devices
> from different trust zone.
> 
> Best regards,
> baolu



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Michael S. Tsirkin
On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> * Michael S. Tsirkin  [2020-04-28 12:17:57]:
> 
> > Okay, but how is all this virtio specific?  For example, why not allow
> > separate swiotlbs for any type of device?
> > For example, this might make sense if a given device is from a
> > different, less trusted vendor.
> 
> Is swiotlb commonly used for multiple devices that may be on different trust
> boundaries (and not behind a hardware iommu)?

Even a hardware iommu does not imply a 100% security from malicious
hardware. First lots of people use iommu=pt for performance reasons.
Second even without pt, unmaps are often batched, and sub-page buffers
might be used for DMA, so we are not 100% protected at all times.


> If so, then yes it sounds like a
> good application of multiple swiotlb pools.
> 
> > All this can then maybe be hidden behind the DMA API.
> 
> Won't we still need some changes to virtio to make use of its own pool (to
> bounce buffers)? Something similar to its own DMA ops proposed in this patch?

If you are doing this for all devices, you need to either find a way
to do this without changing DMA ops, or by doing some automatic change
to all drivers.


> > > +void virtio_bounce_set_dma_ops(struct virtio_device *vdev)
> > > +{
> > > + if (!bounce_buf_paddr)
> > > + return;
> > > +
> > > + set_dma_ops(vdev->dev.parent, _dma_ops);
> > 
> > 
> > I don't think DMA API maintainers will be happy with new users
> > of set_dma_ops.
> 
> Is there an alternate API that is more preffered?

all this is supposed to be part of DMA API itself. new drivers aren't
supposed to have custom DMA ops.

> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Michael S. Tsirkin
On Tue, Apr 28, 2020 at 05:09:18PM +0530, Srivatsa Vaddagiri wrote:
> For better security, its desirable that a guest VM's memory is
> not accessible to any entity that executes outside the context of
> guest VM. In case of virtio, backend drivers execute outside the
> context of guest VM and in general will need access to complete
> guest VM memory.  One option to restrict the access provided to
> backend driver is to make use of a bounce buffer. The bounce
> buffer is accessible to both backend and frontend drivers. All IO
> buffers that are in private space of guest VM are bounced to be
> accessible to backend.
> 
> This patch proposes a new memory pool to be used for this bounce
> purpose, rather than the default swiotlb memory pool. That will
> avoid any conflicts that may arise in situations where a VM needs
> to use swiotlb pool for driving any pass-through devices (in
> which case swiotlb memory needs not be shared with another VM) as
> well as virtio devices (which will require swiotlb memory to be
> shared with backend VM). As a possible extension to this patch,
> we can provide an option for virtio to make use of default
> swiotlb memory pool itself, where no such conflicts may exist in
> a given deployment.
> 
> Signed-off-by: Srivatsa Vaddagiri 


Okay, but how is all this virtio specific?  For example, why not allow
separate swiotlbs for any type of device?
For example, this might make sense if a given device is from a
different, less trusted vendor.
All this can then maybe be hidden behind the DMA API.



> ---
>  drivers/virtio/Makefile|   2 +-
>  drivers/virtio/virtio.c|   2 +
>  drivers/virtio/virtio_bounce.c | 150 
> +
>  include/linux/virtio.h |   4 ++
>  4 files changed, 157 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/virtio/virtio_bounce.c
> 
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 29a1386e..3fd3515 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
> +obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o virtio_bounce.o
>  obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>  virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index a977e32..bc2f779 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -329,6 +329,7 @@ int register_virtio_device(struct virtio_device *dev)
>  
>   dev->index = err;
>   dev_set_name(&dev->dev, "virtio%u", dev->index);
> + virtio_bounce_set_dma_ops(dev);
>  
>   spin_lock_init(&dev->config_lock);
>   dev->config_enabled = false;
> @@ -431,6 +432,7 @@ EXPORT_SYMBOL_GPL(virtio_device_restore);
>  
>  static int virtio_init(void)
>  {
> + virtio_map_bounce_buffer();
>   if (bus_register(&virtio_bus) != 0)
>   panic("virtio bus registration failed");
>   return 0;
> diff --git a/drivers/virtio/virtio_bounce.c b/drivers/virtio/virtio_bounce.c
> new file mode 100644
> index 000..3de8e0e
> --- /dev/null
> +++ b/drivers/virtio/virtio_bounce.c
> @@ -0,0 +1,150 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Virtio DMA ops to bounce buffers
> + *
> + *  Copyright (c) 2020, The Linux Foundation. All rights reserved.
> + *
> + * This module allows bouncing of IO buffers to a region which will be
> + * accessible to backend drivers.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static phys_addr_t bounce_buf_paddr;
> +static void *bounce_buf_vaddr;
> +static size_t bounce_buf_size;
> +struct swiotlb_pool *virtio_pool;
> +
> +#define VIRTIO_MAX_BOUNCE_SIZE   (16*4096)
> +
> +static void *virtio_alloc_coherent(struct device *dev, size_t size,
> + dma_addr_t *dma_handle, gfp_t gfp_flags, unsigned long attrs)
> +{
> + phys_addr_t addr;
> +
> + if (!virtio_pool)
> + return NULL;
> +
> + addr = swiotlb_alloc(virtio_pool, size, bounce_buf_paddr, ULONG_MAX);
> + if (addr == DMA_MAPPING_ERROR)
> + return NULL;
> +
> + *dma_handle = (addr - bounce_buf_paddr);
> +
> + return bounce_buf_vaddr + (addr - bounce_buf_paddr);
> +}
> +
> +static void virtio_free_coherent(struct device *dev, size_t size, void 
> *vaddr,
> + dma_addr_t dma_handle, unsigned long attrs)
> +{
> + phys_addr_t addr = (dma_handle + bounce_buf_paddr);
> +
> + swiotlb_free(virtio_pool, addr, size);
> +}
> +
> +static dma_addr_t virtio_map_page(struct device *dev, struct page *page,
> + unsigned long offset, size_t size,
> + enum dma_data_direction dir, unsigned long attrs)
> +{
> + void *ptr = page_address(page) + offset;
> + phys_addr_t paddr = virt_to_phys(ptr);
> + dma_addr_t handle;
> +
> + if (!virtio_pool)
> + 

Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-04-13 Thread Michael S. Tsirkin
On Fri, Feb 28, 2020 at 06:25:36PM +0100, Jean-Philippe Brucker wrote:
> diff --git a/include/uapi/linux/virtio_iommu.h 
> b/include/uapi/linux/virtio_iommu.h
> index 237e36a280cb..ec57d215086a 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -16,6 +16,7 @@
>  #define VIRTIO_IOMMU_F_BYPASS3
>  #define VIRTIO_IOMMU_F_PROBE 4
>  #define VIRTIO_IOMMU_F_MMIO  5
> +#define VIRTIO_IOMMU_F_TOPOLOGY  6
>  
>  struct virtio_iommu_range_64 {
>   __le64  start;
> @@ -27,6 +28,12 @@ struct virtio_iommu_range_32 {
>   __le32  end;
>  };
>  
> +struct virtio_iommu_topo_config {
> + __le32  offset;

Any restrictions on offset? E.g. alignment?

> + __le32  num_items;
> + __le32  item_length;
> +};
> +
>  struct virtio_iommu_config {
>   /* Supported page sizes */
>   __le64  page_size_mask;
> @@ -36,6 +43,25 @@ struct virtio_iommu_config {
>   struct virtio_iommu_range_32domain_range;
>   /* Probe buffer size */
>   __le32  probe_size;
> + struct virtio_iommu_topo_config topo_config;
> +};
> +
> +#define VIRTIO_IOMMU_TOPO_PCI_RANGE  0x1
> +#define VIRTIO_IOMMU_TOPO_ENDPOINT   0x2
> +
> +struct virtio_iommu_topo_pci_range {
> + __le16  type;
> + __le16  hierarchy;
> + __le16  requester_start;
> + __le16  requester_end;
> + __le32  endpoint_start;
> +};
> +
> +struct virtio_iommu_topo_endpoint {
> + __le16  type;
> + __le16  reserved;
> + __le32  endpoint;
> + __le64  address;
>  };
>  
>  /* Request types */

As any UAPI change, this needs to be copied to virtio TC.

I believe an old version of QEMU patches was published there
but I don't think it was the latest one you tested against.

Description should preferably be added to spec too.

In particular please add comments (in this header, too)
documenting the new fields, values and structures.
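
The kind of per-field documentation being asked for might look like this;
the semantics in the comments are inferred from this patch series and should
be treated as assumptions until the spec text is in place:

struct virtio_iommu_topo_pci_range {
	__le16	type;		/* VIRTIO_IOMMU_TOPO_PCI_RANGE */
	__le16	hierarchy;	/* assumed: identifies the PCI hierarchy (segment) */
	__le16	requester_start; /* first requester ID (bus/devfn) in the range */
	__le16	requester_end;	/* last requester ID, inclusive */
	__le32	endpoint_start;	/* endpoint ID assigned to requester_start */
};

struct virtio_iommu_topo_endpoint {
	__le16	type;		/* VIRTIO_IOMMU_TOPO_ENDPOINT */
	__le16	reserved;
	__le32	endpoint;	/* endpoint ID of this (non-PCI/MMIO) device */
	__le64	address;	/* assumed: MMIO base address identifying the device */
};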

> -- 
> 2.25.0



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-11 Thread Michael S. Tsirkin
On Wed, Mar 11, 2020 at 06:48:22PM +0100, Jean-Philippe Brucker wrote:
> Yes "not elegant" in part because of the PCI fixup. Fixups are used to
> work around bugs

Not really - they are for anything unusual that common PCI code can not
handle on its own.

-- 
MST



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-05 Thread Michael S. Tsirkin
On Wed, Mar 04, 2020 at 10:54:49PM +0100, Joerg Roedel wrote:
> On Wed, Mar 04, 2020 at 01:37:44PM -0800, Jacob Pan (Jun) wrote:
> > + Mike and Erik who work closely on UEFI and ACPICA.
> > 
> > Copy paste Erik's initial response below on how to get a new table,
> > seems to confirm with the process you stated above.
> > 
> > "Fairly easy. You reserve a 4-letter symbol by sending a message
> > requesting to reserve the signature to Mike or the ASWG mailing list or
> > i...@acpi.info
> 
> Great! I think this is going to be the path forward here.
> 
> Regards,
> 
>   Joerg

I don't, I don't see why we should bother with ACPI. This is a PV device
anyway, we can just make it self-describing. The need for extra firmware
on some platforms is a bug not a feature. No point in emulating that.

> > 
> > There is also another option. You can have ASWG own this new table so
> > that not one entity or company owns the new table."



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-05 Thread Michael S. Tsirkin
On Wed, Mar 04, 2020 at 10:50:02PM +0100, Joerg Roedel wrote:
> On Wed, Mar 04, 2020 at 02:34:33PM -0500, Michael S. Tsirkin wrote:
> > All these extra levels of indirection are one of the reasons
> > hypervisors such as kata try to avoid ACPI.
> 
> Platforms that don't use ACPI need another hardware detection mechanism,
> which can also be supported. But the first step here is to enable the
> general case, and for x86 platforms this means ACPI.
> 
> Regards,
> 
>   Joerg

Frankly since a portable way to do it is needed anyway I don't see why
we also need ACPI.

-- 
MST



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-04 Thread Michael S. Tsirkin
On Wed, Mar 04, 2020 at 02:37:08PM +0100, Joerg Roedel wrote:
> Hi Michael,
> 
> On Tue, Mar 03, 2020 at 11:09:41AM -0500, Michael S. Tsirkin wrote:
> > No. It's coded into the hardware. Which might even be practical
> > for bare-metal (e.g. on-board flash), but is very practical
> > when the device is part of a hypervisor.
> 
> If its that way on PPC, than fine for them. But since this is enablement
> for x86, it should follow the x86 platform best practices, and that
> means describing hardware through ACPI.

For hardware, sure.  Hypervisors aren't hardware
though and a bunch of hypervisors don't use ACPI.


> > This "hardware" is actually part of hypervisor so there's no
> > reason it can't be completely self-descriptive. It's specified
> > by the same entity as the "firmware".
> 
> That is just an implementation detail. Yes, QEMU emulates the hardware
> and builds the ACPI tables. But it could also be implemented in a way
> where the ACPI tables are build by guest firmware.

All these extra levels of indirection are one of the reasons
hypervisors such as kata try to avoid ACPI.

> > I don't see why it would be much faster. The interface isn't that
> > different from command queues of VTD.
> 
> VirtIO IOMMU doesn't need to build page-tables that the hypervisor then
> has to shadow, which makes things much faster. If you emulate one of the
> other IOMMUs (VT-d or AMD-Vi) the code has to shadow the full page-table
> at once when device passthrough is used. VirtIO-IOMMU doesn't need that,
> and that makes it much faster and efficient.


IIUC VT-d at least supports range invalidations.

> 
> > Making ACPI meet the goals of embedded projects such as kata containers
> > would be a gigantic task with huge stability implications.  By
> > comparison this 400-line parser is well contained and does the job.  I
> > didn't yet see compelling reasons not to merge this, but I'll be
> > interested to see some more specific concerns.
> 
> An ACPI table parse wouldn't need more lines of code.

It relies on ACPI OSPM itself to handle ACPI bytecode, which is huge.


> For embedded
> systems there is still the DT way of describing things.

For some embedded systems.

> Regards,
> 
>   Joerg



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-03 Thread Michael S. Tsirkin
On Tue, Mar 03, 2020 at 04:53:19PM +0100, Joerg Roedel wrote:
> On Tue, Mar 03, 2020 at 09:00:05AM -0500, Michael S. Tsirkin wrote:
> > Not necessarily. E.g. some power systems have neither.
> > There are also systems looking to bypass ACPI e.g. for boot speed.
> 
> If there is no firmware layer between the hardware and the OS the
> necessary information the OS needs to run on the hardware is probably
> hard-coded into the kernel?

No. It's coded into the hardware. Which might even be practical
for bare-metal (e.g. on-board flash), but is very practical
when the device is part of a hypervisor.

> In that case the same can be done with
> virtio-iommu tolopology.
> 
> > That sentence doesn't really answer the question, does it?
> 
> To be more elaborate, putting this information into config space is a
> layering violation. Hardware is never completly self-descriptive


This "hardware" is actually part of hypervisor so there's no
reason it can't be completely self-descriptive. It's specified
by the same entity as the "firmware".


> and
> that is why there is the firmware which provides the information about
> the hardware to the OS in a generic way.
>
> > Frankly with platform specific interfaces like ACPI, virtio-iommu is
> > much less compelling.  Describing topology as part of the device in a
> > way that is first, portable, and second, is a good fit for hypervisors,
> > is to me one of the main reasons virtio-iommu makes sense at all.
> 
> Virtio-IOMMU makes sense in the first place because it is much faster
> than emulating one of the hardware IOMMUs.

I don't see why it would be much faster. The interface isn't that
different from command queues of VTD.

> And an ACPI table is also
> portable to all ACPI platforms, same with device-tree.
> 
> Regards,
> 
>   Joerg

Making ACPI meet the goals of embedded projects such as kata containers
would be a gigantic task with huge stability implications.  By
comparison this 400-line parser is well contained and does the job.  I
didn't yet see compelling reasons not to merge this, but I'll be
interested to see some more specific concerns.


-- 
MST



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-03 Thread Michael S. Tsirkin
On Mon, Mar 02, 2020 at 05:16:12PM +0100, Joerg Roedel wrote:
> On Fri, Feb 28, 2020 at 06:25:36PM +0100, Jean-Philippe Brucker wrote:
> > This solution isn't elegant nor foolproof, but is the best we can do at
> > the moment and works with existing virtio-iommu implementations. It also
> > enables an IOMMU for lightweight hypervisors that do not rely on
> > firmware methods for booting.
> 
> I appreciate the enablement on x86, but putting the conmfiguration into
> mmio-space isn't really something I want to see upstream.

It's in virtio config space, not in mmio-space. With a PCI virtio-IOMMU
device this will be in memory.

> What is the
> problem with defining an ACPI table instead? This would also make things
> work on AARCH64 UEFI machines.
> 
> Regards,
> 
>   Joerg



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-03 Thread Michael S. Tsirkin
On Tue, Mar 03, 2020 at 02:01:56PM +0100, Joerg Roedel wrote:
> Hi Eric,
> 
> On Tue, Mar 03, 2020 at 11:19:20AM +0100, Auger Eric wrote:
> > Michael has pushed this solution (putting the "configuration in the PCI
> > config space"), I think for those main reasons:
> > - ACPI may not be supported on some archs/hyps
> 
> But on those there is device-tree, right?

Not necessarily. E.g. some power systems have neither.
There are also systems looking to bypass ACPI e.g. for boot speed.


> > - the virtio-iommu is a PCIe device so binding should not need ACPI
> > description
> 
> The other x86 IOMMUs are PCI devices too and they definitly need a ACPI
> table to be configured.
> 
> > Another issue with ACPI integration is we have different flavours of
> > tables that exist: IORT, DMAR, IVRS
> 
> An integration with IORT might be the best, but if it is not possible
> ther can be a new table-type for Virtio-iommu. That would still follow
> platform best practices.
> 
> > x86 ACPI integration was suggested with IORT. But it seems ARM is
> > reluctant to extend IORT to support para-virtualized IOMMU. So the VIOT
> > was proposed as a different atternative in "[RFC 00/13] virtio-iommu on
> > non-devicetree platforms"
> > (https://patchwork.kernel.org/cover/11257727/). Proposing a table that
> > may be used by different vendors seems to be a challenging issue here.
> 
> Yeah, if I am reading that right this proposes a one-fits-all solution.
> That is not needed as the other x86 IOMMUs already have their tables
> defined and implemented. There is no need to change anything there.
> 
> > So even if the above solution does not look perfect, it looked a
> > sensible compromise given the above arguments. Please could you explain
> > what are the most compelling arguments against it?
> 
> It is a hack and should be avoided if at all possible.

That sentence doesn't really answer the question, does it?

> And defining an
> own ACPI table type seems very much possible.

Frankly with platform specific interfaces like ACPI, virtio-iommu is
much less compelling.  Describing topology as part of the device in a
way that is first, portable, and second, is a good fit for hypervisors,
is to me one of the main reasons virtio-iommu makes sense at all.

> 
> Regards,
> 
>   Joerg



Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-01 Thread Michael S. Tsirkin
On Fri, Feb 28, 2020 at 06:25:36PM +0100, Jean-Philippe Brucker wrote:
> Platforms without device-tree do not currently have a method for
> describing the vIOMMU topology. Provide a topology description embedded
> into the virtio device.
> 
> Use PCI FIXUP to probe the config space early, because we need to
> discover the topology before any DMA configuration takes place, and the
> virtio driver may be loaded much later. Since we discover the topology
> description when probing the PCI hierarchy, the virtual IOMMU cannot
> manage other platform devices discovered earlier.
> 
> This solution isn't elegant nor foolproof, but is the best we can do at
> the moment and works with existing virtio-iommu implementations. It also
> enables an IOMMU for lightweight hypervisors that do not rely on
> firmware methods for booting.
> 
> Signed-off-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  MAINTAINERS   |   2 +
>  drivers/iommu/Kconfig |  10 +
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/virtio-iommu-topology.c | 343 ++
>  drivers/iommu/virtio-iommu.c  |   3 +
>  include/linux/virt_iommu.h|  19 ++
>  include/uapi/linux/virtio_iommu.h |  26 ++
>  7 files changed, 404 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu-topology.c
>  create mode 100644 include/linux/virt_iommu.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index fcd79fc38928..65a03ce53096 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -17781,6 +17781,8 @@ M:Jean-Philippe Brucker 
>  L:   virtualizat...@lists.linux-foundation.org
>  S:   Maintained
>  F:   drivers/iommu/virtio-iommu.c
> +F:   drivers/iommu/virtio-iommu-topology.c
> +F:   include/linux/virt_iommu.h
>  F:   include/uapi/linux/virtio_iommu.h
>  
>  VIRTUAL BOX GUEST DEVICE DRIVER
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index c5df570ef84a..f8cb45d84bb0 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -516,4 +516,14 @@ config VIRTIO_IOMMU
>  
> Say Y here if you intend to run this kernel as a guest.
>  
> +config VIRTIO_IOMMU_TOPOLOGY
> + bool "Topology properties for the virtio-iommu"
> + depends on VIRTIO_IOMMU
> + default y
> + help
> +   Enable early probing of the virtio-iommu device, to detect the
> +   built-in topology description.
> +
> +   Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 9f33fdb3bb05..5da24280d08c 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -37,3 +37,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o
> diff --git a/drivers/iommu/virtio-iommu-topology.c 
> b/drivers/iommu/virtio-iommu-topology.c
> new file mode 100644
> index ..2188624ef216
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu-topology.c
> @@ -0,0 +1,343 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct viommu_cap_config {
> + u8 bar;
> + u32 length; /* structure size */
> + u32 offset; /* structure offset within the bar */
> +};
> +
> +union viommu_topo_cfg {
> + __le16  type;
> + struct virtio_iommu_topo_pci_range  pci;
> + struct virtio_iommu_topo_endpoint   ep;
> +};
> +
> +struct viommu_spec {
> + struct device   *dev; /* transport device */
> + struct fwnode_handle*fwnode;
> + struct iommu_ops*ops;
> + struct list_headlist;
> + size_t  num_items;
> + /* The config array of length num_items follows */
> + union viommu_topo_cfg   cfg[];
> +};
> +
> +static LIST_HEAD(viommus);
> +static DEFINE_MUTEX(viommus_lock);
> +
> +#define VPCI_FIELD(field) offsetof(struct virtio_pci_cap, field)
> +
> +static inline int viommu_pci_find_capability(struct pci_dev *dev, u8 
> cfg_type,
> +  struct viommu_cap_config *cap)
> +{
> + int pos;
> + u8 bar;
> +
> + for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> +  pos > 0;
> +  pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> + u8 type;
> +
> + pci_read_config_byte(dev, pos + VPCI_FIELD(cfg_type), &type);
> + if (type != cfg_type)
> + continue;
> +
> + pci_read_config_byte(dev, pos + VPCI_FIELD(bar), &bar);
> +
> + /* Ignore structures with reserved BAR values */
> +   

Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-24 Thread Michael S. Tsirkin
On Mon, Feb 24, 2020 at 05:26:20PM +0800, Jason Wang wrote:
> 
> On 2020/2/24 下午3:48, Michael S. Tsirkin wrote:
> > On Mon, Feb 24, 2020 at 02:45:03PM +0800, Jason Wang wrote:
> > > On 2020/2/24 下午2:06, Michael S. Tsirkin wrote:
> > > > On Mon, Feb 24, 2020 at 12:01:57PM +0800, Jason Wang wrote:
> > > > > On 2020/2/21 下午10:56, Halil Pasic wrote:
> > > > > > On Fri, 21 Feb 2020 14:22:26 +0800
> > > > > > Jason Wang  wrote:
> > > > > > 
> > > > > > > On 2020/2/21 上午12:06, Halil Pasic wrote:
> > > > > > > > Currently if one intends to run a memory protection enabled VM 
> > > > > > > > with
> > > > > > > > virtio devices and linux as the guest OS, one needs to specify 
> > > > > > > > the
> > > > > > > > VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the 
> > > > > > > > guest
> > > > > > > > linux use the DMA API, which in turn handles the memory
> > > > > > > > encryption/protection stuff if the guest decides to turn itself 
> > > > > > > > into
> > > > > > > > a protected one. This however makes no sense due to multiple 
> > > > > > > > reasons:
> > > > > > > > * The device is not changed by the fact that the guest RAM is
> > > > > > > > protected. The so called IOMMU bypass quirk is not affected.
> > > > > > > > * This usage is not congruent with  standardised semantics of
> > > > > > > > VIRTIO_F_IOMMU_PLATFORM. Guest memory protected is an 
> > > > > > > > orthogonal reason
> > > > > > > > for using DMA API in virtio (orthogonal with respect to what is
> > > > > > > > expressed by VIRTIO_F_IOMMU_PLATFORM).
> > > > > > > > 
> > > > > > > > This series aims to decouple 'have to use DMA API because my 
> > > > > > > > (guest) RAM
> > > > > > > > is protected' and 'have to use DMA API because the device told 
> > > > > > > > me
> > > > > > > > VIRTIO_F_IOMMU_PLATFORM'.
> > > > > > > > 
> > > > > > > > Please find more detailed explanations about the conceptual 
> > > > > > > > aspects in
> > > > > > > > the individual patches. There is however also a very practical 
> > > > > > > > problem
> > > > > > > > that is addressed by this series.
> > > > > > > > 
> > > > > > > > For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the 
> > > > > > > > following side
> > > > > > > > effect The vhost code assumes it the addresses on the virtio 
> > > > > > > > descriptor
> > > > > > > > ring are not guest physical addresses but iova's, and insists 
> > > > > > > > on doing a
> > > > > > > > translation of these regardless of what transport is used (e.g. 
> > > > > > > > whether
> > > > > > > > we emulate a PCI or a CCW device). (For details see commit 
> > > > > > > > 6b1e6cc7855b
> > > > > > > > "vhost: new device IOTLB API".) On s390 this results in severe
> > > > > > > > performance degradation (c.a. factor 10).
> > > > > > > Do you see a consistent degradation on the performance, or it only
> > > > > > > happen when for during the beginning of the test?
> > > > > > > 
> > > > > > AFAIK the degradation is consistent.
> > > > > > 
> > > > > > > > BTW with ccw I/O there is
> > > > > > > > (architecturally) no IOMMU, so the whole address translation 
> > > > > > > > makes no
> > > > > > > > sense in the context of virtio-ccw.
> > > > > > > I suspect we can do optimization in qemu side.
> > > > > > > 
> > > > > > > E.g send memtable entry via IOTLB API when vIOMMU is not enabled.
> > > > > > > 
> > > > > > > If this makes sense, I can draft patch to see if there's any 
> > > > > > > difference.
> > > > > > Frankly I would

Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-23 Thread Michael S. Tsirkin
On Mon, Feb 24, 2020 at 02:45:03PM +0800, Jason Wang wrote:
> 
> On 2020/2/24 下午2:06, Michael S. Tsirkin wrote:
> > On Mon, Feb 24, 2020 at 12:01:57PM +0800, Jason Wang wrote:
> > > On 2020/2/21 下午10:56, Halil Pasic wrote:
> > > > On Fri, 21 Feb 2020 14:22:26 +0800
> > > > Jason Wang  wrote:
> > > > 
> > > > > On 2020/2/21 上午12:06, Halil Pasic wrote:
> > > > > > Currently if one intends to run a memory protection enabled VM with
> > > > > > virtio devices and linux as the guest OS, one needs to specify the
> > > > > > VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the 
> > > > > > guest
> > > > > > linux use the DMA API, which in turn handles the memory
> > > > > > encryption/protection stuff if the guest decides to turn itself into
> > > > > > a protected one. This however makes no sense due to multiple 
> > > > > > reasons:
> > > > > > * The device is not changed by the fact that the guest RAM is
> > > > > > protected. The so called IOMMU bypass quirk is not affected.
> > > > > > * This usage is not congruent with  standardised semantics of
> > > > > > VIRTIO_F_IOMMU_PLATFORM. Guest memory protected is an orthogonal 
> > > > > > reason
> > > > > > for using DMA API in virtio (orthogonal with respect to what is
> > > > > > expressed by VIRTIO_F_IOMMU_PLATFORM).
> > > > > > 
> > > > > > This series aims to decouple 'have to use DMA API because my 
> > > > > > (guest) RAM
> > > > > > is protected' and 'have to use DMA API because the device told me
> > > > > > VIRTIO_F_IOMMU_PLATFORM'.
> > > > > > 
> > > > > > Please find more detailed explanations about the conceptual aspects 
> > > > > > in
> > > > > > the individual patches. There is however also a very practical 
> > > > > > problem
> > > > > > that is addressed by this series.
> > > > > > 
> > > > > > For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following 
> > > > > > side
> > > > > > effect The vhost code assumes it the addresses on the virtio 
> > > > > > descriptor
> > > > > > ring are not guest physical addresses but iova's, and insists on 
> > > > > > doing a
> > > > > > translation of these regardless of what transport is used (e.g. 
> > > > > > whether
> > > > > > we emulate a PCI or a CCW device). (For details see commit 
> > > > > > 6b1e6cc7855b
> > > > > > "vhost: new device IOTLB API".) On s390 this results in severe
> > > > > > performance degradation (c.a. factor 10).
> > > > > Do you see a consistent degradation on the performance, or it only
> > > > > happen when for during the beginning of the test?
> > > > > 
> > > > AFAIK the degradation is consistent.
> > > > 
> > > > > > BTW with ccw I/O there is
> > > > > > (architecturally) no IOMMU, so the whole address translation makes 
> > > > > > no
> > > > > > sense in the context of virtio-ccw.
> > > > > I suspect we can do optimization in qemu side.
> > > > > 
> > > > > E.g send memtable entry via IOTLB API when vIOMMU is not enabled.
> > > > > 
> > > > > If this makes sense, I can draft patch to see if there's any 
> > > > > difference.
> > > > Frankly I would prefer to avoid IOVAs on the descriptor ring (and the
> > > > then necessary translation) for virtio-ccw altogether. But Michael
> > > > voiced his opinion that we should mandate F_IOMMU_PLATFORM for devices
> > > > that could be used with guests running in protected mode. I don't share
> > > > his opinion, but that's an ongoing discussion.
> > > > 
> > > > Should we end up having to do translation from IOVA in vhost, we are
> > > > very interested in that translation being fast and efficient.
> > > > 
> > > > In that sense we would be very happy to test any optimization that aim
> > > > into that direction.
> > > > 
> > > > Thank you very much for your input!
> > > 
> > > Using IOTLB API on platform without IOMMU support is not intended.

Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-23 Thread Michael S. Tsirkin
On Mon, Feb 24, 2020 at 12:01:57PM +0800, Jason Wang wrote:
> 
> On 2020/2/21 下午10:56, Halil Pasic wrote:
> > On Fri, 21 Feb 2020 14:22:26 +0800
> > Jason Wang  wrote:
> > 
> > > On 2020/2/21 上午12:06, Halil Pasic wrote:
> > > > Currently if one intends to run a memory protection enabled VM with
> > > > virtio devices and linux as the guest OS, one needs to specify the
> > > > VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the guest
> > > > linux use the DMA API, which in turn handles the memory
> > > > encryption/protection stuff if the guest decides to turn itself into
> > > > a protected one. This however makes no sense due to multiple reasons:
> > > > * The device is not changed by the fact that the guest RAM is
> > > > protected. The so called IOMMU bypass quirk is not affected.
> > > > * This usage is not congruent with  standardised semantics of
> > > > VIRTIO_F_IOMMU_PLATFORM. Guest memory protected is an orthogonal reason
> > > > for using DMA API in virtio (orthogonal with respect to what is
> > > > expressed by VIRTIO_F_IOMMU_PLATFORM).
> > > > 
> > > > This series aims to decouple 'have to use DMA API because my (guest) RAM
> > > > is protected' and 'have to use DMA API because the device told me
> > > > VIRTIO_F_IOMMU_PLATFORM'.
> > > > 
> > > > Please find more detailed explanations about the conceptual aspects in
> > > > the individual patches. There is however also a very practical problem
> > > > that is addressed by this series.
> > > > 
> > > > For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side
> > > > effect The vhost code assumes it the addresses on the virtio descriptor
> > > > ring are not guest physical addresses but iova's, and insists on doing a
> > > > translation of these regardless of what transport is used (e.g. whether
> > > > we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b
> > > > "vhost: new device IOTLB API".) On s390 this results in severe
> > > > performance degradation (c.a. factor 10).
> > > 
> > > Do you see a consistent degradation on the performance, or it only
> > > happen when for during the beginning of the test?
> > > 
> > AFAIK the degradation is consistent.
> > 
> > > > BTW with ccw I/O there is
> > > > (architecturally) no IOMMU, so the whole address translation makes no
> > > > sense in the context of virtio-ccw.
> > > 
> > > I suspect we can do optimization in qemu side.
> > > 
> > > E.g send memtable entry via IOTLB API when vIOMMU is not enabled.
> > > 
> > > If this makes sense, I can draft patch to see if there's any difference.
> > Frankly I would prefer to avoid IOVAs on the descriptor ring (and the
> > then necessary translation) for virtio-ccw altogether. But Michael
> > voiced his opinion that we should mandate F_IOMMU_PLATFORM for devices
> > that could be used with guests running in protected mode. I don't share
> > his opinion, but that's an ongoing discussion.
> > 
> > Should we end up having to do translation from IOVA in vhost, we are
> > very interested in that translation being fast and efficient.
> > 
> > In that sense we would be very happy to test any optimization that aim
> > into that direction.
> > 
> > Thank you very much for your input!
> 
> 
> Using IOTLB API on platform without IOMMU support is not intended. Please
> try the attached patch to see if it helps.
> 
> Thanks
> 
> 
> > 
> > Regards,
> > Halil
> > 
> > > Thanks
> > > 
> > > 
> > > > Halil Pasic (2):
> > > > mm: move force_dma_unencrypted() to mem_encrypt.h
> > > > virtio: let virtio use DMA API when guest RAM is protected
> > > > 
> > > >drivers/virtio/virtio_ring.c |  3 +++
> > > >include/linux/dma-direct.h   |  9 -
> > > >include/linux/mem_encrypt.h  | 10 ++
> > > >3 files changed, 13 insertions(+), 9 deletions(-)
> > > > 
> > > > 
> > > > base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2

> >From 66fa730460875ac99e81d7db2334cd16bb1d2b27 Mon Sep 17 00:00:00 2001
> From: Jason Wang 
> Date: Mon, 24 Feb 2020 12:00:10 +0800
> Subject: [PATCH] virtio: turn on IOMMU_PLATFORM properly
> 
> When transport does not support IOMMU, we should clear IOMMU_PLATFORM
> even if the device and vhost claims to support that. This help to
> avoid the performance overhead caused by unnecessary IOTLB miss/update
> transactions on such platform.
> 
> Signed-off-by: Jason Wang 
> ---
>  hw/virtio/virtio-bus.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> index d6332d45c3..2741b9fdd2 100644
> --- a/hw/virtio/virtio-bus.c
> +++ b/hw/virtio/virtio-bus.c
> @@ -47,7 +47,6 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error 
> **errp)
>  VirtioBusState *bus = VIRTIO_BUS(qbus);
>  VirtioBusClass *klass = VIRTIO_BUS_GET_CLASS(bus);
>  VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
> -bool has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
>  Error *local_err = 

Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-22 Thread Michael S. Tsirkin
On Fri, Feb 21, 2020 at 03:33:40PM +0100, Halil Pasic wrote:
> AFAIU you have a positive attitude towards the idea, that 
> !F_VIRTIO_PLATFORM implies 'no DMA API is used by virtio' 
> should be scrapped. 
> 
> I would like to accomplish that without adverse effects to virtio-ccw
> (because caring for virtio-ccw is a part of job description).
> 
> Regards,
> Halil

It is possible, in theory. IIRC the main challenge is that the DMA API
has the overhead of indirect function calls even when all it
does is return the PA unchanged. So that will lead to
a measurable performance degradation. That might be fixable,
possibly using some kind of logic along the lines of

	if (iova is pa)
		return pa
	else
		indirect call

and for unmapping, we might need an API that says "unmap
is a nop, safe to skip" so we don't have to keep the
DMA address around until unmap.
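
As a rough illustration of that fast path (a sketch only, not an
existing kernel API: the helper name and the use of get_dma_ops() as
the "iova is pa" test are assumptions, and real code would also need
the "unmap is a nop" contract mentioned above):

#include <linux/dma-mapping.h>
#include <linux/io.h>

/*
 * Sketch: skip the indirect dma_map_ops call when the device uses
 * direct (identity) DMA mapping, i.e. when the DMA address would just
 * be the CPU physical address anyway.
 */
static dma_addr_t map_single_fast(struct device *dev, void *ptr,
				  size_t size, enum dma_data_direction dir)
{
	if (!get_dma_ops(dev))				/* "iova is pa" */
		return (dma_addr_t)virt_to_phys(ptr);

	return dma_map_single(dev, ptr, size, dir);	/* indirect call */
}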


-- 
MST



Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-21 Thread Michael S. Tsirkin
On Fri, Feb 21, 2020 at 02:12:30PM +0100, Halil Pasic wrote:
> On Thu, 20 Feb 2020 15:55:14 -0500
> "Michael S. Tsirkin"  wrote:
> 
> > On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote:
> > > Currently the advanced guest memory protection technologies (AMD SEV,
> > > powerpc secure guest technology and s390 Protected VMs) abuse the
> > > VIRTIO_F_IOMMU_PLATFORM flag to make virtio core use the DMA API, which
> > > is in turn necessary, to make IO work with guest memory protection.
> > > 
> > > But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a
> > > different beast: with virtio devices whose implementation runs on an SMP
> > > CPU we are still fine with doing all the usual optimizations, it is just
> > > that we need to make sure that the memory protection mechanism does not
> > > get in the way. The VIRTIO_F_ACCESS_PLATFORM mandates more work on the
> > > side of the guest (and possibly he host side as well) than we actually
> > > need.
> > > 
> > > An additional benefit of teaching the guest to make the right decision
> > > (and use DMA API) on it's own is: removing the need, to mandate special
> > > VM configuration for guests that may run with protection. This is
> > > especially interesting for s390 as VIRTIO_F_IOMMU_PLATFORM pushes all
> > > the virtio control structures into the first 2G of guest memory:
> > > something we don't necessarily want to do per-default.
> > > 
> > > Signed-off-by: Halil Pasic 
> > > Tested-by: Ram Pai 
> > > Tested-by: Michael Mueller 
> > 
> > This might work for you but it's fragile, since without
> > VIRTIO_F_ACCESS_PLATFORM hypervisor assumes it gets
> > GPA's, not DMA addresses.
> > 
> 
> Thanks for your constructive approach. I do want the hypervisor to
> assume it gets GPA's. My train of thought was that the guys that need
> to use IOVA's that are not GPA's when force_dma_unencrypted() will have
> to specify VIRTIO_F_ACCESS_PLATFORM (at the device) anyway, because
> otherwise it won't work. But I see your point: in case of a
> mis-configuration and provided the DMA API returns IOVA's one could end
> up trying to touch wrong memory locations. But this should be similar to
> what would happen if DMA ops are not used, and memory is not made accessible.
> 
> > 
> > 
> > IOW this looks like another iteration of:
> > 
> > virtio: Support encrypted memory on powerpc secure guests
> > 
> > which I was under the impression was abandoned as unnecessary.
> 
> Unnecessary for powerpc because they do normal PCI. In the context of
> CCW there are only guest physical addresses (CCW I/O has no concept of
> IOMMU or IOVAs).
> 
> > 
> > 
> > To summarize, the necessary conditions for a hack along these lines
> > (using DMA API without VIRTIO_F_ACCESS_PLATFORM) are that we detect that:
> > 
> >   - secure guest mode is enabled - so we know that since we don't share
> > most memory regular virtio code won't
> > work, even though the buggy hypervisor didn't set 
> > VIRTIO_F_ACCESS_PLATFORM
> 
> force_dma_unencrypted(&vdev->dev) is IMHO exactly about this.
> 
> >   - DMA API is giving us addresses that are actually also physical
> > addresses
> 
> In case of s390 this is given.
> I talked with the power people before
> posting this, and they ensured me they can are willing to deal with
> this. I was hoping to talk abut this with the AMD SEV people here (hence
> the cc).

We'd need a part of DMA API that promises this though. Platform
maintainers aren't going to go out of their way to do the
right thing just for virtio, and I can't track all arches
to make sure they don't violate virtio requirements.

> 
> >   - Hypervisor is buggy and didn't enable VIRTIO_F_ACCESS_PLATFORM
> > 
> 
> I don't get this point. The argument where the hypervisor is buggy is a
> bit hard to follow for me. If hypervisor is buggy we have already lost
> anyway most of the time, or?

If VIRTIO_F_ACCESS_PLATFORM is set then things just work.  If
VIRTIO_F_ACCESS_PLATFORM is clear device is supposed to have access to
all of memory.  You can argue in various ways but it's easier to just
declare a behaviour that violates this a bug. Which might still be worth
working around, for various reasons.


> > I don't see how this patch does this.
> 
> I do get your point. I don't know of a good way to check that DMA API
> is giving us addresses that are actually physical addresses, and the
> situation you describe definitely has some risk to it.

One way would be to extend the DMA API with such an A

Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

2020-02-21 Thread Michael S. Tsirkin
On Fri, Feb 21, 2020 at 02:06:39PM +0100, Halil Pasic wrote:
> On Fri, 21 Feb 2020 14:27:27 +1100
> David Gibson  wrote:
> 
> > On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> > > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > > > >From a users perspective it makes absolutely perfect sense to use the
> > > > bounce buffers when they are NEEDED. 
> > > > Forcing the user to specify iommu_platform just because you need bounce 
> > > > buffers
> > > > really feels wrong. And obviously we have a severe performance issue
> > > > because of the indirections.
> > > 
> > > The point is that the user should not have to specify iommu_platform.
> > > We need to make sure any new hypervisor (especially one that might require
> > > bounce buffering) always sets it,
> > 
> > So, I have draft qemu patches which enable iommu_platform by default.
> > But that's really because of other problems with !iommu_platform, not
> > anything to do with bounce buffering or secure VMs.
> > 
> > The thing is that the hypervisor *doesn't* require bounce buffering.
> > In the POWER (and maybe s390 as well) models for Secure VMs, it's the
> > *guest*'s choice to enter secure mode, so the hypervisor has no reason
> > to know whether the guest needs bounce buffering.  As far as the
> > hypervisor and qemu are concerned that's a guest internal detail, it
> > just expects to get addresses it can access whether those are GPAs
> > (iommu_platform=off) or IOVAs (iommu_platform=on).
> 
> I very much agree!
> 
> > 
> > > as was a rather bogus legacy hack
> > 
> > It was certainly a bad idea, but it was a bad idea that went into a
> > public spec and has been widely deployed for many years.  We can't
> > just pretend it didn't happen and move on.
> > 
> > Turning iommu_platform=on by default breaks old guests, some of which
> > we still care about.  We can't (automatically) do it only for guests
> > that need bounce buffering, because the hypervisor doesn't know that
> > ahead of time.
> 
> Turning iommu_platform=on for virtio-ccw makes no sense whatsoever,
> because for CCW I/O there is no such thing as IOMMU and the addresses
> are always physical addresses.

Fix the name then. The spec calls it ACCESS_PLATFORM now, which
makes much more sense.

> > 
> > > that isn't extensible for cases that for example require bounce buffering.
> > 
> > In fact bounce buffering isn't really the issue from the hypervisor
> > (or spec's) point of view.  It's the fact that not all of guest memory
> > is accessible to the hypervisor.  Bounce buffering is just one way the
> > guest might deal with that.
> > 
> 
> Agreed.
> 
> Regards,
> Halil
> 
> 
> 




Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-20 Thread Michael S. Tsirkin
On Thu, Feb 20, 2020 at 05:06:04PM +0100, Halil Pasic wrote:
> For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side
> effect: the vhost code assumes that the addresses on the virtio descriptor
> ring are not guest physical addresses but iova's, and insists on doing a
> translation of these regardless of what transport is used (e.g. whether
> we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b
> "vhost: new device IOTLB API".) On s390 this results in severe
> performance degradation (ca. a factor of 10). BTW with ccw I/O there is
> (architecturally) no IOMMU, so the whole address translation makes no
> sense in the context of virtio-ccw.

So it sounds like a host issue: the emulation of s390 is unnecessarily complicated.
Working around it in the guest looks wrong ...

> Halil Pasic (2):
>   mm: move force_dma_unencrypted() to mem_encrypt.h
>   virtio: let virtio use DMA API when guest RAM is protected
> 
>  drivers/virtio/virtio_ring.c |  3 +++
>  include/linux/dma-direct.h   |  9 -
>  include/linux/mem_encrypt.h  | 10 ++
>  3 files changed, 13 insertions(+), 9 deletions(-)
> 
> 
> base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2
> -- 
> 2.17.1



Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-20 Thread Michael S. Tsirkin
On Thu, Feb 20, 2020 at 05:06:04PM +0100, Halil Pasic wrote:
> * This usage is not congruent with standardised semantics of
> VIRTIO_F_IOMMU_PLATFORM. Guest memory protection is an orthogonal reason
> for using DMA API in virtio (orthogonal with respect to what is
> expressed by VIRTIO_F_IOMMU_PLATFORM). 

Quoting the spec:

  \item[VIRTIO_F_ACCESS_PLATFORM(33)] This feature indicates that
  the device can be used on a platform where device access to data
  in memory is limited and/or translated. E.g. this is the case if the device 
can be located
  behind an IOMMU that translates bus addresses from the device into physical
  addresses in memory, if the device can be limited to only access
  certain memory addresses or if special commands such as
  a cache flush can be needed to synchronise data in memory with
  the device. Whether accesses are actually limited or translated
  is described by platform-specific means.
  If this feature bit is set to 0, then the device
  has same access to memory addresses supplied to it as the
  driver has.
  In particular, the device will always use physical addresses
  matching addresses used by the driver (typically meaning
  physical addresses used by the CPU)
  and not translated further, and can access any address supplied to it by
  the driver. When clear, this overrides any platform-specific description of
  whether device access is limited or translated in any way, e.g.
  whether an IOMMU may be present.

since device can't access encrypted memory,
this seems to match your case reasonably well.

-- 
MST



Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected

2020-02-20 Thread Michael S. Tsirkin
On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote:
> Currently the advanced guest memory protection technologies (AMD SEV,
> powerpc secure guest technology and s390 Protected VMs) abuse the
> VIRTIO_F_IOMMU_PLATFORM flag to make virtio core use the DMA API, which
> is in turn necessary, to make IO work with guest memory protection.
> 
> But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a
> different beast: with virtio devices whose implementation runs on an SMP
> CPU we are still fine with doing all the usual optimizations, it is just
> that we need to make sure that the memory protection mechanism does not
> get in the way. The VIRTIO_F_ACCESS_PLATFORM mandates more work on the
> side of the guest (and possibly the host side as well) than we actually
> need.
> 
> An additional benefit of teaching the guest to make the right decision
> (and use DMA API) on its own is: removing the need to mandate special
> VM configuration for guests that may run with protection. This is
> especially interesting for s390 as VIRTIO_F_IOMMU_PLATFORM pushes all
> the virtio control structures into the first 2G of guest memory:
> something we don't necessarily want to do per-default.
> 
> Signed-off-by: Halil Pasic 
> Tested-by: Ram Pai 
> Tested-by: Michael Mueller 

This might work for you but it's fragile, since without
VIRTIO_F_ACCESS_PLATFORM hypervisor assumes it gets
GPA's, not DMA addresses.



IOW this looks like another iteration of:

virtio: Support encrypted memory on powerpc secure guests

which I was under the impression was abandoned as unnecessary.


To summarize, the necessary conditions for a hack along these lines
(using DMA API without VIRTIO_F_ACCESS_PLATFORM) are that we detect that:

  - secure guest mode is enabled - so we know that since we don't share
most memory regular virtio code won't
work, even though the buggy hypervisor didn't set VIRTIO_F_ACCESS_PLATFORM
  - DMA API is giving us addresses that are actually also physical
addresses
  - Hypervisor is buggy and didn't enable VIRTIO_F_ACCESS_PLATFORM
 
I don't see how this patch does this.


> ---
>  drivers/virtio/virtio_ring.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 867c7ebd3f10..fafc8f924955 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
>   if (!virtio_has_iommu_quirk(vdev))
>   return true;
>  
> + if (force_dma_unencrypted(&vdev->dev))
> + return true;
> +
>   /* Otherwise, we are left to guess. */
>   /*
>* In theory, it's possible to have a buggy QEMU-supposed
> -- 
> 2.17.1



Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM

2020-02-20 Thread Michael S. Tsirkin
On Thu, Feb 20, 2020 at 05:06:04PM +0100, Halil Pasic wrote:
> Currently if one intends to run a memory protection enabled VM with
> virtio devices and linux as the guest OS, one needs to specify the
> VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the guest
> linux use the DMA API, which in turn handles the memory
> encryption/protection stuff if the guest decides to turn itself into
> a protected one. This however makes no sense due to multiple reasons:
> * The device is not changed by the fact that the guest RAM is
> protected. The so called IOMMU bypass quirk is not affected.
> * This usage is not congruent with standardised semantics of
> VIRTIO_F_IOMMU_PLATFORM. Guest memory protection is an orthogonal reason
> for using DMA API in virtio (orthogonal with respect to what is
> expressed by VIRTIO_F_IOMMU_PLATFORM). 
> 
> This series aims to decouple 'have to use DMA API because my (guest) RAM
> is protected' and 'have to use DMA API because the device told me
> VIRTIO_F_IOMMU_PLATFORM'.
> 
> Please find more detailed explanations about the conceptual aspects in
> the individual patches. There is however also a very practical problem
> that is addressed by this series. 
> 
> For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side
> effect: the vhost code assumes that the addresses on the virtio descriptor
> ring are not guest physical addresses but iova's, and insists on doing a
> translation of these regardless of what transport is used (e.g. whether
> we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b
> "vhost: new device IOTLB API".) On s390 this results in severe
> performance degradation (ca. a factor of 10). BTW with ccw I/O there is
> (architecturally) no IOMMU, so the whole address translation makes no
> sense in the context of virtio-ccw.

That's just a QEMU thing. It uses the same flag for VIRTIO_F_ACCESS_PLATFORM
and vhost IOTLB. QEMU can separate them, no need to change linux.


> Halil Pasic (2):
>   mm: move force_dma_unencrypted() to mem_encrypt.h
>   virtio: let virtio use DMA API when guest RAM is protected
> 
>  drivers/virtio/virtio_ring.c |  3 +++
>  include/linux/dma-direct.h   |  9 -
>  include/linux/mem_encrypt.h  | 10 ++
>  3 files changed, 13 insertions(+), 9 deletions(-)
> 
> 
> base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2
> -- 
> 2.17.1



Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-17 Thread Michael S. Tsirkin
On Mon, Feb 17, 2020 at 01:22:44PM +, Robin Murphy wrote:
> On 17/02/2020 1:01 pm, Michael S. Tsirkin wrote:
> > On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote:
> > > On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote:
> > > > On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:
> > > > > On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:
> > > > > > With the built-in topology description in place, x86 platforms can 
> > > > > > now
> > > > > > use the virtio-iommu.
> > > > > > 
> > > > > > Signed-off-by: Jean-Philippe Brucker 
> > > > > > ---
> > > > > >drivers/iommu/Kconfig | 3 ++-
> > > > > >1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > > > > > index 068d4e0e3541..adcbda44d473 100644
> > > > > > --- a/drivers/iommu/Kconfig
> > > > > > +++ b/drivers/iommu/Kconfig
> > > > > > @@ -508,8 +508,9 @@ config HYPERV_IOMMU
> > > > > >config VIRTIO_IOMMU
> > > > > > bool "Virtio IOMMU driver"
> > > > > > depends on VIRTIO=y
> > > > > > -   depends on ARM64
> > > > > > +   depends on (ARM64 || X86)
> > > > > > select IOMMU_API
> > > > > > +   select IOMMU_DMA
> > > > > 
> > > > > Can that have an "if X86" for clarity? AIUI it's not necessary for
> > > > > virtio-iommu itself (and really shouldn't be), but is merely to 
> > > > > satisfy the
> > > > > x86 arch code's expectation that IOMMU drivers bring their own DMA 
> > > > > ops,
> > > > > right?
> > > > > 
> > > > > Robin.
> > > > 
> > > > In fact does not this work on any platform now?
> > > 
> > > There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU
> > > has been converted recently [1] but VT-d still implements its own DMA ops
> > > (conversion patches are on the list [2]). On Arm the arch Kconfig selects
> > > IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is
> > > complete. Until then I can add a "if X86" here for clarity.
> > > 
> > > Thanks,
> > > Jean
> > > 
> > > [1] 
> > > https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/
> > > [2] 
> > > https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/
> > 
> > What about others? E.g. PPC?
> 
> That was the point I was getting at - while iommu-dma should build just fine
> for the likes of PPC, s390, 32-bit Arm, etc., they have no architecture code
> to correctly wire up iommu_dma_ops to devices. Thus there's currently no
> point pulling it in and pretending it's anything more than a waste of space
> for architectures other than arm64 and x86. It's merely a historical
> artefact of the x86 DMA API implementation that when the IOMMU drivers were
> split out to form drivers/iommu they took some of their relevant arch code
> with them.
> 
> Robin.


Rather than white-listing architectures, how about making the
architectures in question set some kind of symbol, and depend on it?

-- 
MST



Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-17 Thread Michael S. Tsirkin
On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote:
> On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote:
> > On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:
> > > On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:
> > > > With the built-in topology description in place, x86 platforms can now
> > > > use the virtio-iommu.
> > > > 
> > > > Signed-off-by: Jean-Philippe Brucker 
> > > > ---
> > > >   drivers/iommu/Kconfig | 3 ++-
> > > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > > > index 068d4e0e3541..adcbda44d473 100644
> > > > --- a/drivers/iommu/Kconfig
> > > > +++ b/drivers/iommu/Kconfig
> > > > @@ -508,8 +508,9 @@ config HYPERV_IOMMU
> > > >   config VIRTIO_IOMMU
> > > > bool "Virtio IOMMU driver"
> > > > depends on VIRTIO=y
> > > > -   depends on ARM64
> > > > +   depends on (ARM64 || X86)
> > > > select IOMMU_API
> > > > +   select IOMMU_DMA
> > > 
> > > Can that have an "if X86" for clarity? AIUI it's not necessary for
> > > virtio-iommu itself (and really shouldn't be), but is merely to satisfy 
> > > the
> > > x86 arch code's expectation that IOMMU drivers bring their own DMA ops,
> > > right?
> > > 
> > > Robin.
> > 
> > In fact does not this work on any platform now?
> 
> There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU
> has been converted recently [1] but VT-d still implements its own DMA ops
> (conversion patches are on the list [2]). On Arm the arch Kconfig selects
> IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is
> complete. Until then I can add a "if X86" here for clarity.
> 
> Thanks,
> Jean
> 
> [1] https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/
> [2] 
> https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/

What about others? E.g. PPC?

-- 
MST



Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-16 Thread Michael S. Tsirkin
On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:
> On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:
> > With the built-in topology description in place, x86 platforms can now
> > use the virtio-iommu.
> > 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >   drivers/iommu/Kconfig | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 068d4e0e3541..adcbda44d473 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -508,8 +508,9 @@ config HYPERV_IOMMU
> >   config VIRTIO_IOMMU
> > bool "Virtio IOMMU driver"
> > depends on VIRTIO=y
> > -   depends on ARM64
> > +   depends on (ARM64 || X86)
> > select IOMMU_API
> > +   select IOMMU_DMA
> 
> Can that have an "if X86" for clarity? AIUI it's not necessary for
> virtio-iommu itself (and really shouldn't be), but is merely to satisfy the
> x86 arch code's expectation that IOMMU drivers bring their own DMA ops,
> right?
> 
> Robin.

In fact does not this work on any platform now?

> > select INTERVAL_TREE
> > help
> >   Para-virtualised IOMMU driver with virtio.
> > 



Re: [RFC 00/13] virtio-iommu on non-devicetree platforms

2019-11-22 Thread Michael S. Tsirkin
On Fri, Nov 22, 2019 at 11:49:47AM +0100, Jean-Philippe Brucker wrote:
> I'm seeking feedback on multi-platform support for virtio-iommu. At the
> moment only devicetree (DT) is supported and we don't have a pleasant
> solution for other platforms. Once we figure out the topology
> description, x86 support is trivial.
> 
> Since the IOMMU manages memory accesses from other devices, the guest
> kernel needs to initialize the IOMMU before endpoints start issuing DMA.
> It's a solved problem: firmware or hypervisor describes through DT or
> ACPI tables the device dependencies, and probe of endpoints is deferred
> until the IOMMU is probed. But:
> 
> (1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and IORT
> for Arm). From my point of view IORT is easier to extend, since we
> just need to introduce a new node type. There are no dependencies to
> Arm in the Linux IORT driver, so it works well with CONFIG_X86.
> 
> However, there are concerns about other OS vendors feeling obligated
> to implement this new node, so Arm proposed introducing another ACPI
> table, that can wrap any of DMAR, IVRS and IORT to extend it with
> new virtual nodes. A draft of this VIOT table specification is
> available at http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf
> 
> I'm afraid this could increase fragmentation as guests would need to
> implement or modify their support for all of DMAR, IVRS and IORT. If
> we end up doing VIOT, I suggest limiting it to IORT.
> 
> (2) In addition, there are some concerns about having virtio depend on
> ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool x86
> [1])

power?

> don't currently implement those methods.
> 
> It was suggested to embed the topology description into the device.
> It can work, as demonstrated at the end of this RFC, with the
> following limitations:
> 
> - The topology description must be read before any endpoint managed
>   by the IOMMU is probed, and even before the virtio module is
>   loaded. This RFC uses a PCI quirk to manually parse the virtio
>   configuration. It assumes that all endpoints managed by the IOMMU
>   are under this same PCI host.
> 
> - I don't have a solution for the virtio-mmio transport at the
>   moment, because I haven't had time to modify a host to test it. I
>   think it could either use a notifier on the platform bus, or
>   better, a new 'iommu' command-line argument to the virtio-mmio
>   driver.

A notifier seems easier for users. What are the disadvantages of
that?
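
For what it's worth, the notifier variant might look roughly like this
(a sketch only, not code from the RFC; viommu_mmio_parse_topology() is
a hypothetical helper standing in for the actual config-space parsing):

#include <linux/device.h>
#include <linux/notifier.h>
#include <linux/platform_device.h>

/* hypothetical: read the virtio-mmio config space for a topology
 * description before any endpoint behind the IOMMU is probed */
static void viommu_mmio_parse_topology(struct device *dev)
{
}

static int viommu_platform_notify(struct notifier_block *nb,
				  unsigned long action, void *data)
{
	struct device *dev = data;

	if (action == BUS_NOTIFY_ADD_DEVICE)
		viommu_mmio_parse_topology(dev);

	return NOTIFY_DONE;
}

static struct notifier_block viommu_nb = {
	.notifier_call = viommu_platform_notify,
};

/* registered early, e.g. from an initcall:
 *	bus_register_notifier(&platform_bus_type, &viommu_nb);
 */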

>   So the current prototype doesn't work for firecracker and
>   microvm, which rely on virtio-mmio.
> 
> - For Arm, if the platform has an ITS, the hypervisor needs IORT or
>   DT to describe it anyway. More generally, not using either ACPI or
>   DT might prevent from supporting other features as well. I suspect
>   the above users will have to implement a standard method sooner or
>   later.
> 
> - Even when reusing as much existing code as possible, guest support
>   is still going to be around a few hundred lines since we can't
>   rely on the normal virtio infrastructure to be loaded at that
>   point. As you can see below, the diffstat for the incomplete
>   topology implementation is already bigger than the exhaustive IORT
>   support, even when jumping through the VIOT hoop.
> 
> So it's a lightweight solution for very specific use-cases, and we
> should still support ACPI for the general case. Multi-platform
> guests such as Linux will then need to support three topology
> descriptions instead of two.
> 
> In this RFC I present both solutions, but I'd rather not keep all of it.
> Please see the individual patches for details:
> 
> (1) Patches 1, 3-10 add support for virtio-iommu to the Linux IORT
> driver and patches 2, 11 add the VIOT glue.
> 
> (2) Patch 12 adds the built-in topology description to the virtio-iommu
> specification. Patch 13 is a partial implementation for the Linux
> virtio-iommu driver. It only supports PCI, not platform devices.
> 
> You can find Linux and QEMU code on my virtio-iommu/devel branches at
> http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu
> 
> 
> I split the diffstat since there are two independent features. The first
> one is for patches 1-11, and the second one for patch 13.
> 
> Jean-Philippe Brucker (11):
>   ACPI/IORT: Move IORT to the ACPI folder
>   ACPI: Add VIOT definitions
>   ACPI/IORT: Allow registration of external tables
>   ACPI/IORT: Add node categories
>   ACPI/IORT: Support VIOT virtio-mmio node
>   ACPI/IORT: Support VIOT virtio-pci node
>   ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
>   ACPI/IORT: Add callback to update a device's fwnode
>   iommu/virtio: Create fwnode if necessary
>   iommu/virtio: Update IORT fwnode
>   ACPI: Add VIOT 

Re: [RFC 13/13] iommu/virtio: Add topology description to

2019-11-22 Thread Michael S. Tsirkin
On Fri, Nov 22, 2019 at 11:50:00AM +0100, Jean-Philippe Brucker wrote:
> Some hypervisors don't implement either device-tree or ACPI, but still
> need a method to describe the IOMMU topology. Read the virtio-iommu
> config early and parse the topology description. Hook into the
> dma_setup() callbacks to initialize the IOMMU before probing endpoints.
> 
> If the virtio-iommu uses the virtio-pci transport, this will only work
> if the PCI root complex is the first device probed. We don't currently
> support virtio-mmio.
> 
> Initially I tried to generate a fake IORT table and feed it to the IORT
> driver, in order to avoid rewriting the whole DMA code, but it wouldn't
> work with platform endpoints, which are references to items in the ACPI
> table on IORT.
> 
> Signed-off-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 

Overall this looks good to me. The only point is that
I think the way the interface is designed makes writing
the driver a bit too difficult. Idea: if instead we just
have a length field and then an array of records
(preferably unions so we don't need to work hard),
we can shadow that into memory, then iterate over
the unions.

Maybe add a uniform record length + number of records field.
Then just skip types you do not know how to handle.
This will also help make sure it's within bounds.
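
Concretely, something along these lines (purely a sketch: the config and
record struct names and the type constants below are made up, only the
virtio_iommu_topo_pci_range/virtio_iommu_topo_endpoint structs come from
the RFC's uapi header):

#include <linux/types.h>
/* #include <uapi/linux/virtio_iommu.h> for the RFC's record structs */

struct viommu_topo_config {
	__le16	count;		/* number of records */
	__le16	record_size;	/* uniform record length, for forward compat */
	/* followed by 'count' records, each 'record_size' bytes */
};

union viommu_topo_record {
	__le16					type;	/* discriminator */
	struct virtio_iommu_topo_pci_range	pci;
	struct virtio_iommu_topo_endpoint	ep;
};

/* Shadow the whole array into memory, then iterate over the unions. */
static void viommu_parse_topology(void *buf, size_t len)
{
	struct viommu_topo_config *cfg = buf;
	size_t rec = le16_to_cpu(cfg->record_size);
	void *p = buf + sizeof(*cfg);
	int i;

	for (i = 0; i < le16_to_cpu(cfg->count); i++, p += rec) {
		union viommu_topo_record *r = p;

		if (p + rec > buf + len)	/* stay within bounds */
			break;

		switch (le16_to_cpu(r->type)) {
		case VIOMMU_TOPO_PCI_RANGE:	/* handle r->pci */
			break;
		case VIOMMU_TOPO_ENDPOINT:	/* handle r->ep */
			break;
		default:			/* skip unknown record types */
			break;
		}
	}
}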

What do you think?


You will need to do something to address the TODO I think.

> ---
> Note that we only call virt_dma_configure() if the host didn't provide
> either DT or ACPI method. If you want to test this with QEMU, you'll
> need to manually disable the acpi_dma_configure() part in pci-driver.c
> ---
>  drivers/base/platform.c   |   3 +
>  drivers/iommu/Kconfig |   9 +
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/virtio-iommu-topology.c | 410 ++
>  drivers/iommu/virtio-iommu.c  |   3 +
>  drivers/pci/pci-driver.c  |   3 +
>  include/linux/virtio_iommu.h  |  18 ++
>  include/uapi/linux/virtio_iommu.h |  26 ++
>  8 files changed, 473 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu-topology.c
>  create mode 100644 include/linux/virtio_iommu.h
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index b230beb6ccb4..70b12c8ef2fb 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "base.h"
>  #include "power/power.h"
> @@ -1257,6 +1258,8 @@ int platform_dma_configure(struct device *dev)
>   } else if (has_acpi_companion(dev)) {
>   attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
>   ret = acpi_dma_configure(dev, attr);
> + } else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) {
> + ret = virt_dma_configure(dev);
>   }
>  
>   return ret;
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index e6eb4f238d1a..d02c0d36019d 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -486,4 +486,13 @@ config VIRTIO_IOMMU
>  
> Say Y here if you intend to run this kernel as a guest.
>  
> +config VIRTIO_IOMMU_TOPOLOGY
> + bool "Topology properties for the virtio-iommu"
> + depends on VIRTIO_IOMMU
> + help
> +   Enable early probing of the virtio-iommu device, to detect the
> +   topology description.
> +
> +   Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 4f405f926e73..6b51c4186ebc 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -35,3 +35,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o
> diff --git a/drivers/iommu/virtio-iommu-topology.c 
> b/drivers/iommu/virtio-iommu-topology.c
> new file mode 100644
> index ..ec22510ace3d
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu-topology.c
> @@ -0,0 +1,410 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct viommu_cap_config {
> + u8 pos; /* PCI capability position */
> + u8 bar;
> + u32 length; /* structure size */
> + u32 offset; /* structure offset within the bar */
> +};
> +
> +struct viommu_spec {
> + struct device   *dev; /* transport device */
> + struct fwnode_handle*fwnode;
> + struct iommu_ops*ops;
> + struct list_headtopology;
> + struct list_headlist;
> +};
> +
> +struct viommu_topology {
> + union {
> + struct virtio_iommu_topo_head head;
> + struct virtio_iommu_topo_pci_range pci;
> + 

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-09-05 Thread Michael S. Tsirkin
On Mon, Aug 12, 2019 at 02:15:32PM +0200, Christoph Hellwig wrote:
> On Sun, Aug 11, 2019 at 04:55:27AM -0400, Michael S. Tsirkin wrote:
> > On Sun, Aug 11, 2019 at 07:56:07AM +0200, Christoph Hellwig wrote:
> > > So we need a flag on the virtio device, exposed by the
> > > hypervisor (or hardware for hw virtio devices) that says:  hey, I'm real,
> > > don't take a shortcut.
> > 
> > The point here is that it's actually still not real. So we would still
> > use a physical address. However Linux decides that it wants extra
> > security by moving all data through the bounce buffer.  The distinction
> > made is that one can actually give device a physical address of the
> > bounce buffer.
> 
> Sure.  The problem is just that you keep piling hacks on top of hacks.
> We need the per-device flag anyway to properly support hardware virtio
> device in all circumstances.  Instead of coming up with another ad-hoc
> hack to force DMA uses implement that one proper bit and reuse it here.

The flag that you mention literally means "I am a real device" so for
example, you can use VFIO with it. And this device isn't a real one,
and you can't use VFIO with it, even though it's part of a power
system which always has an IOMMU.



Or here's another way to put it: we have a broken device that can only
access physical addresses, not DMA addresses. But to enable SEV, Linux
requires the DMA API.  So we can still make it work if the DMA address happens
to be a physical address (not necessarily of the same page).


This is where dma_addr_is_a_phys_addr() is coming from: it tells us this
weird configuration can still work.  What are we going to do for SEV if
dma_addr_is_a_phys_addr does not apply? Fail probe I guess.


So the proposal is really to make things safe and to this end,
to add this in probe:

	if (sev_active() &&
	    !dma_addr_is_a_phys_addr(dev) &&
	    !virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
		return -EINVAL;


the point being to prevent loading the driver where it would
corrupt guest memory. Put this way, any objections to adding
dma_addr_is_a_phys_addr to the DMA API?
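
For concreteness, the addition could be as small as this (a sketch only;
the helper does not exist today, and the "direct mapping in use" test is
just one possible way an architecture could implement the promise):

#include <linux/dma-mapping.h>

/*
 * Sketch: report whether DMA addresses handed out for this device are
 * also CPU physical addresses (not necessarily of the pages the driver
 * supplied, e.g. when SWIOTLB bouncing is in use).
 */
static inline bool dma_addr_is_a_phys_addr(struct device *dev)
{
	return get_dma_ops(dev) == NULL;	/* dma-direct, possibly with swiotlb */
}

With something like that in place, the probe-time check above simply
refuses to bind the driver when the device would be handed addresses it
cannot access.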





-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-11 Thread Michael S. Tsirkin
On Sun, Aug 11, 2019 at 07:56:07AM +0200, Christoph Hellwig wrote:
> So we need a flag on the virtio device, exposed by the
> hypervisor (or hardware for hw virtio devices) that says:  hey, I'm real,
> don't take a shortcut.

The point here is that it's actually still not real. So we would still
use a physical address. However Linux decides that it wants extra
security by moving all data through the bounce buffer.  The distinction
made is that one can actually give device a physical address of the
bounce buffer.

-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-11 Thread Michael S. Tsirkin
On Sat, Aug 10, 2019 at 11:46:21PM -0700, Ram Pai wrote:
> On Sun, Aug 11, 2019 at 07:56:07AM +0200, Christoph Hellwig wrote:
> > sev_active() is gone now in linux-next, at least as a global API.
> > 
> > And once again this is entirely going in the wrong direction.  The only
> > way using the DMA API is going to work at all is if the device is ready
> > for it.  So we need a flag on the virtio device, exposed by the
> > hypervisor (or hardware for hw virtio devices) that says:  hey, I'm real,
> > don't take a shortcut.
> > 
> > And that means on power and s390 qemu will always have to set those if
> > you want to be ready for the ultravisor and co games.  It's not like we
> > haven't been through this a few times before, have we?
> 
> 
> We have been through this so many times, but I don't think we ever
> understood each other. I have a fundamental question, the answer to
> which was never clear. Here it is...
> 
> If the hypervisor (hardware for hw virtio devices) does not mandate a
> DMA API, why is it illegal for the driver to request, special handling
> of its i/o buffers? Why are we associating this special handling to
> always mean, some DMA address translation? Can't there be 
> any other kind of special handling needs, that has nothing to do with
> DMA address translation?

I think the answer to that is, extend the DMA API to cover that special
need then. And that's exactly what dma_addr_is_phys_addr is trying to
do.

> 
> -- 
> Ram Pai


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-11 Thread Michael S. Tsirkin
On Sun, Aug 11, 2019 at 07:56:07AM +0200, Christoph Hellwig wrote:
> And once again this is entirely going in the wrong direction.  The only
> way using the DMA API is going to work at all is if the device is ready
> for it.

So the point made is that if DMA addresses are also physical addresses
(not necessarily the same physical addresses that the driver supplied), then
the DMA API actually works even though the device itself uses CPU page tables.


To put it in other terms: it would be possible to make all or part of
memory unencrypted and then have virtio access all of it.  SEV guests
at the moment make a decision to instead use a bounce buffer, forcing an
extra copy but gaining security.

-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-11 Thread Michael S. Tsirkin
On Sat, Aug 10, 2019 at 03:07:02PM -0700, Ram Pai wrote:
> On Sat, Aug 10, 2019 at 02:57:17PM -0400, Michael S. Tsirkin wrote:
> > On Tue, Jan 29, 2019 at 03:08:12PM -0200, Thiago Jung Bauermann wrote:
> > > 
> > > Hello,
> > > 
> > > With Christoph's rework of the DMA API that recently landed, the patch
> > > below is the only change needed in virtio to make it work in a POWER
> > > secure guest under the ultravisor.
> > > 
> > > The other change we need (making sure the device's dma_map_ops is NULL
> > > so that the dma-direct/swiotlb code is used) can be made in
> > > powerpc-specific code.
> > > 
> > > Of course, I also have patches (soon to be posted as RFC) which hook up
> > >  to the powerpc secure guest support code.
> > > 
> > > What do you think?
> > > 
> > > >From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
> > > From: Thiago Jung Bauermann 
> > > Date: Thu, 24 Jan 2019 22:08:02 -0200
> > > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
> > > 
> > > The host can't access the guest memory when it's encrypted, so using
> > > regular memory pages for the ring isn't an option. Go through the DMA API.
> > > 
> > > Signed-off-by: Thiago Jung Bauermann 
> > > ---
> > >  drivers/virtio/virtio_ring.c | 5 -
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index cd7e755484e3..321a27075380 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -259,8 +259,11 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
> > >* not work without an even larger kludge.  Instead, enable
> > >* the DMA API if we're a Xen guest, which at least allows
> > >* all of the sensible Xen configurations to work correctly.
> > > +  *
> > > +  * Also, if guest memory is encrypted the host can't access
> > > +  * it directly. In this case, we'll need to use the DMA API.
> > >*/
> > > - if (xen_domain())
> > > + if (xen_domain() || sev_active())
> > >   return true;
> > > 
> > >   return false;
> > 
> > So I gave this lots of thought, and I'm coming round to
> > basically accepting something very similar to this patch.
> > 
> > But not exactly like this :).
> > 
> > Let's see what are the requirements.
> > 
> > If
> > 
> > 1. We do not trust the device (so we want to use a bounce buffer with it)
> > 2. DMA address is also a physical address of a buffer
> > 
> > then we should use DMA API with virtio.
> > 
> > 
> > sev_active() above is one way to put (1).  I can't say I love it but
> > it's tolerable.
> > 
> > 
> > But we also want promise from DMA API about 2.
> > 
> > 
> > Without promise 2 we simply can't use DMA API with a legacy device.
> > 
> > 
> > Otherwise, on a SEV system with an IOMMU which isn't 1:1
> > and with a virtio device without ACCESS_PLATFORM, we are trying
> > to pass a virtual address, and devices without ACCESS_PLATFORM
> > can only access CPU physical addresses.
> > 
> > So something like:
> > 
> > dma_addr_is_phys_addr?
> 
> 
> On our Secure pseries platform,  dma address is physical address and this
> proposal will help us, use DMA API. 
> 
> On our normal pseries platform, dma address is physical address too.
> But we do not necessarily need to use the DMA API.  We can use the DMA
> API, but our handlers will do the same thing the generic virtio handlers
> would do. If there is an opt-out option, even when dma addr is same as
> physical addr, then there will be less code duplication.
> 
> Would something like this be better.
> 
> (dma_addr_is_phys_addr  && arch_want_to_use_dma_api()) ?
> 
> 
> RP

I think sev_active() is an OK replacement for arch_want_to_use_dma_api.
So just the addition of dma_addr_is_phys_addr would be enough.
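
Something along these lines in the vring code, then (sketch only, not a
patch; dma_addr_is_phys_addr is the proposed helper and does not exist
upstream, sev_active() just stands in for "bounce data through an
unencrypted buffer"):

	static bool vring_use_dma_api(struct virtio_device *vdev)
	{
		/* ACCESS_PLATFORM negotiated: always use the DMA API. */
		if (!virtio_has_iommu_quirk(vdev))
			return true;

		/* Xen needs the DMA API to grant the host access to pages. */
		if (xen_domain())
			return true;

		/*
		 * Use the DMA API for an untrusted host only when it is
		 * guaranteed to hand the device usable physical addresses.
		 */
		if (sev_active() && dma_addr_is_phys_addr(vdev->dev.parent))
			return true;

		return false;
	}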

> 
> > -- 
> > MST
> 
> -- 
> Ram Pai


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-10 Thread Michael S. Tsirkin
On Tue, Jan 29, 2019 at 03:08:12PM -0200, Thiago Jung Bauermann wrote:
> 
> Hello,
> 
> With Christoph's rework of the DMA API that recently landed, the patch
> below is the only change needed in virtio to make it work in a POWER
> secure guest under the ultravisor.
> 
> The other change we need (making sure the device's dma_map_ops is NULL
> so that the dma-direct/swiotlb code is used) can be made in
> powerpc-specific code.
> 
> Of course, I also have patches (soon to be posted as RFC) which hook up
>  to the powerpc secure guest support code.
> 
> What do you think?
> 
> >From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
> From: Thiago Jung Bauermann 
> Date: Thu, 24 Jan 2019 22:08:02 -0200
> Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
> 
> The host can't access the guest memory when it's encrypted, so using
> regular memory pages for the ring isn't an option. Go through the DMA API.
> 
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  drivers/virtio/virtio_ring.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index cd7e755484e3..321a27075380 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -259,8 +259,11 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
>* not work without an even larger kludge.  Instead, enable
>* the DMA API if we're a Xen guest, which at least allows
>* all of the sensible Xen configurations to work correctly.
> +  *
> +  * Also, if guest memory is encrypted the host can't access
> +  * it directly. In this case, we'll need to use the DMA API.
>*/
> - if (xen_domain())
> + if (xen_domain() || sev_active())
>   return true;
> 
>   return false;

So I gave this lots of thought, and I'm coming round to
basically accepting something very similar to this patch.

But not exactly like this :).

Let's see what are the requirements.

If

1. We do not trust the device (so we want to use a bounce buffer with it)
2. DMA address is also a physical address of a buffer

then we should use DMA API with virtio.


sev_active() above is one way to put (1).  I can't say I love it but
it's tolerable.


But we also want promise from DMA API about 2.


Without promise 2 we simply can't use DMA API with a legacy device.


Otherwise, on a SEV system with an IOMMU which isn't 1:1
and with a virtio device without ACCESS_PLATFORM, we are trying
to pass a virtual address, and devices without ACCESS_PLATFORM
can only access CPU physical addresses.

So something like:

dma_addr_is_phys_addr?



-- 
MST


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-24 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 05:38:51PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 22, 2019 at 04:36:09PM +0100, Robin Murphy wrote:
> >> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >> index c8be1c4f5b55..37c143971211 100644
> >> --- a/drivers/virtio/virtio_ring.c
> >> +++ b/drivers/virtio/virtio_ring.c
> >> @@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
> >>   {
> >>size_t max_segment_size = SIZE_MAX;
> >>   -if (vring_use_dma_api(vdev))
> >> +  if (vring_use_dma_api(vdev) && vdev->dev.dma_mask)
> >
> > Hmm, might it make sense to roll that check up into vring_use_dma_api() 
> > itself? After all, if the device has no mask then it's likely that other 
> > DMA API ops wouldn't really work as expected either.
> 
> Makes sense to me.

Christoph - would a documented API wrapping dma_mask make sense?
With the documentation explaining how users must
desist from using DMA APIs if that returns false ...


-- 
MST


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-23 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 05:38:30PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 22, 2019 at 11:33:35AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 22, 2019 at 04:55:09PM +0200, Eric Auger wrote:
> > > Do not call dma_max_mapping_size for devices that have no DMA
> > > mask set, otherwise we can hit a NULL pointer dereference.
> > > 
> > > This occurs when a virtio-blk-pci device is protected with
> > > a virtual IOMMU.
> > > 
> > > Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()")
> > > Signed-off-by: Eric Auger 
> > > Suggested-by: Christoph Hellwig 
> > 
> > Christoph, I wonder why did you suggest this?
> > The connection between dma_mask and dma_max_mapping_size
> > is far from obvious.  The documentation doesn't exist.
> > Do we really have to teach all users about this hack?
> > Why not just make dma_max_mapping_size DTRT?
> 
> Because we should not call into dma API functions for devices that
> are not DMA capable.

I'd rather call is_device_dma_capable then, better than poking
at DMA internals.
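
I.e. something like the following sketch, where is_device_dma_capable()
is the existing helper that just checks a DMA mask has been set:

	size_t virtio_max_dma_size(struct virtio_device *vdev)
	{
		size_t max_segment_size = SIZE_MAX;

		/*
		 * Only ask the DMA layer about devices that are DMA capable
		 * at all, instead of open-coding a dma_mask check here.
		 */
		if (vring_use_dma_api(vdev) &&
		    is_device_dma_capable(&vdev->dev))
			max_segment_size = dma_max_mapping_size(&vdev->dev);

		return max_segment_size;
	}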

-- 
MST


Re: [PATCH v2] dma-mapping: Use dma_get_mask in dma_addressing_limited

2019-07-22 Thread Michael S. Tsirkin
On Mon, Jul 22, 2019 at 06:56:49PM +0200, Auger Eric wrote:
> Hi Christoph,
> 
> On 7/22/19 6:51 PM, Eric Auger wrote:
> > We currently have cases where the dma_addressing_limited() gets
> > called with dma_mask unset. This causes a NULL pointer dereference.
> > 
> > Use dma_get_mask() accessor to prevent the crash.
> > 
> > Fixes: b866455423e0 ("dma-mapping: add a dma_addressing_limited helper")
> > Signed-off-by: Eric Auger 
> 
> As a follow-up of my last email, here is a patch featuring
> dma_get_mask(). But you don't have the WARN_ON_ONCE anymore, pointing
> out suspect users.

OTOH these users then simply become okay so no need for WARN_ON_ONCE
then :)

> Feel free to pick up your preferred approach
> 
> Thanks
> 
> Eric
> > 
> > ---
> > 
> > v1 -> v2:
> > - was [PATCH 1/2] dma-mapping: Protect dma_addressing_limited
> >   against NULL dma_mask
> > - Use dma_get_mask
> > ---
> >  include/linux/dma-mapping.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> > index e11b115dd0e4..f7d1eea32c78 100644
> > --- a/include/linux/dma-mapping.h
> > +++ b/include/linux/dma-mapping.h
> > @@ -689,8 +689,8 @@ static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
> >   */
> >  static inline bool dma_addressing_limited(struct device *dev)
> >  {
> > -   return min_not_zero(*dev->dma_mask, dev->bus_dma_mask) <
> > -   dma_get_required_mask(dev);
> > +   return min_not_zero(dma_get_mask(dev), dev->bus_dma_mask) <
> > +   dma_get_required_mask(dev);
> >  }
> >  
> >  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
> > 


Re: [PATCH v2] dma-mapping: Use dma_get_mask in dma_addressing_limited

2019-07-22 Thread Michael S. Tsirkin
On Mon, Jul 22, 2019 at 06:51:49PM +0200, Eric Auger wrote:
> We currently have cases where the dma_addressing_limited() gets
> called with dma_mask unset. This causes a NULL pointer dereference.
> 
> Use dma_get_mask() accessor to prevent the crash.
> 
> Fixes: b866455423e0 ("dma-mapping: add a dma_addressing_limited helper")
> Signed-off-by: Eric Auger 

Acked-by: Michael S. Tsirkin 

If possible I really prefer this approach: avoids changing all callers
and uses documented interfaces.

> ---
> 
> v1 -> v2:
> - was [PATCH 1/2] dma-mapping: Protect dma_addressing_limited
>   against NULL dma_mask
> - Use dma_get_mask
> ---
>  include/linux/dma-mapping.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index e11b115dd0e4..f7d1eea32c78 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -689,8 +689,8 @@ static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
>   */
>  static inline bool dma_addressing_limited(struct device *dev)
>  {
> - return min_not_zero(*dev->dma_mask, dev->bus_dma_mask) <
> - dma_get_required_mask(dev);
> + return min_not_zero(dma_get_mask(dev), dev->bus_dma_mask) <
> + dma_get_required_mask(dev);
>  }
>  
>  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
> -- 
> 2.20.1


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-22 Thread Michael S. Tsirkin
On Mon, Jul 22, 2019 at 04:36:09PM +0100, Robin Murphy wrote:
> On 22/07/2019 15:55, Eric Auger wrote:
> > Do not call dma_max_mapping_size for devices that have no DMA
> > mask set, otherwise we can hit a NULL pointer dereference.
> > 
> > This occurs when a virtio-blk-pci device is protected with
> > a virtual IOMMU.
> > 
> > Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()")
> > Signed-off-by: Eric Auger 
> > Suggested-by: Christoph Hellwig 
> > ---
> >   drivers/virtio/virtio_ring.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index c8be1c4f5b55..37c143971211 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
> >   {
> > size_t max_segment_size = SIZE_MAX;
> > -   if (vring_use_dma_api(vdev))
> > +   if (vring_use_dma_api(vdev) && vdev->dev.dma_mask)
> 
> Hmm, might it make sense to roll that check up into vring_use_dma_api()
> itself? After all, if the device has no mask then it's likely that other DMA
> API ops wouldn't really work as expected either.
> 
> Robin.

Nope, Eric pointed out it's just dma_addressing_limited that is broken.

Other APIs call dma_get_mask which handles the NULL mask case fine.
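
For reference, dma_get_mask() at the time reads roughly:

	static inline u64 dma_get_mask(struct device *dev)
	{
		/* Fall back to a 32-bit mask instead of dereferencing a
		 * NULL dma_mask pointer. */
		if (dev->dma_mask && *dev->dma_mask)
			return *dev->dma_mask;
		return DMA_BIT_MASK(32);
	}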


> > max_segment_size = dma_max_mapping_size(&vdev->dev);
> > return max_segment_size;
> > 


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-22 Thread Michael S. Tsirkin
On Mon, Jul 22, 2019 at 05:27:10PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 22, 2019 at 04:55:09PM +0200, Eric Auger wrote:
> > Do not call dma_max_mapping_size for devices that have no DMA
> > mask set, otherwise we can hit a NULL pointer dereference.
> > 
> > This occurs when a virtio-blk-pci device is protected with
> > a virtual IOMMU.
> > 
> > Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()")
> > Signed-off-by: Eric Auger 
> > Suggested-by: Christoph Hellwig 
> 
> Looks good.  virtio maintainers, let me know if you want to queue
> it up or if I should pick the patch up through the dma-mapping tree.

Personally I dislike this API because I feel presence of dma mask does
not strictly have to reflect max size. And generally the requirement to
check presence of mask feels like an undocumented hack to me.  Even
reading code will not help you avoid the warning, everyone will get it
wrong and get the warning splat in their logs.  So I would prefer just
v1 of the patch that makes dma API do the right thing for us.

However, if that's not going to be the case, let's fix it up in virtio.
In any case, for both v1 and v2 of the patches, you can merge them
through your tree:

Acked-by: Michael S. Tsirkin 

-- 
MST


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-22 Thread Michael S. Tsirkin
On Mon, Jul 22, 2019 at 04:55:09PM +0200, Eric Auger wrote:
> Do not call dma_max_mapping_size for devices that have no DMA
> mask set, otherwise we can hit a NULL pointer dereference.
> 
> This occurs when a virtio-blk-pci device is protected with
> a virtual IOMMU.
> 
> Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()")
> Signed-off-by: Eric Auger 
> Suggested-by: Christoph Hellwig 

Christoph, I wonder why did you suggest this?
The connection between dma_mask and dma_max_mapping_size
is far from obvious.  The documentation doesn't exist.
Do we really have to teach all users about this hack?
Why not just make dma_max_mapping_size DTRT?
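
Making it DTRT would mean a guard in the core along these lines (a
sketch of the alternative being argued for, not what was merged; the
rest follows the kernel/dma/mapping.c implementation of the time):

	size_t dma_max_mapping_size(struct device *dev)
	{
		const struct dma_map_ops *ops = get_dma_ops(dev);
		size_t size = SIZE_MAX;

		/* A device without a DMA mask is simply reported as
		 * unconstrained instead of crashing the caller later. */
		if (!dev->dma_mask)
			return size;

		if (dma_is_direct(ops))
			size = dma_direct_max_mapping_size(dev);
		else if (ops && ops->max_mapping_size)
			size = ops->max_mapping_size(dev);

		return size;
	}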

> ---
>  drivers/virtio/virtio_ring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c8be1c4f5b55..37c143971211 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
>  {
>   size_t max_segment_size = SIZE_MAX;
>  
> - if (vring_use_dma_api(vdev))
> + if (vring_use_dma_api(vdev) && vdev->dev.dma_mask)
>   max_segment_size = dma_max_mapping_size(&vdev->dev);
>  
>   return max_segment_size;
> -- 
> 2.20.1


Re: [PATCH] iommu/virtio: Update to most recent specification

2019-07-22 Thread Michael S. Tsirkin
On Mon, Jul 22, 2019 at 03:40:07PM +0100, Jean-Philippe Brucker wrote:
> Following specification review a few things were changed in v8 of the
> virtio-iommu series [1], but have been omitted when merging the base
> driver. Add them now:
> 
> * Remove the EXEC flag.
> * Add feature bit for the MMIO flag.
> * Change domain_bits to domain_range.
> * Add NOMEM status flag.
> 
> [1] 
> https://lore.kernel.org/linux-iommu/20190530170929.19366-1-jean-philippe.bruc...@arm.com/
> 
> Fixes: edcd69ab9a32 ("iommu: Add virtio-iommu driver")
> Reported-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 

I'm merging this for this release, unless someone objects.

> ---
>  drivers/iommu/virtio-iommu.c  | 40 ++-
>  include/uapi/linux/virtio_iommu.h | 32 ++---
>  2 files changed, 47 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 433f4d2ee956..80a740df0737 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -2,7 +2,7 @@
>  /*
>   * Virtio driver for the paravirtualized IOMMU
>   *
> - * Copyright (C) 2018 Arm Limited
> + * Copyright (C) 2019 Arm Limited
>   */
>  
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> @@ -47,7 +47,10 @@ struct viommu_dev {
>   /* Device configuration */
>   struct iommu_domain_geometrygeometry;
>   u64 pgsize_bitmap;
> - u8  domain_bits;
> + u32 first_domain;
> + u32 last_domain;
> + /* Supported MAP flags */
> + u32 map_flags;
>   u32 probe_size;
>  };
>  
> @@ -62,6 +65,7 @@ struct viommu_domain {
>   struct viommu_dev   *viommu;
>   struct mutexmutex; /* protects viommu pointer */
>   unsigned intid;
> + u32 map_flags;
>  
>   spinlock_t  mappings_lock;
>   struct rb_root_cached   mappings;
> @@ -113,6 +117,8 @@ static int viommu_get_req_errno(void *buf, size_t len)
>   return -ENOENT;
>   case VIRTIO_IOMMU_S_FAULT:
>   return -EFAULT;
> + case VIRTIO_IOMMU_S_NOMEM:
> + return -ENOMEM;
>   case VIRTIO_IOMMU_S_IOERR:
>   case VIRTIO_IOMMU_S_DEVERR:
>   default:
> @@ -607,15 +613,15 @@ static int viommu_domain_finalise(struct viommu_dev *viommu,
>  {
>   int ret;
>   struct viommu_domain *vdomain = to_viommu_domain(domain);
> - unsigned int max_domain = viommu->domain_bits > 31 ? ~0 :
> -   (1U << viommu->domain_bits) - 1;
>  
>   vdomain->viommu = viommu;
> + vdomain->map_flags  = viommu->map_flags;
>  
>   domain->pgsize_bitmap   = viommu->pgsize_bitmap;
>   domain->geometry= viommu->geometry;
>  
> - ret = ida_alloc_max(&viommu->domain_ids, max_domain, GFP_KERNEL);
> + ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
> +   viommu->last_domain, GFP_KERNEL);
>   if (ret >= 0)
>   vdomain->id = (unsigned int)ret;
>  
> @@ -710,7 +716,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned long iova,
> phys_addr_t paddr, size_t size, int prot)
>  {
>   int ret;
> - int flags;
> + u32 flags;
>   struct virtio_iommu_req_map map;
>   struct viommu_domain *vdomain = to_viommu_domain(domain);
>  
> @@ -718,6 +724,9 @@ static int viommu_map(struct iommu_domain *domain, unsigned long iova,
>   (prot & IOMMU_WRITE ? VIRTIO_IOMMU_MAP_F_WRITE : 0) |
>   (prot & IOMMU_MMIO ? VIRTIO_IOMMU_MAP_F_MMIO : 0);
>  
> + if (flags & ~vdomain->map_flags)
> + return -EINVAL;
> +
>   ret = viommu_add_mapping(vdomain, iova, paddr, size, flags);
>   if (ret)
>   return ret;
> @@ -1027,7 +1036,8 @@ static int viommu_probe(struct virtio_device *vdev)
>   goto err_free_vqs;
>   }
>  
> - viommu->domain_bits = 32;
> + viommu->map_flags = VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE;
> + viommu->last_domain = ~0U;
>  
>   /* Optional features */
>   virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
> @@ -1038,9 +1048,13 @@ static int viommu_probe(struct virtio_device *vdev)
>struct virtio_iommu_config, input_range.end,
>    &input_end);
>  
> - virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_BITS,
> -  struct virtio_iommu_config, domain_bits,
> -  &viommu->domain_bits);
> + virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
> +  struct virtio_iommu_config, domain_range.start,
> +  &viommu->first_domain);
> +
> + virtio_cread_feature(vdev, 

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Michael S. Tsirkin
On Mon, Jul 15, 2019 at 07:03:03PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Mon, Jul 15, 2019 at 05:29:06PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote:
> >> >>
> >> >>
> >> >> Michael S. Tsirkin  writes:
> >> >>
> >> >> > So this is what I would call this option:
> >> >> >
> >> >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS
> >> >> >
> >> >> > and the explanation should state that all device
> >> >> > addresses are translated by the platform to identical
> >> >> > addresses.
> >> >> >
> >> >> > In fact this option then becomes more, not less restrictive
> >> >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise
> >> >> > by guest to only create identity mappings,
> >> >> > and only before driver_ok is set.
> >> >> > This option then would always be negotiated together with
> >> >> > VIRTIO_F_ACCESS_PLATFORM.
> >> >> >
> >> >> > Host then must verify that
> >> >> > 1. full 1:1 mappings are created before driver_ok
> >> >> > or can we make sure this happens before features_ok?
> >> >> > that would be ideal as we could require that features_ok fails
> >> >> > 2. mappings are not modified between driver_ok and reset
> >> >> > i guess attempts to change them will fail -
> >> >> > possibly by causing a guest crash
> >> >> > or some other kind of platform-specific error
> >> >>
> >> >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring
> >> >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is
> >> >> SLOF as I mentioned above, another is that we would be requiring all
> >> >> guests running on the machine (secure guests or not, since we would use
> >> >> the same configuration for all guests) to support it. But
> >> >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For
> >> >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know about
> >> >> it and wouldn't be able to use the device.
> >> >
> >> > OK and your target is to enable use with kernel drivers within
> >> > guests, right?
> >>
> >> Right.
> >>
> >> > My question is, we are defining a new flag here, I guess old guests
> >> > then do not set it. How does it help old guests? Or maybe it's
> >> > not designed to ...
> >>
> >> Indeed. The idea is that QEMU can offer the flag, old guests can reject
> >> it (or even new guests can reject it, if they decide not to convert into
> >> secure VMs) and the feature negotiation will succeed with the flag
> >> unset.
> >
> > OK. And then what does QEMU do? Assume guest is not encrypted I guess?
> 
> There's nothing different that QEMU needs to do, with or without the
> flag. From the perspective of the host, a secure guest and a regular guest
> work the same way with respect to virtio.

OK. So now let's get back to implementation. What will
Linux guest driver do? It can't activate DMA API blindly since that
will assume translation also works, right?
Or do we somehow limit it to just a specific platform?

> --
> Thiago Jung Bauermann
> IBM Linux Technology Center


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Michael S. Tsirkin
On Mon, Jul 15, 2019 at 05:29:06PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote:
> >>
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > So this is what I would call this option:
> >> >
> >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS
> >> >
> >> > and the explanation should state that all device
> >> > addresses are translated by the platform to identical
> >> > addresses.
> >> >
> >> > In fact this option then becomes more, not less restrictive
> >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise
> >> > by guest to only create identity mappings,
> >> > and only before driver_ok is set.
> >> > This option then would always be negotiated together with
> >> > VIRTIO_F_ACCESS_PLATFORM.
> >> >
> >> > Host then must verify that
> >> > 1. full 1:1 mappings are created before driver_ok
> >> > or can we make sure this happens before features_ok?
> >> > that would be ideal as we could require that features_ok fails
> >> > 2. mappings are not modified between driver_ok and reset
> >> > i guess attempts to change them will fail -
> >> > possibly by causing a guest crash
> >> > or some other kind of platform-specific error
> >>
> >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring
> >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is
> >> SLOF as I mentioned above, another is that we would be requiring all
> >> guests running on the machine (secure guests or not, since we would use
> >> the same configuration for all guests) to support it. But
> >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For
> >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know about
> >> it and wouldn't be able to use the device.
> >
> > OK and your target is to enable use with kernel drivers within
> > guests, right?
> 
> Right.
> 
> > My question is, we are defining a new flag here, I guess old guests
> > then do not set it. How does it help old guests? Or maybe it's
> > not designed to ...
> 
> Indeed. The idea is that QEMU can offer the flag, old guests can reject
> it (or even new guests can reject it, if they decide not to convert into
> secure VMs) and the feature negotiation will succeed with the flag
> unset.

OK. And then what does QEMU do? Assume guest is not encrypted I guess?

> --
> Thiago Jung Bauermann
> IBM Linux Technology Center


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Michael S. Tsirkin
On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote:
> 
> 
> Michael S. Tsirkin  writes:
> 
> > On Thu, Jun 27, 2019 at 10:58:40PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Mon, Jun 03, 2019 at 10:13:59PM -0300, Thiago Jung Bauermann wrote:
> >> >>
> >> >>
> >> >> Michael S. Tsirkin  writes:
> >> >>
> >> >> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >> >> >> I rephrased it in terms of address translation. What do you think of
> >> >> >> this version? The flag name is slightly different too:
> >> >> >>
> >> >> >>
> >> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> >> >> >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not 
> >> >> >> set,
> >> >> >> with the exception that address translation is guaranteed to be
> >> >> >> unnecessary when accessing memory addresses supplied to the 
> >> >> >> device
> >> >> >> by the driver. Which is to say, the device will always use 
> >> >> >> physical
> >> >> >> addresses matching addresses used by the driver (typically 
> >> >> >> meaning
> >> >> >> physical addresses used by the CPU) and not translated further. 
> >> >> >> This
> >> >> >> flag should be set by the guest if offered, but to allow for
> >> >> >> backward-compatibility device implementations allow for it to be
> >> >> >> left unset by the guest. It is an error to set both this flag and
> >> >> >> VIRTIO_F_ACCESS_PLATFORM.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unpriveledged
> >> >> > drivers. This is why devices fail when it's not negotiated.
> >> >>
> >> >> Just to clarify, what do you mean by unprivileged drivers? Is it drivers
> >> >> implemented in guest userspace such as with VFIO? Or unprivileged in
> >> >> some other sense such as needing to use bounce buffers for some reason?
> >> >
> >> > I had drivers in guest userspace in mind.
> >>
> >> Great. Thanks for clarifying.
> >>
> >> I don't think this flag would work for guest userspace drivers. Should I
> >> add a note about that in the flag definition?
> >
> > I think you need to clarify access protection rules. Is it only
> > translation that is bypassed or is any platform-specific
> > protection mechanism bypassed too?
> 
> It is only translation. In a secure guest, if the device tries to access
> a memory address that wasn't provided by the driver then the
> architecture will deny that access. If the device accesses addresses
> provided to it by the driver, then there's no protection mechanism or
> translation to get in the way.
> 
> >> >> > This confuses me.
> >> >> > If driver is unpriveledged then what happens with this flag?
> >> >> > It can supply any address it wants. Will that corrupt kernel
> >> >> > memory?
> >> >>
> >> >> Not needing address translation doesn't necessarily mean that there's no
> >> >> IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's
> >> >> always an IOMMU present. And we also support VFIO drivers. The VFIO API
> >> >> for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls
> >> >> to program the IOMMU.
> >> >>
> >> >> For our use case, we don't need address translation because we set up an
> >> >> identity mapping in the IOMMU so that the device can use guest physical
> >> >> addresses.
> >
> > OK so I think I am beginning to see it in a different light.  Right now the 
> > specific
> > platform creates an identity mapping. That in turn means DMA API can be
> > fast - it does not need to do anything.  What you are looking for is a
> > way to tell host it's an identity mapping - just as an optimization.
> >
> > Is that right?
> 
> Almost. Theoretically it is just an optimization. But in practice the
> pseries boot firmware (SLOF) doesn't support IOMM

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-01 Thread Michael S. Tsirkin
On Thu, Jun 27, 2019 at 10:58:40PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Mon, Jun 03, 2019 at 10:13:59PM -0300, Thiago Jung Bauermann wrote:
> >>
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >> >> I rephrased it in terms of address translation. What do you think of
> >> >> this version? The flag name is slightly different too:
> >> >>
> >> >>
> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> >> >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
> >> >> with the exception that address translation is guaranteed to be
> >> >> unnecessary when accessing memory addresses supplied to the device
> >> >> by the driver. Which is to say, the device will always use physical
> >> >> addresses matching addresses used by the driver (typically meaning
> >> >> physical addresses used by the CPU) and not translated further. This
> >> >> flag should be set by the guest if offered, but to allow for
> >> >> backward-compatibility device implementations allow for it to be
> >> >> left unset by the guest. It is an error to set both this flag and
> >> >> VIRTIO_F_ACCESS_PLATFORM.
> >> >
> >> >
> >> >
> >> >
> >> > OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unpriveledged
> >> > drivers. This is why devices fail when it's not negotiated.
> >>
> >> Just to clarify, what do you mean by unprivileged drivers? Is it drivers
> >> implemented in guest userspace such as with VFIO? Or unprivileged in
> >> some other sense such as needing to use bounce buffers for some reason?
> >
> > I had drivers in guest userspace in mind.
> 
> Great. Thanks for clarifying.
> 
> I don't think this flag would work for guest userspace drivers. Should I
> add a note about that in the flag definition?

I think you need to clarify access protection rules. Is it only
translation that is bypassed or is any platform-specific
protection mechanism bypassed too?

> >> > This confuses me.
> >> > If driver is unpriveledged then what happens with this flag?
> >> > It can supply any address it wants. Will that corrupt kernel
> >> > memory?
> >>
> >> Not needing address translation doesn't necessarily mean that there's no
> >> IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's
> >> always an IOMMU present. And we also support VFIO drivers. The VFIO API
> >> for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls
> >> to program the IOMMU.
> >>
> >> For our use case, we don't need address translation because we set up an
> >> identity mapping in the IOMMU so that the device can use guest physical
> >> addresses.

OK so I think I am beginning to see it in a different light.  Right now the 
specific
platform creates an identity mapping. That in turn means DMA API can be
fast - it does not need to do anything.  What you are looking for is a
way to tell host it's an identity mapping - just as an optimization.

Is that right?  So this is what I would call this option:

VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS

and the explanation should state that all device
addresses are translated by the platform to identical
addresses.

In fact this option then becomes more, not less restrictive
than VIRTIO_F_ACCESS_PLATFORM - it's a promise
by guest to only create identity mappings,
and only before driver_ok is set.
This option then would always be negotiated together with
VIRTIO_F_ACCESS_PLATFORM.

Host then must verify that
1. full 1:1 mappings are created before driver_ok
or can we make sure this happens before features_ok?
that would be ideal as we could require that features_ok fails
2. mappings are not modified between driver_ok and reset
i guess attempts to change them will fail -
possibly by causing a guest crash
or some other kind of platform-specific error

So far so good, but now a question:

how are we handling guest address width limitations?
Is VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS subject to
guest address width limitations?
I am guessing we can make them so ...
This needs to be documented.




> >
> > And can it access any guest physical address?
> 
> Sorry, I was mistaken. We do support VFIO in guests but not for virtio
> devices, only for regular PCI devices. In which case they will use
> address translation.

Not su

Re: [virtio-dev] Re: [PATCH v8 2/7] dt-bindings: virtio: Add virtio-pci-iommu node

2019-06-16 Thread Michael S. Tsirkin
On Fri, May 31, 2019 at 12:13:47PM +0100, Jean-Philippe Brucker wrote:
> On 30/05/2019 18:45, Michael S. Tsirkin wrote:
> > On Thu, May 30, 2019 at 06:09:24PM +0100, Jean-Philippe Brucker wrote:
> >> Some systems implement virtio-iommu as a PCI endpoint. The operating
> >> system needs to discover the relationship between IOMMU and masters long
> >> before the PCI endpoint gets probed. Add a PCI child node to describe the
> >> virtio-iommu device.
> >>
> >> The virtio-pci-iommu is conceptually split between a PCI programming
> >> interface and a translation component on the parent bus. The latter
> >> doesn't have a node in the device tree. The virtio-pci-iommu node
> >> describes both, by linking the PCI endpoint to "iommus" property of DMA
> >> master nodes and to "iommu-map" properties of bus nodes.
> >>
> >> Reviewed-by: Rob Herring 
> >> Reviewed-by: Eric Auger 
> >> Signed-off-by: Jean-Philippe Brucker 
> > 
> > So this is just an example right?
> > We are not defining any new properties or anything like that.
> 
> Yes it's just an example. The properties already exist but it's good to
> describe how to put them together for this particular case, because
> there isn't a precedent describing the topology for an IOMMU that
> appears on the PCI bus.
> 
> > I think down the road for non dt platforms we want to put this
> > info in the config space of the device. I do not think ACPI
> > is the best option for this since not all systems have it.
> > But that can wait.
> 
> There is the probe order problem - PCI needs this info before starting
> to probe devices on the bus.


This isn't all that special - it's pretty common for
IOMMUs to be pci devices. The solution is to have the device on
bus 0. For example, add it with

DECLARE_PCI_FIXUP_EARLY
or
DECLARE_PCI_FIXUP_CLASS_EARLY

in e.g.
arch/x86/kernel/quirks.c
or
drivers/pci/quirks.c

You can also use the configuration access capability
if there's need to access the device before its memory is
enabled.
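
As an illustration only (the hook body is a placeholder and 0x1057 is
only assumed here to be the virtio-iommu modern PCI device ID):

	/*
	 * Runs at early fixup time, before regular PCI probing, so the
	 * IOMMU topology can be recorded for the rest of the bus.
	 */
	static void virtio_iommu_early_fixup(struct pci_dev *pdev)
	{
		dev_info(&pdev->dev, "virtio-iommu endpoint found early\n");
		/* record pdev's BDF / register the translation component */
	}
	DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1057,
				virtio_iommu_early_fixup);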

> Maybe we could store the info in a separate
> memory region, that is referenced on the command-line and that the guest
> can read early.
> 
> Thanks,
> Jean

The point is to avoid command line hacks. Devices should be
self describing.

-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-06-03 Thread Michael S. Tsirkin
On Mon, Jun 03, 2019 at 10:13:59PM -0300, Thiago Jung Bauermann wrote:
> 
> 
> Michael S. Tsirkin  writes:
> 
> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >> I rephrased it in terms of address translation. What do you think of
> >> this version? The flag name is slightly different too:
> >>
> >>
> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
> >> with the exception that address translation is guaranteed to be
> >> unnecessary when accessing memory addresses supplied to the device
> >> by the driver. Which is to say, the device will always use physical
> >> addresses matching addresses used by the driver (typically meaning
> >> physical addresses used by the CPU) and not translated further. This
> >> flag should be set by the guest if offered, but to allow for
> >> backward-compatibility device implementations allow for it to be
> >> left unset by the guest. It is an error to set both this flag and
> >> VIRTIO_F_ACCESS_PLATFORM.
> >
> >
> > OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unprivileged
> > drivers. This is why devices fail when it's not negotiated.
> 
> Just to clarify, what do you mean by unprivileged drivers? Is it drivers
> implemented in guest userspace such as with VFIO? Or unprivileged in
> some other sense such as needing to use bounce buffers for some reason?

I had drivers in guest userspace in mind.

> > This confuses me.
> > If driver is unprivileged then what happens with this flag?
> > It can supply any address it wants. Will that corrupt kernel
> > memory?
> 
> Not needing address translation doesn't necessarily mean that there's no
> IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's
> always an IOMMU present. And we also support VFIO drivers. The VFIO API
> for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls
> to program the IOMMU.
> 
> For our use case, we don't need address translation because we set up an
> identity mapping in the IOMMU so that the device can use guest physical
> addresses.

And can it access any guest physical address?

> If the guest kernel is concerned that an unprivileged driver could
> jeopardize its integrity it should not negotiate this feature flag.

Unfortunately flag negotiation is done through config space
and so can be overwritten by the driver.

> Perhaps there should be a note about this in the flag definition? This
> concern is platform-dependant though. I don't believe it's an issue in
> pseries.

Again ACCESS_PLATFORM has a pretty open definition. It does actually
say it's all up to the platform.

Specifically how will VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION be
implemented portably? virtio has no portable way to know
whether DMA API bypasses translation.



> --
> Thiago Jung Bauermann
> IBM Linux Technology Center


Re: [PATCH v8 2/7] dt-bindings: virtio: Add virtio-pci-iommu node

2019-05-30 Thread Michael S. Tsirkin
On Thu, May 30, 2019 at 06:09:24PM +0100, Jean-Philippe Brucker wrote:
> Some systems implement virtio-iommu as a PCI endpoint. The operating
> system needs to discover the relationship between IOMMU and masters long
> before the PCI endpoint gets probed. Add a PCI child node to describe the
> virtio-iommu device.
> 
> The virtio-pci-iommu is conceptually split between a PCI programming
> interface and a translation component on the parent bus. The latter
> doesn't have a node in the device tree. The virtio-pci-iommu node
> describes both, by linking the PCI endpoint to "iommus" property of DMA
> master nodes and to "iommu-map" properties of bus nodes.
> 
> Reviewed-by: Rob Herring 
> Reviewed-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 

So this is just an example right?
We are not defining any new properties or anything like that.

I think down the road for non dt platforms we want to put this
info in the config space of the device. I do not think ACPI
is the best option for this since not all systems have it.
But that can wait.

> ---
>  .../devicetree/bindings/virtio/iommu.txt  | 66 +++
>  1 file changed, 66 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
> 
> diff --git a/Documentation/devicetree/bindings/virtio/iommu.txt 
> b/Documentation/devicetree/bindings/virtio/iommu.txt
> new file mode 100644
> index ..2407fea0651c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/virtio/iommu.txt
> @@ -0,0 +1,66 @@
> +* virtio IOMMU PCI device
> +
> +When virtio-iommu uses the PCI transport, its programming interface is
> +discovered dynamically by the PCI probing infrastructure. However the
> +device tree statically describes the relation between IOMMU and DMA
> +masters. Therefore, the PCI root complex that hosts the virtio-iommu
> +contains a child node representing the IOMMU device explicitly.
> +
> +Required properties:
> +
> +- compatible:Should be "virtio,pci-iommu"
> +- reg:   PCI address of the IOMMU. As defined in the PCI Bus
> + Binding reference [1], the reg property is a five-cell
> + address encoded as (phys.hi phys.mid phys.lo size.hi
> + size.lo). phys.hi should contain the device's BDF as
> + 0b00000000 bbbbbbbb dddddfff 00000000. The other cells
> + should be zero.
> +- #iommu-cells:  Each platform DMA master managed by the IOMMU is 
> assigned
> + an endpoint ID, described by the "iommus" property [2].
> + For virtio-iommu, #iommu-cells must be 1.
> +
> +Notes:
> +
> +- DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
> +  virtio-iommu node doesn't have an "iommus" property, and is omitted from
> +  the iommu-map property of the root complex.
> +
> +Example:
> +
> +pcie@1000 {
> + compatible = "pci-host-ecam-generic";
> + ...
> +
> + /* The IOMMU programming interface uses slot 00:01.0 */
> + iommu0: iommu@0008 {
> + compatible = "virtio,pci-iommu";
> + reg = <0x0800 0 0 0 0>;
> + #iommu-cells = <1>;
> + };
> +
> + /*
> +  * The IOMMU manages all functions in this PCI domain except
> +  * itself. Omit BDF 00:01.0.
> +  */
> + iommu-map = <0x0 &iommu0 0x0 0x8>
> + <0x9 &iommu0 0x9 0xfff7>;
> +};
> +
> +pcie@2000 {
> + compatible = "pci-host-ecam-generic";
> + ...
> + /*
> +  * The IOMMU also manages all functions from this domain,
> +  * with endpoint IDs 0x10000 - 0x1ffff
> +  */
> + iommu-map = <0x0 &iommu0 0x10000 0x10000>;
> +};
> +
> +ethernet@fe001000 {
> + ...
> + /* The IOMMU manages this platform device with endpoint ID 0x20000 */
> + iommus = <&iommu0 0x20000>;
> +};
> +
> +[1] Documentation/devicetree/bindings/pci/pci.txt
> +[2] Documentation/devicetree/bindings/iommu/iommu.txt
> -- 
> 2.21.0


Re: [PATCH v7 0/7] Add virtio-iommu driver

2019-05-27 Thread Michael S. Tsirkin
On Mon, May 27, 2019 at 11:26:04AM +0200, Joerg Roedel wrote:
> On Sun, May 12, 2019 at 12:31:59PM -0400, Michael S. Tsirkin wrote:
> > OK this has been in next for a while.
> > 
> > Last time IOMMU maintainers objected. Are objections
> > still in force?
> > 
> > If not could we get acks please?
> 
> No objections against the code, I only hesitated because the Spec was
> not yet official.
> 
> So for the code:
> 
>   Acked-by: Joerg Roedel 

Last spec patch had a bunch of comments not yet addressed.
But I do not remember whether comments are just about wording
or about the host/guest interface as well.
Jean-Philippe could you remind me please?

-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-05-20 Thread Michael S. Tsirkin
On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> I rephrased it in terms of address translation. What do you think of
> this version? The flag name is slightly different too:
> 
> 
> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
> with the exception that address translation is guaranteed to be
> unnecessary when accessing memory addresses supplied to the device
> by the driver. Which is to say, the device will always use physical
> addresses matching addresses used by the driver (typically meaning
> physical addresses used by the CPU) and not translated further. This
> flag should be set by the guest if offered, but to allow for
> backward-compatibility device implementations allow for it to be
> left unset by the guest. It is an error to set both this flag and
> VIRTIO_F_ACCESS_PLATFORM.


OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unprivileged
drivers. This is why devices fail when it's not negotiated.

This confuses me.
If driver is unprivileged then what happens with this flag?
It can supply any address it wants. Will that corrupt kernel
memory?

-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-05-20 Thread Michael S. Tsirkin
On Fri, Apr 26, 2019 at 08:56:43PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Wed, Apr 24, 2019 at 10:01:56PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >> >>
> >> >> Michael S. Tsirkin  writes:
> >> >>
> >> >> > On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
> >> >> >>
> >> >> >> Michael S. Tsirkin  writes:
> >> >> >>
> >> >> >> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann 
> >> >> >> > wrote:
> >> >> >> >> >From what I understand of the ACCESS_PLATFORM definition, the 
> >> >> >> >> >host will
> >> >> >> >> only ever try to access memory addresses that are supplied to it 
> >> >> >> >> by the
> >> >> >> >> guest, so all of the secure guest memory that the host cares 
> >> >> >> >> about is
> >> >> >> >> accessible:
> >> >> >> >>
> >> >> >> >> If this feature bit is set to 0, then the device has same 
> >> >> >> >> access to
> >> >> >> >> memory addresses supplied to it as the driver has. In 
> >> >> >> >> particular,
> >> >> >> >> the device will always use physical addresses matching 
> >> >> >> >> addresses
> >> >> >> >> used by the driver (typically meaning physical addresses used 
> >> >> >> >> by the
> >> >> >> >> CPU) and not translated further, and can access any address 
> >> >> >> >> supplied
> >> >> >> >> to it by the driver. When clear, this overrides any
> >> >> >> >> platform-specific description of whether device access is 
> >> >> >> >> limited or
> >> >> >> >> translated in any way, e.g. whether an IOMMU may be present.
> >> >> >> >>
> >> >> >> >> All of the above is true for POWER guests, whether they are secure
> >> >> >> >> guests or not.
> >> >> >> >>
> >> >> >> >> Or are you saying that a virtio device may want to access memory
> >> >> >> >> addresses that weren't supplied to it by the driver?
> >> >> >> >
> >> >> >> > Your logic would apply to IOMMUs as well.  For your mode, there are
> >> >> >> > specific encrypted memory regions that driver has access to but 
> >> >> >> > device
> >> >> >> > does not. that seems to violate the constraint.
> >> >> >>
> >> >> >> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
> >> >> >> the device can ignore the IOMMU for all practical purposes I would
> >> >> >> indeed say that the logic would apply to IOMMUs as well. :-)
> >> >> >>
> >> >> >> I guess I'm still struggling with the purpose of signalling to the
> >> >> >> driver that the host may not have access to memory addresses that it
> >> >> >> will never try to access.
> >> >> >
> >> >> > For example, one of the benefits is to signal to host that driver does
> >> >> > not expect ability to access all memory. If it does, host can
> >> >> > fail initialization gracefully.
> >> >>
> >> >> But why would the ability to access all memory be necessary or even
> >> >> useful? When would the host access memory that the driver didn't tell it
> >> >> to access?
> >> >
> >> > When I say all memory I mean even memory not allowed by the IOMMU.
> >>
> >> Yes, but why? How is that memory relevant?
> >
> > It's relevant when driver is not trusted to only supply correct
> > addresses. The feature was originally designed to support userspace
> > drivers within guests.
> 
> Ah, thanks for clarifying. I don't think that's a problem in our case.
> If the guest provides an incorrect address, the hardware simply 

Re: [PATCH v7 0/7] Add virtio-iommu driver

2019-05-12 Thread Michael S. Tsirkin
On Tue, Jan 15, 2019 at 12:19:52PM +, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.9 [1].
> 
> This is a simple rebase onto Linux v5.0-rc2. We now use the
> dev_iommu_fwspec_get() helper introduced in v5.0 instead of accessing
> dev->iommu_fwspec, but there aren't any functional change from v6 [2].
> 
> Our current goal for virtio-iommu is to get a paravirtual IOMMU working
> on Arm, and enable device assignment to guest userspace. In this
> use-case the mappings are static, and don't require optimal performance,
> so this series tries to keep things simple. However there is plenty more
> to do for features and optimizations, and having this base in v5.1 would
> be good. Given that most of the changes are to drivers/iommu, I believe
> the driver and future changes should go via the IOMMU tree.
> 
> You can find Linux driver and kvmtool device on v0.9.2 branches [3],
> module and x86 support on virtio-iommu/devel. Also tested with Eric's
> QEMU device [4]. Please note that the series depends on Robin's
> probe-deferral fix [5], which will hopefully land in v5.0.
> 
> [1] Virtio-iommu specification v0.9, sources and pdf
> git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
> http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf
> 
> [2] [PATCH v6 0/7] Add virtio-iommu driver
> 
> https://lists.linuxfoundation.org/pipermail/iommu/2018-December/032127.html
> 
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.2
> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9.2
> 
> [4] [RFC v9 00/17] VIRTIO-IOMMU device
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg575578.html
> 
> [5] [PATCH] iommu/of: Fix probe-deferral
>     https://www.spinics.net/lists/arm-kernel/msg698371.html

For virtio things:

Acked-by: Michael S. Tsirkin 



> Jean-Philippe Brucker (7):
>   dt-bindings: virtio-mmio: Add IOMMU description
>   dt-bindings: virtio: Add virtio-pci-iommu node
>   of: Allow the iommu-map property to omit untranslated devices
>   PCI: OF: Initialize dev->fwnode appropriately
>   iommu: Add virtio-iommu driver
>   iommu/virtio: Add probe request
>   iommu/virtio: Add event queue
> 
>  .../devicetree/bindings/virtio/iommu.txt  |   66 +
>  .../devicetree/bindings/virtio/mmio.txt   |   30 +
>  MAINTAINERS   |7 +
>  drivers/iommu/Kconfig |   11 +
>  drivers/iommu/Makefile|1 +
>  drivers/iommu/virtio-iommu.c  | 1158 +
>  drivers/of/base.c |   10 +-
>  drivers/pci/of.c  |7 +
>  include/uapi/linux/virtio_ids.h   |1 +
>  include/uapi/linux/virtio_iommu.h |  161 +++
>  10 files changed, 1449 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 
> -- 
> 2.19.1


Re: [PATCH v7 0/7] Add virtio-iommu driver

2019-05-12 Thread Michael S. Tsirkin
On Tue, Jan 15, 2019 at 12:19:52PM +, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.9 [1].
> 
> This is a simple rebase onto Linux v5.0-rc2. We now use the
> dev_iommu_fwspec_get() helper introduced in v5.0 instead of accessing
> dev->iommu_fwspec, but there aren't any functional change from v6 [2].
> 
> Our current goal for virtio-iommu is to get a paravirtual IOMMU working
> on Arm, and enable device assignment to guest userspace. In this
> use-case the mappings are static, and don't require optimal performance,
> so this series tries to keep things simple. However there is plenty more
> to do for features and optimizations, and having this base in v5.1 would
> be good. Given that most of the changes are to drivers/iommu, I believe
> the driver and future changes should go via the IOMMU tree.
> 
> You can find Linux driver and kvmtool device on v0.9.2 branches [3],
> module and x86 support on virtio-iommu/devel. Also tested with Eric's
> QEMU device [4]. Please note that the series depends on Robin's
> probe-deferral fix [5], which will hopefully land in v5.0.
> 
> [1] Virtio-iommu specification v0.9, sources and pdf
> git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
> http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf
> 
> [2] [PATCH v6 0/7] Add virtio-iommu driver
> 
> https://lists.linuxfoundation.org/pipermail/iommu/2018-December/032127.html
> 
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.2
> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9.2
> 
> [4] [RFC v9 00/17] VIRTIO-IOMMU device
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg575578.html
> 
> [5] [PATCH] iommu/of: Fix probe-deferral
> https://www.spinics.net/lists/arm-kernel/msg698371.html


OK this has been in next for a while.

Last time IOMMU maintainers objected. Are objections
still in force?

If not could we get acks please?


> Jean-Philippe Brucker (7):
>   dt-bindings: virtio-mmio: Add IOMMU description
>   dt-bindings: virtio: Add virtio-pci-iommu node
>   of: Allow the iommu-map property to omit untranslated devices
>   PCI: OF: Initialize dev->fwnode appropriately
>   iommu: Add virtio-iommu driver
>   iommu/virtio: Add probe request
>   iommu/virtio: Add event queue
> 
>  .../devicetree/bindings/virtio/iommu.txt  |   66 +
>  .../devicetree/bindings/virtio/mmio.txt   |   30 +
>  MAINTAINERS   |7 +
>  drivers/iommu/Kconfig |   11 +
>  drivers/iommu/Makefile|1 +
>  drivers/iommu/virtio-iommu.c  | 1158 +
>  drivers/of/base.c |   10 +-
>  drivers/pci/of.c  |7 +
>  include/uapi/linux/virtio_ids.h   |1 +
>  include/uapi/linux/virtio_iommu.h |  161 +++
>  10 files changed, 1449 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 
> -- 
> 2.19.1


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-04-24 Thread Michael S. Tsirkin
On Wed, Apr 24, 2019 at 10:01:56PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
> >> >>
> >> >> Michael S. Tsirkin  writes:
> >> >>
> >> >> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
> >> >> >> From what I understand of the ACCESS_PLATFORM definition, the host will
> >> >> >> only ever try to access memory addresses that are supplied to it by the
> >> >> >> guest, so all of the secure guest memory that the host cares about is
> >> >> >> accessible:
> >> >> >>
> >> >> >> If this feature bit is set to 0, then the device has same access to
> >> >> >> memory addresses supplied to it as the driver has. In particular,
> >> >> >> the device will always use physical addresses matching addresses
> >> >> >> used by the driver (typically meaning physical addresses used by the
> >> >> >> CPU) and not translated further, and can access any address supplied
> >> >> >> to it by the driver. When clear, this overrides any
> >> >> >> platform-specific description of whether device access is limited or
> >> >> >> translated in any way, e.g. whether an IOMMU may be present.
> >> >> >>
> >> >> >> All of the above is true for POWER guests, whether they are secure
> >> >> >> guests or not.
> >> >> >>
> >> >> >> Or are you saying that a virtio device may want to access memory
> >> >> >> addresses that weren't supplied to it by the driver?
> >> >> >
> >> >> > Your logic would apply to IOMMUs as well.  For your mode, there are
> >> >> > specific encrypted memory regions that driver has access to but device
> >> >> > does not. that seems to violate the constraint.
> >> >>
> >> >> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
> >> >> the device can ignore the IOMMU for all practical purposes I would
> >> >> indeed say that the logic would apply to IOMMUs as well. :-)
> >> >>
> >> >> I guess I'm still struggling with the purpose of signalling to the
> >> >> driver that the host may not have access to memory addresses that it
> >> >> will never try to access.
> >> >
> >> > For example, one of the benefits is to signal to host that driver does
> >> > not expect ability to access all memory. If it does, host can
> >> > fail initialization gracefully.
> >>
> >> But why would the ability to access all memory be necessary or even
> >> useful? When would the host access memory that the driver didn't tell it
> >> to access?
> >
> > When I say all memory I mean even memory not allowed by the IOMMU.
> 
> Yes, but why? How is that memory relevant?

It's relevant when driver is not trusted to only supply correct
addresses. The feature was originally designed to support userspace
drivers within guests.

> >> >> >> >> > But the name "sev_active" makes me scared because at least AMD guys who
> >> >> >> >> > were doing the sensible thing and setting ACCESS_PLATFORM
> >> >> >> >>
> >> >> >> >> My understanding is, AMD guest-platform knows in advance that their
> >> >> >> >> guest will run in secure mode and hence sets the flag at the time of VM
> >> >> >> >> instantiation. Unfortunately we don't have that luxury on our platforms.
> >> >> >> >
> >> >> >> > Well you do have that luxury. It looks like that there are existing
> >> >> >> > guests that already acknowledge ACCESS_PLATFORM and you are not 

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-04-19 Thread Michael S. Tsirkin
On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
> >> >> From what I understand of the ACCESS_PLATFORM definition, the host will
> >> >> only ever try to access memory addresses that are supplied to it by the
> >> >> guest, so all of the secure guest memory that the host cares about is
> >> >> accessible:
> >> >>
> >> >> If this feature bit is set to 0, then the device has same access to
> >> >> memory addresses supplied to it as the driver has. In particular,
> >> >> the device will always use physical addresses matching addresses
> >> >> used by the driver (typically meaning physical addresses used by the
> >> >> CPU) and not translated further, and can access any address supplied
> >> >> to it by the driver. When clear, this overrides any
> >> >> platform-specific description of whether device access is limited or
> >> >> translated in any way, e.g. whether an IOMMU may be present.
> >> >>
> >> >> All of the above is true for POWER guests, whether they are secure
> >> >> guests or not.
> >> >>
> >> >> Or are you saying that a virtio device may want to access memory
> >> >> addresses that weren't supplied to it by the driver?
> >> >
> >> > Your logic would apply to IOMMUs as well.  For your mode, there are
> >> > specific encrypted memory regions that driver has access to but device
> >> > does not. that seems to violate the constraint.
> >>
> >> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
> >> the device can ignore the IOMMU for all practical purposes I would
> >> indeed say that the logic would apply to IOMMUs as well. :-)
> >>
> >> I guess I'm still struggling with the purpose of signalling to the
> >> driver that the host may not have access to memory addresses that it
> >> will never try to access.
> >
> > For example, one of the benefits is to signal to host that driver does
> > not expect ability to access all memory. If it does, host can
> > fail initialization gracefully.
> 
> But why would the ability to access all memory be necessary or even
> useful? When would the host access memory that the driver didn't tell it
> to access?

When I say all memory I mean even memory not allowed by the IOMMU.


> >> >> >> > But the name "sev_active" makes me scared because at least AMD guys who
> >> >> >> > were doing the sensible thing and setting ACCESS_PLATFORM
> >> >> >>
> >> >> >> My understanding is, AMD guest-platform knows in advance that their
> >> >> >> guest will run in secure mode and hence sets the flag at the time of VM
> >> >> >> instantiation. Unfortunately we don't have that luxury on our platforms.
> >> >> >
> >> >> > Well you do have that luxury. It looks like that there are existing
> >> >> > guests that already acknowledge ACCESS_PLATFORM and you are not happy
> >> >> > with how that path is slow. So you are trying to optimize for
> >> >> > them by clearing ACCESS_PLATFORM and then you have lost ability
> >> >> > to invoke DMA API.
> >> >> >
> >> >> > For example if there was another flag just like ACCESS_PLATFORM
> >> >> > just not yet used by anyone, you would be all fine using that right?
> >> >>
> >> >> Yes, a new flag sounds like a great idea. What about the definition
> >> >> below?
> >> >>
> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
> >> >> VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
> >> >> exception that the IOMMU is explicitly defined to be off or bypassed
> >> >> when accessing memory addresses supplied to the device by the
> >> >> driver. This flag should be set by the guest if offered, but to
> >> >> allow for backward-compatibility device implem

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-03-26 Thread Michael S. Tsirkin
On Wed, Jan 30, 2019 at 08:44:27AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 29, 2019 at 09:36:08PM -0500, Michael S. Tsirkin wrote:
> > This has been discussed ad nauseum. virtio is all about compatibility.
> > Losing a couple of lines of code isn't worth breaking working setups.
> > People that want "just use DMA API no tricks" now have the option.
> > Setting a flag in a feature bit map is literally a single line
> > of code in the hypervisor. So stop pushing for breaking working
> > legacy setups and just fix it in the right place.
> 
> I agree with the legacy aspect.  What I am missing is an extremely
> strong wording that says you SHOULD always set this flag for new
> hosts, including an explanation why.


So as far as power is concerned, IIUC the issue they are struggling with is
that some platforms do not support pass-through mode in the emulated IOMMU.
Disabling ACCESS_PLATFORM is so far a way around that, unfortunately just for
virtio devices.  I would like virtio-iommu to be able to address that need as
well.

-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-03-23 Thread Michael S. Tsirkin
On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
> >> >> Another way of looking at this issue which also explains our reluctance
> >> >> is that the only difference between a secure guest and a regular guest
> >> >> (at least regarding virtio) is that the former uses swiotlb while the
> >> >> latter doesn't.
> >> >
> >> > But swiotlb is just one implementation. It's a guest internal thing. The
> >> > issue is that memory isn't host accessible.
> >>
> >> From what I understand of the ACCESS_PLATFORM definition, the host will
> >> only ever try to access memory addresses that are supplied to it by the
> >> guest, so all of the secure guest memory that the host cares about is
> >> accessible:
> >>
> >> If this feature bit is set to 0, then the device has same access to
> >> memory addresses supplied to it as the driver has. In particular,
> >> the device will always use physical addresses matching addresses
> >> used by the driver (typically meaning physical addresses used by the
> >> CPU) and not translated further, and can access any address supplied
> >> to it by the driver. When clear, this overrides any
> >> platform-specific description of whether device access is limited or
> >> translated in any way, e.g. whether an IOMMU may be present.
> >>
> >> All of the above is true for POWER guests, whether they are secure
> >> guests or not.
> >>
> >> Or are you saying that a virtio device may want to access memory
> >> addresses that weren't supplied to it by the driver?
> >
> > Your logic would apply to IOMMUs as well.  For your mode, there are
> > specific encrypted memory regions that driver has access to but device
> > does not. that seems to violate the constraint.
> 
> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
> the device can ignore the IOMMU for all practical purposes I would
> indeed say that the logic would apply to IOMMUs as well. :-)
> 
> I guess I'm still struggling with the purpose of signalling to the
> driver that the host may not have access to memory addresses that it
> will never try to access.

For example, one of the benefits is to signal to host that driver does
not expect ability to access all memory. If it does, host can
fail initialization gracefully.
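Concretely, the host-side check described here could look like the sketch
below; the structure and names are hypothetical, not taken from any particular
device implementation:

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define ACCESS_PLATFORM_BIT 33  /* feature bit number from the virtio spec */

struct emulated_dev {
        bool restricted_access; /* device cannot reach arbitrary guest memory */
};

/* Called once the driver has written back its accepted feature bits. */
static int check_driver_features(struct emulated_dev *dev, uint64_t driver_features)
{
        bool driver_accepts_translation =
                driver_features & (1ULL << ACCESS_PLATFORM_BIT);

        /*
         * A driver that refused ACCESS_PLATFORM expects the device to reach
         * any guest address directly.  If this device can't do that, fail
         * initialization now rather than corrupt the ring later.
         */
        if (dev->restricted_access && !driver_accepts_translation)
                return -ENOTSUP;

        return 0;
}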

> >> >> And from the device's point of view they're
> >> >> indistinguishable. It can't tell one guest that is using swiotlb from
> >> >> one that isn't. And that implies that secure guest vs regular guest
> >> >> isn't a virtio interface issue, it's "guest internal affairs". So
> >> >> there's no reason to reflect that in the feature flags.
> >> >
> >> > So don't. The way not to reflect that in the feature flags is
> >> > to set ACCESS_PLATFORM.  Then you say *I don't care let platform device*.
> >> >
> >> >
> >> > Without ACCESS_PLATFORM
> >> > virtio has a very specific opinion about the security of the
> >> > device, and that opinion is that device is part of the guest
> >> > supervisor security domain.
> >>
> >> Sorry for being a bit dense, but not sure what "the device is part of
> >> the guest supervisor security domain" means. In powerpc-speak,
> >> "supervisor" is the operating system so perhaps that explains my
> >> confusion. Are you saying that without ACCESS_PLATFORM, the guest
> >> considers the host to be part of the guest operating system's security
> >> domain?
> >
> > I think so. The spec says "device has same access as driver".
> 
> Ok, makes sense.
> 
> >> If so, does that have any other implication besides "the host
> >> can access any address supplied to it by the driver"? If that is the
> >> case, perhaps the definition of ACCESS_PLATFORM needs to be amended to
> >> include that information because it's not part of the current
> >> definition.
> >>
> >> >> > But the name "sev_active" makes me scared because at least AMD guys who
> >> >> > were doing the sensible thing and setting ACCESS_PLATFORM
> >> >>
> >> >> My understanding is, AMD guest-platform knows in advance that their
> >> >> guest will run in 

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-03-20 Thread Michael S. Tsirkin
On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
> >> Another way of looking at this issue which also explains our reluctance
> >> is that the only difference between a secure guest and a regular guest
> >> (at least regarding virtio) is that the former uses swiotlb while the
> >> latter doesn't.
> >
> > But swiotlb is just one implementation. It's a guest internal thing. The
> > issue is that memory isn't host accessible.
> 
> From what I understand of the ACCESS_PLATFORM definition, the host will
> only ever try to access memory addresses that are supplied to it by the
> guest, so all of the secure guest memory that the host cares about is
> accessible:
> 
> If this feature bit is set to 0, then the device has same access to
> memory addresses supplied to it as the driver has. In particular,
> the device will always use physical addresses matching addresses
> used by the driver (typically meaning physical addresses used by the
> CPU) and not translated further, and can access any address supplied
> to it by the driver. When clear, this overrides any
> platform-specific description of whether device access is limited or
> translated in any way, e.g. whether an IOMMU may be present.
> 
> All of the above is true for POWER guests, whether they are secure
> guests or not.
> 
> Or are you saying that a virtio device may want to access memory
> addresses that weren't supplied to it by the driver?

Your logic would apply to IOMMUs as well.  For your mode, there are
specific encrypted memory regions that driver has access to but device
does not. that seems to violate the constraint.


> >> And from the device's point of view they're
> >> indistinguishable. It can't tell one guest that is using swiotlb from
> >> one that isn't. And that implies that secure guest vs regular guest
> >> isn't a virtio interface issue, it's "guest internal affairs". So
> >> there's no reason to reflect that in the feature flags.
> >
> > So don't. The way not to reflect that in the feature flags is
> > to set ACCESS_PLATFORM.  Then you say *I don't care let platform device*.
> >
> >
> > Without ACCESS_PLATFORM
> > virtio has a very specific opinion about the security of the
> > device, and that opinion is that device is part of the guest
> > supervisor security domain.
> 
> Sorry for being a bit dense, but not sure what "the device is part of
> the guest supervisor security domain" means. In powerpc-speak,
> "supervisor" is the operating system so perhaps that explains my
> confusion. Are you saying that without ACCESS_PLATFORM, the guest
> considers the host to be part of the guest operating system's security
> domain?

I think so. The spec says "device has same access as driver".

> If so, does that have any other implication besides "the host
> can access any address supplied to it by the driver"? If that is the
> case, perhaps the definition of ACCESS_PLATFORM needs to be amended to
> include that information because it's not part of the current
> definition.
> 
> >> That said, we still would like to arrive at a proper design for this
> >> rather than add yet another hack if we can avoid it. So here's another
> >> proposal: considering that the dma-direct code (in kernel/dma/direct.c)
> >> automatically uses swiotlb when necessary (thanks to Christoph's recent
> >> DMA work), would it be ok to replace virtio's own direct-memory code
> >> that is used in the !ACCESS_PLATFORM case with the dma-direct code? That
> >> way we'll get swiotlb even with !ACCESS_PLATFORM, and virtio will get a
> >> code cleanup (replace open-coded stuff with calls to existing
> >> infrastructure).
> >
> > Let's say I have some doubts that there's an API that
> > matches what virtio with its bag of legacy compatibility exactly.
> 
> Ok.
> 
> >> > But the name "sev_active" makes me scared because at least AMD guys who
> >> > were doing the sensible thing and setting ACCESS_PLATFORM
> >>
> >> My understanding is, AMD guest-platform knows in advance that their
> >> guest will run in secure mode and hence sets the flag at the time of VM
> >> instantiation. Unfortunately we don't have that luxury on our platforms.
> >
> > Well you do have that luxury. It looks like that there are existing
> > guests that already acknowledge ACCESS_PLATFORM and you are not happy
> > with how that path is slow. So you are trying to optimize for
> > them by clearing ACCESS_PLATFORM and then you have lost ability
> > to invoke DMA API.
> >
> > For example if there was another flag just like ACCESS_PLATFORM
> > just not yet used by anyone, you would be all fine using that right?
> 
> Yes, a new flag sounds like a great idea. What about the definition
> below?
> 
> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
> VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
> exception that the IOMMU is explicitly defined to be off or bypassed
> when accessing memory addresses supplied to 

Re: [PATCH 0/5 v6] Fix virtio-blk issue with SWIOTLB

2019-02-05 Thread Michael S. Tsirkin
On Fri, Feb 01, 2019 at 09:09:46AM +0100, Christoph Hellwig wrote:
> For some reason patch 5 didn't make it to my inbox, but assuming
> nothing has changed this whole series looks good to me now.

Could you send a formal series ack pls?



Re: [PATCH 0/5 v6] Fix virtio-blk issue with SWIOTLB

2019-02-05 Thread Michael S. Tsirkin
On Thu, Jan 31, 2019 at 05:33:58PM +0100, Joerg Roedel wrote:
> Hi,
> 
> here is the next version of this patch-set. Previous
> versions can be found here:
> 
>   V1: https://lore.kernel.org/lkml/20190110134433.15672-1-j...@8bytes.org/
> 
>   V2: https://lore.kernel.org/lkml/20190115132257.6426-1-j...@8bytes.org/
> 
>   V3: https://lore.kernel.org/lkml/20190123163049.24863-1-j...@8bytes.org/
> 
>   V4: https://lore.kernel.org/lkml/20190129084342.26030-1-j...@8bytes.org/
> 
>   V5: https://lore.kernel.org/lkml/20190130164007.26497-1-j...@8bytes.org/
> 
> The problem solved here is a limitation of the SWIOTLB implementation,
> which does not support allocations larger than 256kb.  When the
> virtio-blk driver tries to read/write a block larger than that, the
> allocation of the dma-handle fails and an IO error is reported.
> 
> Changes to v5 are:
> 
>   - Changed patch 3 to uninline dma_max_mapping_size()

And this led to problems reported by kbuild :(

BTW when you repost, can I ask you to pls include
the version in all patches? Both --subject-prefix
and -v flags to git format-patch will do this for you.


> Please review.
> 
> Thanks,
> 
>   Joerg
> 
> Joerg Roedel (5):
>   swiotlb: Introduce swiotlb_max_mapping_size()
>   swiotlb: Add is_swiotlb_active() function
>   dma: Introduce dma_max_mapping_size()
>   virtio: Introduce virtio_max_dma_size()
>   virtio-blk: Consider virtio_max_dma_size() for maximum segment size
> 
>  Documentation/DMA-API.txt|  8 
>  drivers/block/virtio_blk.c   | 10 ++
>  drivers/virtio/virtio_ring.c | 11 +++
>  include/linux/dma-mapping.h  |  8 
>  include/linux/swiotlb.h  | 11 +++
>  include/linux/virtio.h   |  2 ++
>  kernel/dma/direct.c  | 11 +++
>  kernel/dma/mapping.c | 14 ++
>  kernel/dma/swiotlb.c | 14 ++
>  9 files changed, 85 insertions(+), 4 deletions(-)
> 
> -- 
> 2.17.1
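To illustrate where the new helpers end up being used, a rough sketch of the
virtio-blk side (not the patches themselves; virtblk_apply_dma_limit() is a
made-up wrapper):

#include <linux/blkdev.h>
#include <linux/virtio.h>

/* Sketch: cap the request queue limits by what the DMA layer can map in one
 * go, so SWIOTLB's 256kb bounce limit no longer turns large requests into
 * -EIO. */
static void virtblk_apply_dma_limit(struct virtio_device *vdev,
                                    struct request_queue *q, u32 sg_elems)
{
        u32 max_size = virtio_max_dma_size(vdev); /* min(dma_max_mapping_size(), U32_MAX) */

        blk_queue_max_segments(q, sg_elems);
        blk_queue_max_segment_size(q, max_size);  /* previously a hard-coded -1U */
}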


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-02-05 Thread Michael S. Tsirkin
On Tue, Feb 05, 2019 at 08:24:07AM +0100, Christoph Hellwig wrote:
> On Mon, Feb 04, 2019 at 04:38:21PM -0500, Michael S. Tsirkin wrote:
> > It was designed to make, when set, as many guests as we can work
> > correctly, and it seems to be successful in doing exactly that.
> > 
> > Unfortunately there could be legacy guests that do work correctly but
> > become slow. Whether trying to somehow work around that
> > can paint us into a corner where things again don't
> > work for some people is a question worth discussing.
> 
> The other problem is that some qemu machines just throw passthrough
> devices and virtio devices on the same virtual PCI(e) bus, and have a
> common IOMMU setup for the whole bus / root port / domain.  I think
> this is completely bogus, but unfortunately it is out in the field.
> 
> Given that power is one of these examples I suspect that is what
> Thiago referes to.  But in this case the answer can't be that we
> pile on hack ontop of another, but instead introduce a new qemu
> machine that separates these clearly, and make that mandatory for
> the secure guest support.

That could be one approach, assuming one exists that guests
already support.

-- 
MST


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-02-04 Thread Michael S. Tsirkin
On Mon, Feb 04, 2019 at 04:15:41PM -0200, Thiago Jung Bauermann wrote:
> 
> Christoph Hellwig  writes:
> 
> > On Tue, Jan 29, 2019 at 09:36:08PM -0500, Michael S. Tsirkin wrote:
> >> This has been discussed ad nauseum. virtio is all about compatibility.
> >> Losing a couple of lines of code isn't worth breaking working setups.
> >> People that want "just use DMA API no tricks" now have the option.
> >> Setting a flag in a feature bit map is literally a single line
> >> of code in the hypervisor. So stop pushing for breaking working
> >> legacy setups and just fix it in the right place.
> >
> > I agree with the legacy aspect.  What I am missing is an extremely
> > strong wording that says you SHOULD always set this flag for new
> > hosts, including an explanation why.
> 
> My understanding of ACCESS_PLATFORM is that it means "this device will
> behave in all aspects like a regular device attached to this bus".


Not really. Look it up in the spec:

VIRTIO_F_ACCESS_PLATFORM(33) This feature indicates that the device can be
used on a platform where device access to data in memory is limited and/or
translated. E.g. this is the case if the device can be located behind an
IOMMU that translates bus addresses from the device into physical addresses
in memory, if the device can be limited to only access certain memory
addresses or if special commands such as a cache flush can be needed to
synchronise data in memory with the device. Whether accesses are actually
limited or translated is described by platform-specific means. If this
feature bit is set to 0, then the device has same access to memory addresses
supplied to it as the driver has. In particular, the device will always use
physical addresses matching addresses used by the driver (typically meaning
physical addresses used by the CPU) and not translated further, and can
access any address supplied to it by the driver. When clear, this overrides
any platform-specific description of whether device access is limited or
translated in any way, e.g. whether an IOMMU may be present.
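In driver terms, the bit decides whether the guest puts raw guest-physical
addresses into the ring or DMA addresses obtained from the DMA API. A minimal
sketch, assuming ring internals along the lines of a vq->use_dma_api flag and
a vring_dma_dev() helper:

/* Sketch: how the negotiated bit changes what address the device is given. */
static dma_addr_t map_buffer_for_device(struct vring_virtqueue *vq,
                                        struct scatterlist *sg,
                                        enum dma_data_direction dir)
{
        if (!vq->use_dma_api)                   /* ACCESS_PLATFORM not negotiated */
                return (dma_addr_t)sg_phys(sg); /* device sees guest-physical addresses */

        /* otherwise, translation and limits are the platform's business */
        return dma_map_page(vring_dma_dev(vq), sg_page(sg),
                            sg->offset, sg->length, dir);
}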



> Is that it? Therefore it should be set because it's the sane thing to do?

It's the sane thing to do unless you want the very specific thing that
having it clear means, which is to just have it be another CPU.

It was designed to make, when set, as many guests as we can work
correctly, and it seems to be successful in doing exactly that.

Unfortunately there could be legacy guests that do work correctly but
become slow. Whether trying to somehow work around that
can paint us into a corner where things again don't
work for some people is a question worth discussing.


> --
> Thiago Jung Bauermann
> IBM Linux Technology Center


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-02-04 Thread Michael S. Tsirkin
On Mon, Feb 04, 2019 at 04:14:20PM -0200, Thiago Jung Bauermann wrote:
> 
> Hello Michael,
> 
> Michael S. Tsirkin  writes:
> 
> > On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
> >>
> >> Fixing address of powerpc mailing list.
> >>
> >> Thiago Jung Bauermann  writes:
> >>
> >> > Hello,
> >> >
> >> > With Christoph's rework of the DMA API that recently landed, the patch
> >> > below is the only change needed in virtio to make it work in a POWER
> >> > secure guest under the ultravisor.
> >> >
> >> > The other change we need (making sure the device's dma_map_ops is NULL
> >> > so that the dma-direct/swiotlb code is used) can be made in
> >> > powerpc-specific code.
> >> >
> >> > Of course, I also have patches (soon to be posted as RFC) which hook up
> >> >  to the powerpc secure guest support code.
> >> >
> >> > What do you think?
> >> >
> >> > From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
> >> > From: Thiago Jung Bauermann 
> >> > Date: Thu, 24 Jan 2019 22:08:02 -0200
> >> > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
> >> >
> >> > The host can't access the guest memory when it's encrypted, so using
> >> > regular memory pages for the ring isn't an option. Go through the DMA API.
> >> >
> >> > Signed-off-by: Thiago Jung Bauermann 
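The kind of change being proposed would look roughly like the sketch below
(not the actual patch; mem_encrypt_active() stands in here for whatever
encrypted-memory test is eventually agreed on):

/* Sketch: when should the ring code take the DMA API path? */
static bool vring_use_dma_api(struct virtio_device *vdev)
{
        if (!virtio_has_iommu_quirk(vdev))
                return true;    /* VIRTIO_F_IOMMU_PLATFORM negotiated: always use DMA API */

        if (mem_encrypt_active())
                return true;    /* host can't read guest pages directly: bounce them */

        if (xen_domain())
                return true;    /* Xen grant mappings also need the DMA API */

        return false;           /* legacy assumption: device sees guest-physical addresses */
}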
> >
> > Well I think this will come back to bite us (witness xen which is now
> > reworking precisely this path - but at least they aren't to blame, xen
> > came before ACCESS_PLATFORM).
> >
> > I also still think the right thing would have been to set
> > ACCESS_PLATFORM for all systems where device can't access all memory.
> 
> I understand. The problem with that approach for us is that because we
> don't know which guests will become secure guests and which will remain
> regular guests, QEMU would need to offer ACCESS_PLATFORM to all guests.
> 
> And the problem with that is that for QEMU on POWER, having
> ACCESS_PLATFORM turned off means that it can bypass the IOMMU for the
> device (which makes sense considering that the name of the flag was
> IOMMU_PLATFORM). And we need that for regular guests to avoid
> performance degradation.

You don't, really: ACCESS_PLATFORM means just that, the platform decides.

> So while ACCESS_PLATFORM solves our problems for secure guests, we can't
> turn it on by default because we can't affect legacy systems. Doing so
> would penalize existing systems that can access all memory. They would
> all have to unnecessarily go through address translations, and take a
> performance hit.

So as step one, you just give the hypervisor admin an option to run legacy
systems faster by blocking secure mode. I don't see why that is
so terrible.

But as step two, assuming you use step one above to make legacy
guests go fast, maybe there is a point in detecting
such a hypervisor and doing something smarter with it.
By all means let's have a discussion around this, but that is no longer
"to make it work" as the commit log says; it's more a performance
optimization.


> The semantics of ACCESS_PLATFORM assume that the hypervisor/QEMU knows
> in advance - right when the VM is instantiated - that it will not have
> access to all guest memory.

Not quite. It just means that hypervisor can live with not having
access to all memory. If platform wants to give it access
to all memory that is quite all right.


> Unfortunately that assumption is subtly
> broken on our secure-platform. The hypervisor/QEMU realizes that the
> platform is going secure only *after the VM is instantiated*. It's the
> kernel running in the VM that determines that it wants to switch the
> platform to secure-mode.

ACCESS_PLATFORM is there so guests can detect legacy hypervisors,
which always assumed the device is just another CPU.

> Another way of looking at this issue which also explains our reluctance
> is that the only difference between a secure guest and a regular guest
> (at least regarding virtio) is that the former uses swiotlb while the
> latter doesn't.

But swiotlb is just one implementation. It's a guest internal thing. The
issue is that memory isn't host accessible.  Yes, Linux does not use that
info too much right now, but it already begins to seep out of the
abstraction.  For example, as you are doing data copies you should maybe
calculate the packet checksum just as well.  Not something DMA API will
let you know right now, but that's because any bounce buffer users so
far weren't terribly fast a

Re: [PATCH] dma: Uninline dma_max_mapping_size()

2019-01-31 Thread Michael S. Tsirkin
On Thu, Jan 31, 2019 at 03:37:23PM +0100, Christoph Hellwig wrote:
> On Thu, Jan 31, 2019 at 02:01:27PM +0100, Joerg Roedel wrote:
> > On Thu, Jan 31, 2019 at 11:41:29AM +0100, Christoph Hellwig wrote:
> > > Sorry for not noticing last time, but since 5.0 we keep all non-fast
> > > path DMA mapping interfaces out of line, so this should move to
> > > kernel/dma/mapping.c.
> > 
> > Okay, attached patch does that. It applies on-top of this patch-set.
> > 
> > Michael, feel free to either apply this on-top of the patch-set or merge
> > the diff into patch 3, whatever you prefer.
> 
> Looks good:
> 
> Reviewed-by: Christoph Hellwig 
> 
> I'd prefer it to be squashed if possible.

OK. Joerg, can you repost the series with this squashed
and all acks applied?
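For context, the uninlined helper being squashed in is shaped roughly like
this in kernel/dma/mapping.c (a sketch assuming the 5.0-era dma-direct
helpers):

#include <linux/dma-direct.h>
#include <linux/dma-mapping.h>

size_t dma_max_mapping_size(struct device *dev)
{
        const struct dma_map_ops *ops = get_dma_ops(dev);
        size_t size = SIZE_MAX;

        if (dma_is_direct(ops))
                size = dma_direct_max_mapping_size(dev);
        else if (ops && ops->max_mapping_size)
                size = ops->max_mapping_size(dev);

        return size;
}
EXPORT_SYMBOL_GPL(dma_max_mapping_size);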


