Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-04-11 Thread Jean-Philippe Brucker
On 23/03/18 08:27, Tian, Kevin wrote:
>>> The host kernel needs to have *some* MSI region in place before the
>>> guest can start configuring interrupts, otherwise it won't know what
>>> address to give to the underlying hardware. However, as soon as the host
>>> kernel has picked a region, host userspace needs to know that it can no
>>> longer use addresses in that region for DMA-able guest memory. It's a
>>> lot easier when the address is fixed in hardware and the host userspace
>>> will never be stupid enough to try and VFIO_IOMMU_DMA_MAP it, but in
>>> the
>>> more general case where MSI writes undergo IOMMU address translation
>>> so
>>> it's an arbitrary IOVA, this has the potential to conflict with stuff
>>> like guest memory hotplug.
>>>
>>> What we currently have is just the simplest option, with the host kernel
>>> just picking something up-front and pretending to host userspace that
>>> it's a fixed hardware address. There's certainly scope for it to be a
>>> bit more dynamic in the sense of adding an interface to let userspace
>>> move it around (before attaching any devices, at least), but I don't
>>> think it's feasible for the host kernel to second-guess userspace enough
>>> to make it entirely transparent like it is in the DMA API domain case.
>>>
>>> Of course, that's all assuming the host itself is using a virtio-iommu
>>> (e.g. in a nested virt or emulation scenario). When it's purely within a
>>> guest then an MSI reservation shouldn't matter so much, since the guest
>>> won't be anywhere near the real hardware configuration anyway.
>>>
>>> Robin.
>>
>> Curious since anyway we are defining a new iommu architecture
>> is it possible to avoid those ARM-specific burden completely?
>>
> 
> OK, after some study around those tricks below is my learning:
> 
> - MSI_IOVA window is used only on request (iommu_dma_get
> _msi_page), not meant to take effect on all architectures once 
> initialized. e.g. ARM GIC does it but not x86. So it is reasonable 
> for virtio-iommu driver to implement such capability;
> 
> - I thought whether hardware MSI doorbell can be always reported
> on virtio-iommu since it's newly defined. Looks there is a problem
> if underlying IOMMU is sw-managed MSI style - valid mapping is
> expected in all level of translations, meaning guest has to manage
> stage-1 mapping in nested configuration since stage-1 is owned
> by guest. 
> 
> Then virtio-iommu is naturally expected to report the same MSI 
> model as supported by underlying hardware. Below are some
> further thoughts along this route (use 'IOMMU' to represent the
> physical one and 'virtio-iommu' for virtual one):
> 
> 
> 
> In the scope of current virtio-iommu spec v.6, there is no nested
> consideration yet. Guest driver is expected to use MAP/UNMAP
> interface on assigned endpoints. In this case the MAP requests
> (IOVA->GPA) is caught and maintained within Qemu which then 
> further talks to VFIO to map IOVA->HPA in IOMMU.
> 
> Qemu can learn the MSI model of IOMMU from sysfs.
> 
> For hardware MSI doorbell (x86 and some ARM):
> * Host kernel reports to Qemu as IOMMU_RESV_MSI
> * Qemu report to guest as VIRTIO_IOMMU_RESV_MEM_T_MSI
> * Guest takes the range as IOMMU_RESV_MSI. reserved
> * Qemu MAP database has no mapping for the doorbell
> * Physical IOMMU page table has no mapping for the doorbell
> * MSI from passthrough device bypass IOMMU
> * MSI from emulated device bypass virtio-iommu
> 
> For software MSI doorbell (most ARM):
> * Host kernel reports to Qemu as IOMMU_RESV_SW_MSI
> * Qemu report to guest as VIRTIO_IOMMU_RESV_MEM_T_RESERVED
> * Guest takes the range as IOMMU_RESV_RESERVED
> * vGIC requests to map 'GPA of the virtual doorbell'
> * a map request (IOVA->GPA) sent on endpoint
> * Qemu maintains the mapping in MAP database
>   * but no VFIO_MAP request since it's purely virtual
> * GIC requests to map 'HPA of the physical doorbell'
>   * e.g. triggered by VFIO enable msi
> * IOMMU now includes a valid mapping (IOVA->HPA)
> * MSI from emulated device go through Qemu MAP
> database (IOVA->'GPA of virtual doorbell') and then hit vGIC
> * MSI from passthrough device go through IOMMU
> (IOVA->'HPA of physical doorbell') and then hit GIC
> 
> In this case, host doorbell is treated as reserved resource in
> guest side. Guest has its own sw-management for virtual
> doorbell which is only used for emulated device. two paths 
> are completely separated.
> 
> If above captures the right flow, current v0.6 spec is complete
> regarding to required function definition.

Yes I think this summarizes well the current state or SW/HW MSI

> Then comes nested case, with two level page tables (stage-1
> and stage-2) in IOMMU. stage-1 is for IOVA->GPA and stage-2
> is for GPA->HPA. VFIO map/unmap happens on stage-2, 
> while stage-1 is directly managed by guest (and bound to
> IOMMU which enables nested translation from IOVA->GPA
> ->HPA).
> 
> For hardware MSI, there is nothing special compared to
> 

Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-04-11 Thread Jean-Philippe Brucker
On 23/03/18 14:48, Robin Murphy wrote:
[..]
>> + * Virtio driver for the paravirtualized IOMMU
>> + *
>> + * Copyright (C) 2018 ARM Limited
>> + * Author: Jean-Philippe Brucker 
>> + *
>> + * SPDX-License-Identifier: GPL-2.0
> 
> This wants to be a // comment at the very top of the file (thankfully 
> the policy is now properly documented in-tree since 
> Documentation/process/license-rules.rst got merged)

Ok

[...]
>> +
>> +/*
>> + * viommu_del_mappings - remove mappings from the internal tree
>> + *
>> + * @vdomain: the domain
>> + * @iova: start of the range
>> + * @size: size of the range. A size of 0 corresponds to the entire address
>> + *  space.
>> + * @out_mapping: if not NULL, the first removed mapping is returned in 
>> there.
>> + *  This allows the caller to reuse the buffer for the unmap request. When
>> + *  the returned size is greater than zero, if a mapping is returned, the
>> + *  caller must free it.
> 
> This "free multiple mappings except maybe hand one of them off to the 
> caller" interface is really unintuitive. AFAICS it's only used by 
> viommu_unmap() to grab mapping->req, but that doesn't seem to care about 
> mapping itself, so I wonder whether it wouldn't make more sense to just 
> have a global kmem_cache of struct virtio_iommu_req_unmap for that and 
> avoid a lot of complexity...

Well it's a small complication for what I hoped would be a meanignful
performance difference, but more below.

>> +
>> +/* IOMMU API */
>> +
>> +static bool viommu_capable(enum iommu_cap cap)
>> +{
>> +return false;
>> +}
> 
> The .capable callback is optional, so it's only worth implementing once 
> you want it to do something beyond the default behaviour.
> 

Ah, right

[...]
>> +static size_t viommu_unmap(struct iommu_domain *domain, unsigned long iova,
>> +   size_t size)
>> +{
>> +int ret = 0;
>> +size_t unmapped;
>> +struct viommu_mapping *mapping = NULL;
>> +struct viommu_domain *vdomain = to_viommu_domain(domain);
>> +
>> +unmapped = viommu_del_mappings(vdomain, iova, size, );
>> +if (unmapped < size) {
>> +ret = -EINVAL;
>> +goto out_free;
>> +}
>> +
>> +/* Device already removed all mappings after detach. */
>> +if (!vdomain->endpoints)
>> +goto out_free;
>> +
>> +if (WARN_ON(!mapping))
>> +return 0;
>> +
>> +mapping->req.unmap = (struct virtio_iommu_req_unmap) {
>> +.head.type  = VIRTIO_IOMMU_T_UNMAP,
>> +.domain = cpu_to_le32(vdomain->id),
>> +.virt_start = cpu_to_le64(iova),
>> +.virt_end   = cpu_to_le64(iova + unmapped - 1),
>> +};
> 
> ...In fact, the kmem_cache idea might be moot since it looks like with a 
> bit of restructuring you could get away with just a single per-viommu 
> virtio_iommu_req_unmap structure; this lot could be passed around on the 
> stack until request_lock is taken, at which point it would be copied 
> into the 'real' DMA-able structure. The equivalent might apply to 
> viommu_map() too - now that I'm looking at it, it seems like it would 
> take pretty minimal effort to encapsulate the whole business cleanly in 
> viommu_send_req_sync(), which could do something like this instead of 
> going through viommu_send_reqs_sync():
> 
>   ...
>   spin_lock_irqsave(>request_lock, flags);
>   viommu_copy_req(viommu->dma_req, req);
>   ret = _viommu_send_reqs_sync(viommu, viommu->dma_req, 1, );
>   spin_unlock_irqrestore(>request_lock, flags);
>   ...

I'll have to come back to that sorry, still conducting some experiments
with map/unmap.

I'd rather avoid introducing a single dma_req per viommu, because I'd
like to move to the iotlb_range_add/iotlb_sync interface as soon as
possible, and the request logic changes a lot when multiple threads are
susceptible to interleave map/unmap requests.

I ran some tests, and adding a kmem_cache (or simply using kmemdup, it
doesn't make a noticeable difference at our scale) reduces netperf
stream/maerts throughput by 1.1%/1.4% (+/- 0.5% over 30 tests). That's
for a virtio-net device (1 tx/rx vq), and with a vfio device the
difference isn't measurable. At this point I'm not fussy about such
small difference, so don't mind simplifying viommu_del_mapping.

[...]
>> +/*
>> + * Last step creates a default domain and attaches to it. Everything
>> + * must be ready.
>> + */
>> +group = iommu_group_get_for_dev(dev);
>> +if (!IS_ERR(group))
>> +iommu_group_put(group);
> 
> Since you create the sysfs IOMMU device, maybe also create the links for 
> the masters?

Ok

>> +
>> +return PTR_ERR_OR_ZERO(group);
>> +}
>> +
>> +static void viommu_remove_device(struct device *dev)
>> +{
> 
> You need to remove dev from its group, too (basically, .remove_device 
> should always undo everything .add_device did)
> 
> It would also be good practice to verify that 

Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-03-23 Thread Robin Murphy

On 14/02/18 14:53, Jean-Philippe Brucker wrote:

The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
requests such as map/unmap over virtio-mmio transport without emulating
page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
requests.

The bulk of the code transforms calls coming from the IOMMU API into
corresponding virtio requests. Mappings are kept in an interval tree
instead of page tables.

Signed-off-by: Jean-Philippe Brucker 
---
  MAINTAINERS   |   6 +
  drivers/iommu/Kconfig |  11 +
  drivers/iommu/Makefile|   1 +
  drivers/iommu/virtio-iommu.c  | 960 ++
  include/uapi/linux/virtio_ids.h   |   1 +
  include/uapi/linux/virtio_iommu.h | 116 +
  6 files changed, 1095 insertions(+)
  create mode 100644 drivers/iommu/virtio-iommu.c
  create mode 100644 include/uapi/linux/virtio_iommu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bdc260e36b7..2a181924d420 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14818,6 +14818,12 @@ S: Maintained
  F:drivers/virtio/virtio_input.c
  F:include/uapi/linux/virtio_input.h
  
+VIRTIO IOMMU DRIVER

+M: Jean-Philippe Brucker 
+S: Maintained
+F: drivers/iommu/virtio-iommu.c
+F: include/uapi/linux/virtio_iommu.h
+
  VIRTUAL BOX GUEST DEVICE DRIVER
  M:Hans de Goede 
  M:Arnd Bergmann 
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f3a21343e636..1ea0ec74524f 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -381,4 +381,15 @@ config QCOM_IOMMU
help
  Support for IOMMU on certain Qualcomm SoCs.
  
+config VIRTIO_IOMMU

+   bool "Virtio IOMMU driver"
+   depends on VIRTIO_MMIO
+   select IOMMU_API
+   select INTERVAL_TREE
+   select ARM_DMA_USE_IOMMU if ARM
+   help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
  endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1fb695854809..9c68be1365e1 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index ..a9c9245e8ba2
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,960 @@
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * Copyright (C) 2018 ARM Limited
+ * Author: Jean-Philippe Brucker 
+ *
+ * SPDX-License-Identifier: GPL-2.0


This wants to be a // comment at the very top of the file (thankfully 
the policy is now properly documented in-tree since 
Documentation/process/license-rules.rst got merged)



+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define MSI_IOVA_BASE  0x800
+#define MSI_IOVA_LENGTH0x10
+
+struct viommu_dev {
+   struct iommu_device iommu;
+   struct device   *dev;
+   struct virtio_device*vdev;
+
+   struct ida  domain_ids;
+
+   struct virtqueue*vq;
+   /* Serialize anything touching the request queue */
+   spinlock_t  request_lock;
+
+   /* Device configuration */
+   struct iommu_domain_geometrygeometry;
+   u64 pgsize_bitmap;
+   u8  domain_bits;
+};
+
+struct viommu_mapping {
+   phys_addr_t paddr;
+   struct interval_tree_node   iova;
+   union {
+   struct virtio_iommu_req_map map;
+   struct virtio_iommu_req_unmap unmap;
+   } req;
+};
+
+struct viommu_domain {
+   struct iommu_domain domain;
+   struct viommu_dev   *viommu;
+   struct mutexmutex;
+   unsigned intid;
+
+   spinlock_t  mappings_lock;
+   struct rb_root_cached   mappings;
+
+   /* Number of endpoints attached to this domain */
+   unsigned long   endpoints;
+};
+
+struct viommu_endpoint {
+   struct viommu_dev   *viommu;
+   struct viommu_domain*vdomain;
+};
+
+struct viommu_request {
+   struct scatterlist  top;
+   struct scatterlist  bottom;
+
+   int 

RE: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-03-23 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Thursday, March 22, 2018 6:06 PM
> 
> > From: Robin Murphy [mailto:robin.mur...@arm.com]
> > Sent: Wednesday, March 21, 2018 10:24 PM
> >
> > On 21/03/18 13:14, Jean-Philippe Brucker wrote:
> > > On 21/03/18 06:43, Tian, Kevin wrote:
> > > [...]
> > >>> +
> > >>> +#include 
> > >>> +
> > >>> +#define MSI_IOVA_BASE  0x800
> > >>> +#define MSI_IOVA_LENGTH0x10
> > >>
> > >> this is ARM specific, and according to virtio-iommu spec isn't it
> > >> better probed on the endpoint instead of hard-coding here?
> > >
> > > These values are arbitrary, not really ARM-specific even if ARM is the
> > > only user yet: we're just reserving a random IOVA region for mapping
> > MSIs.
> > > It is hard-coded because of the way iommu-dma.c works, but I don't
> > quite
> > > remember why that allocation isn't dynamic.
> >
> > The host kernel needs to have *some* MSI region in place before the
> > guest can start configuring interrupts, otherwise it won't know what
> > address to give to the underlying hardware. However, as soon as the host
> > kernel has picked a region, host userspace needs to know that it can no
> > longer use addresses in that region for DMA-able guest memory. It's a
> > lot easier when the address is fixed in hardware and the host userspace
> > will never be stupid enough to try and VFIO_IOMMU_DMA_MAP it, but in
> > the
> > more general case where MSI writes undergo IOMMU address translation
> > so
> > it's an arbitrary IOVA, this has the potential to conflict with stuff
> > like guest memory hotplug.
> >
> > What we currently have is just the simplest option, with the host kernel
> > just picking something up-front and pretending to host userspace that
> > it's a fixed hardware address. There's certainly scope for it to be a
> > bit more dynamic in the sense of adding an interface to let userspace
> > move it around (before attaching any devices, at least), but I don't
> > think it's feasible for the host kernel to second-guess userspace enough
> > to make it entirely transparent like it is in the DMA API domain case.
> >
> > Of course, that's all assuming the host itself is using a virtio-iommu
> > (e.g. in a nested virt or emulation scenario). When it's purely within a
> > guest then an MSI reservation shouldn't matter so much, since the guest
> > won't be anywhere near the real hardware configuration anyway.
> >
> > Robin.
> 
> Curious since anyway we are defining a new iommu architecture
> is it possible to avoid those ARM-specific burden completely?
> 

OK, after some study around those tricks below is my learning:

- MSI_IOVA window is used only on request (iommu_dma_get
_msi_page), not meant to take effect on all architectures once 
initialized. e.g. ARM GIC does it but not x86. So it is reasonable 
for virtio-iommu driver to implement such capability;

- I thought whether hardware MSI doorbell can be always reported
on virtio-iommu since it's newly defined. Looks there is a problem
if underlying IOMMU is sw-managed MSI style - valid mapping is
expected in all level of translations, meaning guest has to manage
stage-1 mapping in nested configuration since stage-1 is owned
by guest. 

Then virtio-iommu is naturally expected to report the same MSI 
model as supported by underlying hardware. Below are some
further thoughts along this route (use 'IOMMU' to represent the
physical one and 'virtio-iommu' for virtual one):



In the scope of current virtio-iommu spec v.6, there is no nested
consideration yet. Guest driver is expected to use MAP/UNMAP
interface on assigned endpoints. In this case the MAP requests
(IOVA->GPA) is caught and maintained within Qemu which then 
further talks to VFIO to map IOVA->HPA in IOMMU.

Qemu can learn the MSI model of IOMMU from sysfs.

For hardware MSI doorbell (x86 and some ARM):
* Host kernel reports to Qemu as IOMMU_RESV_MSI
* Qemu report to guest as VIRTIO_IOMMU_RESV_MEM_T_MSI
* Guest takes the range as IOMMU_RESV_MSI. reserved
* Qemu MAP database has no mapping for the doorbell
* Physical IOMMU page table has no mapping for the doorbell
* MSI from passthrough device bypass IOMMU
* MSI from emulated device bypass virtio-iommu

For software MSI doorbell (most ARM):
* Host kernel reports to Qemu as IOMMU_RESV_SW_MSI
* Qemu report to guest as VIRTIO_IOMMU_RESV_MEM_T_RESERVED
* Guest takes the range as IOMMU_RESV_RESERVED
* vGIC requests to map 'GPA of the virtual doorbell'
* a map request (IOVA->GPA) sent on endpoint
* Qemu maintains the mapping in MAP database
* but no VFIO_MAP request since it's purely virtual
* GIC requests to map 'HPA of the physical doorbell'
* e.g. triggered by VFIO enable msi
* IOMMU now includes a valid mapping (IOVA->HPA)
* MSI from emulated device go through Qemu MAP
database (IOVA->'GPA of virtual doorbell') and then hit vGIC
* MSI from passthrough device go through IOMMU
(IOVA->'HPA of physical doorbell') and then hit GIC

In this case, host 

RE: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-03-22 Thread Tian, Kevin
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Wednesday, March 21, 2018 10:24 PM
> 
> On 21/03/18 13:14, Jean-Philippe Brucker wrote:
> > On 21/03/18 06:43, Tian, Kevin wrote:
> > [...]
> >>> +
> >>> +#include 
> >>> +
> >>> +#define MSI_IOVA_BASE0x800
> >>> +#define MSI_IOVA_LENGTH  0x10
> >>
> >> this is ARM specific, and according to virtio-iommu spec isn't it
> >> better probed on the endpoint instead of hard-coding here?
> >
> > These values are arbitrary, not really ARM-specific even if ARM is the
> > only user yet: we're just reserving a random IOVA region for mapping
> MSIs.
> > It is hard-coded because of the way iommu-dma.c works, but I don't
> quite
> > remember why that allocation isn't dynamic.
> 
> The host kernel needs to have *some* MSI region in place before the
> guest can start configuring interrupts, otherwise it won't know what
> address to give to the underlying hardware. However, as soon as the host
> kernel has picked a region, host userspace needs to know that it can no
> longer use addresses in that region for DMA-able guest memory. It's a
> lot easier when the address is fixed in hardware and the host userspace
> will never be stupid enough to try and VFIO_IOMMU_DMA_MAP it, but in
> the
> more general case where MSI writes undergo IOMMU address translation
> so
> it's an arbitrary IOVA, this has the potential to conflict with stuff
> like guest memory hotplug.
> 
> What we currently have is just the simplest option, with the host kernel
> just picking something up-front and pretending to host userspace that
> it's a fixed hardware address. There's certainly scope for it to be a
> bit more dynamic in the sense of adding an interface to let userspace
> move it around (before attaching any devices, at least), but I don't
> think it's feasible for the host kernel to second-guess userspace enough
> to make it entirely transparent like it is in the DMA API domain case.
> 
> Of course, that's all assuming the host itself is using a virtio-iommu
> (e.g. in a nested virt or emulation scenario). When it's purely within a
> guest then an MSI reservation shouldn't matter so much, since the guest
> won't be anywhere near the real hardware configuration anyway.
> 
> Robin.

Curious since anyway we are defining a new iommu architecture
is it possible to avoid those ARM-specific burden completely? 

Thanks
Kevin

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-03-21 Thread Robin Murphy

On 21/03/18 13:14, Jean-Philippe Brucker wrote:

On 21/03/18 06:43, Tian, Kevin wrote:
[...]

+
+#include 
+
+#define MSI_IOVA_BASE  0x800
+#define MSI_IOVA_LENGTH0x10


this is ARM specific, and according to virtio-iommu spec isn't it
better probed on the endpoint instead of hard-coding here?


These values are arbitrary, not really ARM-specific even if ARM is the
only user yet: we're just reserving a random IOVA region for mapping MSIs.
It is hard-coded because of the way iommu-dma.c works, but I don't quite
remember why that allocation isn't dynamic.


The host kernel needs to have *some* MSI region in place before the 
guest can start configuring interrupts, otherwise it won't know what 
address to give to the underlying hardware. However, as soon as the host 
kernel has picked a region, host userspace needs to know that it can no 
longer use addresses in that region for DMA-able guest memory. It's a 
lot easier when the address is fixed in hardware and the host userspace 
will never be stupid enough to try and VFIO_IOMMU_DMA_MAP it, but in the 
more general case where MSI writes undergo IOMMU address translation so 
it's an arbitrary IOVA, this has the potential to conflict with stuff 
like guest memory hotplug.


What we currently have is just the simplest option, with the host kernel 
just picking something up-front and pretending to host userspace that 
it's a fixed hardware address. There's certainly scope for it to be a 
bit more dynamic in the sense of adding an interface to let userspace 
move it around (before attaching any devices, at least), but I don't 
think it's feasible for the host kernel to second-guess userspace enough 
to make it entirely transparent like it is in the DMA API domain case.


Of course, that's all assuming the host itself is using a virtio-iommu 
(e.g. in a nested virt or emulation scenario). When it's purely within a 
guest then an MSI reservation shouldn't matter so much, since the guest 
won't be anywhere near the real hardware configuration anyway.


Robin.


As said on the v0.6 spec thread, I'm not sure allocating the IOVA range in
the host is preferable. With nested translation the guest has to map it
anyway, and I believe dealing with IOVA allocation should be left to the
guest when possible.

Thanks,
Jean
___
iommu mailing list
io...@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-03-21 Thread Jean-Philippe Brucker
On 21/03/18 06:43, Tian, Kevin wrote:
[...]
>> +
>> +#include 
>> +
>> +#define MSI_IOVA_BASE   0x800
>> +#define MSI_IOVA_LENGTH 0x10
> 
> this is ARM specific, and according to virtio-iommu spec isn't it
> better probed on the endpoint instead of hard-coding here?

These values are arbitrary, not really ARM-specific even if ARM is the
only user yet: we're just reserving a random IOVA region for mapping MSIs.
It is hard-coded because of the way iommu-dma.c works, but I don't quite
remember why that allocation isn't dynamic.

As said on the v0.6 spec thread, I'm not sure allocating the IOVA range in
the host is preferable. With nested translation the guest has to map it
anyway, and I believe dealing with IOVA allocation should be left to the
guest when possible.

Thanks,
Jean
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


RE: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-03-21 Thread Tian, Kevin
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Wednesday, February 14, 2018 10:54 PM
> 
> The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
> requests such as map/unmap over virtio-mmio transport without
> emulating
> page tables. This implementation handles ATTACH, DETACH, MAP and
> UNMAP
> requests.
> 
> The bulk of the code transforms calls coming from the IOMMU API into
> corresponding virtio requests. Mappings are kept in an interval tree
> instead of page tables.
> 
> Signed-off-by: Jean-Philippe Brucker 

[...]
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> new file mode 100644
> index ..a9c9245e8ba2
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -0,0 +1,960 @@
> +/*
> + * Virtio driver for the paravirtualized IOMMU
> + *
> + * Copyright (C) 2018 ARM Limited
> + * Author: Jean-Philippe Brucker 
> + *
> + * SPDX-License-Identifier: GPL-2.0
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define MSI_IOVA_BASE0x800
> +#define MSI_IOVA_LENGTH  0x10

this is ARM specific, and according to virtio-iommu spec isn't it
better probed on the endpoint instead of hard-coding here?

Thanks
Kevin
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-27 Thread Michael S. Tsirkin
On Thu, Feb 22, 2018 at 11:04:30AM +, Jean-Philippe Brucker wrote:
> On 21/02/18 20:12, kbuild test robot wrote:
> [...]
> > config: arm64-allmodconfig (attached as .config)
> [...]
> >aarch64-linux-gnu-ld: arch/arm64/kernel/head.o: relocation 
> > R_AARCH64_ABS32 against `_kernel_offset_le_lo32' can not be used when 
> > making a shared object
> >arch/arm64/kernel/head.o: In function `kimage_vaddr':
> >(.idmap.text+0x0): dangerous relocation: unsupported relocation
> 
> Is this related?
> 
> >arch/arm64/kernel/head.o: In function `__primary_switch':
> >(.idmap.text+0x340): dangerous relocation: unsupported relocation
> >(.idmap.text+0x348): dangerous relocation: unsupported relocation
> >drivers/iommu/virtio-iommu.o: In function `viommu_probe':
> >virtio-iommu.c:(.text+0xbdc): undefined reference to 
> > `virtio_check_driver_offered_feature'
> >virtio-iommu.c:(.text+0xcfc): undefined reference to 
> > `virtio_check_driver_offered_feature'
> >virtio-iommu.c:(.text+0xe10): undefined reference to 
> > `virtio_check_driver_offered_feature'
> >drivers/iommu/virtio-iommu.o: In function `viommu_send_reqs_sync':
> >virtio-iommu.c:(.text+0x130c): undefined reference to `virtqueue_add_sgs'
> >virtio-iommu.c:(.text+0x1398): undefined reference to `virtqueue_kick'
> >virtio-iommu.c:(.text+0x14d4): undefined reference to `virtqueue_get_buf'
> >drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_init':
> >virtio-iommu.c:(.init.text+0x1c): undefined reference to 
> > `register_virtio_driver'
> >drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_exit':
> >>> virtio-iommu.c:(.exit.text+0x14): undefined reference to 
> >>> `unregister_virtio_driver'
> 
> Right. At the moment CONFIG_VIRTIO_IOMMU is a bool instead of tristate,
> because the IOMMU subsystem isn't entirely ready to have IOMMU drivers
> built as modules. In addition to exporting symbols it would also needs to
> hold off probing endpoints behind the IOMMU until the IOMMU driver is
> loaded. At the moment (I think) it gives up once userspace is reached (see
> of_iommu_driver_present).
> 
> The above report is due to VIRTIO=m and VIRTIO_IOMMU=y. To solve it we could:
> 
> a) Allow VIRTIO_IOMMU to be built as module by exporting a dozen IOMMU
> symbols. It would be a lie. The driver wouldn't be usable because of the
> probe issue discussed above, but it would build.
> 
> b) Actually make any IOMMU driver work as module. Whilst it would
> certainly be a nice feature, it's a bigger problem and I don't think it
> has its place in this series.
> 
> c) Make VIRTIO_IOMMU depend on VIRTIO_MMIO=y instead of VIRTIO_MMIO, which
> I think is the sanest for now (and does work), even though distro kernels
> probably all have VIRTIO=m.
> 
> I prefer doing c) now and experiment with b) once I got some time.
> 
> Thanks,
> Jean

It kind of means it's a toy for now though. So fine as long
as it's out of tree.

-- 
MST
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-22 Thread Jean-Philippe Brucker
On 21/02/18 20:12, kbuild test robot wrote:
[...]
> config: arm64-allmodconfig (attached as .config)
[...]
>aarch64-linux-gnu-ld: arch/arm64/kernel/head.o: relocation R_AARCH64_ABS32 
> against `_kernel_offset_le_lo32' can not be used when making a shared object
>arch/arm64/kernel/head.o: In function `kimage_vaddr':
>(.idmap.text+0x0): dangerous relocation: unsupported relocation

Is this related?

>arch/arm64/kernel/head.o: In function `__primary_switch':
>(.idmap.text+0x340): dangerous relocation: unsupported relocation
>(.idmap.text+0x348): dangerous relocation: unsupported relocation
>drivers/iommu/virtio-iommu.o: In function `viommu_probe':
>virtio-iommu.c:(.text+0xbdc): undefined reference to 
> `virtio_check_driver_offered_feature'
>virtio-iommu.c:(.text+0xcfc): undefined reference to 
> `virtio_check_driver_offered_feature'
>virtio-iommu.c:(.text+0xe10): undefined reference to 
> `virtio_check_driver_offered_feature'
>drivers/iommu/virtio-iommu.o: In function `viommu_send_reqs_sync':
>virtio-iommu.c:(.text+0x130c): undefined reference to `virtqueue_add_sgs'
>virtio-iommu.c:(.text+0x1398): undefined reference to `virtqueue_kick'
>virtio-iommu.c:(.text+0x14d4): undefined reference to `virtqueue_get_buf'
>drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_init':
>virtio-iommu.c:(.init.text+0x1c): undefined reference to 
> `register_virtio_driver'
>drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_exit':
>>> virtio-iommu.c:(.exit.text+0x14): undefined reference to 
>>> `unregister_virtio_driver'

Right. At the moment CONFIG_VIRTIO_IOMMU is a bool instead of tristate,
because the IOMMU subsystem isn't entirely ready to have IOMMU drivers
built as modules. In addition to exporting symbols it would also needs to
hold off probing endpoints behind the IOMMU until the IOMMU driver is
loaded. At the moment (I think) it gives up once userspace is reached (see
of_iommu_driver_present).

The above report is due to VIRTIO=m and VIRTIO_IOMMU=y. To solve it we could:

a) Allow VIRTIO_IOMMU to be built as module by exporting a dozen IOMMU
symbols. It would be a lie. The driver wouldn't be usable because of the
probe issue discussed above, but it would build.

b) Actually make any IOMMU driver work as module. Whilst it would
certainly be a nice feature, it's a bigger problem and I don't think it
has its place in this series.

c) Make VIRTIO_IOMMU depend on VIRTIO_MMIO=y instead of VIRTIO_MMIO, which
I think is the sanest for now (and does work), even though distro kernels
probably all have VIRTIO=m.

I prefer doing c) now and experiment with b) once I got some time.

Thanks,
Jean
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-21 Thread kbuild test robot
Hi Jean-Philippe,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc2 next-20180221]
[cannot apply to iommu/next]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jean-Philippe-Brucker/Add-virtio-iommu-driver/20180217-075417
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   drivers/iommu/virtio-iommu.o: In function `viommu_send_reqs_sync':
>> virtio-iommu.c:(.text+0xa82): undefined reference to `virtqueue_add_sgs'
>> virtio-iommu.c:(.text+0xb52): undefined reference to `virtqueue_kick'
>> virtio-iommu.c:(.text+0xd82): undefined reference to `virtqueue_get_buf'
   drivers/iommu/virtio-iommu.o: In function `viommu_probe':
>> virtio-iommu.c:(.text+0x23f2): undefined reference to 
>> `virtio_check_driver_offered_feature'
   virtio-iommu.c:(.text+0x2572): undefined reference to 
`virtio_check_driver_offered_feature'
   virtio-iommu.c:(.text+0x26f2): undefined reference to 
`virtio_check_driver_offered_feature'
   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_init':
>> virtio-iommu.c:(.init.text+0x22): undefined reference to 
>> `register_virtio_driver'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-21 Thread kbuild test robot
Hi Jean-Philippe,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc2 next-20180221]
[cannot apply to iommu/next]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jean-Philippe-Brucker/Add-virtio-iommu-driver/20180217-075417
config: arm64-allmodconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

   aarch64-linux-gnu-ld: arch/arm64/kernel/head.o: relocation R_AARCH64_ABS32 
against `_kernel_offset_le_lo32' can not be used when making a shared object
   arch/arm64/kernel/head.o: In function `kimage_vaddr':
   (.idmap.text+0x0): dangerous relocation: unsupported relocation
   arch/arm64/kernel/head.o: In function `__primary_switch':
   (.idmap.text+0x340): dangerous relocation: unsupported relocation
   (.idmap.text+0x348): dangerous relocation: unsupported relocation
   drivers/iommu/virtio-iommu.o: In function `viommu_probe':
   virtio-iommu.c:(.text+0xbdc): undefined reference to 
`virtio_check_driver_offered_feature'
   virtio-iommu.c:(.text+0xcfc): undefined reference to 
`virtio_check_driver_offered_feature'
   virtio-iommu.c:(.text+0xe10): undefined reference to 
`virtio_check_driver_offered_feature'
   drivers/iommu/virtio-iommu.o: In function `viommu_send_reqs_sync':
   virtio-iommu.c:(.text+0x130c): undefined reference to `virtqueue_add_sgs'
   virtio-iommu.c:(.text+0x1398): undefined reference to `virtqueue_kick'
   virtio-iommu.c:(.text+0x14d4): undefined reference to `virtqueue_get_buf'
   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_init':
   virtio-iommu.c:(.init.text+0x1c): undefined reference to 
`register_virtio_driver'
   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_exit':
>> virtio-iommu.c:(.exit.text+0x14): undefined reference to 
>> `unregister_virtio_driver'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-20 Thread Jean-Philippe Brucker
On 19/02/18 12:23, Tomasz Nowicki wrote:
[...]
>> +static int viommu_receive_resp(struct viommu_dev *viommu, int nr_sent,
>> +   struct list_head *sent)
>> +{
>> +
>> +unsigned int len;
>> +int nr_received = 0;
>> +struct viommu_request *req, *pending;
>> +
>> +pending = list_first_entry_or_null(sent, struct viommu_request, list);
>> +if (WARN_ON(!pending))
>> +return 0;
>> +
>> +while ((req = virtqueue_get_buf(viommu->vq, )) != NULL) {
>> +if (req != pending) {
>> +dev_warn(viommu->dev, "discarding stale request\n");
>> +continue;
>> +}
>> +
>> +pending->written = len;
>> +
>> +if (++nr_received == nr_sent) {
>> +WARN_ON(!list_is_last(>list, sent));
>> +break;
>> +} else if (WARN_ON(list_is_last(>list, sent))) {
>> +break;
>> +}
>> +
>> +pending = list_next_entry(pending, list);
> 
> We should remove current element from the pending list. There is no 
> guarantee we get response for each while loop so when we get back for 
> more the _viommu_send_reqs_sync() caller will pass pointer to the out of 
> date head next time.

Right, I'll fix this

Thanks,
Jean
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-19 Thread Tomasz Nowicki

Hi Jean,

On 14.02.2018 15:53, Jean-Philippe Brucker wrote:

The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
requests such as map/unmap over virtio-mmio transport without emulating
page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
requests.

The bulk of the code transforms calls coming from the IOMMU API into
corresponding virtio requests. Mappings are kept in an interval tree
instead of page tables.

Signed-off-by: Jean-Philippe Brucker 
---
  MAINTAINERS   |   6 +
  drivers/iommu/Kconfig |  11 +
  drivers/iommu/Makefile|   1 +
  drivers/iommu/virtio-iommu.c  | 960 ++
  include/uapi/linux/virtio_ids.h   |   1 +
  include/uapi/linux/virtio_iommu.h | 116 +
  6 files changed, 1095 insertions(+)
  create mode 100644 drivers/iommu/virtio-iommu.c
  create mode 100644 include/uapi/linux/virtio_iommu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bdc260e36b7..2a181924d420 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14818,6 +14818,12 @@ S: Maintained
  F:drivers/virtio/virtio_input.c
  F:include/uapi/linux/virtio_input.h
  
+VIRTIO IOMMU DRIVER

+M: Jean-Philippe Brucker 
+S: Maintained
+F: drivers/iommu/virtio-iommu.c
+F: include/uapi/linux/virtio_iommu.h
+
  VIRTUAL BOX GUEST DEVICE DRIVER
  M:Hans de Goede 
  M:Arnd Bergmann 
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f3a21343e636..1ea0ec74524f 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -381,4 +381,15 @@ config QCOM_IOMMU
help
  Support for IOMMU on certain Qualcomm SoCs.
  
+config VIRTIO_IOMMU

+   bool "Virtio IOMMU driver"
+   depends on VIRTIO_MMIO
+   select IOMMU_API
+   select INTERVAL_TREE
+   select ARM_DMA_USE_IOMMU if ARM
+   help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
  endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1fb695854809..9c68be1365e1 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index ..a9c9245e8ba2
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,960 @@
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * Copyright (C) 2018 ARM Limited
+ * Author: Jean-Philippe Brucker 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define MSI_IOVA_BASE  0x800
+#define MSI_IOVA_LENGTH0x10
+
+struct viommu_dev {
+   struct iommu_device iommu;
+   struct device   *dev;
+   struct virtio_device*vdev;
+
+   struct ida  domain_ids;
+
+   struct virtqueue*vq;
+   /* Serialize anything touching the request queue */
+   spinlock_t  request_lock;
+
+   /* Device configuration */
+   struct iommu_domain_geometrygeometry;
+   u64 pgsize_bitmap;
+   u8  domain_bits;
+};
+
+struct viommu_mapping {
+   phys_addr_t paddr;
+   struct interval_tree_node   iova;
+   union {
+   struct virtio_iommu_req_map map;
+   struct virtio_iommu_req_unmap unmap;
+   } req;
+};
+
+struct viommu_domain {
+   struct iommu_domain domain;
+   struct viommu_dev   *viommu;
+   struct mutexmutex;
+   unsigned intid;
+
+   spinlock_t  mappings_lock;
+   struct rb_root_cached   mappings;
+
+   /* Number of endpoints attached to this domain */
+   unsigned long   endpoints;
+};
+
+struct viommu_endpoint {
+   struct viommu_dev   *viommu;
+   struct viommu_domain*vdomain;
+};
+
+struct viommu_request {
+   struct scatterlist  top;
+   struct scatterlist  bottom;
+
+   int written;
+   struct list_headlist;
+};
+
+#define to_viommu_domain(domain)   \
+   container_of(domain, struct viommu_domain, domain)
+
+/* 

[PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-14 Thread Jean-Philippe Brucker
The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
requests such as map/unmap over virtio-mmio transport without emulating
page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
requests.

The bulk of the code transforms calls coming from the IOMMU API into
corresponding virtio requests. Mappings are kept in an interval tree
instead of page tables.

Signed-off-by: Jean-Philippe Brucker 
---
 MAINTAINERS   |   6 +
 drivers/iommu/Kconfig |  11 +
 drivers/iommu/Makefile|   1 +
 drivers/iommu/virtio-iommu.c  | 960 ++
 include/uapi/linux/virtio_ids.h   |   1 +
 include/uapi/linux/virtio_iommu.h | 116 +
 6 files changed, 1095 insertions(+)
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bdc260e36b7..2a181924d420 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14818,6 +14818,12 @@ S: Maintained
 F: drivers/virtio/virtio_input.c
 F: include/uapi/linux/virtio_input.h
 
+VIRTIO IOMMU DRIVER
+M: Jean-Philippe Brucker 
+S: Maintained
+F: drivers/iommu/virtio-iommu.c
+F: include/uapi/linux/virtio_iommu.h
+
 VIRTUAL BOX GUEST DEVICE DRIVER
 M: Hans de Goede 
 M: Arnd Bergmann 
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f3a21343e636..1ea0ec74524f 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -381,4 +381,15 @@ config QCOM_IOMMU
help
  Support for IOMMU on certain Qualcomm SoCs.
 
+config VIRTIO_IOMMU
+   bool "Virtio IOMMU driver"
+   depends on VIRTIO_MMIO
+   select IOMMU_API
+   select INTERVAL_TREE
+   select ARM_DMA_USE_IOMMU if ARM
+   help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1fb695854809..9c68be1365e1 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index ..a9c9245e8ba2
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,960 @@
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * Copyright (C) 2018 ARM Limited
+ * Author: Jean-Philippe Brucker 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define MSI_IOVA_BASE  0x800
+#define MSI_IOVA_LENGTH0x10
+
+struct viommu_dev {
+   struct iommu_device iommu;
+   struct device   *dev;
+   struct virtio_device*vdev;
+
+   struct ida  domain_ids;
+
+   struct virtqueue*vq;
+   /* Serialize anything touching the request queue */
+   spinlock_t  request_lock;
+
+   /* Device configuration */
+   struct iommu_domain_geometrygeometry;
+   u64 pgsize_bitmap;
+   u8  domain_bits;
+};
+
+struct viommu_mapping {
+   phys_addr_t paddr;
+   struct interval_tree_node   iova;
+   union {
+   struct virtio_iommu_req_map map;
+   struct virtio_iommu_req_unmap unmap;
+   } req;
+};
+
+struct viommu_domain {
+   struct iommu_domain domain;
+   struct viommu_dev   *viommu;
+   struct mutexmutex;
+   unsigned intid;
+
+   spinlock_t  mappings_lock;
+   struct rb_root_cached   mappings;
+
+   /* Number of endpoints attached to this domain */
+   unsigned long   endpoints;
+};
+
+struct viommu_endpoint {
+   struct viommu_dev   *viommu;
+   struct viommu_domain*vdomain;
+};
+
+struct viommu_request {
+   struct scatterlist  top;
+   struct scatterlist  bottom;
+
+   int written;
+   struct list_headlist;
+};
+
+#define to_viommu_domain(domain)   \
+   container_of(domain, struct viommu_domain, domain)
+
+/* Virtio transport */
+
+static int viommu_status_to_errno(u8 status)
+{
+   switch