Re: [PATCH v3 0/2] iommu/virtio: Enable IOMMU_CAP_DEFERRED_FLUSH

2023-11-20 Thread Jean-Philippe Brucker
Hi Niklas,

On Mon, Nov 20, 2023 at 03:51:55PM +0100, Niklas Schnelle wrote:
> Hi All,
> 
> Previously I used virtio-iommu as a non-s390x test vehicle[0] for the
> single queue flushing scheme introduced by my s390x DMA API conversion
> series[1]. For this I modified virtio-iommu to a) use .iotlb_sync_map
> and b) enable IOMMU_CAP_DEFERRED_FLUSH. It turned out that deferred
> flush and even just the introduction of ops->iotlb_sync_map yield
> performance uplift[2] even with per-CPU queues. So here is a small
> series of these two changes.
> 
> The code is also available on the b4/viommu-deferred-flush branch of my
> kernel.org git repository[3].
> 
> Note on testing: I tested this series on my AMD Ryzen 3900X workstation
> using QEMU 8.1.2, a pass-through NVMe and Intel 82599 NIC VFs. For the
> NVMe I saw an increase of about 10% in IOPS and 30% in read bandwidth
> compared with v6.7-rc2. One odd thing though is that QEMU seemed to make
> the entire guest resident/pinned once I passed through a PCI device.
> I seem to remember this wasn't the case with my last version but I'm not
> sure which QEMU version I used back then.

That's probably expected, now that boot-bypass is enabled by default: on
VM boot, endpoints are able to do DMA to the entire guest-physical address
space, until a virtio-iommu driver disables global bypass in the config
space (at which point the pinned memory is hopefully reclaimed by the
host). QEMU enables it by default to mimic other IOMMU implementations,
and to allow running firmware or an OS that doesn't support virtio-iommu. It
can be disabled with boot-bypass=off.
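
For reference, disabling global bypass from the guest driver is just a
one-byte config space write, roughly along these lines (untested sketch,
assuming the device offers VIRTIO_IOMMU_F_BYPASS_CONFIG):

#include <linux/virtio_config.h>
#include <uapi/linux/virtio_iommu.h>

/*
 * Illustrative only: clear the global bypass bit, so endpoints can no
 * longer DMA to the whole guest-physical address space and the host can
 * unpin the guest memory again.
 */
static void viommu_clear_global_bypass(struct virtio_device *vdev)
{
        if (!virtio_has_feature(vdev, VIRTIO_IOMMU_F_BYPASS_CONFIG))
                return;

        virtio_cwrite8(vdev, offsetof(struct virtio_iommu_config, bypass), 0);
}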

> @Jean-Philippe: I didn't include your R-b's as I changed back to the
> nr_endpoints check and this is like 30% of the patches.

Thank you for the patches. For the series:

Reviewed-by: Jean-Philippe Brucker 



Re: [PATCH][next] iommu/virtio: Add __counted_by for struct viommu_request and use struct_size()

2023-10-10 Thread Jean-Philippe Brucker
On Mon, Oct 09, 2023 at 12:24:27PM -0600, Gustavo A. R. Silva wrote:
> Prepare for the coming implementation by GCC and Clang of the __counted_by
> attribute. Flexible array members annotated with __counted_by can have
> their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
> array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
> functions).
> 
> While there, use struct_size() helper, instead of the open-coded
> version, to calculate the size for the allocation of the whole
> flexible structure, including of course, the flexible-array member.
> 
> This code was found with the help of Coccinelle, and audited and
> fixed manually.
> 
> Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Jean-Philippe Brucker 

> ---
>  drivers/iommu/virtio-iommu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 17dcd826f5c2..379ebe03efb6 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -85,7 +85,7 @@ struct viommu_request {
>   void            *writeback;
>   unsigned int    write_offset;
>   unsigned int    len;
> - char            buf[];
> + char            buf[] __counted_by(len);
>  };
>  
>  #define VIOMMU_FAULT_RESV_MASK          0xffffff00
> @@ -230,7 +230,7 @@ static int __viommu_add_req(struct viommu_dev *viommu, 
> void *buf, size_t len,
>   if (write_offset <= 0)
>   return -EINVAL;
>  
> - req = kzalloc(sizeof(*req) + len, GFP_ATOMIC);
> + req = kzalloc(struct_size(req, buf, len), GFP_ATOMIC);
>   if (!req)
>   return -ENOMEM;
>  
> -- 
> 2.34.1
> 
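
As a side note on the __counted_by() annotation above: with
CONFIG_UBSAN_BOUNDS / CONFIG_FORTIFY_SOURCE, accesses to the flexible
array are checked against the current value of the counter, so the
counter should be assigned before the array is used. A toy example
(not from the patch, names made up):

#include <linux/overflow.h>
#include <linux/slab.h>

struct demo_req {
        unsigned int len;
        char buf[] __counted_by(len);
};

static struct demo_req *demo_alloc(unsigned int len)
{
        /* struct_size() computes sizeof(*req) + len with overflow checking */
        struct demo_req *req = kzalloc(struct_size(req, buf, len), GFP_KERNEL);

        if (req)
                req->len = len; /* set the counter before touching buf[] */
        return req;
}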


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-22 Thread Jean-Philippe Brucker
On Tue, Sep 19, 2023 at 11:46:49AM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 19, 2023 at 09:15:19AM +0100, Jean-Philippe Brucker wrote:
> > On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:
> > > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > > > index 17dcd826f5c2..3649586f0e5c 100644
> > > > --- a/drivers/iommu/virtio-iommu.c
> > > > +++ b/drivers/iommu/virtio-iommu.c
> > > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev 
> > > > *viommu)
> > > > int ret;
> > > > unsigned long flags;
> > > > +   /*
> > > > +* .iotlb_sync_map and .flush_iotlb_all may be called before 
> > > > the viommu
> > > > +* is initialized e.g. via iommu_create_device_direct_mappings()
> > > > +*/
> > > > +   if (!viommu)
> > > > +   return 0;
> > > 
> > > Minor nit: I'd be inclined to make that check explicitly in the places 
> > > where
> > > it definitely is expected, rather than allowing *any* sync to silently do
> > > nothing if called incorrectly. Plus then they could use
> > > vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
> > > (it did take me a moment to figure out how we could get to .iotlb_sync_map
> > > with a NULL viommu without viommu_map_pages() blowing up first...)
> 
> This makes more sense to me
> 
> Ultimately this driver should reach a point where every iommu_domain
> always has a non-null domain->viommu because it will be set during
> alloc.
> 
> But it can still have nr_endpoints == 0, doesn't it make sense to
> avoid sync in this case?
> 
> (btw this driver is missing locking around vdomain->nr_endpoints)

Yes, that's on my list

> 
> > They're not strictly equivalent: this check works around a temporary issue
> > with the IOMMU core, which calls map/unmap before the domain is
> > finalized.
> 
> Where? The above points to iommu_create_device_direct_mappings() but
> it doesn't because the pgsize_bitmap == 0:

__iommu_domain_alloc() sets pgsize_bitmap in this case:

/*
 * If not already set, assume all sizes by default; the driver
 * may override this later
 */
if (!domain->pgsize_bitmap)
        domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;

Thanks,
Jean


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-22 Thread Jean-Philippe Brucker
On Tue, Sep 19, 2023 at 09:28:08AM +0100, Robin Murphy wrote:
> On 2023-09-19 09:15, Jean-Philippe Brucker wrote:
> > On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:
> > > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > > > index 17dcd826f5c2..3649586f0e5c 100644
> > > > --- a/drivers/iommu/virtio-iommu.c
> > > > +++ b/drivers/iommu/virtio-iommu.c
> > > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev 
> > > > *viommu)
> > > > int ret;
> > > > unsigned long flags;
> > > > +   /*
> > > > +* .iotlb_sync_map and .flush_iotlb_all may be called before 
> > > > the viommu
> > > > +* is initialized e.g. via iommu_create_device_direct_mappings()
> > > > +*/
> > > > +   if (!viommu)
> > > > +   return 0;
> > > 
> > > Minor nit: I'd be inclined to make that check explicitly in the places 
> > > where
> > > it definitely is expected, rather than allowing *any* sync to silently do
> > > nothing if called incorrectly. Plus then they could use
> > > vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
> > > (it did take me a moment to figure out how we could get to .iotlb_sync_map
> > > with a NULL viommu without viommu_map_pages() blowing up first...)
> > 
> > They're not strictly equivalent: this check works around a temporary issue
> > with the IOMMU core, which calls map/unmap before the domain is finalized.
> > Once we merge domain_alloc() and finalize(), then this check disappears,
> > but we still need to test nr_endpoints in map/unmap to handle detached
> > domains (and we still need to fix the synchronization of nr_endpoints
> > against attach/detach). That's why I preferred doing this on viommu and
> > keeping it in one place.
> 
> Fair enough - it just seems to me that in both cases it's a detached domain,
> so its previous history of whether it's ever been otherwise or not shouldn't
> matter. Even once viommu is initialised, does it really make sense to send
> sync commands for a mapping on a detached domain where we haven't actually
> sent any map/unmap commands?

If no requests were added by map/unmap, then viommu_sync_req() is
essentially a nop because virtio doesn't use sync commands (and
virtqueue_kick() only kicks the host when the queue's not empty, if I
remember correctly). It still does a bit of work so is less efficient than
a preliminary check on nr_endpoints, but it feels nicer to streamline the
case where the domain is attached, since map/unmap on detached domains
happens rarely, if ever.
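
For reference, the kick path is essentially just:

bool virtqueue_kick(struct virtqueue *vq)
{
        if (virtqueue_kick_prepare(vq))
                return virtqueue_notify(vq);
        return true;
}

so kicking a queue to which nothing has been added since the last kick
does not notify the host.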

Either is fine by me. An extra test won't make much difference performance
wise, and I guess will look less confusing. Niklas, do you mind resending
the version with nr_endpoints check (and updated commit messages)?

Thanks,
Jean


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-19 Thread Jean-Philippe Brucker
On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:
> > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > index 17dcd826f5c2..3649586f0e5c 100644
> > --- a/drivers/iommu/virtio-iommu.c
> > +++ b/drivers/iommu/virtio-iommu.c
> > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
> > int ret;
> > unsigned long flags;
> > +   /*
> > +* .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
> > +* is initialized e.g. via iommu_create_device_direct_mappings()
> > +*/
> > +   if (!viommu)
> > +   return 0;
> 
> Minor nit: I'd be inclined to make that check explicitly in the places where
> it definitely is expected, rather than allowing *any* sync to silently do
> nothing if called incorrectly. Plus then they could use
> vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
> (it did take me a moment to figure out how we could get to .iotlb_sync_map
> with a NULL viommu without viommu_map_pages() blowing up first...)

They're not strictly equivalent: this check works around a temporary issue
with the IOMMU core, which calls map/unmap before the domain is finalized.
Once we merge domain_alloc() and finalize(), then this check disappears,
but we still need to test nr_endpoints in map/unmap to handle detached
domains (and we still need to fix the synchronization of nr_endpoints
against attach/detach). That's why I preferred doing this on viommu and
keeping it in one place.

Thanks,
Jean


Re: [PATCH v2 2/2] iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush

2023-09-18 Thread Jean-Philippe Brucker
On Mon, Sep 18, 2023 at 01:51:44PM +0200, Niklas Schnelle wrote:
> Add ops->flush_iotlb_all operation to enable virtio-iommu for the
> dma-iommu deferred flush scheme. This results in a significant increase
> in performance in exchange for a window in which devices can still
> access previously IOMMU mapped memory when running with
> CONFIG_IOMMU_DEFAULT_DMA_LAZY. The previous strict behavior can be
> achieved with iommu.strict=1 on the kernel command line or
> CONFIG_IOMMU_DEFAULT_DMA_STRICT.
> 
> Link: https://lore.kernel.org/lkml/20230802123612.GA6142@myrica/
> Signed-off-by: Niklas Schnelle 

Reviewed-by: Jean-Philippe Brucker 

> ---
>  drivers/iommu/virtio-iommu.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 3649586f0e5c..4dd330fbcbdd 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -926,6 +926,13 @@ static int viommu_iotlb_sync_map(struct iommu_domain 
> *domain,
>   return viommu_sync_req(vdomain->viommu);
>  }
>  
> +static void viommu_flush_iotlb_all(struct iommu_domain *domain)
> +{
> + struct viommu_domain *vdomain = to_viommu_domain(domain);
> +
> + viommu_sync_req(vdomain->viommu);
> +}
> +
>  static void viommu_get_resv_regions(struct device *dev, struct list_head 
> *head)
>  {
>   struct iommu_resv_region *entry, *new_entry, *msi = NULL;
> @@ -1051,6 +1058,8 @@ static bool viommu_capable(struct device *dev, enum 
> iommu_cap cap)
>   switch (cap) {
>   case IOMMU_CAP_CACHE_COHERENCY:
>   return true;
> + case IOMMU_CAP_DEFERRED_FLUSH:
> + return true;
>   default:
>   return false;
>   }
> @@ -1071,6 +1080,7 @@ static struct iommu_ops viommu_ops = {
>   .map_pages  = viommu_map_pages,
>   .unmap_pages= viommu_unmap_pages,
>   .iova_to_phys   = viommu_iova_to_phys,
> + .flush_iotlb_all= viommu_flush_iotlb_all,
>   .iotlb_sync = viommu_iotlb_sync,
>   .iotlb_sync_map = viommu_iotlb_sync_map,
>   .free   = viommu_domain_free,
> 
> -- 
> 2.39.2
> 


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-18 Thread Jean-Philippe Brucker
On Mon, Sep 18, 2023 at 01:51:43PM +0200, Niklas Schnelle wrote:
> Pull out the sync operation from viommu_map_pages() by implementing
> ops->iotlb_sync_map. This allows the common IOMMU code to map multiple
> elements of an sg with a single sync (see iommu_map_sg()). Furthermore,
> it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH.
> 
> Link: 
> https://lore.kernel.org/lkml/20230726111433.1105665-1-schne...@linux.ibm.com/
> Signed-off-by: Niklas Schnelle 

Reviewed-by: Jean-Philippe Brucker 

This must be merged after "iommu/dma: s390 DMA API conversion and
optimized IOTLB flushing" because of the updated iotlb_sync_map()
prototype.
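
For reference, after that series the op gains an int return value, i.e.
roughly:

        int (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
                              size_t size);

so viommu_iotlb_sync_map() returning int depends on it.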

Thanks,
Jean

> ---
>  drivers/iommu/virtio-iommu.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 17dcd826f5c2..3649586f0e5c 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
>   int ret;
>   unsigned long flags;
>  
> + /*
> +  * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
> +  * is initialized e.g. via iommu_create_device_direct_mappings()
> +  */
> + if (!viommu)
> + return 0;
>   spin_lock_irqsave(&viommu->request_lock, flags);
>   ret = __viommu_sync_req(viommu);
>   if (ret)
> @@ -843,7 +849,7 @@ static int viommu_map_pages(struct iommu_domain *domain, 
> unsigned long iova,
>   .flags  = cpu_to_le32(flags),
>   };
>  
> - ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));
> + ret = viommu_add_req(vdomain->viommu, &map, sizeof(map));
>   if (ret) {
>   viommu_del_mappings(vdomain, iova, end);
>   return ret;
> @@ -912,6 +918,14 @@ static void viommu_iotlb_sync(struct iommu_domain 
> *domain,
>   viommu_sync_req(vdomain->viommu);
>  }
>  
> +static int viommu_iotlb_sync_map(struct iommu_domain *domain,
> +  unsigned long iova, size_t size)
> +{
> + struct viommu_domain *vdomain = to_viommu_domain(domain);
> +
> + return viommu_sync_req(vdomain->viommu);
> +}
> +
>  static void viommu_get_resv_regions(struct device *dev, struct list_head 
> *head)
>  {
>   struct iommu_resv_region *entry, *new_entry, *msi = NULL;
> @@ -1058,6 +1072,7 @@ static struct iommu_ops viommu_ops = {
>   .unmap_pages= viommu_unmap_pages,
>   .iova_to_phys   = viommu_iova_to_phys,
>   .iotlb_sync = viommu_iotlb_sync,
> + .iotlb_sync_map = viommu_iotlb_sync_map,
>   .free   = viommu_domain_free,
>   }
>  };
> 
> -- 
> 2.39.2
> 


Re: [PATCH 2/2] iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush

2023-09-06 Thread Jean-Philippe Brucker
On Wed, Sep 06, 2023 at 09:55:49AM +0200, Niklas Schnelle wrote:
> On Mon, 2023-09-04 at 17:33 +0100, Robin Murphy wrote:
> > On 2023-09-04 16:34, Jean-Philippe Brucker wrote:
> > > On Fri, Aug 25, 2023 at 05:21:26PM +0200, Niklas Schnelle wrote:
> > > > Add ops->flush_iotlb_all operation to enable virtio-iommu for the
> > > > dma-iommu deferred flush scheme. This results inn a significant increase
> > > 
> > > in
> > > 
> > > > in performance in exchange for a window in which devices can still
> > > > access previously IOMMU mapped memory. To get back to the prior behavior
> > > > iommu.strict=1 may be set on the kernel command line.
> > > 
> > > Maybe add that it depends on CONFIG_IOMMU_DEFAULT_DMA_{LAZY,STRICT} as
> > > well, because I've seen kernel configs that enable either.
> > 
> > Indeed, I'd be inclined to phrase it in terms of the driver now actually 
> > being able to honour lazy mode when requested (which happens to be the 
> > default on x86), rather than as if it might be some 
> > potentially-unexpected change in behaviour.
> > 
> > Thanks,
> > Robin.
> 
> I kept running this series on a KVM guest on my private workstation
> (QEMU v8.0.4) while running iperf3 on a passed-through Intel 82599
> VF. I got a bunch of IOMMU events similar to the following, as well as
> card resets in the host.
> 
> ..
> [ 5959.338214] vfio-pci 0000:04:10.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
> domain=0x0037 address=0x7b657064 flags=0x]
> [ 5963.353429] ixgbe 0000:03:00.0 enp3s0: Detected Tx Unit Hang
>   Tx Queue             <0>
>   TDH, TDT             <93>, <9d>
>   next_to_use          <9d>
>   next_to_clean        <93>
> tx_buffer_info[next_to_clean]
>   time_stamp           <10019e800>
>   jiffies              <10019ec80>
> ...
> 
> I retested on v6.5 vanilla (guest & host) and still get the above
> errors, so luckily for me it doesn't seem to be caused by the new code,
> but I can't reproduce it without virtio-iommu. Any idea what could
> cause this?

Adding Eric in case this looks familiar.

I don't have hardware to test this, but I guess QEMU system emulation may
be able to reproduce the issue since it has an AMD IOMMU (unmaintained)
and igb; I can give that a try.

Thanks,
Jean

> 
> > 
> > > > Link: https://lore.kernel.org/lkml/20230802123612.GA6142@myrica/
> > > > Signed-off-by: Niklas Schnelle 
> > > > ---
> > > >   drivers/iommu/virtio-iommu.c | 12 
> > > >   1 file changed, 12 insertions(+)
> > > > 
> > > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > > > index fb73dec5b953..1b7526494490 100644
> > > > --- a/drivers/iommu/virtio-iommu.c
> > > > +++ b/drivers/iommu/virtio-iommu.c
> > > > @@ -924,6 +924,15 @@ static int viommu_iotlb_sync_map(struct 
> > > > iommu_domain *domain,
> > > > return viommu_sync_req(vdomain->viommu);
> > > >   }
> > > >   
> > > > +static void viommu_flush_iotlb_all(struct iommu_domain *domain)
> > > > +{
> > > > +   struct viommu_domain *vdomain = to_viommu_domain(domain);
> > > > +
> > > > +   if (!vdomain->nr_endpoints)
> > > > +   return;
> > > 
> > > As for patch 1, a NULL check in viommu_sync_req() would allow dropping
> > > this one
> > > 
> > > Thanks,
> > > Jean
> 
> Right, makes sense. I will move the check into viommu_sync_req() and add a
> comment that it is there for the cases where viommu_iotlb_sync() et al.
> get called before the IOMMU is set up.
> 
> > > 
> > > > +   viommu_sync_req(vdomain->viommu);
> > > > +}
> > > > +
> > > >   static void viommu_get_resv_regions(struct device *dev, struct 
> > > > list_head *head)
> > > >   {
> > > > struct iommu_resv_region *entry, *new_entry, *msi = NULL;
> > > > @@ -1049,6 +1058,8 @@ static bool viommu_capable(struct device *dev, 
> > > > enum iommu_cap cap)
> > > > switch (cap) {
> > > > case IOMMU_CAP_CACHE_COHERENCY:
> > > > return true;
> > > > +   case IOMMU_CAP_DEFERRED_FLUSH:
> > > > +   return true;
> > > > default:
> > > > return false;
> > > > }
> > > > @@ -1069,6 +1080,7 @@ static struct iommu_ops viommu_ops = {
> > > > .map_pages  = viommu_map_pages,
> > > > .unmap_pages= viommu_unmap_pages,
> > > > .iova_to_phys   = viommu_iova_to_phys,
> > > > +   .flush_iotlb_all= viommu_flush_iotlb_all,
> > > > .iotlb_sync = viommu_iotlb_sync,
> > > > .iotlb_sync_map = viommu_iotlb_sync_map,
> > > > .free   = viommu_domain_free,
> > > > 
> > > > -- 
> > > > 2.39.2
> > > > 
> 


Re: [PATCH 2/2] iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush

2023-09-04 Thread Jean-Philippe Brucker
On Fri, Aug 25, 2023 at 05:21:26PM +0200, Niklas Schnelle wrote:
> Add ops->flush_iotlb_all operation to enable virtio-iommu for the
> dma-iommu deferred flush scheme. This results inn a significant increase

in

> in performance in exchange for a window in which devices can still
> access previously IOMMU mapped memory. To get back to the prior behavior
> iommu.strict=1 may be set on the kernel command line.

Maybe add that it depends on CONFIG_IOMMU_DEFAULT_DMA_{LAZY,STRICT} as
well, because I've seen kernel configs that enable either.

> 
> Link: https://lore.kernel.org/lkml/20230802123612.GA6142@myrica/
> Signed-off-by: Niklas Schnelle 
> ---
>  drivers/iommu/virtio-iommu.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index fb73dec5b953..1b7526494490 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -924,6 +924,15 @@ static int viommu_iotlb_sync_map(struct iommu_domain 
> *domain,
>   return viommu_sync_req(vdomain->viommu);
>  }
>  
> +static void viommu_flush_iotlb_all(struct iommu_domain *domain)
> +{
> + struct viommu_domain *vdomain = to_viommu_domain(domain);
> +
> + if (!vdomain->nr_endpoints)
> + return;

As for patch 1, a NULL check in viommu_sync_req() would allow dropping
this one

Thanks,
Jean

> + viommu_sync_req(vdomain->viommu);
> +}
> +
>  static void viommu_get_resv_regions(struct device *dev, struct list_head 
> *head)
>  {
>   struct iommu_resv_region *entry, *new_entry, *msi = NULL;
> @@ -1049,6 +1058,8 @@ static bool viommu_capable(struct device *dev, enum 
> iommu_cap cap)
>   switch (cap) {
>   case IOMMU_CAP_CACHE_COHERENCY:
>   return true;
> + case IOMMU_CAP_DEFERRED_FLUSH:
> + return true;
>   default:
>   return false;
>   }
> @@ -1069,6 +1080,7 @@ static struct iommu_ops viommu_ops = {
>   .map_pages  = viommu_map_pages,
>   .unmap_pages= viommu_unmap_pages,
>   .iova_to_phys   = viommu_iova_to_phys,
> + .flush_iotlb_all= viommu_flush_iotlb_all,
>   .iotlb_sync = viommu_iotlb_sync,
>   .iotlb_sync_map = viommu_iotlb_sync_map,
>   .free   = viommu_domain_free,
> 
> -- 
> 2.39.2
> 


Re: [PATCH 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-04 Thread Jean-Philippe Brucker
Hi Niklas,

Thanks for following up with these patches

On Fri, Aug 25, 2023 at 05:21:25PM +0200, Niklas Schnelle wrote:
> Pull out the sync operation from viommu_map_pages() by implementing
> ops->iotlb_sync_map. This allows the common IOMMU code to map multiple
> elements of an sg with a single sync (see iommu_map_sg()). Furthermore,
> it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH.
> 
> Link: 
> https://lore.kernel.org/lkml/20230726111433.1105665-1-schne...@linux.ibm.com/
> Signed-off-by: Niklas Schnelle 
> ---
>  drivers/iommu/virtio-iommu.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 3551ed057774..fb73dec5b953 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -843,7 +843,7 @@ static int viommu_map_pages(struct iommu_domain *domain, 
> unsigned long iova,
>   .flags  = cpu_to_le32(flags),
>   };
>  
> - ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));
> + ret = viommu_add_req(vdomain->viommu, &map, sizeof(map));
>   if (ret) {
>   viommu_del_mappings(vdomain, iova, end);
>   return ret;
> @@ -909,9 +909,21 @@ static void viommu_iotlb_sync(struct iommu_domain 
> *domain,
>  {
>   struct viommu_domain *vdomain = to_viommu_domain(domain);
>  
> + if (!vdomain->nr_endpoints)
> + return;

I was wondering about these nr_endpoints checks, which seemed unnecessary:
if map()/unmap() were called with no attached endpoints, then no requests
were added to the queue, and viommu_sync_req() below is a nop.

But at least viommu_iotlb_sync_map() and viommu_flush_iotlb_all() need to
handle being called before the domain is finalized (for example by
iommu_create_device_direct_mappings()). In that case vdomain->viommu is
NULL so if you add a NULL check in viommu_sync_req() then you should be
able to drop the nr_endpoints checks in both patches.
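
Concretely, something along these lines (rough sketch):

static int viommu_sync_req(struct viommu_dev *viommu)
{
        int ret;
        unsigned long flags;

        /*
         * The domain may not be finalized yet, in which case
         * vdomain->viommu is still NULL and there is nothing to sync.
         */
        if (!viommu)
                return 0;

        spin_lock_irqsave(&viommu->request_lock, flags);
        ret = __viommu_sync_req(viommu);
        if (ret)
                dev_dbg(viommu->dev, "could not sync requests (%d)\n", ret);
        spin_unlock_irqrestore(&viommu->request_lock, flags);

        return ret;
}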

Thanks,
Jean

>   viommu_sync_req(vdomain->viommu);
>  }
>  
> +static int viommu_iotlb_sync_map(struct iommu_domain *domain,
> +  unsigned long iova, size_t size)
> +{
> + struct viommu_domain *vdomain = to_viommu_domain(domain);
> +
> + if (!vdomain->nr_endpoints)
> + return 0;
> + return viommu_sync_req(vdomain->viommu);
> +}
> +
>  static void viommu_get_resv_regions(struct device *dev, struct list_head 
> *head)
>  {
>   struct iommu_resv_region *entry, *new_entry, *msi = NULL;
> @@ -1058,6 +1070,7 @@ static struct iommu_ops viommu_ops = {
>   .unmap_pages= viommu_unmap_pages,
>   .iova_to_phys   = viommu_iova_to_phys,
>   .iotlb_sync = viommu_iotlb_sync,
> + .iotlb_sync_map = viommu_iotlb_sync_map,
>   .free   = viommu_domain_free,
>   }
>  };
> 
> -- 
> 2.39.2
> 


Re: [RFC] iommu/virtio: Use single flush queue (EXPERIMENTAL)

2023-08-02 Thread Jean-Philippe Brucker
Hi Niklas,

On Wed, Jul 26, 2023 at 01:14:33PM +0200, Niklas Schnelle wrote:
> Just like on paged s390 guests with their virtual IOMMU, syncing
> mappings via virtio-iommu is quite expensive. It can thus benefit from
> queueing unmapped IOVAs and flushing them in batches but less so from
> parallel flushes which is what the shadow_on_flush flag introduced for
> s390 tunes dma-iommu to do.
> 
> For this to work .flush_iotlb_all is implemented. Furthermore
> .iotlb_sync_map is also implemented and used to pull the sync out of the
> mapping operation for some additional batching and performance gain.
> 
> In a basic test with NVMe pass-through to a KVM guest on a Ryzen 3900X
> these changes together lead to about 19% more IOPS in a fio test and
> slightly more bandwidth too.

Nice, thank you for testing this. I played with an NVMe on an Intel desktop
and can confirm similar results. With "sq" meaning single flush queue and
"mq" percpu flush queue, "+map" is with .iotlb_sync_map() enabled.

Multithread block randwrite job [1]:

                      BW compared to host   Confidence
                        (higher better)
               host       100.0%              ±0.0%
           noviommu        99.9                0.0
viommu lazy sq +map        99.9                0.1
viommu lazy mq +map        99.9                0.1
     viommu lazy sq        92.2                0.9
     viommu lazy mq        91.5                0.9
 viommu strict +map        92.7                0.9
      viommu strict        81.3                1.0


Single page randrw:

                      Latency compared to host   Confidence
                           (lower better)
               host        x1.00                    ±.04
           noviommu         1.23                     .04
viommu lazy sq +map         7.09                     .05
viommu lazy mq +map         7.07                     .07
     viommu lazy sq         7.15                     .04
     viommu lazy mq         7.11                     .05
 viommu strict +map         8.82                     .05
      viommu strict         8.82                     .04

So with lazy+map we get the maximum bandwidth reachable on this disk
(2.5GiB/s) even with a heavy iommu_map/unmap usage, which is cool.
Random access latency also improves with lazy mode.

The difference between single and percpu flush queue isn't really
measurable in my multithread test. There is a difference between Lazy sq
and mq but the variation between samples outweighs it.

> 
> Signed-off-by: Niklas Schnelle 
> ---
> Note:
> The idea of using the single flush queue scheme from my series "iommu/dma: 
> s390
> DMA API conversion and optimized IOTLB flushing"[0] for virtio-iommu was 
> already
> mentioned in the cover letter. I now wanted to explore this with this patch
> which may also serve as a test vehicle for the single flush queue scheme 
> usable
> on non-s390.
> 
> Besides limited testing, this is marked experimental mainly because the use of
> queuing needs to be a conscious decision as it allows continued access to
> unmapped pages for up to a second with the currently proposed single flush
> queue mechanism.

It fits with the iommu.strict=0 / CONFIG_IOMMU_DEFAULT_DMA_LAZY setting,
which selects DMA_FQ domains. That option allows a misbehaving device to
access memory that has been freed/reallocated, which is what we're
enabling here. I believe the risk is pretty much the same for deferred
UNMAP as for deferred TLBI, since mappings that we're removing were likely
cached in the TLB. Increasing the timeout does make it easier to exploit,
but I don't think that changes the policy from an admin's perspective:
only enable lazy mode if you trust device and driver.

On bare metal, we disable DMA_FQ for devices that can be easily hotplugged
into unattended machines (through external-facing ports like thunderbolt).
On VMs, the concern isn't really about external devices, since they don't
automatically get plugged into a VM without user intervention. Here I
guess the devices we don't trust will be virtual devices implemented by
other VMs. We don't have any method to identify them yet, so
iommu.strict=1 and CONFIG_IOMMU_DEFAULT_DMA_STRICT is the best we can do
at the moment.

I'm not so sure about enabling shadow_on_flush by default, since the
performance difference was too small in my tests. Maybe a module parameter
for dma-iommu could configure the flush queue?
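
Something as simple as this could do, if we go that route (hypothetical
sketch, parameter name made up; the series under discussion keys this
off the driver's shadow_on_flush hint instead):

#include <linux/moduleparam.h>

static bool iommu_dma_single_fq;
module_param_named(single_fq, iommu_dma_single_fq, bool, 0444);
MODULE_PARM_DESC(single_fq,
                 "Use a single IOVA flush queue instead of per-CPU queues");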

> Also it might make sense to split this patch to do the
> introduction and use of .iotlb_sync_map separately but as a test vehicle
> I found it easier to consume as a single patch.

Yes, both changes are useful but should be in two patches

Thanks,
Jean

[1] (Running one fio process for each of the two tests, sequentially, 30
or more times, with 1 warmup.)

[global]
filename=/dev/nvme0n1
runtime=10
ioengine=libaio
direct=1
time_based

[randwrite_multi]
group_reporting

Re: [RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space

2023-06-26 Thread Jean-Philippe Brucker
On Mon, Jun 19, 2023 at 11:35:50AM +0800, Baolu Lu wrote:
> > Another outstanding issue was what to do for PASID stop. When the guest
> > device driver stops using a PASID it issues a PASID stop request to the
> > device (a device-specific mechanism). If the device is not using PRI stop
> > markers it waits for pending PRs to complete and we're fine. Otherwise it
> > sends a stop marker which is flushed to the PRI queue, but does not wait
> > for pending PRs.
> > 
> > Handling stop markers is annoying. If the device issues one, then the PRI
> > queue contains stale faults, a stop marker, followed by valid faults for
> > the next address space bound to this PASID. The next address space will
> > get all the spurious faults because the fault handler doesn't know that
> > there is a stop marker coming. Linux is probably alright with spurious
> > faults, though maybe not in all cases, and other guests may not support
> > them at all.
> > 
> > We might need to revisit supporting stop markers: request that each device
> > driver declares whether their device uses stop markers on unbind() ("This
> > mechanism must indicate that a Stop Marker Message will be generated."
> > says the spec, but doesn't say if the function always uses one or the
> > other mechanism so it's per-unbind). Then we still have to synchronize
> > unbind() with the fault handler to deal with the pending stop marker,
> > which might have already gone through or be generated later.
> 
> I don't quite follow here. Once a PASID is unbound from the device, the
> device driver should be free to release the PASID. The PASID could then
> be used for any other purpose. The device driver has no idea when the
> pending page requests are flushed after unbind(), so it cannot decide
> how long should the PASID be delayed for reuse. Therefore, I understand
> that a successful return from the unbind() function denotes that all
> pending page requests have been flushed and the PASID is viable for
> other use.

Yes that's the contract for unbind() at the moment

> 
> > 
> > Currently we ignore all that and just flush the PRI queue, followed by the
> > IOPF queue, to get rid of any stale fault before reassigning the PASID. A
> > guest however would also need to first flush the HW PRI queue, but doesn't
> > have a direct way to do that. If we want to support guests that don't deal
> > with stop markers, the host needs to flush the PRI queue when a PASID is
> > detached. I guess on Intel detaching the PASID goes through the host which
> > can flush the host queue. On Arm we'll probably need to flush the queue
> > when receiving a PASID cache invalidation, which the guest issues after
> > clearing a PASID table entry.
> 
> The Intel VT-d driver follows below steps to drain pending page requests
> when a PASID is unbound from a device.
> 
> - Tear down the device's pasid table entry for the stopped pasid.
>   This ensures that ATS/PRI will stop putting more page requests for the
>   pasid in VT-d PRQ.

Oh that's interesting, I didn't know about the implicit TLB invalidations
on page requests for VT-d.

For Arm SMMU, clearing the PASID table entry does cause ATS Translation
Requests to return with Completer Abort, but does not affect PRI. The SMMU
pushes page requests directly into the PRI queue without reading any table
(unless the queue overflows).

We're counting on the device driver to perform the PASID stop request
before calling unbind(), described in PCIe 6.20.1 (Managing PASID Usage)
and 10.4.1.2 (Managing PASID Usage on PRG Requests). This ensures that
when unbind() is called, no more page request for the PASID is pushed into
the PRI queue. But some may still be in the queue if the device uses stop
markers.

> - Sync with the PRQ handling thread until all related page requests in
>   PRQ have been delivered.

This is what I'm concerned about. For VT-d this happens in the host which
is in charge of modifying the PASID table. For SMMU, the guest writes the
PASID table. It flushes its virtual PRI queue, but not the physical queue
that is managed by the host.

One synchronization point where the host could flush the physical PRI
queue is the PASID config invalidation (CMD_CFGI_CD). As Jason pointed out
the host may not be able to observe those if a command queue is assigned
directly to the guest (a theoretical SMMU extension), though in that case
the guest may also have direct access to a PRI queue (like the AMD vIOMMU
extension) and be able to flush the queue directly.

But we can just wait for PRI implementations and see what the drivers
need. Maybe no device will implement stop markers.

Thanks,
Jean

> - Flush the iopf queue with iopf_queue_flush_dev().
> - Follow the steps defined in VT-d spec section 7.10 to drain all page
>   requests and responses between VT-d and the endpoint device.

Re: [RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space

2023-06-16 Thread Jean-Philippe Brucker
Hi Baolu,

On Tue, May 30, 2023 at 01:37:07PM +0800, Lu Baolu wrote:
> - The timeout value for the pending page fault messages. Ideally we
>   should determine the timeout value from the device configuration, but
>   I failed to find any statement in the PCI specification (version 6.x).
>   A default of 100 milliseconds is selected in the implementation, but it
>   leaves room to grow the code for a per-device setting.

If it helps we had some discussions about this timeout [1]. It's useful to
print out a warning for debugging, but I don't think completing the fault
on timeout is correct, we should leave the fault pending. Given that the
PCI spec does not indicate a timeout, the guest can wait as long as it
wants to complete the fault (and 100ms may even be reasonable on an
emulator, who knows how many layers and context switches the fault has to
go through).


Another outstanding issue was what to do for PASID stop. When the guest
device driver stops using a PASID it issues a PASID stop request to the
device (a device-specific mechanism). If the device is not using PRI stop
markers it waits for pending PRs to complete and we're fine. Otherwise it
sends a stop marker which is flushed to the PRI queue, but does not wait
for pending PRs.

Handling stop markers is annoying. If the device issues one, then the PRI
queue contains stale faults, a stop marker, followed by valid faults for
the next address space bound to this PASID. The next address space will
get all the spurious faults because the fault handler doesn't know that
there is a stop marker coming. Linux is probably alright with spurious
faults, though maybe not in all cases, and other guests may not support
them at all.

We might need to revisit supporting stop markers: request that each device
driver declares whether their device uses stop markers on unbind() ("This
mechanism must indicate that a Stop Marker Message will be generated."
says the spec, but doesn't say if the function always uses one or the
other mechanism so it's per-unbind). Then we still have to synchronize
unbind() with the fault handler to deal with the pending stop marker,
which might have already gone through or be generated later.

Currently we ignore all that and just flush the PRI queue, followed by the
IOPF queue, to get rid of any stale fault before reassigning the PASID. A
guest however would also need to first flush the HW PRI queue, but doesn't
have a direct way to do that. If we want to support guests that don't deal
with stop markers, the host needs to flush the PRI queue when a PASID is
detached. I guess on Intel detaching the PASID goes through the host which
can flush the host queue. On Arm we'll probably need to flush the queue
when receiving a PASID cache invalidation, which the guest issues after
clearing a PASID table entry.

Thanks,
Jean

[1] 
https://lore.kernel.org/linux-iommu/20180423153622.GC38106@ostrya.localdomain/
Also unregistration, not sure if relevant here
https://lore.kernel.org/linux-iommu/20190605154553.0d00ad8d@jacob-builder/


Re: [PATCH] iommu/virtio: Detach domain on endpoint release

2023-05-18 Thread Jean-Philippe Brucker
On Thu, May 18, 2023 at 10:59:12AM -0300, Jason Gunthorpe wrote:
> On Thu, May 18, 2023 at 02:56:38PM +0100, Jean-Philippe Brucker wrote:
> > > Can you wrapper this into a BLOCKED domain like we are moving drivers
> > > toward, and then attach the blocked domain instead of introducing this
> > > special case?
> > 
> > Yes, I think the way the virtio-iommu driver should implement BLOCKED
> > domains is initially clearing the global-bypass bit, and then issuing
> > DETACH requests when the core asks to attach a BLOCKED domain. This has
> > the same effect as issuing an ATTACH request with an empty domain, but
> > requires fewer resources in the VMM.
> 
> Does that exclude identity though?

No, identity attaches a domain with the ATTACH_F_BYPASS flag (or an
identity-mapped domain if the feature is missing), it doesn't rely on
global-bypass.

> 
> It seems like the protocol should not have an implicit operation like
> this, the desired translation mode should always be made
> explicit.

I probably misunderstood your plan for BLOCKED. This particular patch is
about removing devices from the machine, for example PCIe hot-unplug. So I
thought you were suggesting the core will at some point attach a BLOCKED
domain to a device being removed, in order to block translation while the
device is being removed and while a new one is being plugged in with the
same RID. For that case I think DETACH, rather than ATTACH an empty
domain, makes more sense. Otherwise with the same reasoning we'd need to
attach all 4 billion endpoint IDs to an empty domain at boot which isn't
feasible. In addition I don't think we'll want to force the VMMs to keep
endpoint ID state internally after destroying devices, though that does
need to be specified one way or another.

If BLOCKED is only for transient states, for example while a struct device
is not bound to a driver, then attaching an empty domain works and is
simpler to implement. Probably the best is to implement BLOCKED this way
and still call DETACH in the release_device() op.

In any case, it shouldn't make a difference to the core. I'll see which
one is better for the VMMs.
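
For illustration, the attach side of a BLOCKED domain built on DETACH
could look roughly like this (hypothetical sketch; the plumbing to
expose it through viommu_ops is omitted):

static int viommu_attach_blocked(struct iommu_domain *domain,
                                 struct device *dev)
{
        struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);

        /* Same effect as attaching an empty domain, but cheaper for the VMM */
        viommu_detach_dev(vdev);
        return 0;
}

static const struct iommu_domain_ops viommu_blocked_ops = {
        .attach_dev     = viommu_attach_blocked,
};

static struct iommu_domain viommu_blocked_domain = {
        .type   = IOMMU_DOMAIN_BLOCKED,
        .ops    = &viommu_blocked_ops,
};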

Thanks,
Jean

> Having a boot time default makes sense, but there is little
> reason for an OS to go back to the boot time default...
> 
> Jason


Re: [PATCH] iommu/virtio: Detach domain on endpoint release

2023-05-18 Thread Jean-Philippe Brucker
On Wed, May 17, 2023 at 01:20:51PM -0300, Jason Gunthorpe wrote:
> On Wed, May 10, 2023 at 09:11:57AM +0100, Jean-Philippe Brucker wrote:
> 
> > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > > index 5b8fe9bfa9a5..3d3d4462359e 100644
> > > --- a/drivers/iommu/virtio-iommu.c
> > > +++ b/drivers/iommu/virtio-iommu.c
> > > @@ -788,6 +788,28 @@ static int viommu_attach_dev(struct iommu_domain 
> > > *domain, struct device *dev)
> > >   return 0;
> > >  }
> > >  
> > > +static void viommu_detach_dev(struct viommu_endpoint *vdev)
> > > +{
> > > + int i;
> > > + struct virtio_iommu_req_detach req;
> > > + struct viommu_domain *vdomain = vdev->vdomain;
> > > + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(vdev->dev);
> > > +
> > > + if (!vdomain)
> > > + return;
> > > +
> > > + req = (struct virtio_iommu_req_detach) {
> > > + .head.type  = VIRTIO_IOMMU_T_DETACH,
> > > + .domain = cpu_to_le32(vdomain->id),
> > > + };
> > > +
> > > + for (i = 0; i < fwspec->num_ids; i++) {
> > > + req.endpoint = cpu_to_le32(fwspec->ids[i]);
> > > + WARN_ON(viommu_send_req_sync(vdev->viommu, &req, sizeof(req)));
> > > + }
> > > + vdev->vdomain = NULL;
> 
> Not for this patch, but something to work on..
> 
> I assume detach disconnects the container on the VFIO side and puts it
> into a BLOCKED state?

At the moment that depends on the VMM boot-bypass policy: if virtio-iommu
is booted with global-bypass set, then detaching will go back to an
identity-mapped container, not a BLOCKED state. If the global-bypass bit
is clear, then it does go back to a BLOCKED state. However QEMU has a
default policy of boot-bypass (because that allows booting a firmware or
OS that doesn't have the IOMMU drivers).

The driver can clear the global-bypass bit to change this behavior, but at
the moment we just follow the VMM boot policy.

> 
> Can you wrapper this into a BLOCKED domain like we are moving drivers
> toward, and then attach the blocked domain instead of introducing this
> special case?

Yes, I think the way the virtio-iommu driver should implement BLOCKED
domains is initially clearing the global-bypass bit, and then issuing
DETACH requests when the core asks to attach a BLOCKED domain. This has
the same effect as issuing an ATTACH request with an empty domain, but
requires fewer resources in the VMM.

Thanks,
Jean

> 
> I've been thinking about having some core code support to do fairly
> common pattern of 'parking' the iommu at some well defined translation
> mode, BLOCKED in this case.
> 
> Thanks,
> Jason


[PATCH v2 2/2] iommu/virtio: Return size mapped for a detached domain

2023-05-15 Thread Jean-Philippe Brucker
When map() is called on a detached domain, the domain does not exist in
the device so we do not send a MAP request, but we do update the
internal mapping tree, to be replayed on the next attach. Since this
constitutes a successful iommu_map() call, return *mapped in this case
too.

Fixes: 7e62edd7a33a ("iommu/virtio: Add map/unmap_pages() callbacks 
implementation")
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index fd316a37d7562..3551ed057774e 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -833,25 +833,26 @@ static int viommu_map_pages(struct iommu_domain *domain, 
unsigned long iova,
if (ret)
return ret;
 
-   map = (struct virtio_iommu_req_map) {
-   .head.type  = VIRTIO_IOMMU_T_MAP,
-   .domain = cpu_to_le32(vdomain->id),
-   .virt_start = cpu_to_le64(iova),
-   .phys_start = cpu_to_le64(paddr),
-   .virt_end   = cpu_to_le64(end),
-   .flags  = cpu_to_le32(flags),
-   };
+   if (vdomain->nr_endpoints) {
+   map = (struct virtio_iommu_req_map) {
+   .head.type  = VIRTIO_IOMMU_T_MAP,
+   .domain = cpu_to_le32(vdomain->id),
+   .virt_start = cpu_to_le64(iova),
+   .phys_start = cpu_to_le64(paddr),
+   .virt_end   = cpu_to_le64(end),
+   .flags  = cpu_to_le32(flags),
+   };
 
-   if (!vdomain->nr_endpoints)
-   return 0;
-
-   ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));
-   if (ret)
-   viommu_del_mappings(vdomain, iova, end);
-   else if (mapped)
+   ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));
+   if (ret) {
+   viommu_del_mappings(vdomain, iova, end);
+   return ret;
+   }
+   }
+   if (mapped)
*mapped = size;
 
-   return ret;
+   return 0;
 }
 
 static size_t viommu_unmap_pages(struct iommu_domain *domain, unsigned long 
iova,
-- 
2.40.0



[PATCH v2 1/2] iommu/virtio: Detach domain on endpoint release

2023-05-15 Thread Jean-Philippe Brucker
When an endpoint is released, for example a PCIe VF being destroyed or a
function hot-unplugged, it should be detached from its domain. Send a
DETACH request.

Fixes: edcd69ab9a32 ("iommu: Add virtio-iommu driver")
Reported-by: Akihiko Odaki 
Link: 
https://lore.kernel.org/all/15bf1b00-3aa0-973a-3a86-3fa5c4d41...@daynix.com/
Signed-off-by: Jean-Philippe Brucker 
Tested-by: Akihiko Odaki 
---
v1: 
https://lore.kernel.org/linux-iommu/20230414150744.562456-1-jean-phili...@linaro.org/
v2: fixed nr_endpoints count reported by Eric
---
 drivers/iommu/virtio-iommu.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 5b8fe9bfa9a5b..fd316a37d7562 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -788,6 +788,29 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
return 0;
 }
 
+static void viommu_detach_dev(struct viommu_endpoint *vdev)
+{
+   int i;
+   struct virtio_iommu_req_detach req;
+   struct viommu_domain *vdomain = vdev->vdomain;
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(vdev->dev);
+
+   if (!vdomain)
+   return;
+
+   req = (struct virtio_iommu_req_detach) {
+   .head.type  = VIRTIO_IOMMU_T_DETACH,
+   .domain = cpu_to_le32(vdomain->id),
+   };
+
+   for (i = 0; i < fwspec->num_ids; i++) {
+   req.endpoint = cpu_to_le32(fwspec->ids[i]);
+   WARN_ON(viommu_send_req_sync(vdev->viommu, &req, sizeof(req)));
+   }
+   vdomain->nr_endpoints--;
+   vdev->vdomain = NULL;
+}
+
 static int viommu_map_pages(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t pgsize, size_t pgcount,
int prot, gfp_t gfp, size_t *mapped)
@@ -990,6 +1013,7 @@ static void viommu_release_device(struct device *dev)
 {
struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
 
+   viommu_detach_dev(vdev);
iommu_put_resv_regions(dev, &vdev->resv_regions);
kfree(vdev);
 }
-- 
2.40.0



[PATCH v2 0/2] iommu/virtio: Fixes

2023-05-15 Thread Jean-Philippe Brucker
One fix reported by Akihiko, and another found while going over the
driver.

Jean-Philippe Brucker (2):
  iommu/virtio: Detach domain on endpoint release
  iommu/virtio: Return size mapped for a detached domain

 drivers/iommu/virtio-iommu.c | 57 ++--
 1 file changed, 41 insertions(+), 16 deletions(-)

-- 
2.40.0



Re: [PATCH] iommu/virtio: Detach domain on endpoint release

2023-05-10 Thread Jean-Philippe Brucker
On Wed, May 10, 2023 at 05:37:22PM +0200, Eric Auger wrote:
> Hi Jean,
> 
> On 4/14/23 17:07, Jean-Philippe Brucker wrote:
> > When an endpoint is released, for example a PCIe VF is disabled or a
> > function hot-unplugged, it should be detached from its domain. Send a
> > DETACH request.
> >
> > Fixes: edcd69ab9a32 ("iommu: Add virtio-iommu driver")
> > Reported-by: Akihiko Odaki 
> > Link: 
> > https://lore.kernel.org/all/15bf1b00-3aa0-973a-3a86-3fa5c4d41...@daynix.com/
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >  drivers/iommu/virtio-iommu.c | 23 +++
> >  1 file changed, 23 insertions(+)
> >
> > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > index 5b8fe9bfa9a5..3d3d4462359e 100644
> > --- a/drivers/iommu/virtio-iommu.c
> > +++ b/drivers/iommu/virtio-iommu.c
> > @@ -788,6 +788,28 @@ static int viommu_attach_dev(struct iommu_domain 
> > *domain, struct device *dev)
> > return 0;
> >  }
> >  
> > +static void viommu_detach_dev(struct viommu_endpoint *vdev)
> > +{
> > +   int i;
> > +   struct virtio_iommu_req_detach req;
> > +   struct viommu_domain *vdomain = vdev->vdomain;
> > +   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(vdev->dev);
> > +
> > +   if (!vdomain)
> > +   return;
> > +
> > +   req = (struct virtio_iommu_req_detach) {
> > +   .head.type  = VIRTIO_IOMMU_T_DETACH,
> > +   .domain = cpu_to_le32(vdomain->id),
> > +   };
> > +
> > +   for (i = 0; i < fwspec->num_ids; i++) {
> > +   req.endpoint = cpu_to_le32(fwspec->ids[i]);
> > +   WARN_ON(viommu_send_req_sync(vdev->viommu, , sizeof(req)));
> > +   }
> just a late question: don't you need to decrement vdomain's nr_endpoints?
> 

Ah yes, I'll fix that, thank you. I think attach() could use some cleanup
as well: if the request fails then we should keep the nr_endpoints
reference on the previous domain. But that's less urgent.
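
For illustration, such a cleanup could look roughly like this (untested
sketch; it leaves out the bypass flag and the mapping replay that the
real attach path needs):

static int viommu_do_attach(struct viommu_endpoint *vdev,
                            struct viommu_domain *vdomain)
{
        int i, ret;
        struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(vdev->dev);
        struct virtio_iommu_req_attach req = {
                .head.type      = VIRTIO_IOMMU_T_ATTACH,
                .domain         = cpu_to_le32(vdomain->id),
        };

        /* Attach all endpoint IDs to the new domain first... */
        for (i = 0; i < fwspec->num_ids; i++) {
                req.endpoint = cpu_to_le32(fwspec->ids[i]);
                ret = viommu_send_req_sync(vdev->viommu, &req, sizeof(req));
                if (ret)
                        return ret; /* the previous domain keeps its reference */
        }

        /* ...and only then move the nr_endpoints reference over. */
        if (vdev->vdomain)
                vdev->vdomain->nr_endpoints--;
        vdomain->nr_endpoints++;
        vdev->vdomain = vdomain;

        return 0;
}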

Thanks,
Jean



Re: [PATCH] iommu/virtio: Detach domain on endpoint release

2023-05-10 Thread Jean-Philippe Brucker
Hi Joerg,

On Fri, Apr 14, 2023 at 04:07:45PM +0100, Jean-Philippe Brucker wrote:
> When an endpoint is released, for example a PCIe VF is disabled or a
> function hot-unplugged, it should be detached from its domain. Send a
> DETACH request.
> 
> Fixes: edcd69ab9a32 ("iommu: Add virtio-iommu driver")
> Reported-by: Akihiko Odaki 
> Link: 
> https://lore.kernel.org/all/15bf1b00-3aa0-973a-3a86-3fa5c4d41...@daynix.com/
> Signed-off-by: Jean-Philippe Brucker 

This patch fixes device unregistration in the virtio-iommu driver, could
you please pick it up for the next batch of fixes?  It applies cleanly on
v6.4-rc1

Thanks,
Jean

> ---
>  drivers/iommu/virtio-iommu.c | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 5b8fe9bfa9a5..3d3d4462359e 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -788,6 +788,28 @@ static int viommu_attach_dev(struct iommu_domain 
> *domain, struct device *dev)
>   return 0;
>  }
>  
> +static void viommu_detach_dev(struct viommu_endpoint *vdev)
> +{
> + int i;
> + struct virtio_iommu_req_detach req;
> + struct viommu_domain *vdomain = vdev->vdomain;
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(vdev->dev);
> +
> + if (!vdomain)
> + return;
> +
> + req = (struct virtio_iommu_req_detach) {
> + .head.type  = VIRTIO_IOMMU_T_DETACH,
> + .domain = cpu_to_le32(vdomain->id),
> + };
> +
> + for (i = 0; i < fwspec->num_ids; i++) {
> + req.endpoint = cpu_to_le32(fwspec->ids[i]);
> + WARN_ON(viommu_send_req_sync(vdev->viommu, &req, sizeof(req)));
> + }
> + vdev->vdomain = NULL;
> +}
> +
>  static int viommu_map_pages(struct iommu_domain *domain, unsigned long iova,
>   phys_addr_t paddr, size_t pgsize, size_t pgcount,
>   int prot, gfp_t gfp, size_t *mapped)
> @@ -990,6 +1012,7 @@ static void viommu_release_device(struct device *dev)
>  {
>   struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
>  
> + viommu_detach_dev(vdev);
>   iommu_put_resv_regions(dev, &vdev->resv_regions);
>   kfree(vdev);
>  }
> -- 
> 2.40.0
> 


Re: virtio-iommu hotplug issue

2023-04-14 Thread Jean-Philippe Brucker
On Thu, Apr 13, 2023 at 08:01:54PM +0900, Akihiko Odaki wrote:
> Yes, that's right. The guest can dynamically create and delete VFs. The
> device is emulated by QEMU: igb, an Intel NIC recently added to QEMU and
> projected to be released as part of QEMU 8.0.

Ah great, that's really useful, I'll add it to my tests

> > Yes, I think this is an issue in the virtio-iommu driver, which should be
> > sending a DETACH request when the VF is disabled, likely from
> > viommu_release_device(). I'll work on a fix unless you would like to do it
> 
> It will be nice if you prepare a fix. I will test your patch with my
> workload if you share it with me.

I sent a fix:
https://lore.kernel.org/linux-iommu/20230414150744.562456-1-jean-phili...@linaro.org/

Thank you for reporting this, it must have been annoying to debug

Thanks,
Jean



[PATCH] iommu/virtio: Detach domain on endpoint release

2023-04-14 Thread Jean-Philippe Brucker
When an endpoint is released, for example a PCIe VF is disabled or a
function hot-unplugged, it should be detached from its domain. Send a
DETACH request.

Fixes: edcd69ab9a32 ("iommu: Add virtio-iommu driver")
Reported-by: Akihiko Odaki 
Link: 
https://lore.kernel.org/all/15bf1b00-3aa0-973a-3a86-3fa5c4d41...@daynix.com/
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 5b8fe9bfa9a5..3d3d4462359e 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -788,6 +788,28 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
return 0;
 }
 
+static void viommu_detach_dev(struct viommu_endpoint *vdev)
+{
+   int i;
+   struct virtio_iommu_req_detach req;
+   struct viommu_domain *vdomain = vdev->vdomain;
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(vdev->dev);
+
+   if (!vdomain)
+   return;
+
+   req = (struct virtio_iommu_req_detach) {
+   .head.type  = VIRTIO_IOMMU_T_DETACH,
+   .domain = cpu_to_le32(vdomain->id),
+   };
+
+   for (i = 0; i < fwspec->num_ids; i++) {
+   req.endpoint = cpu_to_le32(fwspec->ids[i]);
+   WARN_ON(viommu_send_req_sync(vdev->viommu, &req, sizeof(req)));
+   }
+   vdev->vdomain = NULL;
+}
+
 static int viommu_map_pages(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t pgsize, size_t pgcount,
int prot, gfp_t gfp, size_t *mapped)
@@ -990,6 +1012,7 @@ static void viommu_release_device(struct device *dev)
 {
struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
 
+   viommu_detach_dev(vdev);
iommu_put_resv_regions(dev, &vdev->resv_regions);
kfree(vdev);
 }
-- 
2.40.0



Re: virtio-iommu hotplug issue

2023-04-13 Thread Jean-Philippe Brucker
Hello,

On Thu, Apr 13, 2023 at 01:49:43PM +0900, Akihiko Odaki wrote:
> Hi,
> 
> Recently I encountered a problem with the combination of Linux's
> virtio-iommu driver and QEMU when a SR-IOV virtual function gets disabled.
> I'd like to ask you what kind of solution is appropriate here and implement
> the solution if possible.
> 
> A PCIe device implementing the SR-IOV specification exports a virtual
> function, and the guest can enable or disable it at runtime by writing to a
> configuration register. This effectively looks like a PCI device is
> hotplugged for the guest.

Just so I understand this better: the guest gets a whole PCIe device PF
that implements SR-IOV, and so the guest can dynamically create VFs?  Out
of curiosity, is that a hardware device assigned to the guest with VFIO,
or a device emulated by QEMU?

> In such a case, the kernel assumes the endpoint is
> detached from the virtio-iommu domain, but QEMU actually does not detach it.
> 
> This inconsistent view of the removed device sometimes prevents the VM from
> correctly performing the following procedure, for example:
> 1. Enable a VF.
> 2. Disable the VF.
> 3. Open a vfio container.
> 4. Open the group which the PF belongs to.
> 5. Add the group to the vfio container.
> 6. Map some memory region.
> 7. Close the group.
> 8. Close the vfio container.
> 9. Repeat 3-8
> 
> When the VF gets disabled, the kernel assumes the endpoint is detached from
> the IOMMU domain, but QEMU actually doesn't detach it. Later, the domain
> will be reused in step 3-8.
> 
> In step 7, the PF will be detached, and the kernel thinks there is no
> endpoint attached and the mapping the domain holds is cleared, but the VF
> endpoint is still attached and the mapping is kept intact.
> 
> In step 9, the same domain will be reused again, and the kernel requests to
> create a new mapping, but it will conflict with the existing mapping and
> result in -EINVAL.
> 
> This problem can be fixed by either of:
> - requesting the detachment of the endpoint from the guest when the PCI
> device is unplugged (the VF is disabled)

Yes, I think this is an issue in the virtio-iommu driver, which should be
sending a DETACH request when the VF is disabled, likely from
viommu_release_device(). I'll work on a fix unless you would like to do it

> - detecting that the PCI device is gone and automatically detach it on
> QEMU-side.
> 
> It is not completely clear for me which solution is more appropriate as the
> virtio-iommu specification is written in a way independent of the endpoint
> mechanism and does not say what should be done when a PCI device is
> unplugged.

Yes, I'm not sure it's in scope for the specification, it's more about
software guidance

Thanks,
Jean


[PATCH v3] dt-bindings: virtio: Convert virtio,pci-iommu to DT schema

2022-09-23 Thread Jean-Philippe Brucker
Convert the binding that describes the virtio-pci based IOMMU to DT
schema. Change the compatible string to "pci<vendorid>,<deviceid>", which is
defined by the PCI Bus Binding, but keep "virtio,pci-iommu" as an option
for backward compatibility.

Signed-off-by: Jean-Philippe Brucker 
---
v3: Renamed file and type to pci-iommu
v2: 
https://lore.kernel.org/linux-devicetree/20220922161644.372181-1-jean-phili...@linaro.org/
---
 .../devicetree/bindings/virtio/iommu.txt  |  66 
 .../devicetree/bindings/virtio/pci-iommu.yaml | 101 ++
 2 files changed, 101 insertions(+), 66 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
 create mode 100644 Documentation/devicetree/bindings/virtio/pci-iommu.yaml

diff --git a/Documentation/devicetree/bindings/virtio/iommu.txt 
b/Documentation/devicetree/bindings/virtio/iommu.txt
deleted file mode 100644
index 2407fea0651c..
--- a/Documentation/devicetree/bindings/virtio/iommu.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-* virtio IOMMU PCI device
-
-When virtio-iommu uses the PCI transport, its programming interface is
-discovered dynamically by the PCI probing infrastructure. However the
-device tree statically describes the relation between IOMMU and DMA
-masters. Therefore, the PCI root complex that hosts the virtio-iommu
-contains a child node representing the IOMMU device explicitly.
-
-Required properties:
-
-- compatible:  Should be "virtio,pci-iommu"
-- reg: PCI address of the IOMMU. As defined in the PCI Bus
-   Binding reference [1], the reg property is a five-cell
-   address encoded as (phys.hi phys.mid phys.lo size.hi
-   size.lo). phys.hi should contain the device's BDF as
-   0b00000000 bbbbbbbb dddddfff 00000000. The other cells
-   should be zero.
-- #iommu-cells:Each platform DMA master managed by the IOMMU is 
assigned
-   an endpoint ID, described by the "iommus" property [2].
-   For virtio-iommu, #iommu-cells must be 1.
-
-Notes:
-
-- DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
-  virtio-iommu node doesn't have an "iommus" property, and is omitted from
-  the iommu-map property of the root complex.
-
-Example:
-
-pcie@1000 {
-   compatible = "pci-host-ecam-generic";
-   ...
-
-   /* The IOMMU programming interface uses slot 00:01.0 */
-   iommu0: iommu@0008 {
-   compatible = "virtio,pci-iommu";
-   reg = <0x0800 0 0 0 0>;
-   #iommu-cells = <1>;
-   };
-
-   /*
-* The IOMMU manages all functions in this PCI domain except
-* itself. Omit BDF 00:01.0.
-*/
-   iommu-map = <0x0  0x0 0x8>
-   <0x9  0x9 0xfff7>;
-};
-
-pcie@2000 {
-   compatible = "pci-host-ecam-generic";
-   ...
-   /*
-* The IOMMU also manages all functions from this domain,
-* with endpoint IDs 0x1 - 0x1
-*/
-   iommu-map = <0x0  0x1 0x1>;
-};
-
-ethernet@fe001000 {
-   ...
-   /* The IOMMU manages this platform device with endpoint ID 0x2 */
-   iommus = < 0x2>;
-};
-
-[1] Documentation/devicetree/bindings/pci/pci.txt
-[2] Documentation/devicetree/bindings/iommu/iommu.txt
diff --git a/Documentation/devicetree/bindings/virtio/pci-iommu.yaml 
b/Documentation/devicetree/bindings/virtio/pci-iommu.yaml
new file mode 100644
index ..972a785a42de
--- /dev/null
+++ b/Documentation/devicetree/bindings/virtio/pci-iommu.yaml
@@ -0,0 +1,101 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/virtio/pci-iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: virtio-iommu device using the virtio-pci transport
+
+maintainers:
+  - Jean-Philippe Brucker 
+
+description: |
+  When virtio-iommu uses the PCI transport, its programming interface is
+  discovered dynamically by the PCI probing infrastructure. However the
+  device tree statically describes the relation between IOMMU and DMA
+  masters. Therefore, the PCI root complex that hosts the virtio-iommu
+  contains a child node representing the IOMMU device explicitly.
+
+  DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
+  virtio-iommu node doesn't have an "iommus" property, and is omitted from
+  the iommu-map property of the root complex.
+
+properties:
+  # If compatible is present, it should contain the vendor and device ID
+  # according to the PCI Bus Binding specification. Since PCI provides
+  # built-in identification methods, compatible is not actually required.
+  compatible:
+oneOf:
+  - items:
+  - const: virtio,pci-iommu
+  - const: pci1af4,1057
+  - items:
+  - const: pci1af4,1057
+
+  reg:
+

[PATCH v2] dt-bindings: virtio: Convert virtio,pci-iommu to DT schema

2022-09-22 Thread Jean-Philippe Brucker
Convert the binding that describes the virtio-pci based IOMMU to DT schema.
Change the compatible string to "pci<vendorid>,<deviceid>", which is defined
by the PCI Bus Binding

Signed-off-by: Jean-Philippe Brucker 
---
v2: Fix example, make compatible a required property
v1: 
https://lore.kernel.org/all/20220916132229.1908841-1-jean-phili...@linaro.org/
---
 .../devicetree/bindings/virtio/iommu.txt  |  66 
 .../devicetree/bindings/virtio/iommu.yaml | 101 ++
 2 files changed, 101 insertions(+), 66 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
 create mode 100644 Documentation/devicetree/bindings/virtio/iommu.yaml

diff --git a/Documentation/devicetree/bindings/virtio/iommu.txt 
b/Documentation/devicetree/bindings/virtio/iommu.txt
deleted file mode 100644
index 2407fea0651c..
--- a/Documentation/devicetree/bindings/virtio/iommu.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-* virtio IOMMU PCI device
-
-When virtio-iommu uses the PCI transport, its programming interface is
-discovered dynamically by the PCI probing infrastructure. However the
-device tree statically describes the relation between IOMMU and DMA
-masters. Therefore, the PCI root complex that hosts the virtio-iommu
-contains a child node representing the IOMMU device explicitly.
-
-Required properties:
-
-- compatible:  Should be "virtio,pci-iommu"
-- reg: PCI address of the IOMMU. As defined in the PCI Bus
-   Binding reference [1], the reg property is a five-cell
-   address encoded as (phys.hi phys.mid phys.lo size.hi
-   size.lo). phys.hi should contain the device's BDF as
-   0b00000000 bbbbbbbb dddddfff 00000000. The other cells
-   should be zero.
-- #iommu-cells:Each platform DMA master managed by the IOMMU is 
assigned
-   an endpoint ID, described by the "iommus" property [2].
-   For virtio-iommu, #iommu-cells must be 1.
-
-Notes:
-
-- DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
-  virtio-iommu node doesn't have an "iommus" property, and is omitted from
-  the iommu-map property of the root complex.
-
-Example:
-
-pcie@1000 {
-   compatible = "pci-host-ecam-generic";
-   ...
-
-   /* The IOMMU programming interface uses slot 00:01.0 */
-   iommu0: iommu@0008 {
-   compatible = "virtio,pci-iommu";
-   reg = <0x0800 0 0 0 0>;
-   #iommu-cells = <1>;
-   };
-
-   /*
-* The IOMMU manages all functions in this PCI domain except
-* itself. Omit BDF 00:01.0.
-*/
-   iommu-map = <0x0  0x0 0x8>
-   <0x9  0x9 0xfff7>;
-};
-
-pcie@2000 {
-   compatible = "pci-host-ecam-generic";
-   ...
-   /*
-* The IOMMU also manages all functions from this domain,
-* with endpoint IDs 0x1 - 0x1
-*/
-   iommu-map = <0x0  0x1 0x1>;
-};
-
-ethernet@fe001000 {
-   ...
-   /* The IOMMU manages this platform device with endpoint ID 0x2 */
-   iommus = < 0x2>;
-};
-
-[1] Documentation/devicetree/bindings/pci/pci.txt
-[2] Documentation/devicetree/bindings/iommu/iommu.txt
diff --git a/Documentation/devicetree/bindings/virtio/iommu.yaml 
b/Documentation/devicetree/bindings/virtio/iommu.yaml
new file mode 100644
index ..ae8b670928d3
--- /dev/null
+++ b/Documentation/devicetree/bindings/virtio/iommu.yaml
@@ -0,0 +1,101 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/virtio/iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: virtio-iommu device using the virtio-pci transport
+
+maintainers:
+  - Jean-Philippe Brucker 
+
+description: |
+  When virtio-iommu uses the PCI transport, its programming interface is
+  discovered dynamically by the PCI probing infrastructure. However the
+  device tree statically describes the relation between IOMMU and DMA
+  masters. Therefore, the PCI root complex that hosts the virtio-iommu
+  contains a child node representing the IOMMU device explicitly.
+
+  DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
+  virtio-iommu node doesn't have an "iommus" property, and is omitted from
+  the iommu-map property of the root complex.
+
+properties:
+  # If compatible is present, it should contain the vendor and device ID
+  # according to the PCI Bus Binding specification. Since PCI provides
+  # built-in identification methods, compatible is not actually required.
+  compatible:
+oneOf:
+  - items:
+  - const: virtio,pci-iommu
+  - const: pci1af4,1057
+  - items:
+  - const: pci1af4,1057
+
+  reg:
+description: |
+  PCI address of the IOMMU. As defined in the PCI Bus Binding
+  reference, the reg 

Re: [PATCH] dt-bindings: virtio: Convert virtio,pci-iommu to DT schema

2022-09-22 Thread Jean-Philippe Brucker
On Sun, Sep 18, 2022 at 10:23:06AM +0100, Krzysztof Kozlowski wrote:
> > +required:
> 
> Also: compatible
> 
> > +  - reg
> > +  - '#iommu-cells'
> > +
> > +additionalProperties: false
> > +
> > +examples:
> > +  - |
> > +pcie0 {
> 
> Node name: pcie
> 
> > +#address-cells = <3>;
> > +#size-cells = <2>;
> > +
> 
> device_type and then you will see a bunch of warnings.

Right, I think I wanted to avoid pulling the whole PCIe baggage because it
requires a lot of properties that aren't relevant to the example. But
having tried it it's not too bad, and it ensures we validate the child
node.

Thanks,
Jean

> 
> > +/*
> > + * The IOMMU manages all functions in this PCI domain except
> > + * itself. Omit BDF 00:01.0.
> > + */
> > +iommu-map = <0x0  0x0 0x8
> > + 0x9  0x9 0xfff7>;
> > +
> > +/* The IOMMU programming interface uses slot 00:01.0 */
> > +iommu0: iommu@1,0 {
> > +compatible = "pci1af4,1057";
> > +reg = <0x800 0 0 0 0>;
> > +#iommu-cells = <1>;
> > +};
> > +};
> > +
> > +pcie1 {
> 
> Node name: pcie (so probably you will need unit address and reg)
> 
> > +/*
> > + * The IOMMU also manages all functions from this domain,
> > + * with endpoint IDs 0x1 - 0x1
> > + */
> > +iommu-map = <0x0  0x1 0x1>;
> > +};
> > +
> > +ethernet {
> > +/* The IOMMU manages this platform device with endpoint ID 0x2 
> > */
> > +iommus = < 0x2>;
> > +};
> > +
> > +...
> 
> 
> Best regards,
> Krzysztof


Re: [RFC PATCH] iommu/virtio: __viommu_sync_req is no need to return a value

2022-09-22 Thread Jean-Philippe Brucker
Hi Liu,

On Thu, Sep 22, 2022 at 07:24:46PM +0800, Liu Song wrote:
> From: Liu Song 
> 
> In "__viommu_sync_req", 0 is always returned as the only return value, no
> return value is needed for this case, and the processes and functions
> involved are adjusted accordingly.
> 
> Signed-off-by: Liu Song 

Thanks for the patch but I'd rather improve __viommu_sync_req() to handle
more errors. At the moment, if the virtqueue breaks then it spins
infinitely waiting for a host response. We should at least check the
return value of virtqueue_kick(), and maybe add a timeout as well although
I'm not sure which time base we can use reliably here.
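
Untested sketch of the direction I mean, reusing the existing viommu_dev
fields (the writeback and status handling in the loop is unchanged and elided
here); callers would then have to propagate the error instead of assuming
success:

static int __viommu_sync_req(struct viommu_dev *viommu)
{
	unsigned int len;
	struct viommu_request *req;
	struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];

	assert_spin_locked(&viommu->request_lock);

	/* A broken queue never delivers responses, don't spin waiting */
	if (!virtqueue_kick(vq))
		return -EPIPE;

	while (!list_empty(&viommu->requests)) {
		if (virtqueue_is_broken(vq))
			return -EPIPE;

		len = 0;
		req = virtqueue_get_buf(vq, &len);
		if (!req)
			continue;

		/* ... existing writeback and status handling ... */

		list_del(&req->list);
		kfree(req);
	}

	return 0;
}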

Thanks,
Jean

> ---
>  drivers/iommu/virtio-iommu.c | 23 ++-
>  1 file changed, 6 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index b7c2280..fde5661 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -151,7 +151,7 @@ static off_t viommu_get_write_desc_offset(struct 
> viommu_dev *viommu,
>   * Wait for all added requests to complete. When this function returns, all
>   * requests that were in-flight at the time of the call have completed.
>   */
> -static int __viommu_sync_req(struct viommu_dev *viommu)
> +static void __viommu_sync_req(struct viommu_dev *viommu)
>  {
>   unsigned int len;
>   size_t write_len;
> @@ -180,22 +180,15 @@ static int __viommu_sync_req(struct viommu_dev *viommu)
>   list_del(&req->list);
>   kfree(req);
>   }
> -
> - return 0;
>  }
>  
> -static int viommu_sync_req(struct viommu_dev *viommu)
> +static void viommu_sync_req(struct viommu_dev *viommu)
>  {
> - int ret;
>   unsigned long flags;
>  
>   spin_lock_irqsave(&viommu->request_lock, flags);
> - ret = __viommu_sync_req(viommu);
> - if (ret)
> - dev_dbg(viommu->dev, "could not sync requests (%d)\n", ret);
> + __viommu_sync_req(viommu);
>   spin_unlock_irqrestore(&viommu->request_lock, flags);
> -
> - return ret;
>  }
>  
>  /*
> @@ -247,8 +240,8 @@ static int __viommu_add_req(struct viommu_dev *viommu, 
> void *buf, size_t len,
>   ret = virtqueue_add_sgs(vq, sg, 1, 1, req, GFP_ATOMIC);
>   if (ret == -ENOSPC) {
>   /* If the queue is full, sync and retry */
> - if (!__viommu_sync_req(viommu))
> - ret = virtqueue_add_sgs(vq, sg, 1, 1, req, GFP_ATOMIC);
> + __viommu_sync_req(viommu);
> + ret = virtqueue_add_sgs(vq, sg, 1, 1, req, GFP_ATOMIC);
>   }
>   if (ret)
>   goto err_free;
> @@ -293,11 +286,7 @@ static int viommu_send_req_sync(struct viommu_dev 
> *viommu, void *buf,
>   goto out_unlock;
>   }
>  
> - ret = __viommu_sync_req(viommu);
> - if (ret) {
> - dev_dbg(viommu->dev, "could not sync requests (%d)\n", ret);
> - /* Fall-through (get the actual request status) */
> - }
> + __viommu_sync_req(viommu);
>  
>   ret = viommu_get_req_errno(buf, len);
>  out_unlock:
> -- 
> 1.8.3.1
> 


Re: [PATCH v4 0/6] Define EINVAL as device/domain incompatibility

2022-09-21 Thread Jean-Philippe Brucker
On Wed, Sep 21, 2022 at 01:22:31AM -0700, Nicolin Chen wrote:
> This series is to replace the previous EMEDIUMTYPE patch in a VFIO series:
> https://lore.kernel.org/kvm/yxnt9uqtmbqul...@8bytes.org/
> 
> The purpose is to regulate all existing ->attach_dev callback functions to
> use EINVAL exclusively for an incompatibility error between a device and a
> domain. This allows VFIO and IOMMUFD to detect such a soft error, and then
> try a different domain with the same device.
[...]
>  drivers/iommu/amd/iommu.c   | 12 +
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 +---
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   |  3 --
>  drivers/iommu/arm/arm-smmu/qcom_iommu.c |  7 +--
>  drivers/iommu/fsl_pamu.c|  2 +-
>  drivers/iommu/fsl_pamu_domain.c |  4 +-
>  drivers/iommu/intel/iommu.c | 10 ++--
>  drivers/iommu/intel/pasid.c |  6 ++-
>  drivers/iommu/iommu.c   | 22 
>  drivers/iommu/ipmmu-vmsa.c  |  2 -
>  drivers/iommu/msm_iommu.c   | 59 +++--
>  drivers/iommu/mtk_iommu.c   |  4 +-
>  drivers/iommu/omap-iommu.c  |  6 +--
>  drivers/iommu/sprd-iommu.c  |  4 +-
>  drivers/iommu/tegra-gart.c  |  2 +-
>  drivers/iommu/virtio-iommu.c    |  7 ++-

For virtio-iommu:

Reviewed-by: Jean-Philippe Brucker 

>  include/linux/iommu.h   | 12 +
>  17 files changed, 90 insertions(+), 83 deletions(-)
> 
> -- 
> 2.17.1
> 


[PATCH] dt-bindings: virtio: Convert virtio,pci-iommu to DT schema

2022-09-16 Thread Jean-Philippe Brucker
Convert the binding that describes the virtio-pci based IOMMU to DT
schema. Change the compatible string to "pci<vendorid>,<deviceid>", which is
defined by the PCI Bus Binding, but keep "virtio,pci-iommu" as an option
for backward compatibility.

Signed-off-by: Jean-Philippe Brucker 
---
 .../devicetree/bindings/virtio/iommu.txt  | 66 --
 .../devicetree/bindings/virtio/iommu.yaml | 86 +++
 2 files changed, 86 insertions(+), 66 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
 create mode 100644 Documentation/devicetree/bindings/virtio/iommu.yaml

diff --git a/Documentation/devicetree/bindings/virtio/iommu.txt 
b/Documentation/devicetree/bindings/virtio/iommu.txt
deleted file mode 100644
index 2407fea0651c..
--- a/Documentation/devicetree/bindings/virtio/iommu.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-* virtio IOMMU PCI device
-
-When virtio-iommu uses the PCI transport, its programming interface is
-discovered dynamically by the PCI probing infrastructure. However the
-device tree statically describes the relation between IOMMU and DMA
-masters. Therefore, the PCI root complex that hosts the virtio-iommu
-contains a child node representing the IOMMU device explicitly.
-
-Required properties:
-
-- compatible:  Should be "virtio,pci-iommu"
-- reg: PCI address of the IOMMU. As defined in the PCI Bus
-   Binding reference [1], the reg property is a five-cell
-   address encoded as (phys.hi phys.mid phys.lo size.hi
-   size.lo). phys.hi should contain the device's BDF as
-   0b00000000 bbbbbbbb dddddfff 00000000. The other cells
-   should be zero.
-- #iommu-cells:Each platform DMA master managed by the IOMMU is 
assigned
-   an endpoint ID, described by the "iommus" property [2].
-   For virtio-iommu, #iommu-cells must be 1.
-
-Notes:
-
-- DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
-  virtio-iommu node doesn't have an "iommus" property, and is omitted from
-  the iommu-map property of the root complex.
-
-Example:
-
-pcie@1000 {
-   compatible = "pci-host-ecam-generic";
-   ...
-
-   /* The IOMMU programming interface uses slot 00:01.0 */
-   iommu0: iommu@0008 {
-   compatible = "virtio,pci-iommu";
-   reg = <0x0800 0 0 0 0>;
-   #iommu-cells = <1>;
-   };
-
-   /*
-* The IOMMU manages all functions in this PCI domain except
-* itself. Omit BDF 00:01.0.
-*/
-   iommu-map = <0x0  0x0 0x8>
-   <0x9  0x9 0xfff7>;
-};
-
-pcie@2000 {
-   compatible = "pci-host-ecam-generic";
-   ...
-   /*
-* The IOMMU also manages all functions from this domain,
-* with endpoint IDs 0x1 - 0x1
-*/
-   iommu-map = <0x0  0x1 0x1>;
-};
-
-ethernet@fe001000 {
-   ...
-   /* The IOMMU manages this platform device with endpoint ID 0x2 */
-   iommus = < 0x2>;
-};
-
-[1] Documentation/devicetree/bindings/pci/pci.txt
-[2] Documentation/devicetree/bindings/iommu/iommu.txt
diff --git a/Documentation/devicetree/bindings/virtio/iommu.yaml 
b/Documentation/devicetree/bindings/virtio/iommu.yaml
new file mode 100644
index ..d5bbb8ab9603
--- /dev/null
+++ b/Documentation/devicetree/bindings/virtio/iommu.yaml
@@ -0,0 +1,86 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/virtio/iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: virtio-iommu device using the virtio-pci transport
+
+maintainers:
+  - Jean-Philippe Brucker 
+
+description: |
+  When virtio-iommu uses the PCI transport, its programming interface is
+  discovered dynamically by the PCI probing infrastructure. However the
+  device tree statically describes the relation between IOMMU and DMA
+  masters. Therefore, the PCI root complex that hosts the virtio-iommu
+  contains a child node representing the IOMMU device explicitly.
+
+  DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
+  virtio-iommu node doesn't have an "iommus" property, and is omitted from
+  the iommu-map property of the root complex.
+
+properties:
+  # If compatible is present, it should contain the vendor and device ID
+  # according to the PCI Bus Binding specification. Since PCI provides
+  # built-in identification methods, compatible is not actually required.
+  compatible:
+oneOf:
+  - items:
+  - const: virtio,pci-iommu
+  - const: pci1af4,1057
+  - items:
+  - const: pci1af4,1057
+
+  reg:
+description: |
+  PCI address of the IOMMU. As defined in the PCI Bus Binding
+  reference, the reg property is a five-cell address encoded as (phys.hi
+  

Re: [PATCH 4/5] iommu: Regulate errno in ->attach_dev callback functions

2022-09-14 Thread Jean-Philippe Brucker
On Wed, Sep 14, 2022 at 06:11:06AM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 13, 2022 at 01:27:03PM +0100, Jean-Philippe Brucker wrote:
> > I think in the future it will be too easy to forget about the constrained
> > return value of attach() while modifying some other part of the driver,
> > and let an external helper return EINVAL. So I'd rather not propagate ret
> > from outside of viommu_domain_attach() and finalise().
> 
> Fortunately, if -EINVAL is wrongly returned it only creates an
> inefficiency, not a functional problem. So we do not need to be
> precise here.

Ah fair. In that case the attach_dev() documentation should indicate that
EINVAL is a hint, so that callers don't rely on it (currently words "must"
and "exclusively" indicate that returning EINVAL for anything other than
device-domain incompatibility is unacceptable). The virtio-iommu
implementation may well return EINVAL from the virtio stack or from the
host response.

Thanks,
Jean

> 
> > Since we can't guarantee that APIs like virtio or ida won't ever return
> > EINVAL, we should set all return values:
> 
> I dislike this alot, it squashes all return codes to try to optimize
> an obscure failure path :(
> 
> Jason


Re: [PATCH 4/5] iommu: Regulate errno in ->attach_dev callback functions

2022-09-13 Thread Jean-Philippe Brucker
Hi Nicolin,

On Tue, Sep 13, 2022 at 01:24:47AM -0700, Nicolin Chen wrote:
> Following the new rules in include/linux/iommu.h kdocs, update all drivers
> ->attach_dev callback functions to return ENODEV error code for all device
> specific errors. It particularly excludes EINVAL from being used for such
> error cases. For the same purpose, also replace one EINVAL with ENOMEM in
> mtk_iommu driver.
> 
> Note that the virtio-iommu does a viommu_domain_map_identity() call, which
> returns either 0 or ENOMEM at this moment. Change to "return ret" directly
> to allow it to pass an EINVAL in the future.
[...]
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 80151176ba12..874c01634d2b 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -696,7 +696,7 @@ static int viommu_domain_finalise(struct viommu_endpoint 
> *vdev,
>   if (ret) {
>   ida_free(&viommu->domain_ids, vdomain->id);
>   vdomain->viommu = NULL;
> - return -EOPNOTSUPP;
> + return ret;

I think in the future it will be too easy to forget about the constrained
return value of attach() while modifying some other part of the driver,
and let an external helper return EINVAL. So I'd rather not propagate ret
from outside of viommu_domain_attach() and finalise().

For the same reason I do prefer this solution over EMEDIUMTYPE, because
it's too tempting to use exotic errno when they seem appropriate instead
of boring ENODEV and EINVAL. The alternative would be adding a special
purpose code to linux/errno.h, similarly to EPROBE_DEFER, but that might
be excessive.

Since we can't guarantee that APIs like virtio or ida won't ever return
EINVAL, we should set all return values:

--- 8< ---
From 7b16796cb78d11971236f98fd2d3cd73ca769827 Mon Sep 17 00:00:00 2001
From: Jean-Philippe Brucker 
Date: Tue, 13 Sep 2022 12:53:02 +0100
Subject: [PATCH] iommu/virtio: Constrain return value of viommu_attach_dev()

Ensure viommu_attach_dev() only return errno values expected from the
attach_dev() op. In particular, only return EINVAL when we're sure that
the device is incompatible with the domain.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 08eeafc9529f..582ff5a33b52 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -669,13 +669,13 @@ static int viommu_domain_finalise(struct viommu_endpoint 
*vdev,
dev_err(vdev->dev,
"granule 0x%lx larger than system page size 0x%lx\n",
viommu_page_size, PAGE_SIZE);
-   return -EINVAL;
+   return -ENODEV;
}
 
	ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
  viommu->last_domain, GFP_KERNEL);
if (ret < 0)
-   return ret;
+   return -ENOMEM;
 
vdomain->id = (unsigned int)ret;
 
@@ -696,7 +696,7 @@ static int viommu_domain_finalise(struct viommu_endpoint 
*vdev,
if (ret) {
		ida_free(&viommu->domain_ids, vdomain->id);
vdomain->viommu = NULL;
-   return -EOPNOTSUPP;
+   return -ENODEV;
}
}
 
@@ -734,7 +734,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
ret = viommu_domain_finalise(vdev, domain);
} else if (vdomain->viommu != vdev->viommu) {
dev_err(dev, "cannot attach to foreign vIOMMU\n");
-   ret = -EXDEV;
+   ret = -EINVAL;
}
	mutex_unlock(&vdomain->mutex);
 
@@ -769,7 +769,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
 
		ret = viommu_send_req_sync(vdomain->viommu, &req, sizeof(req));
if (ret)
-   return ret;
+   return -ENODEV;
}
 
if (!vdomain->nr_endpoints) {
@@ -779,7 +779,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
 */
ret = viommu_replay_mappings(vdomain);
if (ret)
-   return ret;
+   return -ENODEV;
}
 
vdomain->nr_endpoints++;
-- 
2.37.3



Re: [PATCH] iommu/virtio: Fix compile error with viommu_capable()

2022-09-07 Thread Jean-Philippe Brucker
On Wed, Sep 07, 2022 at 05:11:54PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> A recent fix introduced viommu_capable() but other changes
> from Robin change the function signature of the call-back it
> is used for.
> 
> When both changes are merged a compile error will happen
> because the function pointer types mismatch. Fix that by
> updating the viommu_capable() signature after the merge.
> 
> Cc: Jean-Philippe Brucker 
> Cc: Robin Murphy 
> Signed-off-by: Joerg Roedel 

Reviewed-by: Jean-Philippe Brucker 

> ---
>  drivers/iommu/virtio-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index da463db9f12a..1b12825e2df1 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -1005,7 +1005,7 @@ static int viommu_of_xlate(struct device *dev, struct 
> of_phandle_args *args)
>   return iommu_fwspec_add_ids(dev, args->args, 1);
>  }
>  
> -static bool viommu_capable(enum iommu_cap cap)
> +static bool viommu_capable(struct device *dev, enum iommu_cap cap)
>  {
>   switch (cap) {
>   case IOMMU_CAP_CACHE_COHERENCY:
> -- 
> 2.36.1
> 


[PATCH v3] iommu/virtio: Fix interaction with VFIO

2022-08-25 Thread Jean-Philippe Brucker
Commit e8ae0e140c05 ("vfio: Require that devices support DMA cache
coherence") requires IOMMU drivers to advertise
IOMMU_CAP_CACHE_COHERENCY, in order to be used by VFIO. Since VFIO does
not provide to userspace the ability to maintain coherency through cache
invalidations, it requires hardware coherency. Advertise the capability
in order to restore VFIO support.

The meaning of IOMMU_CAP_CACHE_COHERENCY also changed from "IOMMU can
enforce cache coherent DMA transactions" to "IOMMU_CACHE is supported".
While virtio-iommu cannot enforce coherency (of PCIe no-snoop
transactions), it does support IOMMU_CACHE.

We can distinguish different cases of non-coherent DMA:

(1) When accesses from a hardware endpoint are not coherent. The host
would describe such a device using firmware methods ('dma-coherent'
in device-tree, '_CCA' in ACPI), since they are also needed without
a vIOMMU. In this case mappings are created without IOMMU_CACHE.
virtio-iommu doesn't need any additional support. It sends the same
requests as for coherent devices.

(2) When the physical IOMMU supports non-cacheable mappings. Supporting
those would require a new feature in virtio-iommu, new PROBE request
property and MAP flags. Device drivers would use a new API to
discover this since it depends on the architecture and the physical
IOMMU.

(3) When the hardware supports PCIe no-snoop. It is possible for
assigned PCIe devices to issue no-snoop transactions, and the
virtio-iommu specification is lacking any mention of this.

Arm platforms don't necessarily support no-snoop, and those that do
cannot enforce coherency of no-snoop transactions. Device drivers
must be careful about assuming that no-snoop transactions won't end
up cached; see commit e02f5c1bb228 ("drm: disable uncached DMA
optimization for ARM and arm64"). On x86 platforms, the host may or
may not enforce coherency of no-snoop transactions with the physical
IOMMU. But according to the above commit, on x86 a driver which
assumes that no-snoop DMA is compatible with uncached CPU mappings
will also work if the host enforces coherency.

Although these issues are not specific to virtio-iommu, it could be
used to facilitate discovery and configuration of no-snoop. This
would require a new feature bit, PROBE property and ATTACH/MAP
flags.

Cc: sta...@vger.kernel.org
Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Signed-off-by: Jean-Philippe Brucker 
---
Since v2 [1], I tried to refine the commit message.
This fix is needed for v5.19 and v6.0.

I can improve the check once Robin's change [2] is merged:
capable(IOMMU_CAP_CACHE_COHERENCY) could return dev->dma_coherent for
case (1) above.

[1] 
https://lore.kernel.org/linux-iommu/20220818163801.1011548-1-jean-phili...@linaro.org/
[2] 
https://lore.kernel.org/linux-iommu/d8bd8777d06929ad8f49df7fc80e1b9af32a41b5.1660574547.git.robin.mur...@arm.com/
---
 drivers/iommu/virtio-iommu.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 08eeafc9529f..80151176ba12 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1006,7 +1006,18 @@ static int viommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
 }
 
+static bool viommu_capable(enum iommu_cap cap)
+{
+   switch (cap) {
+   case IOMMU_CAP_CACHE_COHERENCY:
+   return true;
+   default:
+   return false;
+   }
+}
+
 static struct iommu_ops viommu_ops = {
+   .capable= viommu_capable,
.domain_alloc   = viommu_domain_alloc,
.probe_device   = viommu_probe_device,
.probe_finalize = viommu_probe_finalize,
-- 
2.37.1



Re: [PATCH v2] iommu/virtio: Fix interaction with VFIO

2022-08-19 Thread Jean-Philippe Brucker
On Thu, Aug 18, 2022 at 09:10:25PM +0100, Robin Murphy wrote:
> On 2022-08-18 17:38, Jean-Philippe Brucker wrote:
> > Commit e8ae0e140c05 ("vfio: Require that devices support DMA cache
> > coherence") requires IOMMU drivers to advertise
> > IOMMU_CAP_CACHE_COHERENCY, in order to be used by VFIO. Since VFIO does
> > not provide to userspace the ability to maintain coherency through cache
> > invalidations, it requires hardware coherency. Advertise the capability
> > in order to restore VFIO support.
> > 
> > The meaning of IOMMU_CAP_CACHE_COHERENCY also changed from "IOMMU can
> > enforce cache coherent DMA transactions" to "IOMMU_CACHE is supported".
> > While virtio-iommu cannot enforce coherency (of PCIe no-snoop
> > transactions), it does support IOMMU_CACHE.
> > 
> > Non-coherent accesses are not currently a concern for virtio-iommu
> > because host OSes only assign coherent devices,
> 
> Is that guaranteed though? I see nothing in VFIO checking *device*
> coherency, only that the *IOMMU* can impose it via this capability, which
> would form a very circular argument.

Yes the wording is wrong here, more like "host OSes only assign devices
whose accesses are coherent". And it's not guaranteed, it's just that I'm still
looking for a realistic counter-example. I guess a good indicator would be
a VMM that presents a device without 'dma-coherent'.

> We can no longer say that in practice
> nobody has a VFIO-capable IOMMU in front of non-coherent PCI, now that
> Rockchip RK3588 boards are about to start shipping (at best we can only say
> that they still won't have the SMMUs in the DT until I've finished ripping
> up the bus ops).

Ah, I was hoping that vfio-pci should only be concerned about no-snoop. Do
you know if your series [2] ensures that the SMMU driver doesn't report
IOMMU_CAP_CACHE_COHERENCY for that system?

> 
> > and the guest does not
> > enable PCIe no-snoop. Nevertheless, we can summarize here the possible
> > support for non-coherent DMA:
> > 
> > (1) When accesses from a hardware endpoint are not coherent. The host
> >  would describe such a device using firmware methods ('dma-coherent'
> >  in device-tree, '_CCA' in ACPI), since they are also needed without
> >  a vIOMMU. In this case mappings are created without IOMMU_CACHE.
> >  virtio-iommu doesn't need any additional support. It sends the same
> >  requests as for coherent devices.
> > 
> > (2) When the physical IOMMU supports non-cacheable mappings. Supporting
> >  those would require a new feature in virtio-iommu, new PROBE request
> >  property and MAP flags. Device drivers would use a new API to
> >  discover this since it depends on the architecture and the physical
> >  IOMMU.
> > 
> > (3) When the hardware supports PCIe no-snoop. Some architecture do not
> >  support this either (whether no-snoop is supported by an Arm system
> >  is not discoverable by software). If Linux did enable no-snoop in
> >  endpoints on x86, then virtio-iommu would need additional feature,
> >  PROBE property, ATTACH and/or MAP flags to support enforcing snoop.
> 
> That's not an "if" - various Linux drivers *do* use no-snoop, which IIUC is
> the main reason for VFIO wanting to enforce this in the first place. For
> example, see the big fat comment in drm_arch_can_wc_memory() if you've
> forgotten the fun we had with AMD GPUs in the TX2 boxes back in the day ;)

Ah duh, I missed that PCI_EXP_DEVCTL_NOSNOOP_EN defaults to 1, of course
it does. So I think VFIO should clear it on Arm and make it read-only,
since the SMMU can't force-snoop like on x86. I'd be tempted to do that if
CONFIG_ARM{,64} is enabled, but checking a new IOMMU capability may be
cleaner.
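
For the clearing itself, something like this in the vfio-pci setup path would
do it (rough sketch only; pdev is the assigned device, and making the bit
read-only in the emulated config space is a separate piece of work):

	/*
	 * The platform IOMMU can't enforce snoop, so don't let the guest
	 * issue no-snoop transactions.
	 */
	pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
				   PCI_EXP_DEVCTL_NOSNOOP_EN);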

Thanks,
Jean

> 
> This is what I was getting at in reply to v1, it's really not a "this is
> fine as things stand" kind of patch, it's a "this is the best we can do to
> be less wrong for expected usage, but still definitely not right".
> Admittedly I downplayed that a little in [2] by deliberately avoiding all
> mention of no-snoop, but only because that's such a horrific unsolvable mess
> it's hardly worth the pain of bringing up...
> 
> Cheers,
> Robin.
> 
> > Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache 
> > coherence")
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> > 
> > Since v1 [1], I added some details to the commit message. This fix is
> > still needed for v5.19 and v6.0.
> > 
> > I can improve the check once Robin's change [2] is merged:
> > capable(IOMMU_CAP_CACHE_COHERENCY) could return d

[PATCH v2] iommu/virtio: Fix interaction with VFIO

2022-08-18 Thread Jean-Philippe Brucker
Commit e8ae0e140c05 ("vfio: Require that devices support DMA cache
coherence") requires IOMMU drivers to advertise
IOMMU_CAP_CACHE_COHERENCY, in order to be used by VFIO. Since VFIO does
not provide to userspace the ability to maintain coherency through cache
invalidations, it requires hardware coherency. Advertise the capability
in order to restore VFIO support.

The meaning of IOMMU_CAP_CACHE_COHERENCY also changed from "IOMMU can
enforce cache coherent DMA transactions" to "IOMMU_CACHE is supported".
While virtio-iommu cannot enforce coherency (of PCIe no-snoop
transactions), it does support IOMMU_CACHE.

Non-coherent accesses are not currently a concern for virtio-iommu
because host OSes only assign coherent devices, and the guest does not
enable PCIe no-snoop. Nevertheless, we can summarize here the possible
support for non-coherent DMA:

(1) When accesses from a hardware endpoint are not coherent. The host
would describe such a device using firmware methods ('dma-coherent'
in device-tree, '_CCA' in ACPI), since they are also needed without
a vIOMMU. In this case mappings are created without IOMMU_CACHE.
virtio-iommu doesn't need any additional support. It sends the same
requests as for coherent devices.

(2) When the physical IOMMU supports non-cacheable mappings. Supporting
those would require a new feature in virtio-iommu, new PROBE request
property and MAP flags. Device drivers would use a new API to
discover this since it depends on the architecture and the physical
IOMMU.

(3) When the hardware supports PCIe no-snoop. Some architecture do not
support this either (whether no-snoop is supported by an Arm system
is not discoverable by software). If Linux did enable no-snoop in
endpoints on x86, then virtio-iommu would need additional feature,
PROBE property, ATTACH and/or MAP flags to support enforcing snoop.

Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Signed-off-by: Jean-Philippe Brucker 
---

Since v1 [1], I added some details to the commit message. This fix is
still needed for v5.19 and v6.0.

I can improve the check once Robin's change [2] is merged:
capable(IOMMU_CAP_CACHE_COHERENCY) could return dev->dma_coherent for
case (1) above.

[1] 
https://lore.kernel.org/linux-iommu/20220714111059.708735-1-jean-phili...@linaro.org/
[2] 
https://lore.kernel.org/linux-iommu/d8bd8777d06929ad8f49df7fc80e1b9af32a41b5.1660574547.git.robin.mur...@arm.com/
---
 drivers/iommu/virtio-iommu.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 08eeafc9529f..80151176ba12 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1006,7 +1006,18 @@ static int viommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
 }
 
+static bool viommu_capable(enum iommu_cap cap)
+{
+   switch (cap) {
+   case IOMMU_CAP_CACHE_COHERENCY:
+   return true;
+   default:
+   return false;
+   }
+}
+
 static struct iommu_ops viommu_ops = {
+   .capable= viommu_capable,
.domain_alloc   = viommu_domain_alloc,
.probe_device   = viommu_probe_device,
.probe_finalize = viommu_probe_finalize,
-- 
2.37.1



Re: [PATCH v1 1/1] iommu/virtio: Do not dereference fwnode in struct device

2022-08-09 Thread Jean-Philippe Brucker
On Mon, Aug 01, 2022 at 07:51:42PM +0300, Andy Shevchenko wrote:
> In order to make the underneath API easier to change in the future,
> prevent users from dereferencing fwnode from struct device.
> Instead, use the specific device_match_fwnode() API for that.
> 
> Signed-off-by: Andy Shevchenko 

Reviewed-by: Jean-Philippe Brucker 

> ---
>  drivers/iommu/virtio-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 08eeafc9529f..9fe723f55213 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -925,7 +925,7 @@ static struct virtio_driver virtio_iommu_drv;
>  
>  static int viommu_match_node(struct device *dev, const void *data)
>  {
> - return dev->parent->fwnode == data;
> + return device_match_fwnode(dev->parent, data);
>  }
>  
>  static struct viommu_dev *viommu_get_by_fwnode(struct fwnode_handle *fwnode)
> -- 
> 2.35.1
> 


Re: [PATCH] iommu/virtio: Advertise IOMMU_CAP_CACHE_COHERENCY

2022-07-22 Thread Jean-Philippe Brucker
On Thu, Jul 14, 2022 at 02:39:32PM +0100, Robin Murphy wrote:
> > In the meantime we do need to restore VFIO support under virtio-iommu,
> > since userspace still expects that to work, and the existing use-cases are
> > coherent devices.
> 
> Yeah, I'm not necessarily against adding this as a horrible bodge for now -
> the reality is that people using VFIO must be doing it on coherent systems
> or it wouldn't be working properly anyway - as long as we all agree that
> that's what it is.
> 
> Next cycle I'll be sending the follow-up patches to bring
> device_iommu_capable() to its final form (hoping the outstanding VDPA patch
> lands in the meantime), at which point we get to sort-of-fix the SMMU
> drivers[1], and can do something similar here too. I guess the main question
> for virtio-iommu is whether it needs to be described/negotiated in the
> protocol itself, or can be reliably described by other standard firmware
> properties (with maybe just a spec note to clarify that coherency must be
> consistent).


What consumers of IOMMU_CAP_CACHE_COHERENCY now want to know, is whether
coherency is managed in HW for one particular endpoint, or if they need to
issue cache maintenance. The latter cannot be handled by VFIO since cache
maintenance is generally privileged.

So I had to list several possibilities regarding non-coherent accesses.
I don't think we need a spec change.

A. Accesses through physical IOMMU are never coherent
-----------------------------------------------------

In this case, translated accesses from the physical device can't access
memory coherently. The host would describe it using existing FW methods
(dma-coherent in DT, _CCA in ACPI) since it's also needed without a
vIOMMU.

No change needed for virtio-iommu, I think, it can support non-coherent
devices. It can also support mixing coherent and non-coherent devices in
the same domain, because domains just allow to multiplex map requests at
the moment. Since we allow sending the same map request onto two different
domains, one with coherent devices and one with non-coherent ones, then we
can also allow using a single domain for that. If the host cannot handle
this, it is allowed to reject attach requests for incompatible devices.

In Linux I think compatible() should include dev->dma_coherent after your
change, or the callers should check dev->dma_coherent themselves
(vfio-platform in particular)
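
Rough sketch of what that could look like on the virtio-iommu side once
capable() takes the device (per your change); dev_is_dma_coherent() is the
helper from <linux/dma-map-ops.h>, and whether a driver is allowed to pull
that header in is a separate question:

static bool viommu_capable(struct device *dev, enum iommu_cap cap)
{
	switch (cap) {
	case IOMMU_CAP_CACHE_COHERENCY:
		/* Case A: a non-coherent endpoint can't get IOMMU_CACHE */
		return dev_is_dma_coherent(dev);
	default:
		return false;
	}
}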


B. Non-cacheable mappings
-------------------------

Here, accesses are normally coherent but the pIOMMU mappings may be
configured to be non-coherent (non-cacheable access type on Arm). If there
is an actual need for this, we could add a feature bit, a probe request
property and a map flag.

In Linux we may want to disallow !IOMMU_CACHE if the device is coherent,
since we don't support this case.
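
Just to illustrate the shape of it, since none of this exists in the spec
today (the names and values below are made up):

/* Hypothetical additions to include/uapi/linux/virtio_iommu.h */
#define VIRTIO_IOMMU_F_NONCACHEABLE		7	/* new feature bit */
#define VIRTIO_IOMMU_PROBE_T_NONCACHEABLE	2	/* new probe property */
#define VIRTIO_IOMMU_MAP_F_NONCACHEABLE		(1 << 3) /* new MAP flag */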


C. PCIe devices performing no-snoop accesses


Accesses are normally coherent but the device may set a transaction bit
requesting the transaction to be non-coherent.

A guest can't enable and use no-snoop in a PCIe device without knowing
whether the system supports it. It's not discoverable on Arm, so a guest
can't use it. On x86 I think it's always supported but the pIOMMU may
enforce snoop (and the guest may be unable to perform cache maintenance?
I didn't follow the whole wbinvd discussions for lack of time).

The problem is the same without a vIOMMU, so I'd defer that to some
firmware method describing no-snoop.


D. Non-coherent virtio-iommu


Non-coherent virtqueues. It's not forbidden by the spec, and a transport
driver could support it, but it's a transport problem and virtio-iommu
doesn't need to know about it.


Did I forget anything?  Otherwise I don't think we need any spec change at
the moment. But when adding support for page tables, we'll have to
consider each of these cases since the guest will be able to set memory
attributes and will care about page walks coherency. That will be bundled
in a probe request along with the other page table information.

Thanks,
Jean


Re: [PATCH] iommu/virtio: Advertise IOMMU_CAP_CACHE_COHERENCY

2022-07-14 Thread Jean-Philippe Brucker
On Thu, Jul 14, 2022 at 01:01:37PM +0100, Robin Murphy wrote:
> On 2022-07-14 12:11, Jean-Philippe Brucker wrote:
> > Fix virtio-iommu interaction with VFIO, as VFIO now requires
> > IOMMU_CAP_CACHE_COHERENCY. virtio-iommu does not support non-cacheable
> > mappings, and always expects to be called with IOMMU_CACHE.
> 
> Can we know this is actually true though? What if the virtio-iommu
> implementation is backed by something other than VFIO, and the underlying
> hardware isn't coherent? AFAICS the spec doesn't disallow that.

Right, I should add a note about that. If someone does actually want to
support non-coherent device, I assume we'll add a per-device property, a
'non-cacheable' mapping flag, and IOMMU_CAP_CACHE_COHERENCY will hold.
I'm also planning to add a check on (IOMMU_CACHE && !IOMMU_NOEXEC) in
viommu_map(), but not as a fix.
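
One possible reading of that check, as an untested sketch at the top of
viommu_map():

	/* Attributes that the virtio-iommu interface can't express */
	if (!(prot & IOMMU_CACHE) || (prot & IOMMU_NOEXEC))
		return -EINVAL;

Whether it should WARN or just reject is still to be decided.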

In the meantime we do need to restore VFIO support under virtio-iommu,
since userspace still expects that to work, and the existing use-cases are
coherent devices.

Thanks,
Jean

> 
> Thanks,
> Robin.
> 
> > Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache 
> > coherence")
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >   drivers/iommu/virtio-iommu.c | 11 +++
> >   1 file changed, 11 insertions(+)
> > 
> > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > index 25be4b822aa0..bf340d779c10 100644
> > --- a/drivers/iommu/virtio-iommu.c
> > +++ b/drivers/iommu/virtio-iommu.c
> > @@ -1006,7 +1006,18 @@ static int viommu_of_xlate(struct device *dev, 
> > struct of_phandle_args *args)
> > return iommu_fwspec_add_ids(dev, args->args, 1);
> >   }
> > +static bool viommu_capable(enum iommu_cap cap)
> > +{
> > +   switch (cap) {
> > +   case IOMMU_CAP_CACHE_COHERENCY:
> > +   return true;
> > +   default:
> > +   return false;
> > +   }
> > +}
> > +
> >   static struct iommu_ops viommu_ops = {
> > +   .capable= viommu_capable,
> > .domain_alloc   = viommu_domain_alloc,
> > .probe_device   = viommu_probe_device,
> > .probe_finalize = viommu_probe_finalize,


[PATCH] iommu/virtio: Advertise IOMMU_CAP_CACHE_COHERENCY

2022-07-14 Thread Jean-Philippe Brucker
Fix virtio-iommu interaction with VFIO, as VFIO now requires
IOMMU_CAP_CACHE_COHERENCY. virtio-iommu does not support non-cacheable
mappings, and always expects to be called with IOMMU_CACHE.

Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 25be4b822aa0..bf340d779c10 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1006,7 +1006,18 @@ static int viommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
 }
 
+static bool viommu_capable(enum iommu_cap cap)
+{
+   switch (cap) {
+   case IOMMU_CAP_CACHE_COHERENCY:
+   return true;
+   default:
+   return false;
+   }
+}
+
 static struct iommu_ops viommu_ops = {
+   .capable= viommu_capable,
.domain_alloc   = viommu_domain_alloc,
.probe_device   = viommu_probe_device,
.probe_finalize = viommu_probe_finalize,
-- 
2.36.1



Re: [GIT PULL] virtio,vdpa,qemu_fw_cfg: features, cleanups, fixes

2022-01-14 Thread Jean-Philippe Brucker
Hi,

On Fri, Jan 14, 2022 at 03:35:15PM -0500, Michael S. Tsirkin wrote:
> Jean-Philippe Brucker (5):
>   iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG
>   iommu/virtio: Support bypass domains
>   iommu/virtio: Sort reserved regions
>   iommu/virtio: Pass end address to viommu_add_mapping()
>   iommu/virtio: Support identity-mapped domains

Please could you drop these patches, they are from an old version of the
series. The newer version was already in Joerg's pull request and was
merged, so this will conflict.

Thanks,
Jean



[PATCH v3 5/5] iommu/virtio: Support identity-mapped domains

2021-12-01 Thread Jean-Philippe Brucker
Support identity domains for devices that do not offer the
VIRTIO_IOMMU_F_BYPASS_CONFIG feature, by creating 1:1 mappings between
the virtual and physical address space. Identity domains created this
way still perform noticeably better than DMA domains, because they don't
have the overhead of setting up and tearing down mappings at runtime.
The performance difference between this and bypass is minimal in
comparison.

It does not matter that the physical addresses in the identity mappings
do not all correspond to memory. By enabling passthrough we are trusting
the device driver and the device itself to only perform DMA to suitable
locations. In some cases it may even be desirable to perform DMA to MMIO
regions.

Reviewed-by: Eric Auger 
Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 61 +---
 1 file changed, 57 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 2fa370c2659c..6a8a52b4297b 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -375,6 +375,55 @@ static size_t viommu_del_mappings(struct viommu_domain 
*vdomain,
return unmapped;
 }
 
+/*
+ * Fill the domain with identity mappings, skipping the device's reserved
+ * regions.
+ */
+static int viommu_domain_map_identity(struct viommu_endpoint *vdev,
+ struct viommu_domain *vdomain)
+{
+   int ret;
+   struct iommu_resv_region *resv;
+   u64 iova = vdomain->domain.geometry.aperture_start;
+   u64 limit = vdomain->domain.geometry.aperture_end;
+   u32 flags = VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE;
+   unsigned long granule = 1UL << __ffs(vdomain->domain.pgsize_bitmap);
+
+   iova = ALIGN(iova, granule);
+   limit = ALIGN_DOWN(limit + 1, granule) - 1;
+
+	list_for_each_entry(resv, &vdev->resv_regions, list) {
+   u64 resv_start = ALIGN_DOWN(resv->start, granule);
+   u64 resv_end = ALIGN(resv->start + resv->length, granule) - 1;
+
+   if (resv_end < iova || resv_start > limit)
+   /* No overlap */
+   continue;
+
+   if (resv_start > iova) {
+   ret = viommu_add_mapping(vdomain, iova, resv_start - 1,
+(phys_addr_t)iova, flags);
+   if (ret)
+   goto err_unmap;
+   }
+
+   if (resv_end >= limit)
+   return 0;
+
+   iova = resv_end + 1;
+   }
+
+   ret = viommu_add_mapping(vdomain, iova, limit, (phys_addr_t)iova,
+flags);
+   if (ret)
+   goto err_unmap;
+   return 0;
+
+err_unmap:
+   viommu_del_mappings(vdomain, 0, iova);
+   return ret;
+}
+
 /*
  * viommu_replay_mappings - re-send MAP requests
  *
@@ -637,14 +686,18 @@ static int viommu_domain_finalise(struct viommu_endpoint 
*vdev,
vdomain->viommu = viommu;
 
if (domain->type == IOMMU_DOMAIN_IDENTITY) {
-   if (!virtio_has_feature(viommu->vdev,
-   VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+   if (virtio_has_feature(viommu->vdev,
+  VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+   vdomain->bypass = true;
+   return 0;
+   }
+
+   ret = viommu_domain_map_identity(vdev, vdomain);
+   if (ret) {
			ida_free(&viommu->domain_ids, vdomain->id);
vdomain->viommu = NULL;
return -EOPNOTSUPP;
}
-
-   vdomain->bypass = true;
}
 
return 0;
-- 
2.34.0



[PATCH v3 3/5] iommu/virtio: Sort reserved regions

2021-12-01 Thread Jean-Philippe Brucker
To ease identity mapping support, keep the list of reserved regions
sorted.

Reviewed-by: Eric Auger 
Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 14dfee76fd19..1b3c1f2741c6 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -423,7 +423,7 @@ static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
size_t size;
u64 start64, end64;
phys_addr_t start, end;
-   struct iommu_resv_region *region = NULL;
+   struct iommu_resv_region *region = NULL, *next;
unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
start = start64 = le64_to_cpu(mem->start);
@@ -454,7 +454,12 @@ static int viommu_add_resv_mem(struct viommu_endpoint 
*vdev,
if (!region)
return -ENOMEM;
 
-	list_add(&region->list, &vdev->resv_regions);
+   /* Keep the list sorted */
+	list_for_each_entry(next, &vdev->resv_regions, list) {
+   if (next->start > region->start)
+   break;
+   }
+	list_add_tail(&region->list, &next->list);
return 0;
 }
 
-- 
2.34.0



[PATCH v3 2/5] iommu/virtio: Support bypass domains

2021-12-01 Thread Jean-Philippe Brucker
The VIRTIO_IOMMU_F_BYPASS_CONFIG feature adds a new flag to the ATTACH
request, that creates a bypass domain. Use it to enable identity
domains.

When VIRTIO_IOMMU_F_BYPASS_CONFIG is not supported by the device, we
currently fail attaching to an identity domain. Future patches will
instead create identity mappings in this case.

Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 80930ce04a16..14dfee76fd19 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -71,6 +71,7 @@ struct viommu_domain {
struct rb_root_cached   mappings;
 
unsigned long   nr_endpoints;
+   boolbypass;
 };
 
 struct viommu_endpoint {
@@ -587,7 +588,9 @@ static struct iommu_domain *viommu_domain_alloc(unsigned 
type)
 {
struct viommu_domain *vdomain;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+   if (type != IOMMU_DOMAIN_UNMANAGED &&
+   type != IOMMU_DOMAIN_DMA &&
+   type != IOMMU_DOMAIN_IDENTITY)
return NULL;
 
vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
@@ -630,6 +633,17 @@ static int viommu_domain_finalise(struct viommu_endpoint 
*vdev,
vdomain->map_flags  = viommu->map_flags;
vdomain->viommu = viommu;
 
+   if (domain->type == IOMMU_DOMAIN_IDENTITY) {
+   if (!virtio_has_feature(viommu->vdev,
+   VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+			ida_free(&viommu->domain_ids, vdomain->id);
+   vdomain->viommu = NULL;
+   return -EOPNOTSUPP;
+   }
+
+   vdomain->bypass = true;
+   }
+
return 0;
 }
 
@@ -691,6 +705,9 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
.domain = cpu_to_le32(vdomain->id),
};
 
+   if (vdomain->bypass)
+   req.flags |= cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS);
+
for (i = 0; i < fwspec->num_ids; i++) {
req.endpoint = cpu_to_le32(fwspec->ids[i]);
 
@@ -1132,6 +1149,7 @@ static unsigned int features[] = {
VIRTIO_IOMMU_F_DOMAIN_RANGE,
VIRTIO_IOMMU_F_PROBE,
VIRTIO_IOMMU_F_MMIO,
+   VIRTIO_IOMMU_F_BYPASS_CONFIG,
 };
 
 static struct virtio_device_id id_table[] = {
-- 
2.34.0



[PATCH v3 1/5] iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG

2021-12-01 Thread Jean-Philippe Brucker
Add definitions for the VIRTIO_IOMMU_F_BYPASS_CONFIG, which supersedes
VIRTIO_IOMMU_F_BYPASS.

Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 include/uapi/linux/virtio_iommu.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_iommu.h 
b/include/uapi/linux/virtio_iommu.h
index 237e36a280cb..1ff357f0d72e 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -16,6 +16,7 @@
 #define VIRTIO_IOMMU_F_BYPASS  3
 #define VIRTIO_IOMMU_F_PROBE   4
 #define VIRTIO_IOMMU_F_MMIO5
+#define VIRTIO_IOMMU_F_BYPASS_CONFIG   6
 
 struct virtio_iommu_range_64 {
__le64  start;
@@ -36,6 +37,8 @@ struct virtio_iommu_config {
	struct virtio_iommu_range_32	domain_range;
	/* Probe buffer size */
	__le32				probe_size;
+	__u8				bypass;
+	__u8				reserved[3];
 };
 
 /* Request types */
@@ -66,11 +69,14 @@ struct virtio_iommu_req_tail {
	__u8				reserved[3];
 };
 
+#define VIRTIO_IOMMU_ATTACH_F_BYPASS   (1 << 0)
+
 struct virtio_iommu_req_attach {
	struct virtio_iommu_req_head	head;
	__le32				domain;
	__le32				endpoint;
-	__u8				reserved[8];
+	__le32				flags;
+	__u8				reserved[4];
	struct virtio_iommu_req_tail	tail;
 };
 
-- 
2.34.0



[PATCH v3 4/5] iommu/virtio: Pass end address to viommu_add_mapping()

2021-12-01 Thread Jean-Philippe Brucker
To support identity mappings, the virtio-iommu driver must be able to
represent full 64-bit ranges internally. Pass (start, end) instead of
(start, size) to viommu_add/del_mapping().

Clean comments. The one about the returned size was never true: when
sweeping the whole address space the returned size will most certainly
be smaller than 2^64.

Reviewed-by: Eric Auger 
Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 1b3c1f2741c6..2fa370c2659c 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -311,8 +311,8 @@ static int viommu_send_req_sync(struct viommu_dev *viommu, 
void *buf,
  *
  * On success, return the new mapping. Otherwise return NULL.
  */
-static int viommu_add_mapping(struct viommu_domain *vdomain, unsigned long 
iova,
- phys_addr_t paddr, size_t size, u32 flags)
+static int viommu_add_mapping(struct viommu_domain *vdomain, u64 iova, u64 end,
+ phys_addr_t paddr, u32 flags)
 {
unsigned long irqflags;
struct viommu_mapping *mapping;
@@ -323,7 +323,7 @@ static int viommu_add_mapping(struct viommu_domain 
*vdomain, unsigned long iova,
 
mapping->paddr  = paddr;
mapping->iova.start = iova;
-   mapping->iova.last  = iova + size - 1;
+   mapping->iova.last  = end;
mapping->flags  = flags;
 
	spin_lock_irqsave(&vdomain->mappings_lock, irqflags);
@@ -338,26 +338,24 @@ static int viommu_add_mapping(struct viommu_domain 
*vdomain, unsigned long iova,
  *
  * @vdomain: the domain
  * @iova: start of the range
- * @size: size of the range. A size of 0 corresponds to the entire address
- * space.
+ * @end: end of the range
  *
- * On success, returns the number of unmapped bytes (>= size)
+ * On success, returns the number of unmapped bytes
  */
 static size_t viommu_del_mappings(struct viommu_domain *vdomain,
- unsigned long iova, size_t size)
+ u64 iova, u64 end)
 {
size_t unmapped = 0;
unsigned long flags;
-   unsigned long last = iova + size - 1;
struct viommu_mapping *mapping = NULL;
struct interval_tree_node *node, *next;
 
	spin_lock_irqsave(&vdomain->mappings_lock, flags);
-	next = interval_tree_iter_first(&vdomain->mappings, iova, last);
+	next = interval_tree_iter_first(&vdomain->mappings, iova, end);
while (next) {
node = next;
mapping = container_of(node, struct viommu_mapping, iova);
-   next = interval_tree_iter_next(node, iova, last);
+   next = interval_tree_iter_next(node, iova, end);
 
/* Trying to split a mapping? */
if (mapping->iova.start < iova)
@@ -656,8 +654,8 @@ static void viommu_domain_free(struct iommu_domain *domain)
 {
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
-   /* Free all remaining mappings (size 2^64) */
-   viommu_del_mappings(vdomain, 0, 0);
+   /* Free all remaining mappings */
+   viommu_del_mappings(vdomain, 0, ULLONG_MAX);
 
if (vdomain->viommu)
		ida_free(&vdomain->viommu->domain_ids, vdomain->id);
@@ -742,6 +740,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
 {
int ret;
u32 flags;
+   u64 end = iova + size - 1;
struct virtio_iommu_req_map map;
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
@@ -752,7 +751,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
if (flags & ~vdomain->map_flags)
return -EINVAL;
 
-   ret = viommu_add_mapping(vdomain, iova, paddr, size, flags);
+   ret = viommu_add_mapping(vdomain, iova, end, paddr, flags);
if (ret)
return ret;
 
@@ -761,7 +760,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
.domain = cpu_to_le32(vdomain->id),
.virt_start = cpu_to_le64(iova),
.phys_start = cpu_to_le64(paddr),
-   .virt_end   = cpu_to_le64(iova + size - 1),
+   .virt_end   = cpu_to_le64(end),
.flags  = cpu_to_le32(flags),
};
 
@@ -770,7 +769,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
 
	ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));
if (ret)
-   viommu_del_mappings(vdomain, iova, size);
+   viommu_del_mappings(vdomain, iova, end);
 
return ret;
 }
@@ -783,7 +782,7 @@ static size_t viommu_unmap(struct iommu_domain *domain, 
unsigned long iova,
struct virtio_iommu_req_unmap un

[PATCH v3 0/5] iommu/virtio: Add identity domains

2021-12-01 Thread Jean-Philippe Brucker
Support identity domains, allowing IOMMU protection to be enabled for only
a subset of endpoints (those assigned to userspace, for example). Users
may enable identity domains at compile time
(CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time (iommu.passthrough=1) or
runtime (/sys/kernel/iommu_groups/*/type = identity).

Since v2 [1] I fixed the padding in patch 1 and a rebase error in patch
5, reported by Eric.

Patches 1-2 support identity domains using the optional
VIRTIO_IOMMU_F_BYPASS_CONFIG feature, which was accepted into the spec
[2]. Patches 3-5 add a fallback to identity mappings, when the feature
is not supported.

QEMU patches are on my virtio-iommu/bypass branch [3], and depend on the
UAPI update.

[1] 
https://lore.kernel.org/linux-iommu/20211123155301.1047943-1-jean-phili...@linaro.org/
[2] https://github.com/oasis-tcs/virtio-spec/issues/119
[3] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/bypass

Jean-Philippe Brucker (5):
  iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG
  iommu/virtio: Support bypass domains
  iommu/virtio: Sort reserved regions
  iommu/virtio: Pass end address to viommu_add_mapping()
  iommu/virtio: Support identity-mapped domains

 include/uapi/linux/virtio_iommu.h |   8 ++-
 drivers/iommu/virtio-iommu.c  | 113 +-
 2 files changed, 101 insertions(+), 20 deletions(-)

-- 
2.34.0



Re: [PATCH v2 5/5] iommu/virtio: Support identity-mapped domains

2021-11-29 Thread Jean-Philippe Brucker
On Sat, Nov 27, 2021 at 06:09:56PM +0100, Eric Auger wrote:
> > -   vdomain->viommu = 0;
> > +   vdomain->viommu = NULL;
> nit: that change could have been done in patch 2

Ah yes, I changed that in v2 but fixed up the wrong patch

> > return -EOPNOTSUPP;
> > }
> > -
> > -   vdomain->bypass = true;
> > }
> >  
> > return 0;
> Besides
> Reviewed-by: Eric Auger 

Thanks!
Jean


Re: [PATCH v2 4/5] iommu/virtio: Pass end address to viommu_add_mapping()

2021-11-29 Thread Jean-Philippe Brucker
On Sat, Nov 27, 2021 at 06:09:56PM -0500, Michael S. Tsirkin wrote:
> > > -static int viommu_add_mapping(struct viommu_domain *vdomain, unsigned 
> > > long iova,
> > > -   phys_addr_t paddr, size_t size, u32 flags)
> > > +static int viommu_add_mapping(struct viommu_domain *vdomain, u64 iova, 
> > > u64 end,
> > > +   phys_addr_t paddr, u32 flags)
> > >  {
> > >   unsigned long irqflags;
> > >   struct viommu_mapping *mapping;
> 
> I am worried that API changes like that will cause subtle
> bugs since types of arguments change but not their
> number. If we forgot to update some callers it will all be messed up.
> 
> How about passing struct Range instead?

I gave struct range a try but it looks messier overall since it would only
be used to pass arguments. I think the update is safe enough because there
is one caller for viommu_add_mapping() and two for viommu_del_mappings(),
at the moment.
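
For reference, the variant I tried looked roughly like this (untested sketch,
using struct range from <linux/range.h>):

	static int viommu_add_mapping(struct viommu_domain *vdomain,
				      const struct range *range,
				      phys_addr_t paddr, u32 flags)
	{
		unsigned long irqflags;
		struct viommu_mapping *mapping;

		mapping = kzalloc(sizeof(*mapping), GFP_ATOMIC);
		if (!mapping)
			return -ENOMEM;

		mapping->paddr      = paddr;
		mapping->iova.start = range->start;
		mapping->iova.last  = range->end;
		mapping->flags      = flags;

		spin_lock_irqsave(&vdomain->mappings_lock, irqflags);
		interval_tree_insert(&mapping->iova, &vdomain->mappings);
		spin_unlock_irqrestore(&vdomain->mappings_lock, irqflags);

		return 0;
	}

	/* at the single call site, in viommu_map(): */
	struct range r = { .start = iova, .end = iova + size - 1 };

	ret = viommu_add_mapping(vdomain, &r, paddr, flags);

The struct only exists to carry two arguments across one call, which is why
it didn't seem worth the churn.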

Thanks,
Jean



Re: [PATCH v2 2/5] iommu/virtio: Support bypass domains

2021-11-29 Thread Jean-Philippe Brucker
On Sat, Nov 27, 2021 at 05:18:28PM +0100, Eric Auger wrote:
> Hi Jean,
> 
> On 11/23/21 4:52 PM, Jean-Philippe Brucker wrote:
> > The VIRTIO_IOMMU_F_BYPASS_CONFIG feature adds a new flag to the ATTACH
> > request, that creates a bypass domain. Use it to enable identity
> > domains.
> >
> > When VIRTIO_IOMMU_F_BYPASS_CONFIG is not supported by the device, we
> > currently fail attaching to an identity domain. Future patches will
> > instead create identity mappings in this case.
> >
> > Reviewed-by: Kevin Tian 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >  drivers/iommu/virtio-iommu.c | 20 +++-
> >  1 file changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > index 80930ce04a16..ee8a7afd667b 100644
> > --- a/drivers/iommu/virtio-iommu.c
> > +++ b/drivers/iommu/virtio-iommu.c
> > @@ -71,6 +71,7 @@ struct viommu_domain {
> > struct rb_root_cached   mappings;
> >  
> > unsigned long   nr_endpoints;
> > +   boolbypass;
> >  };
> >  
> >  struct viommu_endpoint {
> > @@ -587,7 +588,9 @@ static struct iommu_domain 
> > *viommu_domain_alloc(unsigned type)
> >  {
> > struct viommu_domain *vdomain;
> >  
> > -   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
> > +   if (type != IOMMU_DOMAIN_UNMANAGED &&
> > +   type != IOMMU_DOMAIN_DMA &&
> > +   type != IOMMU_DOMAIN_IDENTITY)
> > return NULL;
> >  
> > vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
> > @@ -630,6 +633,17 @@ static int viommu_domain_finalise(struct 
> > viommu_endpoint *vdev,
> > vdomain->map_flags  = viommu->map_flags;
> > vdomain->viommu = viommu;
> >  
> > +   if (domain->type == IOMMU_DOMAIN_IDENTITY) {
> > +   if (!virtio_has_feature(viommu->vdev,
> nit: couldn't the check be done before the ida_alloc_range(),
> simplifying the failure cleanup?

It could, but patch 5 falls back to identity mappings, which is better
left at the end of the function to keep the error path simple. I put this
at the end already here, so patch 5 doesn't need to move things around.

Thanks,
Jean


Re: [PATCH v2 1/5] iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG

2021-11-29 Thread Jean-Philippe Brucker
Hi Eric,

On Sat, Nov 27, 2021 at 08:59:25AM +0100, Eric Auger wrote:
> > @@ -36,6 +37,8 @@ struct virtio_iommu_config {
> > struct virtio_iommu_range_32domain_range;
> > /* Probe buffer size */
> > __le32  probe_size;
> > +   __u8bypass;
> > +   __u8reserved[7];
> in [PATCH v3] virtio-iommu: Rework the bypass feature I see
> 
> +  u8 bypass;
> +  u8 reserved[3];
> 
> What was exactly voted?

Good catch, this should be 3. It brings the config struct to 40 bytes,
which is the size compilers generate when there is no reserved field.
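
For reference, the resulting layout (a sketch of the offsets; there is no
implicit padding since every field is naturally aligned):

	struct virtio_iommu_config {
		__le64				page_size_mask;	/* offset  0, 8 bytes  */
		struct virtio_iommu_range_64	input_range;	/* offset  8, 16 bytes */
		struct virtio_iommu_range_32	domain_range;	/* offset 24, 8 bytes  */
		__le32				probe_size;	/* offset 32, 4 bytes  */
		__u8				bypass;		/* offset 36, 1 byte   */
		__u8				reserved[3];	/* offset 37, 3 bytes  */
	};						/* total: 40 bytes     */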

Thanks,
Jean


[PATCH v2 5/5] iommu/virtio: Support identity-mapped domains

2021-11-23 Thread Jean-Philippe Brucker
Support identity domains for devices that do not offer the
VIRTIO_IOMMU_F_BYPASS_CONFIG feature, by creating 1:1 mappings between
the virtual and physical address space. Identity domains created this
way still perform noticeably better than DMA domains, because they don't
have the overhead of setting up and tearing down mappings at runtime.
The performance difference between this and bypass is minimal in
comparison.

It does not matter that the physical addresses in the identity mappings
do not all correspond to memory. By enabling passthrough we are trusting
the device driver and the device itself to only perform DMA to suitable
locations. In some cases it may even be desirable to perform DMA to MMIO
regions.

Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 63 +---
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index eceb9281c8c1..6a8a52b4297b 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -375,6 +375,55 @@ static size_t viommu_del_mappings(struct viommu_domain 
*vdomain,
return unmapped;
 }
 
+/*
+ * Fill the domain with identity mappings, skipping the device's reserved
+ * regions.
+ */
+static int viommu_domain_map_identity(struct viommu_endpoint *vdev,
+ struct viommu_domain *vdomain)
+{
+   int ret;
+   struct iommu_resv_region *resv;
+   u64 iova = vdomain->domain.geometry.aperture_start;
+   u64 limit = vdomain->domain.geometry.aperture_end;
+   u32 flags = VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE;
+   unsigned long granule = 1UL << __ffs(vdomain->domain.pgsize_bitmap);
+
+   iova = ALIGN(iova, granule);
+   limit = ALIGN_DOWN(limit + 1, granule) - 1;
+
+	list_for_each_entry(resv, &vdev->resv_regions, list) {
+   u64 resv_start = ALIGN_DOWN(resv->start, granule);
+   u64 resv_end = ALIGN(resv->start + resv->length, granule) - 1;
+
+   if (resv_end < iova || resv_start > limit)
+   /* No overlap */
+   continue;
+
+   if (resv_start > iova) {
+   ret = viommu_add_mapping(vdomain, iova, resv_start - 1,
+(phys_addr_t)iova, flags);
+   if (ret)
+   goto err_unmap;
+   }
+
+   if (resv_end >= limit)
+   return 0;
+
+   iova = resv_end + 1;
+   }
+
+   ret = viommu_add_mapping(vdomain, iova, limit, (phys_addr_t)iova,
+flags);
+   if (ret)
+   goto err_unmap;
+   return 0;
+
+err_unmap:
+   viommu_del_mappings(vdomain, 0, iova);
+   return ret;
+}
+
 /*
  * viommu_replay_mappings - re-send MAP requests
  *
@@ -637,14 +686,18 @@ static int viommu_domain_finalise(struct viommu_endpoint 
*vdev,
vdomain->viommu = viommu;
 
if (domain->type == IOMMU_DOMAIN_IDENTITY) {
-   if (!virtio_has_feature(viommu->vdev,
-   VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+   if (virtio_has_feature(viommu->vdev,
+  VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+   vdomain->bypass = true;
+   return 0;
+   }
+
+   ret = viommu_domain_map_identity(vdev, vdomain);
+   if (ret) {
			ida_free(&viommu->domain_ids, vdomain->id);
-   vdomain->viommu = 0;
+   vdomain->viommu = NULL;
return -EOPNOTSUPP;
}
-
-   vdomain->bypass = true;
}
 
return 0;
-- 
2.33.1



[PATCH v2 3/5] iommu/virtio: Sort reserved regions

2021-11-23 Thread Jean-Philippe Brucker
To ease identity mapping support, keep the list of reserved regions
sorted.

Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index ee8a7afd667b..d63ec4d11b00 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -423,7 +423,7 @@ static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
size_t size;
u64 start64, end64;
phys_addr_t start, end;
-   struct iommu_resv_region *region = NULL;
+   struct iommu_resv_region *region = NULL, *next;
unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
start = start64 = le64_to_cpu(mem->start);
@@ -454,7 +454,12 @@ static int viommu_add_resv_mem(struct viommu_endpoint 
*vdev,
if (!region)
return -ENOMEM;
 
-	list_add(&region->list, &vdev->resv_regions);
+	/* Keep the list sorted */
+	list_for_each_entry(next, &vdev->resv_regions, list) {
+		if (next->start > region->start)
+			break;
+	}
+	list_add_tail(&region->list, &next->list);
return 0;
 }
 
-- 
2.33.1



[PATCH v2 4/5] iommu/virtio: Pass end address to viommu_add_mapping()

2021-11-23 Thread Jean-Philippe Brucker
To support identity mappings, the virtio-iommu driver must be able to
represent full 64-bit ranges internally. Pass (start, end) instead of
(start, size) to viommu_add/del_mapping().

Clean comments. The one about the returned size was never true: when
sweeping the whole address space the returned size will most certainly
be smaller than 2^64.

Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index d63ec4d11b00..eceb9281c8c1 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -311,8 +311,8 @@ static int viommu_send_req_sync(struct viommu_dev *viommu, 
void *buf,
  *
  * On success, return the new mapping. Otherwise return NULL.
  */
-static int viommu_add_mapping(struct viommu_domain *vdomain, unsigned long 
iova,
- phys_addr_t paddr, size_t size, u32 flags)
+static int viommu_add_mapping(struct viommu_domain *vdomain, u64 iova, u64 end,
+ phys_addr_t paddr, u32 flags)
 {
unsigned long irqflags;
struct viommu_mapping *mapping;
@@ -323,7 +323,7 @@ static int viommu_add_mapping(struct viommu_domain 
*vdomain, unsigned long iova,
 
mapping->paddr  = paddr;
mapping->iova.start = iova;
-   mapping->iova.last  = iova + size - 1;
+   mapping->iova.last  = end;
mapping->flags  = flags;
 
	spin_lock_irqsave(&vdomain->mappings_lock, irqflags);
@@ -338,26 +338,24 @@ static int viommu_add_mapping(struct viommu_domain 
*vdomain, unsigned long iova,
  *
  * @vdomain: the domain
  * @iova: start of the range
- * @size: size of the range. A size of 0 corresponds to the entire address
- * space.
+ * @end: end of the range
  *
- * On success, returns the number of unmapped bytes (>= size)
+ * On success, returns the number of unmapped bytes
  */
 static size_t viommu_del_mappings(struct viommu_domain *vdomain,
- unsigned long iova, size_t size)
+ u64 iova, u64 end)
 {
size_t unmapped = 0;
unsigned long flags;
-   unsigned long last = iova + size - 1;
struct viommu_mapping *mapping = NULL;
struct interval_tree_node *node, *next;
 
	spin_lock_irqsave(&vdomain->mappings_lock, flags);
-	next = interval_tree_iter_first(&vdomain->mappings, iova, last);
+	next = interval_tree_iter_first(&vdomain->mappings, iova, end);
while (next) {
node = next;
mapping = container_of(node, struct viommu_mapping, iova);
-   next = interval_tree_iter_next(node, iova, last);
+   next = interval_tree_iter_next(node, iova, end);
 
/* Trying to split a mapping? */
if (mapping->iova.start < iova)
@@ -656,8 +654,8 @@ static void viommu_domain_free(struct iommu_domain *domain)
 {
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
-   /* Free all remaining mappings (size 2^64) */
-   viommu_del_mappings(vdomain, 0, 0);
+   /* Free all remaining mappings */
+   viommu_del_mappings(vdomain, 0, ULLONG_MAX);
 
if (vdomain->viommu)
		ida_free(&vdomain->viommu->domain_ids, vdomain->id);
@@ -742,6 +740,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
 {
int ret;
u32 flags;
+   u64 end = iova + size - 1;
struct virtio_iommu_req_map map;
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
@@ -752,7 +751,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
if (flags & ~vdomain->map_flags)
return -EINVAL;
 
-   ret = viommu_add_mapping(vdomain, iova, paddr, size, flags);
+   ret = viommu_add_mapping(vdomain, iova, end, paddr, flags);
if (ret)
return ret;
 
@@ -761,7 +760,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
.domain = cpu_to_le32(vdomain->id),
.virt_start = cpu_to_le64(iova),
.phys_start = cpu_to_le64(paddr),
-   .virt_end   = cpu_to_le64(iova + size - 1),
+   .virt_end   = cpu_to_le64(end),
.flags  = cpu_to_le32(flags),
};
 
@@ -770,7 +769,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
 
	ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));
if (ret)
-   viommu_del_mappings(vdomain, iova, size);
+   viommu_del_mappings(vdomain, iova, end);
 
return ret;
 }
@@ -783,7 +782,7 @@ static size_t viommu_unmap(struct iommu_domain *domain, 
unsigned long iova,
struct virtio_iommu_req_unmap unmap;
  

[PATCH v2 2/5] iommu/virtio: Support bypass domains

2021-11-23 Thread Jean-Philippe Brucker
The VIRTIO_IOMMU_F_BYPASS_CONFIG feature adds a new flag to the ATTACH
request, that creates a bypass domain. Use it to enable identity
domains.

When VIRTIO_IOMMU_F_BYPASS_CONFIG is not supported by the device, we
currently fail attaching to an identity domain. Future patches will
instead create identity mappings in this case.

Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 80930ce04a16..ee8a7afd667b 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -71,6 +71,7 @@ struct viommu_domain {
struct rb_root_cached   mappings;
 
unsigned long   nr_endpoints;
+   boolbypass;
 };
 
 struct viommu_endpoint {
@@ -587,7 +588,9 @@ static struct iommu_domain *viommu_domain_alloc(unsigned 
type)
 {
struct viommu_domain *vdomain;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+   if (type != IOMMU_DOMAIN_UNMANAGED &&
+   type != IOMMU_DOMAIN_DMA &&
+   type != IOMMU_DOMAIN_IDENTITY)
return NULL;
 
vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
@@ -630,6 +633,17 @@ static int viommu_domain_finalise(struct viommu_endpoint 
*vdev,
vdomain->map_flags  = viommu->map_flags;
vdomain->viommu = viommu;
 
+   if (domain->type == IOMMU_DOMAIN_IDENTITY) {
+   if (!virtio_has_feature(viommu->vdev,
+   VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+			ida_free(&viommu->domain_ids, vdomain->id);
+   vdomain->viommu = 0;
+   return -EOPNOTSUPP;
+   }
+
+   vdomain->bypass = true;
+   }
+
return 0;
 }
 
@@ -691,6 +705,9 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
.domain = cpu_to_le32(vdomain->id),
};
 
+   if (vdomain->bypass)
+   req.flags |= cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS);
+
for (i = 0; i < fwspec->num_ids; i++) {
req.endpoint = cpu_to_le32(fwspec->ids[i]);
 
@@ -1132,6 +1149,7 @@ static unsigned int features[] = {
VIRTIO_IOMMU_F_DOMAIN_RANGE,
VIRTIO_IOMMU_F_PROBE,
VIRTIO_IOMMU_F_MMIO,
+   VIRTIO_IOMMU_F_BYPASS_CONFIG,
 };
 
 static struct virtio_device_id id_table[] = {
-- 
2.33.1



[PATCH v2 1/5] iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG

2021-11-23 Thread Jean-Philippe Brucker
Add definitions for the VIRTIO_IOMMU_F_BYPASS_CONFIG, which supersedes
VIRTIO_IOMMU_F_BYPASS.

Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 include/uapi/linux/virtio_iommu.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_iommu.h 
b/include/uapi/linux/virtio_iommu.h
index 237e36a280cb..cafd8cf7febf 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -16,6 +16,7 @@
 #define VIRTIO_IOMMU_F_BYPASS  3
 #define VIRTIO_IOMMU_F_PROBE   4
 #define VIRTIO_IOMMU_F_MMIO5
+#define VIRTIO_IOMMU_F_BYPASS_CONFIG   6
 
 struct virtio_iommu_range_64 {
__le64  start;
@@ -36,6 +37,8 @@ struct virtio_iommu_config {
struct virtio_iommu_range_32domain_range;
/* Probe buffer size */
__le32  probe_size;
+   __u8bypass;
+   __u8reserved[7];
 };
 
 /* Request types */
@@ -66,11 +69,14 @@ struct virtio_iommu_req_tail {
__u8reserved[3];
 };
 
+#define VIRTIO_IOMMU_ATTACH_F_BYPASS   (1 << 0)
+
 struct virtio_iommu_req_attach {
struct virtio_iommu_req_headhead;
__le32  domain;
__le32  endpoint;
-   __u8reserved[8];
+   __le32  flags;
+   __u8reserved[4];
struct virtio_iommu_req_tailtail;
 };
 
-- 
2.33.1



[PATCH v2 0/5] iommu/virtio: Add identity domains

2021-11-23 Thread Jean-Philippe Brucker
Support identity domains, allowing IOMMU protection to be enabled for only
a subset of endpoints (those assigned to userspace, for example). Users
may enable identity domains at compile time
(CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time (iommu.passthrough=1) or
runtime (/sys/kernel/iommu_groups/*/type = identity).

Since v1 [1] I rebased onto v5.16-rc and added Kevin's review tag.
The specification update for the new feature has now been accepted [2].

Patches 1-2 support identity domains using the optional
VIRTIO_IOMMU_F_BYPASS_CONFIG feature, and patches 3-5 add a fallback to
identity mappings, when the feature is not supported.

QEMU patches are on my virtio-iommu/bypass branch [3], and depend on the
UAPI update.

[1] 
https://lore.kernel.org/linux-iommu/20211013121052.518113-1-jean-phili...@linaro.org/
[2] https://github.com/oasis-tcs/virtio-spec/issues/119
[3] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/bypass

Jean-Philippe Brucker (5):
  iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG
  iommu/virtio: Support bypass domains
  iommu/virtio: Sort reserved regions
  iommu/virtio: Pass end address to viommu_add_mapping()
  iommu/virtio: Support identity-mapped domains

 include/uapi/linux/virtio_iommu.h |   8 ++-
 drivers/iommu/virtio-iommu.c  | 113 +-
 2 files changed, 101 insertions(+), 20 deletions(-)

-- 
2.33.1



Re: [PATCH 0/5] iommu/virtio: Add identity domains

2021-10-22 Thread Jean-Philippe Brucker
On Fri, Oct 22, 2021 at 06:16:27AM -0400, Michael S. Tsirkin wrote:
> On Wed, Oct 13, 2021 at 01:10:48PM +0100, Jean-Philippe Brucker wrote:
> > Support identity domains, allowing to only enable IOMMU protection for a
> > subset of endpoints (those assigned to userspace, for example). Users
> > may enable identity domains at compile time
> > (CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time (iommu.passthrough=1) or
> > runtime (/sys/kernel/iommu_groups/*/type = identity).
> 
> 
> I put this in my branch so it can get testing under linux-next,
> but pls notice if the ballot does not conclude in time
> for the merge window I won't send it to Linus.

Makes sense, thank you. I sent a new version of the spec change with
clarifications
https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07969.html

Thanks,
Jean



Re: [PATCH 0/5] iommu/virtio: Add identity domains

2021-10-19 Thread Jean-Philippe Brucker
On Tue, Oct 19, 2021 at 09:22:13AM +0800, Jason Wang wrote:
> > > So I think clarifying system reset should address your questions.
> > > I believe we should leave bypass sticky across device reset, so a FW->OS
> > > transition, where the OS resets the device, does not open a vulnerability
> > > (if bypass was enabled at boot and then left disabled by FW.)
> > >
> > > It's still a good idea for the OS to restore on shutdown the bypass value
> > > it was booted with. So it can kexec into an OS that doesn't support
> > > virtio-iommu, for example.
> > >
> > > Thanks,
> > > Jean
> >
> > Is this stickiness really important?

It is important when FW has to hand the IOMMU over to the OS while keeping
DMA disabled for all endpoints. For example DMA was globally disabled on
boot through some external mechanism (e.g. Bus Master Enable in PCI
bridges), and FW disables IOMMU bypass before enabling Bus Master, and
there are some untrusted endpoints in the system that should never be
allowed to perform arbitrary DMA. If a side effect of resetting the IOMMU
is to enable bypass, then the OS opens a vulnerability without knowing it.
That's a real problem on hardware platforms, but maybe too far fetched on
virtual ones.

> > Can't this be addressed just by hypervisor disabling bypass at boot?

Yes I suppose we have that option. If we make bypass non-sticky, we're
preventing FW from working around vulnerable device implementations, but
fixing the implementation itself is much easier in virtualization than in
hardware.

> And I'm not sure if sticky can survive transport reset.

I thought "device reset" includes transport reset as well?  There seems to
be a precedent with virtio-mem which keeps state across device reset. And
PCI allows sticky registers across FLR (RWS registers).

Thanks,
Jean


Re: [PATCH 0/5] iommu/virtio: Add identity domains

2021-10-18 Thread Jean-Philippe Brucker
On Thu, Oct 14, 2021 at 03:00:38AM +, Tian, Kevin wrote:
> > From: Jean-Philippe Brucker 
> > Sent: Wednesday, October 13, 2021 8:11 PM
> > 
> > Support identity domains, allowing to only enable IOMMU protection for a
> > subset of endpoints (those assigned to userspace, for example). Users
> > may enable identity domains at compile time
> > (CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time
> > (iommu.passthrough=1) or
> > runtime (/sys/kernel/iommu_groups/*/type = identity).
> 
> Do we want to use consistent terms between spec (bypass domain) 
> and code (identity domain)? 

I don't think we have to. Linux uses "identity" domains and "passthrough"
IOMMU. The old virtio-iommu feature was "bypass" so we should keep that
for the new one, to be consistent. And then I've used "bypass" for domains
as well, in the spec, because it would look strange to use a different
term for the same concept. I find that it sort of falls into place: Linux'
identity domains can be implemented either with bypass or identity-mapped
virtio-iommu domains.

> > 
> > Patches 1-2 support identity domains using the optional
> > VIRTIO_IOMMU_F_BYPASS_CONFIG feature. The feature bit is not yet in the
> > spec, see [1] for the latest proposal.
> > 
> > Patches 3-5 add a fallback to identity mappings, when the feature is not
> > supported.
> > 
> > Note that this series doesn't touch the global bypass bit added by
> > VIRTIO_IOMMU_F_BYPASS_CONFIG. All endpoints managed by the IOMMU
> > should
> > be attached to a domain, so global bypass isn't in use after endpoints
> 
> I saw a concept of deferred attach in iommu core. See iommu_is_
> attach_deferred(). Currently this is vendor specific and I haven't
> looked into the exact reason why some vendor sets it now. Just
> be curious whether the same reason might be applied to virtio-iommu.
> 
> > are probed. Before that, the global bypass policy is decided by the
> > hypervisor and firmware. So I don't think Linux needs to touch the
> 
> This reminds me one thing. The spec says that the global bypass
> bit is sticky and not affected by reset.

The spec talks about *device* reset, triggered by software writing 0 to
the status register, but it doesn't mention system reset. Would be good to
clarify that. Something like:

If the device offers the VIRTIO_IOMMU_F_BYPASS_CONFIG feature, it MAY
initialize the \field{bypass} field to 1. Field \field{bypass} SHOULD
NOT change on device reset, but SHOULD be restored to its initial
value on system reset.

> This implies that in the case
> of rebooting the VM into a different OS, the previous OS actually
> has the right to override this setting for the next OS. Is it a right
> design? Even the firmware itself is unable to identify the original
> setting enforced by the hypervisor after reboot. I feel the hypervisor
> setting should be recovered after reset since it reflects the 
> security measure enforced by the virtual platform?

So I think clarifying system reset should address your questions.
I believe we should leave bypass sticky across device reset, so a FW->OS
transition, where the OS resets the device, does not open a vulnerability
(if bypass was enabled at boot and then left disabled by FW.)

It's still a good idea for the OS to restore on shutdown the bypass value
it was booted with. So it can kexec into an OS that doesn't support
virtio-iommu, for example.

Thanks,
Jean



Re: [PATCH RFC] virtio: wrap config->reset calls

2021-10-14 Thread Jean-Philippe Brucker
On Wed, Oct 13, 2021 at 06:55:31AM -0400, Michael S. Tsirkin wrote:
> This will enable cleanups down the road.
> The idea is to disable cbs, then add "flush_queued_cbs" callback
> as a parameter, this way drivers can flush any work
> queued after callbacks have been disabled.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---

>  drivers/iommu/virtio-iommu.c   | 2 +-

Reviewed-by: Jean-Philippe Brucker 


[PATCH 2/5] iommu/virtio: Support bypass domains

2021-10-13 Thread Jean-Philippe Brucker
The VIRTIO_IOMMU_F_BYPASS_CONFIG feature adds a new flag to the ATTACH
request, that creates a bypass domain. Use it to enable identity
domains.

When VIRTIO_IOMMU_F_BYPASS_CONFIG is not supported by the device, we
currently fail attaching to an identity domain. Future patches will
instead create identity mappings in this case.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 80930ce04a16..ee8a7afd667b 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -71,6 +71,7 @@ struct viommu_domain {
struct rb_root_cached   mappings;
 
unsigned long   nr_endpoints;
+   boolbypass;
 };
 
 struct viommu_endpoint {
@@ -587,7 +588,9 @@ static struct iommu_domain *viommu_domain_alloc(unsigned 
type)
 {
struct viommu_domain *vdomain;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+   if (type != IOMMU_DOMAIN_UNMANAGED &&
+   type != IOMMU_DOMAIN_DMA &&
+   type != IOMMU_DOMAIN_IDENTITY)
return NULL;
 
vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
@@ -630,6 +633,17 @@ static int viommu_domain_finalise(struct viommu_endpoint 
*vdev,
vdomain->map_flags  = viommu->map_flags;
vdomain->viommu = viommu;
 
+   if (domain->type == IOMMU_DOMAIN_IDENTITY) {
+   if (!virtio_has_feature(viommu->vdev,
+   VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+			ida_free(&viommu->domain_ids, vdomain->id);
+   vdomain->viommu = 0;
+   return -EOPNOTSUPP;
+   }
+
+   vdomain->bypass = true;
+   }
+
return 0;
 }
 
@@ -691,6 +705,9 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
.domain = cpu_to_le32(vdomain->id),
};
 
+   if (vdomain->bypass)
+   req.flags |= cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS);
+
for (i = 0; i < fwspec->num_ids; i++) {
req.endpoint = cpu_to_le32(fwspec->ids[i]);
 
@@ -1132,6 +1149,7 @@ static unsigned int features[] = {
VIRTIO_IOMMU_F_DOMAIN_RANGE,
VIRTIO_IOMMU_F_PROBE,
VIRTIO_IOMMU_F_MMIO,
+   VIRTIO_IOMMU_F_BYPASS_CONFIG,
 };
 
 static struct virtio_device_id id_table[] = {
-- 
2.33.0



[PATCH 3/5] iommu/virtio: Sort reserved regions

2021-10-13 Thread Jean-Philippe Brucker
To ease identity mapping support, keep the list of reserved regions
sorted.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index ee8a7afd667b..d63ec4d11b00 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -423,7 +423,7 @@ static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
size_t size;
u64 start64, end64;
phys_addr_t start, end;
-   struct iommu_resv_region *region = NULL;
+   struct iommu_resv_region *region = NULL, *next;
unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
start = start64 = le64_to_cpu(mem->start);
@@ -454,7 +454,12 @@ static int viommu_add_resv_mem(struct viommu_endpoint 
*vdev,
if (!region)
return -ENOMEM;
 
-	list_add(&region->list, &vdev->resv_regions);
+	/* Keep the list sorted */
+	list_for_each_entry(next, &vdev->resv_regions, list) {
+		if (next->start > region->start)
+			break;
+	}
+	list_add_tail(&region->list, &next->list);
return 0;
 }
 
-- 
2.33.0



[PATCH 0/5] iommu/virtio: Add identity domains

2021-10-13 Thread Jean-Philippe Brucker
Support identity domains, allowing IOMMU protection to be enabled for only
a subset of endpoints (those assigned to userspace, for example). Users
may enable identity domains at compile time
(CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time (iommu.passthrough=1) or
runtime (/sys/kernel/iommu_groups/*/type = identity).

Patches 1-2 support identity domains using the optional
VIRTIO_IOMMU_F_BYPASS_CONFIG feature. The feature bit is not yet in the
spec, see [1] for the latest proposal.

Patches 3-5 add a fallback to identity mappings, when the feature is not
supported.

Note that this series doesn't touch the global bypass bit added by
VIRTIO_IOMMU_F_BYPASS_CONFIG. All endpoints managed by the IOMMU should
be attached to a domain, so global bypass isn't in use after endpoints
are probed. Before that, the global bypass policy is decided by the
hypervisor and firmware. So I don't think Linux needs to touch the
global bypass bit, but there are some patches available on my
virtio-iommu/bypass branch [2] to test it.

QEMU patches are on my virtio-iommu/bypass branch [3] (and the list)

[1] https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07898.html
[2] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/bypass
[3] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/bypass

Jean-Philippe Brucker (5):
  iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG
  iommu/virtio: Support bypass domains
  iommu/virtio: Sort reserved regions
  iommu/virtio: Pass end address to viommu_add_mapping()
  iommu/virtio: Support identity-mapped domains

 include/uapi/linux/virtio_iommu.h |   8 ++-
 drivers/iommu/virtio-iommu.c  | 113 +-
 2 files changed, 101 insertions(+), 20 deletions(-)

-- 
2.33.0



[PATCH 5/5] iommu/virtio: Support identity-mapped domains

2021-10-13 Thread Jean-Philippe Brucker
Support identity domains for devices that do not offer the
VIRTIO_IOMMU_F_BYPASS_CONFIG feature, by creating 1:1 mappings between
the virtual and physical address space. Identity domains created this
way still perform noticeably better than DMA domains, because they don't
have the overhead of setting up and tearing down mappings at runtime.
The performance difference between this and bypass is minimal in
comparison.

It does not matter that the physical addresses in the identity mappings
do not all correspond to memory. By enabling passthrough we are trusting
the device driver and the device itself to only perform DMA to suitable
locations. In some cases it may even be desirable to perform DMA to MMIO
regions.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 61 +---
 1 file changed, 57 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index eceb9281c8c1..c9e8367d2962 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -375,6 +375,55 @@ static size_t viommu_del_mappings(struct viommu_domain 
*vdomain,
return unmapped;
 }
 
+/*
+ * Fill the domain with identity mappings, skipping the device's reserved
+ * regions.
+ */
+static int viommu_domain_map_identity(struct viommu_endpoint *vdev,
+ struct viommu_domain *vdomain)
+{
+   int ret;
+   struct iommu_resv_region *resv;
+   u64 iova = vdomain->domain.geometry.aperture_start;
+   u64 limit = vdomain->domain.geometry.aperture_end;
+   u32 flags = VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE;
+   unsigned long granule = 1UL << __ffs(vdomain->domain.pgsize_bitmap);
+
+   iova = ALIGN(iova, granule);
+   limit = ALIGN_DOWN(limit + 1, granule) - 1;
+
+	list_for_each_entry(resv, &vdev->resv_regions, list) {
+   u64 resv_start = ALIGN_DOWN(resv->start, granule);
+   u64 resv_end = ALIGN(resv->start + resv->length, granule) - 1;
+
+   if (resv_end < iova || resv_start > limit)
+   /* No overlap */
+   continue;
+
+   if (resv_start > iova) {
+   ret = viommu_add_mapping(vdomain, iova, resv_start - 1,
+(phys_addr_t)iova, flags);
+   if (ret)
+   goto err_unmap;
+   }
+
+   if (resv_end >= limit)
+   return 0;
+
+   iova = resv_end + 1;
+   }
+
+   ret = viommu_add_mapping(vdomain, iova, limit, (phys_addr_t)iova,
+flags);
+   if (ret)
+   goto err_unmap;
+   return 0;
+
+err_unmap:
+   viommu_del_mappings(vdomain, 0, iova);
+   return ret;
+}
+
 /*
  * viommu_replay_mappings - re-send MAP requests
  *
@@ -637,14 +686,18 @@ static int viommu_domain_finalise(struct viommu_endpoint 
*vdev,
vdomain->viommu = viommu;
 
if (domain->type == IOMMU_DOMAIN_IDENTITY) {
-   if (!virtio_has_feature(viommu->vdev,
-   VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+   if (virtio_has_feature(viommu->vdev,
+  VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+   vdomain->bypass = true;
+   return 0;
+   }
+
+   ret = viommu_domain_map_identity(vdev, vdomain);
+   if (ret) {
			ida_free(&viommu->domain_ids, vdomain->id);
vdomain->viommu = 0;
return -EOPNOTSUPP;
}
-
-   vdomain->bypass = true;
}
 
return 0;
-- 
2.33.0



[PATCH 4/5] iommu/virtio: Pass end address to viommu_add_mapping()

2021-10-13 Thread Jean-Philippe Brucker
To support identity mappings, the virtio-iommu driver must be able to
represent full 64-bit ranges internally. Pass (start, end) instead of
(start, size) to viommu_add/del_mapping().

Clean comments. The one about the returned size was never true: when
sweeping the whole address space the returned size will most certainly
be smaller than 2^64.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index d63ec4d11b00..eceb9281c8c1 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -311,8 +311,8 @@ static int viommu_send_req_sync(struct viommu_dev *viommu, 
void *buf,
  *
  * On success, return the new mapping. Otherwise return NULL.
  */
-static int viommu_add_mapping(struct viommu_domain *vdomain, unsigned long 
iova,
- phys_addr_t paddr, size_t size, u32 flags)
+static int viommu_add_mapping(struct viommu_domain *vdomain, u64 iova, u64 end,
+ phys_addr_t paddr, u32 flags)
 {
unsigned long irqflags;
struct viommu_mapping *mapping;
@@ -323,7 +323,7 @@ static int viommu_add_mapping(struct viommu_domain 
*vdomain, unsigned long iova,
 
mapping->paddr  = paddr;
mapping->iova.start = iova;
-   mapping->iova.last  = iova + size - 1;
+   mapping->iova.last  = end;
mapping->flags  = flags;
 
	spin_lock_irqsave(&vdomain->mappings_lock, irqflags);
@@ -338,26 +338,24 @@ static int viommu_add_mapping(struct viommu_domain 
*vdomain, unsigned long iova,
  *
  * @vdomain: the domain
  * @iova: start of the range
- * @size: size of the range. A size of 0 corresponds to the entire address
- * space.
+ * @end: end of the range
  *
- * On success, returns the number of unmapped bytes (>= size)
+ * On success, returns the number of unmapped bytes
  */
 static size_t viommu_del_mappings(struct viommu_domain *vdomain,
- unsigned long iova, size_t size)
+ u64 iova, u64 end)
 {
size_t unmapped = 0;
unsigned long flags;
-   unsigned long last = iova + size - 1;
struct viommu_mapping *mapping = NULL;
struct interval_tree_node *node, *next;
 
	spin_lock_irqsave(&vdomain->mappings_lock, flags);
-	next = interval_tree_iter_first(&vdomain->mappings, iova, last);
+	next = interval_tree_iter_first(&vdomain->mappings, iova, end);
while (next) {
node = next;
mapping = container_of(node, struct viommu_mapping, iova);
-   next = interval_tree_iter_next(node, iova, last);
+   next = interval_tree_iter_next(node, iova, end);
 
/* Trying to split a mapping? */
if (mapping->iova.start < iova)
@@ -656,8 +654,8 @@ static void viommu_domain_free(struct iommu_domain *domain)
 {
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
-   /* Free all remaining mappings (size 2^64) */
-   viommu_del_mappings(vdomain, 0, 0);
+   /* Free all remaining mappings */
+   viommu_del_mappings(vdomain, 0, ULLONG_MAX);
 
if (vdomain->viommu)
		ida_free(&vdomain->viommu->domain_ids, vdomain->id);
@@ -742,6 +740,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
 {
int ret;
u32 flags;
+   u64 end = iova + size - 1;
struct virtio_iommu_req_map map;
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
@@ -752,7 +751,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
if (flags & ~vdomain->map_flags)
return -EINVAL;
 
-   ret = viommu_add_mapping(vdomain, iova, paddr, size, flags);
+   ret = viommu_add_mapping(vdomain, iova, end, paddr, flags);
if (ret)
return ret;
 
@@ -761,7 +760,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
.domain = cpu_to_le32(vdomain->id),
.virt_start = cpu_to_le64(iova),
.phys_start = cpu_to_le64(paddr),
-   .virt_end   = cpu_to_le64(iova + size - 1),
+   .virt_end   = cpu_to_le64(end),
.flags  = cpu_to_le32(flags),
};
 
@@ -770,7 +769,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
 
	ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));
if (ret)
-   viommu_del_mappings(vdomain, iova, size);
+   viommu_del_mappings(vdomain, iova, end);
 
return ret;
 }
@@ -783,7 +782,7 @@ static size_t viommu_unmap(struct iommu_domain *domain, 
unsigned long iova,
struct virtio_iommu_req_unmap unmap;
struct viommu_domain *vdomain = to_vi

[PATCH 1/5] iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG

2021-10-13 Thread Jean-Philippe Brucker
Add definitions for the VIRTIO_IOMMU_F_BYPASS_CONFIG, which supersedes
VIRTIO_IOMMU_F_BYPASS.

Signed-off-by: Jean-Philippe Brucker 
---
 include/uapi/linux/virtio_iommu.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_iommu.h 
b/include/uapi/linux/virtio_iommu.h
index 237e36a280cb..cafd8cf7febf 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -16,6 +16,7 @@
 #define VIRTIO_IOMMU_F_BYPASS  3
 #define VIRTIO_IOMMU_F_PROBE   4
 #define VIRTIO_IOMMU_F_MMIO5
+#define VIRTIO_IOMMU_F_BYPASS_CONFIG   6
 
 struct virtio_iommu_range_64 {
__le64  start;
@@ -36,6 +37,8 @@ struct virtio_iommu_config {
struct virtio_iommu_range_32domain_range;
/* Probe buffer size */
__le32  probe_size;
+   __u8bypass;
+   __u8reserved[7];
 };
 
 /* Request types */
@@ -66,11 +69,14 @@ struct virtio_iommu_req_tail {
__u8reserved[3];
 };
 
+#define VIRTIO_IOMMU_ATTACH_F_BYPASS   (1 << 0)
+
 struct virtio_iommu_req_attach {
struct virtio_iommu_req_headhead;
__le32  domain;
__le32  endpoint;
-   __u8reserved[8];
+   __le32  flags;
+   __u8reserved[4];
struct virtio_iommu_req_tailtail;
 };
 
-- 
2.33.0



Re: [PATCH RFC v1 03/11] iommu/virtio: Handle incoming page faults

2021-10-11 Thread Jean-Philippe Brucker
Hi Vivek,

On Mon, Oct 11, 2021 at 01:41:15PM +0530, Vivek Gautam wrote:
> > > +	list_for_each_entry(ep, &viommu->endpoints, list) {
> > > + if (ep->eid == endpoint) {
> > > + vdev = ep->vdev;
> 
> I have a question here though -
> Is endpoint-ID unique across all the endpoints available per 'viommu_dev' or
> per 'viommu_domain'?
> If it is per 'viommu_domain' then the above list is also incorrect.
> As you pointed to in the patch [1] -
> [PATCH RFC v1 02/11] iommu/virtio: Maintain a list of endpoints served
> by viommu_dev
> I am planning to add endpoint ID into a static global xarray in
> viommu_probe_device() as below:
> 
> vdev_for_each_id(i, eid, vdev) {
> ret = xa_insert(&viommu_ep_ids, eid, vdev, GFP_KERNEL);
> if (ret)
> goto err_free_dev;
> }
> 
> and replace the above list traversal as below:
> 
> xa_lock_irqsave(&viommu_ep_ids, flags);
> xa_for_each(&viommu_ep_ids, eid, vdev) {
> if (eid == endpoint) {
> ret = iommu_report_device_fault(vdev->dev, &fault_evt);
> if (ret)
> dev_err(vdev->dev, "Couldn't handle page request\n");
> }
> }
> xa_unlock_irqrestore(&viommu_ep_ids, flags);
> 
> But using a global xarray would also be incorrect if the endpointsID are 
> global
> across 'viommu_domain'.
> 
> I need to find the correct 'viommu_endpoint' to call 
> iommu_report_device_fault()
> with the correct device.

The endpoint IDs are only unique within a viommu_dev, so a global xarray
wouldn't work, but one in viommu_dev would. In vdomain it doesn't work
either, because we can't get to the domain from the fault handler without
first finding the endpoint.
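
Assuming viommu_probe_device() inserts each endpoint into an xarray added to
struct viommu_dev (hypothetical field name ep_ids, replacing the global one),
the lookup in the fault handler could look roughly like this (untested
sketch, with ret and fault_evt as in your patch):

	struct viommu_endpoint *vdev;

	/* endpoint IDs are only unique per viommu, so look up in its xarray */
	vdev = xa_load(&viommu->ep_ids, endpoint);
	if (!vdev)
		return -ENOENT;

	/* (endpoint removal vs. in-flight faults still needs a lock or RCU) */
	ret = iommu_report_device_fault(vdev->dev, &fault_evt);
	if (ret)
		dev_err(vdev->dev, "could not handle page request\n");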

Thanks,
Jean



Re: [PATCH RFC v1 10/11] uapi/virtio-iommu: Add a new request type to send page response

2021-10-06 Thread Jean-Philippe Brucker
On Thu, Sep 30, 2021 at 02:54:05PM +0530, Vivek Kumar Gautam wrote:
> > > +struct virtio_iommu_req_page_resp {
> > > + struct virtio_iommu_req_headhead;
> > > + __le32  domain;
> > 
> > I don't think we need this field, since the fault report doesn't come with
> > a domain.
> 
> But here we are sending the response which would be consumed by the vfio
> ultimately. In kvmtool, I am consuming this "virtio_iommu_req_page_resp"
> request in the virtio/iommu driver, extracting the domain from it, and using
> that to call the respective "page_response" ops from "vfio_iommu_ops" in the
> vfio/core driver.
> 
> Is this incorrect way of passing on the page-response back to the host
> kernel?

That works for the host userspace-kernel interface because the device is
always attached to a VFIO container.

For virtio-iommu the domain info is redundant. The endpoint information
needs to be kept through the whole response path in order to target the
right endpoint in the end. In addition the guest could enable PRI without
attaching the endpoint to a domain, or fail to disable PRI before
detaching the endpoint. Sure it's weird, but the host can still inject the
recoverable page fault in this case, and the guest answers with "invalid"
status but no domain. We could mandate domains for recoverable faults but
that forces a synchronization against attach/detach and I think it
needlessly deviates from other IOMMUs.

Thanks,
Jean


Re: [PATCH RFC v1 01/11] uapi/virtio-iommu: Add page request grp-id and flags information

2021-09-30 Thread Jean-Philippe Brucker
On Thu, Sep 30, 2021 at 10:26:35AM +0530, Vivek Kumar Gautam wrote:
> > I'm not sure why we made it 32-bit in Linux UAPI, it's a little wasteful.
> > PCIe PRGI is 9-bits and SMMU STAG is 16-bits. Since the scope of the grpid
> > is the endpoint, 16-bit means 64k in-flight faults per endpoint, which
> > seems more than enough.
> 
> Right, I will update this to 16-bits field. It won't be okay to update the
> iommu uAPI now, right?

Since there is no UAPI transport for the fault request/response at the
moment, it should be possible to update it.

Thanks,
Jean


Re: [PATCH RFC v1 10/11] uapi/virtio-iommu: Add a new request type to send page response

2021-09-21 Thread Jean-Philippe Brucker
On Fri, Apr 23, 2021 at 03:21:46PM +0530, Vivek Gautam wrote:
> Once the page faults are handled, the response has to be sent to
> virtio-iommu backend, from where it can be sent to the host to
> prepare the response to a generated io page fault by the device.
> Add a new virt-queue request type to handle this.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  include/uapi/linux/virtio_iommu.h | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/include/uapi/linux/virtio_iommu.h 
> b/include/uapi/linux/virtio_iommu.h
> index c12d9b6a7243..1b174b98663a 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -48,6 +48,7 @@ struct virtio_iommu_config {
>  #define VIRTIO_IOMMU_T_PROBE 0x05
>  #define VIRTIO_IOMMU_T_ATTACH_TABLE  0x06
>  #define VIRTIO_IOMMU_T_INVALIDATE0x07
> +#define VIRTIO_IOMMU_T_PAGE_RESP 0x08
>  
>  /* Status types */
>  #define VIRTIO_IOMMU_S_OK0x00
> @@ -70,6 +71,23 @@ struct virtio_iommu_req_tail {
>   __u8reserved[3];
>  };
>  
> +struct virtio_iommu_req_page_resp {
> +	struct virtio_iommu_req_head	head;
> + __le32  domain;

I don't think we need this field, since the fault report doesn't come with
a domain.

> + __le32  endpoint;
> +#define VIRTIO_IOMMU_PAGE_RESP_PASID_VALID   (1 << 0)

To be consistent with the rest of the document this would be
VIRTIO_IOMMU_PAGE_RESP_F_PASID_VALID

> + __le32  flags;
> + __le32  pasid;
> + __le32  grpid;
> +#define VIRTIO_IOMMU_PAGE_RESP_SUCCESS   (0x0)
> +#define VIRTIO_IOMMU_PAGE_RESP_INVALID   (0x1)
> +#define VIRTIO_IOMMU_PAGE_RESP_FAILURE   (0x2)
> + __le16  resp_code;
> +	__u8				pasid_valid;

This field isn't needed since there already is
VIRTIO_IOMMU_PAGE_RESP_PASID_VALID

> +	__u8				reserved[9];
> +	struct virtio_iommu_req_tail	tail;
> +};

I'd align the size of the struct to 16 bytes, but I don't think that's
strictly necessary.
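
Putting those comments together, I picture something roughly like this (just
an illustration, not a formal counter-proposal; reserved[6] brings the total
to 32 bytes, counting 4 bytes each for head and tail):

	struct virtio_iommu_req_page_resp {
		struct virtio_iommu_req_head	head;
		__le32				endpoint;
	#define VIRTIO_IOMMU_PAGE_RESP_F_PASID_VALID	(1 << 0)
		__le32				flags;
		__le32				pasid;
		__le32				grpid;
	#define VIRTIO_IOMMU_PAGE_RESP_SUCCESS		(0x0)
	#define VIRTIO_IOMMU_PAGE_RESP_INVALID		(0x1)
	#define VIRTIO_IOMMU_PAGE_RESP_FAILURE		(0x2)
		__le16				resp_code;
		__u8				reserved[6];
		struct virtio_iommu_req_tail	tail;
	};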

Thanks,
Jean

> +
>  struct virtio_iommu_req_attach {
>   struct virtio_iommu_req_headhead;
>   __le32  domain;
> -- 
> 2.17.1
> 


Re: [PATCH RFC v1 09/11] iommu/virtio: Implement sva bind/unbind calls

2021-09-21 Thread Jean-Philippe Brucker
On Fri, Apr 23, 2021 at 03:21:45PM +0530, Vivek Gautam wrote:
> SVA bind and unbind implementations will allow to prepare translation
> context with CPU page tables that can be programmed into host iommu
> hardware to realize shared address space utilization between the CPU
> and virtualized devices using virtio-iommu.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/virtio-iommu.c  | 199 +-
>  include/uapi/linux/virtio_iommu.h |   2 +
>  2 files changed, 199 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 250c137a211b..08f1294baeab 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -14,6 +14,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -28,6 +31,7 @@
>  #include 
>  #include "iommu-pasid-table.h"
>  #include "iommu-sva-lib.h"
> +#include "io-pgtable-arm.h"

Is this used here?

>  
>  #define MSI_IOVA_BASE0x800
>  #define MSI_IOVA_LENGTH  0x10
> @@ -41,6 +45,7 @@ DEFINE_XARRAY_ALLOC1(viommu_asid_xa);
>  
>  static DEFINE_MUTEX(sva_lock);
>  static DEFINE_MUTEX(iopf_lock);
> +static DEFINE_MUTEX(viommu_asid_lock);
>  
>  struct viommu_dev_pri_work {
>   struct work_struct  work;
> @@ -88,10 +93,22 @@ struct viommu_mapping {
>  struct viommu_mm {
>   int pasid;
>   u64 archid;
> + struct viommu_sva_bond  *bond;
>   struct io_pgtable_ops   *ops;
>   struct viommu_domain*domain;
>  };
>  
> +struct viommu_sva_bond {
> + struct iommu_svasva;
> + struct mm_struct*mm;
> + struct iommu_psdtable_mmu_notifier  *viommu_mn;
> + struct list_headlist;
> + refcount_t  refs;
> +};
> +
> +#define sva_to_viommu_bond(handle) \
> + container_of(handle, struct viommu_sva_bond, sva)
> +
>  struct viommu_domain {
>   struct iommu_domain domain;
>   struct viommu_dev   *viommu;
> @@ -136,6 +153,7 @@ struct viommu_endpoint {
>   boolpri_supported;
>   boolsva_enabled;
>   booliopf_enabled;
> + struct list_headbonds;
>  };
>  
>  struct viommu_ep_entry {
> @@ -1423,14 +1441,15 @@ static int viommu_attach_pasid_table(struct 
> viommu_endpoint *vdev,
>  
>   pst_cfg->iommu_dev = viommu->dev->parent;
>  
> + mutex_lock(&viommu_asid_lock);
>   /* Prepare PASID tables info to allocate a new table */
>   ret = viommu_prepare_pst(vdev, pst_cfg, fmt);
>   if (ret)
> - return ret;
> + goto err_out_unlock;
>  
>   ret = iommu_psdtable_alloc(tbl, pst_cfg);
>   if (ret)
> - return ret;
> + goto err_out_unlock;
>  
>   pst_cfg->iommu_dev = viommu->dev->parent;
>   pst_cfg->fmt = PASID_TABLE_ARM_SMMU_V3;
> @@ -1452,6 +1471,7 @@ static int viommu_attach_pasid_table(struct 
> viommu_endpoint *vdev,
>   if (ret)
>   goto err_free_ops;
>   }
> + mutex_unlock(&viommu_asid_lock);
>   } else {
>   /* TODO: otherwise, check for compatibility with vdev. */
>   return -ENOSYS;
> @@ -1467,6 +1487,8 @@ static int viommu_attach_pasid_table(struct 
> viommu_endpoint *vdev,
>  err_free_psdtable:
>   iommu_psdtable_free(tbl, &tbl->cfg);
>  
> +err_out_unlock:
> + mutex_unlock(&viommu_asid_lock);
>   return ret;
>  }
>  
> @@ -1706,6 +1728,7 @@ static struct iommu_device *viommu_probe_device(struct 
> device *dev)
>   vdev->dev = dev;
>   vdev->viommu = viommu;
>   INIT_LIST_HEAD(&vdev->resv_regions);
> + INIT_LIST_HEAD(&vdev->bonds);
>   dev_iommu_priv_set(dev, vdev);
>  
>   if (viommu->probe_size) {
> @@ -1755,6 +1778,175 @@ static int viommu_of_xlate(struct device *dev, struct 
> of_phandle_args *args)
>   return iommu_fwspec_add_ids(dev, args->args, 1);
>  }
>  
> +static u32 viommu_sva_get_pasid(struct iommu_sva *handle)
> +{
> + struct viommu_sva_bond *bond = sva_to_viommu_bond(handle);
> +
> + return bond->mm->pasid;
> +}
> +
> +static void viommu_mmu_notifier_free(struct mmu_notifier *mn)
> +{
> + kfree(mn_to_pstiommu(mn));
> +}
> +
> +static struct mmu_notifier_ops viommu_mmu_notifier_ops = {
> + .free_notifier  = viommu_mmu_notifier_free,

.invalidate_range and .release will be needed as well, to keep up to date
with changes to the address space
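
Something along these lines, say (the handler names here are made up, this is
only to show the shape of the ops):

static void viommu_mm_invalidate_range(struct mmu_notifier *mn,
                                       struct mm_struct *mm,
                                       unsigned long start, unsigned long end)
{
        /* Ask the host to invalidate TLB entries for this ASID and range */
}

static void viommu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
{
        /* The address space is going away: stop DMA and detach the context */
}

static struct mmu_notifier_ops viommu_mmu_notifier_ops = {
        .invalidate_range       = viommu_mm_invalidate_range,
        .release                = viommu_mm_release,
        .free_notifier          = viommu_mmu_notifier_free,
};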

> +};
> +
> +/* Allocate or get existing MMU notifier for this {domain, mm} pair */
> +static struct iommu_psdtable_mmu_notifier *
> +viommu_mmu_notifier_get(struct 

Re: [PATCH RFC v1 08/11] iommu/arm-smmu-v3: Implement shared context alloc and free ops

2021-09-21 Thread Jean-Philippe Brucker
On Fri, Apr 23, 2021 at 03:21:44PM +0530, Vivek Gautam wrote:
> Implementing the alloc_shared_cd and free_shared_cd in cd-lib, and
> start using them for arm-smmu-v3-sva implementation.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  .../arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c  | 71 
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 83 ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  1 -
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 14 
>  4 files changed, 73 insertions(+), 96 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
> index 537b7c784d40..b87829796596 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
> @@ -285,16 +285,14 @@ static bool arm_smmu_free_asid(struct xarray *xa, void 
> *cookie_cd)
>   * descriptor is using it, try to replace it.
>   */
>  static struct arm_smmu_ctx_desc *
> -arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
> +arm_smmu_share_asid(struct iommu_pasid_table *tbl, struct mm_struct *mm,
> + struct xarray *xa, u16 asid, u32 asid_bits)

xa and asid_bits could be stored in some arch-specific section of the
iommu_pasid_table struct. Other table drivers wouldn't need those
arguments.

More a comment for the parent series: it may be clearer to give a
different prefix to functions in this file (arm_smmu_cd_, pst_arm_?).
Reading this patch I'm a little confused by what belongs in the IOMMU
driver and what is done by this library. (I also keep reading 'tbl' as
'tlb'. Maybe we could make it 'table' since that doesn't take a lot more
space)
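
Roughly something like this, for example (the struct and field names are only
a guess at how the library could carry per-arch state, not the actual
definitions in this series):

struct iommu_pasid_table {
        struct iommu_pasid_table_cfg    cfg;
        void                            *cookie;
        union {
                struct {
                        struct xarray   *asid_xa;       /* shared ASID space */
                        u32             asid_bits;
                } arm_smmu;
        } arch;
};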

>  {
>   int ret;
>   u32 new_asid;
>   struct arm_smmu_ctx_desc *cd;
> - struct arm_smmu_device *smmu;
> - struct arm_smmu_domain *smmu_domain;
> - struct iommu_pasid_table *tbl;
>  
> - cd = xa_load(&arm_smmu_asid_xa, asid);
> + cd = xa_load(xa, asid);
>   if (!cd)
>   return NULL;
>  
> @@ -306,12 +304,8 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
>   return cd;
>   }
>  
> - smmu_domain = container_of(cd, struct arm_smmu_domain, s1_cfg.cd);
> - smmu = smmu_domain->smmu;
> - tbl = smmu_domain->tbl;
> -
> - ret = xa_alloc(&arm_smmu_asid_xa, &new_asid, cd,
> -XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL);
> + ret = xa_alloc(xa, &new_asid, cd, XA_LIMIT(1, (1 << asid_bits) - 1),
> +GFP_KERNEL);
>   if (ret)
>   return ERR_PTR(-ENOSPC);
>   /*
> @@ -325,48 +319,52 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
>* be some overlap between use of both ASIDs, until we invalidate the
>* TLB.
>*/
> - ret = iommu_psdtable_write(tbl, &tbl->cfg, 0, cd);
> + ret = arm_smmu_write_ctx_desc(&tbl->cfg, 0, cd);
>   if (ret)
>   return ERR_PTR(-ENOSYS);
>  
>   /* Invalidate TLB entries previously associated with that context */
> - iommu_psdtable_flush_tlb(tbl, smmu_domain, asid);
> + iommu_psdtable_flush_tlb(tbl, tbl->cookie, asid);
>  
> - xa_erase(&arm_smmu_asid_xa, asid);
> + xa_erase(xa, asid);
>   return NULL;
>  }
>  
> -struct arm_smmu_ctx_desc *
> -arm_smmu_alloc_shared_cd(struct iommu_pasid_table *tbl, struct mm_struct *mm)
> +static struct iommu_psdtable_mmu_notifier *
> +arm_smmu_alloc_shared_cd(struct iommu_pasid_table *tbl, struct mm_struct *mm,
> +  struct xarray *xa, u32 asid_bits)
>  {
>   u16 asid;
>   int err = 0;
>   u64 tcr, par, reg;
>   struct arm_smmu_ctx_desc *cd;
>   struct arm_smmu_ctx_desc *ret = NULL;
> + struct iommu_psdtable_mmu_notifier *pst_mn;
>  
>   asid = arm64_mm_context_get(mm);
>   if (!asid)
>   return ERR_PTR(-ESRCH);
>  
> + pst_mn = kzalloc(sizeof(*pst_mn), GFP_KERNEL);
> + if (!pst_mn) {
> + err = -ENOMEM;
> + goto out_put_context;
> + }
> +
>   cd = kzalloc(sizeof(*cd), GFP_KERNEL);
>   if (!cd) {
>   err = -ENOMEM;
> - goto out_put_context;
> + goto out_free_mn;
>   }
>  
>   refcount_set(&cd->refs, 1);
>  
> - mutex_lock(&arm_smmu_asid_lock);
> - ret = arm_smmu_share_asid(mm, asid);
> + ret = arm_smmu_share_asid(tbl, mm, xa, asid, asid_bits);
>   if (ret) {
> - mutex_unlock(&arm_smmu_asid_lock);
>   goto out_free_cd;
>   }
>  
> - err = xa_insert(&arm_smmu_asid_xa, asid, cd, GFP_KERNEL);
> - mutex_unlock(&arm_smmu_asid_lock);
> -
> + err = xa_insert(xa, asid, cd, GFP_KERNEL);
>   if (err)
>   goto out_free_asid;
>  
> @@ -406,21 +404,26 @@ arm_smmu_alloc_shared_cd(struct iommu_pasid_table *tbl, 
> struct mm_struct *mm)
>   cd->asid = asid;
>   cd->mm = mm;
>  
> - return cd;
> + pst_mn->vendor.cd = cd;
> + return pst_mn;
>  
>  out_free_asid:
> - 

Re: [PATCH RFC v1 05/11] iommu/virtio: Add SVA feature and related enable/disable callbacks

2021-09-21 Thread Jean-Philippe Brucker
On Fri, Apr 23, 2021 at 03:21:41PM +0530, Vivek Gautam wrote:
> Add a feature flag to virtio iommu for Shared virtual addressing
> (SVA). This feature would indicate the availability of a path for handling
> device page faults, and the provision for sending page responses.

In this case the feature should probably be called PAGE_REQUEST or
similar. SVA aggregates PF + PASID + shared page tables

Thanks,
Jean

> Also add necessary methods to enable and disable SVA so that the
> masters can enable the SVA path. This also requires enabling the
> PRI capability on the device.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/virtio-iommu.c  | 268 ++
>  include/uapi/linux/virtio_iommu.h |   1 +
>  2 files changed, 269 insertions(+)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 3da5f0807711..250c137a211b 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -26,6 +27,7 @@
>  
>  #include 
>  #include "iommu-pasid-table.h"
> +#include "iommu-sva-lib.h"
>  
>  #define MSI_IOVA_BASE0x800
>  #define MSI_IOVA_LENGTH  0x10
> @@ -37,6 +39,9 @@
>  /* Some architectures need an Address Space ID for each page table */
>  DEFINE_XARRAY_ALLOC1(viommu_asid_xa);
>  
> +static DEFINE_MUTEX(sva_lock);
> +static DEFINE_MUTEX(iopf_lock);
> +
>  struct viommu_dev_pri_work {
>   struct work_struct  work;
>   struct viommu_dev   *dev;
> @@ -71,6 +76,7 @@ struct viommu_dev {
>  
>   boolhas_map:1;
>   boolhas_table:1;
> + boolhas_sva:1;
>  };
>  
>  struct viommu_mapping {
> @@ -124,6 +130,12 @@ struct viommu_endpoint {
>   void*pstf;
>   /* Preferred page table format */
>   void*pgtf;
> +
> + /* sva */
> + boolats_supported;
> + boolpri_supported;
> + boolsva_enabled;
> + booliopf_enabled;
>  };
>  
>  struct viommu_ep_entry {
> @@ -582,6 +594,64 @@ static int viommu_add_pstf(struct viommu_endpoint *vdev, 
> void *pstf, size_t len)
>   return 0;
>  }
>  
> +static int viommu_init_ats_pri(struct viommu_endpoint *vdev)
> +{
> + struct device *dev = vdev->dev;
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (!dev_is_pci(vdev->dev))
> + return -EINVAL;
> +
> + if (pci_ats_supported(pdev))
> + vdev->ats_supported = true;
> +
> + if (pci_pri_supported(pdev))
> + vdev->pri_supported = true;
> +
> + return 0;
> +}
> +
> +static int viommu_enable_pri(struct viommu_endpoint *vdev)
> +{
> + int ret;
> + struct pci_dev *pdev;
> +
> + /* Let's allow only 4 requests for PRI right now */
> + size_t max_inflight_pprs = 4;
> +
> + if (!vdev->pri_supported || !vdev->ats_supported)
> + return -ENODEV;
> +
> + pdev = to_pci_dev(vdev->dev);
> +
> + ret = pci_reset_pri(pdev);
> + if (ret)
> + return ret;
> +
> + ret = pci_enable_pri(pdev, max_inflight_pprs);
> + if (ret) {
> + dev_err(vdev->dev, "cannot enable PRI: %d\n", ret);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static void viommu_disable_pri(struct viommu_endpoint *vdev)
> +{
> + struct pci_dev *pdev;
> +
> + if (!dev_is_pci(vdev->dev))
> + return;
> +
> + pdev = to_pci_dev(vdev->dev);
> +
> + if (!pdev->pri_enabled)
> + return;
> +
> + pci_disable_pri(pdev);
> +}
> +
>  static int viommu_init_queues(struct viommu_dev *viommu)
>  {
>   viommu->iopf_pri = iopf_queue_alloc(dev_name(viommu->dev));
> @@ -684,6 +754,10 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   if (ret)
>   goto out_free_eps;
>  
> + ret = viommu_init_ats_pri(vdev);
> + if (ret)
> + goto out_free_eps;
> +
>   kfree(probe);
>   return 0;
>  
> @@ -1681,6 +1755,194 @@ static int viommu_of_xlate(struct device *dev, struct 
> of_phandle_args *args)
>   return iommu_fwspec_add_ids(dev, args->args, 1);
>  }
>  
> +static bool viommu_endpoint_iopf_supported(struct viommu_endpoint *vdev)
> +{
> + /* TODO: support Stall model later */
> + return vdev->pri_supported;
> +}
> +
> +bool viommu_endpoint_sva_supported(struct viommu_endpoint *vdev)
> +{
> + struct viommu_dev *viommu = vdev->viommu;
> +
> + if (!viommu->has_sva)
> + return false;
> +
> + return vdev->pasid_bits;
> +}
> +
> +bool viommu_endpoint_sva_enabled(struct viommu_endpoint *vdev)
> +{
> + bool enabled;
> +
> + mutex_lock(&sva_lock);
> + 

Re: [PATCH RFC v1 03/11] iommu/virtio: Handle incoming page faults

2021-09-21 Thread Jean-Philippe Brucker
On Fri, Apr 23, 2021 at 03:21:39PM +0530, Vivek Gautam wrote:
> Redirect the incoming page faults to the registered fault handler
> that can take the fault information such as, pasid, page request
> group-id, address and pasid flags.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/virtio-iommu.c  | 80 ++-
>  include/uapi/linux/virtio_iommu.h |  1 +
>  2 files changed, 80 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index c970f386f031..fd237cad1ce5 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -37,6 +37,13 @@
>  /* Some architectures need an Address Space ID for each page table */
>  DEFINE_XARRAY_ALLOC1(viommu_asid_xa);
>  
> +struct viommu_dev_pri_work {
> + struct work_struct  work;
> + struct viommu_dev   *dev;
> + struct virtio_iommu_fault   *vfault;
> + u32 endpoint;
> +};
> +
>  struct viommu_dev {
>   struct iommu_device iommu;
>   struct device   *dev;
> @@ -49,6 +56,8 @@ struct viommu_dev {
>   struct list_headrequests;
>   void*evts;
>   struct list_headendpoints;
> + struct workqueue_struct *pri_wq;
> + struct viommu_dev_pri_work  *pri_work;

IOPF already has a workqueue, so the driver doesn't need one.
iommu_report_device_fault() should be fast enough to be called from the
event handler.
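
Roughly like this (untested sketch; viommu_get_endpoint() and
viommu_fill_prq() are made-up helpers standing in for the endpoint lookup and
the flag translation done below):

static int viommu_report_page_request(struct viommu_dev *viommu, u32 endpoint,
                                      struct virtio_iommu_fault *fault)
{
        struct viommu_endpoint *vdev = viommu_get_endpoint(viommu, endpoint);
        struct iommu_fault_event fault_evt = {
                .fault.type = IOMMU_FAULT_PAGE_REQ,
        };

        if (!vdev)
                return -ENOENT;

        viommu_fill_prq(&fault_evt.fault.prm, fault);

        /* No private workqueue needed, the IOPF core defers the work itself */
        return iommu_report_device_fault(vdev->dev, &fault_evt);
}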

>  
>   /* Device configuration */
>   struct iommu_domain_geometrygeometry;
> @@ -666,6 +675,58 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   return ret;
>  }
>  
> +static void viommu_handle_ppr(struct work_struct *work)
> +{
> + struct viommu_dev_pri_work *pwork =
> + container_of(work, struct viommu_dev_pri_work, 
> work);
> + struct viommu_dev *viommu = pwork->dev;
> + struct virtio_iommu_fault *vfault = pwork->vfault;
> + struct viommu_endpoint *vdev;
> + struct viommu_ep_entry *ep;
> + struct iommu_fault_event fault_evt = {
> + .fault.type = IOMMU_FAULT_PAGE_REQ,
> + };
> + struct iommu_fault_page_request *prq = &fault_evt.fault.prm;
> +
> + u32 flags   = le32_to_cpu(vfault->flags);
> + u32 prq_flags   = le32_to_cpu(vfault->pr_evt_flags);
> + u32 endpoint= pwork->endpoint;
> +
> + memset(prq, 0, sizeof(struct iommu_fault_page_request));

The fault_evt struct is already initialized

> + prq->addr = le64_to_cpu(vfault->address);
> +
> + if (prq_flags & VIRTIO_IOMMU_FAULT_PRQ_F_LAST_PAGE)
> + prq->flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
> + if (prq_flags & VIRTIO_IOMMU_FAULT_PRQ_F_PASID_VALID) {
> + prq->flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
> + prq->pasid = le32_to_cpu(vfault->pasid);
> + prq->grpid = le32_to_cpu(vfault->grpid);
> + }
> +
> + if (flags & VIRTIO_IOMMU_FAULT_F_READ)
> + prq->perm |= IOMMU_FAULT_PERM_READ;
> + if (flags & VIRTIO_IOMMU_FAULT_F_WRITE)
> + prq->perm |= IOMMU_FAULT_PERM_WRITE;
> + if (flags & VIRTIO_IOMMU_FAULT_F_EXEC)
> + prq->perm |= IOMMU_FAULT_PERM_EXEC;
> + if (flags & VIRTIO_IOMMU_FAULT_F_PRIV)
> + prq->perm |= IOMMU_FAULT_PERM_PRIV;
> +
> + list_for_each_entry(ep, &viommu->endpoints, list) {
> + if (ep->eid == endpoint) {
> + vdev = ep->vdev;
> + break;
> + }
> + }
> +
> + if ((prq_flags & VIRTIO_IOMMU_FAULT_PRQ_F_PASID_VALID) &&
> + (prq_flags & VIRTIO_IOMMU_FAULT_PRQ_F_NEEDS_PASID))
> + prq->flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
> +
> + if (iommu_report_device_fault(vdev->dev, &fault_evt))
> + dev_err(vdev->dev, "Couldn't handle page request\n");

An error likely means that nobody registered a fault handler, but we could
display a few more details about the fault that would help debug the
endpoint
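
For example something like:

        if (iommu_report_device_fault(vdev->dev, &fault_evt))
                dev_err_ratelimited(vdev->dev,
                                    "couldn't handle page request (pasid %u grpid %u addr %#llx)\n",
                                    prq->pasid, prq->grpid,
                                    (unsigned long long)prq->addr);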

> +}
> +
>  static int viommu_fault_handler(struct viommu_dev *viommu,
>   struct virtio_iommu_fault *fault)
>  {
> @@ -679,7 +740,13 @@ static int viommu_fault_handler(struct viommu_dev 
> *viommu,
>   u32 pasid   = le32_to_cpu(fault->pasid);
>  
>   if (type == VIRTIO_IOMMU_FAULT_F_PAGE_REQ) {
> - dev_info(viommu->dev, "Page request fault - unhandled\n");
> + dev_info_ratelimited(viommu->dev,
> +  "Page request fault from EP %u\n",
> +  endpoint);

That's rather for debugging the virtio-iommu driver, so should be
dev_dbg() (or removed entirely)

> +
> + viommu->pri_work->vfault = fault;
> + viommu->pri_work->endpoint = endpoint;
> + queue_work(viommu->pri_wq, &viommu->pri_work->work);
>

Re: [PATCH RFC v1 02/11] iommu/virtio: Maintain a list of endpoints served by viommu_dev

2021-09-21 Thread Jean-Philippe Brucker
On Fri, Apr 23, 2021 at 03:21:38PM +0530, Vivek Gautam wrote:
> Keeping a record of list of endpoints that are served by the virtio-iommu
> device would help in redirecting the requests of page faults to the
> correct endpoint device to handle such requests.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/virtio-iommu.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 50039070e2aa..c970f386f031 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -48,6 +48,7 @@ struct viommu_dev {
>   spinlock_t  request_lock;
>   struct list_headrequests;
>   void*evts;
> + struct list_headendpoints;

As we're going to search by ID, an xarray or rb_tree would be more
appropriate than a list

>  
>   /* Device configuration */
>   struct iommu_domain_geometrygeometry;
> @@ -115,6 +116,12 @@ struct viommu_endpoint {
>   void*pgtf;
>  };
>  
> +struct viommu_ep_entry {
> + u32 eid;
> + struct viommu_endpoint  *vdev;
> + struct list_headlist;
> +};

No need for a separate struct, I think you can just add the list head and
id into viommu_endpoint.
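
For example (sketch only: 'endpoints' would be a struct xarray in viommu_dev,
initialized with xa_init() in viommu_probe(); the helper names are made up):

static struct viommu_endpoint *viommu_get_endpoint(struct viommu_dev *viommu,
                                                   u32 endpoint)
{
        return xa_load(&viommu->endpoints, endpoint);
}

static int viommu_add_endpoint(struct viommu_dev *viommu, u32 endpoint,
                               struct viommu_endpoint *vdev)
{
        /* The xarray does its own locking */
        return xa_insert(&viommu->endpoints, endpoint, vdev, GFP_KERNEL);
}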

> +
>  struct viommu_request {
>   struct list_headlist;
>   void*writeback;
> @@ -573,6 +580,7 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   size_t probe_len;
>   struct virtio_iommu_req_probe *probe;
>   struct virtio_iommu_probe_property *prop;
> + struct viommu_ep_entry *ep;
>   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>   struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
>  
> @@ -640,6 +648,18 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   prop = (void *)probe->properties + cur;
>   type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
>   }
> + if (ret)
> + goto out_free;
> +
> + ep = kzalloc(sizeof(*ep), GFP_KERNEL);
> + if (!ep) {
> + ret = -ENOMEM;
> + goto out_free;
> + }
> + ep->eid = probe->endpoint;
> + ep->vdev = vdev;
> +
> + list_add(&ep->list, &viommu->endpoints);

This should be in viommu_probe_device() (viommu_probe_endpoint() is only
called if F_PROBE is negotiated). I think we need a lock for this
list/xarray

Thanks,
Jean

>  
>  out_free:
>   kfree(probe);
> @@ -1649,6 +1669,7 @@ static int viommu_probe(struct virtio_device *vdev)
>   viommu->dev = dev;
>   viommu->vdev = vdev;
>   INIT_LIST_HEAD(&viommu->requests);
> + INIT_LIST_HEAD(&viommu->endpoints);
>  
>   ret = viommu_init_vqs(viommu);
>   if (ret)
> -- 
> 2.17.1
> 


Re: [PATCH RFC v1 01/11] uapi/virtio-iommu: Add page request grp-id and flags information

2021-09-21 Thread Jean-Philippe Brucker
Hi Vivek,

Thanks a lot for your work on this

On Fri, Apr 23, 2021 at 03:21:37PM +0530, Vivek Gautam wrote:
> Add fault information for group-id and necessary flags for page
> request faults that can be handled by page fault handler in
> virtio-iommu driver.
> 
> Signed-off-by: Vivek Gautam 
> Cc: Joerg Roedel 
> Cc: Will Deacon 
> Cc: Robin Murphy 
> Cc: Jean-Philippe Brucker 
> Cc: Eric Auger 
> Cc: Alex Williamson 
> Cc: Kevin Tian 
> Cc: Jacob Pan 
> Cc: Liu Yi L 
> Cc: Lorenzo Pieralisi 
> Cc: Shameerali Kolothum Thodi 
> ---
>  include/uapi/linux/virtio_iommu.h | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/uapi/linux/virtio_iommu.h 
> b/include/uapi/linux/virtio_iommu.h
> index f8bf927a0689..accc3318ce46 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -307,14 +307,27 @@ struct virtio_iommu_req_invalidate {
>  #define VIRTIO_IOMMU_FAULT_F_DMA_UNRECOV 1
>  #define VIRTIO_IOMMU_FAULT_F_PAGE_REQ2
>  
> +#define VIRTIO_IOMMU_FAULT_PRQ_F_PASID_VALID (1 << 0)
> +#define VIRTIO_IOMMU_FAULT_PRQ_F_LAST_PAGE   (1 << 1)
> +#define VIRTIO_IOMMU_FAULT_PRQ_F_PRIV_DATA   (1 << 2)
> +#define VIRTIO_IOMMU_FAULT_PRQ_F_NEEDS_PASID (1 << 3)

I don't think this one is necessary here. The NEEDS_PASID flags added by
commit 970471914c67 ("iommu: Allow page responses without PASID") mainly
helps Linux keep track of things internally. It does tell the fault
handler whether to reply with PASID or not, but we don't need that here.
The virtio-iommu driver knows whether a PASID is required by looking at
the "PRG Response PASID Required" bit in the PCIe capability. For non-PCIe
faults (e.g. SMMU stall), I'm guessing we'll need a PROBE property to
declare that the endpoint supports recoverable faults anyway, so "PASID
required in response" can go through there as well.
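
For the PCIe case it boils down to reading the PRI capability at probe time,
for instance (sketch; 'prg_resp_needs_pasid' is a made-up field):

        struct pci_dev *pdev = to_pci_dev(vdev->dev);

        if (pci_pri_supported(pdev))
                vdev->pri_supported = true;
        /* Whether page responses must carry a PASID comes from the PRI cap */
        vdev->prg_resp_needs_pasid = pci_prg_resp_pasid_required(pdev);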

> +
> +#define VIRTIO_IOMMU_FAULT_UNREC_F_PASID_VALID   (1 << 0)
> +#define VIRTIO_IOMMU_FAULT_UNREC_F_ADDR_VALID(1 << 1)
> +#define VIRTIO_IOMMU_FAULT_UNREC_F_FETCH_ADDR_VALID  (1 << 2)
> +
>  struct virtio_iommu_fault {
>   __u8reason;
>   __u8reserved[3];
>   __le16  flt_type;
>   __u8reserved2[2];
> + /* flags is actually permission flags */

It's also used for declaring validity of fields.
VIRTIO_IOMMU_FAULT_F_ADDRESS already tells whether the address field is
valid, so all the other flags introduced by this patch can go in here.

>   __le32  flags;
> + /* flags for PASID and Page request handling info */
> + __le32  pr_evt_flags;
>   __le32  endpoint;
>   __le32  pasid;
> + __le32  grpid;

I'm not sure why we made it 32-bit in Linux UAPI, it's a little wasteful.
PCIe PRGI is 9-bits and SMMU STAG is 16-bits. Since the scope of the grpid
is the endpoint, 16-bit means 64k in-flight faults per endpoint, which
seems more than enough.

New fields must be appended at the end of the struct, because old drivers
will expect to find the 'endpoint' field at this offset. You could remove
'reserved3' while adding 'grpid', to keep the struct layout.

>   __u8reserved3[4];
>   __le64  address;
>   __u8reserved4[8];


So the base structure, currently in the spec, looks like this:

struct virtio_iommu_fault {
u8   reason;
u8   reserved[3];
le32 flags;
le32 endpoint;
le32 reserved1;
le64 address;
};

#define VIRTIO_IOMMU_FAULT_F_READ   (1 << 0)
#define VIRTIO_IOMMU_FAULT_F_WRITE  (1 << 1)
#define VIRTIO_IOMMU_FAULT_F_ADDRESS(1 << 8)

The extended struct could be:

struct virtio_iommu_fault {
u8   reason;
u8   reserved[3];
le32 flags;
le32 endpoint;
le32 pasid;
le64 address;
/* Page request group ID */
le16 group_id;
u8   reserved1[6];
/* For VT-d private data */
le64 private_data[2];
};

#define VIRTIO_IOMMU_FAULT_F_READ   (1 << 0)
#define VIRTIO_IOMMU_FAULT_F_WRITE  (1 << 1)
#define VIRTIO_IOMMU_FAULT_F_EXEC   (1 << 2)
#d

Re: [PATCH 1/5] dt-bindings: virtio: mmio: Add support for device subnode

2021-07-14 Thread Jean-Philippe Brucker
On Tue, Jul 13, 2021 at 10:34:03PM +0200, Arnd Bergmann wrote:
> > > Is it going to be a problem if two devices in kernel use the same
> > > of_node ?
> >
> > There shouldn't be. We have nodes be multiple providers (e.g clocks
> > and resets) already.
> 
> I think this would be a little different, but it can still work. There is in
> fact already some precedent of doing this, with Jean-Philippe's virtio-iommu
> binding, which is documented in both
> 
> Documentation/devicetree/bindings/virtio/iommu.txt
> Documentation/devicetree/bindings/virtio/mmio.txt
> 
> Unfortunately, those are still slightly different from where I think we should
> be going here, but it's probably close enough to fit into the general
> system.
> 
> What we have with virtio-iommu is two special hacks:
>  - on virtio-mmio, a node with 'compatible="virtio,mmio"' may optionally
>have an '#iommu-cells=<1>', in which case we assume it's an iommu.
>  - for virtio-pci, the node has the standard PCI 'reg' property but a special
>'compatible="virtio,pci-iommu"' property that I think is different from any
>other PCI node.

Yes in retrospect I don't think the compatible property on the PCI
endpoint node is necessary nor useful, we could deprecate it. The OS gets
the IOMMU topology information early from 'iommus', 'iommu-map' and
'#iommu-cells' properties, which is the only reason we need this PCI
endpoint explicitly described in DT. The rest is discovered while probing
just like virtio-mmio.

Thanks,
Jean


[PATCH v5 4/5] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()

2021-06-18 Thread Jean-Philippe Brucker
Passing a 64-bit address width to iommu_setup_dma_ops() is valid on
virtual platforms, but isn't currently possible. The overflow check in
iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass
a limit address instead of a size, so callers don't have to fake a size
to work around the check.

The base and limit parameters are being phased out, because:
* they are redundant for x86 callers. dma-iommu already reserves the
  first page, and the upper limit is already in domain->geometry.
* they can now be obtained from dev->dma_range_map on Arm.
But removing them on Arm isn't completely straightforward so is left for
future work. As an intermediate step, simplify the x86 callers by
passing dummy limits.

Signed-off-by: Jean-Philippe Brucker 
---
 include/linux/dma-iommu.h   |  4 ++--
 arch/arm64/mm/dma-mapping.c |  2 +-
 drivers/iommu/amd/iommu.c   |  2 +-
 drivers/iommu/dma-iommu.c   | 12 ++--
 drivers/iommu/intel/iommu.c |  5 +
 5 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 6e75a2d689b4..758ca4694257 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -19,7 +19,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, 
dma_addr_t base);
 void iommu_put_dma_cookie(struct iommu_domain *domain);
 
 /* Setup call for arch DMA mapping code */
-void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size);
+void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
 
 /* The DMA API isn't _quite_ the whole story, though... */
 /*
@@ -50,7 +50,7 @@ struct msi_msg;
 struct device;
 
 static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base,
-   u64 size)
+  u64 dma_limit)
 {
 }
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4bf1dd3eb041..6719f9efea09 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 
size,
 
dev->dma_coherent = coherent;
if (iommu)
-   iommu_setup_dma_ops(dev, dma_base, size);
+   iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
 
 #ifdef CONFIG_XEN
if (xen_swiotlb_detect())
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3ac42bbdefc6..216323fb27ef 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1713,7 +1713,7 @@ static void amd_iommu_probe_finalize(struct device *dev)
/* Domains are initialized for this device - have a look what we ended 
up with */
domain = iommu_get_domain_for_dev(dev);
if (domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
else
set_dma_ops(dev, NULL);
 }
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..c62e19bed302 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -319,16 +319,16 @@ static bool dev_is_untrusted(struct device *dev)
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
  * @base: IOVA at which the mappable address space starts
- * @size: Size of IOVA space
+ * @limit: Last address of the IOVA space
  * @dev: Device the domain is being initialised for
  *
- * @base and @size should be exact multiples of IOMMU page granularity to
+ * @base and @limit + 1 should be exact multiples of IOMMU page granularity to
  * avoid rounding surprises. If necessary, we reserve the page at address 0
  * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but
  * any change which could make prior IOVAs invalid will fail.
  */
 static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
-   u64 size, struct device *dev)
+dma_addr_t limit, struct device *dev)
 {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
@@ -346,7 +346,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
/* Check the domain allows at least some access to the device... */
if (domain->geometry.force_aperture) {
if (base > domain->geometry.aperture_end ||
-   base + size <= domain->geometry.aperture_start) {
+   limit < domain->geometry.aperture_start) {
pr_warn("specified DMA range outside IOMMU 
capability\n");
return -EFAULT;
}
@@ -1308,7 +1308,7 @@ static const struct dma_map_ops iommu_dma_ops = {
  * The IOMMU core code allocates the default DMA domain, which the underlying
  * IOMMU driver needs to support via the dm

[PATCH v5 3/5] ACPI: Add driver for the VIOT table

2021-06-18 Thread Jean-Philippe Brucker
The ACPI Virtual I/O Translation Table describes topology of
para-virtual platforms, similarly to vendor tables DMAR, IVRS and IORT.
For now it describes the relation between virtio-iommu and the endpoints
it manages.

Three steps are needed to configure DMA of endpoints:

(1) acpi_viot_init(): parse the VIOT table, find or create the fwnode
associated to each vIOMMU device. This needs to happen after
acpi_scan_init(), because it relies on the struct device and their
fwnode to be available.

(2) When probing the vIOMMU device, the driver registers its IOMMU ops
within the IOMMU subsystem. This step doesn't require any
intervention from the VIOT driver.

(3) viot_iommu_configure(): before binding the endpoint to a driver,
find the associated IOMMU ops. Register them, along with the
endpoint ID, into the device's iommu_fwspec.

If step (3) happens before step (2), it is deferred until the IOMMU is
initialized, then retried.
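
As a rough sketch (simplified, not the exact driver code), step (3) can look
like this:

static int viot_dev_iommu_init(struct device *dev, struct viot_iommu *viommu,
                               u32 epid)
{
        const struct iommu_ops *ops;

        if (!viommu)
                return -ENODEV;

        ops = iommu_ops_from_fwnode(viommu->fwnode);
        if (!ops)
                /* virtio-iommu hasn't registered its ops yet, retry later */
                return IS_ENABLED(CONFIG_VIRTIO_IOMMU) ?
                        -EPROBE_DEFER : -ENODEV;

        return acpi_iommu_fwspec_init(dev, epid, viommu->fwnode, ops);
}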

Tested-by: Eric Auger 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/Kconfig  |   3 +
 drivers/iommu/Kconfig |   1 +
 drivers/acpi/Makefile |   2 +
 include/linux/acpi_viot.h |  19 ++
 drivers/acpi/bus.c|   2 +
 drivers/acpi/scan.c   |   3 +
 drivers/acpi/viot.c   | 366 ++
 MAINTAINERS   |   8 +
 8 files changed, 404 insertions(+)
 create mode 100644 include/linux/acpi_viot.h
 create mode 100644 drivers/acpi/viot.c

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index eedec61e3476..3758c6940ed7 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -526,6 +526,9 @@ endif
 
 source "drivers/acpi/pmic/Kconfig"
 
+config ACPI_VIOT
+   bool
+
 endif  # ACPI
 
 config X86_PM_TIMER
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1f111b399bca..aff8a4830dd1 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -403,6 +403,7 @@ config VIRTIO_IOMMU
depends on ARM64
select IOMMU_API
select INTERVAL_TREE
+   select ACPI_VIOT if ACPI
help
  Para-virtualised IOMMU driver with virtio.
 
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 700b41adf2db..a6e644c48987 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -118,3 +118,5 @@ video-objs  += acpi_video.o video_detect.o
 obj-y  += dptf/
 
 obj-$(CONFIG_ARM64)+= arm64/
+
+obj-$(CONFIG_ACPI_VIOT)+= viot.o
diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
new file mode 100644
index ..1eb8ee5b0e5f
--- /dev/null
+++ b/include/linux/acpi_viot.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ACPI_VIOT_H__
+#define __ACPI_VIOT_H__
+
+#include 
+
+#ifdef CONFIG_ACPI_VIOT
+void __init acpi_viot_init(void);
+int viot_iommu_configure(struct device *dev);
+#else
+static inline void acpi_viot_init(void) {}
+static inline int viot_iommu_configure(struct device *dev)
+{
+   return -ENODEV;
+}
+#endif
+
+#endif /* __ACPI_VIOT_H__ */
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index a4bd673934c0..d6f4e2f06fdb 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -27,6 +27,7 @@
 #include 
 #endif
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1334,6 +1335,7 @@ static int __init acpi_init(void)
acpi_wakeup_device_init();
acpi_debugger_init();
acpi_setup_sb_notify_handler();
+   acpi_viot_init();
return 0;
 }
 
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 2a2e690040e9..3e2bb04ab528 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1556,6 +1557,8 @@ static const struct iommu_ops 
*acpi_iommu_configure_id(struct device *dev,
return ops;
 
err = iort_iommu_configure_id(dev, id_in);
+   if (err && err != -EPROBE_DEFER)
+   err = viot_iommu_configure(dev);
 
/*
 * If we have reason to believe the IOMMU driver missed the initial
diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
new file mode 100644
index ..d2256326c73a
--- /dev/null
+++ b/drivers/acpi/viot.c
@@ -0,0 +1,366 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Virtual I/O topology
+ *
+ * The Virtual I/O Translation Table (VIOT) describes the topology of
+ * para-virtual IOMMUs and the endpoints they manage. The OS uses it to
+ * initialize devices in the right order, preventing endpoints from issuing DMA
+ * before their IOMMU is ready.
+ *
+ * When binding a driver to a device, before calling the device driver's 
probe()
+ * method, the driver infrastructure calls dma_configure(). At that point the
+ * VIOT driver looks for an IOMMU associated to the device in the VIOT table.
+ * If an IOMMU exists and has been initialized, the VIOT driver initializes the

[PATCH v5 5/5] iommu/virtio: Enable x86 support

2021-06-18 Thread Jean-Philippe Brucker
With the VIOT support in place, x86 platforms can now use the
virtio-iommu.

Because the other x86 IOMMU drivers aren't yet ready to use the
acpi_dma_setup() path, x86 doesn't implement arch_setup_dma_ops() at the
moment. Similarly to Vt-d and AMD IOMMU, clear the DMA ops and call
iommu_setup_dma_ops() from probe_finalize().

Acked-by: Joerg Roedel 
Acked-by: Michael S. Tsirkin 
Tested-by: Eric Auger 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/Kconfig|  3 ++-
 drivers/iommu/dma-iommu.c|  1 +
 drivers/iommu/virtio-iommu.c | 11 +++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index aff8a4830dd1..07b7c25cbed8 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -400,8 +400,9 @@ config HYPERV_IOMMU
 config VIRTIO_IOMMU
tristate "Virtio IOMMU driver"
depends on VIRTIO
-   depends on ARM64
+   depends on (ARM64 || X86)
select IOMMU_API
+   select IOMMU_DMA
select INTERVAL_TREE
select ACPI_VIOT if ACPI
help
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c62e19bed302..9dbbc95c8189 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1330,6 +1330,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 dma_limit)
 pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA 
ops\n",
 dev_name(dev));
 }
+EXPORT_SYMBOL_GPL(iommu_setup_dma_ops);
 
 static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
phys_addr_t msi_addr, struct iommu_domain *domain)
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index c6e5ee4d9cef..fe581f0c9b3a 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -904,6 +905,15 @@ static struct iommu_device *viommu_probe_device(struct 
device *dev)
return ERR_PTR(ret);
 }
 
+static void viommu_probe_finalize(struct device *dev)
+{
+#ifndef CONFIG_ARCH_HAS_SETUP_DMA_OPS
+   /* First clear the DMA ops in case we're switching from a DMA domain */
+   set_dma_ops(dev, NULL);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
+#endif
+}
+
 static void viommu_release_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
@@ -940,6 +950,7 @@ static struct iommu_ops viommu_ops = {
.iova_to_phys   = viommu_iova_to_phys,
.iotlb_sync = viommu_iotlb_sync,
.probe_device   = viommu_probe_device,
+   .probe_finalize = viommu_probe_finalize,
.release_device = viommu_release_device,
.device_group   = viommu_device_group,
.get_resv_regions   = viommu_get_resv_regions,
-- 
2.32.0



[PATCH v5 2/5] ACPI: Move IOMMU setup code out of IORT

2021-06-18 Thread Jean-Philippe Brucker
Extract the code that sets up the IOMMU infrastructure from IORT, since
it can be reused by VIOT. Move it one level up into a new
acpi_iommu_configure_id() function, which calls the IORT parsing
function which in turn calls the acpi_iommu_fwspec_init() helper.

Signed-off-by: Jean-Philippe Brucker 
---
 include/acpi/acpi_bus.h   |  3 ++
 include/linux/acpi_iort.h |  8 ++---
 drivers/acpi/arm64/iort.c | 74 +--
 drivers/acpi/scan.c   | 73 +-
 4 files changed, 86 insertions(+), 72 deletions(-)

diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 3a82faac5767..41f092a269f6 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -588,6 +588,9 @@ struct acpi_pci_root {
 
 bool acpi_dma_supported(struct acpi_device *adev);
 enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev);
+int acpi_iommu_fwspec_init(struct device *dev, u32 id,
+  struct fwnode_handle *fwnode,
+  const struct iommu_ops *ops);
 int acpi_dma_get_range(struct device *dev, u64 *dma_addr, u64 *offset,
   u64 *size);
 int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index f7f054833afd..f1f0842a2cb2 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -35,8 +35,7 @@ void acpi_configure_pmsi_domain(struct device *dev);
 int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
 /* IOMMU interface */
 int iort_dma_get_ranges(struct device *dev, u64 *size);
-const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
-   const u32 *id_in);
+int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head);
 phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
 #else
@@ -50,9 +49,8 @@ static inline void acpi_configure_pmsi_domain(struct device 
*dev) { }
 /* IOMMU interface */
 static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
 { return -ENODEV; }
-static inline const struct iommu_ops *iort_iommu_configure_id(
- struct device *dev, const u32 *id_in)
-{ return NULL; }
+static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
+{ return -ENODEV; }
 static inline
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 { return 0; }
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index a940be1cf2af..487d1095030d 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -806,23 +806,6 @@ static struct acpi_iort_node 
*iort_get_msi_resv_iommu(struct device *dev)
return NULL;
 }
 
-static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device *dev)
-{
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-
-   return (fwspec && fwspec->ops) ? fwspec->ops : NULL;
-}
-
-static inline int iort_add_device_replay(struct device *dev)
-{
-   int err = 0;
-
-   if (dev->bus && !device_iommu_mapped(dev))
-   err = iommu_probe_device(dev);
-
-   return err;
-}
-
 /**
  * iort_iommu_msi_get_resv_regions - Reserved region driver helper
  * @dev: Device from iommu_get_resv_regions()
@@ -900,18 +883,6 @@ static inline bool iort_iommu_driver_enabled(u8 type)
}
 }
 
-static int arm_smmu_iort_xlate(struct device *dev, u32 streamid,
-  struct fwnode_handle *fwnode,
-  const struct iommu_ops *ops)
-{
-   int ret = iommu_fwspec_init(dev, fwnode, ops);
-
-   if (!ret)
-   ret = iommu_fwspec_add_ids(dev, &streamid, 1);
-
-   return ret;
-}
-
 static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
 {
struct acpi_iort_root_complex *pci_rc;
@@ -946,7 +917,7 @@ static int iort_iommu_xlate(struct device *dev, struct 
acpi_iort_node *node,
return iort_iommu_driver_enabled(node->type) ?
   -EPROBE_DEFER : -ENODEV;
 
-   return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops);
+   return acpi_iommu_fwspec_init(dev, streamid, iort_fwnode, ops);
 }
 
 struct iort_pci_alias_info {
@@ -1020,24 +991,13 @@ static int iort_nc_iommu_map_id(struct device *dev,
  * @dev: device to configure
  * @id_in: optional input id const value pointer
  *
- * Returns: iommu_ops pointer on configuration success
- *  NULL on configuration failure
+ * Returns: 0 on success, <0 on failure
  */
-const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
-   const u32 *id_in)
+int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
 {
struct acpi_iort_node *node;
-   const struct iommu_ops *ops;
int err = -ENODEV;
 
-   /*
-   

[PATCH v5 1/5] ACPI: arm64: Move DMA setup operations out of IORT

2021-06-18 Thread Jean-Philippe Brucker
Extract generic DMA setup code out of IORT, so it can be reused by VIOT.
Keep it in drivers/acpi/arm64 for now, since it could break x86
platforms that haven't run this code so far, if they have invalid
tables.

Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/arm64/Makefile |  1 +
 include/linux/acpi.h|  3 +++
 include/linux/acpi_iort.h   |  6 ++---
 drivers/acpi/arm64/dma.c| 50 ++
 drivers/acpi/arm64/iort.c   | 54 ++---
 drivers/acpi/scan.c |  2 +-
 6 files changed, 66 insertions(+), 50 deletions(-)
 create mode 100644 drivers/acpi/arm64/dma.c

diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 6ff50f4ed947..66acbe77f46e 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_ACPI_IORT)+= iort.o
 obj-$(CONFIG_ACPI_GTDT)+= gtdt.o
+obj-y  += dma.o
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c60745f657e9..7aaa9559cc19 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -259,9 +259,12 @@ void acpi_numa_x2apic_affinity_init(struct 
acpi_srat_x2apic_cpu_affinity *pa);
 
 #ifdef CONFIG_ARM64
 void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa);
+void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size);
 #else
 static inline void
 acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { }
+static inline void
+acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) { }
 #endif
 
 int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma);
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index 1a12baa58e40..f7f054833afd 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -34,7 +34,7 @@ struct irq_domain *iort_get_device_domain(struct device *dev, 
u32 id,
 void acpi_configure_pmsi_domain(struct device *dev);
 int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
 /* IOMMU interface */
-void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size);
+int iort_dma_get_ranges(struct device *dev, u64 *size);
 const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
const u32 *id_in);
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head);
@@ -48,8 +48,8 @@ static inline struct irq_domain *iort_get_device_domain(
 { return NULL; }
 static inline void acpi_configure_pmsi_domain(struct device *dev) { }
 /* IOMMU interface */
-static inline void iort_dma_setup(struct device *dev, u64 *dma_addr,
- u64 *size) { }
+static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
+{ return -ENODEV; }
 static inline const struct iommu_ops *iort_iommu_configure_id(
  struct device *dev, const u32 *id_in)
 { return NULL; }
diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
new file mode 100644
index ..f16739ad3cc0
--- /dev/null
+++ b/drivers/acpi/arm64/dma.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+
+void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
+{
+   int ret;
+   u64 end, mask;
+   u64 dmaaddr = 0, size = 0, offset = 0;
+
+   /*
+* If @dev is expected to be DMA-capable then the bus code that created
+* it should have initialised its dma_mask pointer by this point. For
+* now, we'll continue the legacy behaviour of coercing it to the
+* coherent mask if not, but we'll no longer do so quietly.
+*/
+   if (!dev->dma_mask) {
+   dev_warn(dev, "DMA mask not set\n");
+   dev->dma_mask = &dev->coherent_dma_mask;
+   }
+
+   if (dev->coherent_dma_mask)
+   size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);
+   else
+   size = 1ULL << 32;
+
+   ret = acpi_dma_get_range(dev, &dmaaddr, &offset, &size);
+   if (ret == -ENODEV)
+   ret = iort_dma_get_ranges(dev, &size);
+   if (!ret) {
+   /*
+* Limit coherent and dma mask based on size retrieved from
+* firmware.
+*/
+   end = dmaaddr + size - 1;
+   mask = DMA_BIT_MASK(ilog2(end) + 1);
+   dev->bus_dma_limit = end;
+   dev->coherent_dma_mask = min(dev->coherent_dma_mask, mask);
+   *dev->dma_mask = min(*dev->dma_mask, mask);
+   }
+
+   *dma_addr = dmaaddr;
+   *dma_size = size;
+
+   ret = dma_direct_set_offset(dev, dmaaddr + offset, dmaaddr, size);
+
+   dev_dbg(dev, "dma_offset(%#08llx)%s\n", offset, ret ? " failed!" : "");
+}
diff --git a/drivers/acpi/a

[PATCH v5 0/5] Add support for ACPI VIOT

2021-06-18 Thread Jean-Philippe Brucker
Add a driver for the ACPI VIOT table, which provides topology
information for para-virtual IOMMUs. Enable virtio-iommu on
non-devicetree platforms, including x86.

Since v4 [1]:
* Fixes (comments, wrong argument, unused variable)
* Removed patch 5 that wrongly moved set_dma_ops(dev, NULL) into dma-iommu.
  The simplification of limit parameters for x86 callers is now in patch 4.
* Release ACPI table after parsing
* Added review and tested tags, thanks for all the feedback!

You can find a QEMU implementation at [2], with extra support for
testing all VIOT nodes including MMIO-based endpoints and IOMMU.
This series is at [3].

[1] 
https://lore.kernel.org/linux-iommu/20210610075130.67517-1-jean-phili...@linaro.org/
[2] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/acpi
[3] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/acpi

Jean-Philippe Brucker (5):
  ACPI: arm64: Move DMA setup operations out of IORT
  ACPI: Move IOMMU setup code out of IORT
  ACPI: Add driver for the VIOT table
  iommu/dma: Pass address limit rather than size to
iommu_setup_dma_ops()
  iommu/virtio: Enable x86 support

 drivers/acpi/Kconfig |   3 +
 drivers/iommu/Kconfig|   4 +-
 drivers/acpi/Makefile|   2 +
 drivers/acpi/arm64/Makefile  |   1 +
 include/acpi/acpi_bus.h  |   3 +
 include/linux/acpi.h |   3 +
 include/linux/acpi_iort.h|  14 +-
 include/linux/acpi_viot.h|  19 ++
 include/linux/dma-iommu.h|   4 +-
 arch/arm64/mm/dma-mapping.c  |   2 +-
 drivers/acpi/arm64/dma.c |  50 +
 drivers/acpi/arm64/iort.c| 128 ++--
 drivers/acpi/bus.c   |   2 +
 drivers/acpi/scan.c  |  78 +++-
 drivers/acpi/viot.c  | 366 +++
 drivers/iommu/amd/iommu.c|   2 +-
 drivers/iommu/dma-iommu.c|  13 +-
 drivers/iommu/intel/iommu.c  |   5 +-
 drivers/iommu/virtio-iommu.c |  11 ++
 MAINTAINERS  |   8 +
 20 files changed, 581 insertions(+), 137 deletions(-)
 create mode 100644 include/linux/acpi_viot.h
 create mode 100644 drivers/acpi/arm64/dma.c
 create mode 100644 drivers/acpi/viot.c

-- 
2.32.0



Re: [PATCH v4 5/6] iommu/dma: Simplify calls to iommu_setup_dma_ops()

2021-06-18 Thread Jean-Philippe Brucker
On Wed, Jun 16, 2021 at 06:02:39PM +0100, Robin Murphy wrote:
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index c62e19bed302..175f8eaeb5b3 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -1322,7 +1322,9 @@ void iommu_setup_dma_ops(struct device *dev, u64 
> > dma_base, u64 dma_limit)
> > if (domain->type == IOMMU_DOMAIN_DMA) {
> > if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev))
> > goto out_err;
> > -   dev->dma_ops = _dma_ops;
> > +   set_dma_ops(dev, _dma_ops);
> > +   } else {
> > +   set_dma_ops(dev, NULL);
> 
> I'm not keen on moving this here, since iommu-dma only knows that its own
> ops are right for devices it *is* managing; it can't assume any particular
> ops are appropriate for devices it isn't. The idea here is that
> arch_setup_dma_ops() may have already set the appropriate ops for the
> non-IOMMU case, so if the default domain type is passthrough then we leave
> those in place.
> 
> For example, I do still plan to revisit my conversion of arch/arm someday,
> at which point I'd have to undo this for that reason.

Makes sense, I'll remove this bit.

> Simplifying the base and size arguments is of course fine, but TBH I'd say
> rip the whole bloody lot out of the arch_setup_dma_ops() flow now. It's a
> considerable faff passing them around for nothing but a tenuous sanity check
> in iommu_dma_init_domain(), and now that dev->dma_range_map is a common
> thing we should expect that to give us any relevant limitations if we even
> still care.

So I started working on this but it gets too bulky for a preparatory
patch. Dropping the parameters from arch_setup_dma_ops() seems especially
complicated because arm32 does need the size parameter for IOMMU mappings
and that value falls back to the bus DMA mask or U32_MAX in the absence of
dma-ranges. I could try to dig into this for a separate series.

Even only dropping the parameters from iommu_setup_dma_ops() isn't
completely trivial (8 files changed, 55 insertions(+), 36 deletions(-)
because we still need the lower IOVA limit from dma_range_map), so I'd
rather send it separately and have it sit in -next for a while.

Thanks,
Jean

> 
> That said, those are all things which can be fixed up later if the series is
> otherwise ready to go and there's still a chance of landing it for 5.14. If
> you do have any other reason to respin, then I think the x86 probe_finalize
> functions simply want an unconditional set_dma_ops(dev, NULL) before the
> iommu_setup_dma_ops() call.
> 
> Cheers,
> Robin.
> 
> > }
> > return;
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 85f18342603c..8d866940692a 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -5165,15 +5165,7 @@ static void intel_iommu_release_device(struct device 
> > *dev)
> >   static void intel_iommu_probe_finalize(struct device *dev)
> >   {
> > -   dma_addr_t base = IOVA_START_PFN << VTD_PAGE_SHIFT;
> > -   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> > -   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > -
> > -   if (domain && domain->type == IOMMU_DOMAIN_DMA)
> > -   iommu_setup_dma_ops(dev, base,
> > -   __DOMAIN_MAX_ADDR(dmar_domain->gaw));
> > -   else
> > -   set_dma_ops(dev, NULL);
> > +   iommu_setup_dma_ops(dev, 0, U64_MAX);
> >   }
> >   static void intel_iommu_get_resv_regions(struct device *device,
> > 


Re: [PATCH v4 4/6] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()

2021-06-18 Thread Jean-Philippe Brucker
On Wed, Jun 16, 2021 at 05:28:59PM +0200, Eric Auger wrote:
> Hi Jean,
> 
> On 6/10/21 9:51 AM, Jean-Philippe Brucker wrote:
> > diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> > index 4bf1dd3eb041..7bd1d2199141 100644
> > --- a/arch/arm64/mm/dma-mapping.c
> > +++ b/arch/arm64/mm/dma-mapping.c
> > @@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
> > u64 size,
> >  
> > dev->dma_coherent = coherent;
> > if (iommu)
> > -   iommu_setup_dma_ops(dev, dma_base, size);
> > +   iommu_setup_dma_ops(dev, dma_base, size - dma_base - 1);
> I don't get  size - dma_base - 1?

Because it's wrong, should be dma_base + size - 1. Thanks for catching it!

Thanks,
Jean


Re: [PATCH v4 3/6] ACPI: Add driver for the VIOT table

2021-06-18 Thread Jean-Philippe Brucker
On Thu, Jun 17, 2021 at 01:50:59PM +0200, Rafael J. Wysocki wrote:
> > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > index be7da23fad76..b835ca702ff0 100644
> > --- a/drivers/acpi/bus.c
> > +++ b/drivers/acpi/bus.c
> > @@ -27,6 +27,7 @@
> >  #include 
> >  #endif
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -1339,6 +1340,7 @@ static int __init acpi_init(void)
> > pci_mmcfg_late_init();
> > acpi_iort_init();
> > acpi_scan_init();
> > +   acpi_viot_init();
> 
> Is there a specific reason why to call it right here?
> 
> In particular, does it need to be called after acpi_scan_init()?  And
> does it need to be called before the subsequent functions?  If so,
> then why?

It does need to be called after acpi_scan_init(), because it relies on
struct device and their fwnode to be initialized. In particular to find a
PCI device we call pci_get_domain_bus_and_slot(), which needs the PCI
topology made available by acpi_scan_init().

It does not need to be before the subsequent functions however, I can move
it at the end.

> > +void __init acpi_viot_init(void)
> > +{
> > +   int i;
> > +   acpi_status status;
> > +   struct acpi_table_header *hdr;
> > +   struct acpi_viot_header *node;
> > +
> > +   status = acpi_get_table(ACPI_SIG_VIOT, 0, &hdr);
> > +   if (ACPI_FAILURE(status)) {
> > +   if (status != AE_NOT_FOUND) {
> > +   const char *msg = acpi_format_exception(status);
> > +
> > +   pr_err("Failed to get table, %s\n", msg);
> > +   }
> > +   return;
> > +   }
> > +
> > +   viot = (void *)hdr;
> > +
> > +   node = ACPI_ADD_PTR(struct acpi_viot_header, viot, 
> > viot->node_offset);
> > +   for (i = 0; i < viot->node_count; i++) {
> > +   if (viot_parse_node(node))
> > +   return;
> > +
> > +   node = ACPI_ADD_PTR(struct acpi_viot_header, node,
> > +   node->length);
> > +   }
> 
> Do you still need the table after the above is complete?  If not,
> release the reference on it acquired above.

We don't need the table anymore, I'll drop the reference
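
i.e. just a release at the end of acpi_viot_init() once the node loop is
done, something like:

        acpi_put_table(hdr);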

Thanks,
Jean



Re: [PATCH v4 3/6] ACPI: Add driver for the VIOT table

2021-06-18 Thread Jean-Philippe Brucker
On Wed, Jun 16, 2021 at 03:26:08PM +0200, Eric Auger wrote:
> > +   default:
> > +   pr_warn("Unsupported node %x\n", hdr->type);
> > +   ret = 0;
> > +   goto err_free;
> > +   }
> > +
> > +   /*
> > +* To be compatible with future versions of the table which may include
> > +* other node types, keep parsing.
> > +*/
> nit: doesn't this comment rather apply to the default clause in the
> switch.

Yes, the comment doesn't accurately explain the code below, I'll tweak it.

/*
 * A future version of the table may use the node for other purposes.
 * Keep parsing.
 */

> In case the PCI range node or the single MMIO endoint node does
> not refer to any translation element, isn't it simply an error case?

It is permissible in my opinion. If a future version of the spec appends
new fields to the MMIO endpoint describing some PV property (I can't think
of a useful example), then the table can contain the vIOMMU topology as
usual plus one MMIO node that's only here to describe that property, and
doesn't have a translation element. If we encounter that I think we should
keep parsing.

> > +   if (!ep->viommu) {
> > +   pr_warn("No IOMMU node found\n");
> > +   ret = 0;
> > +   goto err_free;
> > +   }

> Besides
> Reviewed-by: Eric Auger 

Thanks!
Jean


Re: [PATCH v4 2/6] ACPI: Move IOMMU setup code out of IORT

2021-06-18 Thread Jean-Philippe Brucker
Hi Eric,

On Wed, Jun 16, 2021 at 11:35:13AM +0200, Eric Auger wrote:
> > -const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
> > -   const u32 *id_in)
> > +int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
> >  {
> > struct acpi_iort_node *node;
> > -   const struct iommu_ops *ops;
> > +   const struct iommu_ops *ops = NULL;

Oops, I need to remove this (and add -Werror to my tests.)


> > +static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
> > +  const u32 *id_in)
> > +{
> > +   int err;
> > +   const struct iommu_ops *ops;
> > +
> > +   /*
> > +* If we already translated the fwspec there is nothing left to do,
> > +* return the iommu_ops.
> > +*/
> > +   ops = acpi_iommu_fwspec_ops(dev);
> > +   if (ops)
> > +   return ops;
> > +
> > +   err = iort_iommu_configure_id(dev, id_in);
> > +
> > +   /*
> > +* If we have reason to believe the IOMMU driver missed the initial
> > +* add_device callback for dev, replay it to get things in order.
> > +*/
> > +   if (!err && dev->bus && !device_iommu_mapped(dev))
> > +   err = iommu_probe_device(dev);
> Previously we had:
>     if (!err) {
>         ops = iort_fwspec_iommu_ops(dev);
>         err = iort_add_device_replay(dev);
>     }
> 
> Please can you explain the transform? I see the
> 
> acpi_iommu_fwspec_ops call below but is it not straightforward to me.

I figured that iort_add_device_replay() is only used once and is
sufficiently simple to be inlined manually (saving 10 lines). Then I
replaced the ops assignment with returns, which saves another line and may
be slightly clearer?  I guess it's mostly a matter of taste, the behavior
should be exactly the same.

> Also the comment mentions replay. Unsure if it is still OK.

The "replay" part is, but "add_device" isn't accurate because it has since
been replaced by probe_device. I'll refresh the comment.

Thanks,
Jean

Re: [PATCH v4 0/6] Add support for ACPI VIOT

2021-06-16 Thread Jean-Philippe Brucker
Hi Rafael,

On Thu, Jun 10, 2021 at 09:51:27AM +0200, Jean-Philippe Brucker wrote:
> Add a driver for the ACPI VIOT table, which provides topology
> information for para-virtual IOMMUs. Enable virtio-iommu on
> non-devicetree platforms, including x86.
> 
> Since v3 [1] I fixed a build bug for !CONFIG_IOMMU_API. Joerg offered to
> take this series through the IOMMU tree, which requires Acks for patches
> 1-3.

I was wondering if you could take a look at patches 1-3, otherwise we'll
miss the mark for 5.14 since I won't be able to resend next week. The
series adds support for virtio-iommu on QEMU and cloud hypervisor.

Thanks,
Jean

> 
> You can find a QEMU implementation at [2], with extra support for
> testing all VIOT nodes including MMIO-based endpoints and IOMMU.
> This series is at [3].
> 
> [1] 
> https://lore.kernel.org/linux-iommu/2021060215.1077006-1-jean-phili...@linaro.org/
> [2] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/acpi
> [3] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/acpi
> 
> 
> Jean-Philippe Brucker (6):
>   ACPI: arm64: Move DMA setup operations out of IORT
>   ACPI: Move IOMMU setup code out of IORT
>   ACPI: Add driver for the VIOT table
>   iommu/dma: Pass address limit rather than size to
> iommu_setup_dma_ops()
>   iommu/dma: Simplify calls to iommu_setup_dma_ops()
>   iommu/virtio: Enable x86 support
> 
>  drivers/acpi/Kconfig |   3 +
>  drivers/iommu/Kconfig|   4 +-
>  drivers/acpi/Makefile|   2 +
>  drivers/acpi/arm64/Makefile  |   1 +
>  include/acpi/acpi_bus.h  |   3 +
>  include/linux/acpi.h |   3 +
>  include/linux/acpi_iort.h|  14 +-
>  include/linux/acpi_viot.h|  19 ++
>  include/linux/dma-iommu.h|   4 +-
>  arch/arm64/mm/dma-mapping.c  |   2 +-
>  drivers/acpi/arm64/dma.c |  50 +
>  drivers/acpi/arm64/iort.c| 129 ++---
>  drivers/acpi/bus.c   |   2 +
>  drivers/acpi/scan.c  |  78 +++-
>  drivers/acpi/viot.c  | 364 +++
>  drivers/iommu/amd/iommu.c|   9 +-
>  drivers/iommu/dma-iommu.c|  17 +-
>  drivers/iommu/intel/iommu.c  |  10 +-
>  drivers/iommu/virtio-iommu.c |   8 +
>  MAINTAINERS  |   8 +
>  20 files changed, 580 insertions(+), 150 deletions(-)
>  create mode 100644 include/linux/acpi_viot.h
>  create mode 100644 drivers/acpi/arm64/dma.c
>  create mode 100644 drivers/acpi/viot.c
> 
> -- 
> 2.31.1
> 


Re: [PATCH V3 1/3] gpio: Add virtio-gpio driver

2021-06-10 Thread Jean-Philippe Brucker
Hi,

Not being very familiar with GPIO, I just have a few general comments and
one on the config space layout

On Thu, Jun 10, 2021 at 12:16:46PM +, Viresh Kumar via Stratos-dev wrote:
> +static int virtio_gpio_req(struct virtio_gpio *vgpio, u16 type, u16 gpio,
> +u8 txdata, u8 *rxdata)
> +{
> + struct virtio_gpio_response *res = &vgpio->cres;
> + struct virtio_gpio_request *req = &vgpio->creq;
> + struct scatterlist *sgs[2], req_sg, res_sg;
> + struct device *dev = &vgpio->vdev->dev;
> + unsigned long time_left;
> + unsigned int len;
> + int ret;
> +
> + req->type = cpu_to_le16(type);
> + req->gpio = cpu_to_le16(gpio);
> + req->data = txdata;
> +
> + sg_init_one(&req_sg, req, sizeof(*req));
> + sg_init_one(&res_sg, res, sizeof(*res));
> + sgs[0] = &req_sg;
> + sgs[1] = &res_sg;
> +
> + mutex_lock(&vgpio->lock);
> + ret = virtqueue_add_sgs(vgpio->command_vq, sgs, 1, 1, res, GFP_KERNEL);
> + if (ret) {
> + dev_err(dev, "failed to add request to vq\n");
> + goto out;
> + }
> +
> + reinit_completion(&vgpio->completion);
> + virtqueue_kick(vgpio->command_vq);
> +
> + time_left = wait_for_completion_timeout(&vgpio->completion, HZ / 10);
> + if (!time_left) {
> + dev_err(dev, "virtio GPIO backend timeout\n");
> + return -ETIMEDOUT;

mutex is still held

> + }
> +
> + WARN_ON(res != virtqueue_get_buf(vgpio->command_vq, &len));
> + if (unlikely(res->status != VIRTIO_GPIO_STATUS_OK)) {
> + dev_err(dev, "virtio GPIO request failed: %d\n", gpio);
> + return -EINVAL;

and here

> + }
> +
> + if (rxdata)
> + *rxdata = res->data;
> +
> +out:
> + mutex_unlock(&vgpio->lock);
> +
> + return ret;
> +}
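
Both early returns above leave vgpio->lock held. One way to fix it is to set
ret and reuse the existing unlock path (rough, untested sketch):

	time_left = wait_for_completion_timeout(&vgpio->completion, HZ / 10);
	if (!time_left) {
		dev_err(dev, "virtio GPIO backend timeout\n");
		ret = -ETIMEDOUT;
		goto out;	/* drop vgpio->lock before returning */
	}

	WARN_ON(res != virtqueue_get_buf(vgpio->command_vq, &len));
	if (unlikely(res->status != VIRTIO_GPIO_STATUS_OK)) {
		dev_err(dev, "virtio GPIO request failed: %d\n", gpio);
		ret = -EINVAL;
		goto out;	/* drop vgpio->lock before returning */
	}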
> +
> +static int virtio_gpio_request(struct gpio_chip *gc, unsigned int gpio)
> +{
> + struct virtio_gpio *vgpio = gpio_chip_to_vgpio(gc);
> +
> + return virtio_gpio_req(vgpio, VIRTIO_GPIO_REQ_ACTIVATE, gpio, 0, NULL);
> +}
> +
> +static void virtio_gpio_free(struct gpio_chip *gc, unsigned int gpio)
> +{
> + struct virtio_gpio *vgpio = gpio_chip_to_vgpio(gc);
> +
> + virtio_gpio_req(vgpio, VIRTIO_GPIO_REQ_DEACTIVATE, gpio, 0, NULL);
> +}
> +
> +static int virtio_gpio_get_direction(struct gpio_chip *gc, unsigned int gpio)
> +{
> + struct virtio_gpio *vgpio = gpio_chip_to_vgpio(gc);
> + u8 direction;
> + int ret;
> +
> + ret = virtio_gpio_req(vgpio, VIRTIO_GPIO_REQ_GET_DIRECTION, gpio, 0,
> +   &direction);
> + if (ret)
> + return ret;
> +
> + return direction;
> +}
> +
> +static int virtio_gpio_direction_input(struct gpio_chip *gc, unsigned int 
> gpio)
> +{
> + struct virtio_gpio *vgpio = gpio_chip_to_vgpio(gc);
> +
> + return virtio_gpio_req(vgpio, VIRTIO_GPIO_REQ_DIRECTION_IN, gpio, 0,
> +NULL);
> +}
> +
> +static int virtio_gpio_direction_output(struct gpio_chip *gc, unsigned int 
> gpio,
> + int value)
> +{
> + struct virtio_gpio *vgpio = gpio_chip_to_vgpio(gc);
> +
> + return virtio_gpio_req(vgpio, VIRTIO_GPIO_REQ_DIRECTION_OUT, gpio, (u8)

(that dangling cast looks a bit odd to me)

> +value, NULL);
> +}
> +
> +static int virtio_gpio_get(struct gpio_chip *gc, unsigned int gpio)
> +{
> + struct virtio_gpio *vgpio = gpio_chip_to_vgpio(gc);
> + u8 value;
> + int ret;
> +
> + ret = virtio_gpio_req(vgpio, VIRTIO_GPIO_REQ_GET_VALUE, gpio, 0,
> +   &value);
> + if (ret)
> + return ret;
> +
> + return value;
> +}
> +
> +static void virtio_gpio_set(struct gpio_chip *gc, unsigned int gpio, int 
> value)
> +{
> + struct virtio_gpio *vgpio = gpio_chip_to_vgpio(gc);
> +
> + virtio_gpio_req(vgpio, VIRTIO_GPIO_REQ_SET_VALUE, gpio, value, NULL);
> +}
> +
> +static void virtio_gpio_command(struct virtqueue *vq)
> +{
> + struct virtio_gpio *vgpio = vq->vdev->priv;
> +
> + complete(&vgpio->completion);
> +}
> +
> +static int virtio_gpio_alloc_vqs(struct virtio_device *vdev)
> +{
> + struct virtio_gpio *vgpio = vdev->priv;
> + const char * const names[] = { "command" };
> + vq_callback_t *cbs[] = {
> + virtio_gpio_command,
> + };
> + struct virtqueue *vqs[1] = {NULL};
> + int ret;
> +
> + ret = virtio_find_vqs(vdev, 1, vqs, cbs, names, NULL);
> + if (ret) {
> + dev_err(&vdev->dev, "failed to allocate vqs: %d\n", ret);
> + return ret;
> + }
> +
> + vgpio->command_vq = vqs[0];
> +
> + /* Mark the device ready to perform operations from within probe() */
> + virtio_device_ready(vgpio->vdev);

May fit better in the parent function

> + return 0;
> +}
> +
> +static void virtio_gpio_free_vqs(struct virtio_device *vdev)
> +{
> + vdev->config->reset(vdev);
> + vdev->config->del_vqs(vdev);
> +}
> +
> +static const char **parse_gpio_names(struct virtio_device *vdev,
> +

Re: [Stratos-dev] [PATCH V3 1/3] gpio: Add virtio-gpio driver

2021-06-10 Thread Jean-Philippe Brucker
On Thu, Jun 10, 2021 at 04:00:39PM +, Enrico Weigelt, metux IT consult via 
Stratos-dev wrote:
> On 10.06.21 15:22, Arnd Bergmann wrote:
> 
> > Can you give an example of how this would be hooked up to other drivers
> > using those gpios. Can you give an example of how using the "gpio-keys" or
> > "gpio-leds" drivers in combination with virtio-gpio looks like in the DT?
> 
> Connecting between self-probing bus'es and DT is generally tricky. IMHO
> we don't have any generic mechanism for that.

DT does have a generic description of PCI endpoints, which virtio-iommu
relies on to express the relation between IOMMU and endpoint nodes [1].
I think the problem here is similar: the client node needs a phandle to
the GPIO controller which may use virtio-pci transport?

Note that it mostly works if the device is on the root PCI bus. Behind a
bridge the OS may change the device's bus number as needed, so the BDF
reference in DT is only valid if the software providing the DT description
(VMM or firmware) initializes bus numbers accordingly (and I don't
remember if Linux supports this case well).

Thanks,
Jean

[1] Documentation/devicetree/bindings/virtio/iommu.txt

> 
> I've made a few attempts, but nothing practically useful, which would be
> accepted by the corresponding maintainers, yet. We'd either need some
> very special logic in DT probing or pseudo-bus'es for the mapping.
> (DT wants to do those connections via phandle's, which in turn need the
> referenced nodes to be present in the DT).
> 
> >  From what I can tell, both the mmio and pci variants of virtio can have 
> > their
> > dev->of_node populated, but I don't see the logic in 
> > register_virtio_device()
> > that looks up the of_node of the virtio_device that the of_gpio code then
> > tries to refer to.
> 
> Have you ever successfully bound a virtio device via DT ?
> 
> 
> --mtx
> 
> -- 
> ---
> Note: unencrypted e-mails can easily be intercepted and manipulated!
> For confidential communication, please send your GPG/PGP key.
> ---
> Enrico Weigelt, metux IT consult
> Free software and Linux embedded engineering
> i...@metux.net -- +49-151-27565287
> -- 
> Stratos-dev mailing list
> stratos-...@op-lists.linaro.org
> https://op-lists.linaro.org/mailman/listinfo/stratos-dev

[PATCH v4 4/6] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()

2021-06-10 Thread Jean-Philippe Brucker
Passing a 64-bit address width to iommu_setup_dma_ops() is valid on
virtual platforms, but isn't currently possible. The overflow check in
iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass
a limit address instead of a size, so callers don't have to fake a size
to work around the check.

Signed-off-by: Jean-Philippe Brucker 
---
 include/linux/dma-iommu.h   |  4 ++--
 arch/arm64/mm/dma-mapping.c |  2 +-
 drivers/iommu/amd/iommu.c   |  2 +-
 drivers/iommu/dma-iommu.c   | 12 ++--
 drivers/iommu/intel/iommu.c |  2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 6e75a2d689b4..758ca4694257 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -19,7 +19,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, 
dma_addr_t base);
 void iommu_put_dma_cookie(struct iommu_domain *domain);
 
 /* Setup call for arch DMA mapping code */
-void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size);
+void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
 
 /* The DMA API isn't _quite_ the whole story, though... */
 /*
@@ -50,7 +50,7 @@ struct msi_msg;
 struct device;
 
 static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base,
-   u64 size)
+  u64 dma_limit)
 {
 }
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4bf1dd3eb041..7bd1d2199141 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 
size,
 
dev->dma_coherent = coherent;
if (iommu)
-   iommu_setup_dma_ops(dev, dma_base, size);
+   iommu_setup_dma_ops(dev, dma_base, size - dma_base - 1);
 
 #ifdef CONFIG_XEN
if (xen_swiotlb_detect())
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3ac42bbdefc6..94b96d81fcfd 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1713,7 +1713,7 @@ static void amd_iommu_probe_finalize(struct device *dev)
/* Domains are initialized for this device - have a look what we ended 
up with */
domain = iommu_get_domain_for_dev(dev);
if (domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
+   iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, U64_MAX);
else
set_dma_ops(dev, NULL);
 }
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..c62e19bed302 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -319,16 +319,16 @@ static bool dev_is_untrusted(struct device *dev)
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
  * @base: IOVA at which the mappable address space starts
- * @size: Size of IOVA space
+ * @limit: Last address of the IOVA space
  * @dev: Device the domain is being initialised for
  *
- * @base and @size should be exact multiples of IOMMU page granularity to
+ * @base and @limit + 1 should be exact multiples of IOMMU page granularity to
  * avoid rounding surprises. If necessary, we reserve the page at address 0
  * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but
  * any change which could make prior IOVAs invalid will fail.
  */
 static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
-   u64 size, struct device *dev)
+dma_addr_t limit, struct device *dev)
 {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
@@ -346,7 +346,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
/* Check the domain allows at least some access to the device... */
if (domain->geometry.force_aperture) {
if (base > domain->geometry.aperture_end ||
-   base + size <= domain->geometry.aperture_start) {
+   limit < domain->geometry.aperture_start) {
pr_warn("specified DMA range outside IOMMU 
capability\n");
return -EFAULT;
}
@@ -1308,7 +1308,7 @@ static const struct dma_map_ops iommu_dma_ops = {
  * The IOMMU core code allocates the default DMA domain, which the underlying
  * IOMMU driver needs to support via the dma-iommu layer.
  */
-void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size)
+void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit)
 {
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 
@@ -1320,7 +1320,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 size)
 * underlying IOMMU driver needs to support via the d

[PATCH v4 3/6] ACPI: Add driver for the VIOT table

2021-06-10 Thread Jean-Philippe Brucker
The ACPI Virtual I/O Translation Table describes topology of
para-virtual platforms, similarly to vendor tables DMAR, IVRS and IORT.
For now it describes the relation between virtio-iommu and the endpoints
it manages.

Three steps are needed to configure DMA of endpoints:

(1) acpi_viot_init(): parse the VIOT table, find or create the fwnode
associated to each vIOMMU device.

(2) When probing the vIOMMU device, the driver registers its IOMMU ops
within the IOMMU subsystem. This step doesn't require any
intervention from the VIOT driver.

(3) viot_iommu_configure(): before binding the endpoint to a driver,
find the associated IOMMU ops. Register them, along with the
endpoint ID, into the device's iommu_fwspec.

If step (3) happens before step (2), it is deferred until the IOMMU is
initialized, then retried.
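
As a rough illustration of step (3) (simplified sketch; struct viot_iommu is
the driver's internal per-IOMMU bookkeeping, assumed to hold the vIOMMU
fwnode, and acpi_iommu_fwspec_init() comes from the previous patch):

	static int viot_dev_iommu_init(struct device *dev,
				       struct viot_iommu *viommu, u32 epid)
	{
		const struct iommu_ops *ops;

		if (!viommu)
			return -ENODEV;

		/* Step (2) hasn't happened yet: retry once the IOMMU probes */
		ops = iommu_ops_from_fwnode(viommu->fwnode);
		if (!ops)
			return IS_ENABLED(CONFIG_VIRTIO_IOMMU) ?
				-EPROBE_DEFER : -ENODEV;

		return acpi_iommu_fwspec_init(dev, epid, viommu->fwnode, ops);
	}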

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/Kconfig  |   3 +
 drivers/iommu/Kconfig |   1 +
 drivers/acpi/Makefile |   2 +
 include/linux/acpi_viot.h |  19 ++
 drivers/acpi/bus.c|   2 +
 drivers/acpi/scan.c   |   3 +
 drivers/acpi/viot.c   | 364 ++
 MAINTAINERS   |   8 +
 8 files changed, 402 insertions(+)
 create mode 100644 include/linux/acpi_viot.h
 create mode 100644 drivers/acpi/viot.c

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index eedec61e3476..3758c6940ed7 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -526,6 +526,9 @@ endif
 
 source "drivers/acpi/pmic/Kconfig"
 
+config ACPI_VIOT
+   bool
+
 endif  # ACPI
 
 config X86_PM_TIMER
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1f111b399bca..aff8a4830dd1 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -403,6 +403,7 @@ config VIRTIO_IOMMU
depends on ARM64
select IOMMU_API
select INTERVAL_TREE
+   select ACPI_VIOT if ACPI
help
  Para-virtualised IOMMU driver with virtio.
 
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 700b41adf2db..a6e644c48987 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -118,3 +118,5 @@ video-objs  += acpi_video.o video_detect.o
 obj-y  += dptf/
 
 obj-$(CONFIG_ARM64)+= arm64/
+
+obj-$(CONFIG_ACPI_VIOT)+= viot.o
diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
new file mode 100644
index ..1eb8ee5b0e5f
--- /dev/null
+++ b/include/linux/acpi_viot.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ACPI_VIOT_H__
+#define __ACPI_VIOT_H__
+
+#include <linux/acpi.h>
+
+#ifdef CONFIG_ACPI_VIOT
+void __init acpi_viot_init(void);
+int viot_iommu_configure(struct device *dev);
+#else
+static inline void acpi_viot_init(void) {}
+static inline int viot_iommu_configure(struct device *dev)
+{
+   return -ENODEV;
+}
+#endif
+
+#endif /* __ACPI_VIOT_H__ */
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index be7da23fad76..b835ca702ff0 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -27,6 +27,7 @@
 #include 
 #endif
 #include 
+#include <linux/acpi_viot.h>
 #include 
 #include 
 #include 
@@ -1339,6 +1340,7 @@ static int __init acpi_init(void)
pci_mmcfg_late_init();
acpi_iort_init();
acpi_scan_init();
+   acpi_viot_init();
acpi_ec_init();
acpi_debugfs_init();
acpi_sleep_proc_init();
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 0c53c8533300..4fa684fdfda8 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include <linux/acpi_viot.h>
 #include 
 #include 
 #include 
@@ -1556,6 +1557,8 @@ static const struct iommu_ops 
*acpi_iommu_configure_id(struct device *dev,
return ops;
 
err = iort_iommu_configure_id(dev, id_in);
+   if (err && err != -EPROBE_DEFER)
+   err = viot_iommu_configure(dev);
 
/*
 * If we have reason to believe the IOMMU driver missed the initial
diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
new file mode 100644
index ..892cd9fa7b6d
--- /dev/null
+++ b/drivers/acpi/viot.c
@@ -0,0 +1,364 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Virtual I/O topology
+ *
+ * The Virtual I/O Translation Table (VIOT) describes the topology of
+ * para-virtual IOMMUs and the endpoints they manage. The OS uses it to
+ * initialize devices in the right order, preventing endpoints from issuing DMA
+ * before their IOMMU is ready.
+ *
+ * When binding a driver to a device, before calling the device driver's 
probe()
+ * method, the driver infrastructure calls dma_configure(). At that point the
+ * VIOT driver looks for an IOMMU associated to the device in the VIOT table.
+ * If an IOMMU exists and has been initialized, the VIOT driver initializes the
+ * device's IOMMU fwspec, allowing the DMA infrastructure to invoke the IOMMU
+ * ops when the device driver configures DMA mappings. 

[PATCH v4 5/6] iommu/dma: Simplify calls to iommu_setup_dma_ops()

2021-06-10 Thread Jean-Philippe Brucker
dma-iommu uses the address bounds described in domain->geometry during
IOVA allocation. The address size parameters of iommu_setup_dma_ops()
are useful for describing additional limits set by the platform
firmware, but aren't needed for drivers that call this function from
probe_finalize(). The base parameter can be zero because dma-iommu
already removes the first IOVA page, and the limit parameter can be
U64_MAX because it's only checked against the domain geometry. Simplify
calls to iommu_setup_dma_ops().

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/amd/iommu.c   |  9 +
 drivers/iommu/dma-iommu.c   |  4 +++-
 drivers/iommu/intel/iommu.c | 10 +-
 3 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 94b96d81fcfd..d3123bc05c08 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1708,14 +1708,7 @@ static struct iommu_device 
*amd_iommu_probe_device(struct device *dev)
 
 static void amd_iommu_probe_finalize(struct device *dev)
 {
-   struct iommu_domain *domain;
-
-   /* Domains are initialized for this device - have a look what we ended 
up with */
-   domain = iommu_get_domain_for_dev(dev);
-   if (domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, U64_MAX);
-   else
-   set_dma_ops(dev, NULL);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
 }
 
 static void amd_iommu_release_device(struct device *dev)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c62e19bed302..175f8eaeb5b3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1322,7 +1322,9 @@ void iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 dma_limit)
if (domain->type == IOMMU_DOMAIN_DMA) {
if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev))
goto out_err;
-   dev->dma_ops = &iommu_dma_ops;
+   set_dma_ops(dev, &iommu_dma_ops);
+   } else {
+   set_dma_ops(dev, NULL);
}
 
return;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 85f18342603c..8d866940692a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5165,15 +5165,7 @@ static void intel_iommu_release_device(struct device 
*dev)
 
 static void intel_iommu_probe_finalize(struct device *dev)
 {
-   dma_addr_t base = IOVA_START_PFN << VTD_PAGE_SHIFT;
-   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-
-   if (domain && domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, base,
-   __DOMAIN_MAX_ADDR(dmar_domain->gaw));
-   else
-   set_dma_ops(dev, NULL);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
 }
 
 static void intel_iommu_get_resv_regions(struct device *device,
-- 
2.31.1



[PATCH v4 2/6] ACPI: Move IOMMU setup code out of IORT

2021-06-10 Thread Jean-Philippe Brucker
Extract the code that sets up the IOMMU infrastructure from IORT, since
it can be reused by VIOT. Move it one level up into a new
acpi_iommu_configure_id() function, which calls the IORT parsing
function which in turn calls the acpi_iommu_fwspec_init() helper.

Signed-off-by: Jean-Philippe Brucker 
---
 include/acpi/acpi_bus.h   |  3 ++
 include/linux/acpi_iort.h |  8 ++---
 drivers/acpi/arm64/iort.c | 75 +--
 drivers/acpi/scan.c   | 73 -
 4 files changed, 87 insertions(+), 72 deletions(-)

diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 3a82faac5767..41f092a269f6 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -588,6 +588,9 @@ struct acpi_pci_root {
 
 bool acpi_dma_supported(struct acpi_device *adev);
 enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev);
+int acpi_iommu_fwspec_init(struct device *dev, u32 id,
+  struct fwnode_handle *fwnode,
+  const struct iommu_ops *ops);
 int acpi_dma_get_range(struct device *dev, u64 *dma_addr, u64 *offset,
   u64 *size);
 int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index f7f054833afd..f1f0842a2cb2 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -35,8 +35,7 @@ void acpi_configure_pmsi_domain(struct device *dev);
 int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
 /* IOMMU interface */
 int iort_dma_get_ranges(struct device *dev, u64 *size);
-const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
-   const u32 *id_in);
+int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head);
 phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
 #else
@@ -50,9 +49,8 @@ static inline void acpi_configure_pmsi_domain(struct device 
*dev) { }
 /* IOMMU interface */
 static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
 { return -ENODEV; }
-static inline const struct iommu_ops *iort_iommu_configure_id(
- struct device *dev, const u32 *id_in)
-{ return NULL; }
+static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
+{ return -ENODEV; }
 static inline
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 { return 0; }
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index a940be1cf2af..b5b021e064b6 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -806,23 +806,6 @@ static struct acpi_iort_node 
*iort_get_msi_resv_iommu(struct device *dev)
return NULL;
 }
 
-static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device *dev)
-{
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-
-   return (fwspec && fwspec->ops) ? fwspec->ops : NULL;
-}
-
-static inline int iort_add_device_replay(struct device *dev)
-{
-   int err = 0;
-
-   if (dev->bus && !device_iommu_mapped(dev))
-   err = iommu_probe_device(dev);
-
-   return err;
-}
-
 /**
  * iort_iommu_msi_get_resv_regions - Reserved region driver helper
  * @dev: Device from iommu_get_resv_regions()
@@ -900,18 +883,6 @@ static inline bool iort_iommu_driver_enabled(u8 type)
}
 }
 
-static int arm_smmu_iort_xlate(struct device *dev, u32 streamid,
-  struct fwnode_handle *fwnode,
-  const struct iommu_ops *ops)
-{
-   int ret = iommu_fwspec_init(dev, fwnode, ops);
-
-   if (!ret)
-   ret = iommu_fwspec_add_ids(dev, &streamid, 1);
-
-   return ret;
-}
-
 static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
 {
struct acpi_iort_root_complex *pci_rc;
@@ -946,7 +917,7 @@ static int iort_iommu_xlate(struct device *dev, struct 
acpi_iort_node *node,
return iort_iommu_driver_enabled(node->type) ?
   -EPROBE_DEFER : -ENODEV;
 
-   return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops);
+   return acpi_iommu_fwspec_init(dev, streamid, iort_fwnode, ops);
 }
 
 struct iort_pci_alias_info {
@@ -1020,24 +991,14 @@ static int iort_nc_iommu_map_id(struct device *dev,
  * @dev: device to configure
  * @id_in: optional input id const value pointer
  *
- * Returns: iommu_ops pointer on configuration success
- *  NULL on configuration failure
+ * Returns: 0 on success, <0 on failure
  */
-const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
-   const u32 *id_in)
+int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
 {
struct acpi_iort_node *node;
-   const struct iommu_ops *ops;
+   const struct iommu_ops *ops = 

[PATCH v4 6/6] iommu/virtio: Enable x86 support

2021-06-10 Thread Jean-Philippe Brucker
With the VIOT support in place, x86 platforms can now use the
virtio-iommu.

Because the other x86 IOMMU drivers aren't yet ready to use the
acpi_dma_setup() path, x86 doesn't implement arch_setup_dma_ops() at the
moment. Similarly to VT-d and AMD IOMMU, call iommu_setup_dma_ops() from
probe_finalize().

Acked-by: Joerg Roedel 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/Kconfig| 3 ++-
 drivers/iommu/dma-iommu.c| 1 +
 drivers/iommu/virtio-iommu.c | 8 
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index aff8a4830dd1..07b7c25cbed8 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -400,8 +400,9 @@ config HYPERV_IOMMU
 config VIRTIO_IOMMU
tristate "Virtio IOMMU driver"
depends on VIRTIO
-   depends on ARM64
+   depends on (ARM64 || X86)
select IOMMU_API
+   select IOMMU_DMA
select INTERVAL_TREE
select ACPI_VIOT if ACPI
help
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 175f8eaeb5b3..46ed43c400cf 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1332,6 +1332,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 dma_limit)
 pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA 
ops\n",
 dev_name(dev));
 }
+EXPORT_SYMBOL_GPL(iommu_setup_dma_ops);
 
 static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
phys_addr_t msi_addr, struct iommu_domain *domain)
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 218fe8560e8d..77aee1207ced 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1026,6 +1026,13 @@ static struct iommu_device *viommu_probe_device(struct 
device *dev)
return ERR_PTR(ret);
 }
 
+static void viommu_probe_finalize(struct device *dev)
+{
+#ifndef CONFIG_ARCH_HAS_SETUP_DMA_OPS
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
+#endif
+}
+
 static void viommu_release_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
@@ -1062,6 +1069,7 @@ static struct iommu_ops viommu_ops = {
.iova_to_phys   = viommu_iova_to_phys,
.iotlb_sync = viommu_iotlb_sync,
.probe_device   = viommu_probe_device,
+   .probe_finalize = viommu_probe_finalize,
.release_device = viommu_release_device,
.device_group   = viommu_device_group,
.get_resv_regions   = viommu_get_resv_regions,
-- 
2.31.1



[PATCH v4 1/6] ACPI: arm64: Move DMA setup operations out of IORT

2021-06-10 Thread Jean-Philippe Brucker
Extract generic DMA setup code out of IORT, so it can be reused by VIOT.
Keep it in drivers/acpi/arm64 for now, since it could break x86
platforms that haven't run this code so far, if they have invalid
tables.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/arm64/Makefile |  1 +
 include/linux/acpi.h|  3 +++
 include/linux/acpi_iort.h   |  6 ++---
 drivers/acpi/arm64/dma.c| 50 ++
 drivers/acpi/arm64/iort.c   | 54 ++---
 drivers/acpi/scan.c |  2 +-
 6 files changed, 66 insertions(+), 50 deletions(-)
 create mode 100644 drivers/acpi/arm64/dma.c

diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 6ff50f4ed947..66acbe77f46e 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_ACPI_IORT)+= iort.o
 obj-$(CONFIG_ACPI_GTDT)+= gtdt.o
+obj-y  += dma.o
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c60745f657e9..7aaa9559cc19 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -259,9 +259,12 @@ void acpi_numa_x2apic_affinity_init(struct 
acpi_srat_x2apic_cpu_affinity *pa);
 
 #ifdef CONFIG_ARM64
 void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa);
+void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size);
 #else
 static inline void
 acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { }
+static inline void
+acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) { }
 #endif
 
 int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma);
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index 1a12baa58e40..f7f054833afd 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -34,7 +34,7 @@ struct irq_domain *iort_get_device_domain(struct device *dev, 
u32 id,
 void acpi_configure_pmsi_domain(struct device *dev);
 int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
 /* IOMMU interface */
-void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size);
+int iort_dma_get_ranges(struct device *dev, u64 *size);
 const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
const u32 *id_in);
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head);
@@ -48,8 +48,8 @@ static inline struct irq_domain *iort_get_device_domain(
 { return NULL; }
 static inline void acpi_configure_pmsi_domain(struct device *dev) { }
 /* IOMMU interface */
-static inline void iort_dma_setup(struct device *dev, u64 *dma_addr,
- u64 *size) { }
+static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
+{ return -ENODEV; }
 static inline const struct iommu_ops *iort_iommu_configure_id(
  struct device *dev, const u32 *id_in)
 { return NULL; }
diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
new file mode 100644
index ..f16739ad3cc0
--- /dev/null
+++ b/drivers/acpi/arm64/dma.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/acpi.h>
+#include <linux/acpi_iort.h>
+#include <linux/device.h>
+#include <linux/dma-direct.h>
+
+void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
+{
+   int ret;
+   u64 end, mask;
+   u64 dmaaddr = 0, size = 0, offset = 0;
+
+   /*
+* If @dev is expected to be DMA-capable then the bus code that created
+* it should have initialised its dma_mask pointer by this point. For
+* now, we'll continue the legacy behaviour of coercing it to the
+* coherent mask if not, but we'll no longer do so quietly.
+*/
+   if (!dev->dma_mask) {
+   dev_warn(dev, "DMA mask not set\n");
+   dev->dma_mask = &dev->coherent_dma_mask;
+   }
+
+   if (dev->coherent_dma_mask)
+   size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);
+   else
+   size = 1ULL << 32;
+
+   ret = acpi_dma_get_range(dev, &dmaaddr, &offset, &size);
+   if (ret == -ENODEV)
+   ret = iort_dma_get_ranges(dev, &size);
+   if (!ret) {
+   /*
+* Limit coherent and dma mask based on size retrieved from
+* firmware.
+*/
+   end = dmaaddr + size - 1;
+   mask = DMA_BIT_MASK(ilog2(end) + 1);
+   dev->bus_dma_limit = end;
+   dev->coherent_dma_mask = min(dev->coherent_dma_mask, mask);
+   *dev->dma_mask = min(*dev->dma_mask, mask);
+   }
+
+   *dma_addr = dmaaddr;
+   *dma_size = size;
+
+   ret = dma_direct_set_offset(dev, dmaaddr + offset, dmaaddr, size);
+
+   dev_dbg(dev, "dma_offset(%#08llx)%s\n", offset, ret ? " failed!" : "");
+}
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acp

[PATCH v4 0/6] Add support for ACPI VIOT

2021-06-10 Thread Jean-Philippe Brucker
Add a driver for the ACPI VIOT table, which provides topology
information for para-virtual IOMMUs. Enable virtio-iommu on
non-devicetree platforms, including x86.

Since v3 [1] I fixed a build bug for !CONFIG_IOMMU_API. Joerg offered to
take this series through the IOMMU tree, which requires Acks for patches
1-3.

You can find a QEMU implementation at [2], with extra support for
testing all VIOT nodes including MMIO-based endpoints and IOMMU.
This series is at [3].

[1] 
https://lore.kernel.org/linux-iommu/2021060215.1077006-1-jean-phili...@linaro.org/
[2] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/acpi
[3] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/acpi


Jean-Philippe Brucker (6):
  ACPI: arm64: Move DMA setup operations out of IORT
  ACPI: Move IOMMU setup code out of IORT
  ACPI: Add driver for the VIOT table
  iommu/dma: Pass address limit rather than size to
iommu_setup_dma_ops()
  iommu/dma: Simplify calls to iommu_setup_dma_ops()
  iommu/virtio: Enable x86 support

 drivers/acpi/Kconfig |   3 +
 drivers/iommu/Kconfig|   4 +-
 drivers/acpi/Makefile|   2 +
 drivers/acpi/arm64/Makefile  |   1 +
 include/acpi/acpi_bus.h  |   3 +
 include/linux/acpi.h |   3 +
 include/linux/acpi_iort.h|  14 +-
 include/linux/acpi_viot.h|  19 ++
 include/linux/dma-iommu.h|   4 +-
 arch/arm64/mm/dma-mapping.c  |   2 +-
 drivers/acpi/arm64/dma.c |  50 +
 drivers/acpi/arm64/iort.c| 129 ++---
 drivers/acpi/bus.c   |   2 +
 drivers/acpi/scan.c  |  78 +++-
 drivers/acpi/viot.c  | 364 +++
 drivers/iommu/amd/iommu.c|   9 +-
 drivers/iommu/dma-iommu.c|  17 +-
 drivers/iommu/intel/iommu.c  |  10 +-
 drivers/iommu/virtio-iommu.c |   8 +
 MAINTAINERS  |   8 +
 20 files changed, 580 insertions(+), 150 deletions(-)
 create mode 100644 include/linux/acpi_viot.h
 create mode 100644 drivers/acpi/arm64/dma.c
 create mode 100644 drivers/acpi/viot.c

-- 
2.31.1



Re: [PATCH] dt-bindings: virtio: Convert virtio-mmio to DT schema

2021-06-08 Thread Jean-Philippe Brucker
On Mon, Jun 07, 2021 at 02:39:28PM -0500, Rob Herring wrote:
> Convert the virtio-mmio binding to DT schema format.
> 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Cc: Jean-Philippe Brucker 
> Cc: virtualization@lists.linux-foundation.org
> Signed-off-by: Rob Herring 
> ---
> Jean-Philippe, hopefully you are okay with being listed as the 
> maintainer here. You're the only active person that's touched this 
> binding.

Sure, no problem. I can work on the conversion of virtio/iommu.txt as
well, so I'll learn a bit more about the yaml syntax.

Acked-by: Jean-Philippe Brucker 

> 
>  .../devicetree/bindings/virtio/mmio.txt   | 47 ---
>  .../devicetree/bindings/virtio/mmio.yaml  | 60 +++
>  2 files changed, 60 insertions(+), 47 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/virtio/mmio.txt
>  create mode 100644 Documentation/devicetree/bindings/virtio/mmio.yaml
> 
> diff --git a/Documentation/devicetree/bindings/virtio/mmio.txt 
> b/Documentation/devicetree/bindings/virtio/mmio.txt
> deleted file mode 100644
> index 0a575f329f6e..
> --- a/Documentation/devicetree/bindings/virtio/mmio.txt
> +++ /dev/null
> @@ -1,47 +0,0 @@
> -* virtio memory mapped device
> -
> -See https://ozlabs.org/~rusty/virtio-spec/ for more details.
> -
> -Required properties:
> -
> -- compatible:"virtio,mmio" compatibility string
> -- reg:   control registers base address and size including 
> configuration space
> -- interrupts:interrupt generated by the device
> -
> -Required properties for virtio-iommu:
> -
> -- #iommu-cells:  When the node corresponds to a virtio-iommu device, it 
> is
> - linked to DMA masters using the "iommus" or "iommu-map"
> - properties [1][2]. #iommu-cells specifies the size of the
> - "iommus" property. For virtio-iommu #iommu-cells must be
> - 1, each cell describing a single endpoint ID.
> -
> -Optional properties:
> -
> -- iommus:If the device accesses memory through an IOMMU, it should
> - have an "iommus" property [1]. Since virtio-iommu itself
> - does not access memory through an IOMMU, the "virtio,mmio"
> - node cannot have both an "#iommu-cells" and an "iommus"
> - property.
> -
> -Example:
> -
> - virtio_block@3000 {
> - compatible = "virtio,mmio";
> - reg = <0x3000 0x100>;
> - interrupts = <41>;
> -
> - /* Device has endpoint ID 23 */
> - iommus = <&viommu 23>
> - }
> -
> - viommu: iommu@3100 {
> - compatible = "virtio,mmio";
> - reg = <0x3100 0x100>;
> - interrupts = <42>;
> -
> - #iommu-cells = <1>
> - }
> -
> -[1] Documentation/devicetree/bindings/iommu/iommu.txt
> -[2] Documentation/devicetree/bindings/pci/pci-iommu.txt
> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml 
> b/Documentation/devicetree/bindings/virtio/mmio.yaml
> new file mode 100644
> index 0000..444bfa24affc
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
> @@ -0,0 +1,60 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/virtio/mmio.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: virtio memory mapped devices
> +
> +maintainers:
> +  - Jean-Philippe Brucker 
> +
> +description:
> +  See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio for
> +  more details.
> +
> +properties:
> +  compatible:
> +const: virtio-mmio
> +
> +  reg:
> +maxItems: 1
> +
> +  interrupts:
> +maxItems: 1
> +
> +  '#iommu-cells':
> +description: Required when the node corresponds to a virtio-iommu device.
> +const: 1
> +
> +  iommus:
> +description: Required for devices making accesses thru an IOMMU.
> +maxItems: 1
> +
> +required:
> +  - compatible
> +  - reg
> +  - interrupts
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +virtio@3000 {
> +compatible = "virtio,mmio";
> +reg = <0x3000 0x100>;
> +interrupts = <41>;
> +
> +/* Device has endpoint ID 23 */
> +iommus = <&viommu 23>;
> +};
> +
> +viommu: iommu@3100 {
> +compatible = "virtio,mmio";
> +reg = <0x3100 0x100>;
> +interrupts = <42>;
> +
> +#iommu-cells = <1>;
> +};
> +
> +...
> -- 
> 2.27.0
> 


[PATCH v3 6/6] iommu/virtio: Enable x86 support

2021-06-02 Thread Jean-Philippe Brucker
With the VIOT support in place, x86 platforms can now use the
virtio-iommu.

Because the other x86 IOMMU drivers aren't yet ready to use the
acpi_dma_setup() path, x86 doesn't implement arch_setup_dma_ops() at the
moment. Similarly to VT-d and AMD IOMMU, call iommu_setup_dma_ops() from
probe_finalize().

Acked-by: Joerg Roedel 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/Kconfig| 3 ++-
 drivers/iommu/dma-iommu.c| 1 +
 drivers/iommu/virtio-iommu.c | 8 
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index aff8a4830dd1..07b7c25cbed8 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -400,8 +400,9 @@ config HYPERV_IOMMU
 config VIRTIO_IOMMU
tristate "Virtio IOMMU driver"
depends on VIRTIO
-   depends on ARM64
+   depends on (ARM64 || X86)
select IOMMU_API
+   select IOMMU_DMA
select INTERVAL_TREE
select ACPI_VIOT if ACPI
help
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 175f8eaeb5b3..46ed43c400cf 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1332,6 +1332,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 dma_limit)
 pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA 
ops\n",
 dev_name(dev));
 }
+EXPORT_SYMBOL_GPL(iommu_setup_dma_ops);
 
 static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
phys_addr_t msi_addr, struct iommu_domain *domain)
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 218fe8560e8d..77aee1207ced 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1026,6 +1026,13 @@ static struct iommu_device *viommu_probe_device(struct 
device *dev)
return ERR_PTR(ret);
 }
 
+static void viommu_probe_finalize(struct device *dev)
+{
+#ifndef CONFIG_ARCH_HAS_SETUP_DMA_OPS
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
+#endif
+}
+
 static void viommu_release_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
@@ -1062,6 +1069,7 @@ static struct iommu_ops viommu_ops = {
.iova_to_phys   = viommu_iova_to_phys,
.iotlb_sync = viommu_iotlb_sync,
.probe_device   = viommu_probe_device,
+   .probe_finalize = viommu_probe_finalize,
.release_device = viommu_release_device,
.device_group   = viommu_device_group,
.get_resv_regions   = viommu_get_resv_regions,
-- 
2.31.1


